Aerospike February 2024 Meetup at Adjust Berlin office

Posted by ads on Friday, 2024-02-23
Posted in [Berlin][Events]

On Wednesday, February 21st, Adjust hosted the second Aerospike Meetup in their Berlin office. About 50 visitors showed up, which demonstrates a huge interest in the Aerospike database technology.

Aerospike Meetup at Adjust office

Last time, in June 2023, the Meetup was in the meeting room on the third floor, which is a bit limited in space. This time they moved it to their offices on the fourth floor, where attendees can expand into the floor space. Also there is a kitchen nearby, whereas one has to walk across the entire office to the kitchen on the third floor.

Meetup Attendees

As usual, the Meetup had two speakers: Bubunyo “Bubu” Nyavor from Adjust, and Behrad Babaee from Aerospike.

Bubunyo “Bubu” Nyavor: How to store 26 billion records

Bubunyo “Bubu” Nyavor

Bubunyo opened his talk with the following quote:

You can’t escape SQL

Even though Aerospike is a NoSQL database optimized for certain types of queries, other, more complex queries require SQL, and the knowledge of how to use it. To that extent, Adjust operates both very large Aerospike clusters and very large PostgreSQL databases, as well as large Kafka and Ceph clusters for data storage in the petabyte range.

He also …adjusted the number in the talk title: it’s not 26 billion records, it’s 45.1 billion records in Aerospike. This equals roughly 351.9 TB of cluster disk space.
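To put the two numbers in relation, here is a minimal sketch that derives the average disk footprint per record. It assumes decimal terabytes (10^12 bytes), and the function name is made up for illustration. The talk summary does not say whether the 351.9 TB include replicas and storage overhead, so the result is a footprint on disk, not the payload size of a record.

```python
# Figures quoted in the talk.
RECORDS = 45.1e9        # 45.1 billion records
DISK_BYTES = 351.9e12   # 351.9 TB, assuming decimal terabytes (10**12 bytes)


def avg_bytes_per_record(disk_bytes: float, records: float) -> float:
    """Average disk footprint per record (hypothetical helper for illustration)."""
    return disk_bytes / records


if __name__ == "__main__":
    avg = avg_bytes_per_record(DISK_BYTES, RECORDS)
    print(f"{avg:.0f} bytes per record (about {avg / 1000:.1f} kB)")
```

Under these assumptions the average comes out at a bit under 8 kB per record.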

Number of operations in the Aerospike cluster

Adjust engineers “ssh a lot” when they work on the Aerospike clusters. Each cluster has 60 servers.

Interesting side fact: Adjust runs almost everything on Gentoo (a Linux distribution where most of the OS and applications are compiled from source in order to gain more performance), but Aerospike doesn’t officially support Gentoo. The Debian packages work just fine.

Another lesson learned when operating large clusters:

Networks will fail!

The law of large numbers (LLN) applies, and tells us that a large cluster has a certain probability of failure over time. The more components are involved, the higher the probability that at least one component will fail. It’s not a matter of “if something fails”, but “when something fails”. A company operating such a large cluster must be prepared for outages and know how to deal with them. A failing disk or a failing power supply is dealt with locally on the server, but failing network components will degrade the overall cluster performance.
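The effect of the component count can be sketched with a short calculation. This is only an illustration and not taken from the talk: it assumes that components fail independently, and the 2% per-year failure probability is a hypothetical number. Only the cluster size of 60 servers comes from the talk.

```python
def p_any_failure(p_single: float, n_components: int) -> float:
    """Probability that at least one of n independent components fails.

    Uses 1 - (1 - p)^n, which only holds if failures are independent.
    """
    return 1.0 - (1.0 - p_single) ** n_components


if __name__ == "__main__":
    p = 0.02  # hypothetical failure probability of one server per year
    for n in (1, 10, 60, 600):
        print(f"{n:4d} components: {p_any_failure(p, n):.1%}")
```

With these assumed numbers, a single server fails with 2% probability, while a cluster of 60 servers sees at least one failure with a probability of about 70%.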

For Aerospike, it means that a number of servers in the cluster will eventually fail. The database is partition tolerant, and can deal with outages.

To that extent:

Aerospike is both webscale and is not

Some of the more heavy tasks for Adjust are weekly cleanup jobs, which are very expensive to run. Certain data needs to be removed from the database, which involves more lookups in other data.

And upgrades of this size are always a challenge. To quote Bubunyo:

Ask yourself if you want to upgrade. And ask yourself again in 14 days.

All in all this is an interesting insight into how a company operates a very large Aerospike cluster.

Aerospike tradeoffs

You can find the slides for Bubunyo’s talk here.

Pizza

Everyone likes pizza. It was delivered for the half-time break. And to everyone’s surprise there even was pizza with pineapple. I don’t want to hear complaints, it was delicious!

Plenty of Pizza

Aerospike provided the pizza, and Adjust provided the hot and cold beverages.

Behrad Babaee: Leveraging Moore’s Law to Optimize Database Performance

Behrad Babaee

Behrad used Moore’s Law to show how computers have grown, and keep growing, in terms of CPU performance, I/O and storage. Gordon Moore was one of the co-founders of Fairchild Semiconductor and Intel. In 1965 he observed that the number of components in an integrated circuit doubles at a regular pace, and in 1975 he revised the doubling period to two years, which became known as “Moore’s Law”. This is not a physical law, but rather a prediction.

If we start in 2006 with 1x, and roughly double the number every two years, in 2024 we are at 512x:

  • 2006: 1
  • 2008: 1 * 2 = 2
  • 2010: 2 * 2 = 4
  • 2012: 4 * 2 = 8
  • 2014: 8 * 2 = 16
  • 2016: 16 * 2 = 32
  • 2018: 32 * 2 = 64
  • 2020: 64 * 2 = 128
  • 2022: 128 * 2 = 256
  • 2024: 256 * 2 = 512

According to Moore’s Law, the hypothetical value representing the increase in computing power will go from 1x in 2006 to 512x in 2024.
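The doubling table above follows a simple formula: the factor is 2 raised to the number of elapsed doubling periods. A minimal sketch, with a function name chosen for illustration only:

```python
def moore_factor(start_year: int, end_year: int, doubling_years: float = 2.0) -> float:
    """Growth factor after doubling once every `doubling_years` years."""
    return 2.0 ** ((end_year - start_year) / doubling_years)


if __name__ == "__main__":
    # Reproduce the table: one row for every second year from 2006 to 2024.
    for year in range(2006, 2025, 2):
        print(f"{year}: {moore_factor(2006, year):.0f}x")
```

Between 2006 and 2024 there are 18 years, or 9 doubling periods, and 2^9 = 512.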

During the presentation, Behrad showed that today the number of CPU cores is roughly 30x compared to 2006, while the clock frequency stayed roughly the same at 3 GHz. This does not take into account the increased number of transistors in each CPU, nor the fact that modern CPUs do much more peripheral work, which was previously handled by separate chips (Northbridge).

Number of CPU cores is 30x between 2006 and 2024

The storage in servers today is roughly 600x compared to servers in 2006. Bandwidth and technology are also much better today. In 2006, almost all physical storage was spinning disks, whereas today servers rarely have any such disks anymore; they have been replaced by SSD and NVRAM. Keep in mind that this also dramatically increases the number of transistors in a server: a magnetic disk has no transistors (just “a few” in the disk controller), whereas SSD and NVRAM are “all transistor”.

Storage size in servers is 600x between 2006 and 2024

And last but not least the memory bandwidth increased by much more than Moore’s prediction. Memory got really fast, and computers can handle large amounts of data.

Memory Bandwidth in servers is 3750x between 2006 and 2024

To summarize: while back in 2006 the CPU was fast and disk was slow, these days it’s the other way around. Memory and storage are fast, and large. CPUs keep up by adding more and more cores. This increases the number of components, and as we learned in the talk from Bubunyo: components fail. Providing uptime for services is a question of resilience against component failures. It also increases the complexity, as operations need to scale across dozens or hundreds of CPU cores. This brings new challenges: data structures and task execution must sometimes be locked.
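One way to compare the growth factors from the talk (30x CPU cores, 600x storage, 3750x memory bandwidth) with Moore’s Law is to convert each into the doubling period it implies over the 18 years from 2006 to 2024. The following is a sketch under the assumption of steady exponential growth; the helper name is made up for illustration.

```python
import math

YEARS = 2024 - 2006  # 18 years

# Growth factors as quoted in the talk, plus the Moore's Law reference value.
FACTORS = {
    "CPU cores": 30,
    "Moore's Law": 512,
    "storage size": 600,
    "memory bandwidth": 3750,
}


def implied_doubling_years(factor: float, years: float) -> float:
    """Doubling period that yields `factor` growth within `years`."""
    return years / math.log2(factor)


if __name__ == "__main__":
    for name, factor in FACTORS.items():
        period = implied_doubling_years(factor, YEARS)
        print(f"{name:17s} {factor:5d}x -> doubles every {period:.2f} years")
```

Under this assumption, the core count doubles only about every 3.7 years, storage size roughly matches the two-year period, and memory bandwidth doubles about every 1.5 years.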

SLA Uptime Percentile

Aerospike is positioned to deal with these kinds of problems: it tolerates the failure of a certain number of cluster members, and deals with rebuilding the database and the indexes.

You can find the slides for Behrad’s talk here.

Discussion

The questions and discussion after the talks showed a profound interest.

Someone asked about support for PMEM. While it’s there, the technology itself is outdated.

What are the challenges for Aerospike in the next 10 years: Algorithms, not hardware. The hardware will be more powerful, and the algorithms need to keep up with the technology. Also, tools like VectorDB provide unique ways to store, access and query certain datasets.

Someone else asked what the current bottlenecks are: previously it was network and storage, today it’s CPU.

Can Aerospike use FPGAs (Field-programmable gate arrays), ASICs (Application-specific integrated circuits) and GPUs (Graphics processing units): Currently there are no good use cases, as the data is distributed across the nodes, and Aerospike users are not running the kind of queries which usually benefit from vector operations. Aerospike is not opposed to implementing the technology if the use cases show up.

Can Aerospike do Joins (like SQL Joins): While Aerospike can do joins, it’s not optimized for them. After all, it’s a NoSQL database, and such databases usually have their strengths in scalability, their distributed architecture, and the variety and flexibility of their data schemas.

Can Aerospike store data on disk: That’s a config option, but it will be slower.

Aerospike

Aerospike is a high-performance NoSQL database system that has been designed to meet the demands of real-time, high-throughput applications. Particularly well-suited for use cases such as web-scale applications, ad tech, and gaming, Aerospike stands out for its remarkable speed, scalability, and efficiency in handling large volumes of data with low-latency read and write operations. One of its notable features is its hybrid memory architecture, which intelligently combines both RAM and solid-state storage to optimize overall performance. It provides a robust solution for organizations requiring rapid and reliable data access.

Adjust

Adjust is a prominent analytics suite utilized by over 135,000 apps. The company has existed since 2012. Positioned as a global leader in analytics, Adjust is known for its commitment to excellence, setting high standards in privacy and product quality. The company emphasizes a growth mindset, continually developing innovative solutions and fostering the potential of its employees to stay ahead in the industry. Adjust operates several very large Aerospike clusters, as well as hundreds of PostgreSQL databases in the TB range.

