Welcome back to the month of profiling! The first two weeks we took a look at using the async-profiler to generate flame graphs and bcc-tools to understand your hardware. This week we’ll be focusing on eBPF, a rapidly evolving technology that lets you extract a ton of useful information out of the Linux Kernel even if a tool wasn’t built specifically for it. Think of it as a tool that lets you create small programs that run safely in the Linux Kernel with almost no overhead for the purpose of observability.
Welcome back to my series on performance profiling! In this second installment, we’re shifting our focus towards a hands-on exploration of our hardware using the power of bcc-tools. Building on the foundational knowledge from our first post about flame graphs, we are now ready to delve deeper into the intricacies of system performance. These tools are a great way of understanding if we’re seeing bottlenecks in our hardware.
For a little background, bcc-tools is a collection of utilities that expose underlying kernel statistics through eBPF. eBPF is an amazing technology that lets us tap into what’s going on in the Linux kernel with negligible performance overhead. Fun stuff!
Welcome to the first article of ‘The Month of Profiling,’ a series I will be presenting throughout November. We’ll explore a range of technologies that are crucial for identifying and resolving performance issues, as well as for fine-tuning your systems. The series will conclude with a detailed walkthrough on how to integrate these technologies for effective performance optimization.
Introduction
Optimizing the performance of a complex application like a database requires a deep understanding of multiple components; its resource utilization, the behavior of the JVM, and how nodes interact with each other. There are several useful approaches to understanding an application’s behavior, either when performance tuning or dealing with an outage. In most circumstances, teams will rely on a set of dashboards based on various metrics emitted from the database and the underlying operating system. However, the picture is only as complete as the metrics that are available, and in my experience, there can never be enough metrics get a complete understanding of an application’s behavior. In an ideal world, we’d know exactly where our time is spent and what is allocating memory, two things that have an enormous impact on performance. By understanding what’s happening in our database and the tuning that are available, we can often tweak some settings for a significant improvement in performance.