Where this fits in K8s strategy
Jaeger collects and displays “tracing data” for distributed services, highlighting errors and the services at risk of downtime or slow loading.
Why it’s important
Helps with the tough task of tracking down issues across tens of services, each of which may have many sub-services.
Let’s explore the basics before getting into the tactics for boosting Jaeger.
Jaeger was founded in 2015 within the walls of Uber. Yes, that Uber. Yuri Shkuro created it to help engineers work out where issues were popping up.
This was important because Uber had a complex network of services. Many of these depended on other services as well as their own sub-services.
To the left: a glimpse of the services network that drives the Uber app. A large number of these services get triggered every time you request an Uber. (Source: YouTube, Jaeger Intro – Yuri Shkuro)
The chances of the whole request falling apart were high. Uber risked losing a ride fare if one or more of the component services failed or slowed down.
“In deep distributed systems, finding what is broken and where is often more difficult than why.” – Yuri Shkuro, Founder & Maintainer, CNCF Jaeger
Jaeger helps us find out what services are experiencing issues and where. That’s useful to know. It can help engineers fix small issues before they snowball into serious ones.
Do you even need Jaeger?
You might be wondering whether you even need Jaeger. Your use case might not be as complex as Uber’s, with its web of services and millions of requests per day.
Tracing is not an absolute must-have for simpler setups. But it is handy for finding bottlenecks once you run more than a handful of services.
Also, imagine this situation. Your application suddenly gets a traffic spike and requests are not completing. How will you find the culprit fast enough to fix the issue?
One more stop and then we’ll start to cover how to optimise Jaeger for tracing.
How Jaeger works
See a simplified view of how Jaeger works below.
We’ll cover the highlighted terms in greater depth in the tactics section.
The Jaeger Agent receives “span data” that instrumented services send to it over UDP
The Agent forwards this data (service name, start time, duration) to the Collector
The Collector stores the data, which then feeds 2 places: analytics and the visual dashboard
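To make the agent-to-collector flow concrete, here is a toy sketch in Python. It is not Jaeger's code or wire format. Real Jaeger client libraries send Thrift-encoded spans, while this sketch uses JSON. The names `Span`, `ToyAgent`, `ToyCollector` and `report_span` are hypothetical, chosen only for illustration.

In the sketch, three "services" each fire off one UDP datagram to a local agent. The agent batches the spans and forwards them to a collector that simply keeps them. The last step picks out the slowest span, the kind of question the analytics and dashboard side answers.

```python
import json
import socket
from dataclasses import asdict, dataclass


@dataclass
class Span:
    # Only the fields named in the diagram; real spans also carry trace/span IDs,
    # parent references, tags and logs.
    service_name: str
    operation_name: str
    start_time_us: int  # microseconds since an arbitrary origin
    duration_us: int


class ToyCollector:
    """Hypothetical stand-in for the Collector: keeps every span it receives
    so that analytics and a dashboard could read them later."""

    def __init__(self):
        self.stored = []

    def receive_batch(self, batch):
        self.stored.extend(batch)


class ToyAgent:
    """Hypothetical stand-in for the Agent: listens on a UDP port, batches
    incoming spans and forwards each batch to the collector."""

    def __init__(self, collector, batch_size=2):
        self.collector = collector
        self.batch_size = batch_size
        self.buffer = []
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        self.sock.settimeout(1.0)  # stop waiting if no more datagrams arrive

    @property
    def address(self):
        return self.sock.getsockname()

    def poll(self, max_datagrams):
        for _ in range(max_datagrams):
            try:
                data, _ = self.sock.recvfrom(65535)
            except socket.timeout:
                break
            self.buffer.append(Span(**json.loads(data)))
            if len(self.buffer) >= self.batch_size:
                self.flush()
        self.flush()  # forward any partial batch that is left over

    def flush(self):
        if self.buffer:
            self.collector.receive_batch(self.buffer)
            self.buffer = []

    def close(self):
        self.sock.close()


def report_span(span, agent_address):
    """What an instrumented service does: fire-and-forget one UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(json.dumps(asdict(span)).encode("utf-8"), agent_address)


# Three services involved in one request report their spans to a local agent.
collector = ToyCollector()
agent = ToyAgent(collector)
report_span(Span("frontend", "GET /ride", 0, 250_000), agent.address)
report_span(Span("pricing", "quote", 5_000, 120_000), agent.address)
report_span(Span("dispatch", "match", 130_000, 90_000), agent.address)
agent.poll(max_datagrams=3)
agent.close()

slowest = max(collector.stored, key=lambda s: s.duration_us)
print(f"{len(collector.stored)} spans stored; slowest: {slowest.service_name}")
```

UDP fits this role because reporting a span is fire-and-forget: the service never waits on the agent. The trade-off is that a lost datagram means a lost span, which is acceptable for tracing but not for, say, billing data.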