Identify anomalies, outlier detection, forecasting: How Grafana Cloud uses AI/ML to make observability easier

2 Jul, 2024 4 min

At Grafana Labs, our No. 1 use case for AI/ML tools is enabling humans (a.k.a. all of us!) to understand complex systems. In other words, we want to keep observability human, but make it less complicated. (Our second use case? Making social media more fun.)

We believe that AI/ML tools in observability should work towards minimizing toil and the need for everyone in your organization to have the same deep domain knowledge about your increasingly complex stack.

After all, AI/ML should ultimately make it easier and less stressful for your teams to run services, and help them find and resolve issues effectively and efficiently, sometimes even before those issues impact users. Here’s how we do that in Grafana Cloud.

Learn why your system isn’t working: Sift investigations

Something wrong in your application? Sift is a diagnostic tool that uses machine learning to help teams identify anomalies and investigate issues. It reduces toil and speeds up response times by offering:

  • Automated checks across metrics, logs, and traces to uncover anomalies in logs, patterns in your HTTP errors, slow requests via traces, and more.
  • Explanations of log errors that summarize anomalies and offer potential fixes in easy-to-follow steps.

You can start a Sift investigation anywhere in Grafana Cloud, including in your dashboards and Explore, and run Sift across many workflows. You can also trigger Sift automatically from Grafana Incident, which automates the time-consuming tasks of incident management so you can fix the issue faster, and from Grafana OnCall, our on-call management tool that integrates with all parts of the Grafana LGTM Stack.

Best of all? Sift is available across all Grafana Cloud tiers at no additional cost.

Learn more in our Sift documentation.

Predict anomalies: Forecasting & outlier detection

These features tell you when your system is not healthy — now and in the future. Forecasting and outlier detection in Grafana Cloud help you learn the expected values of metrics over time and apply dynamic alerting to predict and detect anomalies.

Forecasting

With forecasting in Grafana Cloud, you can learn from the historical performance of a time series and predict its values, both now and in the future. Instead of hand-tuning alert thresholds, you can simply alert when a metric falls outside its predicted bounds. Forecasts can also capture daily and weekly seasonality, so the expected range adjusts for peak and off-peak hours.
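To make the idea of alerting on learned bounds (rather than a fixed threshold) concrete, here is a minimal Python sketch. It is not how Grafana Cloud implements forecasting: it assumes a very simple per-slot seasonal baseline, and the names `seasonal_bounds` and `is_anomalous` and the `width` parameter are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def seasonal_bounds(history, period, width=3.0):
    """Return one (lower, upper) band per position in the seasonal cycle.

    history: evenly spaced samples, oldest first.
    period:  samples per cycle (e.g. 24 for hourly data with daily seasonality).
    width:   band half-width in standard deviations (hypothetical sensitivity knob).
    """
    bounds = []
    for slot in range(period):
        # Every value previously observed at this point in the cycle.
        samples = history[slot::period]
        center = mean(samples)
        spread = stdev(samples) if len(samples) > 1 else 0.0
        bounds.append((center - width * spread, center + width * spread))
    return bounds


def is_anomalous(value, index, bounds):
    """True if the sample at position `index` falls outside its seasonal band."""
    lower, upper = bounds[index % len(bounds)]
    return value < lower or value > upper


# Toy data: three "days" of four samples each, with a peak in the third slot.
history = [10, 20, 30, 20,
           11, 21, 31, 21,
            9, 19, 29, 19]
bounds = seasonal_bounds(history, period=4)

print(is_anomalous(30, 12, bounds))  # off-peak slot: True
print(is_anomalous(30, 14, bounds))  # peak slot: False
```

In this toy history, a reading of 30 is normal at the daily peak but out of bounds in the off-peak slot, which is the kind of distinction a single static threshold cannot make.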

Forecasting also helps you with capacity planning and autoscaling so you can look into the future and confidently predict what your usage will look like in a week or a month.
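For capacity planning, the simplest version of "looking into the future" is extrapolating a trend. The sketch below fits a least-squares line to recent usage and projects it forward. This is an assumption made for illustration, not Grafana Cloud's forecasting model, and `linear_forecast` is a hypothetical helper name.

```python
def linear_forecast(values, steps_ahead):
    """Project a least-squares trend line `steps_ahead` samples past the last one.

    values: evenly spaced usage samples, oldest first (at least two).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Ordinary least-squares slope and intercept, with x = 0, 1, ..., n - 1.
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
        / sum((x - x_mean) ** 2 for x in range(n))
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


# Daily disk usage in GB over four days, projected one week ahead.
usage_gb = [100, 110, 120, 130]
print(linear_forecast(usage_gb, steps_ahead=7))
```

A straight line ignores seasonality and changes in growth rate, so treat a projection like this as a rough sanity check rather than a forecast to alert on.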

Outlier detection

With outlier detection, you can monitor a group of similar services and identify when one of them is not performing at the same level as the others. (Cue the memory leaks, noisy neighbors, and other headaches.)
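One common way to spot the odd one out in a group of peers is to compare each member against the group's median. The Python sketch below uses the median absolute deviation (MAD) and the modified z-score. It is a generic statistical technique picked for illustration, not necessarily the algorithm Grafana Cloud uses, and `find_outliers`, the pod names, and the default `threshold` are assumptions.

```python
from statistics import median


def find_outliers(latest, threshold=3.5):
    """Return the names of series whose latest value deviates from the group.

    latest:    dict mapping series name -> most recent value.
    threshold: cutoff on the modified z-score (3.5 is a commonly cited default).
    """
    values = list(latest.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half the group sits exactly on the median, so there is no
        # spread to scale by; fall back to flagging anything that differs.
        return [name for name, v in latest.items() if v != center]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [
        name
        for name, v in latest.items()
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Memory usage in MB for five replicas of the same service.
memory_mb = {"pod-a": 510, "pod-b": 495, "pod-c": 505, "pod-d": 500, "pod-e": 900}
print(find_outliers(memory_mb))  # ['pod-e']
```

Using the median rather than the mean keeps a single misbehaving member (say, a pod with a memory leak) from dragging the baseline toward itself.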

To learn more, check out our outlier detection documentation and our forecasting tutorial.

Simplify your workflows (and your life): Grafana LLM plugin

When your observability stack produces a ton of data, sometimes you just need to get to the point. By adding the Grafana LLM plugin to your stack, here’s how you can do just that:

Understand flame graphs faster

Flame graph AI uses an LLM to help interpret flame graph data, so you can identify bottlenecks and root causes, and get suggested fixes, faster.

Summarize incidents in one click

The OpenAI integration automatically generates concise, actionable summaries of incidents. This analysis not only captures the essence of the incident quickly but also helps teams ensure no critical details are overlooked when documenting and communicating incident impacts.

AI-generated titles and descriptions for Grafana dashboards

No more wondering how to summarize all the data you packed into your dashboard in one title. Grafana includes a new AI-powered tool that automatically summarizes the information in your panels and dashboards and creates detailed titles and descriptions for your dashboards.

Learn how to enable the Grafana LLM plugin — and get started with all of these tools and more — in your Grafana Cloud stack today.

To learn more about AI/ML and Grafana, check out our GrafanaCON 2024 session “AI/ML + Grafana: How to create intelligent Grafana apps leveraging LLMs” available on demand now.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!