Prometheus.io on Azure Data Explorer

Prometheus is a well-known open-source monitoring solution that is widely used across the industry. Among many other features, it offers a multi-dimensional data model and a flexible query language called PromQL. By default, data is stored on the local storage of the Prometheus server. To address concerns such as high availability and scalability, the project provides an interface for integrating remote storage. This article describes the implementation of a new remote endpoint based on Azure Data Explorer.

I chose Azure Data Explorer (ADX) because it is a fully managed PaaS service on Azure designed for data exploration and time-series analysis. It handles large amounts of structured, semi-structured (JSON-like nested types), and unstructured (free-text) data equally well. It allows you to search for specific text terms, locate particular events, and perform metric-style calculations on structured data. ADX bridges the worlds of unstructured text logs and structured numbers and dimensions by extracting values at runtime from free-form text fields. Data exploration is simplified by combining fast text indexing, a column store, and time-series operations. Last but not least, it is heavily used within Microsoft to store telemetry data and powers services like Azure Monitor and Time Series Insights. More details can be found in the documentation.
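To give an impression of what that looks like in practice, here is a small, hypothetical KQL sketch (the AppLogs table and its columns are made up for illustration) that extracts a numeric value from a free-text field at query time and turns it into a metric-style time series:

```kql
// Hypothetical log table with a free-text Message column.
AppLogs
| where Timestamp > ago(1h)
// Pull a numeric value out of the raw text at query time.
| extend DurationMs = todouble(extract(@"duration=(\d+)ms", 1, Message))
// Aggregate it into a per-minute time series, metric style.
| summarize avg(DurationMs) by bin(Timestamp, 1m)
```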

To create the new remote storage, I picked Azure Functions. They are serverless, and it is very convenient to develop simple HTTP triggers with them. Because of the seamless integration into my development stack, I chose Azure Application Insights for monitoring and Azure Key Vault to store all necessary secrets. There is no Lambda architecture without a reliable event streaming platform such as Azure Event Hubs: it serves as a buffer and offers the possibility of attaching many other services in a seamless fashion. At the bottom of this post you'll find potential extensions of this architecture.

The code can be found in my GitHub repository.

[Image: solution architecture overview]

The ADX cluster has been set up with the smallest possible node configuration. Once the infrastructure is in place and the code for the functions has been deployed, three things are left to be done...

Table definition in ADX

The table definition follows the principle of derived tables: a raw data table is created, and update policies automatically populate another table that is tailored to the use case.

[Image: ADX table definitions with update policy]
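For reference, a minimal sketch of what such definitions could look like is shown below. The table names, columns, and transformation function are assumptions for illustration and not necessarily the exact schema used in the repository:

```kql
// Raw table that receives the JSON payloads coming from Event Hub (assumed schema).
.create table RawPrometheusMetrics (Records: dynamic)

// Derived table shaped for querying (assumed schema).
.create table PrometheusMetrics (Timestamp: datetime, Name: string, Labels: dynamic, Value: real)

// Function that expands raw records into the query-friendly shape.
.create function ExpandPrometheusMetrics() {
    RawPrometheusMetrics
    | extend Timestamp = todatetime(Records.timestamp),
             Name = tostring(Records.name),
             Labels = Records.labels,
             Value = toreal(Records.value)
    | project Timestamp, Name, Labels, Value
}

// Update policy that populates the derived table whenever raw data is ingested.
.alter table PrometheusMetrics policy update
@'[{"IsEnabled": true, "Source": "RawPrometheusMetrics", "Query": "ExpandPrometheusMetrics()", "IsTransactional": false}]'
```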

Connecting EventHub with ADX

[Image: Event Hub data connection in ADX]

This is straightforward when following the documentation. You just need to specify the Event Hub you intend to consume from, the format of the messages, and the ADX table with its corresponding column mapping.
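As an illustration, the JSON ingestion mapping referenced by such a data connection could look roughly like this (the mapping name and JSON path are assumptions and would need to match the actual message format produced by the write function):

```kql
// Maps each incoming JSON document onto the dynamic column of the raw table.
.create table RawPrometheusMetrics ingestion json mapping 'PrometheusMapping'
'[{"column": "Records", "path": "$", "datatype": "dynamic"}]'
```

The data connection in the portal then simply references the Event Hub, the data format (JSON), the target table, and this mapping name.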

Configuring Prometheus

Before starting the Prometheus server, it needs to be configured with the new remote_write and remote_read URLs.

[Image: Prometheus remote_write and remote_read configuration]
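The relevant part of prometheus.yml could look like this; the function URLs and the code query parameter (the Azure Functions access key) are placeholders for your own deployment:

```yaml
# Placeholders: replace with the URLs of your deployed Azure Functions.
remote_write:
  - url: "https://<your-function-app>.azurewebsites.net/api/write?code=<function-key>"

remote_read:
  - url: "https://<your-function-app>.azurewebsites.net/api/read?code=<function-key>"
```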

Testing = fun

Now we are ready to go... everything is in place and you just need to start your Prometheus servers (I used three)! Additionally, I created a small Grafana dashboard to produce a decent amount of read requests in parallel.

After a couple of hours I took a screenshot of the Application Map in Application Insights. On the left you see the read function and some of the connected services (AAD & ADX); it has been called ~19,000 times. On the right side you see the executions of the write function, which has been called ~1,800 times. What's missing in this graph are Azure Key Vault and Event Hub.

[Image: Application Map in Application Insights]

Besides the Application Map, I took a closer look at the performance of the functions. The "Read" function returned on average after 194 ms and the "Write" function after 108 ms. The distribution of the durations looks sane. In the operation times window you'll notice that I added more Prometheus servers over time.

[Image: function performance in Application Insights]

The following graph shows the number of time series that were exchanged during the testing period. In a little more than four hours, the three Prometheus servers created ~885k time series samples. The Grafana dashboards (constantly reloading every 5 s) requested more than 25 M objects via Prometheus and the "Read" function.

[Image: time series samples written and read during the test]

This is one of the Grafana dashboards that was constantly reloaded every 5 seconds.

[Image: Grafana dashboard]

If you go a little deeper and explore the details of a visualization, you get this graph, which gives an impression of the number of queries that have been processed by Azure Data Explorer.

[Image: queries processed by Azure Data Explorer]

One of the many KQL queries that have been executed on ADX is shown in the following picture. As you can see, it created almost no load on the system.

[Image: KQL query executed on ADX]
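As a rough idea of what the read path produces, a query generated from a remote read request (which carries a time range and label matchers) might look like the following sketch against the assumed PrometheusMetrics table from above, not the exact query emitted by the function:

```kql
// Fetch the samples for one metric within the requested time window.
PrometheusMetrics
| where Name == "node_cpu_seconds_total"
| where Timestamp > ago(1h)
| project Timestamp, Labels, Value
| order by Timestamp asc
```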

Next Steps

Once this setup is in place, one could invest in a full-blown Lambda architecture. I added a batch layer and a hot path, including another visualization interface using Power BI.

[Image: extended Lambda architecture with batch layer and Power BI]

Furthermore, one could explore the results in ADX directly using Kusto Explorer:

[Image: exploring the results in Kusto Explorer]

Conclusion

In this article I've briefly shown how to connect Prometheus with Azure Data Explorer. I chose a serverless event streaming infrastructure using Azure Functions and Event Hubs. The metrics in Application Insights show extraordinary performance even though the smallest possible ADX cluster configuration was selected.

Have fun testing it on your metrics!

Anargyros Tomaras

Principal Software Engineer @ Esri | Distributed Systems | Cloud Computing | Azure | Location Intelligence | Geospatial

Henning Rauch I am trying to do the same, but without storing the original time series object in a column, expecting that I can recreate it on the fly in my read function. My issue is that although the JSON I create on the fly is identical to what your solution stored in ADX, I keep getting "snappy corrupt input" errors in Grafana. Any ideas?

Very cool! Also check out: we actually have a way to get Prometheus metrics directly in Azure Monitor for Containers, which also uses ADX on the backend. https://azure.microsoft.com/en-us/updates/azure-monitor-for-containers-prometheus-integration-is-now-in-preview/

Orr Ganani

Group Product Manager @ BigPanda

Nice!

Yossi Attas

VP Development at SafeBreach

A complete e2e integration of a telemetry back-end. Very cool.
