Prometheus.io on Azure Data Explorer
Prometheus is a well-known open-source monitoring solution that is widely used across the industry. Among its many features are a multi-dimensional data model and a flexible query language named PromQL. By default, Prometheus stores data on the local storage of the Prometheus server. For reasons such as high availability and scalability, the project provides an interface for integrating remote storage backends. This article is about the implementation of a new remote endpoint based on Azure Data Explorer.
I have chosen Azure Data Explorer (ADX) because it is a fully managed PaaS service on Azure that has been designed for data exploration and time-series analysis. It handles large amounts of structured, semi-structured (JSON-like nested types), and unstructured (free-text) data equally well. It allows you to search for specific text terms, locate particular events, and perform metric-style calculations on structured data. ADX bridges the worlds of unstructured text logs and structured numbers and dimensions by extracting values at runtime from free-form text fields. Data exploration is simplified by combining fast text indexing, a column store, and time-series operations. Last but not least, it's heavily used within Microsoft to store telemetry data, and it powers services like Azure Monitor and Time Series Insights. More details can be found in the documentation.
In order to create the new remote storage endpoint, I picked Azure Functions. They're serverless, and it's very convenient to develop simple HTTP triggers with them. Because of the seamless integration into my development stack, I've chosen Azure Application Insights for monitoring and Azure Key Vault to store all necessary secrets. There is no Lambda architecture without a reliable event streaming platform, and Azure Event Hubs fills that role here: it serves as a buffer and offers the possibility to attach many other services in a seamless fashion. At the bottom of this post you'll find potential extensions of this architecture.
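To illustrate the data flow, here is a minimal Python sketch of what the write function has to do conceptually: flatten each decoded Prometheus time series into one JSON event per sample before forwarding it to Event Hubs. The function and field names are my own illustration, not the names used in the repository, and the snappy/protobuf decoding of the remote-write payload is assumed to have happened already.

```python
import json

def timeseries_to_events(labels, samples):
    """Flatten one Prometheus time series into Event Hub events.

    `labels` is a dict of label name -> value (including __name__),
    `samples` is a list of (value, timestamp_ms) pairs, as decoded
    from the snappy-compressed remote-write protobuf payload.
    """
    metric = labels.get("__name__", "unknown")
    events = []
    for value, timestamp_ms in samples:
        events.append(json.dumps({
            "name": metric,
            "value": value,
            "timestamp": timestamp_ms,
            # labels stay nested; ADX can store them in a dynamic column
            "labels": labels,
        }))
    return events

# Example: one series with two samples
events = timeseries_to_events(
    {"__name__": "up", "job": "prometheus", "instance": "localhost:9090"},
    [(1.0, 1580000000000), (1.0, 1580000005000)],
)
print(len(events))  # 2
```

Keeping the labels as a nested JSON object (rather than one column per label) is what lets ADX ingest arbitrary label sets without schema changes.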
The code can be found on my GitHub repository.
The ADX cluster has been set up with the smallest possible node configuration. After the infrastructure has been provisioned and the code for the functions has been deployed, three things are left to be done...
Table definition in ADX
The table definition follows the principle of derived tables. This means creating a raw data table and using update policies to automatically populate another table that is shaped for the use case.
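A sketch of this pattern in KQL could look as follows. All table, column, and function names here are illustrative placeholders, not necessarily the ones used in the repository:

```kusto
// Raw landing table for the JSON events ingested from Event Hub
.create table RawMetrics (records: dynamic)

// Derived table shaped for the PromQL-style reads
.create table Metrics (name: string, value: real, timestamp: datetime, labels: dynamic)

// Function that expands the raw records into typed columns
.create function MetricsExpand() {
    RawMetrics
    | extend name = tostring(records.name),
             value = todouble(records.value),
             timestamp = unixtime_milliseconds_todatetime(tolong(records.timestamp)),
             labels = records.labels
    | project name, value, timestamp, labels
}

// Update policy: populate Metrics automatically on every ingestion into RawMetrics
.alter table Metrics policy update
@'[{"IsEnabled": true, "Source": "RawMetrics", "Query": "MetricsExpand()", "IsTransactional": false}]'
```

With this in place, every batch landing in the raw table is transformed and appended to the derived table without any extra orchestration.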
Connecting EventHub with ADX
This is straightforward following the documentation. You just need to specify the Event Hub you intend to consume from, the format of the messages, plus the ADX table and the corresponding column mapping.
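For JSON messages, the column mapping can be as simple as routing the whole document into the raw table's dynamic column. Again, the table, mapping, and column names below are illustrative assumptions:

```kusto
// JSON ingestion mapping referenced by the Event Hub data connection
.create table RawMetrics ingestion json mapping "RawMetricsMapping"
'[{"column": "records", "path": "$", "datatype": "dynamic"}]'
```

The data connection in the Azure portal then only needs the Event Hub name, the table name, the data format (JSON), and this mapping name.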
Configuring Prometheus
Before starting the Prometheus server, it needs to be configured with the new remote_write and remote_read URLs.
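The relevant part of `prometheus.yml` could look like this; the host names and the function-key query parameter are placeholders for your own function app:

```yaml
# Remote endpoints pointing at the Azure Functions (placeholder URLs)
remote_write:
  - url: "https://<function-app>.azurewebsites.net/api/write?code=<function-key>"
remote_read:
  - url: "https://<function-app>.azurewebsites.net/api/read?code=<function-key>"
    read_recent: true
```

Setting `read_recent: true` makes Prometheus forward queries for recent data to the remote endpoint as well, which is useful for exercising the read path during testing.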
Testing = fun
Now we are ready to go... everything is in place and you just need to start your various Prometheus servers (I used three)! Additionally, I created a small Grafana dashboard to produce a decent amount of read requests in parallel.
After a couple of hours I took a screenshot of the Application Map in Application Insights. On the left you see the read function and some of the connected services (AAD & ADX); it has been called ~19,000 times. On the right side you see the executions of the write function, which has been called ~1,800 times. What's missing in this graph is Azure Key Vault and Event Hub.
Besides the application map, I took a closer look at the performance of the functions. The "Read" function returned on average after 194 ms and the "Write" function after 108 ms. The distribution of the durations looks sane. In the operation times window you'll notice that I added some Prometheus servers over time.
The following graph shows the number of time series that were interchanged during the testing period. In a little more than four hours, the three Prometheus servers created ~885k time series samples. The Grafana dashboards (constant reload every 5 s) requested more than 25 million objects via Prometheus and the "Read" function.
This is one of the Grafana dashboards which has been constantly reloaded every 5 seconds.
If you go a little deeper and explore the details of a visualization, you get this graph, which hints at the number of queries that have been processed by Azure Data Explorer.
One of the many KQL queries that have been executed on ADX is shown in the following picture. As you can see, it created almost no load on the system.
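The shape of such a read query is worth sketching: the read function essentially translates a PromQL matcher and time range into a filter over the derived table. The table and column names below follow the illustrative schema from the table-definition section and are assumptions, not the repository's exact query:

```kusto
// Fetch samples for one metric within a time window, filtered by a label
Metrics
| where name == "up"
| where timestamp between (datetime(2020-02-01 10:00:00) .. datetime(2020-02-01 14:00:00))
| where labels.job == "prometheus"
| project timestamp, value, labels
| order by timestamp asc
```

Because the table is partitioned by ingestion time and the filter is highly selective, queries like this stay cheap even under constant dashboard reloads.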
Next Steps
Once this setup is in place, one could invest in a full-blown Lambda architecture. I added a batch layer and a hot path, including another visualization interface using Power BI.
Furthermore one could explore the results in ADX directly using the Kusto Explorer:
Conclusion
In this article I've briefly shown how to connect Prometheus with Azure Data Explorer. I've chosen a serverless event streaming infrastructure using Azure Functions and Event Hubs. The metrics in Application Insights show remarkable performance, even though the smallest possible ADX cluster configuration has been selected.
Have fun with testing it on your metrics!