Prometheus Store
All collected metrics are stored in a Prometheus server inside the mesh. This store can be accessed through both its API and a built-in UI. To access the UI, visit the Prometheus URL from the Intelligence 360 Application. The default path of the UI is /services/prometheus/latest.
NOTE: you can always find the URLs of core Grey Matter components, such as the Prometheus base URL, at the toggles path.
Metrics
Grey Matter aggregates metrics from every instance of every service throughout the Fabric mesh and presents them for insight and analysis. The key indicators are surfaced in the historical and instance views of the Intelligence 360 Application, but much more detail can be accessed whenever needed.
From the UI, you can execute queries against the collected metrics and graph the results.
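For example, entering the query below in the expression field and switching to the graph view charts CPU usage for the edge service over time (a small illustrative query; it assumes the system_cpu_pct metric and the edge job shown in the examples later on this page exist in your mesh):
system_cpu_pct{job="edge"}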


Querying
In addition to the UI, Prometheus exposes /api/{version}/query as an API endpoint. It can be used to pull historical metrics for reporting and custom analysis. The examples below demonstrate the types of queries that can be performed; a full explanation of the available options can be found in the Prometheus documentation.
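To pull values over a time window rather than a single instant, the standard Prometheus /api/v1/query_range endpoint can be used. The example below is only a sketch, with illustrative start, end, and step values:
$ curl https://{prometheus_endpoint}/api/v1/query_range --data-urlencode "query=system_cpu_pct{job='edge'}" --data-urlencode "start=2020-08-25T00:00:00Z" --data-urlencode "end=2020-08-25T01:00:00Z" --data-urlencode "step=60s"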
Using recording rules
The Prometheus server deployed with Grey Matter ships with a number of recording rules that precompute frequently needed or computationally expensive expressions. You can see a list of all available recording rules by navigating to the Status > Rules page in the Prometheus UI, or by accessing the /rules route.
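The same list can also be retrieved programmatically through the standard Prometheus HTTP API:
$ curl https://{prometheus_endpoint}/api/v1/rules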
These rules can be used as is, or built upon to form more complex queries. For example, the overviewQueries:avgUpPercent:avg rule computes the uptime for a service at each scrape interval (usually every 15s) and stores it as a new time series. We can combine this new time series with Prometheus's built-in avg_over_time function to return the percentage of uptime for the edge service over the past hour:
avg_over_time(overviewQueries:avgUpPercent:avg{job="edge"}[1h]) * 100
Running this query returns an instant vector result. The value array contains a timestamp representing the instant the metric was captured and the corresponding percentage value.
$ curl https://{prometheus_endpoint}/api/v1/query --data-urlencode "query=avg_over_time(overviewQueries:avgUpPercent:avg{job='edge'}[1h])*100"
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "job": "edge"
        },
        "value": [
          1598394589.487,
          "100"
        ]
      }
    ]
  }
}
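To consume this result in a script, the percentage can be extracted from the value array; the sketch below assumes jq is available on the client:
$ curl -s https://{prometheus_endpoint}/api/v1/query --data-urlencode "query=avg_over_time(overviewQueries:avgUpPercent:avg{job='edge'}[1h])*100" | jq -r '.data.result[0].value[1]'
100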
Querying metrics directly
Grey Matter metrics can also be queried directly. For example, we can find the system CPU usage for all services that Prometheus monitors by running the following query:
system_cpu_pct
This gives us an array of instant vector results.
$ curl "https://{prometheus_endpoint}/api/v1/query?query=system_cpu_pct"
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "system_cpu_pct",
          "instance": "10.0.179.118:8080",
          "job": "example"
        },
        "value": [
          1598392226.087,
          "12.596401008059724"
        ]
      },
      {
        "metric": {
          "__name__": "system_cpu_pct",
          "instance": "10.0.158.182:8080",
          "job": "edge"
        },
        "value": [
          1598392226.087,
          "5.236907732468766"
        ]
      },
      ...
    ]
  }
}
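Instead of listing every instance, the metric can also be wrapped in an aggregation operator, for example to average CPU usage across the instances of each service:
avg by (job) (system_cpu_pct)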
To narrow down results, we can add a job label matcher to the query. The job should map to the discovered proxy name of the service:
system_cpu_pct{job='edge'}
$ curl https://{prometheus_endpoint}/api/v1/query --data-urlencode "query=system_cpu_pct{job='edge'}"
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "system_cpu_pct",
          "instance": "10.0.138.150:8080",
          "job": "edge"
        },
        "value": [
          1598392692.487,
          "5.01253132453294"
        ]
      }
    ]
  }
}
Similarly, the request duration for a specific route can be queried:
http_request_duration_seconds_sum{key='/services/catalog/latest'}
$ curl https://{prometheus_endpoint}/api/v1/query --data-urlencode "query=http_request_duration_seconds_sum{key='/services/catalog/latest'}"
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "http_request_duration_seconds_sum",
          "instance": "192.168.37.172:8081",
          "job": "edge",
          "key": "/services/catalog/latest",
          "method": "GET",
          "status": "200"
        },
        "value": [
          1598409389.582,
          "1.5646367309999996"
        ]
      },
      {
        "metric": {
          "__name__": "http_request_duration_seconds_sum",
          "instance": "192.168.37.172:8081",
          "job": "edge",
          "key": "/services/catalog/latest",
          "method": "GET",
          "status": "503"
        },
        "value": [
          1598409389.582,
          "0.029742157"
        ]
      }
    ]
  }
}
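Because this metric is a cumulative sum of request durations, dividing its rate by the rate of the matching request count gives an average latency for the route. The expression below is a sketch; it assumes the companion http_request_duration_seconds_count series is collected alongside the sum:
rate(http_request_duration_seconds_sum{key='/services/catalog/latest'}[5m]) / rate(http_request_duration_seconds_count{key='/services/catalog/latest'}[5m])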
Alerting
Alerting is currently performed directly through Prometheus's alerting configuration. This requires setting up an Alertmanager and defining alerting rules in the Prometheus configuration of the deployment.
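As a sketch of what such a rule might look like (the group name, threshold, and label values below are illustrative, not part of the default deployment), an alerting rule file referenced by the Prometheus configuration could flag a service whose recorded uptime drops below a threshold:
groups:
  - name: example-alerts
    rules:
      - alert: EdgeUptimeLow
        # Illustrative: fires when the recorded uptime for the edge service stays below 90% for 5 minutes.
        expr: avg_over_time(overviewQueries:avgUpPercent:avg{job="edge"}[5m]) * 100 < 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "edge uptime below 90% for the last 5 minutes"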