# Prometheus Store

All collected metrics are stored in a [Prometheus](https://prometheus.io/) server inside the mesh. The store can be accessed through both an HTTP API and a built-in UI. To access the UI, visit the Prometheus URL from the Intelligence 360 Application. The default path of the UI is `/services/prometheus/latest`.

> NOTE: you can always find the URLs of core Grey Matter components, like the Prometheus base URL, with the [toggles](/grey-matter-documentation/usage/sense/intelligence-360.md#toggles) path.

## Metrics

Grey Matter aggregates metrics from every instance of every Service throughout the Fabric mesh and presents them for insight and analysis. The main key indicators are brought forth in the [historical](/grey-matter-documentation/usage/sense/historical.md) and [instance](/grey-matter-documentation/usage/sense/instance.md) views of the Intelligence 360 Application, but a great deal more can be accessed whenever needed.

From the UI, you can execute queries against the collected metrics and graph the results.

![Prometheus Query](/files/-M3sm6s9ZhuP0TR3sA9E)

![Prometheus Graph](/files/-M3sm6sBbKJTHyCf6tSZ)

### Querying

In addition to the UI, Prometheus exposes `/api/{version}/query` to be used as an API endpoint. This can be used to pull historical metrics for reporting and custom analysis. The examples below demonstrate the types of queries that can be performed, but a full explanation of the options available can be found in the [Prometheus Documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/).
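As a rough illustration of calling this endpoint from a script, the Python sketch below builds an instant-query URL and decodes the JSON response. The helper names `build_query_url` and `run_query` and the base URL `https://prometheus.example.com` are hypothetical placeholders, and the sketch assumes the endpoint is reachable without extra authentication, which may not hold in your deployment.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str, version: str = "v1") -> str:
    """Return the instant-query URL for a PromQL expression (URL-encoded)."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/{version}/query?{params}"


def run_query(base_url: str, promql: str) -> dict:
    """Send the query and decode the JSON body. Requires network access."""
    with urllib.request.urlopen(build_query_url(base_url, promql)) as resp:
        return json.load(resp)


# Hypothetical base URL; substitute the Prometheus URL of your mesh.
print(build_query_url("https://prometheus.example.com", "system_cpu_pct"))
```

URL-encoding the expression matters once a query contains braces, quotes, or brackets, which is why the helper delegates to `urllib.parse.urlencode` instead of concatenating strings.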

#### Using prerecorded rules

The Prometheus server deployed with Grey Matter comes with many useful [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) to access frequently needed or computationally expensive expressions. You can see a list of all available recording rules by navigating to the Status > Rules page in the Prometheus UI, or by accessing the `/rules` route.
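The standard Prometheus HTTP API also returns rules as JSON from `/api/v1/rules`, grouped under `data.groups[].rules[]` with a `type` of `recording` or `alerting`. Assuming that endpoint is exposed in your deployment, the following Python sketch filters such a response down to the recording rule names. The helper name `recording_rule_names` and the sample payload are illustrative only: the rule expression is elided and `ExampleAlert` is a made-up alert name.

```python
def recording_rule_names(payload: dict) -> list[str]:
    """Collect the names of all recording rules from an /api/v1/rules response."""
    names = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Alerting rules share the same list, so filter on the type field.
            if rule.get("type") == "recording":
                names.append(rule["name"])
    return sorted(names)


# Illustrative payload, trimmed to the fields the helper reads.
sample = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "overviewQueries",
                "rules": [
                    {"name": "overviewQueries:avgUpPercent:avg", "query": "...", "type": "recording"},
                    {"name": "ExampleAlert", "query": "...", "type": "alerting"},
                ],
            }
        ]
    },
}

print(recording_rule_names(sample))
```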

These rules can be used as is, or built upon to form more complex queries. For example, the `overviewQueries:avgUpPercent:avg` rule computes the uptime for a service at each scrape interval (usually every 15s) and stores it as a new timeseries. We can combine this new timeseries with Prometheus's built-in [`avg_over_time` function](https://prometheus.io/docs/prometheus/latest/querying/functions/#aggregation_over_time) to return the percentage of uptime for the `edge` service over the past hour:

```bash
avg_over_time(overviewQueries:avgUpPercent:avg{job="edge"}[1h]) * 100
```

Running this query returns an [instant vector result](https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors). The `value` array contains a timestamp representing the instant that the metric was captured and a corresponding percentage value.

```bash
$ curl https://{prometheus_endpoint}/api/v1/query --data-urlencode "query=avg_over_time(overviewQueries:avgUpPercent:avg{job='edge'}[1h]) * 100"

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "job": "edge"
        },
        "value": [
          1598394589.487,
          "100"
        ]
      }
    ]
  }
}
```
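When consuming this response in a script, note that the sample value arrives as a string and needs to be converted before doing arithmetic. The Python sketch below shows one way to unpack an instant vector into `(labels, timestamp, value)` tuples. The function name `parse_instant_vector` is a hypothetical helper, not part of Prometheus or Grey Matter.

```python
import json


def parse_instant_vector(body: str) -> list[tuple[dict, float, float]]:
    """Turn an instant-vector response body into (labels, timestamp, value) tuples."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    samples = []
    for series in data["result"]:
        timestamp, raw_value = series["value"]
        # Prometheus encodes sample values as strings to preserve precision.
        samples.append((series["metric"], float(timestamp), float(raw_value)))
    return samples


# The response shown above, condensed onto a few lines.
body = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"job": "edge"},
                      "value": [1598394589.487, "100"]}]}}
"""

for labels, timestamp, value in parse_instant_vector(body):
    print(labels["job"], timestamp, value)
```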

#### Querying metrics directly

Grey Matter metrics can also be queried directly. For example, we can find the system CPU usage for all services that Prometheus monitors by running the following query:

```bash
system_cpu_pct
```

This gives us an array of [instant vector results](https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors).

```bash
$ curl "https://{prometheus_endpoint}/api/v1/query?query=system_cpu_pct"

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "system_cpu_pct",
          "instance": "10.0.179.118:8080",
          "job": "example"
        },
        "value": [
          1598392226.087,
          "12.596401008059724"
        ]
      },
      {
        "metric": {
          "__name__": "system_cpu_pct",
          "instance": "10.0.158.182:8080",
          "job": "edge"
        },
        "value": [
          1598392226.087,
          "5.236907732468766"
        ]
      }
      ...
    ]
  }
}
```

To narrow down results, we can add a `job` label matcher to the query. The job should map to the [discovered proxy name](/grey-matter-documentation/usage/fabric/api/proxy.md#name) of the service:

```bash
system_cpu_pct{job='edge'}
```

```bash
$ curl https://{prometheus_endpoint}/api/v1/query --data-urlencode "query=system_cpu_pct{job='edge'}"

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "system_cpu_pct",
          "instance": "10.0.138.150:8080",
          "job": "edge"
        },
        "value": [
          1598392692.487,
          "5.01253132453294"
        ]
      }
    ]
  }
}
```

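Similarly, the request duration for a specific route can be queried by matching on the `key` label. The `_sum` series holds the cumulative time, in seconds, and is reported separately for each combination of method and status: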

```bash
http_request_duration_seconds_sum{key='/services/catalog/latest'}
```

```bash
$ curl https://{prometheus_endpoint}/api/v1/query --data-urlencode "query=http_request_duration_seconds_sum{key='/services/catalog/latest'}"

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "http_request_duration_seconds_sum",
          "instance": "192.168.37.172:8081",
          "job": "edge",
          "key": "/services/catalog/latest",
          "method": "GET",
          "status": "200"
        },
        "value": [
          1598409389.582,
          "1.5646367309999996"
        ]
      },
      {
        "metric": {
          "__name__": "http_request_duration_seconds_sum",
          "instance": "192.168.37.172:8081",
          "job": "edge",
          "key": "/services/catalog/latest",
          "method": "GET",
          "status": "503"
        },
        "value": [
          1598409389.582,
          "0.029742157"
        ]
      }
    ]
  }
}
```
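The queries above each return a single sample per series. For reporting over a time window, the standard Prometheus HTTP API also provides a range query endpoint, `/api/v1/query_range`, which takes `query`, `start`, `end`, and `step` parameters and returns a matrix of samples. The Python sketch below builds such a URL for the last hour before a fixed end time. The helper name `build_range_query_url`, the base URL, and the chosen timestamps are hypothetical, and the 15s step simply mirrors the usual scrape interval mentioned earlier.

```python
import urllib.parse
from datetime import datetime, timedelta, timezone


def build_range_query_url(base_url: str, promql: str,
                          start: datetime, end: datetime,
                          step: str = "15s") -> str:
    """Return a /api/v1/query_range URL covering [start, end] at the given step."""
    params = urllib.parse.urlencode({
        "query": promql,
        "start": start.timestamp(),  # Unix seconds, as the API expects
        "end": end.timestamp(),
        "step": step,
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Hypothetical window: the hour ending at midnight UTC on 2020-08-26.
end = datetime(2020, 8, 26, 0, 0, tzinfo=timezone.utc)
start = end - timedelta(hours=1)
url = build_range_query_url(
    "https://prometheus.example.com",
    "system_cpu_pct{job='edge'}",
    start,
    end,
)
print(url)
```

A smaller step yields more points per series, so widen it for long windows to keep responses manageable.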

## Alerting

Alerting is currently performed directly through Prometheus [alerting configuration](https://prometheus.io/docs/alerting/overview/). This requires setting up an Alertmanager and configuring alerting rules in the Prometheus configuration of the deployment.

