Metrics
The Grey Matter Metrics Filter sets up a local metrics server to gather and report real-time statistics for the sidecar, microservice, and host system.
Gathered Metrics
Total Stats
- metrics version
- total requests
- total HTTP
- total HTTPS
- total RPC
- total RPC/TLS
- total 200
- total 2xx
- latency (avg)
- latency (count)
- latency max
- latency min
- latency sum
- latency p50
- latency p90
- latency p95
- latency p99
- latency p9990
- latency p9999
- number of errors
- incoming throughput
- outgoing throughput
Route Stats
For each route that is addressed, the following stats will be computed and reported.
- total requests
- total 200
- total 2xx
- latency (avg)
- latency (count)
- latency max
- latency min
- latency sum
- latency p50
- latency p90
- latency p95
- latency p99
- latency p9990
- latency p9999
- number of errors
- incoming throughput
- outgoing throughput
Host Stats
- number of goroutines
- start time
- CPU percent used
- CPU cores on system
- OS
- OS architecture
- memory available
- memory used
- memory used %
- process memory used
Prometheus
Optionally, this filter can serve the computed statistics in a form suitable for scraping by Prometheus. The Prometheus endpoint is hosted at `{METRICS_HOST}:{METRICS_PORT}{METRICS_PROMETHEUS_URI_PATH}`, which can then be scraped directly through the supported Prometheus service discovery mechanisms.
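For example, a Prometheus scrape job for this endpoint might look like the following (a sketch assuming the default `metrics_port` of 8081 and `metrics_prometheus_uri_path` of `/prometheus`; `sidecar-host` is a placeholder for the address of your sidecar):

```yaml
scrape_configs:
  - job_name: greymatter-sidecar   # any job name works; this one is illustrative
    metrics_path: /prometheus      # must match metrics_prometheus_uri_path
    static_configs:
      - targets: ["sidecar-host:8081"]   # metrics_host:metrics_port
```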
AWS CloudWatch
The metrics filter can also push the compiled statistics directly to AWS CloudWatch. This allows Grey Matter Proxy metrics to trigger actions such as Auto Scaling, or simply to enable tighter monitoring directly in AWS.
Filter Configuration Options
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `metrics_port` | Integer | `8081` | Port the metrics server listens on |
| `metrics_host` | String | `0.0.0.0` | Host the metrics server listens on |
| `metrics_dashboard_uri_path` | String | `/metrics` | The HTTP path to query JSON metrics data |
| `metrics_prometheus_uri_path` | String | `/prometheus` | The HTTP path to be scraped by Prometheus |
| `prometheus_system_metrics_interval_seconds` | Integer | `15` | Interval, in seconds, at which system metrics are gathered for Prometheus |
| `metrics_ring_buffer_size` | Integer | `4096` | Size of the cache of active metrics data |
| `metrics_key_function` | String | `""` | Function to provide internal rollup of URL paths when reporting metrics |
| `metrics_key_depth` | String | `"1"` | Number of URL path segments to keep when rolling up metrics keys (`"1"` truncates URLs to the first path section) |
| `use_metrics_tls` | Boolean | `false` | If true, the metrics server uses TLS |
| `server_ca_cert_path` | String | | SSL trust file to use when serving metrics over TLS |
| `server_cert_path` | String | | SSL certificate to use when serving metrics over TLS |
| `server_key_path` | String | | SSL private key file to use when serving metrics over TLS |
| `enable_cloudwatch` | Boolean | `false` | If true, report metrics to AWS CloudWatch |
| `cw_reporting_interval_seconds` | Integer | | Interval, in seconds, at which to send metrics to AWS CloudWatch |
| `cw_namespace` | String | | Namespace for CloudWatch metrics |
| `cw_dimensions` | String | | Dimensions to report to CloudWatch |
| `cw_metrics_routes` | String | | URI paths to send metrics for |
| `cw_metrics_values` | String | | Metrics keys to send metrics for |
| `cw_debug` | Boolean | `false` | Verbose debugging for the CloudWatch connection |
| `aws_region` | String | | AWS region for access |
| `aws_access_key_id` | String | | AWS access key ID |
| `aws_secret_access_key` | String | | AWS secret access key |
| `aws_session_token` | String | | AWS session token |
| `aws_profile` | String | | AWS profile to use for login |
| `aws_config_file` | String | | Location on disk of the AWS config file |
Example Configuration
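The original example did not survive in this copy of the page. A minimal sketch using only the options documented above, set to their documented defaults, might look like this (the exact envelope for attaching the filter to a proxy is omitted):

```json
{
  "metrics_host": "0.0.0.0",
  "metrics_port": 8081,
  "metrics_dashboard_uri_path": "/metrics",
  "metrics_prometheus_uri_path": "/prometheus",
  "prometheus_system_metrics_interval_seconds": 15,
  "metrics_ring_buffer_size": 4096,
  "metrics_key_depth": "1",
  "use_metrics_tls": false,
  "enable_cloudwatch": false
}
```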
Example Responses
/metrics
/prometheus
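The original response bodies were not preserved here. Illustratively, `/metrics` returns the gathered statistics as a JSON document, while `/prometheus` serves the standard Prometheus text exposition format. The metric and label names below are assumptions and may differ from your build:

```
# TYPE total_requests counter
total_requests 1207
# TYPE latency_ms_p50 gauge
latency_ms_p50{key="/apis"} 12
```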
Per-Route configuration
See Routing.
Setting the metrics_key_depth value
Typically, the greater the metrics_key_depth, the finer-grained the metrics you will end up with for analysis. However, there are some tradeoffs to consider.
Edge Proxy
As shown in the gm.metrics filter documentation above, metrics_key_depth is set to 1 by default. The resulting metrics for an edge proxy would look something like this:

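The sample output did not survive in this copy; a hypothetical depth-1 rollup on an edge proxy (metric and label names assumed) would be along these lines:

```
latency_ms_p50{key="/apis"} 12
latency_ms_p50{key="/admin"} 4
```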
Note that the key field above only goes one subdirectory deep. Does this provide enough granularity? It depends.
Let's say we have the following endpoints:
https://greymatter.io/apis/my-service/stores/
https://greymatter.io/apis/my-service/users/37
https://greymatter.io/apis/another-service/featured/2020/09
https://greymatter.io/apis/another-service/home.html
With a metrics_key_depth of 1, the average response times for the above routes are rolled up to one key:
/apis
If you choose a metrics_key_depth of 2, the same URLs are rolled up to two keys:
/apis/my-service
/apis/another-service
This would likely give you an idea of the average response time for each microservice. If URLs in your environment are structured as https://[domain]/[service]/, you can get the same granularity of information with a metrics_key_depth of 1 (i.e. key="/my-service" and key="/another-service").
If you choose a metrics_key_depth of 3, the URLs in the example are rolled up to:
/apis/my-service/stores/
/apis/my-service/users/
/apis/another-service/featured/
/apis/another-service/home.html
These look fine for the example URLs. But if URLs are structured like https://[domain]/[service]/ and my-service has millions of users, then you will end up with a key of the form /my-service/users/[id] for each and every user ID: millions of them.
The motivation behind choosing the default value of 1 is to minimize the size of the data storage. As stated in Prometheus' best practices:
CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
Keep in mind that this example is for the edge proxy, where requests for many different microservices all flow through the same proxy. For this reason, the safe option is to choose a small number for metrics_key_depth to prevent a cardinality explosion caused by a service that may be added in the future.
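The rollup behavior described above can be sketched as a simple path-truncation function. This is a hypothetical illustration of what metrics_key_depth does to a URL path, not the filter's actual implementation (trailing-slash handling, for example, may differ):

```python
def rollup_key(path: str, depth: int) -> str:
    """Truncate a URL path to its first `depth` segments,
    mimicking the metrics_key_depth rollup described above."""
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join(segments[:depth])

# With depth 1, every example route collapses to a single key:
assert rollup_key("/apis/my-service/users/37", 1) == "/apis"
# With depth 2, each service gets its own key:
assert rollup_key("/apis/another-service/home.html", 2) == "/apis/another-service"
```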
Sidecar Proxy
Service sidecars can also have the gm.metrics filter. Because a sidecar is specific to the service it sits next to, we can go a little deeper if we want to.
Let's take my-service from the first example:
https://greymatter.io/apis/my-service/stores/
https://greymatter.io/apis/my-service/users/
A metrics_key_depth of 1 will give us:
/stores
/users
It is typical to have a mesh route object that rewrites the path /apis/my-service/ to / before forwarding the request to a sidecar. So even though we have a depth of 1, it still gives us time-series data with finer-grained paths.
Balancing between data storage and data granularity
In short, the greater the metrics_key_depth, the faster the data storage will fill up. However, if highly rolled-up "average" metrics will not give users the information they need, then there is no point in collecting them. In such scenarios, consider strategies other than reducing the metrics_key_depth value, such as shorter data retention periods or shipping to cheaper storage.