Health Checks
Grey Matter supports the configuration of active health checks on an upstream cluster. Health checking is configured per cluster object in the health_check field, and Envoy uses the results to determine whether or not to route to the cluster. Grey Matter offers two types of health checking: HTTP and TCP.
Health checking in Grey Matter is set through the health_check field, which takes a list of the desired health check objects. A cluster with health checking enabled carries one or more of these objects, each built from the fields described in this section.
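As a rough sketch of what one entry in that list might contain, the Python snippet below builds a single HTTP health check object and serializes it to JSON. Only the field names come from this page. Every value, the /health path, and the validate helper are assumptions made for illustration, and the snippet does not show the rest of the cluster object.

```python
import json

# Illustrative values only; the field names are the ones documented on this page.
http_check = {
    "timeout_msec": 1000,          # required, must be greater than 0
    "interval_msec": 5000,         # required, must be greater than 0
    "interval_jitter_msec": 500,   # optional, defaults to 0
    "unhealthy_threshold": 3,
    "healthy_threshold": 2,
    "reuse_connection": True,      # optional, defaults to true
    "health_checker": {            # required, exactly one checker type
        "http_health_check": {"path": "/health"},  # hypothetical path
    },
}


def validate(check):
    """Hypothetical helper: list the documented constraints a check violates."""
    problems = []
    for field in ("timeout_msec", "interval_msec"):
        value = check.get(field)
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{field} is required and must be greater than 0")
    checker = check.get("health_checker") or {}
    kinds = [k for k in ("http_health_check", "tcp_health_check") if k in checker]
    if len(kinds) != 1:
        problems.append(
            "health_checker must set exactly one of "
            "http_health_check or tcp_health_check"
        )
    elif kinds == ["http_health_check"] and not checker["http_health_check"].get("path"):
        problems.append("http_health_check.path is required and cannot be empty")
    return problems


# The health_check field takes a list of health check objects.
cluster_fragment = {"health_check": [http_check]}

print(validate(http_check))
print(json.dumps(cluster_fragment, indent=2))
```

Running a check like validate before applying a configuration is a design convenience, not something Grey Matter requires. It only mirrors the constraints stated on this page.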
Note: The following fields are required: timeout_msec, interval_msec, health_checker.
timeout_msec
The time in milliseconds to wait for a health check response. If the timeout is reached without a response, the health check attempt is considered a failure. This value is required and must be greater than 0.
interval_msec
The time interval in milliseconds between health checks after the first health check. The first round of health checks occurs during startup, before any traffic is routed to a cluster, so the first interval of health checks will be the value of no_traffic_interval_msec. This value is required and must be greater than 0.
interval_jitter_msec
An optional jitter amount, in milliseconds, that is added to each interval value calculated by the proxy. Defaults to 0.
unhealthy_threshold
The number of failed health checks required before a host is marked as unhealthy. Note that for HTTP health checking, if a host responds with a 503 status, this value is ignored and the host is considered unhealthy immediately.
healthy_threshold
The number of successful health checks required before a host is marked healthy. During startup, only a single successful health check is required to mark a host healthy.
reuse_connection
A boolean value indicating whether or not to reuse a health check connection between health checks. Defaults to true.
no_traffic_interval_msec
When a cluster has never had traffic routed to it (i.e., on startup), this is the interval used for health checking instead of interval_msec. Once the cluster has been used for traffic routing, the interval shifts to the interval_msec value. This should be a longer interval, which allows cluster info to be checked without sending large amounts of active health checking traffic for no reason. Defaults to 60 seconds.
unhealthy_interval_msec
When a host is marked as unhealthy, this is the interval used for health checking instead of interval_msec. As soon as the host is marked as healthy, the interval shifts back to the interval_msec value. Defaults to the value of interval_msec.
unhealthy_edge_interval_msec
The health check interval used for the first health check immediately after a host is marked as unhealthy. After this initial health check, the interval shifts to unhealthy_interval_msec. Defaults to the value of unhealthy_interval_msec.
healthy_edge_interval_msec
The health check interval used for the first health check immediately after a host is marked as healthy. After this initial health check, the interval shifts back to the standard interval_msec. Defaults to the value of interval_msec.
health_checker
An object that defines the type of health checking to use. This object is required, and exactly one of the following fields must be set.
Fields:
http_health_check
Configures the HTTP health check endpoint for each instance in a cluster.
Fields:
host
The value of the host header in the HTTP health check request. Defaults to an empty string. If empty, the name of the cluster being health checked is used.
path
The HTTP path that is requested during health checking. This value is required and cannot be an empty string.
service_name
An optional value that is compared to the X-Envoy-Upstream-Healthchecked-Cluster header to validate the identity of the health checked cluster.
request_headers_to_add
A list of HTTP headers that should be added to each health check request sent to the cluster.
tcp_health_check
Configures the TCP health check endpoint for each instance in a cluster.
Fields:
send
A base64-encoded string representing an array of bytes to be sent in health check requests. If empty, a connect-only health check is implied.
receive
An array of base64-encoded strings, each representing an array of bytes that is expected in health check responses. A "fuzzy" match is performed when checking the response: each binary block must be found, in the order specified, but not necessarily contiguously.
When active health checking is configured on a cluster, Envoy routes to a target instance while its health check is succeeding and does not route to the instance while its health check is failing.
If the Envoy log level in the Sidecar is set to debug, the logs will also show when health checking has been implemented.
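The send and receive fields of tcp_health_check take base64-encoded bytes, and responses are checked with an ordered, non-contiguous match. The sketch below shows one way to prepare such payloads and to reason about the matching rule as this page describes it. The PING and PONG payloads and both helper functions are hypothetical, and fuzzy_match models the description here rather than reproducing Envoy's implementation.

```python
import base64
from typing import List


def encode_payload(raw: bytes) -> str:
    """Base64-encode raw bytes for use in the send or receive fields."""
    return base64.b64encode(raw).decode("ascii")


def fuzzy_match(response: bytes, expected_blocks: List[bytes]) -> bool:
    """Return True if every block appears in the response, in the given order.

    Blocks do not need to be contiguous: arbitrary bytes may sit between them.
    """
    position = 0
    for block in expected_blocks:
        found = response.find(block, position)
        if found == -1:
            return False
        # Continue searching after the end of the block just matched.
        position = found + len(block)
    return True


# Hypothetical payloads for a service that answers PING with PONG.
tcp_health_check = {
    "send": encode_payload(b"PING"),
    "receive": [encode_payload(b"PONG")],
}

expected = [base64.b64decode(item) for item in tcp_health_check["receive"]]
print(tcp_health_check)
print(fuzzy_match(b"+PONG\r\n", expected))
```

Leaving send empty would correspond to the connect-only check described above, in which case no payload needs to be encoded at all.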
If health checking is enabled on a cluster, a series of health check statistics will be reported in its /stats endpoint.
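To illustrate how those statistics could be inspected, the sketch below filters health check counters for one cluster out of a block of /stats text. The stat names follow Envoy's cluster.<name>.health_check.* naming, while the cluster name example, every value in the sample, and the health_check_stats helper are made up for this example. In practice the text would come from the sidecar's /stats endpoint rather than a string literal.

```python
# Made-up sample in the "name: value" form that Envoy's /stats output uses.
SAMPLE_STATS = """\
cluster.example.health_check.attempt: 12
cluster.example.health_check.success: 11
cluster.example.health_check.failure: 1
cluster.example.health_check.passive_failure: 0
cluster.example.health_check.network_failure: 1
cluster.example.health_check.verify_cluster: 0
cluster.example.health_check.healthy: 1
cluster.example.upstream_rq_total: 40
"""


def health_check_stats(stats_text: str, cluster: str) -> dict:
    """Hypothetical helper: collect the health_check stats of one cluster."""
    prefix = f"cluster.{cluster}.health_check."
    result = {}
    for line in stats_text.splitlines():
        name, separator, value = line.partition(": ")
        if separator and name.startswith(prefix):
            # Keep only the short stat name, e.g. "attempt" or "failure".
            result[name[len(prefix):]] = int(value)
    return result


stats = health_check_stats(SAMPLE_STATS, "example")
for stat_name, stat_value in stats.items():
    print(f"{stat_name}: {stat_value}")
```

A steadily growing failure count alongside a healthy value of 0 would suggest that no host in the cluster is currently passing its checks.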
Envoy combines the active health check results with the service discovery status to make other decisions about routing. More details on this can be found in the Envoy documentation.