Outlier Detection
Summary
Outlier detection is a passive health check: using user-defined rules applied to live traffic, it tracks which instances in a cluster are up or down. If an instance is found to be down, the proxy ejects it, diverting traffic away from it and preventing timeouts and disruptions throughout the mesh. After a specified amount of time, the instance is returned to service; however, the ejection time grows with each subsequent ejection.
Example Object
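For illustration, the object can be modeled as a Python dataclass whose field names and defaults follow the descriptions under Fields. This is a sketch that assumes every field is a plain integer in the units its name suggests; the class name `OutlierDetection` and the validation in `__post_init__` are hypothetical and not taken from any published client library.

```python
from dataclasses import asdict, dataclass


@dataclass
class OutlierDetection:
    """Hypothetical Python model of the outlier detection object.

    Field names and defaults mirror the field descriptions on this page;
    the class itself is illustrative and not part of any published API.
    """

    interval_msec: int = 10_000                 # 10s between ejection sweeps
    base_ejection_time_msec: int = 30_000       # 30s, scaled by the ejection count
    max_ejection_percent: int = 10              # % of the cluster that may be ejected
    consecutive_5xx: int = 5                    # 0 turns this detector off
    enforcing_consecutive_5xx: int = 100        # % chance a 5xx outlier is ejected
    enforcing_success_rate: int = 100           # % chance a success-rate outlier is ejected
    success_rate_minimum_hosts: int = 5
    success_rate_request_volume: int = 100
    success_rate_stdev_factor: int = 1900       # divided by 1000, i.e. 1.9
    consecutive_gateway_failure: int = 5
    enforcing_consecutive_gateway_failure: int = 0  # gateway-failure ejection off by default

    def __post_init__(self) -> None:
        # The page states that interval_msec must be greater than 0.
        if self.interval_msec <= 0:
            raise ValueError("interval_msec must be greater than 0")


# Override only what you need; every other field keeps its documented default.
example = OutlierDetection(consecutive_5xx=3, enforcing_consecutive_gateway_failure=100)
print(asdict(example))
```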
Fields
interval_msec
The time interval between ejection analysis sweeps. Each sweep can result in new ejections due to success rate outlier detection, as well as hosts being returned to service. Defaults to 10000 (10s) and must be greater than 0.
base_ejection_time_msec
The base time that a host is ejected for. The actual ejection time equals the base time multiplied by the number of times the host has been ejected. Defaults to 30000 (30s). Setting this to 0 means that no host will be ejected for longer than interval_msec.
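A minimal sketch of how ejection time scales, assuming ejected hosts are re-evaluated only at each interval_msec sweep. The function names are hypothetical; they restate the multiplication rule above and show why a base time of 0 caps time out of service at one interval.

```python
def ejection_duration_msec(base_ejection_time_msec: int, times_ejected: int) -> int:
    """Ejection length: the base time multiplied by how often the host was ejected."""
    return base_ejection_time_msec * times_ejected


def returned_at_sweep(ejected_at_msec: int, sweep_at_msec: int,
                      base_ejection_time_msec: int, times_ejected: int) -> bool:
    """Assumed model: an ejected host is returned to service at the first
    sweep at which its full ejection time has elapsed."""
    elapsed = sweep_at_msec - ejected_at_msec
    return elapsed >= ejection_duration_msec(base_ejection_time_msec, times_ejected)


# Default 30s base time: the 1st, 2nd and 3rd ejections last 30s, 60s and 90s.
print([ejection_duration_msec(30_000, n) for n in (1, 2, 3)])  # [30000, 60000, 90000]

# Base time 0: the host is eligible at the next sweep, one interval_msec later.
interval_msec = 10_000
print(returned_at_sweep(0, interval_msec, 0, times_ejected=4))  # True
```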
max_ejection_percent
The maximum percentage of hosts in an upstream cluster that can be ejected at the same time due to outlier detection. Defaults to 10%, but at least one host can always be ejected regardless of this value.
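One way to read the "at least one host" rule, sketched in Python. The exact comparison (the new ejection must keep the ejected share at or below the limit) is an assumption, as is the helper name `can_eject_another`.

```python
def can_eject_another(currently_ejected: int, cluster_size: int,
                      max_ejection_percent: int = 10) -> bool:
    """Assumed rule: allow one more ejection if the ejected share, counting the
    new host, stays at or below max_ejection_percent, or if nothing is ejected yet."""
    if cluster_size <= 0:
        return False
    if currently_ejected == 0:
        # At least one host can always be ejected, whatever the percentage.
        return True
    share_after = 100.0 * (currently_ejected + 1) / cluster_size
    return share_after <= max_ejection_percent


# 5 hosts at 10%: the first ejection is allowed, the second (40%) is not.
print(can_eject_another(0, 5), can_eject_another(1, 5))  # True False
# 20 hosts at 10%: up to 2 hosts (10%) may be out at once.
print(can_eject_another(1, 20), can_eject_another(2, 20))  # True False
```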
consecutive_5xx
The number of consecutive 5xx responses before a consecutive 5xx ejection occurs. Defaults to 5. Setting this to 0 effectively turns off the consecutive 5xx detector.
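A minimal per-host sketch of the consecutive 5xx detector. The class is hypothetical; it assumes any non-5xx response resets the count and that the detector fires once when the streak reaches the threshold, which also explains why a value of 0 disables it.

```python
class Consecutive5xxDetector:
    """Hypothetical per-host consecutive-5xx counter."""

    def __init__(self, consecutive_5xx: int = 5) -> None:
        self.threshold = consecutive_5xx
        self.streak = 0

    def record(self, status: int) -> bool:
        """Record one response; return True when the host should be flagged."""
        if not 500 <= status <= 599:
            self.streak = 0  # any non-5xx response breaks the streak
            return False
        self.streak += 1
        # Fires exactly when the streak reaches the threshold. After an
        # increment the streak is at least 1, so a threshold of 0 never fires.
        return self.streak == self.threshold


detector = Consecutive5xxDetector(consecutive_5xx=5)
print([detector.record(s) for s in (200, 503, 500, 502, 504, 500)])
# [False, False, False, False, False, True]
```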
enforcing_consecutive_5xx
The percentage chance that a host will actually be ejected when an outlier status is detected through consecutive 5xx responses. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 100.
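The enforcement percentage behaves like a weighted coin flip applied after detection. This sketch assumes an independent uniform draw for each detected outlier; `should_enforce` is a hypothetical helper, not a documented function.

```python
import random


def should_enforce(enforcing_percent: int, rng: random.Random) -> bool:
    """Eject with probability enforcing_percent / 100: 0 never, 100 always."""
    # randrange(100) is uniform over 0..99, so exactly enforcing_percent of the
    # 100 possible values fall below the threshold.
    return rng.randrange(100) < enforcing_percent


rng = random.Random(42)  # seeded so a rollout simulation is repeatable
trials = 10_000
ejected = sum(should_enforce(25, rng) for _ in range(trials))
print(f"{ejected / trials:.1%} of detected outliers ejected at 25%")
```

The same model applies to enforcing_success_rate and enforcing_consecutive_gateway_failure; only the detector that feeds it changes.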
enforcing_success_rate
The percentage chance that a host will actually be ejected when an outlier status is detected through success rate statistics. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 100.
success_rate_minimum_hosts
The number of hosts in a cluster that must have enough request volume to detect success rate outliers. If fewer hosts than this meet the volume requirement, outlier detection via success rate statistics is not performed for any host in the cluster. Defaults to 5. Setting this to 0 effectively triggers the success rate detector regardless of the number of valid hosts during an interval (as determined by success_rate_request_volume).
success_rate_request_volume
The minimum number of total requests that must be collected in one interval (as defined by interval_msec) to include a host in success rate based outlier detection. If a host's volume is lower than this setting, outlier detection via success rate statistics is not performed for that host. Defaults to 100.
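The two volume settings act as gates before any success rate math runs. A minimal sketch, assuming per-interval request counts are already available for each host; `HostStats` and `success_rate_candidates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    name: str
    requests: int  # total requests seen during the last interval


def success_rate_candidates(hosts: list[HostStats],
                            success_rate_request_volume: int = 100,
                            success_rate_minimum_hosts: int = 5) -> list[HostStats]:
    """Return the hosts to analyse, or [] if detection is skipped for the cluster."""
    # Hosts below the request volume are left out of success rate analysis.
    with_volume = [h for h in hosts if h.requests >= success_rate_request_volume]
    # Too few qualifying hosts: skip success rate detection for the whole cluster.
    if len(with_volume) < success_rate_minimum_hosts:
        return []
    return with_volume


hosts = [HostStats(f"host-{i}", 250) for i in range(5)] + [HostStats("host-5", 40)]
print([h.name for h in success_rate_candidates(hosts)])
# ['host-0', 'host-1', 'host-2', 'host-3', 'host-4']
print(success_rate_candidates(hosts[2:]))  # [] (only 3 hosts have enough volume)
```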
success_rate_stdev_factor
This factor determines the ejection threshold for success rate outlier ejection. The ejection threshold is the mean success rate minus the product of this factor and the standard deviation of the hosts' success rates: mean - (stdev * success_rate_stdev_factor). The configured value is divided by 1000 to get the actual factor; for example, for a factor of 1.9, set this to 1900. Defaults to 1900. Setting this to 0 effectively turns off the success rate detector.
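The threshold formula can be checked with a short calculation. This sketch assumes success rates are fractions between 0 and 1 and uses the population standard deviation across hosts; both are assumptions, as is the function name. It follows this page's statement that a factor of 0 turns the detector off.

```python
import statistics


def success_rate_outliers(rates: dict[str, float],
                          success_rate_stdev_factor: int = 1900) -> list[str]:
    """Return hosts whose success rate falls below mean - stdev * factor."""
    if success_rate_stdev_factor == 0 or not rates:
        return []  # per this page, a factor of 0 turns the detector off
    factor = success_rate_stdev_factor / 1000  # 1900 -> 1.9
    values = list(rates.values())
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population stdev: an assumption
    threshold = mean - stdev * factor
    return sorted(host for host, rate in rates.items() if rate < threshold)


rates = {"a": 0.99, "b": 0.98, "c": 0.99, "d": 0.97, "e": 0.99, "f": 0.98, "g": 0.50}
print(success_rate_outliers(rates))  # ['g']
```

Here the mean is about 0.914 and the threshold about 0.593, so only host `g` falls below it.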
consecutive_gateway_failure
The number of consecutive gateway failures (502, 503, or 504 status codes, or connection errors mapped to one of those status codes) before a consecutive gateway failure ejection occurs. Defaults to 5.
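The gateway failure counter differs from the 5xx counter in which statuses count. A sketch, assuming connection errors were already mapped to 502, 503, or 504 and that any other status, including a plain 500, resets the streak; the class name is hypothetical.

```python
GATEWAY_FAILURES = {502, 503, 504}


class ConsecutiveGatewayFailureDetector:
    """Hypothetical per-host consecutive gateway-failure counter.

    Assumes connection errors were already mapped to 502, 503 or 504 by the caller.
    """

    def __init__(self, consecutive_gateway_failure: int = 5) -> None:
        self.threshold = consecutive_gateway_failure
        self.streak = 0

    def record(self, status: int) -> bool:
        """Record one response; return True when the host should be flagged."""
        if status not in GATEWAY_FAILURES:
            # Assumed: any other status, including a plain 500, breaks the streak.
            self.streak = 0
            return False
        self.streak += 1
        return self.streak == self.threshold


detector = ConsecutiveGatewayFailureDetector(consecutive_gateway_failure=3)
print([detector.record(s) for s in (502, 503, 500, 504, 502, 503)])
# [False, False, False, False, False, True]
```

Because enforcing_consecutive_gateway_failure defaults to 0, a host flagged this way is not ejected unless that percentage is raised.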
enforcing_consecutive_gateway_failure
The percentage chance that a host will actually be ejected when an outlier status is detected through consecutive gateway failures. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 0, so gateway failure ejection is disabled unless this value is raised.