# Troubleshooting

## SPIRE Setup

Note that the SPIRE server **must** have all of its containers running before any other pods in the installation are created. Otherwise, pods that already exist when the server is installed will not have identities created. If you suspect this was not the case at install time, try deleting all pods (other than the SPIRE ones) and letting new ones come up.

If both the [server](#spire-server) and [agent](#spire-agent) appear to be behaving correctly, the problem could be with the sidecars connecting via [SDS](https://greymatter.gitbook.io/grey-matter-documentation/1.3/usage/secrets#sds) - follow the steps to [troubleshoot SPIRE sidecar configurations](#spire-sidecar-configuration).

### SPIRE Server

To verify that entries are being created for identities in the mesh, run:

```bash
kubectl exec -it server-0 -n spire -c server -- /opt/spire/bin/spire-server entry show -registrationUDSPath /run/spire/socket/registration.sock
```

You'll see a list of all of the SPIFFE identities registered in the mesh. If the identity of your service (or of any/all services) is missing, that service will not be issued a certificate. First try deleting any pod that is missing from the list of entries. When the new pod is created, rerun the command to see if the entry is now there. If it is not, there is likely a deeper problem with the SPIRE server and its permissions within your environment.

The list is complete when it contains one entry for each of the core services, one with SPIFFE ID `spiffe://<spire-trust-domain>/agent` for each node in the k8s cluster, and one for each service that you have launched into the mesh. If all of these entries are there, exit the server container and check on the [SPIRE agents](#spire-agent).
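When the entry list is long, scanning it by eye is error-prone. The following is a minimal sketch of a filter that checks for a single identity. `has_entry` is a hypothetical helper name, and the sketch assumes each entry is printed with a line labelled `SPIFFE ID` followed by the identity, which may differ between SPIRE versions.

```bash
# has_entry ID
# Reads `spire-server entry show` output on stdin and succeeds (exit 0) only
# if some line labelled "SPIFFE ID" ends with exactly ID.
# Assumption: entries are printed as "SPIFFE ID   : spiffe://..." (or "SPIFFE ID: ...").
has_entry() {
  awk -v id="$1" '
    $1 == "SPIFFE" && ($2 == "ID" || $2 == "ID:") && $NF == id { found = 1 }
    END { exit !found }
  '
}

# Example (not run here); -it is dropped so that the output can be piped:
# kubectl exec server-0 -n spire -c server -- \
#   /opt/spire/bin/spire-server entry show \
#   -registrationUDSPath /run/spire/socket/registration.sock \
#   | has_entry "spiffe://<spire-trust-domain>/fibonacci" && echo present || echo missing
```

Matching the whole field rather than a substring keeps an identity such as `.../edge` from being reported as present just because it appears as the parent of another entry.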

### SPIRE Agent

To check that the agents are behaving properly, get the name of any agent pod with `kubectl get pods -n spire` and run:

```bash
kubectl exec -it <agent-pod-name> -n spire -- /opt/spire/bin/spire-agent api fetch -socketPath /run/spire/socket/agent.sock
```

If you do not see an SVID listed for the agent, attestation from the agent to the server failed.
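If you need to repeat this check across several agent pods, counting the SVIDs in the output is quicker than reading it. This is only a sketch: `svid_count` is a hypothetical helper, and it assumes that `spire-agent api fetch` prints one line beginning with `SPIFFE ID:` per SVID.

```bash
# svid_count
# Reads `spire-agent api fetch` output on stdin and prints the number of
# SVIDs it lists (0 if none).
# Assumption: each SVID is introduced by a line starting with "SPIFFE ID:".
svid_count() {
  # grep -c prints 0 but exits non-zero when nothing matches; mask that.
  grep -c '^SPIFFE ID:' || true
}

# Example (not run here):
# kubectl exec <agent-pod-name> -n spire -- \
#   /opt/spire/bin/spire-agent api fetch -socketPath /run/spire/socket/agent.sock \
#   | svid_count
```

A result of `0` corresponds to the failed attestation case described above.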

## SPIRE Sidecar Configuration

In a SPIRE-enabled deployment, your sidecars should be configured to get their certificates from the SPIRE server via the [secret](https://greymatter.gitbook.io/grey-matter-documentation/1.3/reference/api/fabric-api/listener/secret) configuration field on their [listener](https://greymatter.gitbook.io/grey-matter-documentation/1.3/reference/api/fabric-api/listener) objects. Using the [greymatter CLI](https://greymatter.gitbook.io/grey-matter-documentation/1.3/installation/commands-cli), run `greymatter get listener <sidecar-listener>` for your sidecar's listener. It should have a `secret` object configured.

If the `/stats` admin endpoint in the [verify mTLS](https://greymatter.gitbook.io/grey-matter-documentation/1.3/troubleshoot#verify-mtls-configuration) section showed nonzero values for `ssl.fail_verify_san` and/or you saw `TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED` in the sidecar debug logs, check that the sidecar listener secret has the correct [subject\_names](https://greymatter.gitbook.io/grey-matter-documentation/1.3/reference/api/fabric-api/listener/secret#subject_names) configured. This value should be a list of **all** SPIFFE identities that can communicate with the sidecar. If the sidecar only needs to be reached by edge, the only value should be `spiffe://<spire-trust-domain>/edge`. If other services need to make egress requests to this sidecar, their identities must be listed as well.
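The stats output is large, so it can help to filter it down to the SAN verification counters that are actually nonzero. The sketch below assumes the usual `name: value` line format of the stats endpoint and that the admin interface is reachable on `localhost:8001` from inside the sidecar container. `san_failures` is a hypothetical helper name.

```bash
# san_failures
# Reads /stats output on stdin and prints every ssl.fail_verify_san counter
# whose value is greater than zero. Prints nothing if there are none.
san_failures() {
  awk -F': ' '/ssl\.fail_verify_san: / && $2 + 0 > 0 { print }'
}

# Example (not run here), from inside the sidecar container:
# curl -s localhost:8001/stats | san_failures
```

The prefix of each printed stat name shows whether the failures are counted on a listener or on a cluster, which tells you whose `subject_names` to inspect first.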

If this is not the problem, to verify that your sidecar has received its certs, execute into the container and run `curl localhost:8001/certs`. You should see something like the following:

```json
{
 "certificates": [
  {
   "ca_cert": [
    {
     "path": "\u003cinline\u003e",
     "serial_number": "5e6bb7c3",
     "subject_alt_names": [],
     "days_until_expiration": "3400",
     "valid_from": "2020-03-13T16:41:39Z",
     "expiration_time": "2030-03-11T16:41:39Z"
    }
   ],
   "cert_chain": [
    {
     "path": "\u003cinline\u003e",
     "serial_number": "e47a12e8c054b2c537d8ee647a3a359d",
     "subject_alt_names": [
      {
       "uri": "spiffe://quickstart.greymatter.io/fibonacci"
      }
     ],
     "days_until_expiration": "0",
     "valid_from": "2020-11-17T21:54:00Z",
     "expiration_time": "2020-11-17T22:54:10Z"
    }
   ]
  }
 ]
}
```
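To see at a glance which identities the sidecar actually holds, you can pull just the SPIFFE URIs out of that response. This is a rough sketch that relies on the pretty-printed layout shown above (one `"uri"` per line) rather than on a real JSON parser. `cert_uris` is a hypothetical helper name.

```bash
# cert_uris
# Reads /certs output on stdin and prints the value of every "uri" field,
# one per line. Assumes pretty-printed output with one "uri" per line.
cert_uris() {
  sed -n 's/.*"uri"[ ]*:[ ]*"\([^"]*\)".*/\1/p'
}

# Example (not run here), from inside the sidecar container:
# curl -s localhost:8001/certs | cert_uris
```

For the response above this prints `spiffe://quickstart.greymatter.io/fibonacci`. No output at all means that no certificate carrying a URI was found.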

If the certificates list is empty, there is a problem getting certs from the SPIRE agents. Run `curl localhost:8001/config_dump` and check to see if there are any `dynamic_warming_secrets`.

If the certificates list is empty and there are no `dynamic_warming_secrets`, your listener [secret](https://greymatter.gitbook.io/grey-matter-documentation/1.3/reference/api/fabric-api/listener/secret) configuration is likely missing or the sidecar is incorrectly configured. Go back to [this step](https://greymatter.gitbook.io/grey-matter-documentation/1.3/guides/launch-service-k8s#listener) and verify that the secret and mesh configs are set correctly.

If your `dynamic_warming_secrets` section is not empty, as in the following example, the sidecar has requested a secret that it has not received:

```json
"dynamic_warming_secrets": [
  {
    "name": "spiffe://quickstart.greymatter.io/fibonacci",
    "version_info": "uninitialized",
    "last_updated": "2020-11-18T17:49:39.419Z",
    "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "spiffe://quickstart.greymatter.io/fibonacci"
    }
  }
]
```

There is likely a misconfigured [listener secret](https://greymatter.gitbook.io/grey-matter-documentation/1.3/reference/api/fabric-api/listener/secret). Look at the values listed for `"name"` in `dynamic_warming_secrets`. The **only** value should be the identity of that particular sidecar, `spiffe://<spire-trust-domain>/<sidecar-name>`. For example, for the fibonacci sidecar it should be `spiffe://<spire-trust-domain>/fibonacci`, and for edge it should be `spiffe://<spire-trust-domain>/edge`.

If this section contains only the sidecar's own identity, try deleting the pod and retrying the request once a new one comes up.

If this section contains an identity for a different sidecar, a secret is misconfigured. Check the listener object for this sidecar's ingress with `greymatter get listener <listener-key>` and verify that the value in `secret_name` is its own identity, `spiffe://<spire-trust-domain>/<service-name>` (e.g. `spiffe://<spire-trust-domain>/fibonacci` for fibonacci). If the sidecar has an egress route to another sidecar in the mesh (e.g. edge to the fibonacci cluster), the problem could be a misconfigured [cluster secret](https://greymatter.gitbook.io/grey-matter-documentation/1.3/reference/api/fabric-api/cluster#secret). In this case, also check any egress cluster objects with `greymatter get cluster <cluster-key>` and verify again that the value in `secret_name` is the sidecar's **own identity**, `spiffe://<spire-trust-domain>/<service-name>`.
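To compare the warming secret names against the sidecar's own identity without reading the whole config dump, a small text filter can help. This is a line-based sketch, not a JSON parser: it assumes the pretty-printed layout of the example above, where the section opens on the `"dynamic_warming_secrets"` line and ends at the first line that starts with `]`. `warming_secret_names` is a hypothetical helper name.

```bash
# warming_secret_names
# Reads /config_dump output on stdin and prints the distinct "name" values
# found inside the dynamic_warming_secrets section, one per line.
warming_secret_names() {
  awk '
    # Enter the section, unless it is an empty list closed on the same line.
    /"dynamic_warming_secrets"/ { in_section = ($0 !~ /\]/); next }
    # The first line that starts with "]" closes the section.
    in_section && /^[ \t]*\]/ { in_section = 0 }
    in_section && /"name"[ \t]*:/ {
      line = $0
      sub(/^[^:]*:[ \t]*"/, "", line)   # strip everything up to the opening quote
      sub(/".*$/, "", line)             # strip the closing quote and the rest
      print line
    }
  ' | sort -u
}

# Example (not run here), from inside the sidecar container:
# curl -s localhost:8001/config_dump | warming_secret_names
```

Any printed name other than `spiffe://<spire-trust-domain>/<service-name>` for this sidecar points to the listener or cluster `secret_name` that needs correcting.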

If the sidecar's listener and cluster secret configurations look correct and the above steps don't indicate any problems, go to the [SPIRE server troubleshooting](#spire-server) section and check whether there is an entry for your service in the server. If there isn't, try uninstalling your service/sidecar and following [this guide again](https://greymatter.gitbook.io/grey-matter-documentation/1.3/guides/launch-service-k8s) step by step.

## Other Issues

If you are still running into issues and need assistance, please contact us at [Grey Matter Support](https://support.greymatter.io/support/home).
