# CLI

A command-line interface called `gmdatatool.sh` supports bulk and automated use of gm-data. It reduces the implementation burden for common tasks, such as:

* Uploading a full tree of data, such as a complete React application
* Creating the initial directories for uploads to go into
* Running from desktops or servers

## Installation

* Go to the [Grey Matter Nexus repository](https://nexus.greymatter.io/#browse/browse:raw) and select greymatter -> gm-data to download the gm-data bundle that matches the deployed version of gm-data you will be communicating with.
* Unpack the `.tar.gz` bundle and make the `gmdatatool` binary for your operating system executable.

### Mac OS

```bash
  $ mkdir -p gmdata   # tar -C needs the target directory to exist
  $ tar -C gmdata -xvf gm-data-1.1.0.tar.gz
  ...
  $ chmod +x gmdata/gmdatatool.osx
  $ gmdata/gmdatatool.osx --version
  gmdatatool version 0.0.0
```

## Configuration

Every CLI command must connect to gm-data in an authenticated manner, so the connection settings are supplied through environment variables. The example below connects to a PKI-enabled setup. The environment variables only need to be set once, inside a wrapper script. Create the script below locally and name it `gmdatatool.sh`:

```bash
#!/bin/bash
# Name this script:
# gmdatatool.sh
## Environmental setup - depends on how gm-data TLS and address is configured
(
u=`uname`
if [ "${u}" == "Darwin" ]
then
  b64="base64"
else
  b64="base64 -w 0"
fi
export MONGO_USE_TLS=false
export CLIENT_PORT=10808
export CLIENT_CN=greymatter
export CLIENT_ADDRESS=a4bbfef46c6b4401aade067cac0f31f1-1743204055.us-east-1.elb.amazonaws.com
export CLIENT_HOST=$CLIENT_ADDRESS
export CLIENT_PREFIX=/services/data/latest
export CLIENT_USE_TLS=true
echo "Q" | openssl s_client -servername $CLIENT_CN -connect $CLIENT_HOST:$CLIENT_PORT > dgchain 2>&1
# Adjust these paths to wherever your certificate and key files are
export CLIENT_CERT=`cat quickstart.crt | ${b64}`
export CLIENT_KEY=`cat quickstart.key | ${b64}`
export CLIENT_TRUST=`cat dgchain | ${b64}`
# Pass all arguments through unchanged, preserving quoting
./gmdatatool.osx "$@"
)
```

{% hint style="info" %}
If you do not have an intermediate CA certificate locally, you can download it using `openssl`:

```bash
openssl s_client -showcerts -connect a4bbfef46c6b4401aade067cac0f31f1-1743204055.us-east-1.elb.amazonaws.com:10808 </dev/null 2>/dev/null|openssl x509 -outform PEM >intermediate.crt
```

{% endhint %}

## Usage - CLI

Read root directory:

```bash
./gmdatatool.sh get /list/1/home
[]
```

By default, gm-data is empty. We can create a quickstart\@deciphernow\.com directory using:

```bash
./gmdatatool.sh mkdir 1/home/quickstart@deciphernow.com
```

If we list the directory again, our files now consist of one directory:

```bash
./gmdatatool.sh get /list/1/home
[
  {
    "tstamp": "16111f0d2d56e97e",
    "userpolicy": {
      "label": "CN=quickstart,OU=Engineering,O=Decipher Technology Studios,L=Alexandria,ST=Virginia,C=US"
    },
    "jwthash": "0ce8a07798fe4ea8a6e5b6057b3207781e811cb882eeb27ddd1d063ad60defa7",
    "schemaversion": 10,
    "name": "quickstart@deciphernow.com",
    "action": "U",
    "oid": "16111f0365ea8bb0",
    "parentoid": "161119570d276e9a",
    "expiration": "7fffffffffffffff",
    "checkedtstamp": "16111f0365f41bbd",
    "objectpolicy": {
      "label": "default",
      "requirements": {
        "f": "yield",
        "a": [
          {
            "v": "C"
          },
          {
            "v": "R"
          },
          {
            "v": "U"
          },
          {
            "v": "D"
          },
          {
            "v": "P"
          },
          {
            "v": "X"
          }
        ]
      }
    },
    "derived": {},
    "security": {
      "label": "GMDATA",
      "foreground": "white",
      "background": "green"
    },
    "fulldir": "/0000000000000001/",
    "policy": {
      "policy": [
        "C",
        "R",
        "U",
        "D",
        "P",
        "X"
      ]
    },
    "cluster": "default"
  }
]
```
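
Because the listing is plain JSON, scripts can post-process it. The sketch below decodes a trimmed copy of the output above into a small Go struct. The `Entry` type and the helpers `parseListing` and `describe` are hypothetical names for illustration, not part of the gm-data client. It also assumes that an entry without an `isfile` field is a directory, which matches the sample output but is not confirmed elsewhere in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry holds a few of the fields seen in the listing output.
// Fields not named here are ignored by encoding/json.
type Entry struct {
	Name      string `json:"name"`
	OID       string `json:"oid"`
	ParentOID string `json:"parentoid"`
	Action    string `json:"action"`
	IsFile    bool   `json:"isfile"` // assumption: absent means directory
	Size      int64  `json:"size"`
}

// parseListing decodes the JSON array printed by a list call.
func parseListing(data []byte) ([]Entry, error) {
	var entries []Entry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, fmt.Errorf("decode listing: %w", err)
	}
	return entries, nil
}

// describe returns a one-line summary of an entry.
func describe(e Entry) string {
	kind := "dir"
	if e.IsFile {
		kind = "file"
	}
	return fmt.Sprintf("%s %s oid=%s", kind, e.Name, e.OID)
}

func main() {
	// Trimmed copy of the listing shown above.
	sample := []byte(`[{"name":"quickstart@deciphernow.com","action":"U",
		"oid":"16111f0365ea8bb0","parentoid":"161119570d276e9a"}]`)
	entries, err := parseListing(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(describe(e))
	}
}
```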

Let's create the directory /home/quickstart\@deciphernow\.com/newdir and upload a file to it:

```bash
# make new dir
./gmdatatool.sh mkdir /home/quickstart@deciphernow.com/newdir
...
# upload file
./gmdatatool.sh upload /home/quickstart@deciphernow.com/newdir intermediate.crt
[
  {
    "tstamp": "16111f63e830d8a6",
    "userpolicy": {
      "label": "CN=quickstart,OU=Engineering,O=Decipher Technology Studios,L=Alexandria,ST=Virginia,C=US"
    },
    "jwthash": "4318f26aafce11b3ea9dba4266712367a0fda4cae1479f372adcf6cb4568ec34",
    "schemaversion": 10,
    "name": "intermediate.crt",
    "action": "C",
    "oid": "16111f63e8252552",
    "parentoid": "16111f541ef0d243",
    "expiration": "7fffffffffffffff",
    "checkedtstamp": "16111f541efa661f",
    "objectpolicy": {
      "requirements": {
        "f": "if",
        "a": [
          {
            "f": "contains",
            "a": [
              {
                "v": "email"
              },
              {
                "v": "quickstart@deciphernow.com"
              }
            ]
          },
          {
            "f": "yield-all"
          },
          {
            "f": "yield",
            "a": [
              {
                "v": "R"
              },
              {
                "v": "X"
              }
            ]
          }
        ]
      }
    },
    "mimetype": "application/x-x509-ca-cert",
    "rname": {
      "Any": "36c/cd/16111f63e80cd36c",
      "S3": "36c/cd/16111f63e80cd36c"
    },
    "size": 2313,
    "derived": {},
    "isfile": true,
    "security": {
      "label": "GMDATA",
      "foreground": "white",
      "background": "green"
    },
    "sha256plain": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "fulldir": "/0000000000000001/",
    "originalobjectpolicy": "(if (contains email quickstart@deciphernow.com)(yield-all)(yield R X))",
    "encrypteddata": {
      "master": "oSolNVlIdl4LDH/vyiIiAwmRPN2DhbPYV6Cw1AqR90JFXNtQTfBVShfvotsLiohxvB6QO++7vppl2WghiY1oUA70zRqe+21xSlqTwWZJISdshf1UAPE0wIX07+crb1Cxw/WJPAa6AyECWPJ8+Z2sjW7eXWdNeqF1pgs="
    },
    "encryptednonce": {
      "master": "tgSHMWmMba57sVu6"
    },
    "cluster": "default"
  }
]
```

## Usage - Client Library

The tool `./gmdatatool.sh` is a special-case use of the Go package `github.com/deciphernow/gm-data/client`. The client is based around two important ideas:

* Listening for changes in gm-data, and invoking callbacks when they happen.
* Providing an API to respond to those changes. Example uses:
  * Generating static thumbnails.
  * Running AWS Rekognition on images and uploading derived files, such as object labels. The files written back are JSON, and they point to the image that they are derived from.

Changes can be received through REST or Kafka.

A responder is created with either a REST or a Kafka constructor. The REST constructor filters information based on objectPolicy (i.e., it runs as a real user). The Kafka constructor gets a privileged, unfiltered view of all events that happen on gm-data. Generally, the Kafka view is appropriate for back-end processes. The REST constructor is usable from the front end (i.e., from callers outside Fabric itself, possibly even web browsers calling the `/notifications` endpoint) or from the back end.

```go
    // Create a client at the root
    c, err := client.NewRESTResponder(
        logger,
        client.GetURL(),
        getClient(),
        listing.DefaultRootOID,
        policy.CurrentTstamp(),
        1000,          // maximum number of events fetched per poll
        2*time.Second, // polling interval
        client.CLIENT_IDENTITY.Str(),
        func(c *client.Responder, ev *listing.Event) error {
            return nil
        },
    )
    if err != nil {
        log.Printf("create client failed: %v", err)
        panic(err)
    }
```

This responder polls every two seconds for new information and fetches up to 1000 events at a time. The callback lets our code inspect each event. Generally, when we see something interesting in the event (`ev`), we call other parts of the API:

```go
    // Get an io.Reader for ev, because it is a file type that we are interested in
    blobData, err := c.StreamOf(ev.Oid, ev.Tstamp)
```

We may then do something outside the scope of gm-data, such as turning a blob into a JSON file (for example, submitting a JPG and getting back a JSON description of it). When we listen and write back like this, we typically set the `Derived` fields so that we can track the lineage of *why* the file exists and *what* created it. For instance, we can correlate a JPG of a face with the JSON about it, so that both can be deleted if we are asked to delete the file.

```go
    m := c.NewWriteMarshaler()
    defer m.Close()
    err = m.Append(&listing.EventArgs{
        Action:       policy.ActionUpdate,
        IsFile:       true,
        ParentOID:    ev.ParentOID,
        Name:         newFname,
        MimeType:     "application/json",
        ObjectPolicy: policy.ForReadAllFull,
        Derived: listing.Derived{
            Oid:    ev.Oid,
            Tstamp: ev.Tstamp,
            Type:   kind,
        },
        Security:      ev.Security,
        BlobAlgorithm: "none",
    }, newFname)
...
    req, err := c.NewWriteRequest(m)
...
    res, evs, err := c.DoWriteRequest(req)
...
```

The client API supports the functions below, all of which are needed to respond to changes in gm-data by writing back new derived files. Functions related to the read endpoints:

* `NewRESTResponder`/`NewKafkaResponder` - Listen on `/notifications`. Responding to changes made in gm-data is the critical reason for having a client library.
* `StreamOf` - Get the bytes for an `(oid,tstamp)`. The tstamp is optional; if it is omitted, you get the latest blob.
* `EventOf` - Get the properties for an `(oid,tstamp)`, or the latest properties if tstamp is not included.
* `DerivedOf` - Find out what is already derived from this file. This is how you could know that a thumbnail already exists for a file.
* `Self` - Discover what we are authenticated as, which is important for troubleshooting.
* `HistoryOf` - Get every event pertaining to an `oid`. This is the lifecycle of the inode across all changes (including name, parent, policy, security labels, etc.).

> Note that more complex paging options are not being used with these simple client libraries.
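
The `(oid,tstamp)` addressing used by the read functions can be pictured with a small in-memory model. The sketch below is not the gm-data client: `Store`, `Event`, and the method bodies are hypothetical stand-ins that only borrow the names `EventOf` and `HistoryOf` to show the semantics described above. It assumes an empty `tstamp` string means "latest", which is a choice made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a trimmed-down, hypothetical stand-in for a gm-data event.
type Event struct {
	OID    string
	Tstamp string // hex timestamp, as in the CLI output
	Name   string
}

// ErrNotFound is returned when no matching event exists.
var ErrNotFound = errors.New("no such event")

// Store keeps every event per oid in the order it was appended.
type Store struct {
	history map[string][]Event
}

// Append records a new event for its oid.
func (s *Store) Append(ev Event) {
	if s.history == nil {
		s.history = map[string][]Event{}
	}
	s.history[ev.OID] = append(s.history[ev.OID], ev)
}

// HistoryOf returns every event for oid, oldest first.
func (s *Store) HistoryOf(oid string) []Event {
	return s.history[oid]
}

// EventOf returns the event at tstamp, or the latest one if tstamp is "".
func (s *Store) EventOf(oid, tstamp string) (Event, error) {
	evs := s.history[oid]
	if len(evs) == 0 {
		return Event{}, ErrNotFound
	}
	if tstamp == "" {
		return evs[len(evs)-1], nil
	}
	for _, ev := range evs {
		if ev.Tstamp == tstamp {
			return ev, nil
		}
	}
	return Event{}, ErrNotFound
}

func main() {
	var s Store
	s.Append(Event{OID: "16111f63e8252552", Tstamp: "16111f63e830d8a6", Name: "intermediate.crt"})
	s.Append(Event{OID: "16111f63e8252552", Tstamp: "16111f70aaaaaaaa", Name: "renamed.crt"}) // made-up rename
	latest, _ := s.EventOf("16111f63e8252552", "")
	fmt.Println(latest.Name, len(s.HistoryOf("16111f63e8252552")))
}
```

The point of the model is that the `oid` stays fixed across renames and other changes, while each change adds a new timestamped event to its history.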

Functions related to the write endpoint, which is somewhat harder to call directly against the API than the read endpoints:

* AppendTree - Perform a bulk upload of a large directory, where you have the opportunity to set security labels and policies individually
* Append - A raw append to update an individual file or directory

Example use case:

* Under GDPR, an individual can demand the removal of files that are "about" them.
* To comply, if we have a JPG with attached metadata saying that the individual is named in the file, we can issue a delete on *both* files.
* This is possible because we track the `Derived` file pointers.
* The `/derived` endpoint lets us find all files that point to us with a `Derived` pointer, so that we can find an entire tree of files that started from a single input file. Example: `elasticSearchEntry derivedfrom facesIndex, facesIndex derivedfrom jpg`
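
The traversal in the last bullet can be sketched as a breadth-first walk over `Derived` pointers. In the sketch below, the in-memory `derivedIndex` map is a hypothetical stand-in for repeated calls to the `/derived` endpoint, and `collectTree` is an illustrative name rather than a client function. The result is the full set of files that would need to be deleted together.

```go
package main

import "fmt"

// derivedIndex maps an oid to the oids of files derived directly from it.
// It stands in for lookups against the /derived endpoint.
type derivedIndex map[string][]string

// collectTree returns root followed by everything transitively derived
// from it, in breadth-first order. Each oid appears once, even if the
// pointers form a cycle.
func collectTree(idx derivedIndex, root string) []string {
	seen := map[string]bool{root: true}
	queue := []string{root}
	for i := 0; i < len(queue); i++ {
		for _, child := range idx[queue[i]] {
			if !seen[child] {
				seen[child] = true
				queue = append(queue, child)
			}
		}
	}
	return queue
}

func main() {
	// The chain from the example: elasticSearchEntry derivedfrom facesIndex,
	// facesIndex derivedfrom jpg.
	idx := derivedIndex{
		"jpg":        {"facesIndex"},
		"facesIndex": {"elasticSearchEntry"},
	}
	fmt.Println(collectTree(idx, "jpg"))
	// prints: [jpg facesIndex elasticSearchEntry]
}
```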
