CLI

A command-line interface, gmdatatool.sh, supports bulk and automated use of gm-data. It eases the implementation burden for some very common tasks, such as:

  • Upload a full tree of data, such as a full React application

  • Create initial directories for the uploads to go into

  • Run from desktops or servers

Installation

  • Go to the Nexus repository and select greymatter -> gm-data to download the gm-data bundle that matches the deployed version of gm-data you will be communicating with.

  • Unpack the .tar.gz bundle and make the gmdatatool binary for your operating system executable.

Mac OS

  $ mkdir -p gmdata
  $ tar -C gmdata -xvf gm-data-1.1.1.tar.gz
  ...
  $ chmod +x gmdata/gmdatatool.osx
  $ gmdata/gmdatatool.osx --version
  gmdatatool version 0.0.0

Configuration

All CLI commands need to connect in an authenticated manner, so the connection is configured through environment variables. Here is an example of connecting to a PKI-enabled setup. The environment variables only need to be set once, in a wrapper script. Create the script below locally and name it gmdatatool.sh:

#!/bin/bash
# Name this script:
# gmdatatool.sh
## Environmental setup - depends on how gm-data TLS and address is configured
(
u=`uname`
if [ "${u}" == "Darwin" ]
then
  b64="base64"
else
  b64="base64 -w 0"
fi
export MONGO_USE_TLS=false
export CLIENT_PORT=10808
export CLIENT_CN=greymatter
export CLIENT_ADDRESS=a4bbfef46c6b4401aade067cac0f31f1-1743204055.us-east-1.elb.amazonaws.com
export CLIENT_HOST=$CLIENT_ADDRESS
export CLIENT_PREFIX=/services/data/latest
export CLIENT_USE_TLS=true
echo "Q" | openssl s_client -servername "$CLIENT_CN" -connect "$CLIENT_HOST:$CLIENT_PORT" > dgchain 2>&1
# Adjust these paths to wherever your certificate and key files are
export CLIENT_CERT="$(${b64} < quickstart.crt)"
export CLIENT_KEY="$(${b64} < quickstart.key)"
export CLIENT_TRUST="$(${b64} < dgchain)"
# Pass all arguments through to the binary, preserving quoting
./gmdatatool.osx "$@"
)

If you do not have an intermediate CA certificate locally, you can download it using openssl:

openssl s_client -showcerts -connect a4bbfef46c6b4401aade067cac0f31f1-1743204055.us-east-1.elb.amazonaws.com:10808 </dev/null 2>/dev/null | openssl x509 -outform PEM > intermediate.crt
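
Because openssl x509 reads a single certificate from its input, the one-liner above keeps only the first certificate the server presents. If the whole presented chain is wanted, the PEM blocks can be pulled out of the s_client output directly. The following is a minimal Go sketch of that idea, not part of gmdatatool; the function names extractCertificates and singleLineBase64 are illustrative, and the sample input uses dummy bytes rather than real certificates. The second helper mirrors what `base64 -w 0` does in the wrapper script.

```go
package main

import (
	"encoding/base64"
	"encoding/pem"
	"fmt"
)

// extractCertificates returns every CERTIFICATE block found in raw,
// re-encoded as PEM, skipping any non-PEM text around the blocks.
func extractCertificates(raw []byte) []byte {
	var out []byte
	for {
		block, rest := pem.Decode(raw)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type == "CERTIFICATE" {
			out = append(out, pem.EncodeToMemory(block)...)
		}
		raw = rest
	}
	return out
}

// singleLineBase64 mirrors `base64 -w 0`: standard base64 without line wraps.
func singleLineBase64(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

func main() {
	// Stand-in for s_client output; the base64 bodies are dummy bytes,
	// not real certificates.
	raw := []byte("CONNECTED(00000003)\n---\nCertificate chain\n" +
		"-----BEGIN CERTIFICATE-----\nAQID\n-----END CERTIFICATE-----\n" +
		" 1 s:CN=intermediate\n" +
		"-----BEGIN CERTIFICATE-----\nBAUG\n-----END CERTIFICATE-----\n---\n")
	chain := extractCertificates(raw)
	fmt.Print(string(chain))
	fmt.Println(singleLineBase64(chain))
}
```

In practice the input would be the contents of a file such as dgchain, read with os.ReadFile.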

Usage - CLI

Read root directory:

./gmdatatool.sh get /list/1/home
[]

By default, gm-data is empty. We can create a quickstart@deciphernow.com directory using:

./gmdatatool.sh mkdir 1/home/quickstart@deciphernow.com

Listing again shows that our files now consist of one directory:

./gmdatatool.sh get /list/1/home
[
  {
    "tstamp": "16111f0d2d56e97e",
    "userpolicy": {
      "label": "CN=quickstart,OU=Engineering,O=Decipher Technology Studios,L=Alexandria,ST=Virginia,C=US"
    },
    "jwthash": "0ce8a07798fe4ea8a6e5b6057b3207781e811cb882eeb27ddd1d063ad60defa7",
    "schemaversion": 10,
    "name": "quickstart@deciphernow.com",
    "action": "U",
    "oid": "16111f0365ea8bb0",
    "parentoid": "161119570d276e9a",
    "expiration": "7fffffffffffffff",
    "checkedtstamp": "16111f0365f41bbd",
    "objectpolicy": {
      "label": "default",
      "requirements": {
        "f": "yield",
        "a": [
          {
            "v": "C"
          },
          {
            "v": "R"
          },
          {
            "v": "U"
          },
          {
            "v": "D"
          },
          {
            "v": "P"
          },
          {
            "v": "X"
          }
        ]
      }
    },
    "derived": {},
    "security": {
      "label": "GMDATA",
      "foreground": "white",
      "background": "green"
    },
    "fulldir": "/0000000000000001/",
    "policy": {
      "policy": [
        "C",
        "R",
        "U",
        "D",
        "P",
        "X"
      ]
    },
    "cluster": "default"
  }
]
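
The listing is plain JSON, so scripts and services can consume it directly. Below is a minimal Go sketch that decodes a listing and summarizes each entry. The Entry struct and the parseListing and describe helpers are illustrative names, not part of the gm-data client, and the struct covers only a handful of the fields that appear in the CLI output in this document. Treating a missing isfile field as "directory" is an assumption based on the sample output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry holds a subset of the fields seen in the CLI's JSON output.
type Entry struct {
	Name      string `json:"name"`
	OID       string `json:"oid"`
	ParentOID string `json:"parentoid"`
	Tstamp    string `json:"tstamp"`
	Action    string `json:"action"`
	IsFile    bool   `json:"isfile"` // assumption: absent means directory
}

// parseListing decodes the JSON array printed by a listing command.
func parseListing(data []byte) ([]Entry, error) {
	var entries []Entry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, fmt.Errorf("decode listing: %w", err)
	}
	return entries, nil
}

// describe gives a one-line summary of an entry.
func describe(e Entry) string {
	kind := "dir"
	if e.IsFile {
		kind = "file"
	}
	return fmt.Sprintf("%s %s (oid %s)", kind, e.Name, e.OID)
}

func main() {
	sample := []byte(`[{"name":"quickstart@deciphernow.com","oid":"16111f0365ea8bb0",
		"parentoid":"161119570d276e9a","tstamp":"16111f0d2d56e97e","action":"U"}]`)
	entries, err := parseListing(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(describe(e))
	}
}
```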

Let's upload a file to the directory /home/quickstart@deciphernow.com/newdir:

# make new dir
./gmdatatool.sh mkdir /home/quickstart@deciphernow.com/newdir
...
# upload file
./gmdatatool.sh upload /home/quickstart@deciphernow.com/newdir intermediate.crt
[
  {
    "tstamp": "16111f63e830d8a6",
    "userpolicy": {
      "label": "CN=quickstart,OU=Engineering,O=Decipher Technology Studios,L=Alexandria,ST=Virginia,C=US"
    },
    "jwthash": "4318f26aafce11b3ea9dba4266712367a0fda4cae1479f372adcf6cb4568ec34",
    "schemaversion": 10,
    "name": "intermediate.crt",
    "action": "C",
    "oid": "16111f63e8252552",
    "parentoid": "16111f541ef0d243",
    "expiration": "7fffffffffffffff",
    "checkedtstamp": "16111f541efa661f",
    "objectpolicy": {
      "requirements": {
        "f": "if",
        "a": [
          {
            "f": "contains",
            "a": [
              {
                "v": "email"
              },
              {
                "v": "quickstart@deciphernow.com"
              }
            ]
          },
          {
            "f": "yield-all"
          },
          {
            "f": "yield",
            "a": [
              {
                "v": "R"
              },
              {
                "v": "X"
              }
            ]
          }
        ]
      }
    },
    "mimetype": "application/x-x509-ca-cert",
    "rname": {
      "Any": "36c/cd/16111f63e80cd36c",
      "S3": "36c/cd/16111f63e80cd36c"
    },
    "size": 2313,
    "derived": {},
    "isfile": true,
    "security": {
      "label": "GMDATA",
      "foreground": "white",
      "background": "green"
    },
    "sha256plain": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "fulldir": "/0000000000000001/",
    "originalobjectpolicy": "(if (contains email quickstart@deciphernow.com)(yield-all)(yield R X))",
    "encrypteddata": {
      "master": "oSolNVlIdl4LDH/vyiIiAwmRPN2DhbPYV6Cw1AqR90JFXNtQTfBVShfvotsLiohxvB6QO++7vppl2WghiY1oUA70zRqe+21xSlqTwWZJISdshf1UAPE0wIX07+crb1Cxw/WJPAa6AyECWPJ8+Z2sjW7eXWdNeqF1pgs="
    },
    "encryptednonce": {
      "master": "tgSHMWmMba57sVu6"
    },
    "cluster": "default"
  }
]
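
In the upload response, the requirements tree under objectpolicy looks like a parsed form of the expression in originalobjectpolicy: each node is either a function call ("f" with optional arguments "a") or a literal value ("v"). The following Go sketch, written under that assumption, renders such a tree back into an s-expression. The Node type and render function are illustrative names, and the output uses single spaces between items, so it may differ in whitespace from the string gm-data stores.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Node is one element of a requirements tree: either a function call
// ("f" with optional arguments "a") or a literal value ("v").
type Node struct {
	F string `json:"f,omitempty"`
	A []Node `json:"a,omitempty"`
	V string `json:"v,omitempty"`
}

// render prints the node as an s-expression with single-space separators.
func render(n Node) string {
	if n.F == "" {
		return n.V
	}
	parts := []string{n.F}
	for _, arg := range n.A {
		parts = append(parts, render(arg))
	}
	return "(" + strings.Join(parts, " ") + ")"
}

func main() {
	// The requirements tree from the upload response above.
	const requirements = `{"f":"if","a":[
		{"f":"contains","a":[{"v":"email"},{"v":"quickstart@deciphernow.com"}]},
		{"f":"yield-all"},
		{"f":"yield","a":[{"v":"R"},{"v":"X"}]}]}`
	var root Node
	if err := json.Unmarshal([]byte(requirements), &root); err != nil {
		panic(err)
	}
	fmt.Println(render(root))
	// prints: (if (contains email quickstart@deciphernow.com) (yield-all) (yield R X))
}
```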

Usage - Client Library

The tool ./gmdatatool.sh is a special-case use of the Go package github.com/greymatter-io/gm-data/client. The client is built around these ideas:

  • Listening for changes in gm-data, and invoking callbacks when they happen.

  • Providing an API to respond to changes. Example uses:

    • Statically generated thumbnails

    • Running AWS Rekognition on images and uploading derived files, such as object labels.

    • The written-back files are JSON, and they point to the image they are derived from.

  • Responding to changes may happen through REST or Kafka.

A responder can be created with either a REST or a Kafka constructor. The REST constructor filters out information based on objectPolicy (i.e., it runs as a real user). The Kafka constructor runs on a privileged, unfiltered view of all events that happen on gm-data. Generally, the Kafka view is appropriate for back-end processes. The REST constructor is usable from the front end (i.e., not originating from within Fabric itself, possibly even from web browsers calling the /notifications endpoint) or from the back end.

    // Create a client at the root
    c, err := client.NewRESTResponder(
        logger,
        client.GetURL(),
        getClient(),
        listing.DefaultRootOID,
        policy.CurrentTstamp(),
        1000,          // fetch up to 1000 events per poll
        2*time.Second, // polling interval
        client.CLIENT_IDENTITY.Str(),
        func(c *client.Responder, ev *listing.Event) error {
            return nil
        },
    )
    if err != nil {
        log.Printf("create client failed: %v", err)
        panic(err)
    }

This responder polls for new information every two seconds and gets up to 1000 events at a time. The callback lets us inspect events with our own code. Generally, when we see something interesting in the event (ev), we call different parts of the API:

    // Get an io.Reader on ev, as it is a file type that we are interested in
    blobData, err := c.StreamOf(ev.Oid, ev.Tstamp)
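
To picture the poll-and-callback behaviour described above, here is a self-contained Go sketch of one polling round: fetch a bounded batch of events newer than a cursor, hand each one to a callback, and advance the cursor. It is not the gm-data client's implementation; Event, fetchFunc, pollOnce and sliceSource are hypothetical names, and the in-memory source stands in for the /notifications endpoint.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for listing.Event, reduced to two fields.
type Event struct {
	Oid    string
	Tstamp int64
}

// fetchFunc is a hypothetical event source: it returns up to max events
// with Tstamp greater than since, oldest first.
type fetchFunc func(since int64, max int) []Event

// pollOnce runs one polling round and returns the updated cursor.
func pollOnce(fetch fetchFunc, since int64, max int, handle func(Event) error) (int64, error) {
	for _, ev := range fetch(since, max) {
		if err := handle(ev); err != nil {
			return since, err // stop; this event is fetched again next round
		}
		since = ev.Tstamp
	}
	return since, nil
}

// sliceSource serves events from an in-memory slice ordered by Tstamp.
func sliceSource(all []Event) fetchFunc {
	return func(since int64, max int) []Event {
		var batch []Event
		for _, ev := range all {
			if ev.Tstamp > since && len(batch) < max {
				batch = append(batch, ev)
			}
		}
		return batch
	}
}

func main() {
	src := sliceSource([]Event{{"a", 1}, {"b", 2}, {"c", 3}})
	cursor := int64(0)
	for round := 1; round <= 2; round++ {
		var err error
		cursor, err = pollOnce(src, cursor, 2, func(ev Event) error {
			fmt.Printf("round %d: %s\n", round, ev.Oid)
			return nil
		})
		if err != nil {
			panic(err)
		}
		// A long-running responder would sleep for the poll interval here.
	}
}
```

With a batch size of 2, the first round delivers events a and b and the second delivers c, which is the same pattern as a responder draining a backlog in batches of up to 1000.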

We may then do something outside the scope of gm-data, such as turning a blob into a JSON file (e.g., submit a jpg and get back a JSON description of it). Note that when we listen and write back like this, we typically set the Derived fields so that we can track the lineage of a file: why it exists and what created it. For example, we can correlate a jpg of a face with a JSON file about it, so that we can delete both if we are asked to delete the file.

    m := c.NewWriteMarshaler()
    defer m.Close()
    err = m.Append(&listing.EventArgs{
        Action:       policy.ActionUpdate,
        IsFile:       true,
        ParentOID:    ev.ParentOID,
        Name:         newFname,
        MimeType:     "application/json",
        ObjectPolicy: policy.ForReadAllFull,
        Derived: listing.Derived{
            Oid:    ev.Oid,
            Tstamp: ev.Tstamp,
            Type:   kind,
        },
        Security:      ev.Security,
        BlobAlgorithm: "none",
    }, newFname)
...
    req, err := c.NewWriteRequest(m)
...
    res, evs, err := c.DoWriteRequest(req)
...

The client API supports the following functions, all of which are needed to respond to changes in gm-data by writing back new derived files. For the read endpoints:

  • NewRESTResponder/NewKafkaResponder - Listen on /notifications. Responding to changes made in gm-data is the main reason for having a client library.

  • StreamOf - Get the bytes for an (oid,tstamp). The tstamp is optional; if it is omitted, you get the latest blob.

  • EventOf - Get the properties for an (oid,tstamp), or latest if tstamp is not included.

  • DerivedOf - Find out what is already derived from this file. This is how you could know that a thumbnail already exists for a file.

  • Self - Discover what we are authenticated as, which is important for troubleshooting.

  • HistoryOf - Every event pertaining to an oid. This is the lifecycle of the inode, across all changes (including name, parent, policy, security labels, etc).

Note that these simple client libraries do not use the more complex paging options.

For the write endpoint, which is somewhat harder to program against directly than the read endpoints:

  • AppendTree - Perform a bulk upload of a large directory, where you have the opportunity to set security labels and policies individually

  • Append - A raw append to update an individual file or directory

Example use case:

  • GDPR requires that an individual can demand the removal of files "about" them.

  • To comply, if we have a jpg with attached metadata saying that the individual is named in the file, we can issue a delete on both files.

  • This is possible because we track the Derived file pointers.

  • The /derived endpoint lets us find all files that point to us with a Derived pointer, so that we can find an entire tree of files that started from a single input file. Example: elasticSearchEntry derivedfrom facesIndex, facesIndex derivedfrom jpg
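
The traversal in this use case can be sketched as a breadth-first walk over Derived pointers. The Go code below is a minimal illustration under stated assumptions: derivedTree is a hypothetical helper, and the derivedOf callback stands in for a lookup against the /derived endpoint (for example via the client's DerivedOf), whose real signature is not shown in this document.

```go
package main

import "fmt"

// derivedTree returns root followed by every object transitively derived
// from it. derivedOf is a hypothetical lookup that returns the oids of the
// objects directly derived from the given oid.
func derivedTree(root string, derivedOf func(oid string) []string) []string {
	seen := map[string]bool{root: true}
	order := []string{root}
	for i := 0; i < len(order); i++ {
		for _, child := range derivedOf(order[i]) {
			if !seen[child] {
				seen[child] = true
				order = append(order, child)
			}
		}
	}
	return order
}

func main() {
	// The example chain from the text: elasticSearchEntry derived from
	// facesIndex, facesIndex derived from jpg.
	derived := map[string][]string{
		"jpg":        {"facesIndex"},
		"facesIndex": {"elasticSearchEntry"},
	}
	lookup := func(oid string) []string { return derived[oid] }
	fmt.Println(derivedTree("jpg", lookup))
	// prints: [jpg facesIndex elasticSearchEntry]
}
```

Every oid in the returned slice is a candidate for the delete request; the seen set guards against visiting an object twice if pointers ever form a cycle.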
