CLI
A command-line interface called gmdatatool.sh supports bulk and automated use of gm-data. It eases the implementation burden for some very common tasks, such as:
Upload a full tree of data, such as a full React application
Create initial directories for the uploads to go into
Run from desktops or servers
Go to the downloads page and select greymatter -> gm-data to download the gm-data bundle consistent with the deployed version of gm-data you will be communicating with.
Unpack the .tar bundle and give the gmdatatool binary execute permissions, corresponding with your operating system.
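On Linux or macOS, for example (the bundle filename here is illustrative):

```bash
# Unpack the bundle and make the binary executable.
tar -xf gm-data.tar
chmod +x gmdatatool
```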
The CLI commands all need to connect in an authenticated manner, so there are environment variables associated with connecting. Here is an example of connecting to a PKI-enabled setup. The environment variables only need to be set once in a script. After the environment variables are set up, create the script below locally and name it gmdatatool.sh:
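A minimal sketch of such a wrapper script, assuming a PKI-enabled deployment; the variable names, endpoint, and certificate paths below are illustrative assumptions, not the tool's documented interface:

```bash
#!/bin/bash
# Hypothetical connection settings for a PKI-enabled gm-data deployment.
# Consult the bundle's documentation for the real variable names.
export GMDATA_URL="https://gm-data.example.com:8181"  # assumed service endpoint
export GMDATA_CERT="$HOME/certs/user.cert.pem"        # client certificate
export GMDATA_KEY="$HOME/certs/user.key.pem"          # client private key
export GMDATA_CA="$HOME/certs/ca.pem"                 # trust store

# Pass all arguments through to the downloaded binary.
./gmdatatool "$@"
```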
Read root directory:
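For example (the flags here are assumptions; check the tool's help output for the real ones):

```bash
# List the contents of the gm-data root (flag names assumed).
./gmdatatool.sh -action ls -path /
```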
By default there isn't anything in gm-data. We can create a quickstart@deciphernow.com directory using:
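For example, with the same hypothetical flags as above:

```bash
# Create the user's home directory (flag names assumed).
./gmdatatool.sh -action mkdir -path /home/quickstart@deciphernow.com
```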
Now if we list again, we see that our files consist of one directory:
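Listing the root again with the same hypothetical invocation:

```bash
# Re-list the root; it should now show the single home directory.
./gmdatatool.sh -action ls -path /
```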
Let's upload a file to the directory /home/quickstart@deciphernow.com/newdir
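For example (again with assumed flags and an illustrative local file):

```bash
# Create the target directory, then upload a local file into it.
./gmdatatool.sh -action mkdir -path /home/quickstart@deciphernow.com/newdir
./gmdatatool.sh -action upload -path /home/quickstart@deciphernow.com/newdir -file ./hello.txt
```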
The tool ./gmdatatool.sh is a special-case use of the Go package github.com/greymatter-io/gm-data/client. The client is based around two important ideas:
Listening for changes in gm-data, and invoking callbacks when they happen.
Providing an API to respond to changes. Example uses:
Statically generating thumbnails
Running AWS Rekognition on images to write back derived files, such as object labels
The written-back files are JSON, and they point to the image they are derived from
Responding to changes may happen through REST or Kafka.
There is a responder, with REST or Kafka constructors. The REST constructor filters out information based on objectPolicy (i.e., it runs as a real user). The Kafka constructor runs on a privileged, unfiltered view of all events that happen in gm-data. Generally, the Kafka view is appropriate for back-end processes. The REST constructor is usable from the front-end (i.e., not originating from within Fabric itself, possibly even from web browsers calling the /notifications endpoint) or from the back-end.
This responder polls every second for new information, retrieving up to 1000 events at a time. The callback allows us to inspect events with our code. Generally, when we see something interesting in the event (ev), we call different parts of the API:
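A minimal Go sketch of this pattern. The names NewRESTResponder, EventOf, and StreamOf come from the API list below, but the constructor signature, callback type, and Event fields here are assumptions:

```go
package main

import (
	"io"
	"log"

	client "github.com/greymatter-io/gm-data/client"
)

func main() {
	// Hypothetical wiring: the real constructor may take different arguments.
	// The responder polls /notifications every second and hands the callback
	// up to 1000 events per poll.
	err := client.NewRESTResponder("https://gm-data.example.com", func(ev client.Event) {
		// Properties for (oid, tstamp); omitting tstamp would mean "latest".
		props, err := client.EventOf(ev.OID, ev.TStamp)
		if err != nil {
			log.Println("EventOf:", err)
			return
		}
		// The bytes of the blob itself, e.g. a jpg we want to describe.
		stream, err := client.StreamOf(ev.OID, ev.TStamp)
		if err != nil {
			log.Println("StreamOf:", err)
			return
		}
		defer stream.Close()
		data, _ := io.ReadAll(stream)
		log.Printf("event %s: %v (%d bytes)", ev.OID, props, len(data))
	})
	if err != nil {
		log.Fatal(err)
	}
}
```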
We may then go do something outside the scope of gm-data, such as turning a blob into a JSON file (e.g., submitting a jpg and getting back a JSON description of it). Note that when we do listen-and-write-back like this, we typically end up setting Derived fields, so that we can track the lineage of why the file exists and what created it. We can correlate a jpg of a face with a JSON file about it, so that we can delete both if we are asked to delete the file.
The client API supports the following functions, all of which are needed to respond to changes in gm-data with write-backs of new derived files. For the read endpoints:
NewRESTResponder/NewKafkaResponder - Listen on /notifications, which is the critical reason for having a client library: to respond to changes being made in gm-data.
StreamOf - Get the bytes for an (oid,tstamp), where tstamp is optional, so that you get the latest blob.
EventOf - Get the properties for an (oid,tstamp), or the latest if tstamp is not included.
DerivedOf - Find out what is already derived from this file. This is how you could know that a thumbnail already exists for a file.
Self - Discover what we are authenticated as, which is important for troubleshooting.
HistoryOf - Every event pertaining to an oid. This is the lifecycle of the inode, across all changes (including name, parent, policy, security labels, etc.).
Note that more complex paging options are not being used with these simple client libraries.
For the write endpoint, which is a bit more difficult to code against directly than the read endpoints (a sketch follows this list):
AppendTree - Perform a bulk upload of a large directory, where you have the opportunity to set security labels and policies individually
Append - A raw append to update an individual file or directory
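A sketch of such a write-back. Only the names Append and Derived come from this page; the Event fields and the Append signature are assumptions:

```go
package main

import (
	"bytes"

	client "github.com/greymatter-io/gm-data/client"
)

// writeBackDerived uploads a JSON description derived from an image, setting
// the Derived pointer so lineage is tracked and both files can later be
// deleted together.
func writeBackDerived(imageOID string, desc []byte) error {
	ev := client.Event{
		Name:    "description.json",
		Derived: imageOID, // this JSON was derived from that jpg
	}
	return client.Append(ev, bytes.NewReader(desc))
}
```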
Example use case:
GDPR laws give an individual the right to demand the removal of files "about" them.
To comply, if we have a jpg with attached metadata stating that the individual is named in the file, then we can issue a delete on both files.
This is possible because we track the Derived file pointers.
The /derived endpoint lets us find all files that point to us with a Derived pointer, so that we can find the entire tree of files that started from a single input file. Example: elasticSearchEntry derived from facesIndex, facesIndex derived from jpg.
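A sketch of walking that tree with DerivedOf (the name comes from the API list above; the return type and signature are assumptions):

```go
package main

import (
	"log"

	client "github.com/greymatter-io/gm-data/client"
)

// collectTree returns an oid plus everything transitively derived from it,
// e.g. jpg -> facesIndex -> elasticSearchEntry, so that a single GDPR request
// can delete the whole tree of files.
func collectTree(oid string) []string {
	oids := []string{oid}
	children, err := client.DerivedOf(oid)
	if err != nil {
		log.Println("DerivedOf:", err)
		return oids
	}
	for _, child := range children {
		oids = append(oids, collectTree(child.OID)...)
	}
	return oids
}
```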