Data
Grey Matter Data is a platform service for the versioned and encrypted storage of media blobs and assets.
Note that we mark sensitive values with "❗" so that it is clear what must be kept private, versus what is safely made public.
Grey Matter Data uses JSON Web Tokens (JWT) for authentication. Each request to Grey Matter Data must include a cookie in the header that is based on the authentication JWT. Grey Matter Data tracks information through Event Objects. These Event Objects capture all changes and reflect the Kafka event queue that supports the system. Each Event Object is associated with a file through its Object ID (`oid`) parameter. The parameters of the Event Object form relationships between files in the system.
A JWT service, such as `jwt-server`, assumes that a system has already authenticated you via proxy and that this system inserts the `USER_DN` header. The JWT service takes a `redirect` argument and a `path` argument. The `path` is the set of URLs over which the cookie will be sent. The `redirect` is a URL in the path. The cookie is written out with the name `userpolicy` and with `HttpOnly` set to true, which prevents client scripts from accessing it.
The JWT token includes claims with the following format:
Label: the name that the token will be logged under.
Values: a hashtable from strings to lists of strings, used to evaluate the JWT token against an objectPolicy.
Here is an example of a JWT token representing a userPolicy.
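A minimal sketch of the claims described above, assuming they are carried as a JSON object with `label` and `values` keys. The helper name `make_user_policy_claims` and the sample attributes are hypothetical, and the result is only the unsigned claims, not a signed token.

```python
import json


def make_user_policy_claims(label, values):
    """Build a claims dict with a label and a map of string -> list of strings."""
    for key, items in values.items():
        # Each attribute must be a string key with a list of string values.
        if not isinstance(key, str) or not isinstance(items, list):
            raise TypeError("values must map strings to lists of strings")
        if not all(isinstance(item, str) for item in items):
            raise TypeError("values must map strings to lists of strings")
    return {"label": label, "values": {k: list(v) for k, v in values.items()}}


# Hypothetical user attributes, for illustration only.
claims = make_user_policy_claims(
    "rob.johnson@email.com",
    {"email": ["rob.johnson@email.com"], "org": ["example"]},
)
print(json.dumps(claims, indent=2))
```

The multi-valued `values` map is what lets a policy test for membership, much like checking LDAP groups.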
Grey Matter Data tracks all changes through JSON Event Objects. Events represent a portion (limited by the user's security access) of the Kafka messaging queue that supports Grey Matter Data. Any modification to the system is carried out through the /write endpoint by supplying one or more Events that describe the required actions.
Event parameters define the relationships between files in Grey Matter Data. For example, the `parentoid` parameter defines folder-to-child relationships, so updating it effectively moves an Object from one folder to another. The `derived` parameter points to Object IDs related to the current `oid`. For example, thumbnails derived from an image can be pointed to that image through the `derived` Event parameter.
Note: when action: "C" (create / upload) is specified, the system backfills the Object ID internally when the Object is created. In this case, you should not specify the `oid` parameter.
Note: when action: “C” (create / upload) is specified, the parameters below are the bare minimum that must be specified for the create action to complete internally.
Note: when action: "D" (delete) is specified, the system backfills most of the Object parameters, so it is only necessary to specify `oid`, `action`, `parentoid`, and `objectpolicy`.
Note: when action: "U" (update) is specified, all parameters must be specified with the values from the previous Event associated with the Object ID that is being updated, except for `tstamp` and the few parameters that are being changed.
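As a rough illustration of the update rule, the sketch below copies the previous Event, drops `tstamp`, and overrides only the changed parameters. The helper `make_update_event`, the sample field values, and the assumption that `tstamp` is simply omitted are all hypothetical.

```python
def make_update_event(previous_event, changes):
    """Return an update Event that mimics the previous Event except for changes."""
    # Assumption: tstamp is left out so the system can assign a new one.
    event = {k: v for k, v in previous_event.items() if k != "tstamp"}
    event.update(changes)
    event["action"] = "U"
    return event


# Hypothetical previous Event for an Object named resume.pdf.
previous = {
    "oid": "900",
    "parentoid": "42",
    "name": "resume.pdf",
    "action": "C",
    "tstamp": "hypothetical-timestamp",
    "objectpolicy": {},  # placeholder, the real policy format is not shown here
}

# Moving the Object to another folder only changes parentoid.
moved = make_update_event(previous, {"parentoid": "43"})
print(moved)
```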
This section introduced the Event Objects that Grey Matter Data uses to track information. Understanding these objects will help you perform the following actions: uploading (action: "C"), moving, renaming, or altering (action: "U"), and removing (action: "D").
Grey Matter Data tracks stored data through unique Object IDs that are assigned on upload of files into the system. Relationships between Object IDs are established through the parentoid parameter of the Event. Creating an update Event with a new parentoid effectively moves an Object to a new folder. Learn more in the /write endpoint section.
When an Event with action: "C" (create) is sent into the system on upload through the /write endpoint, `oid` does not need to be specified. The system assigns it to the Event internally.
This section covers accessing and manipulating data within Grey Matter Data.
We begin with the overarching concepts of information retrieval and information modification. Then we dive deeper into specifics of each API endpoint and code examples.
When starting to use the API, you will most likely direct your first request at the root folder to get initial file listings. You can accomplish this with a `GET` request to the `/list` endpoint with a path of /1 (`GET /list/1/`).
The root directory has the well-known Object ID (`oid`) of 1 by default. This is the root folder for every user. However, because of the permissions prescribed through the authentication JWT, each user can only see and manipulate a subset of folders.
You can extract data from the system in the following three ways, using the read endpoints:
As a `JSON` Object mimicking the internal Event Object, through one of the read endpoints,
As a raw byte stream through the `/stream` endpoint, used to download the Object locally, and
As a raw byte stream within an iFrame that displays the security metadata of the Object, through the `/show` endpoint, used to view the Object within the browser window.
More information about each of these methods can be found in the respective endpoint sections (`/read`, `/stream`, `/show`).
More details about data modification can be found in the `/write` endpoint section.
When authenticating to the API, there is a prioritized set of options. Our JWT format allows for LDAP-like groups. Tokens are signed by a signer that we trust, and have a `label` field that holds the username or generic name to be logged in audits. They also have a `values` field, which is a `map[string][]string`, that is, a set of multi-valued attributes similar to LDAP groups. This lets us write policies as boolean combinations of these attributes. In short, you need a `userpolicy` as a prerequisite to make use of this API. We look for one in the following order:
1. HTTP parameter `setuserpolicy` set to a JWT, which we turn into a Set-Cookie and then forward you back in with this parameter removed. This may be used in setups without a JWT server or an edge proxy.
2. HTTP parameter `userpolicy` set to a JWT. This may be used in setups without a JWT server or an edge proxy.
3. Cookie `userpolicy` set to a JWT. This may be used in setups without a JWT server or an edge proxy.
4. HTTP header `userpolicy` set to a JWT. It is set by the edge server, usually using the `USER_DN` header as input. This is used in conjunction with the JWT filter.
5. Configurable header `USER_DN`, which we trust was securely set by the edge server(!!). This can be used to look up a JWT in the JWT server. This must be used with an edge proxy with inheaders enabled.
6. Anonymous.
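The lookup order above can be sketched as a simple first-match search. This is only an illustration of the priority, not gm-data's implementation: the function name, the plain-dict request model, and the source labels are assumptions, and the `USER_DN` header name is configurable in a real deployment.

```python
def resolve_user_policy(params, cookies, headers):
    """Return (source, value) for the first credential found, in priority order."""
    candidates = [
        ("param:setuserpolicy", params.get("setuserpolicy")),
        ("param:userpolicy", params.get("userpolicy")),
        ("cookie:userpolicy", cookies.get("userpolicy")),
        ("header:userpolicy", headers.get("userpolicy")),
        # USER_DN is not a JWT itself; it would be used to look one up.
        ("header:USER_DN", headers.get("USER_DN")),
    ]
    for source, value in candidates:
        if value:
            return source, value
    # Nothing supplied: the request is treated as anonymous.
    return "anonymous", None


# A cookie outranks a header when both are present.
print(resolve_user_policy({}, {"userpolicy": "jwt-from-cookie"}, {"userpolicy": "jwt-from-header"}))
```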
This section covers multiple examples of HTTP request configurations and explains the results they return.
All requests in this section can be accomplished by modifying the JavaScript code presented below.
| Request Method | Endpoint URL | Request Body | Credentials Include | Description |
| --- | --- | --- | --- | --- |
| GET | /list/1/ | None | True | Get a listing of the root Object ID (`oid`) of 1, choosing a path / relative to it. The / symbol at the end of a listing path URL is mandatory. Each folder within the /1 root folder has its own security policy, which limits access to groups of users, so each user navigating to the /1 folder sees a folder landscape tailored to their security credentials. |
| GET | /list/1/Project1Folder/ | None | True | Returns listings for Project1Folder, a folder that is a child of the root folder. This folder may have its own security settings that render it invisible to some groups of users. |
| GET | /list/42/ | None | True | If the Project1Folder directory had an Object ID (`oid`) of 42, then this would be an equivalent URL to list it. Note the / symbol at the end of the path. |
| GET | /props/42/ | None | True | Produces the metadata about the Project1Folder directory. Once we have found the Object that we are looking for, we can perform operations on it. |
| GET | /stream/900/ | None | True | Produces a bytestream of the Object with Object ID (`oid`) of 900. Presume this Object's name property is resume.pdf. |
| GET | /stream/42/resume.pdf | None | True | Streams the Object named resume.pdf, addressed by name relative to the folder with Object ID 42. |
| GET | /props/900/ | None | True | The metadata of the Object with name resume.pdf. Returns an Event Object with associated properties. |
| GET | /history/900/ | None | True | A list of Event Objects for every state of resume.pdf, ordered by the time stamp of the Event. |
| GET | /show/900/ | None | True | A convenience wrapper around stream that shows an HTML security banner with the file's security metadata around the byte stream. |
The `GET` requests above can be dispatched separately, or in bulk using a `POST` request to the /read endpoint. This lets you minimize back-and-forth `HTTP` traffic to improve performance in low-bandwidth situations.
| Request Method | Endpoint URL | Request Body | Credentials Include | Description |
| --- | --- | --- | --- | --- |
| POST | /read | stringified([{URL:"/list/900/"}, {URL:"/list/42/"}]) | True | This endpoint requires a string-encoded array in the body of the request, in the form [{URL:"/list/900/"}, {URL:"/list/42/"}]. A detailed example can be found in the Read endpoint section. The call yields an array with data identical to the same calls performed individually with GET requests. In this example, we list two directories simultaneously, which allows quick file system exploration with significantly fewer requests. |
| POST | /read | stringified([{URL:"/history/900/?count=10"}, {URL:"/history/42/?count=10"}]) | True | Simultaneously gets the last 10 revisions of 2 separate Object IDs. |
| POST | /read | stringified([{URL:"/derived/900/"}, {URL:"/derived/42/"}]) | True | Simultaneously gets derived file metadata from 2 separate Object IDs. |
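A minimal sketch of building the string-encoded array for a bulk read, assuming plain JSON encoding is what "stringified" means here. The helper name `build_read_body` is hypothetical, and sending the request is left out.

```python
import json


def build_read_body(urls):
    """Encode a list of read URLs as the string body expected by /read."""
    return json.dumps([{"URL": url} for url in urls])


body = build_read_body(["/list/900/", "/list/42/"])
print(body)  # [{"URL": "/list/900/"}, {"URL": "/list/42/"}]
```

Each entry uses the same path that the equivalent individual `GET` request would use, including any query string such as `?count=10`.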
To get data into the system, a request with attached multipart/form-data needs to be sent to the `/write` endpoint. The transaction is an array of individual JSON Event Objects, in the order in which they need to be applied in the database (optionally including file objects in BLOB format appended to the form data when performing an upload). Detailed examples can be found in the `/write` endpoint section.
| Request Method | Endpoint URL | Request Body | Credentials Include | Description |
| --- | --- | --- | --- | --- |
| POST | /write | form data [{'meta':[Event1Object]}] | True | This endpoint requires form data with an appended array of Event Objects under the 'meta' property, specifying a modification to the system. A detailed example can be found in the `/write` endpoint section. |
| POST | /write | form data [{'meta':[Event1Object, Event2Object]}] | True | This endpoint can accept multiple Event Objects at the same time. |
| POST | /write | form data [{'meta':[Event1Object, Event2Object]}, {'blob':[BLOB1]}, {'blob':[BLOB2]}] | True | Multiple Event Objects can be accompanied by file BLOBs appended to the form data when performing an upload. |
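To make the shape of a write transaction concrete, the sketch below assembles form fields for a delete, using the four parameters that a delete Event needs. The helper `build_write_fields`, the sample IDs, and the empty `objectpolicy` placeholder are hypothetical, and the multipart encoding and HTTP call are left to whatever client you use.

```python
import json


def build_write_fields(events, blobs=()):
    """Return (name, value) form fields: one 'meta' array, then any blobs."""
    fields = [("meta", json.dumps(events))]
    # Blobs are appended in order, after the Event array.
    fields.extend(("blob", blob) for blob in blobs)
    return fields


# Hypothetical delete Event; the real objectpolicy format is not shown here.
delete_event = {
    "oid": "900",
    "action": "D",
    "parentoid": "42",
    "objectpolicy": {},
}

fields = build_write_fields([delete_event])
print(fields[0][0], fields[0][1])
```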
| HTTP Error Code | Common Causes |
| --- | --- |
| 400 | Bad Request is most often caused by a malformed Event Object in the form data sent to the `/write` endpoint. |
| 403 | Forbidden is most often caused when the JWT authentication token doesn't match the Object's privileges. |
| 404 | Not Found is most often caused when the Object ID (`oid`) specified in the request is incorrect. |
There is a command-line interface to support bulk and automated scenarios. This should ease the implementation burden for some very common tasks:
Upload a full tree of data, such as a full React application
Create initial directories for the uploads to go into
Run this from desktops or servers. There are a few platforms available, all for Intel architecture:
The CLI commands all need to connect in an authenticated manner, so there are environment variables associated with connecting. Here is an example of connecting to a PKI-enabled setup. The environment variables only need to be set once in a script. After the environment variables are set up:
The tool `./gmdatatool.sh` is a special-case use of the Go package `github.com/greymatter-io/gm-data/client`. The client is based around two important ideas:
Listening for changes in gm-data, and invoking callbacks when they happen.
Providing an API to respond to changes. Example uses:
Statically generated thumbnails
Run AWS Rekognition to upload derived files on images, such as object-labelling.
The written back files are json, and they point to the image that they are derived from
Responding to changes may happen through REST or Kafka.
There is a responder, with REST or Kafka constructors. The REST constructor filters out information based on objectPolicy (i.e., it runs as a real user). The Kafka constructor runs on a privileged, unfiltered view of all events that happen on gm-data. Generally, the Kafka view is appropriate for back-end processes. The REST constructor is usable from the front end (i.e., not originating from within Fabric itself, possibly even from web browsers calling the `/notifications` endpoint) or from the back end.
This responder will poll every second for new information, and get up to 1000 events at a time. The callback allows us to inspect events with our code. Generally, when we see something interesting in the event (`ev`), we call different parts of the API:
We may then go do something outside the scope of gm-data, such as turn a blob into a JSON file (i.e., submit a jpg and get back a JSON description of it). Note that when we listen and write back like this, we typically end up setting `Derived` fields, so that we can track the lineage of why the file exists and what created it. We can correlate a jpg of a face with a JSON file about it, so that we can delete them both if we are asked to delete the file.
The client API supports the following functions, all of which are needed to respond to changes in gm-data with write-backs of new derived files. For things related to read endpoints:
`NewRESTResponder`/`NewKafkaResponder` - Listen on `/notifications`. Responding to changes being made in gm-data is the critical reason for having a client library.
`StreamOf` - Get the bytes for an `(oid,tstamp)`, where tstamp is optional; omit it to get the latest blob.
`EventOf` - Get the properties for an `(oid,tstamp)`, or the latest if tstamp is not included.
`DerivedOf` - Find out what is already derived from this file. This is how you could know that a thumbnail already exists for a file.
`Self` - Discover what we are authenticated as, which is important for troubleshooting.
`HistoryOf` - Every event pertaining to an `oid`. This is the lifecycle of the inode, across all changes (including name, parent, policy, security labels, etc).
Note that more complex paging options are not being used with these simple client libraries.
For things related to the write endpoint, which is a bit more difficult to write directly against than the read endpoints:
`AppendTree` - Perform a bulk upload of a large directory, where you have the opportunity to set security labels and policies individually.
`Append` - A raw append to update an individual file or directory.
Example use case:
GDPR requires that an individual be able to demand the removal of files "about" them.
In order to comply, if we have a jpg with attached metadata that says that the individual is named in the file, then we can issue a delete on both files.
This is possible because we track the `Derived` file pointers.
The `/derived` endpoint lets us find all files that point to us with a `Derived` pointer, so that we can find an entire tree of files that started from a single input file. Example: elasticSearchEntry derivedfrom facesIndex, facesIndex derivedfrom jpg
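The tree walk described above can be sketched as a breadth-first traversal. This is an illustration only: `collect_derived_tree` is a hypothetical helper, and `derived_of` stands in for whatever lookup returns the files that point at a given Object with a `Derived` pointer.

```python
from collections import deque


def collect_derived_tree(root_oid, derived_of):
    """Return root_oid plus everything derived from it, directly or indirectly."""
    seen = {root_oid}
    order = [root_oid]
    queue = deque([root_oid])
    while queue:
        oid = queue.popleft()
        for child in derived_of(oid):
            # Guard against revisiting an Object reachable by two paths.
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# In-memory stand-in for the derived lookup, mirroring the example above.
index = {"jpg": ["facesIndex"], "facesIndex": ["elasticSearchEntry"]}
tree = collect_derived_tree("jpg", lambda oid: index.get(oid, []))
print(tree)  # ['jpg', 'facesIndex', 'elasticSearchEntry']
```

Every Object in the returned list would then receive its own delete Event.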
The gm-data service creates a binary called `gmdatax.linux` that is configured entirely by environment variables (to avoid a requirement to mount files). This binary, however, is packaged with some other files.
`./runforever` - a shell script that keeps `./gmdatax.linux` in a restart loop to handle unintentional crashes of the binary. This allows us to catch things like `array out of bounds`, `nil pointer dereference`, or catastrophic resource exhaustion such as `out of file handles`. It is these latter cases that drive the decision to allow the binary to die.
`./gmdatax.linux` - the actual gmdata binary, which reads in environment variables.
`./VERSION` - the version of this service.
`./static/` - a bundle of runtime API user documentation and a test user interface. This directory is served literally out of gm-data under the URL `/static/`.
`./certs/` - a directory that the binary can write certificates into on startup. The certificates originate from environment variables passed in as single-line base64 encodings of full `pem` files.
`./logs/` - a place to write logs (in non-default cases), which may be mounted over to keep the root partition from running out of space.
gm-data will make every possible attempt to look at your configuration and immediately crash with a detailed explanation of what to actually do about it. This includes looking up hostnames in DNS to verify that they exist. Always look in the log files for gm-data if something does not seem right on startup. But it cannot detect inconsistency issues at a higher level, such as one service offering a cert that is then trusted by a service that will try to connect to it. That would require analyzing a larger set of environment variables that are destined for multiple services.
`MASTERKEY` ❗ is mandatory. This is the key that is used to encrypt data.
`JWT_PUB` is the single-line base64 encoding of the signing key that the gm-data server trusts to sign JWT tokens. This is a mandatory parameter. It is not an X509 certificate. It is an actual Elliptic Curve key that is suitable for `ES512` in the JWT standard.
`FILE_BUCKET` is mandatory (aka: `AWS_S3_BUCKET`). This says where we write gm-data ciphertext out to AWS.
`FILE_PARTITION` is mandatory (aka: `AWS_S3_PARTITION`). This should be set to a value that is unique to a set of replicated Fabric clusters. It is literally a subdirectory in `FILE_BUCKET`. This exists so that we don't need to create lots of buckets constantly, yet can still distinguish which bucket data belongs to which installation.
`AWS_REGION` is required if `USES3=true`.
`AWS_S3_ENDPOINT` is only required in government setups that need to point to a different hostname for S3.
`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` ❗ may be set to give AWS credentials in the case where IAM roles are not used for the EC2 instance. `AWS_SECRET_ACCESS_KEY` ❗ is a secret.
When you disable S3 use with `USES3=false`, the bucket and partition are still used. The directory `./buckets/${FILE_BUCKET}/${FILE_PARTITION}` should exist and be writable by the gm-data process `./gmdatax.linux`.
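A small sketch of preparing that directory before launch, assuming it is acceptable to create it from a deployment script. The helper `ensure_local_bucket` and the sample bucket and partition names are hypothetical, and the demo runs in a temporary directory rather than next to the binary.

```python
import os
import tempfile


def ensure_local_bucket(base_dir, file_bucket, file_partition):
    """Create base_dir/buckets/<bucket>/<partition> and check it is writable."""
    path = os.path.join(base_dir, "buckets", file_bucket, file_partition)
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError("bucket directory is not writable: " + path)
    return path


# Demo in a throwaway directory; a real setup would use the gm-data directory.
with tempfile.TemporaryDirectory() as base:
    created = ensure_local_bucket(base, "mybucket", "mypartition")
    print(os.path.isdir(created))
```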
`JWT_PUB` is the public part of an elliptic curve key. The private part of it is `PRIVATE_KEY` ❗ for the JWT server. The parameters for use with the JWT libraries are rather specific, due to the curve name `secp521r1`. We generate our keypairs specifically for `gm-jwt-security`: first a private key for signing (file `jwtES512.key` ❗), and then the public key derived from it (`jwtES512.key.pub`), which is set for gm-data as `JWT_PUB`.
Prefix patterns. When gm-data needs to make reference to another service, these are the relevant environment variables:
`CLIENT_PREFIX` is the URL that the gateway is mapping the gm-data service to. This is done so that we can send back links that resolve properly in html files. We do this because we cannot hardcode even our own service name, and also cannot correctly give a relative path. Example: `/services/gmdatax/latest`
`CLIENT_JWT_PREFIX` is the URL that the gateway is mapping our peer service gm-jwt-security to. This is done so that we can send back links that resolve properly in html files. Example: `/services/jwt-server/1.0`, or `/services/jwt-server-gov/1.0`.
We have explicit dependencies on these things:
a JWT token issuer, that has a proper sidecar, and is reachable through the edge
a Mongo database, which is not mounted into the Fabric framework; so is not reached via a sidecar, or through the edge.
Kafka, which is not mounted into the Fabric framework; so is not reached via a sidecar, or through the edge.
Services that use TLS will end up creating a large number of environment variables. We follow a principle of passing in pem files as a single line of base64 of the original pem file. That means that we create such files as environment variables on the host that is preparing the deployment. Here is an example of setting up the trust for our Mongo dependency:
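The single-line encoding can be sketched as follows, assuming standard base64 with no line wrapping (the equivalent of `base64 -w 0`). The helper names and the placeholder PEM content are hypothetical, not a real certificate.

```python
import base64


def pem_to_env_value(pem_bytes):
    """Encode a whole PEM file as one line of base64, suitable for an env var."""
    return base64.b64encode(pem_bytes).decode("ascii")


def env_value_to_pem(value):
    """Recover the original PEM bytes from the single-line value."""
    return base64.b64decode(value)


# Placeholder PEM text, for illustration only.
pem = b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
encoded = pem_to_env_value(pem)
print("\n" in encoded)  # False
```

Because the PEM body is itself multi-line base64, the value ends up encoded twice, which is what keeps it on a single line.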
When TLS connections to peer services are involved, this pattern in name suffixes arises:
`ADDRESS` (or `HOST`) - IP or hostname of the peer.
`PORT` - port for the peer.
`USE_TLS` - use TLS.
`CERT` - a base64 single-line encoding of the pem cert (which itself is multi-line base64).
`KEY` ❗ - a base64 single-line encoding of the pem key (which itself is multi-line base64). This is also a secret.
`TRUST` - this is similar to `CERT`. It may encode a concatenated list of pem files for certs.
`CN` - the ServerName expected. This is usually the same as the CN in the remote cert, but may also be an SNI name that matches a wildcard in the CN. If this is not set, then we will contact the server to try to grab the CN out of the remote certificate.
With that being said, these variables are grouped together.
MONGO related connect info
`MONGOHOST` - slightly violates our pattern. This can be a list of host:port pairs, like `mongodata:27017,mongodata:27017`. This is because in a clustered setting, connections are not made to individual machines, but to entire clusters. The `PORT` part is already taken care of.
`MONGODB` - is not strictly part of TLS, but we need to know the database that we are connecting to.
`MONGO_USE_TLS` - says whether to use the TLS variables to make a TLS connection.
`MONGO_CERT` - is the client PKI cert that we identify ourselves with.
`MONGO_KEY` ❗ - is the key that goes with `MONGO_CERT`.
`MONGO_TRUST` - is the trust file to connect to Mongo servers.
`MONGO_CN` - is the SNI name for the mongo cert, the manually set serverName expected. If this is not set, then we will contact the server to try to grab the CN out of the remote certificate.
`MONGO_INITDB_ROOT_USERNAME` - is the username we will use (not necessarily related to the root username, however).
`MONGO_INITDB_ROOT_PASSWORD` ❗ - the password for `MONGO_INITDB_ROOT_USERNAME`. This is a secret.
GMDATA TLS info, for our own service. This generally only happens when the sidecar egress is mTLS.
`GMDATA_USE_TLS` - Says whether to use TLS. This will need to be coordinated with how our sidecar is set up. Our sidecar EGRESS will need to be a client of this TLS connection.
`GMDATA_CERT` - The identity cert of gmdata that will be presented to the sidecar.
`GMDATA_KEY` ❗ - The key that goes with `GMDATA_CERT`.
`GMDATA_TRUST` - The sidecar will need to present a cert that is signed by something in this `TRUST`.
CLIENT_JWT_ENDPOINT prefixed environment variables are relevant to gmdata looking up a `userpolicyid` (a random key to find a JWT) to get a `userpolicy` (an actual JWT token). This is only needed in cases where we reach a JWT server indirectly via `userpolicyid`.
`CLIENT_JWT_ENDPOINT_ADDRESS` - is the hostname of the JWT server.
`CLIENT_JWT_ENDPOINT_PORT` - is the port of the JWT server.
`CLIENT_JWT_ENDPOINT_USE_TLS`
`CLIENT_JWT_ENDPOINT_CERT`
`CLIENT_JWT_ENDPOINT_KEY` ❗
`CLIENT_JWT_ENDPOINT_CN` - Expected SNI name.
`CLIENT_JWT_ENDPOINT_TRUST`
`CLIENT_JWT_ENDPOINT_PREFIX` - if we connect directly or to the sidecar, then this is just left as the empty string "". But if we go through the edge, which is an unlikely case, this needs to be set to the same value as `CLIENT_JWT_PREFIX`.
`JWT_API_KEY` - is a base64 password that the JWT server will require to accept connections that resolve access codes for JWT tokens (`userpolicyid`) to actual JWT tokens (`userpolicy`).
Note that for the JWT server, we are trying to form a connection URL like:
Internally, gm-data sees a `userpolicyid` header, and connects to that URL to try to get a `userpolicy` object, which may be too large to fit into an HTTP header. Notice that `CLIENT_JWT_ENDPOINT_PREFIX` exists only to go through the edge instead of the sidecar. In the normal case `CLIENT_JWT_ENDPOINT_PREFIX=""`, because we want to talk to the sidecar.
Examples:
Talk to our own local sidecar in plaintext to reach JWT (preferred):
CLIENT_JWT_ENDPOINT_PREFIX=/services/jwt-server/latest
CLIENT_JWT_ENDPOINT_ADDRESS=gmdata-proxy
CLIENT_JWT_ENDPOINT_PORT=8080
Talk to a JWT sidecar directly (not preferred):
CLIENT_JWT_ENDPOINT_PREFIX=
CLIENT_JWT_ENDPOINT_ADDRESS=jwt-server-proxy
CLIENT_JWT_ENDPOINT_PORT=8080
`CLIENT_JWT_ENDPOINT_USE_TLS` may require connecting to a sidecar-issued cert that may not exist at the time gm-data launches. So, note that using `GMDATA_USE_TLS` in the mesh may be complicated by this fact.
`DONT_PANIC` - is an advanced parameter that says to only WARN, and not CRASH, when inconsistent environment variables are detected. If you run with this setting, you run the risk of creating a setup that we cannot support. Sometimes you need to temporarily ignore known problems, so if this is ever used, it should be disabled again as soon as possible.
`LESS_CHATTY_INFO` - by default, we like less chatty logs. If you want a lot more logging information, including the begin and end of sessions in which there were no problems, then you can set this to `false`.
`GMDATAX_SESSION_MAX` - is an admission control value. This imposes a limit on the number of outstanding requests that gm-data will allow to be serviced concurrently. It is literally a maximum population at which gm-data just issues `503` to tell the client to get out of line and come back later. It exists because if we run out of filehandles, the server will become unstable and crash in an irregular manner. If this server runs out of filehandles, then `GMDATAX_SESSION_MAX` should be lowered to a value that causes us to stop running out of filehandles. It may need to be raised if we get `503` errors that actually originate from gm-data itself. Our proxy may also issue `503` in the case of admission control, which complicates determining which one ran out. It is more likely that Envoy will run out of filehandles before gm-data will, because the front end is dealing with a lot of services concurrently.
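The admission control idea can be sketched as a bounded counter that rejects new work once the limit is reached. This is an illustration of the technique only, not gm-data's implementation; the class and function names are hypothetical.

```python
import threading


class AdmissionControl:
    """Track in-flight sessions and refuse new ones above a fixed maximum."""

    def __init__(self, session_max):
        self._max = session_max
        self._active = 0
        self._lock = threading.Lock()

    def try_enter(self):
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def leave(self):
        with self._lock:
            if self._active > 0:
                self._active -= 1


def handle(gate, work):
    """Run work if admitted and return 200, otherwise return 503 immediately."""
    if not gate.try_enter():
        return 503
    try:
        work()
        return 200
    finally:
        # Always release the slot, even if work raised.
        gate.leave()


gate = AdmissionControl(session_max=1)
print(handle(gate, lambda: None))  # 200
```

Rejecting early keeps the number of open connections, and therefore filehandles, bounded by the configured maximum.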
`GMDATA_NAMESPACE` - Typical value is `world`. In order to avoid having to create root access tokens to get the system bootstrapped, we allow for the creation of a self-service directory. If this value is `/world`, then the home directory can be created there, on the condition that the directory is named after the field given in `GMDATA_NAMESPACE_USERFIELD`, which is typically `email`. For example: `/world` is created empty on init of gm-data. A user uses `static/ui` to create the directory `/world/rob.johnson@email.com`, which is only allowed because he came in with a JWT token matching `{values: {email: ["rob.johnson@email.com"]}}`.
`GMDATA_NAMESPACE_USERFIELD` - Typical value is `email`.
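The self-service rule can be sketched as a check that the new directory sits directly under the namespace and is named after one of the user's values for the configured field. The function `may_create_home` is a hypothetical illustration of that rule, not the actual policy evaluation.

```python
def may_create_home(path, namespace, userfield, claims):
    """Return True if path is namespace/<name> and name is in claims.values[userfield]."""
    prefix = "/" + namespace.strip("/") + "/"
    if not path.startswith(prefix):
        return False
    name = path[len(prefix):]
    # Only a direct child of the namespace directory qualifies.
    if not name or "/" in name:
        return False
    return name in claims.get("values", {}).get(userfield, [])


claims = {"values": {"email": ["rob.johnson@email.com"]}}
print(may_create_home("/world/rob.johnson@email.com", "world", "email", claims))  # True
```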
If an environment variable you are looking for was not mentioned here, it is likely not something that you should need to change in a normal setup. For more detail, see the auto-generated documentation on environment variables used in gm-data:
In order to point to a Kafka, in the simplest plaintext case, set env vars relating to Kafka. At a minimum, point to the brokers and name the topics.
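As a hedged illustration, the sketch below gathers a minimal set of Kafka variables and validates the broker list format. The values are taken from the example column of the environment variable table, and the `parse_peers` helper is hypothetical.

```python
def parse_peers(value):
    """Parse a comma-delimited list of host:port pairs into (host, port) tuples."""
    peers = []
    for item in value.split(","):
        host, sep, port = item.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError("expected host:port, got: " + repr(item))
        peers.append((host, int(port)))
    return peers


# Minimal plaintext setup: brokers plus the three topic names.
kafka_env = {
    "KAFKA_PEERS": "localhost:9092",
    "KAFKA_TOPIC_UPDATE": "gmdu",
    "KAFKA_TOPIC_READ": "gmdr",
    "KAFKA_TOPIC_ERROR": "gmde",
}

print(parse_peers(kafka_env["KAFKA_PEERS"]))  # [('localhost', 9092)]
```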
| Name | Default | Description | Example | Type |
| --- | --- | --- | --- | --- |
| DISABLE_LOOKUPS | false | don't dns check env vars representing hosts | true | |
| DONT_PANIC | false | disable panic when environment looks mis-configured | true | |
| LESS_CHATTY_INFO | true | chatty info logs will write something to the log when a transaction begins, when there are no problems | false | |
| CLIENT_JWT_PREFIX | /services/gm-jwt-security/1.0 | endpoint prefix for primary jwt service to resolve pointers to JWT tokens | /services/gm-jwt-security-gov/1.0 | |
| CLIENT_JWT_ENDPOINT_ADDRESS | | ip of jwt server in the network | | a hostname |
| CLIENT_JWT_ENDPOINT_PORT | | port of jwt server in the network | 8443 | an unsigned int |
| CLIENT_JWT_ENDPOINT_CERT | | JWT server client cert | base64 line pem written to certs/jwt.cert.pem | |
| CLIENT_JWT_ENDPOINT_KEY | | ❗JWT server client key | base64 line pem written to certs/jwt.key.pem | |
| CLIENT_JWT_ENDPOINT_TRUST | | JWT server trust | base64 line pem written to certs/jwt.trust.pem | |
| CLIENT_JWT_ENDPOINT_PREFIX | | prefix to reach the CLIENT_JWT_PREFIX when proxied | localhost | |
| CLIENT_JWT_ENDPOINT_USE_TLS | false | use tls to connect to jwt endpoint | true | |
| CLIENT_JWT_ENDPOINT_CN | | the server name expected for this cert | | |
| GMDATA_FABRIC_CLUSTER | default | the name of this fabric cluster | us-east | |
| ZEROLOG_LEVEL | WARN | logging level: INFO, DEBUG, WARN, ERR | INFO | |
| MASTERKEY | | ❗❗Master key for the encrypted content | som3r9doMg1bberish | master key for the data |
| AWS_REGION | | Bucket location | us-east-1 | some non-whitespace token |
| AWS_S3_BUCKET | | Bucket name, overridden by FILE_BUCKET | | AWS_S3_BUCKET= must match a token without whitespace or special chars |
| AWS_S3_PARTITION | | Subdirectory within the S3 bucket, overridden by FILE_PARTITION | username | |
| FILE_BUCKET | | Bucket name | | FILE_BUCKET= must match a token without whitespace or special chars |
| FILE_PARTITION | | Subdirectory within the file bucket | username | |
| AWS_S3_ENDPOINT | | Bucket host override | s3.region.aws.com | a hostname |
| AWS_REKOGNITION_ENDPOINT | | Bucket host override | rek.region.aws.com | a hostname |
| AWS_ACCESS_KEY_ID | | Set if not using IAM roles for the machine | AKAI... | iam roles used |
| AWS_SECRET_ACCESS_KEY | | ❗Set if not using IAM roles for the machine | AEFE... | iam roles used |
| USES3 | true | Use S3 | false | S3 bucket setup |
| S3_TASKS | 512 | Max number of concurrent S3 tasks | 64 | an unsigned int |
| KAFKA_PEERS | | Kafka nodes to talk to directly. A comma-delimited list of host:port pairs | localhost:9092 | a comma-delimited list of host:port |
| KAFKA_TOPIC_UPDATE | | Kafka topic for update events | gmdu | some non-whitespace token |
| KAFKA_TOPIC_READ | | Kafka topic for read events | gmdr | some non-whitespace token |
| KAFKA_TOPIC_ERROR | | Kafka topic for errors | gmde | some non-whitespace token |
| KAFKA_CONSUMER_GROUP | test1 | Kafka consumer group id | imageconverters | some non-whitespace token |
| KAFKA_CERT | | id cert | single line base64 of pem | KAFKA_CERT is expecting a single-line base64 encoded string |
| KAFKA_KEY | | id key | single line base64 of pem | KAFKA_KEY is expecting a single-line base64 encoded string |
| KAFKA_TRUST | | id trust | single line base64 of pem | KAFKA_TRUST is expecting a single-line base64 encoded string |
| KAFKA_USE_TLS | false | use tls for kafka directly | true | |
| KAFKA_CN | false | cn for kafka | true | |
| TEST_JWT_PRIV | | ❗❗test only! a base64 encoded single line of the private key for internal signing during tests | base64 encoded line | |
| JWT_PUB | | the single-line base64 encode of the public key of jwt tokens we accept | export JWT_PUB=$(cat jwtRS256.key.pub \| base64 -w 0) | JWT_PUB is expecting a single-line base64 encoded string |
| JWT_PUB_1 | | the single-line base64 encode of the public key of jwt tokens we accept | export JWT_PUB=$(cat jwtRS256.key.pub \| base64 -w 0) | JWT_PUB_1 is expecting a single-line base64 encoded string |
| JWT_PUB_2 | | the single-line base64 encode of the public key of jwt tokens we accept | export JWT_PUB=$(cat jwtRS256.key.pub \| base64 -w 0) | JWT_PUB_2 is expecting a single-line base64 encoded string |
| JWT_PUB_3 | | the single-line base64 encode of the public key of jwt tokens we accept | export JWT_PUB=$(cat jwtRS256.key.pub \| base64 -w 0) | JWT_PUB_3 is expecting a single-line base64 encoded string |
| JWT_PUB_4 | | the single-line base64 encode of the public key of jwt tokens we accept | export JWT_PUB=$(cat jwtRS256.key.pub \| base64 -w 0) | JWT_PUB_4 is expecting a single-line base64 encoded string |
| JWT_NOT_BEFORE_SKEW_SECONDS | 86400 | seconds that not-before is in the past, to handle mutual clock skews | 60 | an unsigned int |
| MONGOHOST_MASTER | | Mongo host ip:port that we replicate with | m1:27017,m2:27017 | a comma-delimited list of host:port |
| MONGODB_MASTER | | Mongo database we replicate with | gmdatadev | some non-whitespace token |
| MONGOHOST | | Mongo host ip:port | m1:27017,m2:27017 | a comma-delimited list of host:port |
| MONGODB | gmdatax | Mongo database | gmdatadev | some non-whitespace token |
| MONGO_CERT | | Mongo TLS cert base64 | cat ./certs/server.cert.pem \| base64 -w 0 | MONGO_CERT is expecting a single-line base64 encoded string |
| MONGO_KEY | | ❗Mongo TLS cert key base64 | cat ./certs/server.key.pem \| base64 -w 0 | MONGO_KEY is expecting a single-line base64 encoded string |
| MONGO_TRUST | | Mongo TLS trust base64 | cat ./certs/server.trust.pem \| base64 -w 0 | MONGO_TRUST is expecting a single-line base64 encoded string |
| MONGO_CN | | Mongo SNI name | | |
| MONGO_SOURCE | | Mongo login source | $external | |
| MONGO_MECHANISM | | Mongo login mechanism | MONGODB-X509 | |
| MONGO_USE_TLS | false | Mongo use TLS | true | |
| MONGO_INITDB_ROOT_USERNAME | | MongoDB user id | mongoadmin | |
| MONGO_INITDB_ROOT_PASSWORD | | ❗MongoDB password | S0m3Pass | |
| TEST_LOAD_ITERATIONS | | number of iterations for load test | 10000 | an unsigned int |
| GMDATA_NAMESPACE | | A directory in the root that lets you create content as yourself | | |
| GMDATA_NAMESPACE_USERFIELD | | The field that matches up with the directory you can create | | |
| GMDATA_NAMESPACE_TEMPLATE | (if (contains %s "%s") (yield-all) (yield R X)) | The default template to create a user implicitly | | |
| DELETE_EXPIRED | false | Actually remove expired entries periodically to comply with privacy laws | | DELETE_EXPIRED= should be true or false |
| DELETE_EXPIRED_POLL_SECONDS | 600 | Number of seconds to poll for expired data | 3600 | an unsigned int |
| NOTIFICATION_CACHE_SIZE | 1000 | Number of items to cache when watching notifications on an oid | 100 | an unsigned int |
| MIMETYPES_OVERRIDE | | Supply an alternate mime.types | ./mime.types | |
| LISTING_DEBUG | false | Turn on debug for listing package | true | |
| BIND_ADDRESS | 0.0.0.0 | bind address for port | 127.0.0.1 | a hostname |
| BIND_PORT | 8181 | bind port | 9123 | an unsigned int |
| PRETTY_PRINT | true | pretty print returning json by default. set this to false in production, as it makes json larger. | false | |
| HTTP_TRANSPORT_CANCEL_HOURS | 4 | Hours before http call is cancelled | 24 | an unsigned int |
| USE_PPROF_CPU | true | CPU profiling in pprof | false | |
| USE_PPROF_MEM | true | mem profiling in pprof | false | |
| HTTP_CACHE_SECONDS | 10 | http default cache in seconds | 60 | an unsigned int |
| TRACE_LOG | | write a trace to file name | /logs/trace.out | |
| REKOGNITION_FACE_INDEX | | Set a face index for AWS Rekognition | hackathon | |
| LOG_OPEN_FILE_HANDLES | true | log open file handles to look for leaks | false | |
| GMDATAX_CATCH_PANIC | false | catch panics rather than restarting gmdatax | true | |
| GMDATAX_SESSION_MAX | 4096 | max http sessions in progress | 10000 | an unsigned int |
| JWT_API_KEY | | jwt api key | a password | JWT_API_KEY is expecting a single-line base64 encoded string |
| NAMED_BANNER | true | include name in banner | false | |
| GMDATA_CERT | | id cert | single line base64 of pem | GMDATA_CERT is expecting a single-line base64 encoded string |
| GMDATA_KEY | | id key | single line base64 of pem | GMDATA_KEY is expecting a single-line base64 encoded string |
| GMDATA_TRUST | | id trust | single line base64 of pem | GMDATA_TRUST is expecting a single-line base64 encoded string |
| GMDATA_USE_TLS | false | use tls for gmdata directly | true | |
| GMDATA_REQUIRE_CLIENT_CERT | true | demand a client cert | false | |
| GMDATA_AUTHENTICATION_HEADER | USER_DN | a header that is TRUSTED to contain an authenticated user id. disable with value '-'. | - | |
| POLICY_CACHE_LIFETIME | 60 | amount of time an object lives in objectpolicy cache | 30 | an unsigned int |
For a conceptual insight into why gm-data is designed the way that it is:
The only way to modify the content of Grey Matter Data is through the `/write` endpoint. When a request is sent to the `/write` endpoint, the request body has to carry an appended {'meta': [Event]} object.
OSX -
Windows -
Linux -
README.html