# Using NiFi with Grey Matter Data

## Introduction

[Apache NiFi](https://nifi.apache.org/docs/nifi-docs/html/overview.html) is a data flow automation tool that can be used to migrate, collect, transform, and manipulate enterprise-level data, solving many historically challenging problems in enterprise data architecture. Grey Matter Data is an advanced data store API and a pillar of Grey Matter’s secure data platform capability. Given the functionality and ubiquity of NiFi, we’ve created a set of custom NiFi Processors and Templates to turbocharge migrations from murky "Data Lakes" towards a distributed "Data Mesh" powered by Grey Matter. Templates and Processors form the basis of the Grey Matter SDK and enable rapid interoperability between Grey Matter Data and your data requirements.

This guide will provide a comprehensive walkthrough on how a user can connect a local NiFi instance with a remote Grey Matter Data instance deployed with a Grey Matter service mesh.

If you are unfamiliar with NiFi or Grey Matter Data, please review the respective documentation.

* [NiFi documentation](https://nifi.apache.org/docs.html)
* [Grey Matter Data Overview](https://greymatter.gitbook.io/grey-matter-documentation/1.3/usage/platform-services/data)

This guide has been tested on macOS but should work on Linux with minimal changes.

## Prerequisites

* NiFi 1.11.4 - 1.12.1
* Java 8
* Grey Matter Data 1.1.1 - 1.1.5
* Grey Matter Data NiFi SDK 1.0.2

## Install NiFi

If you haven't already done so, follow the [NiFi installation instructions](https://nifi.apache.org/docs/nifi-docs/html/getting-started.html#downloading-and-installing-nifi).

## Download the Grey Matter NiFi SDK

1. Download the SDK NAR

   ```bash
    curl -SL --output nifi-data-nar-1.0.2.nar https://github.com/greymatter-io/nifi-sdk/releases/download/v1.0.2/nifi-data-nar-1.0.2.nar
   ```
2. Download the Templates

   ```bash
    curl -SL https://github.com/greymatter-io/nifi-sdk/releases/download/v1.0.0/templates.tar.gz | tar -xz
   ```
3. Copy the NAR file to your NiFi `lib` folder, which you can locate using `nifi status`

   ```bash
    cp /path/to/nifi-data-nar-1.0.2.nar /path/to/nifi/lib
   ```
4. Start NiFi

   ```bash
    nifi.sh start
   ```
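
The copy in step 3 is easy to get wrong if either path is mistyped. The following is a minimal sketch of a helper that checks both paths before copying; the function name `install_nar` is hypothetical and not part of the SDK, and it assumes the second argument is your NiFi installation directory.

```bash
# Hypothetical helper: copy the SDK NAR into NiFi's lib folder and confirm it landed.
# Usage: install_nar /path/to/nifi-data-nar-1.0.2.nar /path/to/nifi
install_nar() {
  local nar="$1" nifi_home="$2"

  # Fail early if either path is wrong instead of silently copying nothing.
  [ -f "$nar" ] || { echo "NAR not found: $nar" >&2; return 1; }
  [ -d "$nifi_home/lib" ] || { echo "No lib folder under: $nifi_home" >&2; return 1; }

  cp "$nar" "$nifi_home/lib/" || return 1

  # Print the installed path so it is easy to verify.
  echo "$nifi_home/lib/$(basename "$nar")"
}
```

If NiFi was already running when the NAR was copied, restart it so the new Processors are loaded.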

## Importing Templates

1. Open a web browser to your NiFi UI, e.g. <http://localhost:8080/nifi>
2. On the left-hand side of the NiFi Flow UI, locate the Operate Card. In the row of gray icons, click on *Upload Template*.
3. A pop-up should appear. Click on the magnifying glass to bring up your computer's file browser.
4. Navigate to the folder where you unpacked the templates archive. Select one and click upload.
5. NiFi should tell you the operation was successful.
6. Repeat steps 2–5 for the remaining templates.

To check whether the Processors were loaded properly, you can either click and drag a Grey Matter Data tagged Processor from the top toolbar, or just click and drag one of the imported Templates. Since the next section will involve configuring one such template, go ahead and add the *File System to Grey Matter Data (Static Permissions)* template to the NiFi editor. If the installation was successful, a series of Processor cards should appear on the screen.

Congratulations, at this point, you have full access to the custom Grey Matter Data NiFi Processors and Templates.

{% hint style="info" %}
If you get an error about missing Processors, check to make sure you copied the correct NAR file into the correct directory. Ensure that you are running the correct version of NiFi. If NiFi was running while you performed this process, restart it and try again.
{% endhint %}

## Configuring your First Template

Since NiFi is a data flow tool and Grey Matter Data is a data store API, we will cover the basic use case of transferring FlowFiles (data) in the form of a system file to the remote Grey Matter Data instance. We will use the SDK's *File System to Grey Matter Data (Static Permissions)* template since it has this flow created already. We just need to configure the correct Processor attributes. In the end, the flow will automatically watch for files in a folder, retrieve them, and upload them into Grey Matter Data with their necessary object policies and folder locations.

If you are not continuing from the previous step, add the template to the NiFi Editor by clicking and dragging the template icon from the top toolbar, then selecting *File System to Grey Matter Data (Static Permissions)*. A series of Processor cards should appear.

There are only four Processors which require explicit configuration:

* List Files in File System
* Build Folder Hierarchy
* Prepare Request for Grey Matter Data
* Send to Grey Matter Data

For a more in-depth description of the key configuration for each Processor, see [here](https://github.com/greymatter-io/nifi-sdk/blob/v1.0.2/doc/flows/GM_Data_to_FileSystem.md).

To open a Processor's Details Menu, double-click on its card. A Processor must be stopped to edit its properties.

### List Files in File System

#### Change `Input Directory`

Change the field `Input Directory` to the absolute path of the folder which contains the data you want uploaded.

### Build Folder Hierarchy

This Processor works similarly to the `mkdir -p` command, building a nested folder sequence.

#### Change `Remote URL`

Change this field to the root URL of the Grey Matter Data instance running behind the mesh. This will typically look something like: `https://mesh.domain.com/services/{grey-matter-data-host-and-path}/{grey-matter-data-version}`

Two important things to note:

* The root URL is not the same path that serves the API Explorer.
* The root URL should not have a trailing slash.
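
A trailing slash is easy to carry over when copying a URL from a browser. As a minimal sketch, the hypothetical helper below (not part of the SDK) strips any trailing slashes before you paste the value into `Remote URL`; the example URL is a placeholder.

```bash
# Hypothetical helper: strip any trailing slashes from a Grey Matter Data root URL.
normalize_root_url() {
  local url="$1"
  # Remove one trailing slash at a time until none remain.
  while [ "${url%/}" != "$url" ]; do
    url="${url%/}"
  done
  printf '%s\n' "$url"
}

normalize_root_url "https://mesh.greymatter.yourcompany.com/services/gm-data/1.0/"
# prints https://mesh.greymatter.yourcompany.com/services/gm-data/1.0
```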

#### Remove `USER_DN`

Remove the `USER_DN` field at the bottom. The edge proxy will handle this for us and setting it explicitly will likely break authorization.

{% hint style="info" %}
If you are connecting directly to Grey Matter Data (bypassing the mesh or without a mesh) then you will need to set this. DNs are case-sensitive and order-sensitive.
{% endhint %}

#### Set `Object Policy`

Data’s Object Policies are powerful constructs used to enforce security permissions. These will ideally change based upon your organizational and technical needs. For more information on this subject, refer to [Grey Matter Data Object Policies](https://greymatter.gitbook.io/grey-matter-documentation/1.3/usage/platform-services/data/internals/auth).

For simplicity we'll "disable" permissions by giving anyone full access to the file(s).

```javascript
{
  "requirements": {
    "f": "yield-all"
  }
}
```
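
Because the same policy is pasted into more than one Processor, it can help to keep it in a file and check its syntax first. This is a minimal sketch: the function name `write_policy` and the file name are hypothetical, and the syntax check only runs if `jq` happens to be installed.

```bash
# Hypothetical helper: write the permissive policy to a file and, if jq is
# available, confirm that the file parses as JSON.
# Usage: write_policy /path/to/object-policy.json
write_policy() {
  local path="$1"

  cat > "$path" <<'EOF'
{
  "requirements": {
    "f": "yield-all"
  }
}
EOF

  # Optional syntax check; skipped when jq is not installed.
  if command -v jq >/dev/null 2>&1; then
    jq empty "$path" || return 1
  fi

  echo "$path"
}
```

Copy the file's contents into the `Object Policy` field of each Processor that needs it.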

#### Set `Intermediate Folder Prefix`

Grey Matter Data will create a folder path equal to the value of the field, and place every uploaded file within this prefix path. For example, setting the folder prefix to `myfolder/excel` will cause Grey Matter Data to create that path and upload all files and folders found in the Input Directory into `myfolder/excel`.
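
To picture the result, the sketch below prints where a local file would be expected to land. It is an illustration only: `remote_path` is a hypothetical helper, the paths are made up, and it assumes the path relative to the Input Directory is preserved under the prefix.

```bash
# Hypothetical helper: show where a local file is expected to land remotely,
# assuming the path relative to the Input Directory is kept under the prefix.
# Usage: remote_path <input-directory> <folder-prefix> <absolute-file-path>
remote_path() {
  local input_dir="${1%/}" prefix="${2%/}" file="$3"
  # Strip the Input Directory from the front of the file's absolute path.
  local relative="${file#"$input_dir"/}"
  printf '%s/%s\n' "$prefix" "$relative"
}

remote_path /data/reports myfolder/excel /data/reports/2020/q1.xlsx
# prints myfolder/excel/2020/q1.xlsx
```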

#### Set `SSL Context Service`

To authenticate yourself to the mesh and authorize yourself to Grey Matter Data, you need to link an SSLContextService to your PKI certificates. You should have been given certificates when you applied for access to your Grey Matter mesh instance. If you do not have certs, please perform the necessary steps at your organization to obtain them.

To properly configure the template to use TLS, we need two files:

* Your public-private key pair as a signed certificate (called a keystore in Java).
* A list of server CA certs (called a truststore in Java).

Your personal cert is probably in pkcs12 format (.p12 or .pfx), or split between a .crt and a .key file. We have found that the JKS format works better. Just to be safe, back up your certs before converting them.

1. If your cert is not in the pkcs12 format, run this to convert it. Here and in the commands below, replace each `{...}` placeholder with your own value and do not include the curly braces:

   ```bash
    openssl pkcs12 -export -inkey /path/to/key/file -in /path/to/crt/file -out /path/for/new/p12.p12 -password pass:{password of the cert}
   ```
2. To convert pkcs12 to JKS:

   ```bash
    keytool -importkeystore -noprompt -srckeystore /path/to/p12.p12 -srcstoretype pkcs12 -destkeystore /path/for/new/jks.jks -storepass {password of new jks} -srcstorepass {password of p12}
   ```
3. Now we need to establish the trust store, so NiFi can accept the server's certificate. Download the certificate from the Grey Matter Edge proxy:

   ```bash
    openssl s_client -connect {mesh ingress domain}:{mesh ingress port} -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM -out gmedge-cert.crt
    # mesh ingress domain should not include the protocol (i.e. https://)
   ```
4. Convert the trust store to JKS format:

   ```bash
    keytool -importcert -noprompt -keystore {name of the truststore}.jks -file gmedge-cert.crt -storepass {password for new truststore} -alias {unique name} -storetype JKS
   ```
5. Return to the NiFi UI and complete the TLS configuration.
   1. Double-click on the *Build Folder Hierarchy* Processor.
   2. In the SSL Context Service line, on the far right, click the arrow icon.
   3. In the new menu, click on the gear icon in the far right of the Local-SSLContextService line.
   4. A Controller Service Details menu should appear. Set the *Keystore Filename* to the *absolute* path of the Keystore you just generated.
   5. Set the *Keystore Password* to the password you used to generate the Keystore.
   6. Set *Key Password* to the original certificate key file’s password.
   7. Set the *Keystore Type* to JKS.
   8. Set the *Truststore Filename* to the Truststore you just generated.
   9. Set the *Truststore Password* to the password of the Truststore.
   10. Set the *Truststore Type* to JKS.
   11. Click Apply. NiFi should switch the state to Validating. Wait a few seconds and refresh the page by clicking on the refresh icon in the lower left.
   12. Enable the Controller Service by clicking on the lightning bolt icon.
   13. Return to the main NiFi Editor.
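
If the Controller Service does not validate, it can help to confirm from the command line that each store opens with the password you entered. The following is a minimal sketch; `check_store` is a hypothetical helper name, and it assumes `keytool` from your Java installation is on the `PATH`.

```bash
# Hypothetical helper: confirm a keystore or truststore exists and opens with its password.
# Usage: check_store /path/to/store.jks 'store-password'
check_store() {
  local store="$1" pass="$2"

  [ -f "$store" ] || { echo "Store not found: $store" >&2; return 1; }
  command -v keytool >/dev/null 2>&1 || { echo "keytool not on PATH" >&2; return 1; }

  # keytool -list prints each entry and fails if the password or format is wrong.
  keytool -list -keystore "$store" -storepass "$pass"
}
```

The keystore should list a `PrivateKeyEntry`, and the truststore a `trustedCertEntry`.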

### Prepare Request for Grey Matter Data

#### Change `Object Policy`

Use the Object Policy object from the previous section.

### Send to Grey Matter Data

#### Remove `USER_DN`

Remove the `USER_DN` field at the bottom. The edge proxy will handle this for us and setting it explicitly will likely break authorization.

{% hint style="info" %}
If you are connecting directly to Grey Matter Data (bypassing the mesh or without a mesh) then you will need to set this. DNs are case-sensitive and order-sensitive.
{% endhint %}

#### Set `Remote URL`

Set this field to the same value as the Remote URL in *Build Folder Hierarchy*, but append `/write` to the URL. For example, if your Grey Matter Data root URL is `https://mesh.greymatter.yourcompany.com/services/gm-data/1.0`, then this value needs to be `https://mesh.greymatter.yourcompany.com/services/gm-data/1.0/write`.

If the previous configuration worked, there should be no NiFi warning symbols. Now, turn on the flow by clicking on the play button in the Operate Menu. All the Processors should switch their states to running. You may need to refresh to see this change. In the directory you chose as the Input Directory, create or copy in a file. Return to the NiFi UI, refresh, and watch as each Processor accepts its input, performs its function, and passes its output to the next Processor.
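
To trigger the flow without hunting for a suitable file, you can drop a small uniquely named file into the Input Directory. This is a minimal sketch; `drop_test_file` and the file name pattern are hypothetical.

```bash
# Hypothetical helper: create a small, uniquely named file in the Input Directory.
# Usage: drop_test_file /absolute/path/to/input-directory
drop_test_file() {
  local input_dir="$1"
  [ -d "$input_dir" ] || { echo "Input Directory not found: $input_dir" >&2; return 1; }

  # A timestamp in the name makes each test file easy to tell apart.
  local file="$input_dir/nifi-smoke-test-$(date +%s).txt"
  echo "hello from NiFi" > "$file"

  # Print the path so you know which file to look for in Grey Matter Data.
  echo "$file"
}
```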

## Configuring your Second Template

See [this document](https://github.com/greymatter-io/nifi-sdk/blob/v1.0.2/doc/flows/GM_Data_to_FileSystem.md) for additional explanation of each Processor.

Now that you can push your local filesystem into Grey Matter Data, let's set up a template to do the reverse and pull data down. To accomplish this, we will use the *Recreate Filesystem from Grey Matter Data* template. Just like the first template, go ahead and drag down the Template icon, but select *Grey Matter Data to File System* instead.

### List Files in Grey Matter Data

The first Processor will connect with Grey Matter Data to explore its filesystem. Open up its properties menu by double-clicking.

1. Enter the Controller Services menu by clicking on the arrow in the SSLContextService line. Unless you want to pull data as a different user (i.e. using different certificates than pushing), delete the new Local-SSLContextService (the one with the warning symbol). Since we already configured working certificates, we can re-use the same Controller.
2. Return to the Processor properties menu.
3. Select the only available SSLContextService.
4. Update the Remote URL field to the Grey Matter Data root URL. This value should be the same as in the previous template.

{% hint style="info" %}
Ensure that the root URL does not have a trailing slash.
{% endhint %}

The next required field is the Input Directory. Unlike the previous template, this Input Directory is relative to the remote Grey Matter Data instance. Grey Matter Data has a few ways of referencing objects. Note that `/1/` always refers to the root, and within it, there will always be a namespace folder whose name can be found in the object returned by the `/config` route. However, we will be using Grey Matter Data's Object IDs to uniquely and succinctly reference folders.

Grey Matter Data is a RESTful API, so you can explore its filesystem programmatically; however, the UI makes interaction much faster. Access the UI at `https://{grey-matter-data-host-and-path}/static/ui`.

1. On the left-hand side, within the Action box, click Props.
2. Click the folder you want to sync your filesystem with.
3. Click Send Request, located on the far right side.
4. Within the response JSON, locate the oid key and copy it.

Returning to the NiFi Processor:

1. Set the Input Directory to the oid of the desired folder, like so: `/{oid}/`. Note: the oid should not contain quotes.
2. Remove the `USER_DN` field.
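
If you saved the Props response to a file or copied it to the clipboard, the oid lookup and the Input Directory value can be sketched as follows. `extract_oid` is a hypothetical helper, the sample JSON is made up, and the sketch assumes only that the response contains a quoted `"oid"` value.

```bash
# Hypothetical helper: read a JSON response on stdin and print the first "oid" value.
# Assumes the value is a quoted JSON string, e.g. "oid": "abc123".
extract_oid() {
  sed -n 's/.*"oid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Wrap the unquoted oid in slashes to form the Input Directory value.
oid="$(printf '%s' '{"name": "reports", "oid": "abc123"}' | extract_oid)"
echo "/${oid}/"
# prints /abc123/
```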

### Get File From Grey Matter Data

The second Processor retrieves the resources within the folder (or file) referenced by the oid we found in the previous steps.

1. Update the SSL Context Service.
2. Set the Remote URL to `https://{grey-matter-data-host-and-path}/stream/${gmdata.oid}`. Do not replace `${gmdata.oid}`. It is a variable holding the oid, shared by the previous Processor.
3. Remove the `USER_DN` field.
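
If you assemble this value in a shell before pasting it into NiFi, `${gmdata.oid}` has to be kept literal, otherwise the shell will try to expand it itself. The sketch below shows one way to do that with single quotes; `stream_url` is a hypothetical helper and the base URL is a placeholder for your own Grey Matter Data host and path.

```bash
# Hypothetical helper: build the Remote URL while keeping ${gmdata.oid} literal.
# Single quotes stop the shell from expanding the expression, so it reaches NiFi intact.
stream_url() {
  local base="${1%/}"   # tolerate one accidental trailing slash
  printf '%s/stream/%s\n' "$base" '${gmdata.oid}'
}

stream_url "https://mesh.example.com/services/gm-data"
# prints https://mesh.example.com/services/gm-data/stream/${gmdata.oid}
```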

### Set Common Output Directory

The third Processor exposes multiple separate configuration variables for the other Processors. The only required field which needs to change is *baseOutputDirectory*. Set it to the path where the downloaded files should go.

The other Processors have sensible defaults and likely do not need to change. At this point, the flow should be in working order. However, you can optionally configure the files to which Report Success and Report Failures log events. The process is the same in both.

1. Open the Processor Details menu.
2. Open Script Body and change the CSV file variable to the desired path. Make sure the path exists. NiFi will create the file, but will not create the folder structure.
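
Since NiFi creates the CSV file but not its parent folders, you can create the folder structure ahead of time. This is a minimal sketch; `prepare_report_dir` is a hypothetical helper and the path in the usage comment is a placeholder.

```bash
# Hypothetical helper: make sure the folder for a report CSV exists.
# Only the parent folders are created; the CSV file itself is left for NiFi to create.
# Usage: prepare_report_dir /path/to/reports/success.csv
prepare_report_dir() {
  local csv_path="$1"
  mkdir -p "$(dirname "$csv_path")"
}
```

Run it once for each of the two CSV paths if they live in different folders.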

Start the new Processors, refresh, and watch as your filesystem synchronizes.

## Additional Documentation

* [Grey Matter Data NiFi SDK](https://github.com/greymatter-io/nifi-sdk/tree/v1.0.2)
* [Grey Matter Data NiFi SDK - Processors, Scripts, and Templates](https://github.com/greymatter-io/nifi-sdk/tree/v1.0.2/doc)
* [NiFi User Guide](https://nifi.apache.org/docs.html)
