Using Grey Matter APIer to accelerate data observability and availability

Introduction

The next-generation paradigm of data architecture and governance is the data mesh. Even so, real-world data patterns still need to be supported, used, and discovered before they can be migrated. Part of thinking in a data mesh fashion is placing the source first, and for many business domains the source integrates naturally with spreadsheets.

Spreadsheets have been a de facto tabular data store for years. They have tremendous benefits, namely, ease of use and ubiquity - almost anyone can create and manage spreadsheets. However, spreadsheets are not always democratized.

With a frenzied proliferation of data across all business domains, archaic practices around data must change. Spreadsheets heavily clash with modern computing architectures. They localize and silo data. They lock it down into increasingly large files prohibiting data movement, collaboration, and observability. Spreadsheets can grind a business’s data analytics and intelligence to a screeching halt.

To work around this architectural roadblock, Grey Matter offers the APIer service. APIer enables data access on spreadsheets by transforming them into interoperable and standardized RESTful APIs. APIer takes a spreadsheet found in a remote data source or locally on disk and exposes that data as queries across the data mesh, allowing developers and data consumers to interact with familiar interfaces.

Coupled with Grey Matter’s sidecar proxy, APIer rapidly facilitates data movement and real-time observability, bridging the divide between siloed spreadsheets and modern data-laden web applications, while still maintaining domain-oriented decentralized data ownership.

Why APIer?

APIer was tried and tested on our own publicly accessible COVID API Hub, a collection of public COVID-19 data sources to assist in the dissemination of researchers' analytic efforts. While building COVID API Hub, we ran into a problem: we lacked a unified interface for exposing disparate data sources, many of which were published as spreadsheets rather than RESTful APIs.

What can it do?

The great benefit of APIer is its flexibility and agility, which means it can not only incorporate known tabular data sources into your data mesh, but it can also unlock new opportunities for data integration by making potentially costly integrations nearly trivial.

By simply uploading spreadsheets to a shared data storage solution (like Grey Matter Data, which also provides permission governance), HR, Finance, or any other department can allow internal developers to build novel and useful dashboards or other services around company-wide data.

APIer has a lightweight footprint and makes a great addition to ETL pipelines, transforming clunky tabular data into modern and usable JSON data. For instance, with a simple script or microservice you could create an automatic pipeline to grab a research team’s experimental data from a file server and send the resulting JSON to a more accessible non-relational database for use in a web app.

Building a React + APIer + Grey Matter Data Application

Let's use APIer to transform a hypothetical employee vacation process, where managers track vacation in a spreadsheet, by exposing the data to a web application. We will set up a multi-service application consisting of a Grey Matter Data instance, an APIer instance reading from Grey Matter Data, and a simple React web app to display the spreadsheet data. The data will consist of employee vacation requests and some additional data for managers.

For a more in-depth description of APIer and its configuration options, read the APIer reference documentation.

Prerequisites

A running Grey Matter mesh. See Install on Kubernetes for an EKS installation.
- Another alternative is to install locally on k3d.
- You may need to install various CLI tools like kubectl, awscli, and helm.
An instance of Grey Matter Data installed into the mesh. See Generate Configurations for details.
The Grey Matter CLI installed.

Tested Software Versions

This guide has only been tested on macOS using Grey Matter with the following software versions:

Grey Matter Data 1.1.3 - 1.1.5
Grey Matter APIer 2.0.4
Grey Matter CLI 2.0.0 (Control API 1.5.0)
Grey Matter Helm Charts (release-2.3)
Kubectl 1.19.3
Helm 3.4.2

Overview

Here is what we will do.

Deploy APIer mesh configs
Upload a spreadsheet to Grey Matter Data
Deploy an APIer service
Explore the Example Dashboard UI

Deploy APIer mesh configs

Clone the APIer repository and navigate to the vacation-example.

 git clone https://github.com/greymatter-io/apier
 cd apier/examples/vacation-example

Create and apply the mesh object configurations to allow for edge-to-service ingress, sidecar-to-service ingress, and service-to-service egress (between APIer and Grey Matter Data). Follow this guide for instructions on how to accomplish this. We have provided mesh objects (see the vacation-example/example-mesh-configs folder) that can help inform this process, but they may not work on your exact mesh setup.
- For the purposes of naming the service, use apier-vacation as the instructions below will assume that name and may break if you change it. This is only truly relevant for the name key in the proxy object, since it must match the given XDS_CLUSTER variable found in the provided Grey Matter (Kubernetes) deployment.

Upload spreadsheet to Grey Matter Data

We’ll use the spreadsheet included in the example repo, called vacation-logs.xlsx. It has two sheets and about 20 entries, listing employees and some realistic information that they might provide for requesting vacation time.

Navigate to https://{your_domain}/{gm_data_path}/static/ui
If you have not yet created a personal non-admin folder do so:
1. Select the Write action at the bottom of the left-hand action section.
2. Select the Grey Matter Data Namespace folder (usually home) in the file system browser section.
3. Click Modify, then Create New Folder.
4. On the right side, click Send Request.
Select the Write action.
Select your personal folder.
Click Upload and follow the prompt to select the vacation-logs.xlsx file.
Click Send Request.
In the JSON response, find the oid (Object ID) field and copy its value for later.

To double check that the file was uploaded, we can use the stream action in the UI or use the /stream endpoint with the path to the file as a path parameter:

https://{your_domain}/{gm_data_ingress_path}/stream/1/namespace/folder/file

or the file's Object ID as a path parameter:

https://{your_domain}/{gm_data_ingress_path}/stream/{oid}

You can also combine both approaches, reducing the length of long paths, while still maintaining some self-documenting clarity. For this guide, we'll just use the Object ID format, using the oid of the Excel file we just uploaded.

If all went well, your browser should prompt you to download the file.
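
The same check can be scripted. The sketch below uses only the Python standard library to build the two URL forms described above and, optionally, to request the file. The function names are hypothetical, and authentication (for example, client certificates required by your mesh) is not handled here.

```python
from urllib.error import URLError
from urllib.parse import quote
from urllib.request import urlopen


def stream_url_by_oid(domain: str, ingress_path: str, oid: str) -> str:
    """Build the /stream URL that addresses a file by its Object ID."""
    return f"https://{domain}/{ingress_path.strip('/')}/stream/{quote(oid)}"


def stream_url_by_path(domain: str, ingress_path: str, object_path: str) -> str:
    """Build the /stream URL that addresses a file by its path."""
    # quote() leaves "/" unescaped by default, so path separators survive
    # while spaces and other unsafe characters are percent-encoded.
    encoded = quote(object_path.strip("/"))
    return f"https://{domain}/{ingress_path.strip('/')}/stream/{encoded}"


def is_downloadable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any URL error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except URLError:
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return False
```

Calling stream_url_by_oid with your domain, your Grey Matter Data ingress path, and the oid copied earlier gives the URL to pass to is_downloadable.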

Because an oid is assigned only when the object is created in Grey Matter Data, you cannot predict its value ahead of time. That means you cannot hardcode oids into another service, and you should take care when using them with ephemeral clusters, meshes, or data storage: the oids will change, causing URLs to break. An absolute pathing scheme is a better option in those situations.

Deploy APIer Service

Now that we have a URL for APIer, we can configure and launch an instance into the mesh. We just need to set five of APIer's environment variables to configure its behavior.

SOURCE_URL - endpoint where the spreadsheet is served up from
SOURCE_FORMAT - the file classification, either 'excel' or 'csv'
DATASET_NAME - a useful name for the spreadsheet in the API docs (optional)
ROOT_PATH - the extra path prefix added by a proxy (optional)
DOCS_URL - the endpoint from which the documentation page is served (optional)

If the end of the SOURCE_URL string contains a recognizable file extension, then APIer will infer the SOURCE_FORMAT automatically. If you choose a full path scheme to refer to the spreadsheet in Grey Matter Data, you do not need to set SOURCE_FORMAT, but it is required when using the oid path scheme.
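
To illustrate the idea, here is a minimal Python sketch of extension-based inference. It is not APIer's actual implementation: the extension table and the function names are assumptions made for the example, and the set of extensions APIer really recognizes may differ.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping for illustration; APIer's real list may differ.
EXTENSION_TO_FORMAT = {".xlsx": "excel", ".xls": "excel", ".csv": "csv"}


def infer_source_format(source_url: str) -> Optional[str]:
    """Guess SOURCE_FORMAT from the file extension at the end of the URL path."""
    suffix = PurePosixPath(urlparse(source_url).path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)


def must_set_source_format(source_url: str) -> bool:
    """True when no extension can be recognized, as with oid-based URLs."""
    return infer_source_format(source_url) is None
```

A full-path URL ending in vacation-logs.xlsx resolves to "excel", while an oid-based URL has no extension to inspect, which is why SOURCE_FORMAT has to be set explicitly in that case.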

The URL set in SOURCE_URL should not be the ingress URL we used in the browser to access the spreadsheet directly from Data. Instead, it should point to the local sidecar, using the route prefix for Data you set in the egress route mesh object. Since the traffic between the sidecar and APIer doesn't leave the local machine, we can use HTTP.
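
A small Python sketch can help catch the common mistake of pasting the ingress URL into SOURCE_URL. The helper names are hypothetical, and the default port of 10909 is taken from the SOURCE_URL value used in this guide's deployment example; adjust it if your sidecar listens elsewhere.

```python
from urllib.parse import urlparse


def sidecar_source_url(egress_prefix: str, stream_target: str, port: int = 10909) -> str:
    """Compose a SOURCE_URL that goes through the local sidecar over plain HTTP.

    egress_prefix is the route prefix for Data from the egress route mesh object;
    stream_target is either the file's oid or its full path.
    """
    prefix = egress_prefix.strip("/")
    target = stream_target.strip("/")
    return f"http://localhost:{port}/{prefix}/stream/{target}"


def points_at_local_sidecar(source_url: str) -> bool:
    """Check that a URL uses HTTP and a loopback host rather than the mesh ingress."""
    parsed = urlparse(source_url)
    return parsed.scheme == "http" and parsed.hostname in {"localhost", "127.0.0.1"}
```

Running points_at_local_sidecar on a candidate value before editing the deployment file is a quick way to confirm that the URL does not leave the pod.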

Ensure the ROOT_PATH matches the ingress route prefix prepended by the edge proxy. The prefix was configured in the edge route mesh object and typically follows the pattern /services/{service-name}/{service-version}. A misconfigured value will prevent the OpenAPI specification from being rendered correctly.

Modify the Kubernetes deployment config located at examples/vacation-example/apier-deployment.yml and edit the environment variable values for the apier container, using the below values.

env:
  - name: DATASET_NAME
    value: "Vacation Data"
  - name: ROOT_PATH
    value: "/services/apier-vacation/latest"
  - name: DOCS_URL
    value: "/"
  - name: SOURCE_FORMAT
    value: "EXCEL"
  - name: SOURCE_URL
    value: "http://localhost:10909/{egress/route/to/data}/stream/{oid of file | full path to file}"

Once you have set those values, deploy the service and watch for its deployment status to show three containers ready.

kubectl apply -f apier-deployment.yml
kubectl get pod -w

Assuming all three containers start and no errors are logged by Kubernetes, then we have a successful deployment of APIer.
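
If you would rather check readiness from a script than watch the terminal, the following Python sketch parses the READY column of kubectl get pods, which is printed as ready/total. The function names and the "apier" name filter are assumptions for this example, and the command runs against whatever namespace your current kubectl context uses.

```python
import subprocess


def containers_ready(ready_field: str) -> bool:
    """Interpret a READY value such as '3/3' from `kubectl get pods`."""
    ready, total = (int(part) for part in ready_field.split("/"))
    return total > 0 and ready == total


def ready_pods(kubectl_output: str, name_fragment: str) -> dict:
    """Map each matching pod name to whether all of its containers are ready."""
    result = {}
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) >= 2 and name_fragment in columns[0]:
            result[columns[0]] = containers_ready(columns[1])
    return result


def current_pod_status(name_fragment: str = "apier") -> dict:
    """Run `kubectl get pods` and summarize readiness for matching pods."""
    output = subprocess.run(
        ["kubectl", "get", "pods"], check=True, capture_output=True, text=True
    ).stdout
    return ready_pods(output, name_fragment)
```

A pod whose READY column reads 3/3 maps to True; anything less maps to False and is worth inspecting with kubectl describe or kubectl logs.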

Hitting https://{your mesh domain}/services/apier-vacation/latest/sheets should return a list of two sheet objects, like this:

[
  {
    "id": "5de314150f51",
    "name": "Vacation Requests"
  },
  {
    "id": "ce74c8a54c54",
    "name": "Vacation Metrics"
  }
]

See this document for more information on APIer’s routes.
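
As a sketch of how a client might consume the response shown above, the Python below indexes the sheet list by name. Only the /sheets route and the id and name fields come from this guide; the function names are hypothetical, and any authentication your mesh requires is left out.

```python
import json
from urllib.request import urlopen


def sheet_ids_by_name(sheets_json: str) -> dict:
    """Turn the /sheets response body into a {sheet name: sheet id} lookup."""
    return {sheet["name"]: sheet["id"] for sheet in json.loads(sheets_json)}


def fetch_sheet_ids(sheets_url: str, timeout: float = 10.0) -> dict:
    """Request the /sheets route and index the result by sheet name."""
    with urlopen(sheets_url, timeout=timeout) as response:
        return sheet_ids_by_name(response.read().decode("utf-8"))
```

Looking up sheets by name keeps client code readable, since the ids themselves carry no meaning.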

Explore the Example Dashboard UI

Although you can start using the APIer instance by crafting HTTP requests, viewing a visual dashboard is a bit more exciting and will hopefully kickstart some of your own ideas on how to leverage your newly accessible spreadsheets.

The Kubernetes deployment you applied already contained the dashboard application. We just have to access it. Port forward to both the APIer container and the dashboard container:

kubectl port-forward $(kubectl get pods | awk '$1 ~ "apier" { print $1 }') 3000:3000 > /dev/null 2>&1 &
kubectl port-forward $(kubectl get pods | awk '$1 ~ "apier" { print $1 }') 8000:8000 > /dev/null 2>&1 &

In a real scenario, you should give the web application its own pod, sidecar proxy, and corresponding mesh object configuration with edge ingress and APIer egress routing.

Access the UI at http://localhost:3000. You should see a simple dashboard displaying some raw and derived data from our spreadsheet. There is also an API explorer you can use to more easily test APIer. Feel free to take a look at the source code or use it to jump start your own projects!


Last updated 5 years ago
