Open Policy Agent

The gm-data microservice is a permissioned media file service that ships as part of Grey Matter, so it helps to keep in mind why its permission system is designed the way it is. It integrates with Authentication and Authorization to enforce dissemination rules and permissions for updating files. Its main uses are as follows:

  • File attachments, especially large media, such as image, audio, and video files.

  • A robust, general, and multi-tenant permission system that is adaptable to the kinds of complex requirements seen in Military, Intelligence, Health, and GDPR (Privacy law) situations.

  • Hosting of assets along with media. Media files do not necessarily stand alone. A map layer may be uploaded along with a JavaScript application that renders it. Static websites with static media files that reference microservices hosted in the same mesh are possible.

  • Deal with encryption-at-rest and performance in an ideal manner. Range requesting (common in video), etags, and caching are critical to serving up media-rich sites under load.

  • Basic filesystem functionality for read and write. Read operations follow well-established standards, as is required of media in browsers.

  • Work properly when users rotate through the system. The files and the permissions on them should be immutable, even when staff rotate through job roles. Because of this, the system is designed to calculate access based on attributes of a user. The actual user is only relevant for auditing. When a user leaves the system, the file will be available to whoever steps into the role that the user had. This leads naturally to a system in which it is easy for files to be owned by multiple people. There may also be periods where a file is owned by nobody, until a person is put into a role to deal with that file.

Authentication

Authentication is the job of the Grey Matter edge proxy. Ultimately, it sets a username for incoming users. An authentication is some kind of identity proof. The identity itself only binds a user to a primary key used to track that user. In some cases a user does not need to be authenticated. In other cases, authentication is a prerequisite to authorization later on.

Common authentication methods:

  • Username/Password proof. A user proves that they know the password for a username.

  • Certificate proof. A user proves that they are the rightful owner of a certificate. There is a cryptographic handshake in which the server challenges the client for proof of certificate ownership.

  • OAuth2/OpenID Connect. Some combination of Username/Password, unexpired tokens, and a redirect to a service that validates that a user is authenticated.

  • It might even require some multi-factor authentication. In any case, some complex third-party proof of identity was done to our satisfaction.

At the end of any of these exchanges, a username can confidently be set. It doesn't actually matter which method was used. In gm-data, the Grey Matter proxy will set a privately controlled header, USER_DN. gm-data will accept this USER_DN header, as the proxy is trusted to set it to a truthful value and to prevent users outside of the mesh from setting it to an arbitrary value.

Authorization

Authorization is the job of the JWT server. It exchanges Authentication proof for an Authorization proof.

Once we have figured out who a user is, it is straightforward to issue this user a digitally signed statement of what they are authorized to actually do.

A token should:

  • Expire in a timely fashion to prevent old, leaked tokens from being a security hazard.

  • Be limited to a safe scope, to limit the blast radius of user mistakes and the consequences of a temporary leak.

  • Allow for calculation versus an actual resource to determine what can be done with it.
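The first two properties can be checked mechanically once a token's signature has been verified. A minimal Python sketch, assuming the decoded claims carry the standard JWT `exp` claim (seconds since the epoch) and a hypothetical `scope` list; neither field is confirmed for gm-data tokens:

```python
import time


def token_is_usable(claims, required_scope, now=None):
    """Return True if the decoded claims are unexpired and cover the scope.

    `exp` is the registered JWT expiry claim; `scope` is a hypothetical
    field used here only for illustration.
    """
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if exp is None or now >= exp:
        return False  # missing or past expiry: reject
    return required_scope in claims.get("scope", [])


claims = {"exp": 1_700_000_000, "scope": ["read"]}
print(token_is_usable(claims, "read", now=1_699_999_000))   # unexpired and in scope
print(token_is_usable(claims, "write", now=1_699_999_000))  # scope not granted
print(token_is_usable(claims, "read", now=1_700_000_001))   # expired
```

Rejecting a token with no expiry at all is a design choice in this sketch: a token that never expires defeats the first property in the list.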

A JWT is just an encoding for signing an arbitrary chunk of JSON. The JWTs that gm-data deals with have one restriction: they should have a values field that is a map from string to an array of strings, to facilitate access calculations later on.

In exchange for a proof that we own email rob@gmail.com, set in the USER_DN header, we can execute GET /policies with USER_DN=rob@gmail.com against the JWT server for a signed JWT of what we are allowed to do. This is a decoded claim in that JWT, which is what you get when you GET /self against gm-data:

These claims are what gm-data actually reads. gm-data doesn't make any decisions based on the user's Authentication (i.e. the USER_DN value). We don't care who the user actually is. We care what they are allowed to do, and we only use the USER_DN to look up what the user is allowed to do, which is what is in these JWT claims.
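As a rough illustration of the shape such claims take, the Python sketch below builds a made-up set of decoded claims and checks that `values` maps strings to arrays of strings. The attribute names inside `values` (`email`, `role`) are assumptions for the example, not a confirmed gm-data schema.

```python
def values_are_well_formed(claims):
    """Check that claims['values'] is a map from string to list of strings."""
    values = claims.get("values")
    if not isinstance(values, dict):
        return False
    return all(
        isinstance(field, str)
        and isinstance(items, list)
        and all(isinstance(item, str) for item in items)
        for field, items in values.items()
    )


# Hypothetical decoded claims; every attribute is multi-valued.
example_claims = {
    "values": {
        "email": ["rob@gmail.com"],
        "role": ["editor", "reviewer"],
    }
}

print(values_are_well_formed(example_claims))                          # True
print(values_are_well_formed({"values": {"email": "rob@gmail.com"}}))  # False: not a list
```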

Access and Policy

Access is a calculation made on an Authorization versus a resource. The calculation is defined in a Policy, which is the whole point of gm-data's use of OpenPolicyAgent.

When gm-data makes a decision, it needs an Authorization and a Policy. The Authorization comes in on the userpolicy header as a JWT, or we use a USER_DN header to get a userpolicy.

The Policy is literally a tiny piece of code that exists per resource (a file or a directory that we want to access), and its input is the Authorization JWT claims. There are various standards for Authentication. Authorization in gm-data uses the JWT standard, because a JWT is a document whose claims attest to the facts about the user that are relevant to computing access. Policy, however, did not have such an obvious standard until recently. OpenPolicyAgent (i.e. OPA, Rego) is emerging as that standard.

OpenPolicyAgent is really just a special-purpose language. All that it has to do is look at the JWT claims and make access decisions by setting output variables.

The code making these decisions has to:

  • Be extremely compact for representing Access Control List functionality.

  • Be a safe and sandboxed language. We are making security decisions, and can't have harmful side-effects such as opening files, or going into infinite loops.

  • Support very tiny programs that are compiled and executed at a high rate.

  • Be easy to manipulate and generate with automated tooling. Policy editing will generally be hidden behind user interfaces that wire together groups, roles, and usernames to create appropriate restrictions on resources.

In gm-data, we support two languages for this purpose. The first, which predates OpenPolicyAgent support, is a simple LISP variant:

This calculation sets some variables to be true, based on a JWT claim input:

  • C - Create is allowed, which is applicable to directories.

  • R - Read is allowed, which returns metadata about an object.

  • U - Update is allowed on an object.

  • D - Delete is allowed on an object, which is more of a "hide" because gm-data maintains old versions that remain accessible with a tstamp parameter.

  • X - Execute is allowed, which means to actually open up and see the contents of a file. The difference between R and X is that you can see something in a listing with R, but you need X to GET /stream on that file.

  • P - Purge is allowed. This actually removes the file physically, and out of history.

  • has some [field] [value0] [value1] ... - Means to look at input.claims.values for a field to see if any of the values match it. This is effectively OR logic on the list of values.

  • has every [field] [value0] [value1] ... - Is similar, but with AND logic, so that every value must be in the list.

  • if - Just evaluates its first arg to true or false and reduces to the second argument if true, or to the third argument if it's false (if it exists).
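The `has some`, `has every`, and `if` forms can be mimicked in a few lines of Python. This is only a sketch of the semantics described in the list above, not gm-data's LISP interpreter; the function names and sample claims are made up for illustration.

```python
def has_some(claims, field, *wanted):
    """OR logic: true if any wanted value appears in claims.values[field]."""
    present = claims.get("values", {}).get(field, [])
    return any(value in present for value in wanted)


def has_every(claims, field, *wanted):
    """AND logic: true only if every wanted value appears in the field."""
    present = claims.get("values", {}).get(field, [])
    return all(value in present for value in wanted)


def if_(condition, then_value, else_value=None):
    """Reduce to then_value when condition is true, else to else_value."""
    return then_value if condition else else_value


claims = {"values": {"role": ["editor"], "org": ["sales", "emea"]}}

print(has_some(claims, "role", "admin", "editor"))              # True: one match is enough
print(has_every(claims, "org", "sales", "apac"))                # False: "apac" is missing
print(if_(has_some(claims, "role", "editor"), ["R", "X"], []))  # ['R', 'X']
```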

The whole point of this bit of code is to evaluate the input.claims that came from the JWT and set the output variables accordingly. It hardly matters what language is used for this purpose, so we can use the OpenPolicyAgent language, which is technically called Rego. All that matters is that the output variables are set.

Here is the Rego equivalent of the original LISP language. OpenPolicyAgent sets variables to be true when the expressions in curly braces evaluate to true. Expressions inside one pair of curly braces are AND'd; repeated definitions of the same variable are OR'd together.

In both cases, the input.claims is the JWT claims that were digitally signed, such as this input:

In this case, the access calculated when the Authorization is plugged into the Policy would just be this, because canView evaluated to true:

In this case, X means that we are allowed to GET /stream on the file, and R means that we can see it show up in listings.
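Put together, the calculation is a pure function from claims to a set of permission letters. A minimal Python sketch, assuming a hypothetical policy in which a `can_view` condition grants R and X to anyone holding a given email; the address and the condition are illustrative only:

```python
def evaluate_policy(claims):
    """Return the permission letters that a hypothetical policy grants."""
    values = claims.get("values", {})
    can_view = "alice@gmail.com" in values.get("email", [])
    granted = set()
    if can_view:
        granted.update({"R", "X"})  # list the file and stream its contents
    return granted


authorization = {"values": {"email": ["alice@gmail.com"]}}
print(sorted(evaluate_policy(authorization)))   # ['R', 'X']
print(sorted(evaluate_policy({"values": {}})))  # []
```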

Objects

The objects being protected are either files or directories. They are like an "inode" in a filesystem. Every version of every object has a JSON chunk that describes it. Each description is similar to this simplified version:

For the file "home/rob@gmail.com/docs/resume.pdf", objectpolicy.rego is the function that evaluates what we can do with this file. In this case, if we were to plug in the JWT of alice@gmail.com, she gets execute access on the file, because the email is a match.

Notes about this way of doing things:

  • A very large population of users does not create large lists of users in our policies. The user might not even exist in an actual database anywhere. But they have a valid token that asserts the truth of these attributes, so we do not need to go look them up anywhere.

  • If Alice surrenders her email to another person after she leaves, then ownership is transferred seamlessly, because in the JWT, every field is multi-valued, including common fields like email. As a continuity policy, Alice's boss probably accumulates the emails of all former employees in their JWT.

  • We do not know ahead of time how our customers do business.

    • They may treat a Distinguished Name (DN) as a unique identifier for a user (government).

    • They may treat an email address as a unique identifier for a user (commercial).

      • They may have asserted that email by looking it up from their PKI entry (government).

      • They may have used OAuth2 to set the USER_DN value to contain an actual email that it attested to, because the user proved the ability to log in with that email.
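The succession behaviour in the second note follows directly from every attribute being multi-valued. A small Python sketch, with made-up email addresses and a hypothetical `owns` check:

```python
def owns(claims, owner_email):
    """True if any of the holder's email values matches the file's owner."""
    return owner_email in claims.get("values", {}).get("email", [])


file_owner = "alice@gmail.com"

alice = {"values": {"email": ["alice@gmail.com"]}}
# After Alice leaves, her manager's token accumulates her address.
manager = {"values": {"email": ["bob@gmail.com", "alice@gmail.com"]}}
newcomer = {"values": {"email": ["carol@gmail.com"]}}

print(owns(alice, file_owner))     # True
print(owns(manager, file_owner))   # True: ownership carried over via the token
print(owns(newcomer, file_owner))  # False
```

The policy on the file never changes in this scenario; only the attributes asserted in each person's token do.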

Usage Examples

In order to clear up how gm-data actually works, it's useful to look at it as a completely stand-alone service.

Assume the following:

  • gm-data is running plaintext on a port that we can reach.

  • We can create a JWT of completely arbitrary JSON. Normally, the JWT service would look in a database to create such a JWT. But the nature of JWT is that any token with a valid signature is accepted, however we managed to obtain it.

  • Say that we run this with ./curltest.sh PG-13 Titanic.mp4 as arguments. The intent is that we want to upload a movie called Titanic.mp4 with a security label set to PG-13 on it.

  • We need authorization to use the system. In this case, we are going direct to gm-data and have access to the JWT keys in our ./gmdatatool program. The JWT is sitting in file thejwt.txt. The JWT will expire in 60000 seconds.

  • Once we have the JWT, we will be able to create a POST /write with a proposed object to upload.

  • Given the upload.json that describes what we are trying to do, and a JWT that authorizes us to do it, we are ready to actually perform an upload. The upload itself is only the last line of code in the script.

  • The objectpolicy.rego field is important. It says that localuser@deciphernow.com has full control over the file, with C R U D X P set to true. Nobody else is given privileges on this file, so it is solely owned by the uploader. Because nobody else has R on the file, nobody else in this system even knows that the file exists. It will not show up in listings without R.
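A sketch of how such a single-owner policy might be assembled in Python before being written into upload.json. Only the `objectpolicy.rego` field name, the `policy` package, and the `input.claims.values` path come from this page; the Rego text itself is an assumed example written in the older rule syntax without the `if` keyword, and a real upload.json needs further fields that are not shown here.

```python
import json

OWNER_EMAIL = "localuser@deciphernow.com"


def single_owner_rego(email):
    """Build Rego text granting C, R, U, D, X, P to one email (assumed form)."""
    lines = [
        "package policy",
        'owner { input.claims.values.email[_] == "%s" }' % email,
    ]
    # One rule per permission letter, each true only for the owner.
    lines += ["%s { owner }" % letter for letter in "CRUDXP"]
    return "\n".join(lines)


# Hypothetical fragment of an upload descriptor; other required fields omitted.
upload_fragment = {"objectpolicy": {"rego": single_owner_rego(OWNER_EMAIL)}}

print(json.dumps(upload_fragment, indent=2))
```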

In this case, we set objectpolicy.rego. We could have set originalobjectpolicy to this Rego as well and it would be understood by gm-data. We can only see this file in the listing if we are the user with email localuser@deciphernow.com. Note that /static/ui/ shows who it thinks we are in the upper-right of the screen. If we look at the properties that got created, gm-data took what we uploaded and enhanced it with the extra fields that it needs for bookkeeping.

[Image: uiscreenshot.png]

Things to note:

  • The policy field is an ephemeral field of the access that got calculated. It is ["C","R","U","D","X","P"] in this case. That means that any user interface can tell if it's a read-only file, or if it should try to make a link to actually GET /stream on it, or to render any options for update, delete, or purge. The server will enforce these rules if we try to break them.

  • The server guessed the mimetype for us.

  • The field where LISP would normally go at objectpolicy.requirements got set to a vacuous yield with no arguments at all. That means that LISP gives us no access at all. But because objectpolicy.rego exists, it will use that field to calculate access. This is a consequence of still having two policy evaluation languages in use at the moment.

  • The userpolicy.label field has the label that came from the userpolicy.label object. This is a public field. It is effectively the current "owner" of the file, which means the last person that touched it. The label doesn't have to be a DN of a user, but it usually is. In pseudonymous cases, as with GDPR, the userpolicy.label might be something generic that identifies a person as best as it can, like "US Citizen adult". It depends on what got written into the JWT claims. JWT claims may or may not have personally identifying information.
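A user interface can branch on the computed `policy` list along these lines. The sketch below is in Python with made-up action names; only the permission letters and their meanings come from this page.

```python
def ui_actions(policy):
    """Map the computed permission letters to things a UI might offer."""
    granted = set(policy)
    return {
        "show_in_listing": "R" in granted,
        "link_to_stream": "X" in granted,  # GET /stream needs X
        "offer_update": "U" in granted,
        "offer_delete": "D" in granted,
        "offer_purge": "P" in granted,
        "read_only": not (granted & {"U", "D", "P"}),
    }


print(ui_actions(["C", "R", "U", "D", "X", "P"])["read_only"])  # False
print(ui_actions(["R", "X"]))
```

These flags only decide what to render. As noted above, the server still enforces the rules if a client ignores them.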

Policy Language

For a deeper look at the Rego policy language of OpenPolicyAgent, see its public documentation, and the Rego Playground specifically. Some basics about the language:

  • A package must be declared. In our case, we call the package "policy". Rego in the policy package will read input.claims.values for making calculations, and expect boolean output for variables named: C, R, U, D, X, P.

  • A variable is assigned by the block of curly brackets to its right. The boolean statements in the block may be separated by line breaks or semicolons. They are combined with AND logic to decide what to assign to the variable.

  • OR is done implicitly by using duplicate statements.
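The AND-within-a-block, OR-across-duplicates behaviour can be modelled in Python. This is a toy evaluator written for illustration, not how OpenPolicyAgent is implemented; the rule table and the `editor` role are made up.

```python
def evaluate(rules, claims):
    """Evaluate rules given as (variable, [conditions]) pairs.

    Conditions inside one body are ANDed; several bodies for the same
    variable are ORed. Variables with no satisfied body stay unset.
    """
    result = {}
    for variable, body in rules:
        if all(condition(claims) for condition in body):
            result[variable] = True
    return result


def has_role(role):
    """Build a condition that checks for one role in claims.values.role."""
    return lambda claims: role in claims.get("values", {}).get("role", [])


rules = [
    ("R", [has_role("sirius_admin")]),                      # first body for R
    ("R", [has_role("zetareticuli_admin")]),                # duplicate body acts as OR
    ("U", [has_role("sirius_admin"), has_role("editor")]),  # both required
]

claims = {"values": {"role": ["zetareticuli_admin"]}}
print(evaluate(rules, claims))  # {'R': True}
```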

A simple example that only gives permission R and X to everyone:

Has literally set:

Give access R if you are in a role with sirius_admin or zetareticuli_admin. Note this bizarre way of handling the OR operation: just duplicate the condition, and re-use variables to share common code if you have to.

Has literally set this, only for a JWT that meets the role criteria:

If it hasn't met the role criteria, it has literally set no value at all. R is not even false, it just is not set to true. Be aware of this. Your policy should look for !true, or be guaranteed to set the value you want to true or false explicitly.
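In code that consumes a policy result, this means testing for the presence of true rather than the absence of false. A Python sketch, assuming the result arrives as a dictionary that contains only the variables that were set:

```python
PERMISSIONS = ("C", "R", "U", "D", "X", "P")


def is_allowed(result, letter):
    """Treat anything other than an explicit True as a denial."""
    return result.get(letter) is True


def normalize(result):
    """Fill in every permission explicitly so nothing is left unset."""
    return {letter: is_allowed(result, letter) for letter in PERMISSIONS}


partial = {"R": True}            # R was set; everything else is unset
print(is_allowed(partial, "X"))  # False, although X was never set to false
print(normalize(partial))
```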

Realistic cases usually need to distinguish who owns (i.e. can make modifications to) a file from who can merely view it. For example, R and X are granted to users in a particular group, but you must be a specific person with a given email address to be the owner.

Which sets this for Alice, even if she is not in any of these groups:

gm-data would say that Alice doesn't have access to any of: C, R, U, D, X, P. But another application might care about other values, as this is OpenPolicyAgent, which has a broader scope than just gm-data. A more realistic policy for gm-data would look like:

But note that even though gm-data treats unset variables as if they were set to false, other applications might take it to mean that the value is unknown. This sort of example is very typical. A file might be jointly owned by a group of users. It might be viewable to people in certain roles. Importantly, in this design:

  • We do not presume the kinds of attributes that will be used to uniquely identify a user, e.g. email, userDN, unixLogin, ActiveDirectoryLogin, etc.

  • We do not presume the use of any kind of group or role. Administrators define them by giving such attributes to users, e.g. citizenship, organization, age, etc.

  • It is the job of the operations people to set up the system so that the JWT has attributes that fit the policies that are actually being written.

  • An organization might write their own JWT server. It just needs to have a signer that is trusted by applications such as gm-data.
