Provenance Records

Provenance Records record how data inside the Trust Framework is originated, processed, and transferred between participants, alongside records of the licenses covering the data transfer, and any permission granted by end users.

Provenance is recorded in a decentralized manner, with signed records being passed between participants whenever data is transferred. The format is extensible by Schemes.

Note: This document uses US English. To align with W3C and other prevalent standards, IB1 uses US English in its technical specifications and technical documentation.

Encoded and signed container format

Records are composed of steps, each of which describes a discrete component of provenance. Participants create a series of steps, then sign them with signing certificates issued by the Directory.

Steps are serialized as JSON, then Base64 encoded for inclusion in a JSON-encoded container. The structure and cryptographic choices borrow from JWS, with adaptations for nested signatures and the use of X.509 certificates to provide keys and identify participants.
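As a minimal sketch of the encoding described above, the following Python serializes a step as JSON and then Base64 encodes it. The function names are hypothetical, and the use of the standard Base64 alphabet and compact UTF-8 JSON are assumptions rather than requirements stated here; the reference implementation remains authoritative.

```python
import base64
import json


def encode_step(step: dict) -> str:
    """Serialize a step as JSON, then Base64 encode it (standard alphabet assumed)."""
    raw = json.dumps(step, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_step(encoded: str) -> dict:
    """Reverse encode_step: Base64 decode, then parse the JSON."""
    return json.loads(base64.b64decode(encoded))


# Round trip using two of the common step properties.
step = {"id": "3E9qRZRpmHgvAljB1wzh", "type": "transfer"}
encoded = encode_step(step)
decoded = decode_step(encoded)
```

The encoded string is what would appear as an element of the steps array; it is never re-serialized after signing, which is why received steps are carried forward in their encoded form.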

Participants receive Provenance Records alongside the data transferred from other participants, and then add additional steps to describe their processing. To preserve the signatures of the participants who created the received steps, the encoded and signed form of those steps is included as received. To assert that the participant creating the record is relying on these steps, and to prevent later modification, this encoded data and its signature are included in the data to be signed.

A Python reference implementation of a Provenance library is available at https://github.com/icebreakerone/provenance.

Container structure

{
    "ib1:provenance":
            "https://registry.core.trust.ib1.org/trust-framework",
    "origins": [
        "JlktJgnddq45PldDsKMf"
    ],
    "steps": [
        [ included steps ],
        "Base64 encoded JSON",
        "Base64 encoded JSON",
        ... 
        [
            0, // version
            "certificate serial",
            "signing timestamp",
            "signature"
        ]
    ],
    "certificates": {
        "34983462": ["PEM encoded cert", "issuer serial", ...]
        // ...
    }
}

ib1:provenance
- The URL of the Trust Framework. All signatures are from certificates issued by this Trust Framework's Directory.
- This property identifies the structure as a Provenance Record.
origins
- An array of the id properties of the origin steps, in the order they appear in the record.
- Scheme logging policy may require that all participants are able to find all the Provenance records they rely on which match an origin ID.
steps
- A Signed Step List, see below.
certificates
- (Optional) Map of certificate serial number (as a string) to array of a PEM encoded certificate followed by the serial numbers of issuer certificates in the chain to the root CA certificate.
- If the certificates are not included to reduce the size of the record, they may be obtained from the Directory identified using the Trust Framework URL in the ib1:provenance property.

Schemes MUST NOT add any top-level properties to a Provenance document.
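The layout of the certificates property can be illustrated with a short Python sketch that collects the PEM certificates for a signing certificate and its issuers. The function name and the serial numbers "1001" and "1000" are hypothetical, and the sketch assumes every issuer serial has its own entry in the same map, which this document does not guarantee.

```python
def certificate_chain(certificates: dict, serial: str) -> list:
    """Return PEM certificates from the given serial up to the root CA."""
    entry = certificates[serial]
    pem, issuer_serials = entry[0], entry[1:]
    # Assumption: each issuer serial is also a key in the certificates map.
    return [pem] + [certificates[s][0] for s in issuer_serials]


# Placeholder PEM strings stand in for real certificates.
certificates = {
    "34983462": ["PEM signing cert", "1001", "1000"],
    "1001": ["PEM issuer cert", "1000"],
    "1000": ["PEM root cert"],
}
chain = certificate_chain(certificates, "34983462")
```

When the certificates property is omitted, the same lookup would have to be made against the Directory instead.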

Signed Step List format

A Signed Step List (the value of the top-level steps property) is an array of one or more elements, each of which is either a Base64 encoded step or another Signed Step List, followed by a final signature element.

The signature element is an array with 4 elements, in order:

- 0, the version of the container and steps.
- Serial number of the signing certificate, as a String.
- Timestamp of signature as an ISO8601 date with 'Z' denoting UTC.
- String encoded signature, using the algorithm required by the format of the public key in the certificate.
  - All implementations are required to support the "ES256" algorithm from JWS.

To generate the signature:

- Create an array (the signature data array) containing the Trust Framework URL from the ib1:provenance property.
- For each element in the step list array (without the final signature element):
  - If the element is an Array:
    - Push "%" to the signature data array.
    - Recurse into the Array.
    - Push "&" to the signature data array.
  - Otherwise, convert the value to a String and push it to the signature data array.
- Join the contents of the signature data array into a single String with a "." separator between elements.
- Sign this String and generate the signature element of the Signed Step List.
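The construction of the string to be signed can be sketched in Python as follows. The function names and the example URL are hypothetical. The sketch assumes that recursing into a nested Array visits all of its elements, including that nested list's own signature element, on the basis that received encoded data and signatures are included in the data to be signed. The signing operation itself is omitted.

```python
def _push_elements(parts: list, elements: list) -> None:
    """Append each element to parts, bracketing nested arrays with % and &."""
    for element in elements:
        if isinstance(element, list):
            parts.append("%")
            _push_elements(parts, element)
            parts.append("&")
        else:
            parts.append(str(element))


def signature_data(trust_framework_url: str, step_list: list) -> str:
    """Build the string to sign; step_list excludes its final signature element."""
    parts = [trust_framework_url]
    _push_elements(parts, step_list)
    return ".".join(parts)


# A nested, already signed list followed by one newly added encoded step.
included = ["aaa", [0, "111", "2024-09-16T15:32:56Z", "sig1"]]
data = signature_data("https://tf.example", [included, "bbb"])
# data == "https://tf.example.%.aaa.%.0.111.2024-09-16T15:32:56Z.sig1.&.&.bbb"
```

The resulting string would then be signed with the participant's key and the signature placed in the final element of the Signed Step List.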

Versioning

Two Provenance records need to be combined into a single Provenance record when a process combines data from two sources. These records could be created with different versions of the Provenance specification, as participants may migrate to new formats at different speeds and use historic data.

The version number of the container and the data within the steps is included in the Signed Step List signature element so that Records can be merged without migrating data and breaking signatures. The decoding library must transform the data to a single version.
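To illustrate why the version lives in each signature element, the Python sketch below walks a Signed Step List and pairs every decoded step with the version of the list that directly contains it. The function names are hypothetical, version 1 is invented purely for the example, and standard Base64 is assumed. Signature verification and the transformation to a single version are left out.

```python
import base64
import json


def decode_with_versions(step_list: list):
    """Yield (version, step) for every step in a Signed Step List, depth first."""
    *elements, signature = step_list
    version = signature[0]  # first entry of the signature element
    for element in elements:
        if isinstance(element, list):
            # A nested Signed Step List carries its own version.
            yield from decode_with_versions(element)
        else:
            yield version, json.loads(base64.b64decode(element))


def b64(step: dict) -> str:
    """Helper for the example: JSON then Base64 encode a step."""
    return base64.b64encode(json.dumps(step).encode("utf-8")).decode("ascii")


older = [b64({"id": "a", "type": "origin"}), [0, "111", "2024-09-16T15:32:56Z", "sig"]]
merged = [older, b64({"id": "b", "type": "process"}), [1, "222", "2025-01-01T00:00:00Z", "sig"]]
result = list(decode_with_versions(merged))
```

A decoding library could use the version attached to each step to decide which migration, if any, to apply before presenting the steps in a single version.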

Schemes must version their data by using different property names.

Steps

Each participant involved in the data origination and processing adds steps to the record which describe their activity.

Common properties

{
    "id": "3E9qRZRpmHgvAljB1wzh",
    "type": "transfer",
    "timestamp": "2024-09-16T15:32:56Z",
    "scheme": "https://registry.core.trust.ib1.org/scheme/perseus",
    "assurance": { ... },
    "perseus:assurance": { ... },
    "perseus:region": "scheme defined value",
    // ...
}

id
- 15 bytes of random data generated from a cryptographically secure random number generator, base64 encoded.
- IDs are unique within the Trust Framework to enable records to be merged without having to alter any of the signed data.
type
- See below for step types and their definitions
scheme
- The Registry URL of the Scheme where this data originated or was processed. Every step must have an explicit scheme property.
- Steps may have different scheme properties to allow cross-Scheme data transfer, but all Schemes must be members of the same Trust Framework.
timestamp
- ISO8601 date with 'Z' denoting UTC, of the time the action described in this step took place.
- The step timestamp may be different to the signing timestamp in the container.
assurance
- Assurance metadata defined across the Trust Framework.
<scheme>:assurance (e.g. perseus:assurance)
- Assurance metadata defined by a Scheme MUST be represented as a data structure under this property.
- Multiple assurance properties may be used in a single step for different Schemes.
<scheme>:scheme (e.g. perseus:scheme)
- Other scheme defined information SHOULD be represented as a data structure under this property.
<scheme>:property (e.g. perseus:otherThing)
- A Scheme MAY define any additional property whose name is prefixed by the scheme's short name and a :, where representation under the <scheme>:scheme property would be inappropriate.
_<name>
- Properties with a _ prefix are reserved for use by the library to add additional information when decoding and verifying the record.
_signature
- Added by the library when decoding a record to give information about which participant signed and relied on each step.

Steps do not contain the identifier of the participant who created the step. This identifier is specified by the certificate used to sign this step. The library makes the URL of the participant available in the _signature.signed.member property.
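Generating an id as described under the common properties can be sketched in Python with the standard library. The function name is hypothetical, and the standard Base64 alphabet is an assumption, chosen because the example IDs in this document contain "/" and "+".

```python
import base64
import secrets


def new_step_id() -> str:
    """15 random bytes from a CSPRNG, Base64 encoded (always 20 characters)."""
    return base64.b64encode(secrets.token_bytes(15)).decode("ascii")


step_id = new_step_id()
```

Because 15 bytes is a multiple of three, the encoded value never needs "=" padding.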

Permission

Permission steps are added to explicitly state that permission has been obtained and provide details of how to find the evidence in the logs. The id of this step is used in the permissions properties of Origin, Transfer and Process steps to show the permission they rely on.

The permission step does NOT need to be created by the member that relies on it: a member may rely on a signed assertion of permission that was obtained by another member.

There is no relationship between the permission step and the Permission record created as part of the OAuth mechanism. This is to ensure that the Provenance records are not personal data. Schemes will generally require that the participant which generates the permission step is able to use the timestamp and account properties to locate the evidence for this permission in their audit trail.

{
    "id": "V1VFKWxXsXUtiaFEInSF",
    "type": "permission",
    "timestamp": "2024-09-16T15:32:56Z",
    "account": "iuPgAg4c8x4diYfdl6ADN4ULy3ir/B88",
    "allows": {
        "licenses": [
            "https://registry.core.trust.ib1.org/scheme/
                  perseus/license/energy-consumption-data/2024-12-05",
            "https://smartenergycodecompany.co.uk/
                                  documents/sec/consolidated-sec/"
        ],
        "processes": [
            "https://registry.core.trust.ib1.org/scheme/
                      perseus/process/emissions-calculations/2024-12-05"
        ]
    },
    "expires": "2025-09-16T15:32:56Z"
}

"type": "permission"
account
- An identifier for the account holder or end user.
- The identifier MUST be opaque to everyone apart from the participant who provided the data and MUST NOT be the OAuth token.
- Identifiers must never be reused for different account holders.
- The value is NOT required to be the same for every data transfer relating to this account holder or end user, whether across the entire Trust Framework or within the account holder's relationship with the participant, but the creator of the permission step must be able to match it to the account holder during an audit.
allows
- A statement of the things that the permission is sufficient to use or do:
  - licenses
    - An array of License URLs.
    - These may be Registry URLs or external licenses (e.g. the SEC license for smart meter data by an EDP).
  - processes
    - An array of Process URLs
expires
- Timestamp at which the permission expires, after which processing and transfer must cease. The Scheme may require that stored data is deleted.
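A participant relying on a permission step might check it along the following lines. This Python sketch is illustrative only: the function names and example.com URLs are hypothetical, timestamps are assumed to use the exact form shown in the examples, and treating the expiry instant itself as already expired is an assumption.

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp of the form 2025-09-16T15:32:56Z as UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def permission_allows(permission: dict, at: datetime,
                      license_url=None, process_url=None) -> bool:
    """True if the permission is unexpired at `at` and lists the given URLs."""
    if at >= parse_utc(permission["expires"]):
        return False
    allows = permission.get("allows", {})
    if license_url is not None and license_url not in allows.get("licenses", []):
        return False
    if process_url is not None and process_url not in allows.get("processes", []):
        return False
    return True


permission = {
    "expires": "2025-09-16T15:32:56Z",
    "allows": {
        "licenses": ["https://example.com/license/a"],
        "processes": ["https://example.com/process/b"],
    },
}
before = parse_utc("2025-01-01T00:00:00Z")
after = parse_utc("2025-10-01T00:00:00Z")
```

Scheme rules may add further conditions, so a check like this is a starting point rather than a complete test of whether a permission can be relied on.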

Origin

This step describes how data is originated, whether by the participant, or brought into the Trust Framework from an external source.

A Provenance record must contain at least one origin step. It may include more than one when Provenance records are merged.

The id value is copied to the top level origins property.

{
    "id": "4cN6b85eT7F5MCTTxhiI",
    "type": "origin",
    "timestamp": "2024-09-16T15:32:56Z",
    "scheme": "https://registry.core.trust.ib1.org/scheme/perseus",
    "sourceType": "https://registry.core.trust.ib1.org/scheme/
                                           perseus/source-type/Meter",
    "origin": "https://www.smartdcc.co.uk/",
    "originLicense": "https://smartenergycodecompany.co.uk/
                                  documents/sec/consolidated-sec/",
    "external": true,
    "permissions": ["V1VFKWxXsXUtiaFEInSF", ...],
    // ...
}

"type": "origin"
sourceType
- Registry URL of the type of source of data.
origin
- A URL describing the origin of the data.
originLicense
- The URL of the license which applies to the data, if a license explicitly applies.
- When bringing open data into the Trust Framework, this will be the URL of the license document specified by the owner of that data.
external
- Boolean flag to note whether data was generated by a party who is not a member of this Trust Framework.
permissions
- If Permission has been granted by the account holder (usually using an OAuth issuer), the ID of one or more Permission steps that are being relied on to bring data into this Trust Framework.
The Scheme MAY include assurance metadata.
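The relationship between origin steps and the top-level origins property can be sketched in Python as below. The function name is hypothetical, and the input is assumed to be the decoded steps already flattened into the order in which they appear in the record.

```python
def origin_ids(steps: list) -> list:
    """IDs of the origin steps, in the order they appear in the record."""
    return [step["id"] for step in steps if step.get("type") == "origin"]


steps = [
    {"id": "V1VFKWxXsXUtiaFEInSF", "type": "permission"},
    {"id": "4cN6b85eT7F5MCTTxhiI", "type": "origin"},
    {"id": "51H/KU9Yw4VDxLnaIx+O", "type": "transfer"},
]
origins = origin_ids(steps)
```

A verifier could compare this list with the origins property of the container to detect a mismatch.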

Transfer

Created by a participant to record their transfer of data to another participant.

{
    "id": "51H/KU9Yw4VDxLnaIx+O",
    "type": "transfer",
    "timestamp": "2024-09-16T15:32:56Z",
    "of": "4cN6b85eT7F5MCTTxhiI",
    "to": "https://directory.core.trust.ib1.org/member/387262",
    "scheme": "https://registry.core.trust.ib1.org/scheme/perseus",
    "standard": "https://registry.core.trust.ib1.org/scheme/
                perseus/standard/energy-consumption-data/2024-12-05", 
    "license": "https://registry.core.trust.ib1.org/scheme/
                          perseus/energy-consumption-data/2024-12-05",
    "service": "https://api.example.com/v1/consumption",
    "path": "/readings",
    "parameters": {
        "measure": "import",
        "from": "2023-10-18Z",
        "to": "2023-10-19Z"
    },
    "permissions": ["V1VFKWxXsXUtiaFEInSF", ...],
    // ...
}

"type": "transfer"
of
- The id of a previous Origin, Process or Receipt step to identify the data transferred.
to
- The Directory URL of the participant that the data has been transferred to.
standard
- URL of the Scheme Catalog Requirements document in the Registry, which defines the API used.
license
- The license governing the use of the data.
service
- The Directory URL of the instance of this type of data source (the ID allocated by the participant for their DCAT catalog entry).
path
- The path from the OpenAPI file for the specific endpoint used within the data service.
parameters
- Any parameters used for the API call, excluding any which contain personal data.
permissions
- If Permission has been granted by the account holder (usually using an OAuth issuer), the ID of one or more Permission steps that are being relied on to make this transfer.

Receipt

Confirmation that a data transfer has taken place to the satisfaction of the receiving party. Complete Provenance records will contain matching Transfer and Receipt pairs, where the sending participant creates and signs a Transfer step to assert what they did, and the receiving participant creates and signs a Receipt step to assert they received some data and the Transfer step meets their expectations.

{
    "id": "lQo4LTEXAG7SMPlZ6t7a",
    "type": "receipt",
    "timestamp": "2024-09-16T15:32:56Z",
    "scheme": "https://registry.core.trust.ib1.org/scheme/perseus",
    "transfer": "3E9qRZRpmHgvAljB1wzh"
}

"type": "receipt"
transfer
- The ID of the transfer step which is being confirmed.
Assurance data MAY be included.

The participant confirming receipt is implied by the signature.

The receiving party is expected to verify the transfer step before adding a receipt step, including:

- to is the URL of the receiving party.
- standard, license, service and path are as expected.
- parameters matches the API call (unless varied by the Scheme).
- account is present if an OAuth token was used, otherwise not present.
- The step is signed by the expected participant.
- Any other requirements in the Scheme rules are met.
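Some of these checks can be sketched in Python as follows. The function name, its arguments and the example.com URLs are hypothetical. The sketch assumes the transfer step has already been decoded and verified by the library, so that _signature.signed.member is populated, and it leaves out the account check and any Scheme-specific rules.

```python
def transfer_problems(transfer: dict, my_url: str,
                      expected: dict, expected_member: str) -> list:
    """Return reasons not to issue a receipt; an empty list means the checks passed."""
    problems = []
    if transfer.get("type") != "transfer":
        problems.append("not a transfer step")
    if transfer.get("to") != my_url:
        problems.append("to is not the receiving party")
    # Compare the properties describing the API call with what was requested.
    for name in ("standard", "license", "service", "path", "parameters"):
        if transfer.get(name) != expected.get(name):
            problems.append(f"{name} does not match")
    signer = transfer.get("_signature", {}).get("signed", {}).get("member")
    if signer != expected_member:
        problems.append("signed by an unexpected participant")
    return problems


expected = {
    "standard": "https://example.com/standard",
    "license": "https://example.com/license",
    "service": "https://example.com/service",
    "path": "/readings",
    "parameters": {"measure": "import"},
}
transfer = dict(
    expected,
    type="transfer",
    to="https://example.com/member/me",
    _signature={"signed": {"member": "https://example.com/member/sender"}},
)
problems = transfer_problems(
    transfer, "https://example.com/member/me", expected,
    "https://example.com/member/sender",
)
```

Returning a list of problems rather than a boolean makes it easier to log why a receipt was withheld.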

Process

Data processing performed on one or more input data sets.

{
    "id": "4cN6b85eT7F5MCTTxhiI",
    "type": "process",
    "timestamp": "2024-09-16T15:32:56Z",
    "scheme": "https://registry.core.trust.ib1.org/scheme/perseus",
    "inputs": [
        "4cN6b85eT7F5MCTTxhiI",
        "RkAnE+6fwkGzRiMVZqu+"
    ],
    "process": "https://registry.core.trust.ib1.org/scheme/
                    perseus/process/emissions-calculation/2024-12-05",
    "permissions": ["V1VFKWxXsXUtiaFEInSF", ...],
    // ...
}

"type": "process"
inputs
- Array of IDs of other steps in this record which were used as inputs to the process.
- Only origin, receipt and process steps can be used as inputs.
process
- Registry URL of the process performed on the data
- This property may be omitted (where permitted by the Scheme), in which case the _signature.signed.application property identifies the data processing.
permissions
- If the processing relies on Permission being granted by the account holder, the id of one or more Permission steps that are being relied on for this data processing.
Assurance data MAY be added if the process identifies anything relevant to the scheme.

The Application URL which performed the process is not included as it is within the certificate. It is available in the _signature.signed.application property added on decoding the record.
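The rule that only origin, receipt and process steps can be used as inputs can be checked with a sketch like the following. The names are hypothetical, and steps_by_id is assumed to map the id of every decoded step in the record to that step.

```python
ALLOWED_INPUT_TYPES = {"origin", "receipt", "process"}


def invalid_inputs(process_step: dict, steps_by_id: dict) -> list:
    """Return input IDs that are missing from the record or have a disallowed type."""
    bad = []
    for input_id in process_step.get("inputs", []):
        step = steps_by_id.get(input_id)
        if step is None or step.get("type") not in ALLOWED_INPUT_TYPES:
            bad.append(input_id)
    return bad


steps_by_id = {
    "o1": {"id": "o1", "type": "origin"},
    "t1": {"id": "t1", "type": "transfer"},
    "r1": {"id": "r1", "type": "receipt"},
}
process_step = {"id": "p1", "type": "process", "inputs": ["o1", "t1", "r1", "zz"]}
bad = invalid_inputs(process_step, steps_by_id)
```

Here "t1" is rejected because it is a transfer step and "zz" because it does not appear in the record.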