Dataset Commitments
Below is a survey of the principles for making effective commitments that deliver maximum value for data consumers.
General Principles
Commit production data to create an objective verifiable record of the data. Commitments resemble running a stock market index where a credible party permanently records all rebalancing trades.
Commit data as soon as it is available to consumers to achieve high fidelity of record timestamps.
Commit revisions and corrections to data similarly. The sequence of commitments thus created offers an accurate view of the data that consumers can replay as if they were receiving live historical records.
Prompt commitment of revisions reduces or eliminates lengthy and expensive live testing required for legacy financial data. With legacy workflows, consumers can't be sure of the precise timing of data delivery and revisions unless these were delivered into their production processes, requiring expensive testing.
External Examples
vBase commitments and the ability to rewind the information to "as it actually was"
enable bitemporal modeling for any data: https://en.wikipedia.org/wiki/Bitemporal_modeling. Indeed, the underlying objective cryptographic commitments enable verification, bitemporal modeling, and historical simulation for any untrusted 3rd party data as if the consumer received historical data in production or generated it themselves.
Some non-vBase datasets with support for (low-fidelity) vintage timestamps are important macroeconomic series with frequent revisions:
ADP: Total Nonfarm Private Payroll Employment: https://alfred.stlouisfed.org/series/downloaddata?seid=ADPWNUSNERSA.
U.S. Bureau of Labor Statistics: All Employees, Total Nonfarm: https://alfred.stlouisfed.org/series?seid=PAYEMS
vBase Commitment Examples
We will use the above Total Nonfarm (PAYEMS) series to illustrate how vBase commitments establish high-fidelity "vintage timestamps" identifying when data and its revisions became available.
For simplicity, we will assume all commitments are made at midnight GMT, beginning with the 20221104
vintage.
Initial commitment
The initial dataset is the full history for the 2022-11-04
vintage:
These initial and subsequent records can be committed as VBaseJsonSeries
or VBaseStringSeries
objects. If data is available in an AWS S3 bucket, it can also be committed using the commit_s3_objects
tool: https://docs.chronocloud.tech/reference/vbase-py-tools/commit_s3_objects.
Data receives the following timestamp:
The initial dataset history has been established.
Incremental commitment
At the next reporting period, an incremental commitment records observation for the new month:
Data receives the following timestamp upon commitment:
Revisions
At the 2022-12-02
reporting period, historical data was also revised. This revision can be submitted as a dataset of changed values:
Alternatively, the entire history can be re-submitted:
Both approaches will record revisions with a high-fidelity timestamp:
Regardless of which method the producer finds most convenient, vBase consumer and simulation APIs will make the latest known data available for a given query or simulation time.
Last updated