When we deliver data to a destination like Azure Data Lake, BigQuery, AWS Athena, AWS Redshift, or Redshift Spectrum, we append additional metadata unique to the information resident in a record.
Your tables and views will include a series of system-generated fields that provide users with vital information about the meaning of the data we collected on your behalf.
Not only does this provide critical context about a record, but it also simplifies queries and data modeling.
System Generated Metadata Fields
In addition to the fields retrieved from source files, your table will include a series of system-generated fields (all with a prefix of ob_*
). Each of these fields is described below.
ob_date
- The UTC date used in the request to retrieve data from the source system as the data or "report date." This field is only in non-batch integrations (e.g., Facebook, Amazon Seller Central, Amazon Advertising...)
ob_transaction_id
- A system-generated unique id based on a hash of field values for a given row of data. This field is used to prevent or minimize duplicate data from being loaded to a table. The system generates the hash value for each loaded row and compares it to all previously loaded data. The new row is excluded from the load process if a matching record is found. Note: The de-duplication process is applicable to cloud warehouses, not data lakes.
For more on data de-duplication, see De-Duplication And Dealing With Updates Over A Long Time Horizon.
ob_file_name
- A system-generated field including a temporary path and file name from which the data was loaded. Openbridge uses this field for quality control to validate that the source outputs and that data from a particular source file were loaded successfully to the target destination.
ob_processed_at
- A system-generated field represents the UTC timestamp that data processing started. For some sources that provide lifetime metrics, this date means the date for which lifetime metrics are valid.
ob_modified_date
- A system-generated field represents the UTC timestamp that a record was modified. For some sources that use attribution or lifetime metrics, this can define the most recent update.