When we deliver data to a destination such as Azure Data Lake, BigQuery, AWS Athena, AWS Redshift, or Redshift Spectrum, we append additional metadata that is specific to each record.

Your tables and views will include a series of system-generated fields that provide users with vital information about the meaning of the data we collected on your behalf.

Not only does this provide critical context about a record, but it also simplifies queries and data modeling.

System Generated Metadata Fields

In addition to the fields retrieved from source files, your table will include a series of system-generated fields, all prefixed with ob_. Each of these fields is described below.

ob_date - The UTC date used in the request to retrieve data from the source system. This is the data date, or "report date." This field is present only in non-batch integrations (e.g., Facebook, Amazon Seller Central, Amazon Advertising...)

ob_transaction_id - A system-generated unique id based on a hash of field values for a given row of data. This field is used to prevent, or at least minimize, the loading of duplicate data into a table. The system generates the hash value for each row being loaded and compares it to all previously loaded data. If a matching record is found, the new row is excluded from the load process. Note: The de-duplication process applies to cloud warehouses, not data lakes.

For more on data de-duplication, see De-Duplication And Dealing With Updates Over A Long Time Horizon.
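
The hash-and-compare idea behind ob_transaction_id can be sketched in a few lines of Python. This is only an illustration, not the Openbridge implementation: the hash algorithm (SHA-256 here), the way field values are joined, and the helper names row_hash and load_rows are all assumptions made for the example.

```python
import hashlib

def row_hash(row: dict) -> str:
    """Return a stable hash of a row's field values (ob_* fields excluded)."""
    # Sort keys so the same values hash identically regardless of field order.
    # SHA-256 and the "key=value|..." layout are assumptions for this sketch.
    parts = [f"{key}={row[key]}" for key in sorted(row) if not key.startswith("ob_")]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def load_rows(new_rows, seen_ids):
    """Return only rows whose hash is not already in seen_ids."""
    loaded = []
    for row in new_rows:
        transaction_id = row_hash(row)
        if transaction_id in seen_ids:
            continue  # a matching record was already loaded, so skip this row
        seen_ids.add(transaction_id)
        loaded.append({**row, "ob_transaction_id": transaction_id})
    return loaded

# Hypothetical rows, for illustration only.
seen = set()
first = load_rows([{"sku": "A1", "units": 3}, {"sku": "B2", "units": 5}], seen)
second = load_rows([{"sku": "A1", "units": 3}, {"sku": "A1", "units": 4}], seen)
print(len(first), len(second))  # 2 1
```

In the second load, the row that repeats sku A1 with 3 units is skipped, while the row with 4 units has different field values, produces a different hash, and is loaded as a new record.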

ob_file_name - A system-generated field containing the temporary path and file name from which the data was loaded. Openbridge uses this field for quality control, to validate source outputs and to confirm that data from a particular source file was loaded successfully to the target destination.
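
A similar check can be run on your own side. The sketch below counts loaded rows per ob_file_name and compares them with the row counts you expected from each source file. The file paths, the expected counts, and the helper names rows_per_file and find_mismatches are hypothetical, and the rows are assumed to have already been fetched from your destination as dictionaries.

```python
from collections import Counter

def rows_per_file(rows):
    """Count loaded rows for each distinct ob_file_name value."""
    return Counter(row["ob_file_name"] for row in rows)

def find_mismatches(rows, expected_counts):
    """Return {file: (expected, loaded)} for files whose counts differ."""
    loaded = rows_per_file(rows)
    return {
        name: (expected, loaded.get(name, 0))
        for name, expected in expected_counts.items()
        if loaded.get(name, 0) != expected
    }

# Hypothetical rows and paths, for illustration only.
rows = [
    {"ob_file_name": "/tmp/example/orders_001.csv", "order_id": 1},
    {"ob_file_name": "/tmp/example/orders_001.csv", "order_id": 2},
    {"ob_file_name": "/tmp/example/orders_002.csv", "order_id": 3},
]
expected = {"/tmp/example/orders_001.csv": 2, "/tmp/example/orders_002.csv": 2}
mismatches = find_mismatches(rows, expected)
print(mismatches)
```

Here the second file is reported because two rows were expected and only one was found in the loaded data.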

ob_processed_at - A system-generated field that represents the UTC timestamp at which data processing started. For some sources that provide lifetime metrics, this is the date for which the lifetime metrics are valid.

ob_modified_date - A system-generated field that represents the UTC timestamp at which a record was modified. For some sources that use attribution or lifetime metrics, this can indicate the most recent update.
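
When a source restates metrics over time, one common pattern is to keep only the most recently modified version of each record. The sketch below shows that idea in Python under several assumptions: rows are dictionaries, ob_modified_date is an ISO 8601 string, and campaign_id and ob_date together identify a record. The field names other than the ob_ fields, and the helper latest_versions, are hypothetical.

```python
from datetime import datetime

def latest_versions(rows, key_fields):
    """Keep the row with the newest ob_modified_date for each key."""
    latest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # Assumes an ISO 8601 timestamp string, e.g. "2023-01-05 08:00:00".
        modified = datetime.fromisoformat(row["ob_modified_date"])
        if key not in latest or modified > latest[key][0]:
            latest[key] = (modified, row)
    return [row for _, row in latest.values()]

# Hypothetical rows, for illustration only.
rows = [
    {"campaign_id": 7, "ob_date": "2023-01-01", "clicks": 10,
     "ob_modified_date": "2023-01-02 08:00:00"},
    {"campaign_id": 7, "ob_date": "2023-01-01", "clicks": 15,
     "ob_modified_date": "2023-01-05 08:00:00"},
    {"campaign_id": 9, "ob_date": "2023-01-01", "clicks": 4,
     "ob_modified_date": "2023-01-02 08:00:00"},
]
current = latest_versions(rows, key_fields=("campaign_id", "ob_date"))
print([(row["campaign_id"], row["clicks"]) for row in current])  # [(7, 15), (9, 4)]
```

The same selection can usually be expressed in a warehouse query; the Python version is shown only to make the logic explicit.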
