When we deliver data to a destination such as Azure Data Lake, BigQuery, AWS Athena, AWS Redshift, or Redshift Spectrum, we append additional metadata specific to each record.

Your tables and views will include a series of system-generated fields that provide users with vital information about the meaning of the data we collected on your behalf.

Not only does this provide critical context about a record, but it also simplifies queries and data modeling.

System-Generated Metadata Fields

In addition to the fields retrieved from source files, your table will include a series of system-generated fields, all prefixed with ob_. Each of these fields is described below.

ob_date - The date used in the request to retrieve data from the source system, also referred to as the data date or "report date." This field is only present in non-batch integrations (e.g., Facebook, Amazon Seller Central, Amazon Advertising).

ob_transaction_id - A system-generated unique id based on a hash of field values for a given row of data. This field is used to prevent or minimize duplicate data from being loaded to a table. The system generates the hash value for each row being loaded and compares it to all previously loaded data. The new row is excluded from the load process if a matching record is found. Note: The de-duplication process is applicable to cloud warehouses, not data lakes.

For more on data de-duplication, see De-Duplication And Dealing With Updates Over A Long Time Horizon
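
The hash-and-compare idea behind ob_transaction_id can be illustrated with a short Python sketch. This is not the Openbridge implementation: the hash function (SHA-256), the way field values are joined, and the function names row_hash and load_new_rows are all assumptions made for illustration.

```python
import hashlib


def row_hash(row: dict) -> str:
    """Build a hash from a row's field values (hypothetical helper).

    Keys are sorted so that the same values always produce the same hash,
    regardless of the order in which the fields arrive.
    """
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_new_rows(incoming: list, loaded_ids: set) -> list:
    """Return only rows whose hash has not been loaded before.

    loaded_ids stands in for the set of hashes already present in the
    destination table; it is updated as new rows are accepted.
    """
    accepted = []
    for row in incoming:
        tx_id = row_hash(row)
        if tx_id in loaded_ids:
            # A matching record already exists, so skip this row.
            continue
        loaded_ids.add(tx_id)
        accepted.append({**row, "ob_transaction_id": tx_id})
    return accepted
```

Because the hash is derived from field values, a row whose values change at the source gets a different ID and is loaded as a new row rather than treated as a duplicate.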

ob_file_name - A system-generated field containing the temporary path and file name from which the data was loaded. Openbridge uses this field for quality control, to validate source outputs and to confirm that data from a particular source file was loaded successfully to the target destination.

ob_processed_at - A system-generated field representing the timestamp at which data processing started. For some sources that provide lifetime metrics, this is the date as of which the lifetime metrics are valid.

ob_modified_date - A system-generated field representing the timestamp at which a record was modified. For some sources that use attribution or lifetime metrics, this can indicate the most recent update.
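
As an example of how these fields can simplify data modeling, the Python sketch below keeps only the most recently modified version of each record, based on ob_modified_date. The business key (order_id), the sample column names, and the timestamp format are hypothetical; the actual key and format depend on your source and destination.

```python
from datetime import datetime


def latest_versions(rows: list, key_fields: list) -> list:
    """Keep the row with the newest ob_modified_date for each key.

    key_fields names the columns that identify a record in your data
    (an assumption; choose whatever uniquely identifies a record).
    Timestamps are assumed to be ISO-8601 strings.
    """
    latest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        modified = datetime.fromisoformat(row["ob_modified_date"])
        current = latest.get(key)
        if current is None or modified > datetime.fromisoformat(
            current["ob_modified_date"]
        ):
            latest[key] = row
    return list(latest.values())


# Usage with hypothetical sample rows:
sample = [
    {"order_id": "1", "sales": 10, "ob_modified_date": "2024-01-02 08:00:00"},
    {"order_id": "1", "sales": 12, "ob_modified_date": "2024-01-05 08:00:00"},
    {"order_id": "2", "sales": 7, "ob_modified_date": "2024-01-03 09:30:00"},
]
current_rows = latest_versions(sample, ["order_id"])
```

The same pattern can be expressed in SQL in your warehouse by ranking rows per key on ob_modified_date and keeping the top-ranked row.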
