Transactional logs are materialized as two tables:
ob_transactions_*
ob_transactions_metadata_*
Transactions (ob_transactions_*)
Transactions reflect responses received from upstream source systems and downstream data destinations. The table contains the following columns:
subscription_id
subscription_name
payload_name
product_name
modified_at
status
file_path
sender
error_code
error_message
transaction_id
ob_transaction_id
ob_file_name
ob_processed_at
ob_modified_date
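For a quick look at the table's shape, you can pull the most recent transaction events. The sketch below uses BigQuery Standard SQL; the dataset name (mydataset) is an assumption, so substitute the dataset and table names that exist in your destination.

-- Most recent transaction events, newest first.
SELECT
  subscription_id,
  payload_name,
  status,
  error_code,
  error_message,
  modified_at
FROM `mydataset.ob_transactions_*`  -- assumed dataset name
ORDER BY modified_at DESC
LIMIT 50;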
Transaction events are typically at the API request level. For example, when calling the Amazon API for order data, there would be a collection of events for each unique order data feed (e.g., mws_orders_by_date, mws_orders_by_last_update). This is because each API request may have different request limits, throttles, permissions, and retry constraints.
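Since each feed emits its own stream of events, counting events per payload_name is a quick way to confirm which feeds are active and when they last ran. A minimal sketch, under the same BigQuery and dataset-name assumptions as above:

-- Event volume and most recent activity per data feed.
SELECT
  payload_name,
  COUNT(*) AS event_count,
  MAX(modified_at) AS last_event_at
FROM `mydataset.ob_transactions_*`
GROUP BY payload_name
ORDER BY last_event_at DESC;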
Any error messages are supplied directly by the source or destination system, which allows you to look up possible reasons for an error. For example, if an authorization token has expired, an event would list the cause as:
"code":"UNAUTHORIZED","details":"Not authorized to access scope 441232344543454","requestId":"123234211223443"
In the event of an error, retries will occur for up to 8 hours. Once the initial retry attempts have failed, the event messages are transferred to a dead-letter queue (DLQ). A DLQ is a holding queue for messages that cannot be processed as expected.
For example, if we receive an AUTHORIZATION ERROR that prevents us from connecting to a source or destination, the event will ultimately be routed to a DLQ. Since these messages cannot be processed, they are stored for later retrieval and re-processing.
Re-processing is a secondary retry process that runs at varying intervals depending on the source or destination. For example, in the event of an AUTH error for Facebook, we will retry data collection once a day for seven days after the initial failure.
Another example: if a BigQuery data destination removes access permissions, all loading will fail, and the load-attempt event messages will ultimately be placed into a DLQ. The messages in the DLQ will be reattempted at various intervals for no more than seven days. If the permission issue is not resolved, the DLQ messages expire and are no longer attempted.
In the event of hard failures like those detailed above, meaning we cannot connect due to issues like authorization failures or long-term source/destination system outages, we expire the event message, and the trigger will "suspend" the pipeline until the customer can take action. In cases of prolonged failure, there is a risk that collected data cannot be recovered.
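Because expired DLQ messages can mean unrecoverable data, it is worth watching for subscriptions that have been failing long enough to approach the seven-day window. A sketch under the same assumptions as above; the status value 'ERROR' and the TIMESTAMP type of modified_at are additional assumptions:

-- Subscriptions whose earliest recorded failure is more than seven days old.
SELECT
  subscription_id,
  MIN(modified_at) AS first_failure_at,
  MAX(modified_at) AS last_attempt_at,
  COUNT(*) AS failed_attempts
FROM `mydataset.ob_transactions_*`
WHERE status = 'ERROR'  -- assumed status value
GROUP BY subscription_id
HAVING MIN(modified_at) < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);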
Metadata (ob_transactions_metadata_*)
Metadata contains information about a specific pipeline: for example, the ID, when it was created, who created it, associated authorizations, the status of the pipeline, and connected data destinations. The table contains the following columns:
subscription_id
subscription_created_at
subscription_name
subscription_status
first_name
last_name
product_name
remote_identity_name
invalid_identity
invalidated_at
storage_name
storage_key_name
ob_transaction_id
ob_file_name
ob_processed_at
ob_modified_date
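One common use of the metadata is spotting invalidated authorizations before they lead to expired DLQ messages. A minimal sketch, under the same assumptions as above; whether invalid_identity lands as a boolean or a string flag depends on your destination's type mapping:

-- Subscriptions whose associated authorization has been invalidated.
SELECT
  subscription_id,
  subscription_name,
  subscription_status,
  remote_identity_name,
  invalidated_at
FROM `mydataset.ob_transactions_metadata_*`
WHERE invalid_identity = TRUE  -- assumed boolean representation
ORDER BY invalidated_at DESC;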
The metadata is useful because you can join it to the transactions to identify the specific subscription an event is linked to. For example, if you observe an AUTH error in the transactions, you can look up the details of that subscription ID in the metadata. The AUTH error may have occurred because a user made a breaking change to the subscription in our system (e.g., removed a permission).
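A sketch of that lookup as a join on subscription_id, under the same assumptions as above:

-- Pair authorization errors with the subscription and identity they belong to.
SELECT
  t.subscription_id,
  m.subscription_name,
  m.remote_identity_name,
  t.error_code,
  t.error_message,
  t.modified_at
FROM `mydataset.ob_transactions_*` AS t
JOIN `mydataset.ob_transactions_metadata_*` AS m
  ON t.subscription_id = m.subscription_id
WHERE t.error_code = 'UNAUTHORIZED'
ORDER BY t.modified_at DESC;

If the metadata table keeps one row per change (see ob_modified_date), deduplicate to the latest row per subscription before joining to avoid duplicate matches.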