Understanding Data Pipeline Scheduling

Data pipeline schedules are driven by integration behaviors for both sources and destinations

Written by Openbridge Support
Updated over a week ago

Once you've initiated a data pipeline, the initial sync may take 24-48 hours. However, this timeline can vary significantly depending on how quickly the source system generates data. For instance, syncing Amazon Pay Settlement Reports could take several weeks.

Data Pipeline Schedule Classifications

There are four primary classifications of schedules:

1. Daily

2. Hourly

3. Lookback

4. Historical

Daily Schedules

Daily schedules request data using a pre-defined offset that aligns with when the source makes data available. For instance, for Amazon, we use a -1 offset: on 1/6/2021, we request the data for 1/5/2021. This type of schedule is predominantly used for reporting and insights APIs.
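
As an illustration of the offset, here is a minimal sketch of how a run date maps to a requested report date. The function name daily_request_date is hypothetical and is not part of any Openbridge API, and the example assumes 1/6/2021 means January 6, 2021.

```python
from datetime import date, timedelta


def daily_request_date(run_date: date, offset_days: int = -1) -> date:
    """Return the report date requested by a daily run on run_date.

    offset_days is the pre-defined offset; -1 means "yesterday's data".
    """
    return run_date + timedelta(days=offset_days)


# A run on January 6, 2021 with a -1 offset requests January 5, 2021.
requested = daily_request_date(date(2021, 1, 6))
print(requested.isoformat())
```

A source that publishes data later would simply use a larger negative offset, such as -2.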

Hourly Schedules

Hourly schedules request data every hour, with each request covering a one-hour window (e.g., from 1 AM to 2 AM, 2 AM to 3 AM, and so on). This schedule type is particularly common for transactional systems like orders or shipping data APIs and is often described as "real-time" or "near real-time."
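
The hourly windows can be sketched as consecutive, non-overlapping one-hour ranges. This is only an illustration; hourly_windows is a hypothetical helper, and real connectors may define window boundaries differently.

```python
from datetime import datetime, timedelta


def hourly_windows(start: datetime, count: int) -> list[tuple[datetime, datetime]]:
    """Build `count` back-to-back one-hour (start, end) windows from `start`."""
    windows = []
    for i in range(count):
        window_start = start + timedelta(hours=i)
        windows.append((window_start, window_start + timedelta(hours=1)))
    return windows


# Windows for 1 AM to 2 AM and 2 AM to 3 AM on an arbitrary example date.
example_windows = hourly_windows(datetime(2021, 1, 6, 1), 2)
for window_start, window_end in example_windows:
    print(window_start.strftime("%H:%M"), "to", window_end.strftime("%H:%M"))
```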

Lookback Schedules

Lookback schedules retrieve data for previous dates and often run alongside the standard daily processes. Certain data sources update past dates with new data; for example, Amazon may update impression counts for an earlier date. Lookbacks are frequently seen in ad platforms, which permit changes to performance attribution metrics such as sales, impressions, and clicks.
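
A lookback can be pictured as the daily request plus a re-request of several earlier dates. The sketch below is an assumption-laden illustration: lookback_dates is a hypothetical helper, and the lookback window size (3 days here) is an example value, not a documented setting.

```python
from datetime import date, timedelta


def lookback_dates(run_date: date, offset_days: int = -1, lookback_days: int = 3) -> list[date]:
    """Return the daily report date followed by `lookback_days` earlier dates.

    lookback_days is a hypothetical window size chosen for illustration.
    """
    newest = run_date + timedelta(days=offset_days)
    return [newest - timedelta(days=i) for i in range(lookback_days + 1)]


# A run on January 6, 2021 requests January 5 and re-requests the 3 days before it.
dates_to_request = lookback_dates(date(2021, 1, 6))
print([d.isoformat() for d in dates_to_request])
```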

Historical Schedules & Requests

Historical schedules aim to reconstruct a by-date snapshot of data from a source. For example, a request on July 1 for data from the past 180 days would create daily API requests for each of those 180 days. If the connector manages five reports, this results in 900 reports (180 x 5). Because a single report can require more than one API call (for example, to request it, check its status, and download it), the total number of API requests could surpass 3,000 to recreate 180 days of data.
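
The arithmetic above can be sketched as follows. The number of API calls per report is an assumption made for illustration (4 here); the actual figure depends on the source API and is not stated in this article.

```python
def historical_request_estimate(days: int, reports_per_day: int, calls_per_report: int) -> tuple[int, int]:
    """Return (total reports, total API calls) for a historical backfill.

    calls_per_report is a hypothetical value; it varies by source API.
    """
    total_reports = days * reports_per_day
    total_calls = total_reports * calls_per_report
    return total_reports, total_calls


# 180 days x 5 reports = 900 reports; at an assumed 4 calls each, 3,600 API calls.
total_reports, total_calls = historical_request_estimate(180, 5, 4)
print(total_reports, total_calls)
```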

Because historical requests can be demanding, they are scheduled to run as separate processes to minimize interference with daily, hourly, or lookback schedules. These jobs are optimized to run as long-lasting background tasks, requesting data for past dates as API capacity allows. For example, recreating a year's worth of data for Amazon retail reports could require almost 10,000 API requests and might take up to 4 weeks to complete, assuming an API limit of 60 requests per hour.

Because historical data is reconstructed one daily snapshot at a time, these requests can often take days or even weeks to complete.
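
To get a rough feel for these timelines, the sketch below estimates how long a backfill takes under an hourly API limit. The capacity_share parameter is a hypothetical illustration of a historical job receiving only part of the API quota because it runs alongside other schedules; the 25% figure is an assumption, not a documented value.

```python
import math


def backfill_duration_hours(total_requests: int, requests_per_hour: int, capacity_share: float = 1.0) -> int:
    """Estimate whole hours needed to issue total_requests.

    capacity_share is the assumed fraction of the hourly API limit
    available to the historical job (1.0 means the full limit).
    """
    effective_rate = requests_per_hour * capacity_share
    return math.ceil(total_requests / effective_rate)


# Best case: all 60 requests per hour go to the backfill.
best_case_hours = backfill_duration_hours(10_000, 60)
# Assumed case: only a quarter of the limit is available to the backfill.
shared_hours = backfill_duration_hours(10_000, 60, capacity_share=0.25)

print(best_case_hours, "hours, about", round(best_case_hours / 24, 1), "days")
print(shared_hours, "hours, about", round(shared_hours / 24, 1), "days")
```

Under these assumptions, the best case is roughly one week, while sharing capacity with other schedules stretches the job toward four weeks.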

Further Information on Schedules & Timing

For a deeper understanding of Data Source and Destination automation and timing, please refer to our article, Key Considerations For Data Source and Destination Automation Timing.
