Key Considerations For Data Source and Destination Automation Timing

Understanding how to set custom job schedules for your data pipelines

Written by Openbridge Support
Updated over 2 weeks ago

It is essential to understand that APIs are not unlimited resources; they have certain restrictions. Accordingly, connectors are engineered to optimize data availability and quality. Several elements shape a data pipeline schedule, affecting when sync jobs start and how frequently they happen.

Custom Job Scheduling

⚠️ Advanced Scheduling Note: Changing job times without factoring in API-specific limits, retry windows, or data-settling behavior is risky. These adjustments are considered an advanced function and should only be made if you fully understand the upstream system’s constraints. Just because you have a preference for a job runtime does not mean the upstream API will accept that request.

Every source system (Amazon, Facebook, Google Ads, etc.) has its own API limits, retry behavior, and data-settling rules. Overlapping or poorly timed requests increase the risk of empty data, no data, or request cancellation. For example, the Amazon Reporting API enforces a “one request per 4 hours” limit on certain FBA reports.

If multiple tools (or multiple schedules within the same account) request the same report too close together, the later requests are cancelled. For example, suppose you set job schedules at 00:30 UTC, 01:00 UTC, 01:30 UTC, and 02:00 UTC. One or more of these requests is likely to fail given Amazon's guidance. A better schedule would be 00:30 UTC, 05:00 UTC, 10:30 UTC, and 15:00 UTC, as it spreads the requests over a period that aligns with the documented limit. (See "Limits and Throttles" below for more examples.)
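
As a rough illustration, here is a minimal sketch that checks whether a set of daily run times respects a minimum spacing. The schedule_respects_spacing helper and the four-hour constant are illustrative assumptions, not an Openbridge or Amazon API, and the sketch ignores the wrap-around gap between the last run of one day and the first run of the next.

    from datetime import datetime, timedelta

    # Illustrative minimum spacing, mirroring the "one request per 4 hours"
    # guidance for certain Amazon FBA reports.
    MIN_SPACING = timedelta(hours=4)

    def schedule_respects_spacing(run_times_utc, min_spacing=MIN_SPACING):
        """Return True if consecutive daily run times are at least min_spacing apart."""
        times = sorted(datetime.strptime(t, "%H:%M") for t in run_times_utc)
        return all(b - a >= min_spacing for a, b in zip(times, times[1:]))

    # Too tight: requests 30 minutes apart are likely to be cancelled.
    print(schedule_respects_spacing(["00:30", "01:00", "01:30", "02:00"]))  # False

    # Spread out: aligns with the documented limit.
    print(schedule_respects_spacing(["00:30", "05:00", "10:30", "15:00"]))  # True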

Similarly, requesting data immediately after a reporting period closes may yield incomplete results because the data has not fully “settled.” For example, scheduling a request one minute after midnight (00:01 UTC), and expecting the source system to have aggregated the prior day's report within 60 seconds of the period close, will likely yield partial data, no data, or an empty report (a file with a header but no records).
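
One way to reason about settling is to treat the earliest "safe" request time as the period close plus a buffer. The sketch below assumes a two-hour buffer, mirroring the 1–2 hour settling window noted in the summary; the right value depends entirely on the source system.

    from datetime import datetime, timedelta, timezone

    # Assumed settling buffer; adjust to the source system's actual behavior.
    SETTLING_BUFFER = timedelta(hours=2)

    def earliest_request_time(period_close_utc, buffer=SETTLING_BUFFER):
        """Earliest time we would ask the source for the just-closed period."""
        return period_close_utc + buffer

    midnight = datetime(2023, 10, 20, 0, 0, tzinfo=timezone.utc)
    print(earliest_request_time(midnight))  # 2023-10-20 02:00:00+00:00, not 00:01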

Guidelines for Setting Custom Job Schedules

Here are the five critical factors that determine the timing of data pipeline syncs from the source to the destination:

1. Limits & Throttles

2. Scheduling

3. Constraints

4. Availability

5. Errors

Limits & Throttles

All source systems have defined API limits, throttles, and data availability restrictions. Similarly, data destinations have loading limits and throttles that can have cost implications. For instance, Snowflake bills for compute time, so hourly data loads will add to your charges.

All sources and destinations have concurrency limits, i.e., caps on the number of simultaneous requests. For example, Amazon will reject multiple requests for a report within the same hour when API restrictions allow only one request per hour for that report.

Openbridge adopts a cautious API request strategy, ensuring we comply with established rate limits and throttling constraints. We carefully model how each endpoint is requested and re-requested so we stay well within API rate limits.

If you have another app, tool, or process requesting data every 30 minutes (App A), and you schedule Openbridge (App B) to request the same report, there is a significant chance the Openbridge request will be blocked or cancelled because App A is over-consuming the API's resources.
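
When a request is blocked or cancelled, spacing out retries helps avoid compounding the collision. The sketch below is illustrative only: request_report and ReportCancelled are hypothetical stand-ins for a report call and its failure mode, not an Openbridge or Amazon interface.

    import random
    import time

    class ReportCancelled(Exception):
        """Hypothetical error raised when the source cancels or blocks a request."""

    def request_with_backoff(request_report, max_attempts=5, base_delay=60):
        for attempt in range(max_attempts):
            try:
                return request_report()
            except ReportCancelled:
                # Exponential backoff with jitter spreads retries apart so we
                # do not keep colliding with the other app's schedule.
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 30))
        raise RuntimeError("Report request still cancelled after retries")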

Hence, Openbridge's data pipeline scheduling aligns strictly with each system's policies and rules. For example, Amazon Seller Central allows only one daily sync for Referral Fees reports, whereas hourly syncs are permissible for Order transactions.

Ultimately, the availability of data via the API is determined and controlled by each unique data source, not Openbridge.

Scheduling

Openbridge dynamically establishes a default schedule based on the data source and the number of existing data pipelines when creating a new one. For example, if you already have 5 data pipelines for an Amazon Advertising profile, we'll set a dynamic schedule when adding a sixth pipeline, ensuring compliance with the data source API limits. Otherwise, the simultaneous operation of all six pipelines might breach Amazon's limits, leading to failed syncs.

When setting custom schedules, avoid scheduling pipelines for the same account (e.g., Seller, Vendor, Advertiser) at the same time. For example, if you have three pipelines for the same Seller account and give them identical custom job schedules, all of their report requests (nine in this example) happen at the exact same time. This will likely trigger API limits.

A better approach is to stagger your custom schedule to help avoid request collisions.
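
As a simple illustration of staggering, the sketch below spreads N pipelines for the same account evenly across the day. The generated times are only a starting point; they do not guarantee compliance with any specific API's limits.

    # Spread pipeline run times evenly across a window to avoid collisions.
    def staggered_schedule(pipeline_count, start_hour=0, window_hours=24):
        step = window_hours / pipeline_count
        return [f"{int(start_hour + i * step) % 24:02d}:00 UTC" for i in range(pipeline_count)]

    print(staggered_schedule(3))  # ['00:00 UTC', '08:00 UTC', '16:00 UTC']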

Constraints

When Openbridge initiates a data pipeline job, we don't have direct control over the completion time or when data is loaded. The job completion, including the time to load extracted data into your destination, depends on two external factors:

1. The data source's response to our requests for data

2. The availability of the target data destination

In the case of data source delays, system outages, authorization issues, or temporary account blocks, Openbridge will re-queue the requests for later processing.

Various destinations have different loading capacities. For instance, if you're running multiple Redshift processes for data analytics, our data loading requests get queued within Redshift's system. This delays loading, as our ability to write data depends on the destination's capacity and availability.

Under such circumstances, Openbridge will queue our requests to a destination to manage the back-pressure from destination load capacity limits. Delays due to these reasons can range from a few minutes to several days in the worst cases.
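
Conceptually, this re-queueing behaves like the sketch below: if the destination cannot accept a load right now, the job goes back on the queue and is retried later. load_to_destination and DestinationBusy are illustrative names, not Openbridge APIs.

    import time
    from collections import deque

    class DestinationBusy(Exception):
        """Hypothetical error raised when the destination cannot accept a load."""

    def drain(jobs, load_to_destination, retry_delay=300):
        queue = deque(jobs)
        while queue:
            job = queue.popleft()
            try:
                load_to_destination(job)
            except DestinationBusy:
                queue.append(job)       # re-queue the job and try again later
                time.sleep(retry_delay)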

Availability

Different sources provide data at various intervals, typically when the data has "settled" or been packaged for delivery via the API. For instance, when requesting data for an Amazon Retail report on 10/20, we ask for the "settled" data from 10/19, ensuring a complete data day. If we request the same day's data, Amazon might reject it or deliver incorrect or incomplete results, as the data may not have settled. Therefore, our data pipeline schedules reflect requests we know the source system can fulfill.
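
In other words, a pipeline run targets the most recent settled day rather than the current, still-changing day, roughly as in this minimal sketch:

    from datetime import date, timedelta

    def settled_report_date(run_date):
        """Most recent fully settled day: the day before the run date."""
        return run_date - timedelta(days=1)

    print(settled_report_date(date(2023, 10, 20)))  # 2023-10-19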

Errors

Errors during a data pipeline can occur for various reasons, such as:

1. Authorization and credential issues

2. API limits and throttles

3. Connection failures and errors

4. Source or destination outages

Openbridge verifies user permissions and account configuration during and after activation. Insufficient permissions, changes made after pipeline setup, or improper source configuration can cause a pipeline to fail.

Besides documented API limits, there might be undocumented limits or throttles. For example, although the Amazon API states you can make 60 requests an hour, some reports can only be requested once daily.

As mentioned earlier, customers might have another app or tool exhausting API capacity, which could impact our ability to collect data on your behalf. For instance, Amazon may report NO_DATA or CANCELLED in response to our API request. You should review any other apps connecting to the data source API.

Summary

  • Know the limits — Some APIs allow frequent queries; others restrict certain reports to once per day or per several hours.

  • Allow for retries — Build in buffer time between your scheduled request and your reporting deadline, so failed requests have time to be retried.

  • Avoid collisions — If multiple apps or integrations hit the same report, coordinate schedules to prevent conflicts.

  • Respect data-settling windows — Many APIs finalize prior-day data 1–2 hours after midnight UTC (or equivalent cutoff). Requesting too early may return partial, incomplete, or no results.

Additional Information For Data Pipeline Scheduling

For additional information on scheduling, timing, and automation, see our article Understanding Data Pipeline Scheduling.

