Airbridge orchestrates data ingestion pipelines from Airbyte data sources like Stripe, Facebook, or Google to Airbyte data destination connectors like an S3 data lake, Redshift, Snowflake, or BigQuery.
Airbridge is a configuration-driven Airbyte connector service. It is open source, released under the MIT license. For more detail on Airbridge, visit the project on GitHub: https://github.com/openbridge/airbridge
For docs on activating Airbridge and Airbyte on your AWS account, please see Activating Airbridge In Your AWS Account.
Understanding Airbyte Connector Configurations
A core feature of an open source Airbyte Cloud runtime for a data pipeline is the "connector." Airbyte has two classes of connectors: data source and data destination. These connectors act as a bridge, facilitating data movement between various platforms. Both the source and the destination require "configs" to connect, authorize, and process data.
Airbyte Data Source Config
The term "data source" in the context of Airbyte refers to the original location or platform from which data is extracted. This could be a CRM system, web analytics, databases, or any other platform where raw data is generated and stored.
Here is an example of a Klaviyo source config, which requires an API key and a start date:
{
  "api_key": "required_value",
  "start_date": "required_value"
}
However, the Salesforce source config has a different set of requirements:
{
  "is_sandbox": "optional_value",
  "auth_type": "optional_value",
  "client_id": "required_value",
  "client_secret": "required_value",
  "refresh_token": "required_value",
  "start_date": "optional_value",
  "force_use_bulk_api": "optional_value",
  "streams_criteria": "optional_value"
}
The respective configs are defined by the source itself.
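Because each source defines its own required and optional keys, it can help to check a filled-in config before a pipeline runs. The following is a minimal Python sketch, not part of Airbridge or Airbyte. The helper names `required_keys` and `missing_keys` are hypothetical, and the sketch assumes a template that uses the "required_value" and "optional_value" placeholders shown above.

```python
from typing import Dict, List

# Template copied from the Salesforce example above. The placeholder
# strings mark each key as required or optional.
SALESFORCE_TEMPLATE: Dict[str, str] = {
    "is_sandbox": "optional_value",
    "auth_type": "optional_value",
    "client_id": "required_value",
    "client_secret": "required_value",
    "refresh_token": "required_value",
    "start_date": "optional_value",
    "force_use_bulk_api": "optional_value",
    "streams_criteria": "optional_value",
}


def required_keys(template: Dict[str, str]) -> List[str]:
    """Return the top-level keys marked "required_value" in a template."""
    return [key for key, value in template.items() if value == "required_value"]


def missing_keys(template: Dict[str, str], config: Dict[str, object]) -> List[str]:
    """Return required keys that are absent or empty in a filled-in config."""
    return [key for key in required_keys(template) if not config.get(key)]


# Hypothetical, incomplete config: the refresh token has not been supplied.
config = {"client_id": "my-client-id", "client_secret": "my-client-secret"}
print(missing_keys(SALESFORCE_TEMPLATE, config))  # ['refresh_token']
```

A check like this only confirms that required keys are present. Whether the values are valid is still decided by the source connector itself.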
Scheduling and Timing When Collecting Data From A Source
While you may want "live", real-time data, the data source system defines when data is available and how frequently it can be requested.
Scheduling defines the cadence at which the data pipeline will run. Data source APIs are not unlimited resources; they have restrictions. If you set a schedule that exceeds the capacity, the source API will block, throttle, or fail your requests. A schedule should only be based on recommended frequencies for a given data source.
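To make the relationship between API limits and schedule cadence concrete, here is a rough Python sketch. It assumes a simple model in which a source allows a fixed number of requests per day and each pipeline run uses a known number of requests. Both numbers and the function name `min_interval_minutes` are hypothetical. Real sources often apply per-second or per-minute limits as well, so the source's own documentation should take precedence.

```python
import math


def min_interval_minutes(daily_request_quota: int, requests_per_run: int) -> int:
    """Estimate the shortest schedule interval that stays within a daily quota.

    Assumes a flat per-day request quota, which is a simplification.
    """
    if daily_request_quota <= 0 or requests_per_run <= 0:
        raise ValueError("quota and requests per run must be positive")
    runs_per_day = daily_request_quota // requests_per_run
    if runs_per_day == 0:
        raise ValueError("a single run would exceed the daily quota")
    # Spread the allowed runs evenly across the 1,440 minutes in a day.
    return math.ceil(24 * 60 / runs_per_day)


# Hypothetical numbers: 10,000 requests per day, about 1,500 requests per run.
print(min_interval_minutes(10_000, 1_500))  # 240
```

In this example six runs fit inside the quota, so the pipeline should run no more often than every 240 minutes.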
For more information on scheduling, see Understanding Data Source APIs.
Airbyte Source Catalogs
One of the key components of Airbyte's configuration is the catalog.json file, which defines the streams (i.e., tables, collections, or any other structured data) that a specific source connector can read from. This file plays an integral role in the ETL (Extract, Transform, Load) process, as it defines the schema for the data that will be extracted from a source and loaded into a destination.
To find the catalog.json, navigate to the respective source on GitHub. For example, if you were interested in Chargebee, go to source-chargebee/integration_tests/. In that folder, you would find the configured_catalog.json. Typically, no changes are needed to the catalog unless you are familiar with customizing this file. For most users, leaving it as-is is sufficient.
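To see what a configured catalog contains before deciding whether to customize it, you can list its streams. The Python sketch below parses a small sample that follows the general shape of an Airbyte configured catalog (a "streams" list whose entries carry a "stream" definition plus sync settings). The stream names, the empty schemas, and the `list_streams` helper are illustrative assumptions and are not copied from the Chargebee connector.

```python
import json
from typing import List, Tuple

# Hypothetical sample in the general shape of a configured_catalog.json.
SAMPLE_CATALOG = """
{
  "streams": [
    {
      "stream": {
        "name": "customer",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "append"
    },
    {
      "stream": {
        "name": "invoice",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh"]
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}
"""


def list_streams(catalog_text: str) -> List[Tuple[str, str]]:
    """Return (stream name, sync mode) pairs from a configured catalog."""
    catalog = json.loads(catalog_text)
    return [(entry["stream"]["name"], entry["sync_mode"]) for entry in catalog["streams"]]


print(list_streams(SAMPLE_CATALOG))
# [('customer', 'incremental'), ('invoice', 'full_refresh')]
```

For a real connector, read the text from the downloaded configured_catalog.json instead of the inline sample.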
Airbyte Data Destinations Config
The term "data destination" in the context of Airbyte Cloud refers to where the integrated data will land. This could be a specific cloud database, data warehouse, data lake, or another storage platform.
Like sources, destinations also have configs. The following is an example config for Amazon Web Services S3:
{
  "access_key_id": "optional_value",
  "secret_access_key": "optional_value",
  "s3_bucket_name": "required_value",
  "s3_bucket_path": "required_value",
  "s3_bucket_region": "required_value",
  "format": {
    "format_type": "optional_value",
    "compression_codec": "optional_value",
    "flattening": "optional_value",
    "compression": {
      "compression_type": "optional_value"
    },
    "block_size_mb": "optional_value",
    "max_padding_size_mb": "optional_value",
    "page_size_kb": "optional_value",
    "dictionary_page_size_kb": "optional_value",
    "dictionary_encoding": "optional_value"
  },
  "s3_endpoint": "optional_value",
  "s3_path_format": "optional_value",
  "file_name_pattern": "optional_value"
}
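Destination templates like the one above can be nested, so a quick way to see the minimum you must supply is to strip out everything marked optional. The following Python sketch does that recursively. The `prune_optional` helper is hypothetical, the template is a trimmed copy of the S3 example, and the sketch assumes the "optional_value" placeholder convention used in this article.

```python
from typing import Any, Dict

# Trimmed copy of the S3 destination template shown above.
S3_TEMPLATE: Dict[str, Any] = {
    "access_key_id": "optional_value",
    "secret_access_key": "optional_value",
    "s3_bucket_name": "required_value",
    "s3_bucket_path": "required_value",
    "s3_bucket_region": "required_value",
    "format": {
        "format_type": "optional_value",
        "compression": {"compression_type": "optional_value"},
    },
    "s3_endpoint": "optional_value",
}


def prune_optional(template: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the template without keys marked "optional_value".

    Nested objects are pruned recursively and dropped if nothing remains.
    """
    pruned: Dict[str, Any] = {}
    for key, value in template.items():
        if isinstance(value, dict):
            nested = prune_optional(value)
            if nested:
                pruned[key] = nested
        elif value != "optional_value":
            pruned[key] = value
    return pruned


print(prune_optional(S3_TEMPLATE))
# {'s3_bucket_name': 'required_value', 's3_bucket_path': 'required_value', 's3_bucket_region': 'required_value'}
```

The result is a starting skeleton only. Optional keys such as the output format are often still worth setting explicitly for a production destination.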
References
The following is a reference collection of data source documentation. This is not meant to be a comprehensive list, merely a waypoint to help get people pointed in the right direction.
Connector Name | Documentation Page |
ActiveCampaign | |
Adjust | |
Aha API | |
Aircall | |
Airtable | |
AlloyDB for PostgreSQL | |
Alpha Vantage | |
Amazon Ads | |
Amazon Seller Partner | |
Amazon SQS | |
Amplitude | |
Apify Dataset | |
Appfollow | |
Apple Search Ads | |
AppsFlyer | |
Appstore | |
Asana | |
Ashby | |
Auth0 | |
AWS CloudTrail | |
Azure Blob Storage | |
Azure Table Storage | |
Babelforce | |
Bamboo HR | |
Baton | |
BigCommerce | |
BigQuery | |
Bing Ads | |
Braintree | |
Braze | |
Breezometer | |
CallRail | |
Captain Data | |
Cart.com | |
Chargebee | |
Chargify | |
Chartmogul | |
ClickHouse | |
ClickUp API | |
Clockify | |
Close.com | |
CockroachDB | |
Coda | |
CoinAPI | |
CoinGecko Coins | |
Coinmarketcap API | |
Commcare | |
Commercetools | |
Configcat API | |
Confluence | |
ConvertKit | |
Convex | |
Copper | |
Courier | |
Customer.io | |
Datadog | |
DataScope | |
Db2 | |
Delighted | |
Dixa | |
Dockerhub | |
Dremio | |
Drift | |
Drupal | |
Display & Video 360 | |
Dynamodb | |
End-to-End Testing Source for Cloud | |
End-to-End Testing Source | |
Elasticsearch | |
EmailOctopus | |
Everhour | |
Exchange Rates API | |
Facebook Marketing | |
Facebook Pages | |
Faker | |
Fastbill | |
Fauna | |
Files (CSV, JSON, Excel, Feather, Parquet) | |
Firebase Realtime Database | |
Firebolt | |
Flexport | |
Freshcaller | |
Freshdesk | |
Freshsales | |
Freshservice | |
FullStory | |
Gainsight-API | |
GCS | |
Genesys | |
getLago API | |
GitHub | |
GitLab | |
Glassfrog | |
GNews | |
GoCardless | |
Gong | |
Google Ads | |
Google Analytics 4 (GA4) | |
Google Analytics (Universal Analytics) | |
Google Directory | |
Google PageSpeed Insights | |
Google Search Console | |
Google Sheets | |
Google-webfonts | |
Google Workspace Admin Reports | |
Greenhouse | |
Gridly | |
Gutendex | |
Harness | |
Harvest | |
HTTP Request | |
Hubplanner | |
HubSpot | |
Insightly | |
Instatus | |
Intercom | |
Intruder.io API | |
Ip2whois API | |
Iterable | |
Jenkins | |
Jira | |
K6 Cloud API | |
Kafka | |
Klarna | |
Klaviyo | |
Kustomer | |
Kyriba | |
Kyve Source | |
Launchdarkly API | |
Lemlist | |
Lever Hiring | |
LinkedIn Ads | |
LinkedIn Pages | |
Linnworks | |
Lokalise | |
Looker | |
Magento | |
Mailchimp | |
MailerLite | |
Mailersend | |
MailGun | |
Mailjet - Mail API | |
Mailjet - SMS API | |
Marketo | |
Merge | |
Metabase | |
Microsoft Dataverse | |
Microsoft Dynamics AX | |
Microsoft Dynamics Customer Engagement | |
Microsoft Dynamics GP | |
Microsoft Dynamics NAV | |
Microsoft Teams | |
Mixpanel | |
Monday | |
Mongo DB | |
Microsoft SQL Server (MSSQL) | |
My Hours | |
MySQL | |
N8n | |
NASA | |
Oracle Netsuite | |
News API | |
Newsdata API | |
Notion | |
New York Times | |
Okta | |
Omnisend | |
OneSignal | |
Open Exchange Rates | |
OpenWeather | |
Opsgenie | |
Oracle Peoplesoft | |
Oracle Siebel CRM | |
Oracle DB | |
Orb | |
Orbit | |
Oura | |
Outreach | |
PagerDuty | |
Pardot | |
Partnerstack | |
Paypal Transaction | |
Paystack | |
Pendo | |
PersistIq | |
Pexels-API | |
Pipedrive | |
Pivotal Tracker | |
Plaid | |
Plausible | |
PokéAPI | |
Polygon Stock API | |
Postgres | |
PostHog | |
Postmarkapp | |
PrestaShop | |
Primetric | |
Public APIs | |
Punk-API | |
PyPI | |
Qonto | |
Qualaroo | |
QuickBooks | |
Railz | |
RD Station Marketing | |
Recharge | |
Recreation.gov API | |
Recruitee | |
Recurly | |
Redshift | |
Reply.io | |
Retently | |
RingCentral | |
Robert Koch-Institut Covid | |
Rocket.chat API | |
RSS | |
S3 | |
Salesforce | |
Salesloft | |
SAP Business One | |
sap-fieldglass | |
SearchMetrics | |
Secoda API | |
Sendgrid | |
Sendinblue API | |
Senseforce | |
Sentry | |
SFTP Bulk | |
SFTP | |
Shopify | |
Shortio | |
Slack | |
Smaily | |
SmartEngage | |
Smartsheets | |
Snapchat Marketing | |
Snowflake | |
Sonar Cloud API | |
SpaceX-API | |
Spree Commerce | |
Square | |
Statuspage.io API | |
Strava | |
Stripe | |
Sugar CRM | |
SurveySparrow | |
SurveyCTO | |
SurveyMonkey | |
Talkdesk Explore | |
Tempo | |
Teradata | |
The Guardian API | |
TiDB | |
TikTok Marketing | |
Timely | |
TMDb | |
Todoist | |
Toggl API | |
TPL/3PL Central | |
Trello | |
TrustPilot | |
TVMaze Schedule | |
Twilio Taskrouter | |
Twilio | |
Tyntec SMS | |
Typeform | |
Unleash | |
US Census API | |
Vantage API | |
VictorOps | |
Visma e-conomic | |
Vittally | |
Waiteraid | |
Weatherstack | |
Webflow | |
Whisky Hunter | |
Wikipedia Pageviews | |
WooCommerce | |
Wordpress | |
Workable | |
Workramp | |
Wrike | |
Xero | |
XKCD | |
Yahoo Finance Price | |
Yandex Metrica | |
Yotpo | |
Younium | |
YouTube Analytics | |
Zapier Supported Storage | |
Zencart | |
Zendesk Chat | |
Zendesk Sell | |
Zendesk Sunshine | |
Zendesk Support | |
Zendesk Talk | |
Zenefits | |
Zenloop | |
Zoho CRM | |
Zoom | |
Zuora |