Airbridge: Overview Of Airbyte Data Flows
Written by Openbridge Support
Updated over a year ago

Airbridge orchestrates data ingestion pipelines from Airbyte source connectors, such as Stripe, Facebook, or Google, to Airbyte destination connectors, such as an S3 data lake, Redshift, Snowflake, or BigQuery.

Airbridge is a configuration-driven Airbyte connector service. It is open source, released under the MIT license. For more detail on Airbridge, visit the project on GitHub: https://github.com/openbridge/airbridge

For docs on activating Airbridge and Airbyte on your AWS account, please see Activating Airbridge In Your AWS Account.

Understanding Airbyte Connector Configurations

A core feature of an open source Airbyte Cloud runtime for a data pipeline is the "connector." Airbyte has two classes of connectors: data source and data destination. These connectors act as a bridge, facilitating data movement between platforms. Both the source and the destination require "configs" to connect, authorize, and process data.

Airbyte Data Source Config

The term "data source" in the context of Airbyte refers to the original location or platform from which data is extracted. This could be a CRM system, web analytics, databases, or any other platform where raw data is generated and stored.

Here is an example of a Klaviyo source config, which requires an API key and a start date:

{
  "api_key": "required_value",
  "start_date": "required_value"
}

However, you can see how the requirements vary in the Salesforce source config:

{
  "is_sandbox": "optional_value",
  "auth_type": "optional_value",
  "client_id": "required_value",
  "client_secret": "required_value",
  "refresh_token": "required_value",
  "start_date": "optional_value",
  "force_use_bulk_api": "optional_value",
  "streams_criteria": "optional_value"
}


The respective configs are defined by the source itself.
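Because each source defines its own required and optional fields, it can help to check a config before handing it to a connector. The following is a minimal sketch, not part of Airbridge itself: it treats the Salesforce template above as a checklist, where "required_value" and "optional_value" are this article's placeholders. The function name `missing_required` and the sample values in `my_config` are hypothetical.

```python
import json

# Template copied from the Salesforce example above. The "required_value" and
# "optional_value" strings are placeholders from this article, not real values.
SALESFORCE_TEMPLATE = json.loads("""
{
  "is_sandbox": "optional_value",
  "auth_type": "optional_value",
  "client_id": "required_value",
  "client_secret": "required_value",
  "refresh_token": "required_value",
  "start_date": "optional_value",
  "force_use_bulk_api": "optional_value",
  "streams_criteria": "optional_value"
}
""")


def missing_required(template: dict, config: dict) -> list[str]:
    """Return the required keys from the template that are absent or empty in config."""
    return [
        key
        for key, marker in template.items()
        if marker == "required_value" and config.get(key) in (None, "")
    ]


# Hypothetical config that forgot the refresh token.
my_config = {
    "client_id": "example-client-id",
    "client_secret": "example-client-secret",
    "start_date": "2023-01-01T00:00:00Z",
}

print(missing_required(SALESFORCE_TEMPLATE, my_config))  # ['refresh_token']
```

A check like this only confirms that required keys are present. It does not confirm that the values are valid for the source.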

Scheduling and Timing When Collecting Data From A Source

While you may want "live", real-time data, the data source system defines when data is available and how often it can be requested.

Scheduling defines the cadence at which the data pipeline will run. Data source APIs are not unlimited resources; they have restrictions. If you set a schedule that exceeds a source API's capacity, the API will block, throttle, or fail your requests. A schedule should only be based on the recommended frequencies for a given data source.
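One way to picture this constraint is a guard that refuses to start a run until a minimum interval has passed since the previous one. The sketch below is illustrative only: `MIN_INTERVAL`, `next_allowed_run`, and `should_run` are hypothetical names, and the one-hour value is an assumption. The real recommended frequency comes from the documentation for each data source.

```python
from datetime import datetime, timedelta

# Assumed minimum gap between pipeline runs. Replace with the frequency
# recommended for the specific data source.
MIN_INTERVAL = timedelta(hours=1)


def next_allowed_run(last_run: datetime, min_interval: timedelta = MIN_INTERVAL) -> datetime:
    """Earliest time the next run may start without exceeding the assumed cadence."""
    return last_run + min_interval


def should_run(now: datetime, last_run: datetime, min_interval: timedelta = MIN_INTERVAL) -> bool:
    """True when enough time has passed since the last run."""
    return now >= next_allowed_run(last_run, min_interval)


last_run = datetime(2024, 1, 1, 9, 0)
print(should_run(datetime(2024, 1, 1, 9, 30), last_run))  # False
print(should_run(datetime(2024, 1, 1, 10, 0), last_run))  # True
```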

For more information on scheduling, see Understanding Data Source APIs.

Airbyte Source Catalogs

One of the key components of Airbyte's configuration is the catalog.json file, which defines the streams (i.e., tables, collections, or any other structured data) that a specific source connector can read from. This file plays an integral role in the ETL (Extract, Transform, Load) process, as it defines the schema for the data that will be extracted from a source and loaded into a destination.

To find the catalog.json, navigate to the respective source on GitHub. For example, if you were interested in Chargebee, you would go to source-chargebee/integration_tests/. In that folder, you would find the configured_catalog.json. Typically, no changes are needed to the catalog unless you are familiar with customizing this file. For most users, leaving it as-is is sufficient.
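To get a feel for what a configured catalog contains before deciding whether to customize it, you can load it and list its streams. The sketch below assumes the general configured-catalog shape from the Airbyte protocol, a top-level "streams" list where each entry wraps a "stream" object plus sync settings. The stream names and schemas are made up for illustration and are not taken from the Chargebee connector, and `summarize_streams` is a hypothetical helper.

```python
import json

# Illustrative configured catalog. The stream names and schemas are invented;
# a real file would come from the connector's integration_tests folder.
CONFIGURED_CATALOG = json.loads("""
{
  "streams": [
    {
      "stream": {
        "name": "customer",
        "json_schema": {"type": "object", "properties": {"id": {"type": "string"}}},
        "supported_sync_modes": ["full_refresh", "incremental"]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "append"
    },
    {
      "stream": {
        "name": "invoice",
        "json_schema": {"type": "object", "properties": {"id": {"type": "string"}}},
        "supported_sync_modes": ["full_refresh"]
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}
""")


def summarize_streams(catalog: dict) -> list[tuple[str, str, str]]:
    """Return (stream name, sync mode, destination sync mode) for each configured stream."""
    return [
        (entry["stream"]["name"], entry["sync_mode"], entry["destination_sync_mode"])
        for entry in catalog["streams"]
    ]


for name, sync_mode, dest_mode in summarize_streams(CONFIGURED_CATALOG):
    print(f"{name}: {sync_mode} -> {dest_mode}")
```

To inspect a real file, replace the inline JSON with `json.load()` on the downloaded configured_catalog.json.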

Airbyte Data Destinations Config

The term "data destination" in the context of Airbyte Cloud refers to where the integrated data will land. This could be a specific cloud database, data warehouse, data lake, or another storage platform.

Like sources, destinations also have configs. The following is an example config for Amazon Web Services S3:

{
  "access_key_id": "optional_value",
  "secret_access_key": "optional_value",
  "s3_bucket_name": "required_value",
  "s3_bucket_path": "required_value",
  "s3_bucket_region": "required_value",
  "format": {
    "format_type": "optional_value",
    "compression_codec": "optional_value",
    "flattening": "optional_value",
    "compression": {
      "compression_type": "optional_value"
    },
    "block_size_mb": "optional_value",
    "max_padding_size_mb": "optional_value",
    "page_size_kb": "optional_value",
    "dictionary_page_size_kb": "optional_value",
    "dictionary_encoding": "optional_value"
  },
  "s3_endpoint": "optional_value",
  "s3_path_format": "optional_value",
  "file_name_pattern": "optional_value"
}
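Destination configs can be nested, as the "format" block above shows, so it is useful to see at a glance which settings are required at any depth. The following sketch walks a trimmed copy of the S3 template and reports dotted paths for a given placeholder. The helper name `paths_with_marker` is hypothetical, and the template is shortened here to keep the example brief.

```python
# Trimmed copy of the S3 destination template above. The placeholder strings
# are from this article; several optional "format" keys are omitted for brevity.
S3_TEMPLATE = {
    "access_key_id": "optional_value",
    "secret_access_key": "optional_value",
    "s3_bucket_name": "required_value",
    "s3_bucket_path": "required_value",
    "s3_bucket_region": "required_value",
    "format": {
        "format_type": "optional_value",
        "compression": {"compression_type": "optional_value"},
    },
}


def paths_with_marker(template: dict, marker: str, prefix: str = "") -> list[str]:
    """Recursively collect dotted key paths whose value equals the given marker."""
    paths = []
    for key, value in template.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into nested sections such as "format".
            paths.extend(paths_with_marker(value, marker, prefix=f"{path}."))
        elif value == marker:
            paths.append(path)
    return paths


print(paths_with_marker(S3_TEMPLATE, "required_value"))
# ['s3_bucket_name', 's3_bucket_path', 's3_bucket_region']
```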


References

The following is a reference collection of data source documentation. This is not meant to be a comprehensive list, merely a waypoint to help get people pointed in the right direction.

Connector Name

Documentation Page

Postgres

ActiveCampaign

Adjust

Aha API

Aircall

Airtable

AlloyDB for PostgreSQL

Alpha Vantage

Amazon Ads

Amazon Seller Partner

Amazon SQS

Amplitude

Apify Dataset

Appfollow

Apple Search Ads

AppsFlyer

Appstore

Asana

Ashby

Auth0

AWS CloudTrail

Azure Blob Storage

Azure Table Storage

Babelforce

Bamboo HR

Baton

BigCommerce

BigQuery

Bing Ads

Braintree

Braze

Breezometer

CallRail

Captain Data

Cart.com

Chargebee

Chargify

Chartmogul

ClickHouse

ClickUp API

Clockify

Close.com

CockroachDB

Coda

CoinAPI

CoinGecko Coins

Coinmarketcap API

Commcare

Commercetools

Configcat API

Confluence

ConvertKit

Convex

Copper

Courier

Customer.io

Datadog

DataScope

Db2

Delighted

Dixa

Dockerhub

Dremio

Drift

Drupal

Display & Video 360

Dynamodb

End-to-End Testing Source for Cloud

End-to-End Testing Source

Elasticsearch

EmailOctopus

Everhour

Exchange Rates API

Facebook Marketing

Facebook Pages

Faker

Fastbill

Fauna

Files (CSV, JSON, Excel, Feather, Parquet)

Firebase Realtime Database

Firebolt

Flexport

Freshcaller

Freshdesk

Freshsales

Freshservice

FullStory

Gainsight-API

GCS

Genesys

getLago API

GitHub

GitLab

Glassfrog

GNews

GoCardless

Gong

Google Ads

Google Analytics 4 (GA4)

Google Analytics (Universal Analytics)

Google Directory

Google PageSpeed Insights

Google Search Console

Google Sheets

Google-webfonts

Google Workspace Admin Reports

Greenhouse

Gridly

Gutendex

Harness

Harvest

HTTP Request

Hubplanner

HubSpot

Insightly

Instagram

Instatus

Intercom

Intruder.io API

Ip2whois API

Iterable

Jenkins

Jira

K6 Cloud API

Kafka

Klarna

Klaviyo

Kustomer

Kyriba

Kyve Source

Launchdarkly API

Lemlist

Lever Hiring

LinkedIn Ads

LinkedIn Pages

Linnworks

Lokalise

Looker

Magento

Mailchimp

MailerLite

Mailersend

MailGun

Mailjet - Mail API

Mailjet - SMS API

Marketo

Merge

Metabase

Microsoft Dataverse

Microsoft Dynamics AX

Microsoft Dynamics Customer Engagement

Microsoft Dynamics GP

Microsoft Dynamics NAV

Microsoft Teams

Mixpanel

Monday

Mongo DB

Microsoft SQL Server (MSSQL)

My Hours

MySQL

N8n

NASA

Oracle Netsuite

News API

Newsdata API

Notion

New York Times

Okta

Omnisend

OneSignal

Open Exchange Rates

OpenWeather

Opsgenie

Oracle Peoplesoft

Oracle Siebel CRM

Oracle DB

Orb

Orbit

Oura

Outreach

PagerDuty

Pardot

Partnerstack

Paypal Transaction

Paystack

Pendo

PersistIq

Pexels-API

Pinterest

Pipedrive

Pivotal Tracker

Plaid

Plausible

Pocket

PokéAPI

Polygon Stock API

Postgres

PostHog

Postmarkapp

PrestaShop

Primetric

Public APIs

Punk-API

PyPI

Qonto

Qualaroo

QuickBooks

Railz

RD Station Marketing

Recharge

Recreation.gov API

Recruitee

Recurly

Redshift

Reply.io

Retently

RingCentral

Robert Koch-Institut Covid

Rocket.chat API

RSS

S3

Salesforce

Salesloft

SAP Business One

sap-fieldglass

SearchMetrics

Secoda API

Sendgrid

Sendinblue API

Senseforce

Sentry

SFTP Bulk

SFTP

Shopify

Shortio

Slack

Smaily

SmartEngage

Smartsheets

Snapchat Marketing

Snowflake

Sonar Cloud API

SpaceX-API

Spree Commerce

Square

Statuspage.io API

Strava

Stripe

Sugar CRM

SurveySparrow

SurveyCTO

SurveyMonkey

Talkdesk Explore

Tempo

Teradata

The Guardian API

TiDB

TikTok Marketing

Timely

TMDb

Todoist

Toggl API

TPL/3PL Central

Trello

TrustPilot

TVMaze Schedule

Twilio Taskrouter

Twilio

Twitter

Tyntec SMS

Typeform

Unleash

US Census API

Vantage API

VictorOps

Visma e-conomic

Vittally

Waiteraid

Weatherstack

Webflow

Whisky Hunter

Wikipedia Pageviews

WooCommerce

Wordpress

Workable

Workramp

Wrike

Xero

XKCD

Yahoo Finance Price

Yandex Metrica

Yotpo

Younium

YouTube Analytics

Zapier Supported Storage

Zencart

Zendesk Chat

Zendesk Sell

Zendesk Sunshine

Zendesk Support

Zendesk Talk

Zenefits

Zenloop

Zoho CRM

Zoom

Zuora

