FileZilla is a reliable, easy-to-use, cross-platform tool that helps you transfer CSV files to a destination data warehouse such as Amazon Redshift, Amazon Athena, Redshift Spectrum, or Google BigQuery.
What Is FileZilla?
FileZilla is open source software, which means it is distributed free of charge under the terms of the GNU General Public License. You can get client software for Mac, Windows, and Linux operating systems. The following are a few key features of FileZilla:
Supports FTP, FTP over SSL/TLS (FTPS), and SSH File Transfer Protocol (SFTP)
Cross-platform: runs on Windows, Linux, Mac OS X, and more
Supports resuming and transferring large files (>4 GB)
Intuitive graphical user interface
Transfers files to Amazon Redshift, Amazon Athena, Redshift Spectrum, or Google BigQuery
FileZilla can deliver CSV data directly to the Openbridge batch Data Pipeline Service (DPS). When you upload CSV files with FileZilla the batch service will automatically clean, convert and route your data to target warehouses. Each batch pipeline includes automated schema, table, and view creation/versioning as well as de-duplication routines.
Getting Started
This guide assumes you have an Openbridge account and have set up a batch data pipeline. In this guide, we will show you an example of how to transfer Salesforce data to a pipeline called loyalty_purchases. The pipeline was configured during the setup process to load data into a target table called loyalty_purchases within a Redshift data warehouse. This process shows how loading data to the target upload location triggers the import to your data warehouse.
Step 1: Install Filezilla
The first step is to download and install the software. If you need an install guide, FileZilla provides a pretty good one here.
Once you have the software installed, proceed to Step 2.
Step 2: Launch And Configure Filezilla
Open the software. Next, you want to add a new site in the Site Manager: File → Site Manager. Enter the connection details you received when you set up your Openbridge batch data pipeline:
Host: pipeline-01.openbridge.io
Protocol: SFTP
Port: 22 or 443 (Note: port 443 is often used in the event a corporate firewall blocks outbound connections on port 22)
Logon Type: Normal
User: yourusername
Password: yourpassword
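If you prefer to keep these settings in a script rather than the Site Manager, the same details can be captured in plain Python. This is a minimal illustrative sketch, not part of FileZilla or Openbridge: the dictionary, the pick_port helper, and the placeholder credentials are all our own.

```python
# Illustrative sketch only: the Site Manager settings above as a Python
# dict, plus a tiny helper modeling the port 22 -> 443 fallback used
# when a corporate firewall blocks outbound port 22.
OPENBRIDGE_SFTP = {
    "host": "pipeline-01.openbridge.io",
    "protocol": "SFTP",
    "ports": (22, 443),   # try 22 first; fall back to 443 if blocked
    "user": "yourusername",       # placeholder credential
    "password": "yourpassword",   # placeholder credential
}

def pick_port(blocked_ports):
    """Return the first configured port not blocked by the firewall."""
    for port in OPENBRIDGE_SFTP["ports"]:
        if port not in blocked_ports:
            return port
    raise RuntimeError("no usable SFTP port")
```

On an open network pick_port returns 22; on a network that blocks outbound 22, it returns 443, mirroring the note above.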
You will now be able to connect to the server! Next, we will transfer Salesforce data to your loyalty_purchases data pipeline.
Step 3: Connect and Transfer
With your software configured, you are ready to connect and start your first transfer.
When you log in you will see two folders: /loyalty_purchases and /testing. There are two important concepts relating to where you upload data. The /testing directory is where you can experiment and test your file transfers. This is your sandbox: the /testing directory allows you to upload data without it getting processed and loaded to a warehouse.
The /loyalty_purchases folder is your production delivery location. Only push the data you configured in the batch setup process to this location. This reflects the data you want loaded into the loyalty_purchases table in your target data warehouse destination.
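To make the sandbox-versus-production rule concrete, here is a short, purely illustrative Python helper. The function name and sandbox flag are our own invention, not part of any Openbridge tooling:

```python
# Illustrative only: choose the remote upload directory for a pipeline.
# "/testing" is the sandbox; "/<pipeline>" is the production location.
def remote_dir(pipeline: str, sandbox: bool = False) -> str:
    """Return /testing for dry runs, otherwise the pipeline's folder."""
    return "/testing" if sandbox else f"/{pipeline}"
```

For the example pipeline in this guide, remote_dir("loyalty_purchases") yields /loyalty_purchases, while remote_dir("loyalty_purchases", sandbox=True) yields /testing.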
In our example above, we transferred all of our *_salesforce_click.zip data to our /loyalty_purchases production location. In this example, we transferred 15 files totaling close to 74 MB. All 15 files will be processed and loaded into your target warehouse destination.
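Selecting which local files match the pattern above can be sketched with Python's standard fnmatch module; the file names below are made up for illustration:

```python
import fnmatch

# Hypothetical local files; only names matching the guide's pattern
# *_salesforce_click.zip should go to the production folder.
local_files = [
    "2017_salesforce_click.zip",
    "2018_salesforce_click.zip",
    "notes.txt",
]

to_upload = [f for f in local_files
             if fnmatch.fnmatch(f, "*_salesforce_click.zip")]
```

Here to_upload contains only the two zip archives, leaving notes.txt out of the production delivery.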
That's it! Your salesforce_click data will make its way to the loyalty_purchases table in about 5-10 minutes.
With your Salesforce data exported and successfully loaded into your data warehouse, connect your favorite data visualization tools like Tableau, Power BI, Qlik, Looker, Mode Analytics, AWS QuickSight, and many others for unified analysis, visualization, and reporting of your salesforce_click (or any other CSV!) data.