All Collections
Data Sources
Batch
Batch Data Pipeline FAQs
Batch Data Pipeline FAQs

Collection of FAQs to help troubleshoot issues you may have with your batch data pipeline

Openbridge Support avatar
Written by Openbridge Support
Updated over a week ago

Why can't the data I posted to the pipeline be loaded into the database?

If you don’t see data that you expect to see in one or more tables, you can take a few steps to diagnose the issue…

Verify File Delivery

Verify that the file(s) containing the data in question was successfully posted to the SFTP location. Most SFTP software includes search functionality that allows you to search the relevant folder for the filename or a portion of the filename (e.g., filename contains ‘foo’)

Resolution: If the file was not posted to the SFTP location, attempt to re-post (or re-request) the data and check the Redshift table to see if it has loaded. Depending on the file size and other files in the processing queue, this could take a few seconds to several minutes.

Verify The File Contents

If the file was successfully posted to the SFTP location, download a copy of the file to your desktop and open the file in a text editor (e.g., Notepad) or Excel to check the following issues that could result in a load failure:

  1. Does the file contain data? Sometimes a file may be empty.

  2. Is the file layout (i.e., columns and column order) the same as its corresponding warehouse table?

  3. Is the file delimited properly (e.g., tab or comma-quote), and are the delimiters consistent with the initial load file for the table?

  4. Are the values for each field formatted properly (e.g., text, number, date) and consistent with the initial load file for the table?

  5. Are there special characters present in the data? Here is an example where a column contains URLs with special characters. Here is a sample of a URL present in the data:

https://www.domain.com/?id=Email-_-AZXCSQWWW-_-None-_-CAV-_XX&src=1232123&utm_medium=Email&utm_source=News&utm_campaign=434333&utm_content=Nav&j=%%job%%&sfmc_sub=%%SubscriberID%%&u=%%JobURLID%%

In this URL, the presence of various special characters can cause the data trainer to take a long time to infer the data resident in that column:

/ : _ - %

Resolution: Fix the error(s) in the file, re-post the file to the SFTP location with a new name (e.g., if the original file was named some_data_20150101.csv rename the new file to something like some_data_20150101_2.csv) and check the target warehouse table to see if it has been successfully loaded.

Still having an issue?

If the file passes the above checks, please submit a support ticket by emailing support@openbridge.com so that a support agent can assist with diagnosing and resolving the load error. To facilitate the diagnostic efforts, please be sure to include the following in the email:

  1. Warehouse table name with the missing data

  2. Criteria that can be used to identify the missing row(s) in the table (e.g., table ‘foo’ is missing data for the date = xx/xx/xxxx)

  3. Filename (and SFTP path name if known) for the unloaded data

The initial load file is the first file that is posted to a particular folder. This file generates and defines the properties for the respective warehouse table (field names, order, and formats). Subsequent files posted to a folder must have the same specifications as the initial load file to be loaded successfully.

What happens if I post the same data more than once?

The Openbridge platform creates a unique id (ob_transaction_id) based on the hash for each row of data processed. If a row of data is subsequently processed with the same hash value, it will automatically be excluded from the load process for the table in question. However, if any field values differ, the hash will also differ, and the data will be loaded. Let's look at an example...

Let's say on 12/11/22, you posted a file named 20151211_ad_performance.csv to a table named ad_performance:

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2022","12","120","A102B"
"4567","December 10, 2022","25","125","A904F"

The records loaded to the ad_performance table

You will see that the system fields ob_transaction_id and ob_modified_date have been added to your table and represent the unique id for that record and the timestamp that the record was loaded to the table.

Then let's say the next day, 12/12/22 that you posted another file named 20151212_ad_performance.csv that included ad performance data for both 12/10/22 and 12/11/22:

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2022","12","120","A102B"
"0123","December 11, 2022","18","100","A102B"
"4567","December 10, 2022","25","125","A904F"
"4567","December 11, 2022","20","180","A904F"

The Openbridge platform will analyze this file and identify the 1st and 3rd records on this file as duplicate records because all field values are the same and exclude these records from the load process into the ad_performance table. 

However, let's say that your data changes slightly over time or reflects lifetime values for a particular metric, and the file you posted on 12/12/22 looks like this...

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2022","12","121","A102B"
"0123","December 11, 2022","18","100","A102B"
"4567","December 10, 2022","25","125","A904F"
"4567","December 11, 2022","20","180","A904F"

In this case, the value in the impressions field for ADID on 12/10/22, changed from 120 to 121. As a result, the row of data will no longer have the same hash value as the row from the previous file, and it will be loaded to the ad_performance table

As you can see, there are now two rows of data for ADID 0123 on 12/10/22 with different values for ob_transaction_id and ob_modified_date. Depending on how your reporting queries are structured, this could result in overstating impressions for ADID 0123 on 12/10/15.

If your data changes over time, it is best practice to load data for a particular date once. If you face this situation, you have a couple of options...

  1. Create a view against the ad_performance table (or modify your query logic) so that it returns the row of data with the MAX (or MIN) ob_modified_date for each ADID and date combination.

  2. Contact Openbridge Support (support@openbridge.com) to create a customized view or delete the duplicate data from the ad_performance table based on ob_modified_date.

What happens if new columns are added (or removed) from my file?

Treat the files you send like a table in a database. If your data in the CSV file looks like this;

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2022","12","121","A102B"
"0123","December 11, 2022","18","100","A102B"
"4567","December 10, 2022","25","125","A904F"
"4567","December 11, 2022","20","180","A904F"

Your table will have a fixed schema that looks like this:

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"

However, let's say you send data to the same pipeline with a new column called REACH

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID","REACH"
"0123","December 10, 2022","12","121","A102B","34"
"0123","December 11, 2022","18","100","A102B","1245"
"4567","December 10, 2022","25","125","A904F","234"
"4567","December 11, 2022","20","180","A904F","444"

The schemas for each file do not match.

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID","REACH"

This means the downstream destination will not load your new file as you have conflicting schemas.

If the number or order of fields in your file changes and it no longer matches the layout of the initial 'blueprint' file used to create the table, the new file will fail to load. If the change to the file layout delivered from your source is valid (i.e., not the result of an upstream processing error), you have a couple of options to resolve the issue.

  1. Follow the process described above to load the data to a new table. The resulting new table will have a new 'blueprint', and subsequent files that follow the new layout will load successfully. If you still need to reference the data from the 'old' table, you can still do so. If necessary, you can create a view that merges data from the old and new tables into one consolidated view. The view creation process is described in Amazon's documentation, or contact Openbridge Support (support@openbridge.com) for assistance.

  2. You can contact Openbridge Support (support@openbridge.com) to change the 'blueprint' for the original table to follow the revised file structure so that the new files will be processed successfully. Data for the new fields will only be populated going forward. The previously loaded data will contain null values for any new fields.

What if the name of one of my columns changes?

If only the name of the column on the source file changes (e.g. click_percent  becomes click_pct ) but the underlying data type and order of fields stay the same, so the file should be processed successfully. The field name in the resulting warehouse table will continue to reflect the original field name. If, for some reason, you need to change the name of the field in the new table, contact Openbridge Support (support@openbridge.com).

What if my source files contain additional header and/or footer rows besides the column headers?

The Openbridge pipeline is designed to load delimited text files with one header row automatically. If the files delivered by your source system cannot be modified to remove extraneous header (or footer) rows, you will need to contact Openbridge Support (support@openbridge.com) to modify the table blueprint to ignore those rows. For example, let's say your source files looks like this...

Report Name: Daily Advertising Performance Report
Date: 12/12/2022
Created By: John Doe "ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2022","12","120","A102B"
"0123","December 11, 2022","18","100","A102B"
"4567","December 10, 2022","25","125","A904F"
"4567","December 11, 2022","20","180","A904F"
"Totals", "75", "525"

In this example, four additional rows are above the column headers, and a footer row with summarized totals for the CLICKS and IMPRESSIONS fields. The initial load of this file will fail because the first four rows do not contain valid delimiters. You do not want to load the additional header or footer rows, which will impact your ability to query these fields in the Redshift table. We must modify the 'blueprint' for these files to ignore the first four rows and the last (footer) row.

Once that change has been implemented, as long as the source files follow the same layout and structure as the sample file above, they will successfully load on an automated basis.

What if the name (file mask) of my source file changes?

If only the naming convention of the source file changes (e.g. ad_performance_daily.csv  becomes ad_performance_yyyymmdd.csv ) but the underlying data type and order of fields stay the same, so the file should be processed successfully.

What transfer protocols are supported?

The pipeline supports the SFTP transfer method. SFTP is integrated into many graphical tools or can be done directly via the command line.

FTPES and FTP protocols are not supported. FTP is an insecure protocol that should only be used in limited cases or on networks you trust. 

SFTP is based on the SSH2 protocol, which encodes activity over a secure channel. Unlike FTP, SSH2 only uses a single TCP connection and sends multiple transfers, or "channels", over that single connection.

Openbridge currently supports public key and password authentication for both SFTP SCP file transfer protocols. We do NOT support any shell access. Also, not all SFTP commands are accepted. 

SSH Keys

If you need to use public-key authentication, please submit a support ticket to support@openbridge.com, and we can set that up for you. 

Please be aware that using SSH keys requires setup, configuration, and validation. This is especially true for upstream systems like Adobe, as they require a public key attached to each unique Openbridge account.  Normally it can take 24-48 hours to activate custom SSH keys for an account, given the various requirements of each upstream system.  

How do I connect to the server?

Connection Details:

    Host:               pipeline-01.openbridge.io
    Port:               22 or 443
    Protocol:           SFTP
    User:               Provided separately
    Password:           Provided separately

NOTE: If you need a  host with a static IP address, you can use the host name of static.pipeline.openbridge.io .

I need to whitelist your IP address for our egress firewall rules. What do you think I should use?

If your network blocks outbound connections and requires whitelisting, the server's internal resources will connect with you can use these two IP addresses:

54.210.33.66  
52.54.179.84

Do you support FTP file transfers?

Openbridge does not support the use of FTP. We recognize that some systems can only deliver data via FTP. For example, many of the Adobe export processes typically occur via FTP. However, it should be noted that the use FTP offers no encryption for connection and data transport. Using the FTP protocol is regarded to be unsafe. Therefore, it is advisable to use SFTP connections to ensure that data is securely transferred.

Why am I having trouble connecting to the server?

Your corporate firewall may be blocking outbound connections on port 22 . If you are having connection difficulties, please make sure you can make outbound network connections to SFTP make sure outbound port:22 or port:443 is open. Either can be used to connect.

Why am I having trouble transferring files with certain extensions?

To help protect the integrity of the data sent to Openbridge, we do not allow the delivery of certain files. For example, you are not allowed to send a file called app.exe or my.sql. This reduces the potential for introducing unwanted or malicious software threats. The following is a sample of the blocked files:

    ade|adp|app|ai|asa|ashx|asmx|asp|bas|bat|cdx|cer|cgi|chm|class|cmd|com|config|cpl|crt|csh|dmg|doc|docx|dll|eps|exe|fxp|ftaccess|hlp|hta|htr|htaccess|htw|html|htm|ida|idc|idq|ins|isp|its|jse|ksh|lnk|mad|maf|mag|mam|maq|mar|mas|mat|mau|mav|maw|mda|mdb|mde|mdt|mdw|mdz|msc|msh|msh1|msh1xml|msh2|msh2xml|mshxml|msi|msp|mst|ops|pdf|php|php3|php4|php5|pcd|pif|prf|prg|printer|pst|psd|rar|reg|rem|scf|scr|sct|shb|shs|shtm|shtml|soap|stm|tgz|taz|url|vb|vbe|vbs|ws|wsc|wsf|wsh|xls|xlsx|xvd

If you attempt to send a file that matches a blocked file, the transfer will not be allowed.

Hidden files are not allowed on the server. Hidden files have a dot prefix . in the file name. For example, .file.txt or .foo would be considered hidden files and be blocked.

The following are examples of names using a dot prefix:

    .file.txt
    .file.csv
    .file
    ..file

If you attempt to send a file that contains a . prefix, the transfer will not be allowed.

What is the /testing folder used for?

Normally data delivered to the SFTP server will be routed to the target data warehouse. However, there is one exception. Any files placed into a /testing directory will be ignored. By default, each user has a /testing folder generated for them upon login. This is a safe place to upload files without the risk of the data flowing downstream to the data warehouse.

How do I audit that my files have been delivered?

In most cases, file transfers occur without incident. However, there are situations where an error may arise. Troubleshooting an error can be a difficult and time-consuming endeavor, especially when a system may send 100s or 1000s of files a day.

Openbridge does not have visibility into what should be sent from a source system. It only knows what was sent by the source system. For example, if are 100 files in a source system and only 50 were sent, Openbridge is only aware of the 50 delivered files. We will not know that an additional 50 files were not delivered.

The source system should track what was delivered and what was not. We suggest that a manifest of files is maintained in the source system. This manifest would identify the files to be delivered and their state (success, failure, pending).

The manifest procedure allows the source system to recover from certain errors, such as failure of the network, source/host system, or in the transfer process. In the event of an error, the source system would know to attempt redelivery for any file that had not received a successful 226  code from Openbridge.

How do I know if a file was transferred successfully?

Your client will normally be sent a response code 226 to indicate a successful file transfer. However, other status codes may be present. See below:

  • A code 226  is sent if the entire file was successfully received and stored

  • A code 331  is sent due to a login error. The name is okay, but you are missing a password.

  • A code 425  is sent if no TCP connection was established. This might mean the server is not available or some other network error. Change from PASV to PORT mode, and check your firewall settings

  • A code 426  is sent if the TCP connection was established but broken by the client or network failure. Retry later.

  • A request with code 451 , 452 , or 552  if the server had trouble saving the file to disk. Retry later.

  • A 553  code means the requested action was not taken because the file name sent is not allowed. Change the file name or delete spaces/special characters in the file name.

If you are stuck with an error condition, contact Openbridge Support (support@openbridge.com) for help.

Do you validate the integrity of the file once you receive it?

Yes, Openbridge uses checksum (or hash) validation to ensure the integrity of the file transfer. Openbridge uses a 128-bit MD5 checksum, represented by a 32-character hexadecimal number. This only applies to files once they are in our possession. We suggest that source systems also calculate the MD5 checksum before file delivery to increase the confidence that the integrity of the file is met. Depending on the tools a source system employs, the checksum process might be built-in.

Employing file integrity checks will help you cross-check that the files from the source system match what was delivered to Openbridge. While the use of MD5 is the default, Openbridge can support other checksum commands when sending data to us:

  • XCRC (requests CRC32 digest/checksum)

  • XSHA/XSHA1 (requests SHA1 digest/checksum)

  • XSHA256 (requests SHA256 digest/checksum)

  • XSHA512 (requests SHA512 digest/checksum)

If you need assistance with any of these, please contact Openbridge Support.

Is my account blocked or banned?

Openbridge employs a dynamic "ban" list that prevents the banned user or host from logging in to the server. This will occur if our system detects four incorrect login attempts. The ban will last approximately 30 minutes when you can attempt to log in again. If you continue to have difficulties, please contact support.

Openbridge also employs a DNSBL (a 'Blocklist"). This database is queried in real-time to obtain an opinion on the origin of incoming hosts. The role of a DNSBL is to assess whether a particular IP Address meets the acceptance policies of inbound connections. DNSBL is often used by email, web, and other network services to determine and reject addresses known to be sources of spam, phishing, and other unwanted behavior.

More information on DNS blacklists can be found here:(http://en.wikipedia.org/wiki/DNSBL)

Why did my connection time out?

Openbridge sets the maximum number of seconds a connection between our server and your client can exist after the client has successfully authenticated. This is typically 10 minutes or 600 seconds in length. If you are idle for longer than this time allotment, the server will think you are finished and disconnect your client. If this occurs, you will simply need to reconnect.

Did this answer your question?