Why isn't the data I posted to the pipeline loaded to the database?

If you don’t see data that you are expecting to see in one or more tables, there are a few steps you can take to diagnose the issue…

Verify File Delivery

Verify that the file(s) containing the data in question was successfully posted to the SFTP location. Most SFTP software includes search functionality that allows you to search the relevant folder for the filename or a portion of the filename (e.g. filename contains ‘foo’)

Resolution: If the file was not posted to the SFTP location, attempt to re-post (or re-request) the data and check the Redshift table to see if it has loaded. Depending on the size of the file and other files in the processing queue, this could take anywhere from a few seconds to several minutes.

Verify The File Contents

If the file was successfully posted to the SFTP location, download a copy of the file to your desktop and open the file in a text editor (e.g. Notepad) or Excel to check the following issues that could result in a load failure:

  1. Does the file contain data? Sometimes a file may be empty

  2. Is the file layout (i.e. columns and column order) the same as its corresponding warehouse table?

  3. Is the file delimited properly (e.g. tab or comma-quote) and are the delimiters consistent with the initial load file for the table?

  4. Are the values for each field formatted properly (e.g. text, number, date) and consistent with the initial load file for the table?

  5. Are there special characters present in the data? Here is an example where a column contains URLs with special characters. Here is a sample of a URL present in the data:

https://www.domain.com/?id=Email-_-AZXCSQWWW-_-None-_-CAV-_XX&src=1232123&utm_medium=Email&utm_source=News&utm_campaign=434333&utm_content=Nav&j=%%job%%&sfmc_sub=%%SubscriberID%%&u=%%JobURLID%%

In this URL the presence of various special characters can cause the data trainer to take a long time to infer the data resident in that column:

/ : _ - %

Resolution: Fix the error(s) in the file, re-post the file to the SFTP location with a new name (e.g. if the original file was named some_data_20150101.csv rename the new file to something like some_data_20150101_2.csv) and check the target warehouse table to see if it has been successfully loaded.

Still, having an issue?

If the file passes the above checks, please submit a support ticket by emailing support@openbridge.com so that a support agent can assist with the diagnosis and resolution of the load error. To facilitate the diagnostic efforts please be sure to include the following in the email:

  1. Warehouse table name with the missing data

  2. Criteria that can be used to identify the missing row(s) in the table (e.g. table ‘foo’ is missing data for the date = xx/xx/xxxx)

  3. Filename (and SFTP path name if known) for the unloaded data

The initial load file is the first file that is posted to a particular folder. This file is what generates and defines the properties for the respective warehouse table (field names, order, and formats). Subsequent files posted to a folder need to have the same specifications as the initial load file to be loaded successfully.

What happens if I post the same data more than once?

The Openbridge platform creates a unique id (ob_transaction_id) based on the hash for each row of data processed. If a row of data is subsequently processed with the same hash value, it will automatically be excluded from the load process for the table in question. However, if any field values differ, the hash will also be different and the data will be loaded. Let's look at an example...

Let's say on 12/11/15 you posted a file named 20151211_ad_performance.csv to a table named ad_performance:

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2015","12","120","A102B"
"4567","December 10, 2015","25","125","A904F"

The records loaded to the ad_performance table

You will see that the system fields 'ob_transaction_id' and ob_modified_date have been added to your table and represent the unique id for that record and the timestamp that the record was loaded to the table.

Then let's say the next day, 12/12/15 that you posted another file named 20151212_ad_performance.csv that included ad performance data for both 12/10/15 and 12/11/15:

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2015","12","120","A102B"
"0123","December 11, 2015","18","100","A102B"
"4567","December 10, 2015","25","125","A904F"
"4567","December 11, 2015","20","180","A904F"

The Openbridge platform will analyze this file and identify the 1st and 3rd records on this file as duplicate records because all field values are the same and exclude these records from the load process into the ad_performance table. 

However, let's say that your data changes slightly over time or reflects lifetime values for a particular metric and the file you posted on 12/12/15 looks like this...

"ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
"0123","December 10, 2015","12","121","A102B"
"0123","December 11, 2015","18","100","A102B"
"4567","December 10, 2015","25","125","A904F"
"4567","December 11, 2015","20","180","A904F"

In this case, the value in the impressions field for ADID on 12/10/15 changed from 120 to 121. As a result, the row of data will no longer have the same hash value as the row from the previous file and it will be loaded to the ad_performance table

As you can see, there are now two rows of data for ADID 0123 on 12/10/15 with different values for ob_transaction_id and ob_modified_date. Depending on how your reporting queries are structured this could result in overstating impressions for ADID 0123 on 12/10/15.

If your data changes over time, it is best practice to only load data for a particular date one time. If you are faced with this situation you have a couple of options...

  1. Create a view against the ad_performance table (or modify your query logic) so that it returns the row of data with the MAX (or MIN) ob_modified_date for each ADID and date combination.

  2. Contact Openbridge Support (support@openbridge.com) to create a customized view or you can delete the duplicate data from the ad_performance table based on ob_modified_date.

What happens if new columns are added (or removed) from my file?

If the number or order of fields in your file changes and it no longer matches the layout of the initial 'blueprint' file used to create the table, the new file will fail to load. If the change to the file layout delivered from your source is valid (i.e. not the result of an upstream processing error) you have a couple of options to resolve the issue.

  1. Follow the process described above to load the data to a new table. The resulting new table will have a new 'blueprint' and subsequent files that follow the new layout will load successfully. If you still need to reference the data from the 'old' table you can still do so. If necessary, you can create a view that merges data from both the old and the new table into one consolidated view. The view creation process is described in Amazon's documentation or contact Openbridge Support (support@openbridge.com) for assistance.

  2. You can contact Openbridge Support (support@openbridge.com) to modify the 'blueprint' for the original table to follow the revised file structure so that the new files will process successfully. Data for the new fields will only be populated going forward. The previously loaded data will contain null values for any of the new fields.

What if the name of one of my columns changes?

If only the name of the column on the source file changes (e.g. click_percent  becomes click_pct ) but the underlying data type and order of fields stay the same, the file should be processed successfully. The field name in the resulting warehouse table will continue to reflect the original field name. If for some reason you need to change the name of the field in the new table, contact Openbridge Support (support@openbridge.com).

What if the name (file mask) of my source file changes?

If only the naming convention of the source file changes (e.g. ad_performance_daily.csv  becomes ad_performance_yyyymmdd.csv ) but the underlying data type and order of fields stay the same, the file should be processed successfully.

What if my source files contain additional header and/or footer rows in addition to the column headers?

The Openbridge pipeline is designed to automatically load delimited text files with one header row. If the files delivered by your source system cannot be modified to remove extraneous header (or footer) rows you will need to contact Openbridge Support (support@openbridge.com) to modify the table blueprint to ignore those rows. For example, let's say your source files looks like this...

Report Name: Daily Advertising Performance Report
Date: 12/12/2015
Created By: John Doe    "ADID","DATE","CLICKS","IMPRESSIONS","CAMPAIGNID"
    "0123","December 10, 2015","12","120","A102B"
    "0123","December 11, 2015","18","100","A102B"
    "4567","December 10, 2015","25","125","A904F"
    "4567","December 11, 2015","20","180","A904F"
"Totals", "75", "525"

In this example, there are 4 additional rows above the column headers and a footer row with summarized totals for the CLICKS and IMPRESSIONS fields. The initial load of this file will fail because the first 4 rows do not contain valid delimiters. You do not want to load either the additional header or footer rows as it will impact your ability to query these fields in the Redshift table. We will need to modify the 'blueprint' for these files to ignore the first 4 rows and the last (footer) row.

Once that change has been implemented, as long as the source files follow the same layout and structure as the sample file above, they will successfully load on an automated basis.

What transfer protocols are supported?

Pipeline supports the SFTP transfer method. SFTP is integrated into many graphical tools or can be done directly via the command line.

FTPES and FTP protocols are not supported. FTP is an insecure protocol that should only be used in limited cases or on networks you trust. 

SFTP is based on the SSH2 protocol, which encodes activity over a secure channel. Unlike FTP, SSH2 only uses a single TCP connection and sends multiple transfers, or "channels", over that single connection.

Openbridge currently supports public key and password authentication for both SFTP and SCP file transfer protocols. We do NOT support any shell access. Also, not all SFTP commands are accepted. 

SSH Keys

If you need to use public-key authentication, please submit a support ticket to support@openbridge.com and we can set that up for you. 

Please be aware that the use of SSH keys does require setup, configuration, and validation. This is especially true for upstream systems like Adobe as they require a public key to be attached to each unique Openbridge account.  Normally it can take 24-48 hours to activate custom SSH keys for an account given the various requirements of each upstream system.  

How do I connect to the server?

Connection Details:

    Host:               pipeline-01.openbridge.io
    Port:               22 or 443
    Protocol:           SFTP
    User:               Provided separately
    Password:           Provided separately

NOTE: If you need a  host with a static IP address, you can use the host name of static.pipeline.openbridge.io .

I need to whitelist your IP address for our egress firewall rules. What should I use?

If your network blocks outbound connections and requires whitelisting the servers internal resources will connect with you can use these two IP address:

54.210.33.66  
52.54.179.84

Do you support FTP file transfers?

Openbridge does not support the use of FTP. We recognize that there are some systems that can only deliver data via FTP. For example, many of the Adobe export processes typically occur via FTP. However, it should be noted that the use of FTP offers no encryption for connection and data transport. Using the FTP protocol is regarded to be unsafe. It is therefore advisable to use SFTP connections to ensure that data is securely transferred.

If you need FTP access, please contact support.

Why am I having trouble connecting to the server?

Your corporate firewall may be blocking outbound connections on port 22 . If you are having connection difficulties, please make sure you can make outbound network connections for SFTP make sure outbound port:22 or port:443 is open. Either can be used to connect.

Why am I having trouble transferring files with certain extensions?

To help protect the integrity of the data sent to Openbridge, we do not allow the delivery of certain files. For example, you are not allowed to send a file called app.exe or my.sql. This reduces the potential for introducing unwanted or malicious software threats. The following is a sample of the blocked files:

    ade|adp|app|ai|asa|ashx|asmx|asp|bas|bat|cdx|cer|cgi|chm|class|cmd|com|config|cpl|crt|csh|dmg|doc|docx|dll|eps|exe|fxp|ftaccess|hlp|hta|htr|htaccess|htw|html|htm|ida|idc|idq|ins|isp|its|jse|ksh|lnk|mad|maf|mag|mam|maq|mar|mas|mat|mau|mav|maw|mda|mdb|mde|mdt|mdw|mdz|msc|msh|msh1|msh1xml|msh2|msh2xml|mshxml|msi|msp|mst|ops|pdf|php|php3|php4|php5|pcd|pif|prf|prg|printer|pst|psd|rar|reg|rem|scf|scr|sct|shb|shs|shtm|shtml|soap|stm|tgz|taz|url|vb|vbe|vbs|ws|wsc|wsf|wsh|xls|xlsx|xvd

If you attempt to send a file that matches a blocked file then the transfer will not be allowed.

Hidden files are not allowed on the server. Hidden files have a dot prefix . in the file name. For example, .file.txt or .foo would be considered hidden files and be blocked.

The following are examples of names using a dot prefix:

    .file.txt
    .file.csv
    .file
    ..file

If you attempt to send a file that contains a . prefix the transfer will not be allowed.

What is the /testing folder used for?

Normally data delivered to the SFTP server will be routed to the target data warehouse. However, there is one exception. Any files placed into a /testing directory will be ignored. By default, each user has a /testing folder generated for them upon login. This is a safe place to upload files without the risk of the data flowing downstream to the data warehouse.

How do I audit that my files have been delivered?

In most cases, file transfers occur without incident. However, there are situations where an error may arise. Troubleshooting an error can be a difficult and time-consuming endeavor, especially in cases where a system may be sending 100s or 1000s of files a day.

Openbridge does not have visibility into what should be sent from a source system. It only knows what was sent by the source system. For example, if there are 100 files in a source system and only 50 were sent, Openbridge is only aware of the 50 delivered files. We will not know that an additional 50 files were not delivered.

The source system should be tracking what was delivered and what was not delivered. We suggest that a manifest of files is maintained in the source system. This manifest would identify the files to be delivered and their state (success, failure, pending).

The manifest procedure allows the source system to recover from certain errors such as failure of the network, source/host system, or in the transfer process. In the event of an error, the source system would know to attempt redelivery for any file that had not received a successful 226  code from Openbridge.

How do I know if a file was transferred successfully?

Your client will normally be sent a response code of 226 to indicate a successful file transfer. However, there are other status codes that may be present. See below:

  • A code 226  is sent if the entire file was successfully received and stored

  • A code 331  is sent due to a login error. The name is okay, but you are missing a password.

  • A code 425  is sent if no TCP connection was established. This might mean the server is not available or some other network error. Change from PASV to PORT mode, check your firewall settings

  • A code 426  is sent if the TCP connection was established but then broken by the client or by network failure. Retry later.

  • A request with code 451 , 452 , or 552  if the server had trouble saving the file to disk. Retry later.

  • A 553  code means the requested action not taken because the file name sent is not allowed. Change the file name or delete spaces/special characters in the file name.

If you are stuck with an error condition, reach out to Openbridge Support (support@openbridge.com) for help.

Do you validate the integrity of the file once you receive it?

Yes, Openbridge uses checksum (or hash) validation to ensure the integrity of the file transfer. Openbridge uses a 128-bit MD5 checksum, which is represented by a 32-character hexadecimal number. This only applies to files once they are in our possession. We suggest that source systems also calculate MD5 checksum in advance of file delivery to increase the confidence that the integrity of the file to be delivered is met. Depending on the tools employed by a source system, the checksum process might be built-in.

Employing file integrity checks will help you cross-check that the files delivered from the source system match what was delivered to Openbridge. While the use of MD5 is the default, Openbridge can support other checksum commands when sending data to us:

  • XCRC (requests CRC32 digest/checksum)

  • XSHA/XSHA1 (requests SHA1 digest/checksum)

  • XSHA256 (requests SHA256 digest/checksum)

  • XSHA512 (requests SHA512 digest/checksum)

If you need assistance with any of these, please contact Openbridge Support.

Is my account blocked or banned?

Openbridge employs a dynamic "ban" list that prevents the banned user or host from logging in to the server. This will occur if our system detects 4 incorrect login attempts. The ban will last for approximately 30 minutes at which time you can attempt to log in again. If you continue to have difficulties please contact support.

Openbridge also employs a DNSBL (commonly known as a 'Blocklist"). This is a database that is queried in real time for the purpose of obtaining an opinion on the origin of incoming hosts. The role of a DNSBL is to assess whether a particular IP Address meets the acceptance policies of inbound connections. DNSBL is often used by email, web, and other network services for determining and rejecting addresses known to be sources of spam, phishing, and other unwanted behavior.

More information on DNS blacklists can be found here:(http://en.wikipedia.org/wiki/DNSBL)

Why did my connection timeout?

Openbridge sets the maximum number of seconds a connection between our server and your client can exist after the client has successfully authenticated. This is typically 10 minutes or 600 seconds in length. If you are idle for longer than this time allotment the server will think you are finished and disconnect your client. If this occurs you will simply need to reconnect.

Did this answer your question?