How To Automate Adobe Data Feed Integration To Amazon Web Services, Google Cloud, or Azure

The Openbridge Adobe data feed integration automates the organization, processing, and routing of raw Adobe data feeds to a destination warehouse, giving you a fully automated pipeline from Adobe to your target warehouse.

What are Adobe Data Feeds?

Per Adobe, data feeds reflect "data collected from websites, mobile apps, or uploaded using web service APIs or data sources, is processed and stored in Adobe's Data Warehouse". This raw Adobe clickstream data forms the underlying data set used by the Adobe Analytics user interface.

However, not everyone wants to be limited to the insights within an Adobe user interface. There are situations where companies want to leverage business intelligence software from Tableau, Microsoft, Looker, and many others. If you are one of those companies looking to harness the power of Adobe data feeds, this service is for you.

What is included in an Adobe Data Feed?

Adobe delivers raw data to the customer in a batched manner on a recurring daily or hourly delivery schedule. Each Adobe data feed comprises raw data for a single report suite (RSID). The data deliverables consist of three file classes:

  • Manifest file: A text file with information regarding the contents of the delivery as a whole

  • Lookup data: A gzipped TAR archive containing several files of supplemental lookup data

  • Hit data: The file that contains all relevant hit data

The delivery from Adobe typically is organized like this:

/path/to/delivery/location/rsid_name_20180516-210000.txt
/path/to/delivery/location/rsid_name_20180516-210000-lookup_data.tar.gz
/path/to/delivery/location/rsid_name_20180516-210000.tsv.gz
/path/to/delivery/location/rsid_name_20180517-210000.txt
/path/to/delivery/location/rsid_name_20180517-210000-lookup_data.tar.gz
/path/to/delivery/location/rsid_name_20180517-210000.tsv.gz

Within the compressed archives is a collection of data files like these:

  • hit_data

  • browser_type

  • browser

  • color_depth

  • connection_type

  • country

  • event

  • javascript_version

  • languages

  • operating_systems

  • plugins

  • referrer_type

  • resolution

  • search_engines

Depending on your schedule (hourly or daily), the volume of files will vary. Regardless of schedule, however, the raw feeds are not organized for loading into a warehouse. This is where the Openbridge Adobe Data Feed preprocessor comes in.

How The Openbridge Adobe Data Feed Integration Works

The Openbridge Adobe data feed preprocessor monitors, processes, routes, and delivers Adobe data feeds to downstream services for loading into a destination warehouse.

Step 1: Define Your Adobe Data Feeds

In this step, you define the data you want as part of your data feed. These activities are completed within the Adobe Analytics admin interface. 

Note: A pipeline can support multiple RSIDs, assuming they are homogeneous (i.e., they share the same structure and types). Each additional RSID increases processing time, largely due to the extra testing and data volume needed to ensure requirements are met.

Step 2: Configure Adobe Data Feed Delivery Location

Adobe needs to deliver data to an external location. In this step, you tell Adobe where you want the data delivered.

Step 3: Loading Adobe Data To A Destination Warehouse

The Openbridge pipeline handles loading into a destination warehouse. This includes unzipping and splitting the compressed lookup data archive into individual folders, which ensures each lookup file can be routed for loading into its respective table in the destination warehouse.

While steps 1 and 2 get everything set up and organized, you still need the data loaded into a target warehouse. The process is simple and only takes a few minutes to complete.

  • Setup an Openbridge account: https://www.openbridge.com/register

  • Activate the destination warehouse at Google or Amazon. For example, you can set up BigQuery, Redshift, or Amazon Athena, or have us output to Apache Parquet.

  • Register the warehouse service within Openbridge. Here is an example of how to register BigQuery within Openbridge.

  • Request that Openbridge activate your S3 destination and configure the Adobe data feed deliveries to be processed and routed to your target warehouse location.

How Data Gets Loaded Into Your Warehouse

Adobe data will be processed and routed to the destination warehouse. This will result in 14 initial source tables and 14 roll-up views:

  • adobe_hit_data

  • adobe_browser_type

  • adobe_browser

  • adobe_color_depth

  • adobe_connection_type

  • adobe_country

  • adobe_event

  • adobe_javascript_version

  • adobe_languages

  • adobe_operating_systems

  • adobe_plugins

  • adobe_referrer_type

  • adobe_resolution

  • adobe_search_engines

This document describes in more detail how we structure the data to simplify change management of Adobe data.

Historical Data

Historical data should follow its own path once the daily data is flowing. The simplest approach is to deliver historical data into your operational go-forward pipeline.

Be cognizant of volume: for one RSID covering 2 years, there will be about 730 timestamped daily exports, and with a manifest, a lookup archive (containing roughly 14 lookup files), and a hit data file in each export, that works out to 10,000+ data files once unpacked.

  • Define the history to be retroactively delivered. How far back do you want to go?

  • Define the period of each delivery; typically, we suggest a monthly block: export a month, then validate that all files for that month meet expectations. This yields 30 or so unique deliveries per month. Once a month is complete, continue moving back month by month until there is no more historical data. We also recommend a single daily export (versus multiple exports per day).

  • Validate each tranche of deliveries, allowing you to "sign off" on each delivery block.

Typically, you would establish windows/blocks of time to process and validate historical deliveries, for example, working back in time month by month until all possible historical data has been exported. Depending on the volume of historical data, this can take a while to complete.

Using the go-forward pipeline assumes that historical data is aligned with go-forward data. If your historical data varies in structure or type, additional effort would be needed to fuse the different data sets. This is often the first time teams see older data and discover minor (or significant) QA issues. These errors may or may not impact loading, and lower-level QA/QC may need to occur.

Timing

Adobe Deliveries

According to Adobe, before processing hourly or daily data, data feeds wait until all the hits that entered data collection within the timeframe (day or hour) have been settled in their system. Once that occurs, Adobe collects data within the selected timeframe, compresses it, and sends it to Amazon S3.

For hourly feeds, files are typically sent to the data warehouse within 15-30 minutes after the hour. However, please keep in mind Adobe doesn't give you a guaranteed delivery time period. If there was no data within the selected timeframe, the Adobe process tries again during the next window. The Adobe data feed uses the `date_time` field to determine which hits belong to the hour. This field is based on the time zone of the report suite.
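As a quick sanity check, a query like the sketch below can confirm which hours of a given day have actually been delivered and loaded. It assumes the hit data lands in an adobe_hit_data table and that date_time loads in a timestamp-compatible format; adjust the casts for your warehouse.

-- Sketch: confirm which hours of a given day have been delivered and loaded.
SELECT DATE_TRUNC('hour', date_time::timestamp) AS hit_hour,
       COUNT(*) AS hits
FROM adobe_hit_data
WHERE date_time::timestamp >= '2018-05-16 00:00:00'
  AND date_time::timestamp < '2018-05-17 00:00:00'
GROUP BY 1
ORDER BY 1;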

Openbridge

Openbridge processes data upon receipt. Depending on the number of records and the width of the data (i.e., the number of columns), it can take anywhere from 5 to 60 minutes for data to be loaded into your destination warehouse.

Frequently Asked Questions

What are examples of post-processing activities once Adobe data is resident in your warehouse?

  • Create custom "lookup" tables. For example, based on a site tagging plan or media plans, create lookups for campaign/creative IDs that allow you to cross-reference any campaign/creative variables referenced in the Adobe data (see the sketch after this list).

  • Deploy on-premise BigQuery data export applications. For example, we have deployed tools like this within a client environment; they remotely connect to BigQuery, export data, and then transfer it to an on-premise location.

  • Create custom "pre-processing" workflows to clean up or "fix" bad data within go-forward data, historical data, or both.

  • Retain data beyond our standard data retention policy of 14 days. This is typically adequate when teams are pulling data into their on-premise systems. However, we can offer longer policies, including no expiration on how long we hold the data. This may be desirable for those teams that want to use us as a longer-term data lake of Adobe data.

  • Undertake detailed QA/QC of data feeds to validate alignment with site tagging.

  • Transform the Adobe data to comply with customer requirements for various business intelligence or data visualization tools. For example, a customer may want to use native Tableau/Power BI to calculate metrics or set data types directly in those tools rather than manipulate the data. In other cases, a customer may prefer pre-processed views in a database to accomplish those calculations rather than using Tableau or Power BI. You can also create special overlays on top of the data; an overlay may be a process that pre-calculates specific views or segments.
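As a hedged illustration of the first item in the list above, a custom campaign lookup could be defined and joined against the hit data as sketched below. The campaign_lookup table and the use of post_evar1 as a tracking code are hypothetical placeholders; substitute whatever table and variable your tagging plan actually uses.

-- Hypothetical sketch: a custom campaign lookup joined to Adobe hit data.
-- "campaign_lookup" and post_evar1 (as the tracking code) are placeholder assumptions.
CREATE TABLE campaign_lookup (
    tracking_code VARCHAR(255),
    campaign_name VARCHAR(255),
    creative_name VARCHAR(255)
);

SELECT l.campaign_name,
       l.creative_name,
       COUNT(*) AS hits
FROM adobe_hit_data h
JOIN campaign_lookup l
  ON h.post_evar1 = l.tracking_code
GROUP BY l.campaign_name, l.creative_name;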

What is the table structure like for the loaded Adobe data?

As part of this service, we can generate a collection of database tables/views that follow Adobe packaging of data feeds. The following reflects the current feeds:

  • adobe_hit_data

  • adobe_browser_type

  • adobe_browser

  • adobe_color_depth

  • adobe_connection_type

  • adobe_country

  • adobe_event

  • adobe_javascript_version

  • adobe_languages

  • adobe_operating_systems

  • adobe_plugins

  • adobe_referrer_type

  • adobe_resolution

  • adobe_search_engines

Each feed will have one or more source tables (`_v1`, `_v2`, `_v3`, ...) and a `master` view:

  • adobe_hit_data_master

  • adobe_hit_data_v1

  • adobe_hit_data_v2

  • adobe_hit_data_v3

  • adobe_country_master

  • adobe_country_v1

  • adobe_browser_master

  • adobe_browser_v1

This document describes in more detail how we structure the data to simplify change management.
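For illustration only, a master view conceptually unions the versioned source tables. The sketch below is hypothetical; the real views are generated as part of the service, and the column names shown are placeholders for the columns in your feed.

-- Conceptual sketch: a "master" view unioning versioned source tables.
-- Column names (lookup_id, lookup_value) are placeholders for your actual feed columns.
CREATE OR REPLACE VIEW adobe_country_master AS
SELECT lookup_id, lookup_value FROM adobe_country_v1
UNION ALL
SELECT lookup_id, lookup_value FROM adobe_country_v2;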

Can I create custom views or models of the Adobe data?

Yes, if any specialized calculations or models need to be created for your Adobe data feeds, you can define them as views in your warehouse.  We provide some examples below. If you need help with an effort like this, please reach out to our expert services team.

What are examples of custom-calculated views?

Adobe describes how to calculate common metrics using data feeds. They define a set of common calculated metrics that can be an overlay on top of the Adobe data feeds. The following are some formulas for common metrics. Please note that while instructions to calculate several common metrics are defined, the actual implementation requires customization based on user requirements.

Page Views

Page views can be calculated by counting rows that have a value in either post_pagename or post_page_url. You can use similar logic to count custom links.
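Here is a minimal SQL sketch, assuming the hit data has been loaded into an adobe_hit_data table and that empty fields load as empty strings or NULLs; the exclude_hit and hit_source filters mirror the exclusions described for visits and visitors below:

-- Sketch: page views = hits with a value in post_pagename or post_page_url.
-- Assumes exclude_hit and hit_source load as numeric; cast if your warehouse stores them as text.
SELECT COUNT(*) AS page_views
FROM adobe_hit_data
WHERE (COALESCE(post_pagename, '') <> '' OR COALESCE(post_page_url, '') <> '')
  AND exclude_hit <= 0
  AND hit_source NOT IN (5, 7, 8, 9);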

Visits

To calculate visits, exclude all rows where exclude_hit > 0 and any rows with hit_source = 5, 7, 8, or 9. Next, combine post_visid_high, post_visid_low, visit_num, and visit_start_time_gmt*, and count the unique number of combinations.
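A hedged SQL sketch of that logic, again assuming an adobe_hit_data table; the casts are there because the combined columns may load as different types in your warehouse:

-- Sketch: visits = unique combinations of visitor ID, visit_num, and visit_start_time_gmt,
-- after removing excluded hits and summary/transaction-ID rows.
SELECT COUNT(DISTINCT post_visid_high::varchar || '-' || post_visid_low::varchar || '-' ||
             visit_num::varchar || '-' || visit_start_time_gmt::varchar) AS visits
FROM adobe_hit_data
WHERE exclude_hit <= 0
  AND hit_source NOT IN (5, 7, 8, 9);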

Visitors

Similar to visits, exclude all rows where exclude_hit > 0 and all rows with hit_source = 5, 7, 8, or 9. Then combine post_visid_high with post_visid_low and count the unique number of combinations.
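And the matching sketch for unique visitors, under the same assumptions:

-- Sketch: unique visitors = unique combinations of post_visid_high and post_visid_low.
SELECT COUNT(DISTINCT post_visid_high::varchar || '-' || post_visid_low::varchar) AS unique_visitors
FROM adobe_hit_data
WHERE exclude_hit <= 0
  AND hit_source NOT IN (5, 7, 8, 9);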

Notes

In rare circumstances, internet irregularities, system irregularities, or the use of custom visitor IDs can result in duplicate visit_num values for the same visitor ID that are not the same visit. To avoid resulting issues, also include visit_start_time_gmt when counting visits.

Values of hit_source = 5, 8, and 9 reflect summary rows uploaded using data sources. hit_source = 7 represents transaction ID data source uploads, which should not be included in visit and visitor counts.

Can you provide an example of a custom view?

Below is an example of the logic needed to fuse Adobe data with Google Analytics data. This creates a calculated, unified, and easy-to-query ga_channeldetail_bounce view for a Tableau or Power BI user.

CREATE OR REPLACE VIEW ga_channeldetail_bounce
(
  date,
  rsid,
  last_touch_channel,
  last_touch_channel_detail,
  visits,
  unique_visitors,
  page_views,
  time_spent_on_page,
  product_interactions,
  buy_now_clicks,
  where_to_buy,
  video_views,
  game_starts,
  source
)
AS
 SELECT DISTINCT bounce_visits.date::timestamp without time zone AS date, ((bounce_visits.dimension2::text || ' '::character varying::text) || bounce_visits.dimension3::text)::character varying AS rsid, bounce_visits.trafficsource_medium AS last_touch_channel, bounce_visits.trafficsource_source AS last_touch_channel_detail, count(DISTINCT (bounce_visits.visitid::character varying::text || ' '::character varying::text) || bounce_visits.fullvisitorid::text) AS visits, count(DISTINCT bounce_visits.fullvisitorid) AS unique_visitors, sum(
        CASE
            WHEN bounce_visits.hits_type::text = 'PAGE'::character varying::text THEN 1
            ELSE 0
        END) AS page_views, sum(bounce_visits.totals_timeonsite::numeric::numeric(18,0) / bounce_visits.totals_hits::numeric::numeric(18,0)) AS time_spent_on_page, COALESCE(sum(COALESCE(bounce_visits.metric1::integer, 0) + COALESCE(bounce_visits.metric2::integer, 0)), 0::bigint) AS product_interactions, sum(
        CASE
            WHEN bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:SHOP'::character varying::text THEN 1
            ELSE 0
        END) AS buy_now_clicks, sum(
        CASE
            WHEN bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:BUY'::character varying::text OR bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:Buy'::character varying::text THEN 1
            ELSE 0
        END) AS where_to_buy, sum(
        CASE
            WHEN bounce_visits.hits_eventinfo_eventcategory::text ~~ 'Video:%'::character varying::text AND bounce_visits.hits_eventinfo_eventaction::text = 'Video Play'::character varying::text THEN 1
            ELSE 0
        END) AS video_views, sum(
        CASE
            WHEN bounce_visits.hits_eventinfo_eventcategory::text ~~ 'Game:%'::character varying::text AND (bounce_visits.hits_eventinfo_eventaction::text = 'play'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'click start'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = '70'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'click play'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'start'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'launch'::character varying::text) THEN 1
            ELSE 0
        END) AS game_starts, 'ga_channeldetail_bounce'::character varying AS source
   FROM ( SELECT ga_360_core.date, ga_360_core.dimension2, ga_360_core.dimension3, ga_360_core.dimension4, ga_360_core.dimension5, ga_360_core.visitid, ga_360_core.fullvisitorid, ga_360_core.metric1, ga_360_core.metric2, ga_360_core.hits_type, ga_360_core.hits_eventinfo_eventcategory, ga_360_core.hits_eventinfo_eventaction, ga_360_core.totals_timeonsite, ga_360_core.totals_hits, ga_360_core.trafficsource_medium, ga_360_core.trafficsource_source
           FROM ga_360_core
      JOIN ( SELECT DISTINCT (ga_360_core.visitid::character varying::text || ' '::character varying::text) || ga_360_core.fullvisitorid::text AS visit
                   FROM ga_360_core
                  WHERE ga_360_core.totals_bounces::text = '1'::character varying::text) bounce_hits ON ((ga_360_core.visitid::character varying::text || ' '::character varying::text) || ga_360_core.fullvisitorid::text) = bounce_hits.visit) bounce_visits
  GROUP BY bounce_visits.date, (bounce_visits.dimension2::text || ' '::character varying::text) || bounce_visits.dimension3::text, (bounce_visits.dimension4::text || ' '::character varying::text) || bounce_visits.dimension5::text, bounce_visits.trafficsource_medium, bounce_visits.trafficsource_source, bounce_visits.metric1;
GRANT SELECT, DELETE, UPDATE, TRIGGER, RULE, REFERENCES, INSERT ON ga_channeldetail_bounce TO steve;
GRANT SELECT ON ga_channeldetail_bounce TO read_only_users;
COMMIT;

Do you test and validate the Adobe data?

Yes, we test and validate the data that Adobe provides to us. However, this is somewhat limited in scope: we are only testing and validating the format, structure, and type of the data sent. If there are issues within the data related to bad tagging, implementation errors, or other data quality problems, we are not checking for them; that level of testing requires lower-level quality control efforts.

Here are a couple of suggestions in advance of setting up a pipeline:

  • Download a test file from Adobe. QA/QC the source Adobe files versus what is expected, and audit discrepancies and errors against your tagging plan.

  • Connect Power BI or Tableau to your sample data. Run a collection of test queries to get a sense of what data is resident in your Adobe feed (see the example queries below).
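The test queries can be simple. For example, a couple of sketches assuming the sample is loaded into an adobe_hit_data table:

-- Sketch: row count and date coverage of the sample
SELECT COUNT(*) AS total_hits,
       MIN(date_time) AS first_hit,
       MAX(date_time) AS last_hit
FROM adobe_hit_data;

-- Sketch: how consistently a key variable is populated
SELECT COUNT(*) AS total_hits,
       COUNT(NULLIF(post_pagename, '')) AS hits_with_pagename
FROM adobe_hit_data;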

How should I configure my data feed deliveries?

When setting up your data feeds, ALWAYS make sure you are including your report suite ID (RSID) as one of the columns in the data. 

If you route multiple report suites to the same warehouse, the RSID allows you to query and filter records based on this ID. Without the RSID in the data, there would be no way to tell records from one report suite apart from another. This is especially important if you are using roll-up reports.
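For example, if the RSID lands in a column (shown here under the hypothetical name rsid), each report suite can be filtered or counted independently:

-- Hypothetical: assumes the report suite ID was configured as a feed column named "rsid".
SELECT rsid,
       COUNT(*) AS hits
FROM adobe_hit_data
GROUP BY rsid
ORDER BY hits DESC;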

How many data feeds can I configure for a report suite?

The right answer is one, though the Adobe UI allows you to configure more than one. If you attempt to set up more than one, expect problems. Why? Data feed file names comprise the report suite ID and the date, so any feeds configured for the same RSID and date(s) will have the same file names. If those feeds were delivered to the exact same location, one file would overwrite the other. To prevent an overwrite, you cannot create a feed that has the potential to overwrite an existing feed in the same location.

Why is Adobe reporting delivery errors?

If you receive delivery errors, consider a few possible reasons:

  • Did the delivery path change?

  • Did the report suite change?

  • Did the Amazon S3 bucket change?  

  • Did your Amazon S3 authorization change or get revoked?

How do I view the transfer activity for Adobe data feeds?

To review the status of your data feeds, you need to review the job history in the Adobe UI. The job history will hold the last 30 days' activity for all report suites. To view the history, select one or more feeds. Next, click Job History. The Jobs page opens and displays the history for the selected feeds.

Can I use my tagging plan for QA/QC?

Yes, the contents of your feeds will be influenced by tagging. An often-overlooked quality control issue is poorly or inconsistently implemented tags for an RSID. These can introduce special characters, odd data types, and other issues that conflict with the rules and reserved (delimiter) characters specified by Adobe, which can cause load errors in the target warehouse due to data type or rule mismatches.
