The Openbridge Adobe data feed integration automates the organization, processing, and routing of raw Adobe data feeds, providing a fully automated workflow from Adobe delivery to a destination warehouse.
What are Adobe Data Feeds?
Per Adobe, data feeds reflect "data collected from websites, mobile apps, or uploaded using web service APIs or data sources, is processed and stored in Adobe's Data Warehouse". The raw Adobe clickstream data forms the underlying data set that is used by the Adobe Analytics user interface.
However, not everyone wants to be limited to the insights within an Adobe user interface. There are situations where companies want to leverage business intelligence software from Tableau, Microsoft, Looker, and many others. If you are one of those companies looking to harness the power of Adobe data feeds, this service is for you.
What is included in an Adobe Data Feed?
Adobe delivers raw data to the customer in a batched manner on a recurring daily or hourly delivery schedule. Each Adobe data feed comprises raw data for a single report suite (RSID). The data deliverables consist of three file classes:
Manifest file: A text file with information regarding the contents of the delivery as a whole
Lookup data: A compressed TAR archive containing several files of supplemental lookup data
Hit data: The file that contains all relevant hit data
The delivery from Adobe typically is organized like this:
/path/to/delivery/location/rsid_name_20180516-210000.txt
/path/to/delivery/location/rsid_name_20180516-210000-lookup_data.tar.gz
/path/to/delivery/location/rsid_name_20180516-210000.tsv.gz
/path/to/delivery/location/rsid_name_20180517-210000.txt
/path/to/delivery/location/rsid_name_20180517-210000-lookup_data.tar.gz
/path/to/delivery/location/rsid_name_20180517-210000.tsv.gz
Resident within the compressed archives is a collection of data files like these:
hit_data
browser_type
browser
color_depth
connection_type
country
event
javascript_version
languages
operating_systems
plugins
referrer_type
resolution
search_engines
Depending on your schedule (hourly or daily), the volume of files will vary. However, regardless of your schedule, the data feeds could be better organized, especially if they need to be loaded into a warehouse. This is where the Openbridge Adobe Data Feed preprocessor comes in.
How The Openbridge Adobe Data Feed Integration Works
The Openbridge Adobe data feed preprocessor aims to monitor, process, route and deliver Adobe data feeds to downstream services for loading into a destination warehouse.
Step 1: Define Your Adobe Data Feeds
In this step, you define the data you want as part of your data feed. These activities are completed within the Adobe Analytics admin interface.
Note: A pipeline can support multiple RSIDs, assuming they are homogeneous (i.e., sharing the same structure and types). Increasing the number of RSIDs increases processing time, largely due to the additional testing and data volume needed to ensure requirements are met.
Step 2: Configure Adobe Data Feed Delivery Location
Adobe needs to deliver data to an external location. In this step, you tell Adobe where you want the data to go.
Step 3: Loading Adobe Data To A Destination Warehouse
The Openbridge pipeline handles loading to a destination warehouse. This includes unzipping and splitting the compressed lookup data archive into individual folders, which ensures that each lookup file can be routed for loading into its respective table in the destination warehouse.
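For context, here is a minimal sketch of what the load step looks like for one lookup file once it has been split out, assuming Amazon Redshift as the destination; the bucket, prefix, and IAM role are hypothetical placeholders, and Openbridge performs the equivalent load for you automatically:

-- Illustrative only: load a split-out lookup file into its table.
-- Bucket, prefix, and IAM role below are placeholders, not real values.
COPY adobe_browser
FROM 's3://your-bucket/adobe/rsid_name/browser/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-load-role'
DELIMITER '\t';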
Steps 1 and 2 got everything set up and organized; now you want the data loaded into a target warehouse. The process is simple and only takes a few minutes to complete.
Set up an Openbridge account: https://www.openbridge.com/register
Activate the destination warehouse at Google or Amazon. For example, you can set up BigQuery, Redshift, or Amazon Athena, or have us output to Apache Parquet.
Register the warehouse service within Openbridge. Here is an example of how to register BigQuery within Openbridge.
You must request that Openbridge activate your S3 destination and configure the Adobe data feed deliveries to be processed and routed to your target warehouse location.
How Data Gets Loaded Into Your Warehouse
Adobe data will be processed and routed to the destination warehouse. This will result in 14 initial source tables and 14 roll-up views:
adobe_hit_data
adobe_browser_type
adobe_browser
adobe_color_depth
adobe_connection_type
adobe_country
adobe_event
adobe_javascript_version
adobe_languages
adobe_operating_systems
adobe_plugins
adobe_referrer_type
adobe_resolution
adobe_search_engines
This document describes in more detail how we structure the data to simplify change management of Adobe data.
Historical Data
Historical data should be handled on its own path once the daily data is flowing. The simplest approach is to deliver historical data into your operational go-forward pipeline.
Be cognizant that for one RSID covering two years, there will be 730 timestamped exports, which works out to roughly 10,000+ data files once each delivery's manifest, hit data file, and unpacked lookup files are counted.
Define the history to be retroactively delivered. How far back do you want to go?
Define the period of each delivery; typically, we suggest a monthly block, where you export a month and then validate that all the files for that month meet expectations. This will give you 30 or so unique deliveries for each month. Once a month is complete, continue moving back month by month until there is no more historical data. We also recommend a single daily export (versus multiple exports per day).
Validate each tranche of deliveries, allowing you to "sign off" on each delivery block.
Typically, you would establish windows/blocks of time to process and validate historical deliveries, for example, working back in time month by month until you have all the possible historical data exported. Depending on the volume of historical data, this can take a while to complete.
Using the go-forward pipeline assumes that historical data is aligned with go-forward data. Additional effort to fuse the different data sets would be needed if your historical data varies in structure and type. This may often be the first time teams see older data and realize minor (or significant) QA issues. These errors may or may not impact loading, and lower-level QA/QC may need to occur.
Timing
Adobe Deliveries
According to Adobe, before processing hourly or daily data, data feeds wait until all the hits that entered data collection within the timeframe (day or hour) have been settled in their system. Once that occurs, Adobe collects data within the selected timeframe, compresses it, and sends it to Amazon S3.
For hourly feeds, files are typically sent to the data warehouse within 15-30 minutes after the hour. However, please keep in mind Adobe doesn't give you a guaranteed delivery time period. If there was no data within the selected timeframe, the Adobe process tries again during the next window. The Adobe data feed uses the `date_time` field to determine which hits belong to the hour. This field is based on the time zone of the report suite.
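As an illustration of how the `date_time` field can be used once the data is in your warehouse, the following Redshift/PostgreSQL-style sketch buckets hits by hour; it assumes the adobe_hit_data table described later and that `date_time` was loaded as a timestamp:

-- Sketch: count hits per hour using the report-suite-local date_time field.
SELECT DATE_TRUNC('hour', date_time) AS hit_hour,
       COUNT(*) AS hits
FROM adobe_hit_data
GROUP BY 1
ORDER BY 1;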
Openbridge
Openbridge processes data upon receipt. Depending on the number of records and the width of the data (i.e., the number of columns), it can take anywhere from 5 to 60 minutes for data to be loaded into your destination warehouse.
Frequently Asked Questions
What are examples of post-processing activities once Adobe data is resident in your warehouse?
Create custom "lookup" tables. For example, based on a site tagging plan or media plans, create lookups for campaign/creative IDs that allow you to cross-reference any campaign/creative variables that may be referenced in the Adobe data (a sketch follows this list).
Deploy on-premise BigQuery data export applications. For example, we have deployed tools like this within a client environment that remotely connects to BigQuery, exports data, and then transfers it to an on-premise location.
Create custom "pre-processing" workflows to clean up or "fix" bad data within go-forward data, historical data, or both.
Retain data beyond our standard data retention policy of 14 days. This is typically adequate when teams are pulling data into their on-premise systems. However, we can offer longer policies, including no expiration on how long we hold the data. This may be desirable for those teams that want to use us as a longer-term data lake of Adobe data.
Undertake detailed QA/QC of data feeds to validate alignment with site tagging.
Transform the Adobe data to comply with customer requirements for various business intelligence or data visualization tools. For example, a customer may want to use native Tableau/Power BI to calculate metrics or set data types directly in those tools rather than manipulate the data. In other cases, a customer may prefer pre-processed views in a database to accomplish those calculations rather than use Tableau or Power BI.
Create special overlays on top of the data. An overlay may be a process that pre-calculates specific views or segments.
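As a concrete illustration of the lookup-table idea above, here is a minimal sketch; the campaign_lookup table, its columns, and the use of post_evar1 to carry a campaign ID are all assumptions that would depend on your own tagging plan:

-- Hypothetical campaign lookup keyed on the campaign ID captured in post_evar1.
CREATE TABLE campaign_lookup (
    campaign_id   VARCHAR(64),
    campaign_name VARCHAR(255),
    creative_name VARCHAR(255)
);

-- Cross-reference Adobe hit data against the lookup (column usage assumed).
SELECT h.post_evar1 AS campaign_id,
       l.campaign_name,
       l.creative_name,
       COUNT(*) AS hits
FROM adobe_hit_data h
JOIN campaign_lookup l
  ON h.post_evar1 = l.campaign_id
GROUP BY 1, 2, 3;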
What is the table structure like for the loaded Adobe data?
As part of this service, we can generate a collection of database tables/views that follow Adobe packaging of data feeds. The following reflects the current feeds:
adobe_hit_data
adobe_browser_type
adobe_browser
adobe_color_depth
adobe_connection_type
adobe_country
adobe_event
adobe_javascript_version
adobe_languages
adobe_operating_systems
adobe_plugins
adobe_referrer_type
adobe_resolution
adobe_search_engines
Each of the feeds will have one or more source `_v1`, `_v2`, `_v3`, ... tables and a `_master` view:
adobe_hit_data_master
adobe_hit_data_v1
adobe_hit_data_v2
adobe_hit_data_v3
adobe_country_master
adobe_country_v1
adobe_browser_master
adobe_browser_v1
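A minimal sketch of what a `_master` roll-up view can look like, assuming the versioned tables share the same column structure; the real views are generated for you, so this is illustrative only:

-- Illustrative only: a master view that unions the versioned source tables.
CREATE OR REPLACE VIEW adobe_hit_data_master AS
SELECT * FROM adobe_hit_data_v1
UNION ALL
SELECT * FROM adobe_hit_data_v2
UNION ALL
SELECT * FROM adobe_hit_data_v3;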
This document describes in more detail how we structure the data to simplify change management.
Can I create custom views or models of the Adobe data?
Yes, if any specialized calculations or models need to be created for your Adobe data feeds, you can define them as views in your warehouse. We provide some examples below. If you need help with an effort like this, please reach out to our expert services team.
What are examples of custom-calculated views?
Adobe describes how to calculate common metrics using data feeds, defining a set of calculated metrics that can be overlaid on top of the Adobe data feeds. The following are formulas for some of these common metrics. Please note that while the calculation instructions are defined, the actual implementation requires customization based on your requirements.
Page Views
Page views can be calculated by counting rows that have a value in either `post_pagename` or `post_page_url`. You can use similar logic to count custom links.
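As a minimal sketch, assuming the hit data has been loaded into the adobe_hit_data table described earlier, the page view count could be expressed as:

-- Page views: count hits with a value in post_pagename or post_page_url.
SELECT COUNT(*) AS page_views
FROM adobe_hit_data
WHERE COALESCE(post_pagename, '') <> ''
   OR COALESCE(post_page_url, '') <> '';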
Visits
To get to Visits, exclude all rows where `exclude_hit > 0` and any rows with `hit_source = 5, 7, 8, 9`. Next, combine `post_visid_high`, `post_visid_low`, `visit_num`, and `visit_start_time_gmt`, and count the unique number of combinations.
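A sketch of the same logic in SQL, again assuming the adobe_hit_data table and that the columns were loaded with types that allow casting to VARCHAR:

-- Visits: unique combinations of visitor ID parts, visit number, and visit start time,
-- excluding excluded hits and data-source summary rows.
SELECT COUNT(DISTINCT
         CAST(post_visid_high AS VARCHAR) || '-' ||
         CAST(post_visid_low AS VARCHAR) || '-' ||
         CAST(visit_num AS VARCHAR) || '-' ||
         CAST(visit_start_time_gmt AS VARCHAR)) AS visits
FROM adobe_hit_data
WHERE exclude_hit <= 0
  AND hit_source NOT IN (5, 7, 8, 9);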
Visitors
Similar to Visits, exclude all rows where `exclude_hit > 0` and all rows with `hit_source = 5, 7, 8, 9`. Combine `post_visid_high` with `post_visid_low` and count the unique number of combinations.
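And the corresponding sketch for unique visitors, with the same assumptions:

-- Visitors: unique combinations of the visitor ID high/low parts,
-- with the same exclusions as the visit count.
SELECT COUNT(DISTINCT
         CAST(post_visid_high AS VARCHAR) || '-' ||
         CAST(post_visid_low AS VARCHAR)) AS visitors
FROM adobe_hit_data
WHERE exclude_hit <= 0
  AND hit_source NOT IN (5, 7, 8, 9);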
Notes
In rare circumstances, internet irregularities, system irregularities, or the use of custom visitor IDs can result in duplicate `visit_num` values for the same visitor ID that are not the same visit. To avoid resulting issues, also include `visit_start_time_gmt` when counting visits.
Rows with `hit_source = 5, 8, or 9` reflect summary rows uploaded using data sources. `hit_source = 7` represents transaction ID data source uploads that should not be included in visit and visitor counts.
Can you provide an example of a custom view?
Below is an example of the logic needed to fuse Adobe data with Google Analytics data. This creates a calculated, unified, and easy-to-query `ga_channeldetail_bounce` view for a Tableau or Power BI user.
CREATE OR REPLACE VIEW ga_channeldetail_bounce
(
date,
rsid,
last_touch_channel,
last_touch_channel_detail,
visits,
unique_visitors,
page_views,
time_spent_on_page,
product_interactions,
buy_now_clicks,
where_to_buy,
video_views,
game_starts,
source
)
AS
SELECT DISTINCT bounce_visits.date::timestamp without time zone AS date, ((bounce_visits.dimension2::text || ' '::character varying::text) || bounce_visits.dimension3::text)::character varying AS rsid, bounce_visits.trafficsource_medium AS last_touch_channel, bounce_visits.trafficsource_source AS last_touch_channel_detail, count(DISTINCT (bounce_visits.visitid::character varying::text || ' '::character varying::text) || bounce_visits.fullvisitorid::text) AS visits, count(DISTINCT bounce_visits.fullvisitorid) AS unique_visitors, sum(
CASE
WHEN bounce_visits.hits_type::text = 'PAGE'::character varying::text THEN 1
ELSE 0
END) AS page_views, sum(bounce_visits.totals_timeonsite::numeric::numeric(18,0) / bounce_visits.totals_hits::numeric::numeric(18,0)) AS time_spent_on_page, COALESCE(sum(COALESCE(bounce_visits.metric1::integer, 0) + COALESCE(bounce_visits.metric2::integer, 0)), 0::bigint) AS product_interactions, sum(
CASE
WHEN bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:SHOP'::character varying::text THEN 1
ELSE 0
END) AS buy_now_clicks, sum(
CASE
WHEN bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:BUY'::character varying::text OR bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:Buy'::character varying::text THEN 1
ELSE 0
END) AS where_to_buy, sum(
CASE
WHEN bounce_visits.hits_eventinfo_eventcategory::text ~~ 'Video:%'::character varying::text AND bounce_visits.hits_eventinfo_eventaction::text = 'Video Play'::character varying::text THEN 1
ELSE 0
END) AS video_views, sum(
CASE
WHEN bounce_visits.hits_eventinfo_eventcategory::text ~~ 'Game:%'::character varying::text AND (bounce_visits.hits_eventinfo_eventaction::text = 'play'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'click start'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = '70'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'click play'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'start'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'launch'::character varying::text) THEN 1
ELSE 0
END) AS game_starts, 'ga_channeldetail_bounce'::character varying AS source
FROM ( SELECT ga_360_core.date, ga_360_core.dimension2, ga_360_core.dimension3, ga_360_core.dimension4, ga_360_core.dimension5, ga_360_core.visitid, ga_360_core.fullvisitorid, ga_360_core.metric1, ga_360_core.metric2, ga_360_core.hits_type, ga_360_core.hits_eventinfo_eventcategory, ga_360_core.hits_eventinfo_eventaction, ga_360_core.totals_timeonsite, ga_360_core.totals_hits, ga_360_core.trafficsource_medium, ga_360_core.trafficsource_source
FROM ga_360_core
JOIN ( SELECT DISTINCT (ga_360_core.visitid::character varying::text || ' '::character varying::text) || ga_360_core.fullvisitorid::text AS visit
FROM ga_360_core
WHERE ga_360_core.totals_bounces::text = '1'::character varying::text) bounce_hits ON ((ga_360_core.visitid::character varying::text || ' '::character varying::text) || ga_360_core.fullvisitorid::text) = bounce_hits.visit) bounce_visits
GROUP BY bounce_visits.date, (bounce_visits.dimension2::text || ' '::character varying::text) || bounce_visits.dimension3::text, (bounce_visits.dimension4::text || ' '::character varying::text) || bounce_visits.dimension5::text, bounce_visits.trafficsource_medium, bounce_visits.trafficsource_source, bounce_visits.metric1;
GRANT SELECT, DELETE, UPDATE, TRIGGER, RULE, REFERENCES, INSERT ON ga_channeldetail_bounce TO steve;
GRANT SELECT ON ga_channeldetail_bounce TO read_only_users;
COMMIT;
Do you test and validate the Adobe data?
Yes, we test and validate the data that is provided to us by Adobe. However, this is somewhat limited in scope. We are only testing and validating the format, structure, and type of the data sent. If there are issues within the data related to bad tagging, implementation errors, or other data quality problems, we are not checking for those. That level of testing requires lower-level quality control efforts.
Here are a couple of suggestions in advance of setting up a pipeline:
Download a test file from Adobe. QA/QC the source Adobe files versus what is expected. Audit discrepancies and errors vs a tagging plan.
Connect Power BI or Tableau to your sample data. Run a collection of test queries to get a sense of what data is resident in your Adobe feed.
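A couple of illustrative sanity checks you could run against the sample data; the table name follows the structure described above, and the rsid column assumes you included the report suite ID in the feed:

-- Daily row counts per report suite: a quick completeness check.
SELECT rsid,
       CAST(date_time AS DATE) AS hit_date,
       COUNT(*) AS hits
FROM adobe_hit_data
GROUP BY 1, 2
ORDER BY 1, 2;

-- Distribution of excluded hits.
SELECT exclude_hit,
       COUNT(*) AS hits
FROM adobe_hit_data
GROUP BY 1
ORDER BY 1;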
How should I configure my data feed deliveries?
When setting up your data feeds, ALWAYS make sure you are including your report suite ID (RSID) as one of the columns in the data.
In the event you are routing multiple report suites to the same warehouse, the RSID will allow you to query and filter records based on this ID. Without the RSID in the data, there would be no way to distinguish a record from one report suite from another. This is especially important for those who are using roll-up reports.
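For example, with the report suite ID included, filtering a roll-up down to a single suite is straightforward; the rsid column name is an assumption, so use whatever column your feed configuration actually produces:

-- Filter the combined hit data down to one report suite.
SELECT *
FROM adobe_hit_data_master
WHERE rsid = 'your_report_suite_id';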
How many data feeds can I configure for a report suite?
The right answer should be one, but the Adobe UI allows you to configure more than one. If you attempt to set up more than one, expect problems. Why? Data feed file names comprise the report suite ID and the date. Any feeds configured for the same RSID and date(s) will have the same file name. If those feeds were delivered to the exact same location, one file would overwrite the other. To prevent a file overwrite, you cannot create a feed that has the potential to overwrite an existing feed in the same location.
Why is Adobe reporting delivery errors?
If you receive these delivery errors, consider a few possible reasons:
Did the delivery path change?
Did the report suite change?
Did the Amazon S3 bucket change?
Did your Amazon S3 authorization change or get revoked?
How do I view the transfer activity for Adobe data feeds?
To review the status of your data feeds, you need to review the job history in the Adobe UI. The job history holds the last 30 days' activity for all report suites. To view the history, select one or more feeds, then click Job History. The Jobs page opens and displays the history for the selected feeds.
Can I use my tagging plan for QA/QC?
Yes, the contents of your feeds will be influenced by tagging. An often overlooked quality control issue is poorly or inconsistently implemented tags for an RSID. This can introduce special characters, odd data types, and other issues that can conflict with rules and reserved (delimiter) characters specified by Adobe. These issues can cause load errors in a target warehouse due to data type or rule mismatches.