The Openbridge Adobe data feed integration is focused on automating the organization, processing and routing of raw Adobe data feeds to a destination warehouse. This Adobe pipeline provides a fully automated workflow to a target warehouse.
What are Adobe Data Feeds?
Per Adobe, data feeds reflect "data that is collected from web sites, mobile apps, or is uploaded using web service APIs or data sources, is processed and stored in Adobe's Data Warehouse". The raw Adobe clickstream data forms the underlying data set that is used by the Adobe Analytics user interface.
However, not everyone wants to be limited to the insights within an Adobe user interface. There are situations where companies want to leverage business intelligence software from Tableau, Microsoft, Looker and many others. If you are one of those companies looking to harness the power of Adobe data feeds, then this service is for you.
What is included in an Adobe Data Feed?
Adobe delivers raw data to the customer in a batched manner on a recurring daily or hourly delivery schedule. Each Adobe data feed is comprised of raw data for a single report suite (RSID). The data deliverables consist of three file classes:
- Manifest file: A text file with information regarding the contents of the delivery as a whole
- Lookup data: A TAR containing a number of files which contain supplemental data
- Hit data: The file which contains all relevant hit data.
The delivery from Adobe typically is organized like this:
Resident within the compressed archives is a collection of data files like these:
Depending on your schedule (hourly or daily), the volume of files will vary. However, regardless of your schedule the data feeds are somewhat disorganized, especially if they need to get loaded into a warehouse. This is where the Openbridge Adobe Data Feed preprocessor comes in.
How The Openbridge Adobe Data Feed Integration Works
The goal of the Openbridge Adobe data feed preprocessor is to monitor, process, route and deliver Adobe data feeds to downstream services for loading into a destination warehouse.
Step 1: Define Your Adobe Data Feeds
In this step you define the data you want as part of your data feed. These activities are completed within the Adobe Analytics admin interface.
Note: A pipeline can support multiple RSIDs assuming they are homogeneous (i.e. sharing the same structure and types). The number of RSIDs to be processed increases processing time largely due to increased testing and data volumes needed to ensure requirements are met.
Step 2: Configure Adobe Data Feed Delivery Location
Adobe needs to deliver data to an external location. In this step you will tell them where you want it to go.
Step 3: Loading Adobe Data To A Destination Warehouse
The Openbridge pipeline which handles loading to a destination warehouse. This includes unzips and splits of the zipped lookup data file into individual folders. This ensures that each lookup file can be routed for loading into their respective tables in the destination warehouse.
While steps 1 and 2 got everything setup and organized, you will want to have the data loaded into a target warehouse. The process is simple and only takes a few minutes to complete.
- Setup an Openbridge account: https://www.openbridge.com/register
- Activate destination warehouse at Google or Amazon. For example, you can setting up BigQuery, Redshift, Amazon Athena or have us output to Apache Parquet.
- Register the warehouse service within Openbridge. Here is an example if how to register BigQuery within Openbridge.
- You need to request Openbridge activate your S3 destination and configure the Adobe data feed deliveries to get processed and router to your target warehouse location.
How Data Gets Loaded Into Your Warehouse
Adobe data will be processed and routed to the destination warehouse. This will result in a 14 initial source tables and 14 roll-up views:
This document describes in more detail how we structure the data to simply change management of Adobe data.
Historical data should be on its own path once the daily data is flowing. The simplest approach is to deliver historical data into your go-forward pipeline. With an operation go-forward pipeline you can push historical data into it.
Be cognizant that for one RSID, covering 2 years, there will be 730 timestamped exports which equal about 10,000+ data files.
- Define the history to be retroactively delivered. How far back do you want to go?
- Define the period of each delivery, typically we suggest a monthly block. This is where you export a month and then validate all files for a month meet expectations. This will get you 30 or so unique deliveries for each month. Once a month is complete, continue to move back monthly until there is no more historical data. We also recommend a single daily export (versus multiple exports per day).
- Validate each tranche of deliveries allowing you "sign-off" on each delivery block.
Typically you would establish windows/blocks of time to process and validate historical deliveries. For example, working back in time monthly until you have all the possible historical data exported. Depending on the volume of historical data, this can take awhile to complete.
The approach of using the go-forward pipeline assumes that historical data is aligned with go-forward data. If your historical data varies in structure and type additional effort to fuse the different data sets would be needed. This may often be the first time teams are seeing older data and may realize minor (or significant) QA issues. These errors may or may not impact loading and lower level QA/QC may need to occur.
According to Adobe, before processing hourly or daily data, data feeds wait until all the hits that entered data collection within the timeframe (day or hour) have been settled in their system. Once that occurs, Adobe collects data that falls within the selected timeframe, compresses it, and sends to Amazon S3.
For hourly feeds, files are typically written out to data warehouse within 15-30 min after the hour. However, please keep in mind Adobe provides no guaranteed delivery time period. If there was no data with the selected timeframe, the Adobe process tries again during the next window. The Adobe data feed uses the `date_time` field to determine which hits belong to the hour. This field is based on the time zone of the report suite.
Openbridge will being processing data upon receipt of the data. Depending of the number of records and the width of the data (i.e. number of columns) it can take anywhere from 5 to 60 minutes for data to be loaded to your destination warehouse.
Frequently Asked Questions
What are examples of post processing activities once Adobe data is resident in your warehouse?
- Create custom "lookup" tables. For example, based on a site tagging plan or media plans, create lookups for campaign/creative IDs that allow you to cross reference any campaign/creative variables that may be referenced in the Adobe data
- Deploy on-premise BigQuery data export applications. For example, we have deployed tools like this within a client environment that remotely connects to BigQuery, exports data and then transfers it to an on-premise location.
- Create custom "pre-processing" workflows to clean up or "fix" bad data within go-forward data, historical data or both.
- Retain data beyond our standard data retention policy of 14 days. This is typically adequate when teams are pulling data into their on-premise systems. However, we can offer longer policies, including no expiration on how long we hold the data. This may be desirable for those teams that want to use us as a longer term data lake of Adobe data.
- Undertake detailed QA/QC of data feeds to validate alignment with site tagging.
- Transform the Adobe data to comply with customer requirements for various business intelligence or data visualization tools. For example, a customer may want to use native Tableau/Power BI to calculate metrics or set data types directly in the those tools rather than manipulate the data. In other cases, a customer may prefer pre-processed views in a database to accomplish those calculations rather than use Tableau or Power BI. Create special overlays on top of the data. An overlay may be a process that pre-calculates specific views or segments.
What is is the table structure like for the loaded Adobe data?
As part of this service we can generate a collection of database tables/views that follow Adobe packaging of data feeds. The following reflect the current feeds:
Each of the feeds will have a source `_v1, v2, v3,...` table and a `master` view:
This document describes in more detail how we structure the data to simply change management.
Can I create custom views or models of the Adobe data?
Yes, if any specialized calculations or models need to be created for your Adobe data feeds, you can define them as views in your warehouse. We provide some examples below. If you need help with an effort like this, please reach out to our expert services team.
What are examples of custom calculated views?
Adobe describes how to calculate common metrics using data feeds. They defines a set of common calculated metrics that can be an overlay on top of the Adobe data feeds. The following are some formulas for common metrics. Please note that while instructions to calculate several common metrics are defined, the actual implementation requires customization based on user requirements.
Page views can be calculated by counting when there is either a value in
post_page_url. You can use similar logic to count custom links:
To get to Visits you need to exclude all rows where
exclude_hit > 0 and any rows with
hit_source = 5,7,8,9. Next, combine
visit_start_time_gmt*. Count unique number of combinations.
Similar to Visits, you will want to Exclude all rows where
exclude_hit > 0 and all rows with
hit_source = 5,7,8,9. Combine
post_visid_low. Count unique number of combinations.
In rare circumstances, internet irregularities, system irregularities, or the use of custom visitor IDs can result in duplicate visit_num values for the same visitor ID that are not the same visit. To avoid resulting issues, also include
visit_start_time_gmt when counting visits.
hit_source = 5, 8, and 9 these reflect summary rows uploaded using data sources.
hit_source = 7 represents transaction ID data source uploads that should not be included in visit and visitor counts.
Can you provide an example of a custom view?
Sure, below is an example of the logic need to fuse Adobe data with Google Analytics data. This creates a calculated, unified and easy to query
ga_channeldetail_bounce view for a Tableau or PowerBI user.
CREATE OR REPLACE VIEW ga_channeldetail_bounce
SELECT DISTINCT bounce_visits.date::timestamp without time zone AS date, ((bounce_visits.dimension2::text || ' '::character varying::text) || bounce_visits.dimension3::text)::character varying AS rsid, bounce_visits.trafficsource_medium AS last_touch_channel, bounce_visits.trafficsource_source AS last_touch_channel_detail, count(DISTINCT (bounce_visits.visitid::character varying::text || ' '::character varying::text) || bounce_visits.fullvisitorid::text) AS visits, count(DISTINCT bounce_visits.fullvisitorid) AS unique_visitors, sum(
WHEN bounce_visits.hits_type::text = 'PAGE'::character varying::text THEN 1
END) AS page_views, sum(bounce_visits.totals_timeonsite::numeric::numeric(18,0) / bounce_visits.totals_hits::numeric::numeric(18,0)) AS time_spent_on_page, COALESCE(sum(COALESCE(bounce_visits.metric1::integer, 0) + COALESCE(bounce_visits.metric2::integer, 0)), 0::bigint) AS product_interactions,, sum(
WHEN bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:SHOP'::character varying::text THEN 1
END) AS buy_now_clicks, sum(
WHEN bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:BUY'::character varying::text OR bounce_visits.hits_eventinfo_eventcategory::text = 'eCommerce Lead:Buy'::character varying::text THEN 1
END) AS where_to_buy, sum(
WHEN bounce_visits.hits_eventinfo_eventcategory::text ~~ 'Video:%'::character varying::text AND bounce_visits.hits_eventinfo_eventaction::text = 'Video Play'::character varying::text THEN 1
END) AS video_views, sum(
WHEN bounce_visits.hits_eventinfo_eventcategory::text ~~ 'Game:%'::character varying::text AND (bounce_visits.hits_eventinfo_eventaction::text = 'play'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'click start'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = '70'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'click play'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'start'::character varying::text OR bounce_visits.hits_eventinfo_eventaction::text = 'launch'::character varying::text) THEN 1
END) AS game_starts, 'ga_channeldetail_bounce'::character varying AS source
FROM ( SELECT ga_360_core.date, ga_360_core.dimension2, ga_360_core.dimension3, ga_360_core.dimension4, ga_360_core.dimension5, ga_360_core.visitid, ga_360_core.fullvisitorid, ga_360_core.metric1, ga_360_core.metric2, ga_360_core.hits_type, ga_360_core.hits_eventinfo_eventcategory, ga_360_core.hits_eventinfo_eventaction, ga_360_core.totals_timeonsite, ga_360_core.totals_hits, ga_360_core.trafficsource_medium, ga_360_core.trafficsource_source
JOIN ( SELECT DISTINCT (ga_360_core.visitid::character varying::text || ' '::character varying::text) || ga_360_core.fullvisitorid::text AS visit
WHERE ga_360_core.totals_bounces::text = '1'::character varying::text) bounce_hits ON ((ga_360_core.visitid::character varying::text || ' '::character varying::text) || ga_360_core.fullvisitorid::text) = bounce_hits.visit) bounce_visits
GROUP BY bounce_visits.date, (bounce_visits.dimension2::text || ' '::character varying::text) || bounce_visits.dimension3::text, (bounce_visits.dimension4::text || ' '::character varying::text) || bounce_visits.dimension5::text, bounce_visits.trafficsource_medium, bounce_visits.trafficsource_source, bounce_visits.metric1;GRANT SELECT, DELETE, UPDATE, TRIGGER, RULE, REFERENCES, INSERT ON ga_channeldetail_bounce TO steve;
GRANT SELECT ON ga_channeldetail_bounce TO read_only_users;
Do you test and validate the Adobe data?
Yes, we do test and validate the data that is provided to us by Adobe. However, this is somewhat limited in scope. We are only testing and validating the format, structure and type of data being sent. If there are issues within the data related to bad tagging, implementation errors or other data quality problems we are not checking for this. This level of testing requires lower level quality control efforts.
Here are a couple of suggestions in advance of setting up a pipeline:
- Download a test file from Adobe. QA/QC the source Adobe files versus what is expected. Audit discrepancies and errors vs a tagging plan.
- Connect Power BI or Tableau to your sample data. Run collection of test queries to get a sense of what data is resident in your Adobe feed.
How should I configure my data feed deliveries?
When setting up your data feeds, ALWAYS make sure you are including your report suite ID (RSID) as one of the columns in the data.
In the event you are routing multiple report suites to the same warehouse, the RSID will allow you to query and filter records based on this ID. Without RSID in the data there would be no way to discern a record from one report suite from another. This is especially important for those who are using roll-up reports.
How many data feeds can I configure for a report suite?
The right answer should be one, but the Adobe UI allows you to configure more than one. If you attempt to setup more than one expect problems. Why? Data feed file names are made up of the report suite ID and the date. Any two feeds that are configured for the same RSID and date(s) will have the same file name. If those feeds are delivered to the same location, one file would overwrite the other. To prevent a file overwrite, you cannot create a feed that has the potential to overwrite an existing feed in the same location.
Why is Adobe reporting delivery errors?
If you receive these delivery errors consider a few possible reasons;
- Did the delivery path change?
- Did the report suite change?
- Did the Amazon S3 bucket change?
- Did your Amazon S3 authorization change or get revoked?
How do I view the transfer activity for Adobe data feeds?
To review the status of your data feeds, you need to review the job history in the Adobe UI. The job history will hold the last 30 days activity for all report suites. To view the history select one or more feeds. Next, click
Job History. The Jobs page opens and displays the history for the selected feeds.
Can I use my tagging plan for QA/QC?
Yes, the contents of your feeds will be influenced by tagging. An often overlooked quality control issue is poorly or inconsistently implemented tags for an RSID. This can introduce special characters, odd data types and other issues that can conflict with rules and reserved (delimiter) characters specified by Adobe. These types of issues can cause load errors into a target warehouse due to data type or rule mismatches.
- Data Feed Contents
- Feed Destination
- Troubleshooting Data Feeds
- Adobe Analytics (SiteCatalyst or Omniture) — Using Self-Service Analytics Tools
- Adobe Analytics: Avoid Heartache Configuring Adobe Clickstream Data Feeds
- 4 Ways To Get Your Adobe Analytics Data Into Tableau, Power BI, Looker And Many Other BI Tools