It is not uncommon for teams to perform complex queries on large tables. These may be multi-table joins and aggregations that model some business logic drawing on a collection of base tables. Depending on the type of analysis, it can be challenging to perform certain types of queries directly against base tables. One strategy to simplify this process is the use of views. A view can offer increased flexibility and convenience to users of the data.
What Are Views?
What exactly is a view? Put simply; a view is a virtual table defined by a SQL query. The use of a view offers an improved user experience for those running queries for analytics, ETL, and other data operations. Why? A view will often encapsulate specific query logic into a well-defined result set.
For example; let’s assume an analyst wants to do a count of impressions for an ad campaign. However, there are multiple attribution rows for the same date. The analyst only cares about the most recent row as it reflects the latest attribution metrics from the Ad platform for a day. To do the counts, the analyst wants to remove possible duplicate rows for older records to simplify the analysis. They created a view that ensures only the most recent row is present:
-- Create deduplication view
CREATE VIEW amzadvertising_hsa_campaigns_master_deduped
AS
SELECT *
FROM (SELECT *,
Row_number ()
OVER (
partition BY ob_transaction_id
ORDER BY modified_at ASC ) AS row
FROM amzadvertising_hsa_campaigns_master)
WHERE row = 1
Once created, the analyst will query the view the same way they had queried the base table(s). While the analyst may be able to accomplish this directly in their analytics tools, the view offers the convenience of querying data from the base tables to improve testing, efficiency, and productivity.
Let's look at another example. An analyst wants to create a summary of complex underlying order data from Amazon, broken down by time and item status. This view limits the scope of order data which makes it easier to undertake some basic counts;
-- Create order summary view
CREATE VIEW amazon_orders_summary
AS
SELECT Extract(month FROM transaction_date) AS month,
Extract(day FROM transaction_date) AS day,
Extract(hour FROM transaction_date) AS hour,
amazon_order_id AS order_id,
Count(*)
FROM mws_orders_by_last_update_date
WHERE item_status = “shipped”
AND ob_seller_id = 123456789
GROUP BY month,
day,
hour,
order_id
Ultimately, the view definition reflects how you want to model the data in a manner that best aligns with a business need.
Materialized Views
Depending on your use case, processing queries in views may be an expensive operation. For example, "expensive" may mean a query takes a long time to complete, which may have cost implications in systems like BigQuery or Snowflake that charge for computing time. A possible solution to this is the use of materialized views. What, exactly, is a materialized view? So far we discussed views which are virtual tables. However, a materialized view creates a pre-computed physical table based on your SQL view query.
For example, imagine you have a collection of queries populating dashboards created by tools like Tableau, Data Studio, Looker, or Amazon QuickSight. Instead of performing resource-intensive queries against virtual tables (views) or large source tables (such as aggregates or multiple joins), a materialized view can offer a pre-computed result set.
The result of the materialized view is that your queries occur against an optimized, pre-computed physical table. These materialized views can be handy for speeding up queries that are predictable and repeatable.
While materialized views can add value, they may not be a fit for all of your use cases. Redshift, BigQuery, Snowflake, and others all have specific recommendations relating to views and materialize views. Read their docs to make sure you are aligning your particular use case to their best practices.
References