Amazon Athena Frequently Asked Questions

Our code-free, zero-administration data lake service delivers cost savings and performance gains for Amazon Athena by compressing, partitioning, and converting your data to a columnar format, which reduces the amount of data Athena needs to scan.

Amazon Athena is serverless, so there is no infrastructure to manage. With our automated data pipeline service, you don’t need to worry about configuration, software updates, failures, or scaling your infrastructure as your datasets and number of users grow.

WHAT ARE SOME AWS ATHENA USE CASES?

One popular use case combines the simplicity of Tableau, a data lake, and the power of Athena. It demonstrates how to deliver a cost-efficient, high-performance analytics architecture built on a data lake. Read more about this use case here: 4 Steps To Create a Serverless Analytics with Tableau and Amazon Athena.

WHY DO I NEED A DATA LAKE OR WAREHOUSE?

To maximize productivity, your data needs to be organized, processed, and loaded into a data lake or warehouse. This destination acts as a central repository of data aggregated from disparate sources. It becomes the hub that allows you to plug in your favorite tools so you can query, discover, visualize, or model that data.

WHAT DATA VISUALIZATION, BUSINESS INTELLIGENCE, REPORTING, OR DASHBOARDING TOOLS CAN I USE?

You have a lot of powerful and affordable choices these days! Tools like Tableau Software, Looker, Mode Analytics, Chartio, and Microsoft Power BI are just a few options to consider. Please take a look at the complete list of tools or read our DZone article about creating an objective 10-point business intelligence tool checklist to help narrow the field. If you have questions, feel free to reach out to us.

DO YOU CHARGE FOR THE DATA LAKE OR WAREHOUSE?

No, Openbridge does not charge for your data lake or warehouse. Any charges are billed to you directly by the data destination provider (e.g., Google, Microsoft Azure, or Amazon).

DOES AWS ATHENA PRICING USE ON-DEMAND COSTING?

Every data lake or cloud warehouse has its own pricing model. Pricing often varies by usage, which is defined by the compute and storage consumed or provisioned. Depending on your situation and requirements, different price-performance considerations may come into play. For example, if you need to start with a no-cost or low-cost solution, Athena charges only according to usage, based on the amount of data each query scans. This may provide you with the essentials to kickstart your efforts. If you have questions, feel free to reach out to us. We can offer tips and best practices on how to set up a data lake based on your needs.
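
To make usage-based pricing concrete, here is a minimal Python sketch that estimates query cost from the bytes each query scans. The $5.00-per-TB rate, the 10 MB per-query minimum, and rounding up to the nearest megabyte are assumptions based on commonly published Athena pricing; check the AWS pricing page for your region's current terms. The function names are hypothetical, and the sketch uses binary units (1 TB = 1024**4 bytes).

```python
"""Rough, usage-based Athena cost estimate (a sketch, not a billing tool)."""

MB = 1024**2
TB = 1024**4
# ASSUMPTION: the price and minimum below are illustrative; confirm them
# on the AWS pricing page for your region.
PRICE_PER_TB_USD = 5.00
MIN_MB_PER_QUERY = 10


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimated USD cost of one query that scans `bytes_scanned` bytes."""
    # Round up to whole megabytes (integer ceiling), then apply the minimum.
    billable_mb = max(-(-bytes_scanned // MB), MIN_MB_PER_QUERY)
    return billable_mb * MB / TB * price_per_tb


def estimate_monthly_cost(bytes_per_query: int, queries_per_month: int) -> float:
    """With usage-based pricing, cost grows linearly with the queries you run."""
    return estimate_query_cost(bytes_per_query) * queries_per_month


if __name__ == "__main__":
    # 30 queries a month scanning 100 GB each vs. 10 GB each (for example,
    # after converting the data to a compressed, columnar format).
    print(estimate_monthly_cost(100 * 1024**3, 30))
    print(estimate_monthly_cost(10 * 1024**3, 30))
```

Because cost in this model scales with bytes scanned, anything that reduces the data a query reads (partitioning, compression, columnar formats) lowers cost proportionally.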

DO YOU FOLLOW AMAZON'S BEST PRACTICES FOR DATA PARTITIONING WITHIN AWS ATHENA?

Yes! Amazon suggests that partitioning can help reduce the volume of data scanned per query, thereby improving performance and reducing cost. Because partitions act as virtual columns, filtering on them restricts the volume of data scanned. When you combine partitioning with columnar data formats like Apache Parquet, you follow both best practices together.
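
The following minimal Python sketch illustrates why partitions behave like virtual columns. It assumes Hive-style `name=value` prefixes in Amazon S3 object keys (a common layout for Athena partitions) and a hypothetical object listing; a filter on a partition column lets the engine skip whole prefixes without reading them. This is a simplified model: Athena resolves partitions from catalog metadata, such as the AWS Glue Data Catalog, rather than by parsing keys like this.

```python
"""Simplified model of partition pruning over Hive-style S3 prefixes."""
from typing import Dict, List, Tuple

# HYPOTHETICAL object listing: (S3 key, size in bytes). Each "name=value"
# path segment plays the role of a partition (virtual) column.
OBJECTS: List[Tuple[str, int]] = [
    ("sales/year=2023/month=11/part-0000.parquet", 40_000_000),
    ("sales/year=2023/month=12/part-0000.parquet", 55_000_000),
    ("sales/year=2024/month=01/part-0000.parquet", 60_000_000),
    ("sales/year=2024/month=02/part-0000.parquet", 52_000_000),
]


def partition_values(key: str) -> Dict[str, str]:
    """Extract partition columns from the name=value segments of a key."""
    values: Dict[str, str] = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, _, value = segment.partition("=")
            values[name] = value
    return values


def prune(objects: List[Tuple[str, int]], **predicate: str) -> List[Tuple[str, int]]:
    """Keep only objects whose partition values match every equality predicate."""
    return [
        (key, size)
        for key, size in objects
        if all(partition_values(key).get(col) == val for col, val in predicate.items())
    ]


def bytes_scanned(objects: List[Tuple[str, int]]) -> int:
    """Total bytes that would be read from the selected objects."""
    return sum(size for _, size in objects)


if __name__ == "__main__":
    # Roughly: SELECT ... FROM sales WHERE year = '2024'
    selected = prune(OBJECTS, year="2024")
    print(f"{bytes_scanned(selected):,} of {bytes_scanned(OBJECTS):,} bytes scanned")
```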

CAN I USE STANDARD SQL?

Yes, Athena supports standard SQL. Most destinations, like Google BigQuery, Amazon Redshift, Amazon Athena, and others, support familiar SQL constructs. There may be some limitations or best practices for a specific use case, but as a rule of thumb, SQL is available.
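
As an illustration of sending standard SQL to Athena programmatically, here is a minimal Python sketch built on the boto3 Athena client methods `start_query_execution`, `get_query_execution`, and `get_query_results`. The `run_query` helper, database name, and S3 output location are hypothetical. The client is passed in so the sketch can be exercised without AWS credentials; in practice you would pass `boto3.client("athena")`.

```python
"""Run a standard SQL query on Athena and wait for it to finish (sketch)."""
import time
from typing import Any, List, Optional


def run_query(client: Any, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> List[List[Optional[str]]]:
    """Submit `sql` and return result rows as lists of strings (None for NULL).

    `client` is expected to behave like boto3.client("athena").
    """
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        state = client.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {qid} finished with state {state}")

    # Only the first page of results is read here; for large result sets,
    # use client.get_paginator("get_query_results") instead.
    result = client.get_query_results(QueryExecutionId=qid)
    return [[col.get("VarCharValue") for col in row["Data"]]
            for row in result["ResultSet"]["Rows"]]
```

A call might look like `run_query(boto3.client("athena"), "SELECT COUNT(*) FROM sales", "my_database", "s3://my-bucket/athena-results/")`, where the table, database, and bucket are placeholders. For a SELECT query, the first returned row holds the column names.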

DO YOU OPTIMIZE FOR ATHENA DATA LAKES?

Yes! We follow Amazon's best practices for the sizes of the objects we partition, split, and compress. Queries run more efficiently when reading data can be parallelized and when blocks of data can be read sequentially. This holds mostly for larger files; smaller files, generally less than 128 MB, do not always realize the same performance benefits, because the per-file overhead of opening objects in Amazon S3 and reading file headers starts to dominate.
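
The file-size guidance can be illustrated with a minimal Python sketch that plans how to merge many small files into fewer, larger ones. The greedy, in-order packing and the 128 MB target are assumptions chosen for illustration, not Openbridge's actual compaction logic, and the sketch only plans the batches; reading and rewriting the data (for example, as Parquet) is out of scope.

```python
"""Plan how to merge small files into batches near a target size."""
from typing import List, Tuple

MB = 1024 * 1024
# ASSUMPTION: a 128 MB target, echoing the threshold mentioned above.
TARGET_BYTES = 128 * MB


def plan_compaction(files: List[Tuple[str, int]], target: int = TARGET_BYTES) -> List[List[str]]:
    """Greedily pack files, in listing order, into batches of at most `target` bytes.

    A file larger than `target` ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # HYPOTHETICAL listing of 60 small objects of 5 MB each.
    small_files = [(f"part-{i:04d}.csv.gz", 5 * MB) for i in range(60)]
    for batch in plan_compaction(small_files):
        print(f"{len(batch)} files, {len(batch) * 5} MB")
```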

CAN AMAZON ATHENA QUERY CSV FILES?

Yes, you can query CSV files. We have a service that automates loading CSVs for use in Athena. However, CSV is not the most efficient format for Athena. DZone has published an article we wrote on the subject: Apache Parquet vs. CSV File.

WHICH DATA LAKE OR CLOUD WAREHOUSE SHOULD I BE USING?

When building your data strategy and architecture, it's essential to understand which data warehouses should be candidates for consideration. Typically, teams ask themselves questions like "How do I install and configure a data warehouse?", "Which data warehouse solution will give me the fastest query times?", or "Which of my analytics tools are supported?" This article covers the key features and benefits of five widely used data lake and warehouse solutions supported by Openbridge to help you choose the right one: How to Choose a Data Warehouse Solution that Fits Your Needs. If you have questions, feel free to reach out to us.

DO I NEED AN EXPERT SERVICES ENGAGEMENT FOR ATHENA?

Typically, you will not need expert services for Athena; most customers are up and running with their Athena data quickly. However, if you need support, we do offer expert services. You may have specific needs relating to your Athena data that require expert assistance to tailor it to your requirements. Ultimately, our mission is to help you get value from data, and this can often happen more quickly with the assistance of our passionate expert services team.

DO I NEED TO AUTHORIZE OPENBRIDGE OR ITS PARTNERS TO ACCESS MY ATHENA SYSTEM?

Yes. Typically, we need your authorization to access your data in Athena. You provide us with the Athena authorizations so we can properly connect to your system. However, in some situations, a data source can "push" data to us instead. In those cases, we provide the source with connection details for our API or our Data Transfer Service, which it then uses to connect, authenticate, and deliver data.
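
One common way to grant this kind of access on AWS is an IAM policy. The minimal Python sketch below builds such a policy document; the bucket names are hypothetical, and the action list is an illustrative assumption for running read-only queries, not Openbridge's required permission set. Scope the `Resource` values more narrowly (for example, to specific workgroups, databases, and tables) where your security requirements call for it.

```python
"""Build an illustrative IAM policy document for querying Athena (sketch)."""
import json

# HYPOTHETICAL bucket names; replace with your own data and results buckets.
DATA_BUCKET = "example-data-lake"
RESULTS_BUCKET = "example-athena-results"


def athena_access_policy(data_bucket: str, results_bucket: str) -> dict:
    """Return a policy allowing queries, catalog reads, and S3 access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Start queries and read their status and results.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",
            },
            {   # Read database, table, and partition metadata from AWS Glue.
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
                "Resource": "*",
            },
            {   # Read the underlying data files in Amazon S3.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{data_bucket}", f"arn:aws:s3:::{data_bucket}/*"],
            },
            {   # Athena writes query results to an S3 location.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [f"arn:aws:s3:::{results_bucket}", f"arn:aws:s3:::{results_bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(athena_access_policy(DATA_BUCKET, RESULTS_BUCKET), indent=2))
```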

DO YOU SUPPORT COMPRESSION AND FILE SPLITTING?

Yes! Amazon suggests that compression and file splitting can significantly speed up Athena queries. Smaller data sizes mean faster queries and less network traffic between Amazon S3 and Athena.
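
To illustrate why compression cuts the bytes read from Amazon S3, here is a minimal Python sketch using the standard-library `gzip` module on a hypothetical, repetitive CSV sample. The ratio you see on real data will differ. Note that a single gzip-compressed text file is not splittable, which is why the choice of compression codec also affects the file splitting discussed below.

```python
"""Show how compression shrinks the bytes a query engine must read."""
import gzip

# HYPOTHETICAL sample: repetitive CSV rows, typical of event or sales data.
header = "date,store,sku,qty\n"
rows = "".join(
    f"2024-01-{i % 28 + 1:02d},store_{i % 5},SKU-{i % 50},1\n" for i in range(10_000)
)
sample = (header + rows).encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means fewer bytes read)."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    compressed = gzip.compress(sample)
    print(f"original: {len(sample):,} bytes, gzip: {len(compressed):,} bytes")
    print(f"ratio: {compression_ratio(sample):.3f}")
```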

When your data is splittable, Openbridge performs this Athena optimization for you. Splittable files allow the execution engine in Athena to divide the reading of a file among multiple readers, increasing parallelism. If a file is unsplittable, only a single reader can read it, which limits parallelism. Splittability matters most for larger files; smaller files (generally less than 128 MB) gain little from being split.
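
The effect of splittability on parallelism can be modeled with a short Python sketch that divides a file into byte ranges, one per reader. The 128 MB split size and the `plan_splits` helper are assumptions for illustration; this is not how Athena schedules work internally, and a real engine must also align ranges to record boundaries, which this sketch omits.

```python
"""Model how a splittable file can be shared among parallel readers."""
from typing import List, Tuple

MB = 1024 * 1024
SPLIT_SIZE = 128 * MB  # ASSUMPTION: illustrative split size


def plan_splits(file_size: int, splittable: bool,
                split_size: int = SPLIT_SIZE) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges, one per reader; `end` is exclusive."""
    if not splittable or file_size <= split_size:
        # An unsplittable file (or one no larger than a split) gets one reader.
        return [(0, file_size)]
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


if __name__ == "__main__":
    # A 300 MB splittable file vs. the same data as a single gzip stream.
    print(len(plan_splits(300 * MB, splittable=True)), "readers")
    print(len(plan_splits(300 * MB, splittable=False)), "reader")
```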

DO YOU SUPPORT COLUMNAR DATA FORMATS LIKE APACHE PARQUET?

Yes! Amazon suggests the use of columnar data formats. We have chosen Apache Parquet over other columnar formats. Parquet stores data efficiently with column-wise compression, applying different encodings and compression based on each column's data type. Openbridge automatically handles the conversion of data to Parquet format, saving you time and money, especially when Athena runs ad hoc queries. Also, Parquet-formatted files mean reading fewer bytes from Amazon S3, leading to better Athena query performance.
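
To illustrate why a columnar layout means reading fewer bytes, here is a minimal Python sketch that stores a hypothetical table row by row and column by column, then counts the bytes needed to answer a query that touches only one column. JSON stands in for a real file format; the sketch does not implement Parquet's encodings or compression. For real Parquet files you would typically use a library such as `pyarrow`.

```python
"""Compare bytes read for a one-column query: row layout vs. columnar layout."""
import json
from typing import Dict, List

# HYPOTHETICAL table with three columns.
table: List[Dict[str, object]] = [
    {"order_id": i, "customer": f"customer_{i % 100}", "amount": round(i * 1.5, 2)}
    for i in range(1_000)
]


def row_layout(rows: List[Dict[str, object]]) -> bytes:
    """Row-oriented (CSV-like): each record's values are stored together."""
    return "\n".join(json.dumps(list(r.values())) for r in rows).encode()


def column_layout(rows: List[Dict[str, object]]) -> Dict[str, bytes]:
    """Column-oriented (the idea behind Parquet): each column is stored separately."""
    return {c: json.dumps([r[c] for r in rows]).encode() for c in rows[0]}


def bytes_read_row(rows: List[Dict[str, object]]) -> int:
    # A row layout must be read in full, even if only one column is needed.
    return len(row_layout(rows))


def bytes_read_columnar(rows: List[Dict[str, object]], column: str) -> int:
    # A columnar layout lets the reader fetch just the requested column.
    return len(column_layout(rows)[column])


if __name__ == "__main__":
    # Roughly: SELECT SUM(amount) FROM orders
    print("row layout:", bytes_read_row(table), "bytes")
    print("columnar:  ", bytes_read_columnar(table, "amount"), "bytes")
```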

HOW DOES AWS ATHENA WORK?

Athena is built on Presto, an open-source distributed SQL query engine originally developed at Facebook. Unlike running Presto yourself, with Athena you pay only for the queries you run, with no servers to manage. Check out the article: What is Presto, Facebook Presto Database, PrestoSQL, or PrestoDB? A powerful SQL query engine.

IS THERE A LOOKER ATHENA CONNECTION? HOW ABOUT FOR TABLEAU OR POWER BI?

Yes. Looker, Tableau, and Microsoft Power BI can all connect to Amazon Athena. Check each vendor's documentation for current connection and driver requirements.

DOES AMAZON ATHENA COST EXTRA?

Amazon does charge for the service. For the current costs, check out the AWS pricing page.

WHERE CAN I FIND AMAZON ATHENA DOCUMENTATION?

Check out the AWS documentation.