TracksSpecializations and Deep DivesData Engineering EssentialsData Warehousing Concepts(4 of 6)

Data Warehousing Concepts

When your application database handles thousands of transactions per second, running complex analytical queries against it would slow everything down. Data warehouses solve this problem by providing a separate, optimized home for analytical workloads.

Transactional vs Analytical Databases

Regular databases — called OLTP (Online Transaction Processing) systems — are optimized for many small, fast operations. They handle your app's reads and writes: creating orders, updating user profiles, recording events.

Data warehouses — called OLAP (Online Analytical Processing) systems — are optimized for few large, complex queries. They answer questions like "What were our sales trends across all regions for the past three years?"

OLTP (Your App Database):
- Many small reads/writes
- Current state of data
- Normalized schema (minimal duplication)
- Response time: milliseconds

OLAP (Data Warehouse):
- Few large analytical queries
- Historical data over time
- Denormalized schema (optimized for reads)
- Response time: seconds to minutes

How Data Gets to the Warehouse

Data flows from operational systems into the warehouse through ETL or ELT pipelines. These pipelines extract data from various sources, transform it into analytical-friendly formats, and load it into the warehouse — typically on a schedule.

The warehouse stores historical snapshots, so you can analyze how metrics changed over time. Your production database might only keep current customer addresses, but your warehouse keeps every address a customer ever had.

Modern Data Warehouse Options

Cloud warehouses have transformed this space. You no longer need to provision and manage servers — you pay for what you use.

Snowflake separates storage and compute, letting you scale each independently. You pay per query, making it cost-effective for variable workloads.

BigQuery from Google is fully serverless. You write SQL, and Google handles everything else. Great for teams without dedicated infrastructure expertise.

Redshift from AWS follows a more traditional model with provisioned clusters. It integrates deeply with other AWS services.

Databricks combines warehousing with machine learning capabilities, useful when analytics and ML share the same data.

When You Need a Warehouse

Small applications often don't need a separate warehouse — running analytics against your production database works fine. As data grows and queries become complex, a warehouse becomes essential for keeping your application fast while enabling deep analysis.

See More

Further Reading

You need to be signed in to leave a comment and join the discussion