ETL and ELT Pipelines
Moving data between systems requires extracting it from sources, transforming it into useful formats, and loading it into destinations. The order of these operations — particularly when transformation happens — defines two major approaches: ETL and ELT.
ETL: Extract, Transform, Load
The traditional approach transforms data before loading it into the destination:
Source → Extract → Transform → Load → Warehouse
↑
Processing Server
(transformation happens here)
In ETL, a separate processing system handles transformation. Data is extracted from sources, sent to processing servers that clean and reshape it, then loaded into the data warehouse in its final form.
Advantages of ETL:
- Warehouse only stores processed, clean data
- Reduces warehouse storage costs
- Works with warehouses that have limited compute power
- Better for complex transformations requiring specialized tools
Challenges of ETL:
- Requires maintaining transformation infrastructure
- Changes to transformations require reprocessing
- Raw data isn't preserved for future analysis
- Longer time from source to availability
ELT: Extract, Load, Transform
The modern approach loads raw data first, then transforms it using the warehouse's compute power:
Source → Extract → Load → Warehouse → Transform
↑
(transformation happens here,
using warehouse SQL)
ELT takes advantage of modern data warehouses like Snowflake, BigQuery, and Redshift that offer massive, scalable compute. Instead of maintaining separate transformation servers, you write SQL that runs in the warehouse.
Advantages of ELT:
- Raw data preserved for future needs
- Faster initial data availability
- Simpler infrastructure (no separate processing layer)
- Easy to iterate on transformations
- Leverages warehouse's optimized query engine
Challenges of ELT:
- Higher warehouse storage costs
- Requires powerful warehouse compute
- SQL limitations for some transformations
- Raw data access requires governance
Choosing Between Them
The right choice depends on your situation:
Choose ETL when:
- Your warehouse has limited compute capacity
- Compliance requires not storing raw sensitive data
- Transformations need specialized processing (ML, complex algorithms)
- You're working with legacy systems
Choose ELT when:
- You use a modern cloud data warehouse
- You need flexibility to change transformations
- Raw data has future analytical value
- You want simpler infrastructure
Many organizations use hybrid approaches — ELT for most data, with ETL for specific cases requiring specialized processing.
The Modern Trend
The industry has shifted toward ELT as cloud warehouses have become more powerful and cost-effective. The ability to iterate quickly on transformations, preserve raw data, and avoid managing processing infrastructure makes ELT attractive for most new projects.
Tools like dbt (data build tool) have emerged specifically for the "T" in ELT, providing version control, testing, and documentation for SQL transformations that run inside the warehouse.