Why PostgreSQL for Analytics Infrastructure (DW)?
Huy Nguyen, CTO and Cofounder, Holistics.io
Grokking TechTalk: Database Systems, Ho Chi Minh City, Aug 2016
About Me
● Cofounder, Holistics.io
○ Data Reporting (BI) and Infrastructure SaaS
● Cofounder of Grokking Vietnam
○ Building a community of world-class engineers in Vietnam
● Previously
○ Growth Team at Facebook (US)
○ Built the data pipeline at Viki (Singapore)
Background: What is Analytics/DW?
A Typical Web Application
Data-related business problems:
• How many users register daily/weekly, broken down by platform and country?
• How many videos are uploaded every day?
A Typical Data Pipeline
[Diagram: Operational data (production/live DBs via daily snapshot, event logs of behavioural data, CSVs / Excel / Google Sheets) → Import → Data Warehouse (analytics database: modify / transform, pre-aggregate) → Reporting / Analysis (Reporting / BI, Data Science / ML)]
What database should we pick?
Transactional Applications vs Analytics Applications
Ref: http://www.slideshare.net/PGExperts/really-big-elephants-postgresql-dw-15833438 (slide 5)

| | Transactional applications | Analytics applications |
|---|---|---|
| Data | Many single-row writes; current data | Few large batch imports; years of data, many sources |
| Query source | Generated by user activity; 10 to 1000 users | Generated by large reports; 1 to 10 users |
| Query duration | < 1 s response time | Queries run for hours |
| Query size | Short queries | Long queries |
Complex Query...
Ref: http://www.slideshare.net/PGExperts/really-big-elephants-postgresql-dw-15833438 (slide 8)
Why start with Postgres?
1. Simple to Get Started
2. Rich Features for Analytics
– Data Pipeline (ETL)
– Data Analysis
3. Scale Up
[Chart: data growth over time, in three stages: (1) Start, (2) Grow, (3) Scale]
1. Simple to Get Started
● Data requests grow gradually as your company grows
● Business users care about results (not the backend)
→ You need something quick to start with and easy to fine-tune along the way
Postgres:
● Free (open-source)
● Easy to set up
1. Simple start 2. Rich features 3. Scale up
[Diagram: the same data pipeline, with the import / transform / pre-aggregate stages labelled "Data Pipeline (ETL)" and the reporting / analysis stage labelled "Data Analysis"]
[Diagram: data from production/live DBs, event logs (behavioural data) and CSVs / Excel / Google Sheets lands in many separate tables inside the Data Warehouse]
2a. Data Pipeline (ETL) & Performance
● Managing table data: table partitioning
● Managing disk space: tablespaces
● Write performance: unlogged tables
● Others: foreign data wrappers, point-in-time recovery
Managing Data Tables
Analytics tables hold lots of data, e.g.:
pageviews (+100k records a day)
date_d | country | user_id | browser | page_name | views
⇒ The table grows big quickly and becomes difficult to manage!
Solution: split (partition) it into multiple tables:
pageviews_2015_06
pageviews_2015_07
pageviews_2015_08
pageviews_2015_09
Problem: it becomes difficult to query data across multiple months
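To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for PostgreSQL (an assumption of the example). Rows are routed to a `pageviews_YYYY_MM` table by the month of `date_d`, and a cross-month total then has to stitch every monthly table together with UNION ALL by hand. The helpers `partition_name`, `load` and `total_views` are hypothetical names for illustration.

```python
import sqlite3
from datetime import date

# Column layout from the slide; SQLite types approximate the PostgreSQL ones.
COLUMNS = "date_d TEXT, country TEXT, user_id INTEGER, browser TEXT, page_name TEXT, views INTEGER"


def partition_name(d: date) -> str:
    """One table per calendar month, e.g. 2015-07-14 -> pageviews_2015_07."""
    return f"pageviews_{d.year}_{d.month:02d}"


def load(conn: sqlite3.Connection, rows) -> None:
    """Insert each (date, country, user_id, browser, page_name, views) row into its monthly table."""
    for d, *rest in rows:
        table = partition_name(d)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({COLUMNS})")
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)", (d.isoformat(), *rest))
    conn.commit()


def total_views(conn: sqlite3.Connection, lo: date, hi: date) -> int:
    """Sum views for lo <= date_d < hi: every monthly table must be UNION ALL-ed by hand."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name LIKE 'pageviews_%' ORDER BY name")]
    if not tables:
        return 0
    union = " UNION ALL ".join(f"SELECT date_d, views FROM {t}" for t in tables)
    sql = f"SELECT COALESCE(SUM(views), 0) FROM ({union}) WHERE date_d >= ? AND date_d < ?"
    return conn.execute(sql, (lo.isoformat(), hi.isoformat())).fetchone()[0]
```

With rows on 2015-07-30 (3 views), 2015-08-02 (5 views) and 2015-09-01 (7 views), `total_views(conn, date(2015, 7, 1), date(2015, 9, 1))` has to read all three tables to return 8. Every new month means one more table in that UNION ALL, which is what the parent table on the next slide removes.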
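Managing Data Tables: parent table
pageviews_parent (parent table)
pageviews_2015_06
pageviews_2015_07
pageviews_2015_08
pageviews_2015_09
…
-- attach each monthly table to the parent and declare which dates it holds (the constraint name is illustrative)
ALTER TABLE pageviews_2015_09 INHERIT pageviews_parent;
ALTER TABLE pageviews_2015_09 ADD CONSTRAINT pageviews_2015_09_date_check CHECK (date_d >= '2015-09-01' AND date_d < '2015-10-01');
Queries against pageviews_parent read all of its child tables, so cross-month queries work again; with these CHECK constraints (and constraint_exclusion at its default setting, partition) the planner skips the months that fall outside the WHERE clause.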
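The effect of those CHECK constraints can be sketched in plain Python: each child table covers a half-open date range, and a query only needs the children whose range overlaps its own. This mimics the idea behind constraint exclusion, not PostgreSQL's planner code. `month_partitions`, `attach_ddl` and `partitions_to_scan` are hypothetical helpers, and the generated DDL follows the same INHERIT + CHECK pattern as the slide.

```python
from datetime import date


def month_partitions(first_month: date, count: int):
    """Return (table_name, lo, hi) for consecutive monthly child tables; hi is exclusive."""
    parts = []
    y, m = first_month.year, first_month.month
    for _ in range(count):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        parts.append((f"pageviews_{y}_{m:02d}", date(y, m, 1), date(ny, nm, 1)))
        y, m = ny, nm
    return parts


def attach_ddl(name: str, lo: date, hi: date, parent: str = "pageviews_parent"):
    """DDL that attaches a monthly table to the parent and declares its date range."""
    return [
        f"ALTER TABLE {name} INHERIT {parent};",
        f"ALTER TABLE {name} ADD CONSTRAINT {name}_date_check "
        f"CHECK (date_d >= '{lo}' AND date_d < '{hi}');",
    ]


def partitions_to_scan(parts, q_lo: date, q_hi: date):
    """Keep only children whose [lo, hi) range overlaps the query range [q_lo, q_hi)."""
    return [name for name, lo, hi in parts if lo < q_hi and q_lo < hi]
```

For the four tables from June to September 2015, a query over 2015-08-15 to 2015-09-10 only touches `pageviews_2015_08` and `pageviews_2015_09`; June and July are never read, which is why the CHECK constraints matter as the number of months grows.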
Managing Disk Space
The analytics DB holds lots of data, but disk space is limited:
● SSD: fast, expensive
● SATA: cheap, slow
Data has different access frequencies:
● Hot data
● Warm data
● Cold data
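Managing Disk Space: tablespaces
A tablespace defines where on disk your tables are stored.
-- the directories must already exist, be empty, and be owned by the PostgreSQL OS user
CREATE TABLESPACE hot_data LOCATION '/disk0/ssd/';
CREATE TABLESPACE warm_data LOCATION '/disk1/sata2/';
-- at the beginning of each month: create the new month on SSD (copying last month's columns), move last month to SATA
CREATE TABLE pageviews_2016_08 (LIKE pageviews_2016_07) TABLESPACE hot_data;
ALTER TABLE pageviews_2016_07 SET TABLESPACE warm_data;
-- SET TABLESPACE rewrites the table under an exclusive lock; its indexes stay where they are unless moved with ALTER INDEX ... SET TABLESPACE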
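The monthly routine above can be turned into a small planning step. The Python sketch below picks a tablespace for each monthly table from its age and emits `ALTER TABLE ... SET TABLESPACE` only for tables whose tier changes, so unchanged tables are not rewritten. The age thresholds, the helper names and the third `cold_data` tablespace are assumptions for illustration, not part of the original setup.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier to later, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def tier_for(partition_month: date, today: date, warm_months: int = 2) -> str:
    """Current month -> hot, the next warm_months months -> warm, anything older -> cold."""
    age = months_between(partition_month, today)
    if age <= 0:
        return "hot_data"
    if age <= warm_months:
        return "warm_data"
    return "cold_data"  # hypothetical third tablespace, e.g. on the slowest disks


def tiering_plan(partition_months, today: date, current: dict) -> list:
    """Emit SET TABLESPACE statements only for tables whose tier has changed.

    current maps table name -> tablespace the table lives in right now.
    """
    stmts = []
    for month in partition_months:
        name = f"pageviews_{month.year}_{month.month:02d}"
        target = tier_for(month, today)
        if current.get(name) != target:
            stmts.append(f"ALTER TABLE {name} SET TABLESPACE {target};")
    return stmts
```

On 2016-08-01, with May and June on `warm_data` and July and August on `hot_data`, the plan moves only May (now three months old) to `cold_data` and July to `warm_data`.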
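Combining TABLESPACE and PARENT TABLE
pageviews_parent (parent table)
pageviews_2015_06
pageviews_2015_07
pageviews_2015_08
pageviews_2015_09
…
Queries still go through pageviews_parent, while each monthly child table can sit in whichever tablespace matches its age (e.g. the current month in hot_data, older months in warm_data).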
[Diagram: every table in the Data Warehouse is derived from the production/live DBs, event logs (behavioural data) and CSVs / Excel / Google Sheets]
Analytics tables can be rebuilt from source.
Write Performance: unlogged tables
● Transactional safety: every update is 2 writes:
○ Update the data inside the table
○ Write the WAL (Write-Ahead Log)
● UNLOGGED TABLE
○ Skips the WAL
○ Improves write performance
○ Trade-off: the table is truncated after a crash and is not replicated to standbys, so use it only for data you can rebuild
CREATE UNLOGGED TABLE daily_summary (...);
INSERT INTO daily_summary …;
http://pgsnaga.blogspot.com/2011/10/data-loading-into-unlogged-tables-and.html
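Because an unlogged table can come back empty after a crash, the ETL job that fills it should be able to regenerate it from source at any time. Below is a minimal sketch of that rebuild-and-swap pattern using Python's sqlite3 as a stand-in: SQLite has no unlogged tables, so this shows only the pattern; on PostgreSQL the staging table would be created with `CREATE UNLOGGED TABLE ... AS`. The `pageviews` source columns and the `daily_summary` layout are illustrative assumptions.

```python
import sqlite3


def rebuild_daily_summary(conn: sqlite3.Connection) -> None:
    """Rebuild daily_summary from the source table pageviews, then swap it in.

    Assumes no transaction is open on conn when called. The whole rebuild runs
    in one explicit transaction, so readers see either the old or the new table.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS daily_summary_new")
        conn.execute(
            "CREATE TABLE daily_summary_new AS "
            "SELECT date_d, country, SUM(views) AS views "
            "FROM pageviews GROUP BY date_d, country"
        )
        conn.execute("DROP TABLE IF EXISTS daily_summary")
        conn.execute("ALTER TABLE daily_summary_new RENAME TO daily_summary")
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```

The function is idempotent: running it once per ETL cycle, or again after a crash has emptied the unlogged table, always yields the same summary from the same source rows.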
2b. Data Analysis (writing SQL)
● Extract / transform
● Aggregate / summarize
● Statistical analysis
[Diagram: Data Warehouse (analytics database) → Reporting / Analysis (Reporting / BI, Data Science / ML)]
2b. Data Analysis with Postgres
● SQL features
○ WITH clause (CTEs)
○ Window functions
○ Aggregate functions
○ Statistical functions
● Data structures
○ JSON / JSONB
○ Arrays
○ PostGIS (geo data)
○ Geometric types (point, line, etc.)
○ HyperLogLog (extension)
● PL/pgSQL
● Full-text search (and n-gram matching via the pg_trgm extension)
● Performance:
○ Parallel queries (PostgreSQL 9.6)
○ Materialized views
○ BRIN indexes
● Others:
○ DISTINCT ON
○ VALUES
○ generate_series()
○ FULL OUTER JOIN support
○ Better EXPLAIN
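As an example of one item above, `SELECT DISTINCT ON (user_id) ... ORDER BY user_id, created_at DESC` keeps only the latest row per user in PostgreSQL. SQLite (used here through Python's sqlite3 so the example runs anywhere) has no DISTINCT ON, so the sketch executes the portable ROW_NUMBER() equivalent and keeps the PostgreSQL form only as a string for comparison. The `pageviews` columns and data are assumptions.

```python
import sqlite3

# PostgreSQL form (not executed here; SQLite has no DISTINCT ON):
PG_LATEST_PER_USER = """
SELECT DISTINCT ON (user_id) user_id, page_name, created_at
FROM pageviews
ORDER BY user_id, created_at DESC
"""

# Portable equivalent: number each user's rows newest-first and keep row 1.
LATEST_PER_USER = """
SELECT user_id, page_name, created_at
FROM (
    SELECT user_id, page_name, created_at,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM pageviews
)
WHERE rn = 1
ORDER BY user_id
"""


def latest_page_per_user(rows):
    """rows: (user_id, page_name, created_at) tuples; returns the newest row per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (user_id INTEGER, page_name TEXT, created_at TEXT)")
    conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?)", rows)
    result = conn.execute(LATEST_PER_USER).fetchall()
    conn.close()
    return result
```

DISTINCT ON expresses the same intent in one clause, which is part of why the list counts it as a convenience for analysis queries.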
CTE: Problem with Nested Queries
SELECT ... FROM (SELECT ... FROM t1 JOIN (SELECT ... FROM ...) a ON (...)) b JOIN (SELECT ... FROM ...) c ON (...)
Nested queries:
a) are hard to read
b) cannot be reused
CTE: Common Table Expressions (WITH clause)
WITH a AS (SELECT ... FROM ...),
b AS (SELECT ... FROM t1 JOIN a ON (...)),
c AS (SELECT ... FROM ...)
SELECT ... FROM b JOIN c ON ...
● SQL’s “private methods”
● A WITH subquery can be referenced multiple times
● Allows chaining instead of nesting
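A small runnable comparison, using Python's sqlite3 as a stand-in for PostgreSQL (the WITH syntax used here is the same in both): the chained CTE and the nested query return identical rows, but the CTE gives each step a name and reads top to bottom. The `users` / `uploads` tables and their data are made up for the example.

```python
import sqlite3

# Chained version: each step gets a name and later steps refer to it.
CHAINED = """
WITH uploads_per_user AS (
    SELECT user_id, COUNT(*) AS n_uploads
    FROM uploads
    GROUP BY user_id
),
per_country AS (
    SELECT u.country, COUNT(*) AS uploaders, SUM(p.n_uploads) AS uploads
    FROM users u
    JOIN uploads_per_user p ON p.user_id = u.id
    GROUP BY u.country
)
SELECT country, uploaders, uploads FROM per_country ORDER BY uploads DESC, country
"""

# Nested version: the same logic written inside-out.
NESTED = """
SELECT country, uploaders, uploads FROM (
    SELECT u.country, COUNT(*) AS uploaders, SUM(p.n_uploads) AS uploads
    FROM users u
    JOIN (SELECT user_id, COUNT(*) AS n_uploads FROM uploads GROUP BY user_id) p
      ON p.user_id = u.id
    GROUP BY u.country
) per_country
ORDER BY uploads DESC, country
"""


def make_db() -> sqlite3.Connection:
    """In-memory sample data: 4 users, 8 uploads by 3 of them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER, country TEXT);
        CREATE TABLE uploads (id INTEGER, user_id INTEGER);
        INSERT INTO users VALUES (1, 'VN'), (2, 'VN'), (3, 'SG'), (4, 'SG');
        INSERT INTO uploads VALUES (1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 3), (7, 3), (8, 3);
    """)
    return conn
```

Both queries return `[('SG', 1, 5), ('VN', 2, 3)]`; in the CTE version, `uploads_per_user` could also be joined a second time further down without repeating its SQL.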
CTE (cont.)
● Recursive CTE
● Writeable CTE
-- move data from a to b
WITH deleted_rows AS (
  DELETE FROM a WHERE ... RETURNING *
)
INSERT INTO b SELECT * FROM deleted_rows;
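A common use of a recursive CTE in reporting is filling gaps in a date series so that days without activity still appear with 0. In PostgreSQL the list of days would typically come from generate_series(); the sketch below uses a recursive CTE instead because it runs on the SQLite bundled with Python. The `users` table layout and the signup times are assumptions for illustration.

```python
import sqlite3

DAILY_SIGNUPS_GAP_FILLED = """
WITH RECURSIVE days(d) AS (
    SELECT date(?)                                        -- anchor: first day
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < date(?)  -- step: next day, up to the last day
),
daily AS (
    SELECT date(created_at) AS d, COUNT(*) AS signups
    FROM users
    GROUP BY date(created_at)
)
SELECT days.d, COALESCE(daily.signups, 0) AS signups
FROM days
LEFT JOIN daily ON daily.d = days.d
ORDER BY days.d
"""


def daily_signups(conn: sqlite3.Connection, first_day: str, last_day: str):
    """Signups per day from first_day to last_day inclusive, with 0 for days without signups."""
    return conn.execute(DAILY_SIGNUPS_GAP_FILLED, (first_day, last_day)).fetchall()
```

With signups only on 2016-08-01 (two) and 2016-08-03 (one), the range 2016-08-01 to 2016-08-04 returns four rows, including zeros for the 2nd and the 4th, so charts built on the result do not silently skip days.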
Limitation of GROUP BY aggregate
SELECT gender, COUNT(1) AS signups FROM users GROUP BY 1
● GROUP BY aggregate: reduces each partition (group) of rows to 1 value
What if we want to work through each row of each partition?
Window functions
● Window functions: compute over a moving frame of rows within 1 partition, returning one value per row
● Examples:
○ Moving average
○ Cumulative sum
○ Ranking within a partition
○ …
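For instance, a 3-day moving average of daily signups uses a frame of the current row plus the two preceding rows. The sketch runs the query through Python's sqlite3 (SQLite 3.25+ supports window functions, and the same OVER clause is valid in PostgreSQL); the `daily_signups` table and its numbers are made up.

```python
import sqlite3

MOVING_AVG = """
SELECT date_d,
       signups,
       ROUND(AVG(signups) OVER (
           ORDER BY date_d
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- frame: today and the 2 days before
       ), 2) AS avg_3d
FROM daily_signups
ORDER BY date_d
"""


def moving_average(rows):
    """rows: (YYYY-MM-DD, signups) tuples; returns (date_d, signups, 3-day average)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_signups (date_d TEXT, signups INTEGER)")
    conn.executemany("INSERT INTO daily_signups VALUES (?, ?)", rows)
    out = conn.execute(MOVING_AVG).fetchall()
    conn.close()
    return out
```

For signups of 100, 50, 80 and 70 on four consecutive days, the averages are 100.0, 75.0, 76.67 and 66.67: the first rows average over a shorter frame because there are fewer than two preceding rows.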
Example: Cumulative Sum
CREATE TABLE users (id INT, gender VARCHAR(10), created_at TIMESTAMP);
SELECT created_at::date AS date_d,
  COUNT(1) AS daily_signups,
  SUM(COUNT(1)) OVER (ORDER BY created_at::date) AS cumulative_signups
FROM users U
GROUP BY 1
ORDER BY 1;

| date_d | daily_signups | cumulative_signups |
|---|---|---|
| 2016-08-01 | 100 | 100 |
| 2016-08-02 | 50 | 150 |
| 2016-08-03 | 80 | 230 |
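To check the expected output, the sketch below loads synthetic users (100, 50 and 80 signups on three days) into SQLite via Python's sqlite3 and computes the running total. It aggregates per day in a CTE first and then applies SUM() OVER to the daily counts, which is equivalent to the SUM(COUNT(1)) OVER (...) form above; `date(created_at)` plays the role of PostgreSQL's `created_at::date`. The helper name and the fixed 10:00 signup time are assumptions.

```python
import sqlite3

CUMULATIVE = """
WITH daily AS (
    SELECT date(created_at) AS date_d, COUNT(*) AS daily_signups
    FROM users
    GROUP BY date(created_at)
)
SELECT date_d,
       daily_signups,
       SUM(daily_signups) OVER (ORDER BY date_d) AS cumulative_signups
FROM daily
ORDER BY date_d
"""


def cumulative_signups(signups_per_day):
    """signups_per_day: list of (YYYY-MM-DD, n); creates n synthetic users on each day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, gender TEXT, created_at TEXT)")
    next_id = 1
    for day, n in signups_per_day:
        for _ in range(n):
            conn.execute("INSERT INTO users VALUES (?, NULL, ?)", (next_id, f"{day} 10:00:00"))
            next_id += 1
    rows = conn.execute(CUMULATIVE).fetchall()
    conn.close()
    return rows
```

`cumulative_signups([("2016-08-01", 100), ("2016-08-02", 50), ("2016-08-03", 80)])` reproduces the three rows of the table above.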
Example: Group by gender and rank by signup time
CREATE TABLE users (id INT, name VARCHAR, gender VARCHAR(10), created_at TIMESTAMP);
SELECT gender, name,
  RANK() OVER (PARTITION BY gender ORDER BY created_at) AS signup_rnk
FROM users U
ORDER BY 1, 3;

| gender | name | signup_rnk |
|---|---|---|
| female | Lan | 1 |
| female | Tuyet | 2 |
| ... | ... | ... |
| male | Hung | 1 |
| male | Son | 2 |
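A runnable version of the ranking query, again via Python's sqlite3 as a stand-in for PostgreSQL. The signup times are made up, and the extra user "Minh" is hypothetical, added with the same signup time as Son to show how RANK() handles ties; `name` is added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

RANKED = """
SELECT gender, name,
       RANK() OVER (PARTITION BY gender ORDER BY created_at) AS signup_rnk
FROM users
ORDER BY gender, signup_rnk, name
"""

USERS = [
    (1, "Hung", "male", "2016-08-01 09:00:00"),
    (2, "Lan", "female", "2016-08-01 10:00:00"),
    (3, "Son", "male", "2016-08-02 09:00:00"),
    (4, "Minh", "male", "2016-08-02 09:00:00"),  # hypothetical: same signup time as Son -> tied rank
    (5, "Tuyet", "female", "2016-08-03 11:00:00"),
]


def ranked_signups(users):
    """Rank users by signup time within each gender."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, gender TEXT, created_at TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", users)
    rows = conn.execute(RANKED).fetchall()
    conn.close()
    return rows
```

Son and Minh both get rank 2; RANK() would then skip to 4 for the next male signup, whereas ROW_NUMBER() would number the tied rows 2 and 3.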
PostgreSQL is well suited for data analysis!
3. Scaling Up
● PostgreSQL downsides:
○ Optimized for transactional applications
○ Single-core execution per query (parallel query is introduced in 9.6); row-based storage
● CitusDB extension
○ Automated data sharding and parallelization
○ Columnar storage format (better storage and performance)
● Vertica (HP)
○ Columnar storage, parallel execution
○ Started by Michael Stonebraker (original author of Postgres)
● Amazon Redshift
○ Fork of PostgreSQL 8.0.2, via ParAccel
○ Columnar storage & parallel execution
Other Proprietary DW Databases (Relational)
● Greenplum
● Teradata
● Infobright
● Google BigQuery
● Aster Data
Related to Postgres:
● ParAccel (Postgres fork)
● Vertica (from the Postgres author)
● CitusDB (Postgres extension)
● Amazon Redshift (from ParAccel)
Compare: Popular SQL Databases

| | PostgreSQL | MySQL | Oracle | SQL Server |
|---|---|---|---|---|
| License / Cost | Free / open-source | Free / open-source | Expensive | Expensive |
| DW features | Strong | Weak | Strong | Strong |
Summary
1. Simple to Get Started
2. Rich Features for Analytics
– Data Pipeline (ETL)
– Data Analysis
3. Easy to Scale Up
[Chart: data growth over time, in three stages: (1) Start, (2) Grow, (3) Scale]
Summary (cont.)
● Why start with Postgres
● Scaling up to DW databases
● Comparison with other transactional DBs
● Not covered:
○ How to set up PostgreSQL for DW
○ Performance optimizations
○ Behavioural data: Hadoop, Spark, HDFS