gpdb best practices v a01 20150313

Post on 18-Jul-2015

Pivotal Cloud Platform

Pivotal Greenplum Database Best Practices
2015. 03. 18
Lee, SangHee
Sr. Field Engineer
PIVOTAL KOREA

Copyright 2014 Pivotal. All rights reserved.

SDDC PaaS DevOps Pivotal CF

- INTRODUCTION
- Data Model
- Heap and AO Storage
- Row and Column Storage
- Compression
- Distributions
- Memory Management
- Indexes
- Partitioning
- Vacuum
- Loading
- Resource Queues
- Analyze
- DATA TYPES
- Local (co-located) Joins
- Data Skew
- Processing Skew

INTRODUCTION
- This deck summarizes best practices for Greenplum Database (GPDB).
- Greenplum Database documentation: http://gpdb.docs.pivotal.io/gpdb-434.html

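Data Model
- GPDB is an analytical MPP shared-nothing database, which differs from a highly normalized, transactional SMP database.
- GPDB performs best with a denormalized schema designed for MPP analytical processing.
  Ex) Star schema or snowflake schema, with large fact tables and smaller dimension tables.

Heap and AO (Append Only) Storage
- Choose heap or append-only storage per table.
- Use heap storage for tables that receive row-level DML (UPDATE, DELETE, INSERT).
- Use heap storage for tables that receive concurrent UPDATE, DELETE, and INSERT operations.
- Use AO storage for tables that are loaded in batches and rarely changed afterwards; avoid row-by-row INSERT/UPDATE/DELETE on AO tables.
- Do not run concurrent batch UPDATE or DELETE operations on AO tables (concurrent batch INSERT is fine).
- Space from rows that were updated or deleted in AO tables is not recovered and reused as efficiently as in heap tables.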

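Row and Column Oriented Storage
- Choose row- or column-oriented storage per table.
- Use row-oriented storage for workloads with iterative updates and frequent inserts, for SELECTs that access many columns of a row, and for general-purpose or mixed workloads.
- Use column-oriented storage when SELECTs access only a few columns or compute aggregations over a few columns, and when a single column is updated regularly without modifying the rest of the row.

Compression
- Use compression on large AO tables to reduce I/O, and balance the compression level against the CPU cycles needed to compress and decompress data.

Distributions
- Data distribution is a key performance factor in an MPP DBMS.
- Choose a distribution key that spreads rows evenly across all segments.
- Do not distribute on columns that are used in the WHERE clause of queries.
- Local joins: distribute large tables that are commonly joined on the same column, so that the join needs neither a broadcast motion nor a redistribution motion.
- Check for skew after the initial load and after incremental loads.
- Random distribution spreads rows across segments in round-robin fashion; segment row counts should stay within about 10% of each other.

Memory Management
OS settings
- In /etc/sysctl.conf, set vm.overcommit_memory = 2.
- Do not configure the OS to use huge pages.
Database setting: gp_vmem_protect_limit
- gp_vmem_protect_limit sets the maximum memory that all work in one segment instance may allocate.
- Do not set gp_vmem_protect_limit too high or larger than the physical RAM of the host.
- Formula for gp_vmem_protect_limit: (SWAP + (RAM * vm.overcommit_ratio)) * 0.9 / number_segments_per_server
Query memory within gp_vmem_protect_limit
- statement_mem sets the memory allocated to a query in each segment.
- Resource queues: set both ACTIVE_STATEMENTS and MEMORY_LIMIT.
- Do not use the default resource queue.
- Keep the memory assigned to resource queues within gp_vmem_protect_limit.
- Set the priority of each resource queue (RQ) according to the operational flow of its workload.

Indexes
- Indexes are generally not needed in GPDB.
- Consider an index only on high-cardinality columns that are queried with high selectivity.
- Do not index columns that are frequently updated.
- Drop indexes before loading data and recreate them after the load.
- Create selective B-tree indexes.
- Do not create bitmap indexes on columns that are updated.
- Do not use bitmap indexes for unique columns or for very high or very low cardinality data.
- Do not use bitmap indexes for transactional workloads.

Partitioning
- Partition large tables only, and only to improve read performance.
- Partition only when the query criteria allow partition elimination, so that only the needed partitions are scanned.
- Partitions can be eliminated only when the query restricts the table with immutable operators such as =, <, <=, >, >=, and <>.
- Default partition: avoid it, because it is always scanned.
- How many partitions can GPDB handle? Too many partitions mean too many files, which can hit limits such as the OS open file limit.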
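The gp_vmem_protect_limit formula from the Memory Management slide can be worked through with a small shell function. This is only a sketch under stated assumptions: the function name and the sample host sizes are hypothetical, the overcommit ratio is passed as a percentage (the form Linux stores in vm.overcommit_ratio) and divided by 100 before use, and the result is converted to MB because gp_vmem_protect_limit is set in megabytes.

```shell
#!/bin/sh
# Sketch of: (SWAP + (RAM * vm.overcommit_ratio)) * 0.9 / number_segments_per_server
# Arguments: swap in GB, RAM in GB, overcommit ratio in percent, primary segments per host.
# Assumption: the ratio is given as a percentage and converted to a fraction here.
vmem_protect_limit_mb() {
    swap_gb=$1; ram_gb=$2; ratio_pct=$3; segs=$4
    awk -v swap="$swap_gb" -v ram="$ram_gb" -v ratio="$ratio_pct" -v segs="$segs" \
        'BEGIN { printf "%d\n", (swap + ram * ratio / 100) * 0.9 / segs * 1024 }'
}

# Hypothetical host: 64 GB swap, 256 GB RAM, overcommit ratio 50, 8 primary segments.
# (64 + 256 * 0.5) * 0.9 / 8 = 21.6 GB per segment, printed as 22118 (MB).
vmem_protect_limit_mb 64 256 50 8
```

The printed value is a starting point for gp_vmem_protect_limit on each segment; the host sizes above are illustrative and should be replaced with the real values of the segment hosts.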

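Vacuum
- Run VACUUM after large UPDATE and DELETE operations.
- Do not run VACUUM FULL. For a bloated table, rebuild it with CTAS (CREATE TABLE ... AS) or a reorg instead.
- Run VACUUM on the catalog tables frequently to prevent catalog bloat; once the catalog is bloated, VACUUM FULL on the catalog becomes necessary.
- Never kill a VACUUM that is running on catalog tables.
- Do not run VACUUM ANALYZE; run VACUUM and ANALYZE separately.

Loading
- Use gpfdist (or gpload) to load and unload data in GPDB.
- Loading is parallel, so throughput grows with the number of segments.
- Spread the data evenly across the ETL nodes.
- Run multiple gpfdist instances and split large files among them.
- gp_external_max_segs controls how many segments connect to each gpfdist server.
- Keep gp_external_max_segs and the number of gpfdist processes an even factor of each other.
- Drop indexes before loading and recreate them afterwards.
- Run ANALYZE after the load.
- Set gp_autostats_mode to NONE during loading to disable automatic statistics collection.
- Run VACUUM after load errors to recover space.

Resource Queues
- Use resource queues to manage the workload.
- Assign every role to a user-defined resource queue.
- Use ACTIVE_STATEMENTS to limit the number of active queries in a queue.
- Use MEMORY_LIMIT to control the memory used by the queries in a queue.
- Do not set every queue to MEDIUM priority; that does nothing to manage the workload.
- Adjust resource queues to match the workload.

Analyze
- Do not run ANALYZE on the whole database; run it selectively at the table level.
- Run ANALYZE after loading.
- Run ANALYZE after INSERT, UPDATE, and DELETE operations that change the data significantly.
- Run ANALYZE after CREATE INDEX.
- If ANALYZE takes too long on a very large table, run it only on the columns used in join conditions, WHERE clauses, and SORT, GROUP BY, or HAVING clauses.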

DATA TYPES
- Character data types: prefer TEXT or VARCHAR over CHAR.
- Numeric data types: use the smallest type that accommodates the data; an oversized type wastes storage.
- Use the same data type for columns that are joined. When the types differ, GPDB has to convert one of them before it can compare the values.

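Local (co-located) Joins
- A local join is performed on each segment without moving rows between segments, which gives a large performance gain on big tables.
- To get a local join, distribute both tables on the same column(s) and join on all of the distribution key columns.
- The distribution columns must also have the same data type. The same value stored in different data types may hash to different segments.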

Data Skew
- Data skew degrades read (scan) performance, and with it joins and GROUP BY operations.
- The following query shows the maximum and minimum number of rows per segment and the percentage difference between them:

    SELECT 'Example Table' AS "Table Name",
           max(c) AS "Max Seg Rows",
           min(c) AS "Min Seg Rows",
           (max(c)-min(c))*100.0/max(c) AS "Percentage Difference Between Max & Min"
    FROM (SELECT count(*) c, gp_segment_id FROM facts GROUP BY 2) AS a;
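The arithmetic of the skew query, (max - min) * 100 / max over the per-segment row counts, can also be reproduced outside the database. The sketch below assumes the counts arrive on standard input as "gp_segment_id count" pairs, one per line; the function name and the sample counts are hypothetical.

```shell
#!/bin/sh
# Reads "segment_id row_count" lines and prints the percentage difference
# between the largest and the smallest segment, as in the SQL above.
skew_pct() {
    awk 'NR == 1 { max = $2; min = $2 }
         { if ($2 > max) max = $2; if ($2 < min) min = $2 }
         END { if (NR > 0 && max > 0) printf "%.2f\n", (max - min) * 100.0 / max }'
}

# Hypothetical counts for four segments: (1200 - 980) * 100 / 1200 = 18.33
printf '0 1000\n1 980\n2 1200\n3 1010\n' | skew_pct
```

One way to feed it real counts, assuming psql can reach the database, would be `psql -At -F' ' -c "SELECT gp_segment_id, count(*) FROM facts GROUP BY 1" | skew_pct`, where `facts` is the table name used in the query above.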

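Processing Skew
- Unlike data skew, processing skew occurs while a query is running and is harder to detect.
- Processing skew is a common cause of performance problems in an MPP database.
- Greenplum Database does not report processing skew directly; it is identified manually with the steps below.

Processing Skew: how to check
1) Find the OID of the database to monitor.

    # select oid, datname from pg_database ;
      oid  |  datname
    -------+-----------
     17088 | gpadmin
     10899 | postgres
     38817 | pws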

Processing Skew
2) Check the size of the temporary (spill) files on every segment host. Replace <OID> with the database OID from step 1.

    [gpadmin@mdw kend]$ gpssh -f ~/hosts -e "du -b /data[1-2]/primary/gpseg*/base/<OID>/pgsql_tmp/*" | \
        grep -v "du -b" | sort | \
        awk -F" " '{ arr[$1] = arr[$1] + $2 ; tot = tot + $2 }; END { for ( i in arr ) print "Segment node" i, arr[i], "bytes (" arr[i]/(1024**3)" GB)"; print "Total", tot, "bytes (" tot/(1024**3)" GB)" }' -

Example output:

    Segment node[sdw1] 2443370457 bytes (2.27557 GB)
    Segment node[sdw2] 1766575328 bytes (1.64525 GB)
    Segment node[sdw3] 1761686551 bytes (1.6407 GB)
    Segment node[sdw4] 1780301617 bytes (1.65804 GB)
    Segment node[sdw5] 1742543599 bytes (1.62287 GB)
    Segment node[sdw6] 1830073754 bytes (1.70439 GB)
    Segment node[sdw7] 1767310099 bytes (1.64594 GB)
    Segment node[sdw8] 1765105802 bytes (1.64388 GB)
    Total 14856967207 bytes (13.8366 GB)

Processing Skew
3) Compare the per-host totals in the output of the command from step 2 (SKEW check).

A significant and sustained difference in disk usage between segment hosts indicates SKEW.
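The comparison of per-host totals can be automated in a rough way. The sketch below is an illustration, not part of the original procedure: it reads lines in the "Segment node[host] N bytes (...)" form shown in the example output, computes the mean across hosts, and lists hosts whose usage exceeds the mean by more than a threshold. The function name and the 20% threshold are hypothetical choices.

```shell
#!/bin/sh
# Lists hosts whose spill-file total exceeds the mean by more than the given percentage.
# Input: lines like "Segment node[sdw1] 2443370457 bytes (2.27557 GB)"; other lines are ignored.
flag_skewed_hosts() {
    threshold_pct=$1
    awk -v t="$threshold_pct" '
        $1 == "Segment" {
            n++
            h = $2; sub(/^node/, "", h)     # keep only the "[host]" part
            host[n] = h; bytes[n] = $3; total += $3
        }
        END {
            if (n == 0) exit
            mean = total / n
            for (i = 1; i <= n; i++)
                if (bytes[i] > mean * (1 + t / 100))
                    printf "%s %.1f%% above mean\n", host[i], (bytes[i] / mean - 1) * 100
        }'
}

# Usage with three made-up hosts (mean is 2000, so only [sdwC] exceeds it by more than 20%):
flag_skewed_hosts 20 <<'EOF'
Segment node[sdwA] 1500 bytes
Segment node[sdwB] 1500 bytes
Segment node[sdwC] 3000 bytes
Total 6000 bytes
EOF
```

Fed the eight-host example output from step 2, the same call lists only [sdw1], at 31.6% above the mean. A listed host is only a cue to inspect its files; one sample does not show whether the difference is sustained.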

Processing Skew
4) If there is SKEW, inspect the individual temporary files (sort files in this example).

    [gpadmin@mdw kend]$ gpssh -f ~/hosts -e "ls -l /data[1-2]/primary/gpseg*/base/19979/pgsql_tmp/*" | grep -i sort | sort
    [sdw1] -rw------- 1 gpadmin gpadmin 1002209280 Jul 29 12:48 /data1/primary/gpseg2/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_19791_0001.0
    [sdw1] -rw------- 1 gpadmin gpadmin 1003356160 Jul 29 12:48 /data1/primary/gpseg1/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_19789_0001.0
    [sdw1] -rw------- 1 gpadmin gpadmin 288718848 Jul 23 14:58 /data1/primary/gpseg2/base/19979/pgsql_tmp/pgsql_tmp_slice0_sort_17758_0001.0
    [sdw8] -rw------- 1 gpadmin gpadmin 924581888 Jul 29 12:48 /data2/primary/gpseg45/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_15673_0010.9
    [sdw8] -rw------- 1 gpadmin gpadmin 990085120 Jul 29 12:48 /data1/primary/gpseg42/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_15667_0001.0

This output points to the gpseg45 instance on host sdw8.

Processing Skew
5) Run lsof on the file to find the ID of the process that holds it open.

    [root@sdw8 ~]# lsof /data2/primary/gpseg45/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_15673_0002.1
    COMMAND  PID   USER    FD  TYPE DEVICE SIZE       NODE        NAME
    postgres 15673 gpadmin 11u REG  8,48   1073741824 64424546751 /data2/primary/gpseg45/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_15673_0002.1

6) Use the process ID to find the database session and query.

    [root@sdw8 ~]# ps -eaf | grep 15673
    gpadmin 15673 27471 28 12:05 ? 00:12:59 postgres: port 40003, sbaskin bdw 172.28.12.250(21813) con699238 seg45 cmd32 slice10 MPPEXEC SELECT
    root 29622 29566 0 12:50 pts/16 00:00:00 grep 15673

Session ID = con699238, command number: cmd32
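Reading the session ID and command number out of the ps line can be scripted. The sketch below assumes the process title carries tokens of the form conNNN and cmdNNN, as in the sample line above; the function name is hypothetical.

```shell
#!/bin/sh
# Prints the "conNNN cmdNNN" tokens from ps lines that contain a session token.
# Lines without a conNNN token (for example the grep process itself) print nothing.
session_from_ps() {
    awk '{
        con = ""; cmd = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^con[0-9]+$/) con = $i
            if ($i ~ /^cmd[0-9]+$/) cmd = $i
        }
        if (con != "") print con, cmd
    }'
}

# Sample line taken from the ps output in step 6; prints "con699238 cmd32".
echo 'gpadmin 15673 27471 28 12:05 ? 00:12:59 postgres: port 40003, sbaskin bdw 172.28.12.250(21813) con699238 seg45 cmd32 slice10 MPPEXEC SELECT' | session_from_ps
```

On a segment host this could be chained as `ps -eaf | grep 15673 | session_from_ps`, with 15673 replaced by the PID reported by lsof.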

Object - Best Practice 100,000 => * < 100,000

2. - madlib, pl/r, pl/java

- SKEW (Data, Process)
- Analyze
- Vacuum / Re-org
- Partition / Index
- Local Join (Motion check: broadcast, redistribution)

A NEW PLATFORM FOR A NEW ERA