
Greenplum Database: Open Source
December 2015


Forward-Looking Statements

This presentation contains forward-looking statements as defined under the Federal Securities Laws. Actual results could differ materially from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in VMware, Inc.'s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory; (xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers and networks; (xiii) our ability to protect our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed previously and from time to time in the filings of EMC Corporation, the parent company of Pivotal, with the U.S. Securities and Exchange Commission. EMC and Pivotal disclaim any obligation to update any such forward-looking statements after the date of this release.


Safe Harbor

Any information regarding pre-release Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and therefore subject to change. This information is provided without warranty of any kind, express or implied. Customers who purchase Pivotal offerings should make their purchase decisions based upon features that are currently available. Pivotal has no obligation to update forward-looking information in this presentation.


Greenplum Database Mission & Strategy

- Relational database system for big data
- Mission-critical, system-of-record product with supporting tools and ecosystem
- Fully open source with a global community of developers and users
- Implements the world's leading research in database technology across all components:
  - Optimizer, query execution
  - Transaction processing, database storage, compression, high availability
  - Embedded programming languages (Python, R, Java, etc.)
  - In-database analytics across domains (e.g. geospatial, text, machine learning, mathematics)
- Performance tuned for multiple workload profiles: analytics, long-running queries, short-running queries, mixed workloads
- Large industrial focus: financial, government, telecom, retail, manufacturing, oil & gas, etc.


Greenplum Open Source

- An ambitious project: 10 years in the making, an investment of hundreds of millions of dollars
- Potential to define a new market and disrupt traditional EDW vendors
- www.greenplum.org: GitHub code, mailing lists / community engagement
- Global project with external contributors

Pivotal Greenplum

- Enterprise software distribution & release management
- Pivotal expertise
- 24-hour global support


PostgreSQL Compatibility

Roadmap

Strategically backport key features from PostgreSQL to Greenplum: JSONB, UUID, variadic functions, default function arguments, etc. Consistently backport patches from older PostgreSQL releases to Greenplum.


MPP Shared Nothing Architecture

Master Host and Standby Master Host: SQL enters at the master, which coordinates work with the Segment Hosts over the interconnect. Each Segment Host runs one or more Segment Instances and has its own CPU, disk and memory (shared nothing). Segment Instances process queries in parallel.

Performance comes from Segment Instance parallelism; a high-speed interconnect provides continuous pipelining of data processing.

[Diagram: Master Host (with Standby Master) connected through the interconnect to Segment Hosts node1 through nodeN, each running multiple Segment Instances.]


The master host is responsible for coordinating the query workload across the segment hosts. The master does not store user data. The standby master is a warm standby. Each segment host runs one or more segment instances; in effect, each segment host runs its own GPDB. Segment hosts run in a shared-nothing environment with their own CPU, disk and memory. Segments store data and are responsible for executing queries in parallel. The interconnect between the segment hosts is a high-speed bus that pipelines data between them.

Master Host

[Diagram: Master Host internals. The Master Segment holds the Catalog, Query Optimizer, Distributed TM, Dispatcher, Query Executor and Parser.]

Client: accepts client connections and incoming user requests, and performs authentication.
Parser: enforces syntax and semantics, and produces a parse tree.


The master host accepts client connections, performs authentication, and handles incoming user requests. The Postgres database listener process runs on the master host, by default on port 5432. The master may perform final processing for queries, for example aggregations, summations, ordering and sorting. The master does not contain user data. It is also important to note that system and database administration tasks are performed on the master host. The Parser checks syntax and semantics and produces a parse tree for the Query Optimizer.

Pivotal Query Optimizer

Query Optimizer: consumes the parse tree and produces the query plan; the query execution plan describes how the query will be executed.

[Diagram: on the Master Host, the Parser feeds the Query Optimizer, alongside the Catalog, Distributed TM, Dispatcher, Query Executor and Local Storage; the interconnect links to the Segment Hosts below, each Segment Instance running its own Local TM, Query Executor, Catalog and Local Storage.]



The Parallel Query Optimizer uses a cost-based algorithm to evaluate potential query plans and selects the most efficient plan.
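For example, the plan the optimizer selects can be inspected with EXPLAIN (a minimal sketch; the orders table is hypothetical):

-- Show the plan chosen by the optimizer, including motion nodes
EXPLAIN SELECT customer_id, count(*)
FROM orders
GROUP BY customer_id;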

Query Dispatcher

Dispatcher: responsible for communicating the query plan to the segments, allocating the cluster resources required to perform the job, and accumulating and presenting the final results.

[Diagram: on the Master Host, the Dispatcher sits alongside the Parser, Query Optimizer, Query Executor, Catalog, Distributed TM and Local Storage, dispatching plans over the interconnect to the Segment Instances on each Segment Host.]



The Query Dispatcher dispatches the query plan to segments, allocates resources across segments and is responsible for accumulating and presenting the final results.

Query Executor

Query Executor: responsible for executing the steps in the plan (e.g. open file, iterate over tuples) and communicating its intermediate results to other executor processes.

[Diagram: Query Executors run on the Master Host and within every Segment Instance, exchanging intermediate results over the interconnect.]



A Query Executor (worker process) is responsible for completing its portion of work and communicating its intermediate results to the other worker processes. For each slice of the query plan, there is at least one worker process assigned. A worker process works on its assigned portion of the query plan independently. During query execution, each segment will have a number of worker processes working on the query in parallel. Related worker processes across segments that are working on the same portion of the query plan are referred to as gangs.
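The slices and their gangs are visible in plan output. A hedged sketch (table names hypothetical):

-- EXPLAIN ANALYZE reports per-slice executor statistics; motion nodes
-- are labeled with their slice, e.g. "Gather Motion 4:1 (slice1; segments: 4)"
EXPLAIN ANALYZE
SELECT o.customer_id, count(*)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY o.customer_id;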

Distributed Transaction Management

Distributed TM: resides on the master and coordinates the commit and abort actions of the segments. Segments have their own commit and replay logs and decide when to commit or abort their own transactions.

[Diagram: the Distributed TM on the Master Host coordinates the Local TM in every Segment Instance.]



By default, GPDB runs in autocommit mode: each command is implicitly wrapped in a BEGIN and COMMIT (or a ROLLBACK if there is an error), so each statement issued in psql is its own transaction. You can explicitly use BEGIN to start a transaction and end it with COMMIT or ROLLBACK.
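For example (accounts is a hypothetical table):

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- or ROLLBACK; to discard both updates atomically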

GPDB High Availability

Master Host mirroring
- Warm Standby Master Host: a replica of the Master Host system catalogs
- Eliminates a single point of failure
- A synchronization process runs between the Master Host and the Standby Master Host, using PostgreSQL WAL replication

Segment mirroring
- Creates a mirror segment for every primary segment
- Uses a custom file-block replication process
- If a primary segment becomes unavailable, failover to the mirror is automatic


Master mirroring creates a warm standby master. Once configured, the synchronization (replication) process between the master and the standby master is started. This synchronization process, gpsyncagent, runs on the standby master and ensures that the data on both systems is synchronized. Should the master host become unavailable, the replication process is stopped. The replication logs are used to reconstruct the state of the master at the time of the failure, and the standby master can be activated to pick up from the last set of transactions successfully completed by the master.

A mirror segment is normally configured on a different host than its primary counterpart; it can even be configured on systems outside the array. Changes to the primary segment are copied to the mirror segment using a file-block replication process. Until a failure occurs, there is no live segment instance running on the mirror host, only the replication process. Should the primary segment become unavailable, the file replication process is stopped and the mirror is automatically brought online as the primary segment.

Define the Storage Model: CREATE TABLE

- Heap tables versus append-optimized (AO) tables
- Row-oriented storage versus column-oriented storage
- Compression:
  - Table-level compression, applied to the entire table
  - Column-level compression, applied to a specific column (with columnar storage)
  - zlib compression levels, with optional run-length encoding (RLE)

A CREATE TABLE sketch follows this list.
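A minimal sketch of these options as CREATE TABLE clauses (table and column names are hypothetical):

-- Row-oriented append-optimized table with table-level zlib compression
CREATE TABLE sales_ao (
    sale_id   BIGINT,
    sale_date DATE,
    amount    NUMERIC
)
WITH (appendonly=true, orientation=row,
      compresstype=zlib, compresslevel=5);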


Polymorphic Storage: User-Definable Storage Layout

- Columnar storage compresses better and is optimized for retrieving a subset of the columns when querying
- Compression can be set differently per column: gzip (1-9), QuickLZ, delta, RLE (see the sketch after this list)
- Row-oriented storage is faster when returning all columns
- Use heap tables for many updates and deletes
- Use indexes for drill-through queries
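Per-column compression on a column-oriented AO table is declared with ENCODING clauses; a sketch using the compression types listed above (table and column names hypothetical):

CREATE TABLE sales_col (
    sale_id   BIGINT  ENCODING (compresstype=quicklz),
    sale_date DATE    ENCODING (compresstype=rle_type),
    amount    NUMERIC ENCODING (compresstype=zlib, compresslevel=5)
)
WITH (appendonly=true, orientation=column);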

External HDFS: keep less-accessed partitions on HDFS as external partitions to seamlessly query all data. Text, CSV, binary, Avro and Parquet formats; all major Hadoop distributions (a sketch follows).

[Diagram: TABLE SALES. Recent monthly partitions (Jun through Dec) stored row-oriented or column-oriented inside the database; older partitions (Year -1, Year -2) on external HDFS.]
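A hedged sketch of an external table over HDFS using the gphdfs protocol (host, port and path are placeholders):

CREATE EXTERNAL TABLE sales_archive (
    sale_id   BIGINT,
    sale_date DATE,
    amount    NUMERIC
)
LOCATION ('gphdfs://namenode:8020/data/sales/year-1/*.txt')
FORMAT 'TEXT' (DELIMITER '|');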


CREATE TABLE: Define Data Distribution

One of the most important aspects of GPDB! Every table has a distribution method:
- DISTRIBUTED BY (column): uses a hash distribution
- DISTRIBUTED RANDOMLY: uses a random distribution, which is not guaranteed to be perfectly even

Explicitly define a column or random distribution for all tables; do not rely on the default (examples below).
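For example (hypothetical tables):

-- Hash-distribute on a high-cardinality key
CREATE TABLE orders (
    order_id    BIGINT,
    order_date  DATE,
    customer_id INT
)
DISTRIBUTED BY (order_id);

-- Or fall back to an explicit random distribution
CREATE TABLE audit_log (raw TEXT)
DISTRIBUTED RANDOMLY;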



Data Distribution: The Key to Parallelism

The primary strategy and goal is to spread data evenly across all segment instances. This is most important in an MPP shared-nothing architecture!

[Diagram: rows of an Order table (Order #, Order Date, Customer ID) hash-distributed evenly across segment instances.]


CREATE TABLE: Define Partitioning

- Reduces the amount of data to be scanned by reading only the relevant data needed to satisfy a query
- The only goal of partitioning is to achieve partition elimination (a.k.a. partition pruning)
- Partitioning is not a substitute for distribution
- A good distribution strategy, plus partitioning that achieves partition elimination, unlocks performance magic
- Uses table inheritance and constraints, with a persistent relationship between parent and child tables


Table partitioning may be used to improve query performance by scanning only the relevant data needed to satisfy a given query. Table partitioning does not affect the physical distribution of data. GPDB supports both range partitioning and list partitioning. Range partitioning partitions the data based on a numerical range, for example by date, while list partitioning partitions the data based on a list of values, for example location (store, city, state, region). Combining range partitioning and list partitioning is also supported. Table partitioning is implemented using table inheritance and constraints. Check constraints limit the data a table can contain based on some defining criteria; these constraints are also used at runtime to determine which child tables to scan in order to satisfy a given query.
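A sketch of monthly range partitioning combined with a hash distribution (names and dates are illustrative):

CREATE TABLE orders_p (
    order_id    BIGINT,
    order_date  DATE,
    customer_id INT
)
DISTRIBUTED BY (order_id)
PARTITION BY RANGE (order_date)
(
    START (DATE '2007-01-01') INCLUSIVE
    END   (DATE '2008-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month')
);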

Distribution and Partitioning

SELECT COUNT(*) FROM orders
WHERE order_date >= 'Oct 20 2007'
  AND order_date <  'Oct 27 2007';

The orders data is evenly distributed across all segments, and the query scans only the relevant order partitions.

[Diagram: segment instances 1A through 3D, with only the partitions covering the queried date range highlighted on each.]



Analytics: Bringing the Power of Parallelism to Modeling and Analytics

Path Functions
- Identify rows of interest from a raw table or view
- Pattern-match across rows using regex
- Define one or more windows on the matches
- Apply standard PostgreSQL window functions or aggregations on the windows

Future Roadmap

Support Vector Machines

GP Text

Time Series, Gap Filling

Complex Number Support



Highlighted Greenplum Successes

- Government detection of benefit payments that should not be made
- Government detection of tax fraud
- Government economic statistics research database
- Commercial banking wealth management data science and product development
- Financial corporation's risk and trade repository reporting
- Pharmaceutical company vaccine potency prediction based on manufacturing sensors
- 401K provider's analytics on investment choices
- Auto manufacturer's analytics on predictive maintenance
- Corporate/financial internal email and communication surveillance and reporting
- Oil drilling equipment predictive maintenance
- Mobile telephone company enterprise data warehouse
- Retail store chain customer purchase analytics
- Airline loyalty program analytics
- Telecom company network performance and availability analytics
- Corporate network anomalous behavior and intrusion detection
- Semiconductor fab sensor analytics and reporting


Recent Accomplishments

- 4.3.5.0 (April 2015): GA of the Pivotal Query Optimizer, with parallel & incremental ANALYZE
- GPDB 4.3.5 (May 2015): Pivotal Query Optimizer
- GPDB 4.3.6 (Sept 2015): external partitions, GP Workload Manager
- GPCC 2.0 (Dec 2015)


Recent Accomplishments (continued)

- MADlib 1.8 (July 2015): topic modelling & matrix operations
- Greenplum Open Source (October 2015)
- EMC DCA V3 (Dec 2015)


Pivotal Greenplum Roadmap Highlights

- S3 external tables, performance tuned for AWS
- Dynamic code generation using LLVM
- Short-running query performance enhancements
- Faster ANALYZE
- WAL-replication segment mirroring
- Incremental restore MVP
- Disk-space-full warnings
- Snapshot backup

- Anaconda Python modules: NLTK, etc.
- Time series gap filling
- Complex numbers
- PostGIS raster support
- Geospatial trajectories
- Path analytics
- Enhanced SVM module
- Py-MADlib
- Lock-free backup



Greenplum File System Primer
Yon Lew, zData Inc.

Directory Structure

One directory per database per segment: <segment data directory>/base/<database oid>
e.g. /d/d2/primary/gpseg_37/base/19002

SELECT oid, datname FROM pg_database;

Data Files

Each file is named using the pg_class.relfilenode column of its relation:

SELECT relfilenode FROM pg_class WHERE oid = 'test.mytable'::regclass;

Originally, relfilenode is equal to the OID of the relation, but numerous database operations (e.g. TRUNCATE) can change this value.
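This can be observed directly (assuming test.mytable exists):

SELECT relfilenode FROM pg_class WHERE oid = 'test.mytable'::regclass;
TRUNCATE test.mytable;
-- The relfilenode returned now differs from the one above
SELECT relfilenode FROM pg_class WHERE oid = 'test.mytable'::regclass;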

Diagnostics

CREATE EXTERNAL WEB TABLE database_files (
    host    TEXT,
    segment INT,
    file    TEXT,
    mtime   TIMESTAMP,
    sz      BIGINT
)
EXECUTE E'ls -l --time-style=+%Y%m%d_%H:%M:%S $GP_SEG_DATADIR/base/*/ | awk ''{print ENVIRON["HOSTNAME"] "|" ENVIRON["GP_SEGMENT_ID"] "|" $7 "|" $6 "|" $5}''' ON ALL
FORMAT 'text' (DELIMITER E'|' NULL '');

Diagnostics

Querying this table can produce substantial load, since it stats every file in the cluster. Views can easily be built on top of this table to join back to pg_class.
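A hedged sketch of such a view, joining file names back to pg_class by relfilenode (assumes the <relfilenode>[.<n>] file naming shown below; the view name is hypothetical):

CREATE VIEW relation_file_sizes AS
SELECT c.relname, f.segment, f.file, f.sz
FROM database_files f
JOIN pg_class c
  ON substring(f.file, '^([0-9]+)')::oid = c.relfilenode;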

Heap Tables

One data file per heap table for tuple storage. The minimum file size is equal to the default block size defined for the database.

CREATE TABLE test1 (a INT, b VARCHAR, c DATE);
INSERT INTO test1 VALUES (1, 'a', current_date);
SELECT segment, sz FROM database_files WHERE file LIKE '<relfilenode>%';

 segment |    sz
---------+-------
       0 |     0
       1 |     0
       2 | 32768
       3 |     0

AO Tables

- Either row or columnar orientation
- Variable file size
- Columnar tables have one file per column (files named <relfilenode>.<n>)
- Concurrent loads also create a set of new files for each table
- AO tables initially consist of a single empty file in each data directory until data is inserted
- Data files are not limited to a minimum size corresponding to the database block size

AO Tables

CREATE TABLE test1 (a INT, b VARCHAR, c DATE)
WITH (appendonly=true, orientation=row);
SELECT segment, file, sz FROM database_files WHERE file LIKE '3000010%';

 segment |  file   | sz
---------+---------+----
       0 | 3000010 |  0
       1 | 3000010 |  0
       2 | 3000010 |  0
       3 | 3000010 |  0

INSERT INTO test1 VALUES (1, 'a', current_date);
SELECT segment, file, sz FROM database_files WHERE file LIKE '3000010%';

 segment |   file    | sz
---------+-----------+----
       0 | 3000010   |  0
       1 | 3000010   |  0
       2 | 3000010   |  0
       2 | 3000010.1 | 40
       3 | 3000010   |  0

AO Tables

CREATE TABLE test1 (a INT, b VARCHAR, c DATE)
WITH (appendonly=true, orientation=column);
SELECT segment, file, sz FROM database_files WHERE file LIKE '3000010%';

 segment |  file   | sz
---------+---------+----
       0 | 3000010 |  0
       1 | 3000010 |  0
       2 | 3000010 |  0
       3 | 3000010 |  0

INSERT INTO test1 VALUES (1, 'a', current_date);
SELECT segment, file, sz FROM database_files WHERE file LIKE '3000010%';

 segment |    file     | sz
---------+-------------+----
       0 | 3000010     |  0
       1 | 3000010     |  0
       2 | 3000010     |  0
       2 | 3000010.1   | 40
       2 | 3000010.129 | 40
       2 | 3000010.257 | 40
       3 | 3000010     |  0

AO Tables

For large fact tables, ADD/DROP COLUMN operations are much faster against AO columnar tables, since no rewrite of the data files is required.

AO Tables

Beware of large numbers of concurrent loads running against AO tables. For example, 50 concurrent loads against an AO columnar table with 500 columns will produce 200,000 primary segment files on a single segment host (500 column files x 50 loads x 8 primary segments). File system efficiency can decline drastically as the number of files increases.

AO Tables

Workarounds:
- Rebuild the partition via batch processing every night (CTAS followed by a partition swap; see the sketch after this list)
- Load into a heap-organized staging table
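A hedged sketch of the nightly rebuild (partition and staging names are hypothetical):

-- Compact the day's many small AO files into one fresh table...
CREATE TABLE sales_staging
WITH (appendonly=true, orientation=column)
AS SELECT * FROM sales_1_prt_today;

-- ...then swap it in for the fragmented partition
ALTER TABLE sales EXCHANGE PARTITION today WITH TABLE sales_staging;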

Skew

Typically, skew is discovered due to unbalanced storage on one or more segments in the cluster. Skew in the gp_toolkit views is calculated by querying the hidden gp_segment_id column:

SELECT gp_segment_id, count(*) FROM mytable GROUP BY 1;

This operation is prohibitively expensive when querying all tables in a cluster.
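gp_toolkit ships ready-made views built on this technique; a sketch, assuming the default gp_toolkit schema is installed:

-- Per-table skew coefficient (expensive: scans the tables)
SELECT * FROM gp_toolkit.gp_skew_coefficients;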

Skew

Querying file metadata with the diagnostic table is much faster; compute the coefficient of variation (or the interquartile range) of file sizes per relation:

SELECT substring(file, '([0-9]+)'), stddev(sz) / avg(sz)
FROM database_files
GROUP BY 1
HAVING sum(sz) != 0;

Bloat

Checking for skew via the gp_segment_id column will miss physical skew due to bloat (dead space from deleted or updated tuples).
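gp_toolkit also exposes a bloat check based on expected versus actual page counts (a sketch; assumes table statistics are up to date):

SELECT * FROM gp_toolkit.gp_bloat_diag;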

Join the community!

- Website
- Mailing lists
- GitHub
- Events
- More...
