data science with postgresql zs_bárány_-_data... · pdf filei big data? i...

101

Upload: dokhue

Post on 03-Feb-2018

227 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Balázs Bárány

Data Scientist

pgconf.de 2015

Page 2: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Contents

Introduction � What is Data Science?Process model

Tools and methods of Data Scientists

Data Science with PostgreSQLBusiness & data understandingPreprocessingModeling

Evaluation

Deployment

Summary

Page 3: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Sexiest job of the 21st century

I According to Google, LinkedIn, ...

I Who is a Data Scientist?

Page 4: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Sexiest job of the 21st century

I According to Google, LinkedIn, ...

I Who is a Data Scientist?

Page 5: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Data Science Venn Diagram

(c) Drew Conway, 2010. CC-BY-NC

Page 6: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Tasks of data scientists

I Get data from various sources

I Big data?

I Mash up & format for analysis

I Analyze & visualize

I Predict & prescribe

I Operationalize

Page 7: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Tasks of data scientists

I Get data from various sources

I Big data?

I Mash up & format for analysis

I Analyze & visualize

I Predict & prescribe

I Operationalize

Page 8: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Tasks of data scientists

I Get data from various sources

I Big data?

I Mash up & format for analysis

I Analyze & visualize

I Predict & prescribe

I Operationalize

Page 9: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Tasks of data scientists

I Get data from various sources

I Big data?

I Mash up & format for analysis

I Analyze & visualize

I Predict & prescribe

I Operationalize

Page 10: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Tasks of data scientists

I Get data from various sources

I Big data?

I Mash up & format for analysis

I Analyze & visualize

I Predict & prescribe

I Operationalize

Page 11: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Introduction � What is Data Science?

Process model

The Data Mining process

Cross Industry Standard Process for Data Mining (Kenneth Jensen/Wikimedia Commons)

Page 12: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Tools and methods

Tools and methods

Page 13: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Scripting and programming

I R

I Python with extensions

I Octave/Matlab, other mathematic languages

I Hadoop and Big Data programming libraries (mostly Java)

I Cloud services

Page 14: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Integrated GUI tools

I (partly) Open Source: RapidMiner, KNIME, Orange

I Data Warehouse tools extended for analytics: Pentaho, Talend

I Many commercial tools, e. g. SAS, IBM SPSS

I Hadoop-related newcomers: e. g. Datameer

Page 15: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Data Infrastructure

I Databases and data stores

I Relational, NoSQLI Hadoop clustersI In-memory

I Data streams

I Free-form data: text, images, video, audio, ...

I Web APIs

I Open Data

Page 16: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Data acquisition and preprocessing

I Data ingestion in raw format

I Joining, aggregating, �ltering, calculating, ...

I Data cleansing

I Missing valuesI Abnormal values

I Result: data set suitable for analytics

Page 17: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Data acquisition and preprocessing

I Data ingestion in raw format

I Joining, aggregating, �ltering, calculating, ...

I Data cleansing

I Missing valuesI Abnormal values

I Result: data set suitable for analytics

Page 18: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Data acquisition and preprocessing

I Data ingestion in raw format

I Joining, aggregating, �ltering, calculating, ...

I Data cleansing

I Missing valuesI Abnormal values

I Result: data set suitable for analytics

Page 19: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Data acquisition and preprocessing

I Data ingestion in raw format

I Joining, aggregating, �ltering, calculating, ...

I Data cleansing

I Missing valuesI Abnormal values

I Result: data set suitable for analytics

Page 20: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Predictive Modeling

I Supervised and unsupervised methods

I Target variable known or not

I Classi�cation (supervised): Prediction of a class or category

I Regression (supervised): Prediction of numeric value

I Clustering (unsupervised): Automatic �grouping� of data

I Association analysis, outlier detection, time series prediction,...

Page 21: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Predictive Modeling

I Supervised and unsupervised methods

I Target variable known or not

I Classi�cation (supervised): Prediction of a class or category

I Regression (supervised): Prediction of numeric value

I Clustering (unsupervised): Automatic �grouping� of data

I Association analysis, outlier detection, time series prediction,...

Page 22: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Predictive Modeling

I Supervised and unsupervised methods

I Target variable known or not

I Classi�cation (supervised): Prediction of a class or category

I Regression (supervised): Prediction of numeric value

I Clustering (unsupervised): Automatic �grouping� of data

I Association analysis, outlier detection, time series prediction,...

Page 23: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Deployment and operationalization

I Model application to new data => prediction + con�dence

I What to do with predictions?

I Store in ERP or CRM

I Tell someone (email, popup)

I Add a label (e. g. mark email as spam)

I Interrupt �nancial transaction => prescription

I Order supplies => prescription

I ...

Page 24: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Deployment and operationalization

I Model application to new data => prediction + con�dence

I What to do with predictions?

I Store in ERP or CRM

I Tell someone (email, popup)

I Add a label (e. g. mark email as spam)

I Interrupt �nancial transaction => prescription

I Order supplies => prescription

I ...

Page 25: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Tools and methods of Data Scientists

Deployment and operationalization

I Model application to new data => prediction + con�dence

I What to do with predictions?

I Store in ERP or CRM

I Tell someone (email, popup)

I Add a label (e. g. mark email as spam)

I Interrupt �nancial transaction => prescription

I Order supplies => prescription

I ...

Page 26: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Data Science with PostgreSQL

Doing Data Science with PostgreSQL

Page 27: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Caveats

I This stu� is not easy

I Must be root and postgres

I Maintain your PostgreSQL yourselfI Able to compile stu�

I You should ask ;-)

I your bossI co-workersI customer

Page 28: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Caveats

I This stu� is not easy

I Must be root and postgres

I Maintain your PostgreSQL yourselfI Able to compile stu�

I You should ask ;-)

I your bossI co-workersI customer

Page 29: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Caveats

I This stu� is not easy

I Must be root and postgres

I Maintain your PostgreSQL yourselfI Able to compile stu�

I You should ask ;-)

I your bossI co-workersI customer

Page 30: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Page 31: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Business understanding

I What is the purpose of the business?

I What are existing processes?

I Drivers of business success

I Project goals and challenges

I Availability of data and resources

I Success criteria

I Not a technical activity, PostgreSQL can't help much

Page 32: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Business understanding

I What is the purpose of the business?

I What are existing processes?

I Drivers of business success

I Project goals and challenges

I Availability of data and resources

I Success criteria

I Not a technical activity, PostgreSQL can't help much

Page 33: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Business understanding

I What is the purpose of the business?

I What are existing processes?

I Drivers of business success

I Project goals and challenges

I Availability of data and resources

I Success criteria

I Not a technical activity, PostgreSQL can't help much

Page 34: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding

I Existing data

I Entities and covered conceptsI Complete? Correct? In suitable form?I Usable? (regulations, access constraints, ...)

I Connecting separate data sources

I Simple or complex IDs

I Data size

I Too smallI Too big

I Suitability for predictive modeling

I Target variable?I Attribute types

Page 35: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding

I Existing data

I Entities and covered conceptsI Complete? Correct? In suitable form?I Usable? (regulations, access constraints, ...)

I Connecting separate data sources

I Simple or complex IDs

I Data size

I Too smallI Too big

I Suitability for predictive modeling

I Target variable?I Attribute types

Page 36: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding

I Existing data

I Entities and covered conceptsI Complete? Correct? In suitable form?I Usable? (regulations, access constraints, ...)

I Connecting separate data sources

I Simple or complex IDs

I Data size

I Too smallI Too big

I Suitability for predictive modeling

I Target variable?I Attribute types

Page 37: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding

I Existing data

I Entities and covered conceptsI Complete? Correct? In suitable form?I Usable? (regulations, access constraints, ...)

I Connecting separate data sources

I Simple or complex IDs

I Data size

I Too smallI Too big

I Suitability for predictive modeling

I Target variable?I Attribute types

Page 38: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding with PostgreSQL

I Get data into PostgreSQL

I Classical import processI Foreign Data Wrappers

I Analyze data distribution

I Group by and aggregate

I Count, Count Distinct, Min, Max

I Count NULLsI Search for missing links (incomplete foreign keys)

I Analyze �surprizes�

I Impossible valuesI Missing values in �required� �elds

Page 39: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding with PostgreSQL

I Get data into PostgreSQL

I Classical import processI Foreign Data Wrappers

I Analyze data distribution

I Group by and aggregate

I Count, Count Distinct, Min, Max

I Count NULLsI Search for missing links (incomplete foreign keys)

I Analyze �surprizes�

I Impossible valuesI Missing values in �required� �elds

Page 40: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding with PostgreSQL

I Get data into PostgreSQL

I Classical import processI Foreign Data Wrappers

I Analyze data distribution

I Group by and aggregate

I Count, Count Distinct, Min, Max

I Count NULLsI Search for missing links (incomplete foreign keys)

I Analyze �surprizes�

I Impossible valuesI Missing values in �required� �elds

Page 41: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding with PostgreSQL � summary

I Good SQL knowledge required

I Tedious manual process

I repetitiveI not suitable for large number of attributes

I No built-in visualization

I Or maybe...

Page 42: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding with PostgreSQL � summary

I Good SQL knowledge required

I Tedious manual process

I repetitiveI not suitable for large number of attributes

I No built-in visualization

I Or maybe...

Page 43: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

SQL barchart output

Page 44: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Bar chart from GUI tool

Page 45: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Boxplot output

Page 46: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding wrap up

I DBMS not built for this

I It can support more specialized tools

I Introduction: R

I �A free software environment for statistical computing andgraphics�

I Available in PostgreSQL

Page 47: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Data understanding wrap up

I DBMS not built for this

I It can support more specialized tools

I Introduction: R

I �A free software environment for statistical computing andgraphics�

I Available in PostgreSQL

Page 48: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

PL/R: A statistical language for PostgreSQL

I R as a standalone language

I Mathematical and statistical methodsI Powerful visualization functionsI Classical, modern and bleeding edge modelingI Arrays and data frames are central data typesI Operates only in memory

I PL/R: R as a loadable procedural language for PostgreSQL

I First released in 2003 by Joe ConwayI License: GPL

Page 49: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

PL/R: A statistical language for PostgreSQL

I R as a standalone language

I Mathematical and statistical methodsI Powerful visualization functionsI Classical, modern and bleeding edge modelingI Arrays and data frames are central data typesI Operates only in memory

I PL/R: R as a loadable procedural language for PostgreSQL

I First released in 2003 by Joe ConwayI License: GPL

Page 50: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

R usage outside of PostgreSQL

I Development environments

I RStudio (AGPL or commercial, local & web)I RKWard, Cantor (KDE projects)I StatET (Eclipse)

I Frontends

I R CommanderI DeducerI Rattle

I Web framework: Shiny (AGPL or commercial)

Page 51: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Working with R in PostgreSQL

I Install functions in the database

Example

select install_rcmd('

myfunction <-function(x)

{print(x)}

');

I Install without function body

Example

CREATE FUNCTION rnorm

(n integer, mean double precision, sd double precision)

RETURNS double precision[]

AS �

LANGUAGE 'plr';

Page 52: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Using R in PostgreSQL for data understanding

I Advanced visualization

I Data distributions

I Advanced statistics

I Execution in the database

I Clumsy, but direct data access

I Execution outside

I Simple and interactive, but data transfer

Page 53: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Using R in PostgreSQL for data understanding

I Advanced visualization

I Data distributions

I Advanced statistics

I Execution in the database

I Clumsy, but direct data access

I Execution outside

I Simple and interactive, but data transfer

Page 54: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Business & data understanding

Using R in PostgreSQL for data understanding

I Advanced visualization

I Data distributions

I Advanced statistics

I Execution in the database

I Clumsy, but direct data access

I Execution outside

I Simple and interactive, but data transfer

Page 55: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Data Science with PostgreSQL

Preprocessing

Page 56: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing

I What databases are built for

I Rows: very dynamic

I Easy to create new rows by joiningI Easy to �lter

I Columns: not so much

I Easy to create new columnsI Only explicit access

I Wider interpretation of preprocessing

I Enrichment with external dataI New attributes from existing onesI Recoding, recalculationI Missing value handling

Page 57: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing

I What databases are built for

I Rows: very dynamic

I Easy to create new rows by joiningI Easy to �lter

I Columns: not so much

I Easy to create new columnsI Only explicit access

I Wider interpretation of preprocessing

I Enrichment with external dataI New attributes from existing onesI Recoding, recalculationI Missing value handling

Page 58: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing

I What databases are built for

I Rows: very dynamic

I Easy to create new rows by joiningI Easy to �lter

I Columns: not so much

I Easy to create new columnsI Only explicit access

I Wider interpretation of preprocessing

I Enrichment with external dataI New attributes from existing onesI Recoding, recalculationI Missing value handling

Page 59: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: organizing work�ow

I Common Table Expressions

I organize processing stepsI partial and intermediate results

Example

WITH source AS (

SELECT *, ROW_NUMBER() OVER () AS rownum

FROM source_table

),

no_missings AS (

SELECT *

FROM source

WHERE field1 IS NOT NULL

AND field2 IS NOT NULL

)

etc.

Page 60: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: attribute creation

I Aggregation

I Partial aggregation by window functions

I In-group measures, e. g. ratioI att / SUM(att) OVER (PARTITION BY ...)

I In-group numberingI ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

I Comparing to previous/next value

I att - LAG(att, 1) OVER (ORDER BY ...)

I Much easier in SQL than programming languages and datamining tools

Page 61: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: attribute creation

I Aggregation

I Partial aggregation by window functions

I In-group measures, e. g. ratioI att / SUM(att) OVER (PARTITION BY ...)

I In-group numberingI ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

I Comparing to previous/next value

I att - LAG(att, 1) OVER (ORDER BY ...)

I Much easier in SQL than programming languages and datamining tools

Page 62: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: attribute creation

I Aggregation

I Partial aggregation by window functions

I In-group measures, e. g. ratioI att / SUM(att) OVER (PARTITION BY ...)

I In-group numberingI ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

I Comparing to previous/next value

I att - LAG(att, 1) OVER (ORDER BY ...)

I Much easier in SQL than programming languages and datamining tools

Page 63: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: attribute creation

I Aggregation

I Partial aggregation by window functions

I In-group measures, e. g. ratioI att / SUM(att) OVER (PARTITION BY ...)

I In-group numberingI ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

I Comparing to previous/next value

I att - LAG(att, 1) OVER (ORDER BY ...)

I Much easier in SQL than programming languages and datamining tools

Page 64: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: attribute creation

I Aggregation

I Partial aggregation by window functions

I In-group measures, e. g. ratioI att / SUM(att) OVER (PARTITION BY ...)

I In-group numberingI ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

I Comparing to previous/next value

I att - LAG(att, 1) OVER (ORDER BY ...)

I Much easier in SQL than programming languages and datamining tools

Page 65: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: enrichment

I PostGIS for geodata

I Foreign data wrappers (see PostgreSQL Wiki)

I Other databases (other PostgreSQL server, MySQL, Oracle,MSSQL, JDBC, SQL Alchemy ...)

I NoSQL databases (MongoDB, Cassandra, CouchDB, Redis, ...)

I Big Data (Hadoop Hive, Impala)

I Network sources - Multicorn (RSS, IMAP, Twitter, S3, ...)I Files (CSV, ZIP, JSON, ...)

I Write your own in C or Python or Ruby

Page 66: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: enrichment

I PostGIS for geodata

I Foreign data wrappers (see PostgreSQL Wiki)

I Other databases (other PostgreSQL server, MySQL, Oracle,MSSQL, JDBC, SQL Alchemy ...)

I NoSQL databases (MongoDB, Cassandra, CouchDB, Redis, ...)

I Big Data (Hadoop Hive, Impala)

I Network sources - Multicorn (RSS, IMAP, Twitter, S3, ...)I Files (CSV, ZIP, JSON, ...)

I Write your own in C or Python or Ruby

Page 67: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: enrichment

I PostGIS for geodata

I Foreign data wrappers (see PostgreSQL Wiki)

I Other databases (other PostgreSQL server, MySQL, Oracle,MSSQL, JDBC, SQL Alchemy ...)

I NoSQL databases (MongoDB, Cassandra, CouchDB, Redis, ...)

I Big Data (Hadoop Hive, Impala)

I Network sources - Multicorn (RSS, IMAP, Twitter, S3, ...)I Files (CSV, ZIP, JSON, ...)

I Write your own in C or Python or Ruby

Page 68: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: enrichment

I PostGIS for geodata

I Foreign data wrappers (see PostgreSQL Wiki)

I Other databases (other PostgreSQL server, MySQL, Oracle,MSSQL, JDBC, SQL Alchemy ...)

I NoSQL databases (MongoDB, Cassandra, CouchDB, Redis, ...)

I Big Data (Hadoop Hive, Impala)

I Network sources - Multicorn (RSS, IMAP, Twitter, S3, ...)I Files (CSV, ZIP, JSON, ...)

I Write your own in C or Python or Ruby

Page 69: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: enrichment

I PostGIS for geodata

I Foreign data wrappers (see PostgreSQL Wiki)

I Other databases (other PostgreSQL server, MySQL, Oracle,MSSQL, JDBC, SQL Alchemy ...)

I NoSQL databases (MongoDB, Cassandra, CouchDB, Redis, ...)

I Big Data (Hadoop Hive, Impala)

I Network sources - Multicorn (RSS, IMAP, Twitter, S3, ...)I Files (CSV, ZIP, JSON, ...)

I Write your own in C or Python or Ruby

Page 70: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Preprocessing

Preprocessing: enrichment

I PostGIS for geodata

I Foreign data wrappers (see PostgreSQL Wiki)

I Other databases (other PostgreSQL server, MySQL, Oracle,MSSQL, JDBC, SQL Alchemy ...)

I NoSQL databases (MongoDB, Cassandra, CouchDB, Redis, ...)

I Big Data (Hadoop Hive, Impala)

I Network sources - Multicorn (RSS, IMAP, Twitter, S3, ...)I Files (CSV, ZIP, JSON, ...)

I Write your own in C or Python or Ruby

Page 71: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Data Science with PostgreSQL

Modeling

Page 72: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Model development

I Machine learning algorithms not well suited for SQL

I Some attempts to build them

I Naive Bayes, Linear RegressionI Di�cult for more advanced algorithms

I Better done in specialized language or tool

I PL/RI PL/Python

Page 73: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Model development

I Machine learning algorithms not well suited for SQL

I Some attempts to build them

I Naive Bayes, Linear RegressionI Di�cult for more advanced algorithms

I Better done in specialized language or tool

I PL/RI PL/Python

Page 74: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Model development

I Machine learning algorithms not well suited for SQL

I Some attempts to build them

I Naive Bayes, Linear RegressionI Di�cult for more advanced algorithms

I Better done in specialized language or tool

I PL/RI PL/Python

Page 75: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

PL/Python

I Python procedural language available in PostgreSQL

I scikit-learn: Machine learning toolbox for Python

I Classi�cation, regression, clusteringI Model selection, validationI Preprocessing

I matplotlib: Generic and statistical plotting library

I PL/Python is an alternative to PL/R

Page 76: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

PL/Python

I Python procedural language available in PostgreSQL

I scikit-learn: Machine learning toolbox for Python

I Classi�cation, regression, clusteringI Model selection, validationI Preprocessing

I matplotlib: Generic and statistical plotting library

I PL/Python is an alternative to PL/R

Page 77: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Data Science with PostgreSQL

Evaluation

Page 78: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation of modeling results

I Models return predictions

I Prediction can be compared to known result (target variable)

I Measures of model performance: Accuracy, precision, recall, ...

I Results on the training set meaningless

I Split validation

I Cross validation

Page 79: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation of modeling results

I Models return predictions

I Prediction can be compared to known result (target variable)

I Measures of model performance: Accuracy, precision, recall, ...

I Results on the training set meaningless

I Split validation

I Cross validation

Page 80: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation of modeling results

I Models return predictions

I Prediction can be compared to known result (target variable)

I Measures of model performance: Accuracy, precision, recall, ...

I Results on the training set meaningless

I Split validation

I Cross validation

Page 81: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation results

I �Good result� depends on the application

I If not good enough,

I get more data

I do more preprocessing

I select better classi�er

I optimize classi�er parameters

I Cycle: preprocessing - modeling - evaluation

I Better done in data mining environment

Page 82: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation results

I �Good result� depends on the application

I If not good enough,

I get more data

I do more preprocessing

I select better classi�er

I optimize classi�er parameters

I Cycle: preprocessing - modeling - evaluation

I Better done in data mining environment

Page 83: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation results

I �Good result� depends on the application

I If not good enough,

I get more data

I do more preprocessing

I select better classi�er

I optimize classi�er parameters

I Cycle: preprocessing - modeling - evaluation

I Better done in data mining environment

Page 84: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation results

I �Good result� depends on the application

I If not good enough,

I get more data

I do more preprocessing

I select better classi�er

I optimize classi�er parameters

I Cycle: preprocessing - modeling - evaluation

I Better done in data mining environment

Page 85: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation results

I �Good result� depends on the application

I If not good enough,

I get more data

I do more preprocessing

I select better classi�er

I optimize classi�er parameters

I Cycle: preprocessing - modeling - evaluation

I Better done in data mining environment

Page 86: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation results

I �Good result� depends on the application

I If not good enough,

I get more data

I do more preprocessing

I select better classi�er

I optimize classi�er parameters

I Cycle: preprocessing - modeling - evaluation

I Better done in data mining environment

Page 87: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Modeling

Evaluation results

I �Good result� depends on the application

I If not good enough,

I get more data

I do more preprocessing

I select better classi�er

I optimize classi�er parameters

I Cycle: preprocessing - modeling - evaluation

I Better done in data mining environment

Page 88: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Data Science with PostgreSQL

Deployment

Page 89: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment

I Advantages of deployment in the database:

I Less overhead

I Instant application using triggers

I Well-known execution environment

I Functionality available over standard interface

I Some models easily expressed in SQL

Page 90: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment

I Advantages of deployment in the database:

I Less overhead

I Instant application using triggers

I Well-known execution environment

I Functionality available over standard interface

I Some models easily expressed in SQL

Page 91: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment

I Advantages of deployment in the database:

I Less overhead

I Instant application using triggers

I Well-known execution environment

I Functionality available over standard interface

I Some models easily expressed in SQL

Page 92: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment

I Advantages of deployment in the database:

I Less overhead

I Instant application using triggers

I Well-known execution environment

I Functionality available over standard interface

I Some models easily expressed in SQL

Page 93: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment

I Advantages of deployment in the database:

I Less overhead

I Instant application using triggers

I Well-known execution environment

I Functionality available over standard interface

I Some models easily expressed in SQL

Page 94: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment of PL/R or PL/Python models

I Model developed in database or outside

I Put into global context

I PL/R: load(�saved object�, .GlobalEnv)I PL/Python: Global dictionary �GD�

I Application function in matching language

I Uses existing modelI Returns target variable

I Trigger func or UPDATE uses application function

Page 95: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment of PL/R or PL/Python models

I Model developed in database or outside

I Put into global context

I PL/R: load(�saved object�, .GlobalEnv)I PL/Python: Global dictionary �GD�

I Application function in matching language

I Uses existing modelI Returns target variable

I Trigger func or UPDATE uses application function

Page 96: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Data Science with PostgreSQL

Deployment

Deployment of PL/R or PL/Python models

I Model developed in database or outside

I Put into global context

I PL/R: load(�saved object�, .GlobalEnv)I PL/Python: Global dictionary �GD�

I Application function in matching language

I Uses existing modelI Returns target variable

I Trigger func or UPDATE uses application function

Page 97: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Summary

Summary

I PostgreSQL's support for data science tasks

I Best: preprocessing, deployment

I Modern SQL for preprocessing

I Foreign Data Wrappers for data integration

I Procedural languages for data mining

Page 98: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Summary

Summary

I PostgreSQL's support for data science tasks

I Best: preprocessing, deployment

I Modern SQL for preprocessing

I Foreign Data Wrappers for data integration

I Procedural languages for data mining

Page 99: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Summary

Summary

I PostgreSQL's support for data science tasks

I Best: preprocessing, deployment

I Modern SQL for preprocessing

I Foreign Data Wrappers for data integration

I Procedural languages for data mining

Page 100: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Summary

Summary

I PostgreSQL's support for data science tasks

I Best: preprocessing, deployment

I Modern SQL for preprocessing

I Foreign Data Wrappers for data integration

I Procedural languages for data mining

Page 101: Data Science with PostgreSQL zs_Bárány_-_Data... · PDF fileI Big data? I Mash up & format for analysis I Analyze & visualize ... Data Science with PostgreSQL Data Science with

Data Science with PostgreSQL

Summary

Questions?

I Balázs Bárány, <[email protected]>

I https://datascientist.at/