master the essentials of data prep and data blending - inspire europe 2017

ESSENTIALS OF DATA PREP & DATA BLENDING Presented by Ben Gomez

Upload: alteryx

Post on 21-Jan-2018

TRANSCRIPT

Page 1: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

ESSENTIALS OF DATA PREP & DATA BLENDING

Presented by Ben Gomez

Page 2: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

FORWARD-LOOKING STATEMENTS

This presentation includes “forward-looking statements” within the meaning of the Private Securities Litigation Reform Act of 1995. These forward-looking statements may be identified by the use of terminology such as “believe,” “may,” “will,” “intend,” “expect,” “plan,” “anticipate,” “estimate,” “potential,” or “continue,” or other comparable terminology. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product availability, growth and financial metrics and any statements regarding product roadmaps, strategies, plans or use cases. Although Alteryx believes that the expectations reflected in any of these forward-looking statements are reasonable, these expectations or any of the forward-looking statements could prove to be incorrect, and actual results or outcomes could differ materially from those projected or assumed in the forward-looking statements. Alteryx’s future financial condition and results of operations, as well as any forward-looking statements, are subject to risks and uncertainties, including but not limited to the factors set forth in Alteryx’s press releases, public statements and/or filings with the Securities and Exchange Commission, especially the “Risk Factors” sections of Alteryx’s Quarterly Report on Form 10-Q. These documents and others containing important disclosures are available at www.sec.gov or in the “Investors” section of Alteryx’s website at www.alteryx.com. All forward-looking statements are made as of the date of this presentation and Alteryx assumes no obligation to update any such forward-looking statements.

Any unreleased services or features referenced in this or other presentations, press releases or public statements are only intended to outline Alteryx’s general product direction. They are intended for information purposes only, and may not be incorporated into any contract. This is not a commitment to deliver any material, code, or functionality (which may not be released on time or at all) and customers should not rely upon this presentation or any such statements to make purchasing decisions. The development, release, and timing of any features or functionality described for Alteryx’s products remains at the sole discretion of Alteryx.

Page 3: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

AGENDA

Handling Input Data

• Caching

• Sampling

• Input Macro

Building Workflows Efficiently

• Evaluate your data

• Document for clarity

• Simplify the process

Important Details

• Testing your work

Page 4: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

PRESENTER

Page 5: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

To watch a recording of this session from Inspire Europe 2017, visit

alteryx.com/inspire-europe-2017-tracks

Page 6: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

HANDLING INPUT DATA

Best Practice – Use caching when you don’t need live data

• Currently available with relational databases only

Page 7: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

HANDLING INPUT DATA

Best Practice – Use caching when you don’t need live data

Run time: 53 seconds before the data is cached, 1.9 seconds once it is cached!
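The idea behind the timing difference can be sketched outside Alteryx. The following is a minimal illustration, not the tool's own caching mechanism: the first call runs a slow query and saves the rows to a local file, and later calls read that file instead. The names `load_with_cache` and `slow_query` and the cache file name are hypothetical.

```python
import json
import os
import tempfile
import time


def load_with_cache(cache_path, fetch_rows):
    """Return rows from a local cache file, calling fetch_rows() only on a cache miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    rows = fetch_rows()
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)
    return rows


def slow_query():
    """Hypothetical stand-in for an expensive database query."""
    time.sleep(0.05)
    return [{"id": i, "amount": i * 10} for i in range(1000)]


cache_file = os.path.join(tempfile.mkdtemp(), "orders_cache.json")
first = load_with_cache(cache_file, slow_query)   # cache miss: runs the query
second = load_with_cache(cache_file, slow_query)  # cache hit: reads the local file
```

A cache like this returns stale rows until the file is deleted, which is why the practice applies only when live data is not needed.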

Page 8: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

HANDLING INPUT DATA

Best Practice - Sample data to speed up processing during development

Page 9: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

HANDLING INPUT DATA

Best Practice - Sample your data

• Use database sampling features

• PostgreSQL: SELECT * FROM table TABLESAMPLE SYSTEM (5)

• SQL Server: SELECT TOP 5000 * FROM table ORDER BY newid()

• Oracle: SELECT * FROM table SAMPLE(5)
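In the PostgreSQL and Oracle queries the 5 is an approximate percentage of rows, while the SQL Server query returns a fixed 5000 rows in random order. The sketch below shows the same idea of letting the database choose the sample, using an in-memory SQLite database so that it runs anywhere. SQLite has no TABLESAMPLE clause, so it uses ORDER BY RANDOM() with LIMIT, which is closest to the SQL Server form. The `orders` table and the `sample_rows` helper are hypothetical.

```python
import sqlite3

# Build a small in-memory table to stand in for a large source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 10001)],
)


def sample_rows(connection, sample_size):
    """Return a random sample of rows chosen by the database, not by the client."""
    cursor = connection.execute(
        "SELECT id, amount FROM orders ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    )
    return cursor.fetchall()


dev_sample = sample_rows(conn, 500)  # 500 of the 10,000 rows reach the client
```

Ordering by a random value still scans the whole table on the server; the saving is in how many rows are transferred and processed afterwards.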

Page 10: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

DEMOS

• Input Macros

• Data Profiling

• Document for clarity

• Simplify the Process

Page 11: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

BUILDING WORKFLOWS EFFICIENTLY

Best Practice – Evaluate your data

• The Browse tool can help identify hidden data problems that produce invalid results and slow you down:

• Duplicate records

• Missing values

• Unexpected characters

• Invalid values or ranges
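The four problem types listed above can also be checked programmatically. This is a minimal sketch, not a substitute for the Browse tool: the function name `profile_records`, the field names, the allowed-character pattern, and the valid range are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def profile_records(records, key_field, text_field, numeric_field, valid_range):
    """Summarize duplicates, missing values, odd characters, and out-of-range values."""
    low, high = valid_range
    key_counts = Counter(record[key_field] for record in records)
    # Assumed rule: letters, digits, spaces, and basic punctuation are expected.
    allowed_text = re.compile(r"[A-Za-z0-9 .,'-]*")
    return {
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        "missing_values": sum(
            1 for record in records
            if any(value is None or value == "" for value in record.values())
        ),
        "unexpected_characters": [
            record[key_field] for record in records
            if record[text_field] and not allowed_text.fullmatch(record[text_field])
        ],
        "out_of_range": [
            record[key_field] for record in records
            if record[numeric_field] is not None
            and not low <= record[numeric_field] <= high
        ],
    }


sample = [
    {"id": 1, "name": "Acme Ltd", "age": 34},
    {"id": 2, "name": "", "age": 41},            # missing name
    {"id": 2, "name": "Beta\tCorp", "age": 29},  # duplicate id, embedded tab
    {"id": 3, "name": "Gamma", "age": 212},      # age outside 0..120
    {"id": 4, "name": "Delta", "age": None},     # missing age
]
report = profile_records(sample, "id", "name", "age", (0, 120))
```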

Page 12: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

HANDLING INPUT DATA

Best Practice – Utilize Input Macros for frequently used sources

Page 13: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

BUILDING WORKFLOWS EFFICIENTLY

Best Practice – Document for clarity

Page 14: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

BUILDING WORKFLOWS EFFICIENTLY

Best Practice – Use sorts sparingly; sorting data is expensive.

When data is joined on fields, both inputs are sorted in full, unless the data was already sorted and no later operation has invalidated that sort order.

The Unique tool also performs a sort, so watch for unnecessary Unique tools.
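To see why a join implies sorting, consider a generic sort-merge join. This is a textbook sketch and not Alteryx's actual implementation; the function `sort_merge_join` and its `left_sorted` and `right_sorted` flags are hypothetical. Both inputs are sorted on the key unless the caller states that they are already in order, and the merge then walks them once.

```python
def sort_merge_join(left, right, key, left_sorted=False, right_sorted=False):
    """Inner-join two lists of dicts on `key` using a generic sort-merge join."""
    # Sorting is the expensive step; skip it only for inputs already in key order.
    if not left_sorted:
        left = sorted(left, key=lambda row: row[key])
    if not right_sorted:
        right = sorted(right, key=lambda row: row[key])

    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Pair the current left row with every right row sharing this key.
            k = j
            while k < len(right) and right[k][key] == left_key:
                joined.append({**left[i], **right[k]})
                k += 1
            i += 1
    return joined


left_rows = [{"id": 2, "a": "x"}, {"id": 1, "a": "y"}, {"id": 2, "a": "z"}]
right_rows = [{"id": 2, "b": 10}, {"id": 3, "b": 20}, {"id": 2, "b": 30}]
matches = sort_merge_join(left_rows, right_rows, "id")
```

Passing `left_sorted=True` for data that is not actually in key order would silently drop matches, which mirrors the slide's caveat about operations that invalidate an earlier sort.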

Page 15: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

BUILDING WORKFLOWS EFFICIENTLY

Best Practice – Simplify the Process

Only keep the data you are using

• Don’t keep fields that you don’t need

• Don’t create spatial objects until you’re ready to use them; discard them once you are done

• Don’t keep duplicate fields

• Set data aside and rejoin it later

• Best Practice: Add a record id field early that can be used to rejoin records later
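The record id pattern can be sketched as follows, under assumed field names: assign an id early, carry only the fields the current steps need, set the rest aside keyed by that id, and rejoin at the end. The helpers `add_record_ids`, `split_fields`, and `rejoin` are hypothetical and only illustrate the flow.

```python
def add_record_ids(records):
    """Attach a sequential record_id so rows can be matched up again later."""
    return [{"record_id": index, **record} for index, record in enumerate(records, start=1)]


def split_fields(records, working_fields):
    """Return (narrow working rows, set-aside fields keyed by record_id)."""
    working, set_aside = [], {}
    for record in records:
        record_id = record["record_id"]
        working.append({"record_id": record_id, **{f: record[f] for f in working_fields}})
        set_aside[record_id] = {
            name: value for name, value in record.items()
            if name not in working_fields and name != "record_id"
        }
    return working, set_aside


def rejoin(working, set_aside):
    """Merge the set-aside fields back onto the processed rows by record_id."""
    return [{**row, **set_aside[row["record_id"]]} for row in working]


customers = [
    {"name": "Acme", "notes": "long free-text field", "revenue": 1200.0},
    {"name": "Beta", "notes": "another long field", "revenue": 800.0},
]
with_ids = add_record_ids(customers)
working, set_aside = split_fields(with_ids, ["revenue"])
for row in working:  # the heavy processing only touches the narrow rows
    row["revenue_band"] = "high" if row["revenue"] >= 1000 else "low"
result = rejoin(working, set_aside)
```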

Page 17: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

BUILDING WORKFLOWS EFFICIENTLY

Best Practice – Simplify the Process

Separate formulas with distinctly different tasks

• One function per formula, unless the functions are closely related or build on each other

• Easier to understand the process

• Easier to debug

• Easier to split out parts of the data

• Easier to copy and paste specific functionality.
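The same principle carries over to code: one small step per task, chained in order. The sketch below is an analogy and not Alteryx formula syntax; the step functions and field names are hypothetical.

```python
def trim_name(record):
    """Step 1: clean up whitespace in the name."""
    return {**record, "name": record["name"].strip()}


def add_total(record):
    """Step 2: compute the order total."""
    return {**record, "total": record["quantity"] * record["unit_price"]}


def flag_large_order(record, threshold=100.0):
    """Step 3: builds on the total from step 2, so it runs after add_total."""
    return {**record, "is_large": record["total"] >= threshold}


STEPS = [trim_name, add_total, flag_large_order]


def run_steps(records, steps):
    """Apply each step to every record, one step at a time."""
    for step in steps:
        records = [step(record) for record in records]
    return records


orders = [
    {"name": "  Acme ", "quantity": 3, "unit_price": 40.0},
    {"name": "Beta", "quantity": 1, "unit_price": 25.0},
]
processed = run_steps(orders, STEPS)
```

Because each step is separate, the output after any single step can be inspected on its own, and a step such as `add_total` can be reused elsewhere without dragging the others along.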

Page 18: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

IMPORTANT DETAILS

Best Practice – Build in tests to make sure your work is correct

• Create a test for assumptions

• Number of records

• Results of calculations

• Presence or absence of duplicates

• Eliminates the need for a visual verification

• Prevents unnoticed errors down the road
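A built-in test of this kind might look like the sketch below, which checks the three assumptions named above and stops with an error when one fails. The function `check_assumptions` and the field names are hypothetical; in a workflow the equivalent checks would be configured in a testing tool, not written by hand.

```python
def check_assumptions(source_rows, output_rows, key_field):
    """Raise ValueError if the output breaks an assumption; return True otherwise."""
    # Number of records: this step should neither drop nor add rows.
    if len(output_rows) != len(source_rows):
        raise ValueError(f"expected {len(source_rows)} records, got {len(output_rows)}")
    # Duplicates or not: the key must stay unique.
    keys = [row[key_field] for row in output_rows]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate values found in {key_field!r}")
    # Results of calculations: recompute the total independently and compare.
    for row in output_rows:
        expected = row["quantity"] * row["unit_price"]
        if abs(row["total"] - expected) > 1e-9:
            raise ValueError(f"unexpected total for {key_field}={row[key_field]!r}")
    return True


source = [
    {"id": 1, "quantity": 3, "unit_price": 40.0},
    {"id": 2, "quantity": 1, "unit_price": 25.0},
]
output = [{**row, "total": row["quantity"] * row["unit_price"]} for row in source]
check_assumptions(source, output, "id")
```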

Page 19: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

IMPORTANT DETAILS

Best Practice - Limit data movement

• Where is your data?

• Where is your processing?
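The two questions above come down to whether rows travel to the processing or the processing travels to the rows. As a rough sketch, using an in-memory SQLite database and a hypothetical `sales` table, the first approach pulls every row to the client and the second asks the database for the summary only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)] * 1000,
)

# Processing on the client: all 3,000 rows are moved, then summed locally.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals_client = {}
for region, amount in all_rows:
    totals_client[region] = totals_client.get(region, 0.0) + amount

# Processing where the data lives: only one summary row per region is moved.
totals_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
```

Both approaches give the same totals; the difference is that the second transfers two rows where the first transfers three thousand.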

Page 21: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

THANK YOU

Please complete a feedback survey


Ben Gomez

Page 22: Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017

#inspire16#

alteryx.com/trial

Ready to bring these incredible and tangible benefits to your organization?

Download a FREE Trial of Alteryx and start making your data work for you, instead of you working for your data
