Understanding Table and Memory Management in SAS® Cloud Analytic Services (CAS)

Brian Bowman, Principal Software Developer, SAS

#AnalyticsX. Copyright © 2016, SAS Institute Inc. All rights reserved.

What is SAS® Cloud Analytic Services (CAS)?

CAS is the scalable in-memory engine powering SAS® Viya™

CAS scales from a single SMP machine to large MPP grids

CAS may be deployed SMP, MPP, on premises, in the cloud, virtualized or on “bare metal”

This presentation mostly focuses on CAS running MPP

CAS Massively Parallel Processing (MPP)

"In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.”

Admiral Grace Hopper – Computing Pioneer

Early Prediction of MPP?

CAS Technology Quick Facts

CAS is a multi-user server where concurrent sessions independently execute action requests sent from client interfaces

In MPP configurations, the CAS Controller node services each action request by distributing the processing to Worker nodes
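The Controller/Worker division of labor can be pictured with a small scatter-gather sketch. This is only an illustration under stated assumptions: the function names, the round-robin row split, and the use of threads to stand in for Worker nodes are all hypothetical and say nothing about how CAS itself schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_workers):
    """Deal rows out round-robin so each (simulated) worker holds a slice."""
    return [rows[i::n_workers] for i in range(n_workers)]


def worker_sum(slice_rows):
    """Work done by one simulated Worker: a partial result over its own rows."""
    return sum(slice_rows)


def controller_run(rows, n_workers=4):
    """Simulated Controller: fan the request out, then combine partial results."""
    slices = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_sum, slices))
    return sum(partials)


total = controller_run(list(range(1, 101)))
print(total)  # 5050
```

The point of the sketch is that the Controller never touches individual rows during the computation; it only hands out work and merges what comes back.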

CAS: Multiple Clients with Concurrent Server Sessions

CAS: Sessions Can Consume Resources Differentially

CAS: A Session May Utilize Only a Subset of Worker Nodes

Session Data May Not be Uniformly Distributed on Workers

The CAS in-memory architecture does not require all Table data to fit in memory on all nodes at once

The CAS failover strategy can tolerate Worker node failure and resume processing without manual intervention or data loss

CAS Tables – What are They?

Conceptually simple rectangular Table with Rows and Variables

New data types support connecting to databases and dynamic data sources without requiring type conversion

Example layout of some supported Data Types and their values

Internally CAS Tables are in the HDAT (High-performance DATa) format, specifically designed for the concurrent and distributed data architecture that is central to SAS® Viya™.

Stored HDAT tables have a .sashdat file extension

CAS Libraries (caslibs) – What are They?

Similar to SAS Libraries yet unique to the CAS architecture

Represent a data source from which a CAS server can access data

An in-memory container holding zero or more CAS tables

Connection info to the data source when required (e.g. databases)

Associated with access controls that define what groups and individual users are authorized to do with caslib contents

Native Caslib Data Source Types

CAS provides built-in I/O access to underlying storage systems for .sashdat and .csv files via “native” caslib data source types:

Path – Directory paths available only on the Controller or SMP

DNFS – Distributed Network File System

HDFS – Hadoop Distributed File System

Path and DNFS Caslib Data Source Types

.sashdat files from Path and DNFS data sources share a common SASDNFS storage format that provides data transfer optimizations and enhanced encrypted data-at-rest support.

Example: yourTable.sashdat stored in a directory referenced by either a Path or DNFS caslib

The SASDNFS format provides seamless .sashdat file access between directories referenced by either Path or DNFS caslibs.

.sashdat files may be freely copied between such directories using OS-level copy commands.

Native Caslib Data Source Types

CAS provides built-in I/O access to underlying storage systems for .sashdat and .csv files via these “native” caslib data source types:

Path – serial I/O to directory paths only available on the Controller

DNFS – parallel I/O to directory paths mounted on all nodes of an MPP CAS grid

HDFS – parallel I/O to co-located Hadoop Distributed File System
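The difference between serial and parallel I/O in the list above can be sketched with ordinary files. This is a rough model of the access pattern only, not of the Path or DNFS implementations: the helper names, the byte-range split, and the use of threads as stand-ins for nodes that all mount the same directory are assumptions made for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1024  # hypothetical block size in bytes, used only to size the demo file


def read_serial(path):
    """Path-style access: a single reader streams the whole file."""
    with open(path, "rb") as f:
        return f.read()


def read_range(path, offset, length):
    """One simulated worker reads only its own byte range."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_parallel(path, n_workers=4):
    """DNFS-style access: every worker opens the same file and reads a different range."""
    size = os.path.getsize(path)
    share = max(1, -(-size // n_workers))  # ceiling division, never zero
    offsets = range(0, size, share)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda off: read_range(path, off, share), offsets)
        return b"".join(parts)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(10 * BLOCK + 7))
    same = read_serial(path) == read_parallel(path)

print(same)  # True
```

Both readers return identical bytes; what changes is how many readers share the work, which is why a directory mounted on every node allows a load to be spread across the grid.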

Other Supported Caslib Data Source Types

ESP – parallel access from CAS Worker nodes to the SAS® Event Stream Processor

LASR – parallel access from CAS Worker nodes to a remote SAS® LASR Analytic Server.

Amazon S3 – currently under development

The CAS Table Actionset

addTable

A client session adds data to CAS via a “ping-pong” process in which the client sends one chunk of data rows to the server at a time

The result is an in-memory HDAT table available for processing

The CAS Libname Engine uses addTable for SAS Version 9 clients

Python, Java, and Lua clients support the addTable action through syntax friendly to each language respectively
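The "ping-pong" exchange described for addTable can be sketched as a loop in which the client sends a chunk and waits for a reply before sending the next. Everything here is hypothetical: `ToyServer`, `add_table`, the chunk size, and the acknowledgement value are illustrative stand-ins, not the CAS wire protocol or any client API.

```python
from itertools import islice


def chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


class ToyServer:
    """Stand-in for the server side: appends each chunk and acknowledges it."""

    def __init__(self):
        self.table = []
        self.round_trips = 0

    def receive(self, chunk):
        self.table.extend(chunk)
        self.round_trips += 1
        return len(self.table)  # acknowledgement: rows stored so far


def add_table(server, rows, chunk_size=3):
    """Client side: send a chunk, wait for the reply, then send the next one."""
    acked = 0
    for chunk in chunks(rows, chunk_size):
        acked = server.receive(chunk)
    return acked


server = ToyServer()
stored = add_table(server, [(i, f"row{i}") for i in range(10)])
print(stored, server.round_trips)  # 10 4
```

Ten rows in chunks of three take four round trips, which shows why chunk size matters when a client pushes a large table.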

loadTable

A CAS server session creates an in-memory HDAT table by loading data from a data source described in a CAS Library (Caslib). This may require a database or SAS® Event Stream server connection.

Many data sources require loadTable to transform the data as the in-memory HDAT table is created. Examples include the SAS Data Connectors, which provide access to a variety of databases.

Native CAS HDAT tables stored in .sashdat files may be loaded directly without requiring data transformation.

Performance Tip: Use the table save action to store frequently used HDAT tables as .sashdat files. Your data will load faster!

Current Data Connectors for loadTable

SAS® Data Connector and Data Connect Accelerator to Hive

SAS® Data Connector to ODBC

SAS® Data Connector to Oracle

SAS® Data Connector to PC Files

SAS® Data Connector and Data Connect Accelerator to Teradata

SAS® Data Connector to SAS Data Sets. This includes both serial and parallel support for loading .sas7bdat files directly into CAS.

Others planned or currently under development

partition

Redistributes in-memory CAS table rows by grouping them together physically in contiguous HDAT table block locations on the same Worker node. This involves “shuffling” in MPP grid parlance.

HDAT Partitions are created according to groupBy variable(s) values.

Partitioning enables distributed By Group processing in CAS

Each partition is represented by its partition key value and this requires memory (and storage) overhead for the HDAT table.

Therefore the cardinality of groupBy variable values (relative to total table rows) should be considered when partitioning CAS tables.
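A toy version of the shuffle makes the cardinality concern concrete. The sketch below assumes a hash of the groupBy value picks the worker; the slides do not say how CAS assigns partitions to Worker nodes, so that choice, along with the `partition` and `key_ratio` helpers and the sample rows, is purely illustrative.

```python
import zlib
from collections import defaultdict


def partition(rows, key, n_workers):
    """Send every row with the same key value to the same worker, grouped together."""
    workers = [defaultdict(list) for _ in range(n_workers)]
    for row in rows:
        value = row[key]
        # Stable hash (an assumption) so a key value always maps to the same worker.
        w = zlib.crc32(str(value).encode()) % n_workers
        workers[w][value].append(row)
    return workers


def key_ratio(rows, key):
    """Distinct key values divided by row count; near 1.0 means one partition per row."""
    return len({row[key] for row in rows}) / len(rows)


rows = [
    {"region": r, "sales": s}
    for r, s in [("east", 1), ("west", 2), ("east", 3),
                 ("north", 4), ("west", 5), ("east", 6)]
]
workers = partition(rows, "region", n_workers=2)
print(key_ratio(rows, "region"))  # 0.5
```

Each distinct key value becomes one partition entry, so a ratio close to 1.0 (for example, partitioning on a unique ID) would create nearly as many partition keys as rows, which is the overhead the slide warns about.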

promote

Promotes an in-memory HDAT table from session to global scope

CAS uses OS calls to memory-map HDAT blocks during loadTable on each Worker node in MPP (or on a single SMP node).

Memory-mapped HDAT blocks are subject to OS paging

The promote action transfers a copy of the internal data structures representing mapped HDAT blocks to the CAS main process

Once its HDAT blocks are represented in the CAS main process, the table is “global” and can be accessed by other sessions. CAS access controls strictly govern this based on session user identity
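The promote flow described above can be sketched in Python as a loose analogy. Everything here is an assumption made for illustration: the `MappedTable`, `Server`, and `Session` classes, the `acl` dictionary, and the user names are hypothetical, and a plain temporary file stands in for HDAT blocks. It shows only the general idea: the data is memory-mapped once, and promotion hands the mapping descriptor to a server-wide registry instead of copying the rows.

```python
import mmap
import os
import tempfile


class MappedTable:
    """Descriptor for a file-backed, memory-mapped table (stand-in for mapped blocks)."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Read-only mapping: the OS pages data in on demand and may page it out.
        self.view = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def close(self):
        self.view.close()
        self._file.close()


class Server:
    """Stand-in for the main process: holds global tables and their access lists."""

    def __init__(self):
        self.global_tables = {}
        self.acl = {}


class Session:
    def __init__(self, server, user):
        self.server = server
        self.user = user
        self.tables = {}  # session scope: invisible to other sessions

    def load_table(self, name, path):
        self.tables[name] = MappedTable(path)

    def promote(self, name, readers):
        # Hand the descriptor to the server-wide registry; no row data is copied.
        self.server.global_tables[name] = self.tables.pop(name)
        self.server.acl[name] = set(readers) | {self.user}

    def open_table(self, name):
        if name in self.tables:
            return self.tables[name]
        if name in self.server.global_tables:
            if self.user not in self.server.acl[name]:
                raise PermissionError(f"{self.user} may not read {name}")
            return self.server.global_tables[name]
        raise KeyError(name)


def demo():
    fd, path = tempfile.mkstemp(suffix=".blk")
    with os.fdopen(fd, "wb") as f:
        f.write(b"row1row2row3")
    server = Server()
    s1, s2 = Session(server, "alice"), Session(server, "bob")
    s1.load_table("sales", path)
    try:
        s2.open_table("sales")
        visible_before = True
    except KeyError:
        visible_before = False  # session-scope table is private to s1
    s1.promote("sales", readers={"bob"})
    table = s2.open_table("sales")  # no explicit load needed in s2
    data = table.view[:4]
    table.close()
    os.remove(path)
    return visible_before, data


result = demo()
print(result)
```

In this model the second session sees the table only after promotion, and only because its user is on the access list.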

More Table Actions

columnInfo – returns type, length, format info for table columns

fetch – returns table row data

update – selects rows using where= and changes specified variables to new values for an in-memory HDAT table

save – saves a table in .sashdat or .csv format to Path, DNFS, or HDFS caslib source types (others in progress)

tableInfo – returns name, row and column counts, created and last modified datetime, caslib name, and other status

tableDetails – returns information about block distribution and memory usage for loaded HDAT tables

CAS Table-related Interfaces in SAS 9

CAS Libname Engine: invokes the CAS Table addTable and fetch actions on behalf of SAS code via libname.table syntax.

proc casutil: provides SAS 9 code with the ability to invoke “bread and butter” CAS table actions: loadTable, promote, save, columnInfo, drop, deleteSource, etc.

proc cas: implements the new CASL language, which can currently invoke nearly all CAS Table actions (addTable is not ready yet).

CAS Table Access From SAS 9.4

Connecting to CAS

cas s1 host="xxxxx.xxx.xxx.xxx" port=10317
    cassessopts=(caslib=CASUSER);

Connecting to CAS – SAS Log results

Loading a Table into CAS

tableDetails action on loaded Table

Promoting a Table from Session to Global Scope

Running Summary against the Global Table from a New Session

Notice that the Table did not require an explicit load into session S2

Examining Global Table Memory Consumption on Worker Nodes

Correlation action with singlePass, vars list, and where clause

Questions?
