© 2014 SAP AG or an SAP affiliate company. All rights reserved.
SAP HANA SPS 10 - What’s New? Series Data / TimeSeries
SAP HANA Product Management, June 2015
(Delta from SPS 09 to SPS 10)
© 2015 SAP SE or an SAP affiliate company. All rights reserved. Public
Disclaimer
This presentation outlines our general product direction and should not be relied on in making
a purchase decision. This presentation is not subject to your license agreement or any other
agreement with SAP.
SAP has no obligation to pursue any course of business outlined in this presentation or to
develop or release any functionality mentioned in this presentation. This presentation and
SAP’s strategy and possible future developments are subject to change and may be changed
by SAP at any time for any reason without notice.
This document is provided without a warranty of any kind, either express or implied, including
but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or
non-infringement. SAP assumes no responsibility for errors or omissions in this document,
except if such damages were caused by SAP intentionally or through gross negligence.
Agenda
Overview
– Series Data Overview
– SPS09 Summary
– SPS10 Overview
Store Enhancements
– Enhanced Support for Equidistant Series
– Support for Equidistant Series with Multiple Increments, Offsets
Query Enhancements
– Updates to SERIES_ROUND
Analytic Enhancements
– New Analytic Functions
Overview
Series Data Overview
Series Data is synonymous with Time Series
Series Data support was introduced in SPS09 as a core SAP HANA capability
Series Data - What is it?
– Ordered sequence of data points/measurements
– Measured at points in time or within time intervals
o E.g. a discrete measurement taken from a sensor every 10 s
o E.g. energy consumed by a home in every 15-minute interval (smart metering)
Series Data - What do we do with it?
– Analyze and predict
o Extract useful statistical information
o Forecasting
Series Data – Relevance?
– Foundational technology for IoT
o Industry 4.0 / Industrial Internet of Things (IIoT)
o IT/OT Convergence
Series Data – SPS09 Review
Support very high volumes of data using effective compression techniques
– Non-lossy compression; all values originally inserted are accessible for auditing/regulatory purposes
Support both equidistant and non-equidistant data
– Often, source data will be non-equidistant; it will then be “snapped” to an equidistant “grid” for analysis, model
fitting, etc.
Allow time series manipulation, cleaning, and analytic operations to be expressed naturally in SQL while
maintaining high performance
– Table Creation via CREATE COLUMN TABLE extensions for Series Data
– Efficient grouping to different granularities (GROUP BY SERIES_ROUND(…))
– Built-in SQL functions for efficient handling of Series Data
o SERIES_GENERATE; SERIES_DISAGGREGATE; SERIES_ROUND; SERIES_PERIOD_TO_ELEMENT;
SERIES_ELEMENT_TO_PERIOD
– New Analytical SQL Functions
o CORR; CORR_SPEARMAN; LINEAR_APPROX; MEDIAN
Series Data – SPS10 Overview
Handle timestamp data that is NOT equidistant with a single increment and offset across the entire table
– With good compression for reduced memory consumption
– Range block indexes for efficient handling of range queries
Enhanced SERIES_ROUND with new rounding modes and support for an offset (alignment)
– Enhanced usability and greater expressive power for querying series data
New aggregate and window functions
CDS support
Store
– Equidistant series with any alignment
– Generated rounded columns
– Piecewise equidistant series
Query
– Round to computed interval
– Granulize (any offset)
Analyze
– AUTO_CORR, CROSS_CORR
– BINNING
– CUBIC_SPLINE_APPROX
– DFT
– RANDOM_PARTITION
– SERIES_FILTER
– WEIGHTED_AVG
– Sliding window support
– {FIRST/NTH/LAST}_VALUE
Store Enhancements
Enhanced Support for Equidistant Series
Enhanced Support for Equidistant Series: Limitations of SPS09
Restrictions/Limitations in SPS09 on Equidistant Series
– Only one equidistant property per table
o i.e. only a single INCREMENT BY is supported; it is defined at table creation time and applies to all series in the table
o Efficient compression can be provided on the timestamp column, but only if the timestamps are exactly aligned on the increment boundary, i.e. no offset is supported
o Timestamps can be encoded as a line t = mx (i.e. a single slope m, no offset from the INCREMENT boundary)
– Data needed to be ordered on INSERT (by series key, timestamp) for good compression
SPS09 equidistant series support works well for series data and use cases that meet the above criteria
Enhanced Support for Equidistant Series: Many Use Cases Require More Flexible Handling of Timestamps/Periods
However, many use cases have
– "runs of data" in which the timestamps of consecutive data points differ by a constant interval
o i.e. the data effectively has multiple INCREMENTs
o this can be due to different intervals for different series in the table
o or to different intervals within a single series in the table
– timestamps that are not necessarily aligned to INCREMENT boundaries
o i.e. offsets from the INCREMENT boundaries can exist
– slight local variations in the timestamps, i.e. some "jitter"
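The notion of "runs of data" can be illustrated with a small Python sketch. It is not the SAP HANA algorithm: the function name `split_into_runs` is hypothetical, timestamps are assumed to be sorted integers (e.g. seconds), and runs are split greedily on exact gaps, so jitter is not tolerated.

```python
def split_into_runs(ts):
    """Greedily split sorted integer timestamps into maximal runs with a constant gap.

    Hypothetical illustration only; a real encoder would also tolerate jitter.
    """
    runs = []
    i, n = 0, len(ts)
    while i < n:
        if i + 1 == n:                      # a lone trailing point forms its own run
            runs.append(ts[i:])
            break
        step = ts[i + 1] - ts[i]            # the increment of the run starting at i
        j = i + 1
        while j + 1 < n and ts[j + 1] - ts[j] == step:
            j += 1                          # extend the run while the gap stays constant
        runs.append(ts[i:j + 1])
        i = j + 1
    return runs


# A run every 10 s, then a run every 15 s, then a stray point.
print(split_into_runs([0, 10, 20, 30, 45, 60, 75, 76]))
# [[0, 10, 20, 30], [45, 60, 75], [76]]
```

Each run found this way has its own increment, which is exactly what a single table-wide INCREMENT BY cannot express.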
New Representation For Timestamps
Encode series timestamps/periods as t = mx + b + j, where
• x is an integer value (monotonically increasing)
• m represents the slope (i.e. the INCREMENT BY)
• b is an offset value (locally constant)
• j is a jitter value (typically takes only a few distinct values)
Offers good compression even with different slopes and offsets in the series
– Small differences (j) between the ideal line and the recorded timestamps are represented efficiently with n-bit compression
Enables support for alternate periods
– Useful when the period column needs to be offset by some constant
o e.g. for time zone differences, daylight saving time, differences in the starting day of the week, etc.
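A minimal sketch of the t = mx + b + j encoding for a single run, assuming integer timestamps, a known increment m per run, and b chosen as the first timestamp of the run (the choice of b, and the names `encode_run`/`decode_run`, are assumptions for illustration, not how SAP HANA picks its components):

```python
def encode_run(ts, m):
    """Encode one run of sorted integer timestamps as t = m*x + b + j."""
    b = ts[0]                                   # offset: locally constant within the run
    xs, js = [], []
    for t in ts:
        x = (t - b + m // 2) // m               # nearest integer position on the line
        xs.append(x)
        js.append(t - (m * x + b))              # jitter: small residual from the ideal line
    return b, xs, js


def decode_run(m, b, xs, js):
    """Reconstruct the original timestamps from the components."""
    return [m * x + b + j for x, j in zip(xs, js)]


# Increment m = 10 s, offset 5 s from the 10 s boundary, jitter of +/-1 s.
ts = [5, 15, 26, 35, 44]
b, xs, js = encode_run(ts, 10)
print(b, xs, js)                  # 5 [0, 1, 2, 3, 4] [0, 0, 1, 0, -1]
assert decode_run(10, b, xs, js) == ts
```

The x values form a dense integer sequence and the j values take only a few small distinct values, which is why both compress well while the original timestamps remain exactly recoverable.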
Grammar Updates to Support Equidistant Piecewise Series: Supported via CDS
Note: The new syntax is currently only supported via CDS, not via CREATE TABLE
– CREATE TABLE support may be provided in a future version
– Using this syntax in a SQL statement results in an error
series_definition := SERIES '(' series_spec_list ')'
series_spec_list := SERIES KEY '(' column_name_list ')'
    | NO MINVALUE | MINVALUE str_const
    | NO MAXVALUE | MAXVALUE str_const
    | PERIOD FOR SERIES '(' {column | NULL} [',' {column | NULL}] ')'
    | series_equidistant_definition
    | reorganize_process
    | ALTERNATE PERIOD FOR SERIES '(' column [',' column ...] ')'
series_equidistant_definition :=
    NOT EQUIDISTANT
    | EQUIDISTANT INCREMENT BY constant [MISSING ELEMENTS [NOT] ALLOWED]
    | EQUIDISTANT PIECEWISE
Grammar Updates to Support Equidistant Piecewise Series: Supported via CDS
CDS specification:
entity Weather {
    station_id   String(3)    not null;
    ts_utc       UTCTimestamp not null;  -- UTC time at start of period
    ts_local     UTCTimestamp not null;  -- local time at start of period
    temp         Decimal(3,1) not null;  -- mean temperature (℃)
    wind_speed   Decimal(2)   null;      -- wind speed (km/h)
    ts_utc_month UTCTimestamp not null   -- period rounded to months
        GENERATED ALWAYS AS SERIES_ROUND(ts_utc, 'INTERVAL 1 MONTH');
} SERIES (
    SERIES KEY(station_id)
    EQUIDISTANT PIECEWISE
    PERIOD FOR SERIES (ts_utc)
    ALTERNATE PERIOD FOR SERIES (ts_local)
)
On activation of the CDS document, this specification is turned into a physical and a logical representation of the series table.
Physical representation of the series table:
CREATE COLUMN TABLE Weather_ (
    station_id   varchar(3)   NOT NULL,
    ts_utc_      timestamp    NULL,                 -- original timestamp; NULL after reorganization
    ts_utc_x_    integer      default 0 NOT NULL,   -- x component
    ts_utc_m_    decimal      default 1 NOT NULL,   -- m component (slope)
    ts_utc_b_    timestamp    default TIMESTAMP '0001-01-01 00:00:00' NOT NULL,  -- b component (offset)
    ts_utc_j_    decimal      default 0 NOT NULL,   -- j component (jitter)
    ts_local_    timestamp    NULL,                 -- original local timestamp
    ts_local_d_  decimal      default 1 NOT NULL,   -- added (in seconds) to the UTC value in the view below
    temp         decimal(3,1) NOT NULL,
    wind_speed   decimal(2)   NULL,
    ts_utc_month TIMESTAMP    NOT NULL
        GENERATED ALWAYS AS SERIES_ROUND(
            COALESCE(ts_utc_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_)),
            'INTERVAL 1 MONTH'),
    flags_       int          default 0 NOT NULL
) SERIES (
    SERIES KEY(station_id)
    EQUIDISTANT INCREMENT BY 1
    PERIOD FOR SERIES (ts_utc_x_)
)
Logical representation of the series table:
CREATE VIEW Weather AS
SELECT station_id,
       COALESCE(ts_utc_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_)) AS ts_utc,
       COALESCE(ts_local_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_ + ts_local_d_)) AS ts_local,
       temp, wind_speed, ts_utc_month
FROM Weather_;
Representation for Equidistant Piecewise Series
• On first insert, ts_utc_ is stored unmodified
• After a reorganization step, the x, m, b, j components (ts_utc_x_, etc.) are calculated and ts_utc_ is set to NULL
• The view uses COALESCE to correctly read either the original timestamp value or, after the reorganization, the calculated timestamp value
• Reorganization is performed via the ALTER TABLE SERIES REORGANIZE command
  • It must be initiated by the user
• Generated rounded columns: use rounded period columns for good performance on range queries
Note: in SPS10, the j component is not yet realized; it is set to 0. This will be fixed in a subsequent release.
Equidistant Piecewise Series – Reorg Step: ALTER TABLE SERIES REORGANIZE for Compression
• On first INSERT, the period columns (including alternate period columns) are stored as is (i.e. in uncompressed form)
• ALTER TABLE SERIES REORGANIZE is required to store timestamps in their equidistant piecewise form (i.e. as x, m, b, j components), which provides compression
  • Reorders the rows by (series key, period) by deleting and re-inserting the existing rows (this gives good $rowid$ compression by ensuring that the rowid order matches the time order)
  • Calculates the equidistant piecewise components (m, x, b, j) so that they give good compression while preserving the correct timestamp value
  • Sets the period column to NULL (from then on, the timestamps are calculated from the components)
• ALTER TABLE SERIES REORGANIZE
  • Must be initiated by the user
  • Can be run against subsets of the data (e.g. partitions) and can be limited to processing a fixed number of rows per run
  • Finds the rows that are not optimally encoded and processes them
  • Should be run against sufficiently large sets of rows (thousands to hundreds of thousands) for good compression
  • Is resource intensive, so it is best run during quiet periods
• The M_SERIES_TABLE monitoring view returns various statistics on series tables, including the number of rows reorganized
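The reorganization and the COALESCE read path can be simulated in a few lines of Python. This is a hypothetical model, not the server implementation: rows are dicts with integer timestamps, each series is treated as a single run, and m is chosen as the GCD of the distances from the first timestamp so that j is always 0 (mirroring the SPS10 note that j is set to 0). The names `reorganize` and `read_ts` are made up for illustration.

```python
from functools import reduce
from math import gcd


def reorganize(rows):
    """Simulate a reorg: sort by (series key, period), replace raw timestamps by components.

    rows: list of dicts {"key": ..., "ts": int} in arbitrary insert order.
    """
    rows = sorted(rows, key=lambda r: (r["key"], r["ts"]))   # reorder by (series key, period)
    by_key = {}
    for r in rows:
        by_key.setdefault(r["key"], []).append(r["ts"])
    out = []
    for key, stamps in by_key.items():
        b = stamps[0]                                         # offset: first timestamp of the series
        m = reduce(gcd, (t - b for t in stamps), 0) or 1      # slope: GCD of distances, so j stays 0
        for t in stamps:
            out.append({"key": key, "ts": None,               # raw period column set to NULL
                        "x": (t - b) // m, "m": m, "b": b, "j": 0})
    return out


def read_ts(row):
    """Mimic the view: COALESCE(ts_, ADD_SECONDS(b, m*x + j))."""
    if row["ts"] is not None:
        return row["ts"]
    return row["b"] + row["m"] * row["x"] + row["j"]


inserted = [{"key": "B", "ts": 130}, {"key": "A", "ts": 25}, {"key": "A", "ts": 5},
            {"key": "B", "ts": 100}, {"key": "A", "ts": 15}]
reorganized = reorganize(inserted)
print([(r["key"], r["x"], r["m"]) for r in reorganized])
# [('A', 0, 10), ('A', 1, 10), ('A', 2, 10), ('B', 0, 30), ('B', 1, 30)]
print([read_ts(r) for r in reorganized])  # [5, 15, 25, 100, 130]
```

Rows that have not yet been reorganized still carry their raw timestamp, and `read_ts` returns it unchanged, which is the role COALESCE plays in the view.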
Generated Rounded Columns: Rounded Period Columns for Better Performance on Range Predicates and OLAP Queries
• Generated rounded columns can be used to store period or alternate period columns rounded to a coarser level (e.g. day, week, month)
• They compress very well
• They are optional
• Multiple such columns can be created (on different period columns, at different levels of coarseness)
• They are used automatically by the server to improve the performance of range predicates on the original column, as well as of OLAP queries (the server can limit the number of rows for which exact timestamps need to be calculated)
CREATE COLUMN TABLE Weather_ (
    station_id   varchar(3) NOT NULL,
    ts_utc_      timestamp  NULL,
    …
    ts_utc_month TIMESTAMP  NOT NULL
        GENERATED ALWAYS AS SERIES_ROUND(
            COALESCE(ts_utc_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_)),
            'INTERVAL 1 MONTH'),
    …
) SERIES (
    SERIES KEY(station_id)
    EQUIDISTANT INCREMENT BY 1
    PERIOD FOR SERIES (ts_utc_x_)
)
• Generated rounded columns store values rounded to a coarser interval
Summary: Benefits of Equidistant Piecewise Representation
Order-independent INSERT with no degradation in compression
Good compression for multiple INCREMENT BY scenarios
Good compression for scenarios with multiple offsets from zero in the timestamp
Good compression for scenarios where timestamps have jitter
Support for local time variations with good compression
Efficient range comparisons on timestamp columns
Efficient GROUP BY for timestamp columns
Query Enhancements
SERIES_ROUND Updates
SERIES_ROUND: New Rounding Modes & Non-Zero Alignment
• New rounding modes are especially useful for month and year intervals, because months and years have variable lengths
• The default rounding mode is ROUND_HALF_UP
• The <alignment_expression> allows a non-zero alignment to be specified for the interval data type
  • Allows MINVALUE to have a non-zero offset
  • E.g. allows summarizing weeks that begin on Mondays (as opposed to Saturdays, since 0001-01-01, the natural zero for the datetime data type, is a Saturday)
• Interval widths (INCREMENT BY) can be specified dynamically
SERIES_ROUND (<value>, {<increment_by> | SERIES TABLE <series_table>} [, <rounding_mode> [, <alignment_expression>]])
New Rounding Modes
Mode | Semantics
ROUND_HALF_UP | Default value. The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded up, away from zero.
ROUND_HALF_DOWN | The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded down, towards zero.
ROUND_HALF_EVEN | The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded to the series value with an even element number.
ROUND_UP | The value is always rounded away from zero, to the larger series value.
ROUND_DOWN | The value is always rounded towards zero, to the smaller series value.
ROUND_CEILING | The value is always rounded in the positive direction, to the larger series value.
ROUND_FLOOR | The value is always rounded in the negative direction, to the smaller series value.
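The rounding modes in the table can be modeled for plain numbers with a short Python sketch. It is a hypothetical re-implementation for illustration (the function `series_round` and its keyword arguments are Python conventions, not the SQL signature); it assumes numeric values only and interprets "away from zero" relative to the sign of the input value.

```python
import math
from fractions import Fraction

MODES = {"ROUND_HALF_UP", "ROUND_HALF_DOWN", "ROUND_HALF_EVEN",
         "ROUND_UP", "ROUND_DOWN", "ROUND_CEILING", "ROUND_FLOOR"}


def series_round(value, increment, mode="ROUND_HALF_UP", alignment=0):
    """Round value onto the grid alignment + k * increment (k an integer)."""
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode: {mode}")
    pos = Fraction(value - alignment) / increment   # position measured in increments (exact)
    lo_k, hi_k = math.floor(pos), math.ceil(pos)
    lower = alignment + lo_k * increment
    upper = alignment + hi_k * increment
    if lo_k == hi_k:                                # already a series value
        return lower
    # "Away from zero" is taken relative to the sign of the input value (assumption).
    toward_zero, away_from_zero = (lower, upper) if value >= 0 else (upper, lower)
    if mode == "ROUND_CEILING":
        return upper
    if mode == "ROUND_FLOOR":
        return lower
    if mode == "ROUND_UP":
        return away_from_zero
    if mode == "ROUND_DOWN":
        return toward_zero
    frac = pos - lo_k
    if frac != Fraction(1, 2):                      # not a tie: the nearest series value wins
        return upper if frac > Fraction(1, 2) else lower
    if mode == "ROUND_HALF_UP":
        return away_from_zero
    if mode == "ROUND_HALF_DOWN":
        return toward_zero
    return lower if lo_k % 2 == 0 else upper        # ROUND_HALF_EVEN: even element number


# Grid 3, 13, 23, ... (increment 10, alignment 3)
print([series_round(v, 10, alignment=3) for v in (8, 5, 12, 19)])  # [13, 3, 13, 23]
```

The modes only differ when the value is not already on the grid, and the ROUND_HALF_* modes only differ from each other on exact ties; using `Fraction` keeps the tie test exact.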
SERIES_ROUND: Examples of Rounding with Month and Year Intervals
Period length | Expression | Result with default ROUND_HALF_UP
28 days | SERIES_ROUND('2014-02-14 23:59:59', 'INTERVAL 1 MONTH') | 2014-02-01 00:00:00
28 days | SERIES_ROUND('2014-02-15 00:00:00', 'INTERVAL 1 MONTH') | 2014-03-01 00:00:00
29 days | SERIES_ROUND('2012-02-15 11:59:59', 'INTERVAL 1 MONTH') | 2012-02-01 00:00:00
29 days | SERIES_ROUND('2012-02-15 12:00:00', 'INTERVAL 1 MONTH') | 2012-03-01 00:00:00
30 days | SERIES_ROUND('2014-04-15 23:59:59', 'INTERVAL 1 MONTH') | 2014-04-01 00:00:00
30 days | SERIES_ROUND('2014-04-16 00:00:00', 'INTERVAL 1 MONTH') | 2014-05-01 00:00:00
31 days | SERIES_ROUND('2014-01-16 11:59:59', 'INTERVAL 1 MONTH') | 2014-01-01 00:00:00
31 days | SERIES_ROUND('2014-01-16 12:00:00', 'INTERVAL 1 MONTH') | 2014-02-01 00:00:00
59 days (31+28) | SERIES_ROUND('2014-01-30 11:59:59', 'INTERVAL 2 MONTH') | 2014-01-01 00:00:00
59 days (31+28) | SERIES_ROUND('2014-01-30 12:00:00', 'INTERVAL 2 MONTH') | 2014-03-01 00:00:00
92 days (31+31+30) | SERIES_ROUND('2014-08-15 23:59:59', 'INTERVAL 3 MONTH') | 2014-07-01 00:00:00
92 days (31+31+30) | SERIES_ROUND('2014-08-16 00:00:00', 'INTERVAL 3 MONTH') | 2014-10-01 00:00:00
365 days | SERIES_ROUND('2014-07-02 11:59:59', 'INTERVAL 1 YEAR') | 2014-01-01 00:00:00
365 days | SERIES_ROUND('2014-07-02 12:00:00', 'INTERVAL 1 YEAR') | 2015-01-01 00:00:00
366 days | SERIES_ROUND('2012-07-01 23:59:59', 'INTERVAL 1 YEAR') | 2012-01-01 00:00:00
366 days | SERIES_ROUND('2012-07-02 00:00:00', 'INTERVAL 1 YEAR') | 2013-01-01 00:00:00
730 days (365+365) | SERIES_ROUND('2014-12-31 23:59:59', 'INTERVAL 2 YEAR') | 2014-01-01 00:00:00
730 days (365+365) | SERIES_ROUND('2015-01-01 00:00:00', 'INTERVAL 2 YEAR') | 2016-01-01 00:00:00
731 days (366+365) | SERIES_ROUND('2012-12-31 11:59:59', 'INTERVAL 2 YEAR') | 2012-01-01 00:00:00
731 days (366+365) | SERIES_ROUND('2012-12-31 12:00:00', 'INTERVAL 2 YEAR') | 2014-01-01 00:00:00
Note that the rounding result depends on the number of days in the period.
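Why the halfway point moves with the period length can be reproduced with a small Python sketch. It is an illustration, not the SAP HANA code: the name `series_round_months` is hypothetical, and it assumes the month grid is aligned so that the month index 12*year + (month - 1) is a multiple of the interval (an assumption that reproduces every row of the table above).

```python
from datetime import datetime


def _month_start(index):
    """Convert a month index (12*year + month - 1) back to a datetime at 00:00."""
    year, month0 = divmod(index, 12)
    return datetime(year, month0 + 1, 1)


def series_round_months(value, n_months):
    """ROUND_HALF_UP rounding of a datetime onto a grid of n-month periods."""
    index = 12 * value.year + (value.month - 1)
    start_index = index - index % n_months          # grid point at or before value
    start = _month_start(start_index)
    end = _month_start(start_index + n_months)      # next grid point
    # The midpoint depends on the actual number of days in [start, end).
    if 2 * (value - start) >= end - start:
        return end
    return start


print(series_round_months(datetime(2012, 2, 15, 12), 1))   # 2012-03-01 00:00:00
print(series_round_months(datetime(2014, 2, 15), 1))       # 2014-03-01 00:00:00
```

A year interval is simply n_months = 12 (or 24 for two years) in this model.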
SERIES_ROUND: Examples of Rounding with Specified Alignment Values
Expression | Result | Explanation
SERIES_ROUND(8, 10, 3) | 13 | 8 is the midpoint between 3 and 13, and the default rounding mode ROUND_HALF_UP rounds away from 0.
SERIES_ROUND(5, 10, 3) | 3 | 5 is closer to 3 than to 13.
SERIES_ROUND(12, 10, 3) | 13 | 12 is closer to 13 than to 3.
SERIES_ROUND(19, 10, 3) | 23 | 19 is closer to 23 than to 13.
SERIES_ROUND('2015-02-27', 'INTERVAL 7 DAY', '2015-01-05 09:00:00', ROUND_UP) | '2015-03-02 09:00:00' | 2015-01-05 is a Monday, so the series values are Mondays at 09:00; ROUND_UP rounds Friday 2015-02-27 up to the next series value, Monday 2015-03-02 09:00, rather than down to Monday 2015-02-23 09:00.
SERIES_ROUND('2015-03-01', 'INTERVAL 2 MONTH', '2014-02-01') | '2015-02-01' | '2015-03-01' lies closer to '2015-02-01' than to '2015-04-01'.
SERIES_ROUND: Rounding to an Evaluated Interval Width
Some use cases require a dynamic granularity for the interval width
E.g. to split data into n buckets per year (where n is a variable):
SELECT bucket, max(value)
FROM (
    SELECT SERIES_ROUND(ts, 'interval ' || 3600*24*365/n || ' second') AS bucket, value
    FROM T
) D
GROUP BY bucket
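The effect of the evaluated interval width can be sketched in Python. The function name `max_per_bucket` is hypothetical; it assumes non-negative integer timestamps in seconds, integer division for the width, and the default ROUND_HALF_UP (so each value goes to the nearest bucket boundary, not the one below it).

```python
SECONDS_PER_YEAR = 3600 * 24 * 365


def max_per_bucket(rows, n):
    """rows: iterable of (ts_seconds, value); returns {bucket: max(value)} for n buckets per year."""
    width = SECONDS_PER_YEAR // n            # interval width evaluated at query time
    result = {}
    for ts, value in rows:
        # ROUND_HALF_UP to the nearest multiple of width (non-negative ts assumed)
        bucket = (ts + width // 2) // width * width
        result[bucket] = max(value, result.get(bucket, value))
    return result


print(max_per_bucket([(0, 5), (100, 7), (7884000, 3), (15767999, 9)], 4))
# {0: 7, 7884000: 3, 15768000: 9}
```

Changing n changes the width, and therefore the grouping, without rewriting the query.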
Analytic Enhancements
New Analytic Functions
Analytic Functions: Summary of New Functions
• AUTO_CORR(col, maxlag {SERIES(…) | ORDER BY c1, …})
  Aggregate function that computes all autocorrelation coefficients for a given input column.
• DFT(col, N {SERIES(…) | ORDER BY c1, …}).{REAL | IMAGINARY | AMPLITUDE | PHASE}
  Aggregate function that computes the Discrete Fourier Transform of a column for the first N values and returns an array with exactly N elements.
• FIRST_VALUE(col ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]
  Aggregate function that returns the first value (with the given ordering).
• LAST_VALUE(col ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]
  Aggregate function that returns the last value (with the given ordering).
• NTH_VALUE(col, n ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]
  Aggregate function that returns the n-th value (with the given ordering).
• CUBIC_SPLINE_APPROX(col, type, mode, par1, par2) OVER (PARTITION BY <…> ORDER BY <…>)
  Window function that replaces NULL values with a cubic spline approximation.
• CROSS_CORR(col1, col2, N ORDER BY …)
  Computes the cross-correlation between two value columns for a given number of lags.
• BINNING(col, name => val) OVER (…)
  Window function that assigns the input to bins using different algorithms.
• RANDOM_PARTITION(n1, n2, n3, seed) OVER (…)
  Window function that randomly assigns the input to different sets (training/validation/test).
• WEIGHTED_AVG(col, weight_array) OVER (…)
  Window function that computes a weighted moving average with the provided weight values.
• SERIES_FILTER(col, filter) OVER (…)
  Window function that applies filtering or smoothing, e.g. exponential smoothing or an autoregressive filter.
• SERIES_FORECAST(model).{FITTED | LOW95 | HIGH95 | LOW80 | HIGH80} OVER (…)
  Forecast based on a model built using PAL.
Analytic Functions: First, Last, Nth Value Aggregate Functions
Changing the time granularity from days to months: SAP Stock Price
SELECT min("date") AS "date",
       first_value("open" ORDER BY "date") AS "open",
       last_value("close" ORDER BY "date") AS "close",
       max("high") AS "high",
       min("low") AS "low",
       sum("volume") AS "volume"
FROM "I058576"."sap_stock_price"
GROUP BY SERIES_ROUND("date", 'INTERVAL 1 MONTH', ROUND_DOWN)
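What the query computes per month can be mirrored in plain Python. This is a hypothetical sketch (`monthly_ohlc` and the sample prices are made up) that assumes daily rows of (day, open, high, low, close, volume) with unique dates; rounding down to the first day of the month plays the role of SERIES_ROUND(..., ROUND_DOWN).

```python
from datetime import date


def monthly_ohlc(rows):
    """rows: (day, open, high, low, close, volume) tuples; returns one summary per month."""
    months = {}
    for row in sorted(rows):                       # sort by date so first/last are well defined
        month = row[0].replace(day=1)              # ROUND_DOWN to the start of the month
        months.setdefault(month, []).append(row)
    result = []
    for days in months.values():
        result.append({
            "date": days[0][0],                    # min("date")
            "open": days[0][1],                    # first_value("open" order by "date")
            "close": days[-1][4],                  # last_value("close" order by "date")
            "high": max(d[2] for d in days),       # max("high")
            "low": min(d[3] for d in days),        # min("low")
            "volume": sum(d[5] for d in days),     # sum("volume")
        })
    return result


# Made-up daily rows for illustration
daily = [
    (date(2015, 1, 5), 93.0, 94.0, 90.5, 91.0, 120),
    (date(2015, 1, 2), 92.0, 93.5, 91.0, 93.0, 100),
    (date(2015, 2, 2), 91.5, 95.0, 91.0, 94.5, 90),
]
for month in monthly_ohlc(daily):
    print(month)
```

FIRST_VALUE and LAST_VALUE are what make this a single GROUP BY: without them, "open" and "close" would need their own window or self-join logic.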
select distinct GF_ISIN,
TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) AS bin_datetime,
FIRST_VALUE(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(
CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||
ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as open_price,
max(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(
CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||
ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as high_price,
min(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(
CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||
ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as low_price,
LAST_VALUE(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(
CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||
ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as close_price,
COUNT(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(
CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||
ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as num_trades,
sum(GF_LAST_VOL) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(
CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||
ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *
CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as bin_vol
from RAP_USER.GF_TICKS
where GF_TIME >= '08:59:59.999'
and GF_TIME <= '18:00:00.001'
and GF_DATE ='2012-01-13'
and GF_LAST_VOL > 0
and GF_ISIN = 'DE0007164600'
order by GF_ISIN, bin_datetime;
Query without series feature
Same query with series feature
SELECT min("date") AS "date",
       first_value("open" ORDER BY "date") AS "open",
       last_value("close" ORDER BY "date") AS "close",
       max("high") AS "high",
       min("low") AS "low",
       sum("volume") AS "volume"
FROM "I058576"."sap_stock_price"
GROUP BY SERIES_ROUND("date", 'INTERVAL 1 MONTH', ROUND_DOWN)
Analytic Functions: Cubic Spline Approximation
Replacement of NULL values by interpolating the gaps and extrapolating any leading or trailing NULL values.
Interpolation can be done by:
– Linear interpolation
– Cubic spline interpolation
SELECT "ts", "temperature",
       linear_approx("temperature") OVER (ORDER BY "ts"),
       cubic_spline_approx("temperature") OVER (ORDER BY "ts")
FROM "weather"
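The gap-filling idea is easiest to see for the linear variant (the LINEAR_APPROX column in the query); a cubic spline would replace the straight lines with piecewise cubic polynomials. This Python sketch is illustrative only: `linear_fill` is a hypothetical name, rows are assumed equidistant (positions 0, 1, 2, ...), and leading/trailing NULLs are extrapolated linearly from the two nearest known values, which may differ from the server's extrapolation options.

```python
def linear_fill(values):
    """Replace None by linear interpolation; extrapolate leading/trailing gaps linearly."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)                       # nothing to interpolate from
    if len(known) == 1:
        return [values[known[0]]] * len(values)   # a single value: fill with a constant
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        if i < known[0]:                          # leading gap: use the first two known points
            a, b = known[0], known[1]
        elif i > known[-1]:                       # trailing gap: use the last two known points
            a, b = known[-2], known[-1]
        else:                                     # interior gap: nearest known neighbours
            a = max(k for k in known if k < i)
            b = min(k for k in known if k > i)
        slope = (values[b] - values[a]) / (b - a)
        out[i] = values[a] + slope * (i - a)
    return out


print(linear_fill([None, 2, None, 4, None]))      # [1.0, 2, 3.0, 4, 5.0]
```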
Analytic Functions: Auto Correlation and Cross Correlation
Series data functions used to find periodic patterns in the data, such as seasonality.
Auto-correlation looks for periodicity between values of the same series as a function of the time lag between them.
Cross-correlation looks for periodicity between values of different series as a function of the time lag between them.
SELECT corr, ordinality AS lag
FROM unnest((
    SELECT auto_corr(temperature, 1000 ORDER BY ts)
    FROM weather
)) WITH ORDINALITY AS tt(corr)
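The coefficients returned by the query can be understood with the textbook sample autocorrelation estimator, sketched below in Python. `auto_corr` here is a hypothetical stand-in: the normalization SAP HANA uses may differ. As with WITH ORDINALITY above, the first element corresponds to lag 1.

```python
def auto_corr(values, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag (textbook ACF estimator)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


# A series with period 2: strongly negative at lag 1, positive at lag 2.
print([round(r, 4) for r in auto_corr([1, -1, 1, -1, 1, -1], 2)])  # [-0.8333, 0.6667]
```

A peak at lag k indicates a period of k rows, which is how seasonality shows up in the output.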
Analytic Functions: Weighted Moving Average
Data smoothing via a weighted moving average with linearly decreasing weights.
The window frame defines the smoothing window.
SELECT "ts", "temperature",
       weighted_avg("temperature") OVER (ORDER BY "ts" ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
FROM "weather"
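A Python sketch of a weighted moving average with linearly decreasing weights over a ROWS BETWEEN p PRECEDING AND CURRENT ROW frame. The weighting scheme is an assumption (the current row gets weight k, the oldest row in the frame weight 1, and the first rows use a shorter frame); `weighted_moving_avg` is a hypothetical name.

```python
def weighted_moving_avg(values, preceding):
    """Weighted moving average over ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW.

    Weights decrease linearly into the past: the current row gets weight k,
    the oldest row in the (possibly shorter) frame gets weight 1.
    """
    result = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]     # oldest ... current
        weights = range(1, len(window) + 1)                # 1 ... k
        total = sum(w * v for w, v in zip(weights, window))
        result.append(total / sum(weights))
    return result


print([round(v, 4) for v in weighted_moving_avg([1, 2, 3], 1)])  # [1.0, 1.6667, 2.6667]
```

With the frame from the query (7 PRECEDING), each smoothed value would combine up to 8 rows.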
Analytic Functions: Filtering of Series Data
Filter function supporting different filter methods:
– Exponential smoothing
– Autoregressive and moving average filters
Available in SPS10:
– Single exponential smoothing
– Double exponential smoothing
PAL functions integrated into series data.
Support for smoothing and forecasting.
-- single exponential smoothing with smoothing parameter alpha = 0.2
SELECT "ts", "temperature",
       series_filter(value => "temperature", method_name => 'SINGLESMOOTH', alpha => 0.2) OVER (ORDER BY "ts") AS SINGLESMOOTH
FROM "weather"
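Single exponential smoothing follows the recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1). The Python sketch below assumes the series is initialized with the first observation; PAL's exact initialization may differ, and `single_smooth` is a hypothetical name.

```python
def single_smooth(values, alpha):
    """Single exponential smoothing: s_0 = x_0, s_t = alpha*x_t + (1 - alpha)*s_(t-1)."""
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)                 # initialise with the first observation (assumption)
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print([round(s, 4) for s in single_smooth([10, 20, 30], 0.2)])  # [10, 12.0, 15.6]
```

A small alpha such as 0.2 weights history heavily, so the smoothed series lags behind the rising input.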
Analytic Functions: Binning
Binning assigns data values to bins.
Different binning methods:
– Number of equal-width bins
– Width of the bins
– Number of bins with an equal number of records
– Number of standard deviations left and right of the mean
PAL function integrated into series data
(Figure: histogram of the record count per bin, bins 1 to 8 on the x-axis, count 0 to 35 on the y-axis)
-- compute histogram
SELECT bin_number, count(bin_number) AS cnt
FROM (
    SELECT binning(value => "open", bin_count => 8) OVER (ORDER BY "date") AS bin_number
    FROM "I058576"."sap_stock_price"
)
GROUP BY bin_number
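The equal-width method (bin_count => 8 in the query) can be sketched in Python as follows. `equal_width_bins` is a hypothetical name, and the boundary handling (bins numbered from 1, the maximum value placed in the last bin) is an assumption that may differ from PAL.

```python
from collections import Counter


def equal_width_bins(values, bin_count):
    """Assign each value a bin number 1..bin_count using equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    bins = []
    for v in values:
        if width == 0:
            bins.append(1)                                          # all values identical
        else:
            bins.append(min(int((v - lo) / width) + 1, bin_count))  # max goes into the last bin
    return bins


bins = equal_width_bins(list(range(9)), 4)
print(bins)                            # [1, 1, 2, 2, 3, 3, 4, 4, 4]
print(sorted(Counter(bins).items()))   # [(1, 2), (2, 2), (3, 2), (4, 3)]
```

Counting the bin numbers, as the outer GROUP BY does, yields the histogram.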
Analytic Functions: Random Partition
Partitioning divides the input data into three sets, a training, a validation, and a test set, that are used in machine learning.
Support for:
– Random partitioning
– Stratified partitioning
PAL function integrated into series data
-- stratified partitioning with fractional partition sizes (70% training, 20% validation, 10% test)
SELECT *,
       random_partition(0.7, 0.2, 0.1, 42) OVER (PARTITION BY "weather_station") AS "PARTITION"
FROM "weather"
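Stratified partitioning can be sketched in Python: rows are shuffled within each stratum (as PARTITION BY "weather_station" does above) with a seeded generator and then cut by the fractional sizes. The function name, the partition ids 1/2/3, and the rounding of set sizes are assumptions for illustration, not PAL's behavior.

```python
import random
from collections import defaultdict


def random_partition(strata, train, valid, test, seed):
    """Stratified random split. strata holds one stratum key per row (e.g. a weather station).

    Returns one partition id per row: 1 = training, 2 = validation, 3 = test.
    """
    total = train + valid + test
    rng = random.Random(seed)                      # seeded, so the split is reproducible
    rows_by_stratum = defaultdict(list)
    for row, key in enumerate(strata):
        rows_by_stratum[key].append(row)
    result = [0] * len(strata)
    for key in sorted(rows_by_stratum):            # fixed stratum order keeps the RNG sequence stable
        rows = rows_by_stratum[key]
        rng.shuffle(rows)
        n_train = round(len(rows) * train / total)
        n_valid = round(len(rows) * valid / total)
        for pos, row in enumerate(rows):
            if pos < n_train:
                result[row] = 1
            elif pos < n_train + n_valid:
                result[row] = 2
            else:
                result[row] = 3
    return result


stations = ["A"] * 10 + ["B"] * 20                 # hypothetical stratum per row
parts = random_partition(stations, 0.7, 0.2, 0.1, seed=42)
print([parts[:10].count(k) for k in (1, 2, 3)])    # [7, 2, 1]
print([parts[10:].count(k) for k in (1, 2, 3)])    # [14, 4, 2]
```

Stratifying guarantees that each station contributes the same 70/20/10 proportions, which a single global shuffle would only achieve on average.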
Analytic Functions: Discrete Fourier Transform
Discrete Fourier transforms are used in spectral analysis of series data, e.g. in vibration analysis.
The computation uses the FFT algorithm and returns:
– Amplitude / phase
– Real part / imaginary part
SELECT ordinality AS "frequency", "amplitude"/4096 AS "amplitude"
FROM unnest((
    SELECT dft("amplitude", 4096 ORDER BY "ts").amplitude
    FROM "vibration"
)) WITH ORDINALITY AS tt("amplitude")
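The amplitude and phase outputs follow from the DFT definition X_k = sum_t x_t * exp(-2*pi*i*k*t/N). The Python sketch uses the naive O(N^2) sum rather than the FFT the server uses (same result, slower); `dft` is a hypothetical name, and padding with zeros when fewer than N values exist is an assumption. Dividing the amplitudes by N, as the query does with /4096, normalizes them.

```python
import cmath
import math


def dft(values, n):
    """Naive O(N^2) DFT of the first n values; returns (amplitude, phase) lists of length n."""
    x = list(values[:n]) + [0] * max(0, n - len(values))   # pad with zeros (assumption)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


# A cosine with one cycle per 4 samples shows up at frequencies 1 and 3 (= N - 1).
amp, phase = dft([1, 0, -1, 0], 4)
print([round(a, 6) for a in amp])      # [0.0, 2.0, 0.0, 2.0]
```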
Analytic Functions: Miscellaneous Updates
MEDIAN as a window function with arbitrary window frames
CORR_SPEARMAN for character columns
Aggregate functions in the series library
• Standard deviation (sample and population)
• Variance (sample and population)
• Covariance (sample and population)
Thank you
Contact information
Raj Rathee
SAP HANA Product Management