
Big Machine Data - Two Exemplary Applications in China

Jianmin Wang

Tsinghua University

Beijing, China

Agenda

• Background

• Two Exemplary Applications in China

Who are we?

• Institute for Data Science, Tsinghua University

• Founded in April 2014

• Missions & Status Quo

– Recruiting world-class researchers and engineers from industry and academia

– Long-term dedication to system research and industry practice

– Leading China’s big data strategy, especially for industrial big data

BIG data

Big Data Landscape

[Figure: three numbered sources of big data: people generated, computer generated, and machine generated]

Machine Generated Data

• Exists broadly

– Industrial business

– Agriculture

– Utility

– Military

– Smarter City

– Logistics

– Smart devices

– Science research

Data Rate

24x7, up to a million data points/s, from millions of devices

Data Type

Mostly time series, temporal sequences, spatial-temporal data, and array data

Data Usage

Real-time processing.

From monitoring to content, shape, signal based query and analysis

Industrial businesses have entered the era of “big data”, but the real challenge is how to extract value from data.

Machine generated data is the core of industrial big data

Big machine data goes beyond the 3Vs

Our research spans big data lifecycle

1 Storage

2 Preprocessing

3 Access & Exploration

4 Modeling & Analytics

Agenda

• Background

• Two Exemplary Applications in China

1 Industrial Sensor Data Management: Cassandra at China Sany Group

2 Climate Data Management: Cassandra at China Meteorological Administration

© 2015. All Rights Reserved.

China Sany Group


More than 200K active engineering machines in more than 150 countries

SANY Group is a global company in the construction machinery industry.

In 2011, SANY became the only Chinese company in the construction machinery industry listed among the world’s top 500 companies.

Pipeline of Industrial Sensor Data Processing


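[Figure: sensor data flows from the machine to its users over the Internet, in four numbered stages labeled execute, collect, decision, and transfer. On the machine, the product main control cabinet holds the Sany motion controller (SYMC), the Sany industrial display (SYLD), and the Sany mobile terminal (SYMT). Vehicle operating-condition data is sent as SCP-protocol packets through a wireless base station, then passes from the wireless to the wired network at a designated IP and port. Recipients shown: rapid-response engineers, documentation engineers, service staff, owners, and user computers.]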

The data records the operational statuses of the machines

• 5000 kinds of sensors

• 50 billion records per year

• 2008: started managing sensor data

• 2010: 60K machines

• 2012: 80K machines; can only keep 6 months of data online

• 2014: >100K machines; all data online

• 2020: >500K machines; >10K users

Technology Roadmap in Sany Group


SQL Server ➡ Oracle ➡ Cassandra

Why Cassandra?

• Cost performance

• Scalability

• P2P architecture

Operation   Worst case   Average case
Write       30%          2x
Query       22.6%        10x

Software Stack of Sensor Data Management


[Figure: pipeline stages Collect, Store, Analyze. Components shown: Storm, Cassandra, Map/Reduce.]

Structured storage: one row per device (primary key), one column family per sensor (operating condition), one column per received time, and the monitored value as the column value.

row key   sensor1 (cf1)                       sensor2 (cf2)
          received time1   received time2     received time1   received time2
device1   value            value              value            value
device2   value            value              value            value

Cassandra storage: for each sensor, rows are keyed by machine and columns by gather time.

Schema Design – Row and Column

• Use each sensor as a Column Family (CF)

• In each Column Family (CF)

– Use the machine as the row key

– Use the gather time as the column name

– Use the sensor value as the column value

– Columns of each row are sorted in advance

– The number of columns can grow readily
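The layout above can be pictured with a small in-memory model. This is only a sketch in plain Python, not Cassandra code: the class name `SensorColumnFamily`, the sensor name, and the sample values are made up for illustration. It shows why keeping the columns of a row sorted by gather time makes a time-range read a simple slice.

```python
import bisect
from collections import defaultdict


class SensorColumnFamily:
    """Toy model of one column family: row key = machine, column name = gather time."""

    def __init__(self, sensor_name):
        self.sensor_name = sensor_name
        # machine id -> parallel lists, both ordered by gather time
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def insert(self, machine_id, gather_time, value):
        times = self._times[machine_id]
        values = self._values[machine_id]
        i = bisect.bisect_left(times, gather_time)
        if i < len(times) and times[i] == gather_time:
            values[i] = value  # same column name: overwrite the value
        else:
            times.insert(i, gather_time)  # keep columns sorted on write
            values.insert(i, value)

    def slice(self, machine_id, start, end):
        """Return (gather_time, value) pairs with start <= gather_time < end."""
        times = self._times.get(machine_id, [])
        values = self._values.get(machine_id, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))


# Hypothetical sensor and readings, inserted out of order.
oil_temp = SensorColumnFamily("hydraulic_oil_temperature")
oil_temp.insert("machine-1", 30, 71.5)
oil_temp.insert("machine-1", 10, 70.0)
oil_temp.insert("machine-1", 20, 70.8)
print(oil_temp.slice("machine-1", 10, 30))  # [(10, 70.0), (20, 70.8)]
```

Because each sensor has its own structure, a query for one sensor never touches the data of the others.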

[Figure: one column family per sensor (sensor1, sensor2, …, sensorN). In each, rows are keyed by machine and columns by gather time. About 5000 sensors give 5000+ column families.]

Cassandra v1.2, CQL2 (not CQL3)


Why 5000+ Column Families?

• Cassandra v1.* does not support compound primary keys or clustering keys

• This makes programming more complex

• The row key or column name has to be split manually

• All the data in one SSTable belongs to a specific CF

• When querying a specified sensor, we need not scan unnecessary data

• When querying a specified sensor, we need not scan unnecessary data

Row Key                  Column Name
machine_id               sensor_id : gather_time
sensor_id : machine_id   gather_time
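To illustrate what "manually splitting" a composite key involves, here is a minimal sketch for the first layout (row key = machine_id, column name = sensor_id : gather_time). The helper names, the separator, and the zero-padding width are assumptions for illustration, not the encoding used at Sany.

```python
from typing import Tuple

SEP = ":"  # assumed separator; sensor ids must not end with digits after a SEP


def make_column_name(sensor_id: str, gather_time: int) -> str:
    # Zero-pad the time so that string order matches chronological order.
    return f"{sensor_id}{SEP}{gather_time:013d}"


def split_column_name(column_name: str) -> Tuple[str, int]:
    sensor_id, _, ts = column_name.rpartition(SEP)
    return sensor_id, int(ts)


# One row (one machine) mixes the columns of every sensor.
row = sorted(make_column_name(s, t) for s in ("s1", "s2") for t in (3, 1, 2))

# Reading one sensor means decoding and filtering the application-side key.
only_s2 = [split_column_name(c) for c in row if split_column_name(c)[0] == "s2"]
print(only_s2)  # [('s2', 1), ('s2', 2), ('s2', 3)]
```

With one column family per sensor, this encoding and filtering step disappears, at the price of a very large number of column families.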


Challenge 1 – Schema Creation Hangs

• Problem

– Create 5000+ CFs in batch

– Creation cost increases dramatically


[Figure: time cost (ms, axis from 0 to 15,000) of each CF creation against CF serial number (1 to 541). Two points are annotated: creating 1 CF takes 0.1 s, and creating 1 CF takes 10 s.]

• Root Cause

– Protocol conflict

• Between the Gossip protocol and the request propagation mechanism

– Message overhead

• May transfer the whole schema instead of only the changed part
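A back-of-the-envelope model shows why sending the whole schema hurts when thousands of CFs are created in a row. This is a sketch under stated assumptions: the per-CF metadata size of 1024 bytes is hypothetical, and only the traffic to a single peer is counted.

```python
def total_schema_traffic(num_cfs, bytes_per_cf, send_whole_schema):
    """Bytes sent to one peer while creating num_cfs column families one by one."""
    total = 0
    for k in range(1, num_cfs + 1):
        # After the k-th creation the schema holds k column families.
        cfs_sent = k if send_whole_schema else 1
        total += cfs_sent * bytes_per_cf
    return total


BYTES_PER_CF = 1024  # hypothetical size of one CF definition

full = total_schema_traffic(5000, BYTES_PER_CF, send_whole_schema=True)
delta = total_schema_traffic(5000, BYTES_PER_CF, send_whole_schema=False)
print(full, delta, full / delta)  # 12802560000 5120000 2500.5
```

Sending the whole schema grows quadratically with the number of CFs, while sending only the change grows linearly, which is consistent with the per-CF creation cost rising as more CFs exist.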

Memory used by Gossip

Node   ReceiveSchemaMessage memory cost   SendSchemaMessage memory cost   Total
N1     4.465G                             4.236G                          8.70G
N2     4.308G                             4.907G                          9.21G
N3     4.236G                             4.024G                          8.26G
N4     4.808G                             4.387G                          9.19G
N5     6.111G                             6.373G                          12.48G

Challenge 1 – Schema Creation Hangs

• Solution

– Gossip takes effect only when:

• Propagation messages are lost or time out

• Nodes recover from a failure

– Creation time cost stays constant

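Adaptive Lazy Gossip

[Figure: schema metadata changes are propagated directly between nodes (steps 1 to 4). In the gossip state (LOAD, STATUS, SCHEMA, VERSION, ...), the SCHEMA entry is gossiped with a delay of t seconds (the delay strategy).]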

Challenge 2 – Balancing Consistency & Throughput

• Production environment

– Sany production: 5-node cluster, 2x4 cores, 64GB

• Problem

– Throughput = 200K data points/sec

– 75% of the data is written successfully to only one replica, while the other replicas are stale (inconsistent)

• Cassandra is NOT very consistent

• Big obstacle for query operations

– Repair is required, but it is very slow


[Figure: experiment on Amazon EC2 (2 cores, 8GB, 5 nodes). rywc: read-your-writes consistency]

Challenge 2 – Balancing Consistency & Throughput

• Root cause of slow repair

– Too many column families (5000+)

– Too many ranges in the consistent hashing ring

• 256 virtual nodes (VNs) per physical node

• Too many Merkle trees (ranges x CFs)

• Experience and suggestions

– Repair CFs and ranges one by one

• Do not repair the whole keyspace (all CFs) at once

– Repair the important CFs first

– Perform repair under light workload


Example: 5 physical nodes, each with 2 VNs, give 10 ranges in total.

For each range and each CF, a Merkle tree is created and compared between two nodes.
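The "ranges x CFs" count is easy to work out. The sketch below applies that formula to the small example (5 nodes with 2 VNs each) and to the production-like figures quoted in the slides (5 nodes, 256 VNs per node, 5000 CFs). It counts (range, CF) pairs only; how many trees each replica actually builds is not modeled here.

```python
def merkle_trees_per_full_repair(physical_nodes, vnodes_per_node, column_families):
    """Number of (range, CF) pairs, i.e. ranges x CFs as stated on the slide."""
    ranges = physical_nodes * vnodes_per_node
    return ranges * column_families


small = merkle_trees_per_full_repair(5, 2, 1)
production = merkle_trees_per_full_repair(5, 256, 5000)
print(small, production)  # 10 6400000
```

Millions of trees for a single keyspace-wide repair is why repairing one CF and one range at a time, starting with the important CFs, is the more practical route.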

Challenge 3 – Heterogeneous Nodes

• Problem

– How to assign the data partitions in a heterogeneous cluster?

• Experimental study

– Deploy a heterogeneous cluster

• 2 powerful servers and 8 PCs

– Throughput performance

• Heterogeneous cluster vs. a cluster with only the 2 powerful servers

Assign the positions of the nodes (i.e. tokens) in the ring according to their computing capacities


• Root Cause

– The replica mechanism complicates the imbalance problem

• Each node’s configuration may impact other nodes’ performance

– The Virtual Node (VN) mechanism cannot fit all scenarios

• Too many VNs make the lookup table too big and slow down repair

• The maximum number of VNs in a physical node is 1536 (restricted by the Cassandra source code)


The capacity of N1 is the worst, and E is short. But N1 is responsible for many data records.

… to the cluster:

• N5 finishes the operation quickly

• But N5 has to wait for N1, which is slow


• Solution

– Initialize the cluster properly

• Use quadratic programming (QP) to find the best positions of the (virtual) servers

• Successfully deployed at China Sany Group

– Scaling out the cluster

• Use a dynamic algorithm to find the best position for the newly added server
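As a much simpler stand-in for the QP formulation (which is not reproduced here), the idea of placing tokens according to capacity can be sketched by giving each node a range whose length is proportional to its capacity. The capacity weights and the ring size below are hypothetical, and replication, which the slides note complicates the real problem, is ignored.

```python
def assign_tokens(capacities, ring_size):
    """Return each node's token, taken here as the inclusive end of its range.

    Range lengths are proportional to capacity. Replication is ignored.
    """
    total = sum(capacities.values())
    tokens = {}
    end = 0
    for node, capacity in capacities.items():
        end += ring_size * capacity // total  # integer split; exact in this example
        tokens[node] = end
    return tokens


# Hypothetical weights: each powerful server counts as 4 PCs.
capacities = {"server-1": 4, "server-2": 4}
capacities.update({f"pc-{i}": 1 for i in range(1, 9)})

tokens = assign_tokens(capacities, ring_size=1600)
print(tokens["server-1"], tokens["server-2"], tokens["pc-1"], tokens["pc-8"])
# 400 800 900 1600
```

Here each server owns a range four times as long as a PC's. The optimization described above additionally chooses the order of the nodes in the ring, which this sketch leaves fixed.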


Scaling out: Xiangdong Huang, Jianmin Wang et al. Optimizing Data Partition for Scaling out NoSQL Cluster. Concurrency and Computation: Practice and Experience (Early View)

Scaling out: find the best position

Optimize:

1. the order of the nodes in the ring

2. the range length of the ring

Datasets & Results in China Sany Group

• 5000+ column families for sensor data

• 100K+ engineering machines

• Historical data loaded

– From April 2012 to now

• Data size

– Tens of billions of operational status records

– Several billion GPS records

• Write throughput

– 5 nodes (2x4 CPU cores, 64GB memory, 9TB disk)

– 20K TPS as regular workload, 200K at peak

Industrial Big Data Platform: More Requirements Beyond Sany Applications

• Even higher throughput

– High-frequency sensors

– High-volume sensors

– 10+ M data points/second

• Native time-series query

– Time- and value-based queries

– Richer set of analytical queries

– <1 second response

• Synchronization

– Edge synchronization

– Compression, out-of-order data, retransmission

• Adaptive deep compression

– Different data, different algorithms

– Transparent to queries

– Deep compression of historical data

• Moving object support

– Spatial-temporal index

– Trajectory-based queries

Industrial Data Analysis Pipeline


[Figure: pipeline from specified operational status data (Boolean values, status values, analogue values) through basic indicators, baselines, and variances to common features, specific features, and outliers. Counts shown in the figure: 1.046 billion, 8030, 1.046 billion.]

Indicators per data type (each with a baseline and a variance):

Data type   Indicators
General     count, frequency, ...
Analogue    average, variance, extremum, …
Boolean     times, duration, …
States      change times, duration, …

Analyses built on the features: driver profile; hydraulic oil temperature analysis; temporal parameter analysis for vehicle start; parameter correlation; spatial analysis for failure; key components anomaly detection.

Consumers: Service, Quality Control, R&D
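The indicator table can be made concrete with a short sketch. The exact definitions used in the pipeline are not given in the slides, so everything here is an assumption: "extremum" is taken as the sample of largest magnitude, "times" as the number of off-to-on transitions, "duration" as the total on-time, and the baseline and variance as the mean and population variance of an indicator across a fleet. The function names and sample values are illustrative only.

```python
from statistics import mean, pvariance


def analogue_indicators(samples):
    """Basic indicators of one analogue sensor series."""
    return {
        "average": mean(samples),
        "variance": pvariance(samples),
        "extremum": max(samples, key=abs),  # assumed: largest magnitude
    }


def boolean_indicators(samples, period_s):
    """Basic indicators of one Boolean series sampled every period_s seconds."""
    # Count off-to-on transitions, treating the state before the series as off.
    times = sum(1 for prev, cur in zip([False] + samples, samples) if cur and not prev)
    duration = sum(samples) * period_s  # total seconds spent in the on state
    return {"times": times, "duration": duration}


def against_baseline(fleet_values, machine_value):
    """Compare one machine's indicator with the fleet-wide baseline."""
    baseline = mean(fleet_values)
    return {
        "baseline": baseline,
        "variance": pvariance(fleet_values),
        "deviation": machine_value - baseline,  # added here for illustration
    }


analogue = analogue_indicators([70.0, 72.0, 74.0])
boolean = boolean_indicators([False, True, True, False, True], period_s=10)
compared = against_baseline([70.0, 72.0, 74.0], machine_value=80.0)
print(analogue["average"], analogue["extremum"])   # 72.0 74.0
print(boolean)                                      # {'times': 2, 'duration': 30}
print(compared["baseline"], compared["deviation"])  # 72.0 8.0
```

A machine whose indicator sits far from the fleet baseline relative to the fleet variance would be a candidate outlier for the analyses listed above.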

Industry Practice – Value-Added Analytics

Big Data Application 1: Concrete Pump Truck Tip-over Detection

Concrete pump truck tip-overs are mainly caused by insufficient support from the leg cylinders, which is a major production safety issue.

[Figure label: horizontal inclination angle]



Quickly spot and prevent dangerous operation through group behavior analysis of concrete pump trucks.

[Figure: the overall distribution of the horizontal (X-axis) and vertical (Y-axis) inclination angles of concrete pumps, with unstable instances and idle instances marked, followed by an inclination angle vibration level filter]

[Figure: inclination angle distribution of an individual concrete pump]

• Idle instances: unplugging operation leads to malfunctioning

• Unstable instances: early degradation pattern of the cylinder

• Typical instances: stable oscillation

Data-driven detection of anomalies and potential accidents




Fault Diagnosis

Via time series pattern analysis and spatial correlation, the leakage problem of the master cylinder was found to be highly correlated with a high-speed rail construction project.

Investigation proved that the salt-spray environment and the water quality along the seaside caused the corrosion of the cylinder’s potted component.

[Figure labels: Hangzhou-Shenzhen high-speed rail; salt-spray corrosion environment]


Spare Components Demand Forecasting

• Traditional approach is based on marketers’ experience

• New approach

– Combines real-time data from machines, sales history, vehicle holdings, environment, GDP, etc.

• Result

– Inactive spare part inventory reduced by half

[Figure: number of spare parts demanded (units, axis from 0 to 250) per ten-day period (early, mid, late month) from 2012/10 to 2013/6. Series: actual demand; actual stock prepared by the company; results of multi-region collaborative spare components prediction based on matrix factorization.]

The predicted result fits the actual demand better

1 Industrial Data Management: Cassandra at China Sany Group

2 Climate Data Management: Cassandra at China Meteorological Administration


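[Figure: kinds of climate data: model (e.g. T639), ground, aerological, satellite, radar, lightning, typhoon. Model variables: wind field, temperature field, humidity, rainfall, snowfall, …... Each variable (e.g. temperature) is stored per level (800Pa, 850Pa, 900Pa, ……) and per time and ageing (8AM 3h, 8AM 6h, …, 8PM 3h, 8PM 6h).]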

Characteristics of Climate Data

Challenge in Meteorological Application: Pattern Data

• Hierarchical pattern data + flat other data

• A highly efficient data delivery system for end users

– Support millions of small files

– Access data fast

– Scan data in various orders

• Performance requirement

– Get ~1MB of data in 50ms

– 600 concurrent clients

[Figure: hierarchical directory tree of pattern data. Levels from the root (/): model (T639, ...), variable (wind, temperature, ...), level (800, 850, 900), time (2014.2.18.08, 2014.2.18.20, 2014.2.19.08), ageing (3, 6, 9, ...), with data files at the leaves.]

Why Cassandra?

• Scalability

• Fast data reads and writes

• Some columns are sorted

– Easy to scan data sequentially

• Time-based compaction (Cassandra v2.0 or later) for time series

key                      3h     6h     9h     …
T639/temperature/800Pa   file   file   file   …

1. Get the data where key = 'T639…/800Pa'

2. Retrieve the data before 6h, or retrieve the data after 6h

key                      3h     6h     9h     12h    …   3h     6h     9h
T639/temperature/800Pa   file   file   file   file   …   file   file   file

Solution – Schema Design for Pattern Data

• Data items

– 5-tuple: (pattern, variable, level, time, ageing)

– Pattern and variable are unordered

– Level, time, and ageing are ordered
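One possible reading of this design is sketched below in plain Python, not Cassandra code. It assumes the unordered parts plus the level are joined into the row key, as in the example key T639/temperature/800Pa, and that the ordered pair (time, ageing) forms the sorted column name. The class name, the integer time encoding (yyyymmddhh), and the file names are made up for illustration.

```python
import bisect


class PatternRow:
    """Toy model of one row: columns sorted by (time, ageing), values are files."""

    def __init__(self):
        self._names = []  # sorted list of (time, ageing) tuples
        self._files = []  # payloads, parallel to _names

    def put(self, time, ageing, payload):
        name = (time, ageing)
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._files[i] = payload
        else:
            self._names.insert(i, name)
            self._files.insert(i, payload)

    def scan(self, start, end):
        """Return (time, ageing, payload) with start <= (time, ageing) < end."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, end)
        return [(t, a, f) for (t, a), f in zip(self._names[lo:hi], self._files[lo:hi])]


store = {"T639/temperature/800Pa": PatternRow()}
row = store["T639/temperature/800Pa"]
run = 2014021808  # assumed encoding of 2014.2.18.08
for ageing in (9, 3, 6):
    row.put(run, ageing, f"file-{ageing}h")
row.put(2014021820, 3, "file-next-run-3h")

before_6h = row.scan((run, 0), (run, 6))
from_6h = row.scan((run, 6), (run + 1, 0))
print(before_6h)  # [(2014021808, 3, 'file-3h')]
print(from_6h)    # [(2014021808, 6, 'file-6h'), (2014021808, 9, 'file-9h')]
```

The two scans correspond to "retrieve the data before 6h" and "retrieve the data after 6h" (here including 6h itself) within one model run: a single key lookup followed by a contiguous slice.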


[Figure: for each (pattern, variable) pair, the data space has the axes level, time, and ageing. The figure maps these onto ColumnFamily, row key, and column.]


Performance Results

• 10 servers: 2x4 CPU cores, 64GB memory, 9TB SAS disk

• Store 7 kinds of model data

– 16TB per day

• Get data quickly

– 100 times faster than the older system


Thank you