A Method for Dynamic Packing of Data Blocks for Over-the-Network Indexing

PDF: http://bit.do/150928

Upload: kyushu-institute-of-technology

Post on 15-Apr-2017



The Stringex Viewpoint (over-the-network)

[Figure: diagram contrasting a Traditional Client with a Stringex Client across the Network. Labels: Data, Indexer, Index; Index Read, Write.]

[01] M.Zhanikeev, "A New Practical Design for Browsable Over-the-Network Indexing" ISEEE (2014)

M.Zhanikeev -- [email protected] -- A Method for Dynamic Packing of Data Blocks for Over-the-Network Indexing -- http://bit.do/150928

The Old Stringex


The Stringex Engine

[Figure: diagram of the Stringex Client and the Stringex Index. Labels: Sync Engine, Optimization, Local Cache; steps 1 (Check), 2 (Use).]

[01] M.Zhanikeev, "A New Practical Design for Browsable Over-the-Network Indexing" ISEEE (2014)


Stringex v1 : Design

[Figure: Stringex v1 design. Input JSON { name : value1, age : value2, … } is hashed per JSON key; a bit mask selects a hash table entry (000, 001, …) holding a list of Doc #s; a separate table maps each Doc # (a123d…, 53ffe3, …) to its JSON data. Local storage is kept in realtime sync with cloud storage (Cloud Drive API, App Space), which holds the blocks name.block1, name.block2, age.block1, … age.block2, docs.block1, … docs.block2.]

• load balancing, basically

• fixed-size blocks per meta key and docs

• the engine is in charge of minimizing traffic
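A minimal sketch of how hashing plus a bit mask can spread values over a fixed set of blocks per key. The slides do not give the mask width or the exact block numbering, so `MASK_BITS`, `bucket_for`, and `block_file` are hypothetical names and choices, and md5 is assumed here only because a later slide mentions md5 hashes.

```python
import hashlib

MASK_BITS = 3  # assumption: 3 bits give 8 buckets (000 .. 111)


def bucket_for(value, mask_bits=MASK_BITS):
    """Hash a value and keep only the lowest mask_bits bits of the digest."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) & ((1 << mask_bits) - 1)


def block_file(key, value, mask_bits=MASK_BITS):
    """Hypothetical block file name for a (key, value) pair, numbered from 1."""
    return f"{key}.block{bucket_for(value, mask_bits) + 1}"


if __name__ == "__main__":
    for person in ("Alice", "Bob", "Carol"):
        print(person, "->", block_file("name", person))
```

With a fixed mask width every key gets the same number of blocks, and the hash spreads values roughly evenly across them, which is one way to read "load balancing, basically".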


Stringex v1 : Design (2)

INPUT JSON { name : value1, age : value2, … }

Files

Index
  Key: name
    name.imap { 'bk': { 'ik': 'start,end', … next 'ik' }, … next bk }
    name.vmap { 'value': 'bk', … next value }
    name.bk1 name.bk2 …
  Key: age
    Same (.imap and .vmap)
  …

Data
  Docs
    docs.imap { 'bk': { 'docid': 'start,end', … next 'docid' }, … next bk }
    No .vmap
    docs.bk1 docs.bk2 …

• blocks aggregated by prefixes of md5 hashes

• some JSON structure in .imap files with positions for partial file reading
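The 'start,end' positions in the .imap files can be illustrated with a small in-memory sketch. `pack_block` and `read_entry` are hypothetical helpers, and treating `end` as exclusive is an assumption; the slides only state that positions are stored for partial file reading.

```python
def pack_block(entries):
    """Concatenate payloads into one block and record 'start,end' per entry.

    entries: dict mapping an index key (ik) to its payload bytes.
    Returns (block_bytes, positions), where positions mirrors one 'bk'
    entry of an .imap file: { ik: 'start,end', ... }.
    """
    block = bytearray()
    positions = {}
    for ik, payload in entries.items():
        start = len(block)
        block.extend(payload)
        positions[ik] = f"{start},{len(block)}"  # end is exclusive (assumption)
    return bytes(block), positions


def read_entry(block, positions, ik):
    """Return only the slice of the block that belongs to one index key."""
    start, end = (int(p) for p in positions[ik].split(","))
    return block[start:end]


if __name__ == "__main__":
    entries = {"0cc": b'["doc1","doc7"]', "92e": b'["doc2"]'}
    block, positions = pack_block(entries)
    imap = {"bk1": positions}
    print(imap)
    print(read_entry(block, imap["bk1"], "92e"))
```

Over a network the same positions would presumably map to a ranged read of the block file, so that only the needed bytes cross the wire.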


Stringex v1 vs Lucene

[Figure: Throughput (log of bytes/doc) versus Index Size (log) for Lucene and Stringex. Annotations: "Normal operation" and "Need to improve this part".]


The New Stringex


Stringex v2 : Visual Idea

• variable-size blocks

• possibly, variable depth as well -- hierarchical layering

• unfortunately, not in this version -- could not figure out how to make it work in practice

• ... but a distant goal, anyway


Stringex v2 : Variable Blocks

global config

  prefixmin   prefixmax   keyorder
  1           3           authors, title, pages      (example values)

[Figure: the Stringex Client updates cloud storage, with background (lazy) processing. Cloud storage holds metadata files meta.a.a.a, meta.a.af.a, … meta.z.z.z (three keys) and document files docs.a.a.a, … docs.z.z.z.]

• the biggest change : variable prefix length = variable block size

• all metadata is together now, but order is important -- encoded in filenames
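One possible reading of the filename scheme, as a sketch: each key in `keyorder` contributes a hash prefix whose length lies between `prefixmin` and `prefixmax`, and the prefixes are joined in key order. The use of md5 hex digests and the helper names `key_prefix` and `block_filename` are assumptions; the config values are the example values from the slide.

```python
import hashlib

PREFIXMIN, PREFIXMAX = 1, 3                 # example values from the slide
KEYORDER = ["authors", "title", "pages"]    # example key order from the slide


def key_prefix(value, length):
    """Hash prefix of a value, with length clamped to [PREFIXMIN, PREFIXMAX]."""
    length = max(PREFIXMIN, min(PREFIXMAX, length))
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()[:length]


def block_filename(doc, lengths, kind="meta"):
    """Join per-key prefixes in KEYORDER, e.g. 'meta.<p1>.<p2>.<p3>'.

    lengths: dict key -> current prefix length; missing keys use PREFIXMIN.
    """
    parts = [key_prefix(doc[k], lengths.get(k, PREFIXMIN)) for k in KEYORDER]
    return ".".join([kind] + parts)


if __name__ == "__main__":
    doc = {"authors": "M.Zhanikeev", "title": "Stringex", "pages": 22}
    print(block_filename(doc, {"title": 2}))          # one longer prefix
    print(block_filename(doc, {}, kind="docs"))       # shortest prefixes
```

A longer prefix on any key splits its values over more files, so each file is smaller; this is the sense in which prefix length controls block size in this sketch.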


Stringex v2 : Logic

The idea is...

...the same -- to minimize traffic between client and cloud

[Figure: flowchart of the Stringex Client. Input JSON { name : value1, age : value2, … } passes through caching and hashing into the Sync Engine, which talks to the Index. Labels: Fill gaps ('0' prefix); My own recent?; Timeout passed?; Get cache; Small block still there?; Get small block (try); if failed; Get large block.]

• since metadata order is in filename, zero prefix is important -- gap filling

• local cache can help to avoid syncing recent files

• longer prefix at cloud side means smaller files
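The three bullets above can be sketched as two small functions. This is a guess at the flowchart's intent, not the author's implementation: `block_name` fills a missing key with the '0' prefix so that key order in the filename is preserved, and `choose_source` prefers the local cache for the client's own recent writes, then the small (longer-prefix) block, then the large block as a fallback. All names and the timeout parameter are hypothetical.

```python
KEYORDER = ["authors", "title", "pages"]  # example key order from the slides


def block_name(prefixes, keyorder=KEYORDER, kind="meta"):
    """Build a block name, filling gaps with '0' so positions stay aligned."""
    parts = [prefixes.get(key) or "0" for key in keyorder]
    return ".".join([kind] + parts)


def choose_source(is_own_recent, age_seconds, timeout_seconds, small_block_exists):
    """Decide where to read from: 'cache', 'small_block', or 'large_block'."""
    if is_own_recent and age_seconds < timeout_seconds:
        return "cache"          # own recent write, timeout not yet passed
    if small_block_exists:
        return "small_block"    # longer prefix, smaller file, less traffic
    return "large_block"        # fallback when the small block is gone


if __name__ == "__main__":
    print(block_name({"authors": "a", "pages": "c"}))
    print(choose_source(True, 5, 60, True))
    print(choose_source(False, 5, 60, False))
```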


Analysis


Analysis : Components

• real life tests using the new client

• hotspot distribution defines access frequency

• parameters: file count, cache ratio, hotspot class


Analysis : Hotspot Trace

Hotspot distribution...

...consists of normal, popular, and hot/flash sets

[Figure: hotspot distributions for Class A, Class B, Class C, Class D, and Class E. X axis: items in decreasing order (0 to 100); Y axis: log(value).]

• common in CDN today

• top 5% of content is hot/flash

• top 20% of content is popular

• the rest are normal
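The split into sets can be written as a rank-based rule. In this sketch, items are assumed to be sorted in decreasing order of popularity, and "popular" is taken to mean the items between the top 5% and the top 20%; the slide does not say whether the popular set includes the hot/flash set. `classify` is a hypothetical name.

```python
from collections import Counter


def classify(rank, total):
    """Label an item by its 0-based rank among total items (most popular first)."""
    if rank * 100 < 5 * total:
        return "hot/flash"   # top 5%
    if rank * 100 < 20 * total:
        return "popular"     # next, up to the top 20%
    return "normal"          # the rest


if __name__ == "__main__":
    counts = Counter(classify(rank, 100) for rank in range(100))
    print(dict(counts))
```

Integer arithmetic is used for the thresholds to avoid floating-point edge cases at the 5% and 20% boundaries.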


Analysis : Raw Trace

• logs from an actual run

• regularly take snapshots of filesystem state, keep track of access count in the background

• can be used to visualize the details of operation


Analysis : Visualization

time#725 class#E files#100 topn#0.1 dynamics(depth 4 >> 3@5 2@25 1@100)

• visualized snapshot

• 100 files, 10% in cache (cloud side), 4-stage dynamics
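The snapshot labels follow a regular pattern, so they can be parsed into fields for comparison across runs. This is a sketch only: the field meanings are inferred from the slide bullets (files is the file count, topn the cached fraction), and the meaning of each "N@M" pair in the dynamics is not stated in the slides, so the pairs are kept as raw integer tuples. `parse_label` is a hypothetical helper.

```python
import re

LABEL = re.compile(
    r"time#(\d+)\s+class#(\w+)\s+files#(\d+)\s+topn#([\d.]+)\s+"
    r"dynamics\(depth\s+(\d+)\s+>>\s+([^)]*)\)"
)


def parse_label(text):
    """Parse a snapshot label into a dict; raise ValueError if it does not match."""
    match = LABEL.search(text)
    if match is None:
        raise ValueError(f"unrecognized snapshot label: {text!r}")
    stages = [tuple(int(n) for n in pair.split("@")) for pair in match.group(6).split()]
    return {
        "time": int(match.group(1)),
        "class": match.group(2),
        "files": int(match.group(3)),
        "topn": float(match.group(4)),
        "depth": int(match.group(5)),
        "stages": stages,  # raw (N, M) pairs from each 'N@M' token
    }


if __name__ == "__main__":
    print(parse_label("time#725 class#E files#100 topn#0.1 "
                      "dynamics(depth 4 >> 3@5 2@25 1@100)"))
```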


Analysis : Visualization

time#532 class#B files#200 topn#0.5 dynamics(depth 4 >> 3@5 2@25 1@100)

• 200 files, 50% in cache at cloud side

• same dynamics as before


Analysis : Visualization

time#23 class#E files#200 topn#0.5 dynamics(depth 4 >> 1@5)

• yet another set, early in process (time 23)


Analysis : Visualization

time#845 class#B files#500 topn#0.25 dynamics(depth 4 >> 3@5 2@25 1@100)

• very deep time, mostly settled


Analysis : Visualization

time#451 class#D files#500 topn#0.25 dynamics(depth 4 >> 3@5 2@25 1@100)

• also deep time, also settled but this time into 2 islands


Goal: Yet More Flexibility?

[Figure: diagram with labels Blocksize, Client, Index, "Index > Client", and "?".]


That’s all, thank you ...
