"building data streams" Константин Евтеев (avito)

27
Strictly Confidential Strictly Confidential Building data streams Константин Евтеев [email protected]

Upload: avitotech

Post on 18-Feb-2017

243 views

Category:

Internet


1 download

TRANSCRIPT

Page 1: "Building data streams" Константин Евтеев (Avito)

Strictly ConfidentialStrictly Confidential

Building data streams

Константин Евтеев[email protected]

Page 2: "Building data streams" Константин Евтеев (Avito)

2Strictly Confidential 2Strictly Confidential 2

CRUD

read

DBcreate

update

delete

read

read

read

read

read

Page 3: "Building data streams" Константин Евтеев (Avito)

3Strictly Confidential 3Strictly Confidential 3

1 Оптимизируем процедуры выборки

2 Вертикальное и функциональное масштабирование

3 Вводим в бой стендбаи бинарной репликации

4 Денормализуем данные для чтения

5 Шардирование

6 Денормализованные данные выносим на отдельные машины

Optimization steps

Page 4: "Building data streams" Константин Евтеев (Avito)

4Strictly Confidential 4Strictly Confidential 4

C(R)UD & R

read

DBcreate

update

delete

read

read

read

read

read

DB-SB

Page 5: "Building data streams" Константин Евтеев (Avito)

5Strictly Confidential 5Strictly Confidential 5

movies

idnamelength_minutesrating_id

movie_showtimes

idmovie_idtheatre_idroomstart_time

theatres

idnamephone_number

purchased_tickets

idconfirmation_codepurchase_price_cents

auditoriums

theatre_idroomseats_available

orders

confirmation_codemovie_showtime_idmovie_idtheatre_idauditorium_idroomstart_time

Денормализуем данные для чтения

mat_view

movie_namemovie_rating_idmovie_length_minutesmovie_id,theatre_id,room,start_time,theatre_name,seats_availble...

http://www.slideshare.net/pavlushko/sphinx-10460333http://www.pgcon.org/2008/schedule/attachments/64_BSDCan2008-MaterializedViews-paper.pdfhttp://www.pgcon.org/2008/schedule/attachments/63_BSDCan2008-MaterializedViews.pdf

Page 6: "Building data streams" Константин Евтеев (Avito)

6Strictly Confidential 6Strictly Confidential 6

It looks like

1 Command and Query Responsibility Segregation (CQRS) или

Command-query separation (CQS)

2 Event Sourcing

3 Eventual consistency

4 Lambda Architecture

5 Kappa Architecture

http://lambda-architecture.net/http://milinda.pathirage.org/kappa-architecture.com/

Page 7: "Building data streams" Константин Евтеев (Avito)

7Strictly Confidential 7Strictly Confidential 7

DBCUD

(CRUD)mv1

Field1 - fieldN

DB-SBR DBN-

CUD(CRUD)

DB-SBR

mvN

Field1 - fieldN

DB-SBRmv1

Field1 - fieldN

DB-RE-01*2

mv2

Field1 - fieldN

mv1

Field1 - fieldN

DB-RE-N*2

mv6

Field1 - fieldN

mv3

Field1 - fieldN

DBN-SBR

DBN-SBR

node1 node2

node3 node3

PL/Proxy cluster

dwhData streams

Page 8: "Building data streams" Константин Евтеев (Avito)

8Strictly Confidential 8Strictly Confidential 8

Выгрузка по времени

+ работает− есть вопросы

Page 9: "Building data streams" Константин Евтеев (Avito)

9Strictly Confidential 9Strictly Confidential 9

Выгрузка данных batch-ами

Источник Буфер ПриемникТранспорт транспорт

#pg_current_xlog_insert_location() #Get current transaction log insert locationmaster_pos=$(psql_ping_db ­c "select force_wal(pg_current_xlog_insert_location())")

#pg_xlog_location_diff(location text, location text)  numeric #Calculate the difference between two transaction log locationspsql ­c "select pg_xlog_location_diff(pg_last_xlog_replay_location(), '${master_pos}')")

https://github.com/eshkinkot/pgday2016/tree/master/ping_DB

Page 10: "Building data streams" Константин Евтеев (Avito)

10

Strictly Confidential 10

Strictly Confidential 10

Ticker & pgq

2008 год! https://www.pgcon.org/2008/schedule/attachments/55_pgq.pdf

https://www.postgresql.org/docs/9.4/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT

https://github.com/markokr/skytools/tree/skytools_2_1_stable/python

https://github.com/markokr/skytools/blob/skytools_2_1_stable/sql/pgq/functions/pgq.ticker.sql

http://www.pgcon.org/2009/schedule/attachments/91_pgq.pdf

Page 11: "Building data streams" Константин Евтеев (Avito)

11

Strictly Confidential 11

Strictly Confidential 11

MVCC (https://www.postgresql.org/docs/8.3/static/release-8-3.html)

Add several txid_*() functions to query active transaction IDs (Jan)This is useful for various replication solutions.

Page 12: "Building data streams" Константин Евтеев (Avito)

12

Strictly Confidential 12

Strictly Confidential 12

How to select txids that are between snapshots

xmin1 xmax1

xmax2

xmax1

xmin2

Page 13: "Building data streams" Константин Евтеев (Avito)

13

Strictly Confidential 13

Strictly Confidential 13

Все изменения данных под единой блокировкой объекта(row level)

ItemsCategory

params values

moderation

doublets

Fee packages

UsersContact

information

shops, company

Afraud

payments

User lock

Item lock

Payment data

Items data

Users data

Под блокировкой понимаем цепочку блокировок:select item_id from items for update;

Page 14: "Building data streams" Константин Евтеев (Avito)

14

Strictly Confidential 14

Strictly Confidential 14

Сессионные переменные “signal”

1. Достаточно дешево по ресурсам

2. Не нужно разбирать цепочку вызовов процедур, и добавлять входные/выходные параметры

Достоинства:

Page 15: "Building data streams" Константин Евтеев (Avito)

15

Strictly Confidential 15

Strictly Confidential 15

Сессионные переменные, групповые действия, подводные камни …

В 1 транзакции может меняться несколько объектов, а событие относится не ко всем. Решение:Используем массив по unique key array ‘{}’

Page 16: "Building data streams" Константин Евтеев (Avito)

16

Strictly Confidential 16

Strictly Confidential 16

Init remote mv or counterhttps://www.depesz.com/2016/06/14/incrementing-counters-in-database/

1) Cоздать временную таблицу на стороне подписчика

2) Создать принимающую процедуру

● временную для инициализации

● реальную

3)Начинаем слать события

4)Запускаем инициализацию со стендбая(дождаться прихода

события, с которого начали заполнять временную таблицу)

5)Переключаем на реальную процедуру приема

6)Пересчитываем под блокировкой через очередь

Page 17: "Building data streams" Константин Евтеев (Avito)

17

Strictly Confidential 17

Strictly Confidential 17

Init remote mv or counter

items

item_iduser_idcategory_idlast_update_txtime

CONSTRAINT TRIGGER items_delta  AFTER INSERT OR UPDATE OR DELETE

providerprovider subscriber

pgq

user_items_cnt

user_idcategory_idcntdate

tmp_user_cnt

user_id

1 save data for recount after init

2 copy

3 switch accept to real table

https://github.com/eshkinkot/pgday2016/tree/master/remote_cnt

Page 18: "Building data streams" Константин Евтеев (Avito)

18

Strictly Confidential 18

Strictly Confidential 18

Table and trigger function on provider's side

Page 19: "Building data streams" Константин Евтеев (Avito)

19

Strictly Confidential 19

Strictly Confidential 19

Tables on subscriber's sideAccept function

Page 20: "Building data streams" Константин Евтеев (Avito)

20

Strictly Confidential 20

Strictly Confidential 20

Provider init function

Page 21: "Building data streams" Константин Евтеев (Avito)

21

Strictly Confidential 21

Strictly Confidential 21

Subscriber's init functions

Page 22: "Building data streams" Константин Евтеев (Avito)

22

Strictly Confidential 22

Strictly Confidential 22

Maintenance

Page 23: "Building data streams" Константин Евтеев (Avito)

23

Strictly Confidential 23

Strictly Confidential 23

Real time sphinx index

Main DB

mat_view_index

Idfield1….filedN

DB-RE-INDmat_view_index

Idfield1….filedN sphinx

londiste

Select full reindex

pgq pgq consumer

Page 24: "Building data streams" Константин Евтеев (Avito)

24

Strictly Confidential 24

Strictly Confidential 24

Persistent queue

table1

Idfield1….filedN

pgq

Table1 1

Idfield1….filedNtick_idcall_id

Table1 n

Idfield1….filedNtick_idcall_id

Lock freerotation

complex object tbl 1

Idfield1….filedNtick_idcall_id

complex object tbl n

Idfield1….filedNtick_idcall_id

Lock freerotation

Complex object part 1

Idfield1….

Complex object part 1

Idfield1….

Complex object part 1

Idfield1….

deferred trg

deferred trg

Page 25: "Building data streams" Константин Евтеев (Avito)

25

Strictly Confidential 25

Strictly Confidential 25

Особенности согласования данныхEventual consistency

1 Read with ver num

master

Avito_delta

ClientAvito_delta

reserv

3 (if 2) write due to delay possible in pgq

pgq

2 consumers

2 write with check version

Page 26: "Building data streams" Константин Евтеев (Avito)

26

Strictly Confidential 26

Strictly Confidential 26

Восстановление после аварий

providersb

subscriber

providerundo

provider

subscriber

subscriber

subscriber

1 crash

3 apply undo

2 promote

4 start consumer

3 copy

1 crash

2 stop consumer

4 Add new consumer

5 start consumer

Page 27: "Building data streams" Константин Евтеев (Avito)

White Gardens Business Center, 7 Lesnaya street, Moscow, 125047, www.avito.ruWhite Gardens Business Center, 7 Lesnaya street, Moscow, 125047, www.avito.ru

Thank you foryour attention!

White Gardens Business Center, 7 Lesnaya street, Moscow, 125047, www.avito.ru

Спасибо за внимание!

Константин Евтеев[email protected]

https://hh.ru/vacancy/10795267https://hh.ru/vacancy/11463461