Server-side data sync for mobile apps with Silex
Posted on 12-Jul-2015
TRANSCRIPT
Implementing a data synchronization API for mobile apps with Silex
Agenda
- scenario
- design choices
- implementation
- alternative approaches
Sync scenario
[Diagram: three devices hold different data (A, B, C); after syncing, every device holds A, B and C]
Dealing with conflicts
[Diagram: two devices change the same record concurrently (A1, A2); which version should win?]
Brownfield project
several mobile apps for tracking user-generated data (calendar, notes, bio data)
iOS & Android
~10K users, steadily growing at 1.2K/month
Scenario
MongoDB
Legacy app based on CodeIgniter
Existing RPC-wannabe-REST API for data sync
Scenario
For every resource
get updates:
POST /m/:app/get/:user_id/:res/:updated_from
send updates:
POST /m/:app/update/:user_id/:res_id/:dev_id/:res
Scenario
API
~6 different resources, ~12 calls per sync
apps sync by polling every 30 seconds
each call syncs only a little data
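The polling figures above translate into a rough upper bound on server load. This is a back-of-the-envelope sketch, assuming all ~10K users are online and polling at the same 30-second interval; the two-call comparison assumes one GET for changes plus one POST for merging per sync, and is not a measurement from the project.

```php
<?php
// Upper-bound request rate when every user polls at the same interval.
// Inputs are illustrative assumptions taken from the slide's approximate figures.
function requestsPerSecond(int $users, int $callsPerSync, int $pollSeconds): float
{
    return $users * $callsPerSync / $pollSeconds;
}

$legacy = requestsPerSecond(10000, 12, 30); // ~12 calls per sync with the old API
$twoCall = requestsPerSecond(10000, 2, 30); // assumed: one GET /changes + one POST /merge
printf("legacy: %.0f req/s, two-call sync: %.0f req/s\n", $legacy, $twoCall);
```

Cutting the number of calls per sync shrinks the load proportionally, which is one concrete reading of "more efficient than the previous API".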
Scenario
Rebuild the sync API for the existing apps plus 2 upcoming ones
Enable image synchronization
Be more efficient than the previous API
Challenge
Existing Solutions
- Algorithms: timestamps, vector clocks, CRDTs
- Protocols/API: SyncML, Syncano
- Platform: Azure Data Sync
- Storage: CouchDB, Riak
Not Invented Here?
"Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels" (J. Atwood)
2 different mobile platforms
Several teams with different skill levels
Changing storage wasn’t an option
Forcing a particular technology on the client side wasn’t an option
Architecture
[Diagram: clients c1, c2, c3 all connect to a single server]
server: sync logic, conflict resolution
clients: thin clients
In the sync domain, all resources are managed in the same way
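Managing every resource "in the same way" implies a shared envelope of bookkeeping fields around a resource-specific payload. The class below is a hypothetical sketch of such an envelope (PHP 8 syntax); the field names `lid`, `guid`, `type`, `updated` and `_deleted` follow the examples used later in this deck, while `SyncRecord` and `fromArray()` are illustrative names, not the project's actual code.

```php
<?php
// Hypothetical common envelope: every synced resource carries the same
// bookkeeping fields, so the sync logic never needs resource-specific code.
final class SyncRecord
{
    public function __construct(
        public string $type,          // e.g. 'note', 'measure', 'baby'
        public array $data,           // resource-specific payload, opaque to sync
        public int $updated,          // last modification time, used for conflict checks
        public ?string $lid = null,   // client-local id, assigned by the device
        public ?string $guid = null,  // global id, assigned by the server
        public bool $deleted = false  // soft-delete flag (tombstone)
    ) {}

    // Build a record from a decoded JSON object as sent by a client.
    public static function fromArray(array $a): self
    {
        return new self(
            $a['type'],
            $a['data'] ?? [],
            (int) $a['updated'],
            $a['lid'] ?? null,
            $a['guid'] ?? null,
            (bool) ($a['_deleted'] ?? false)
        );
    }
}
```

Because conflict handling only looks at the envelope, adding a new resource type would not require touching the sync code.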
Implementation
For every app:
one endpoint for getting new data
one endpoint for pushing changes
one endpoint for uploading images
Implementation
GET /apps/:app/users/:user_id/changes[?from=:from]
POST /apps/:app/users/:user_id/merge
POST /upload/:res_id/images
The new APIs
Silex Implementation
[Diagram: Col 1, Col 2 and Col 3 all delegate to a single Sync Service]
Silex Implementation
use Silex\Application;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
// return every change since the "from" token previously issued by the server
$app->get('/apps/{mApp}/users/{userId}/changes', function (Application $app, Request $request, $mApp, $userId) {
    $lastSync = $request->get('from', null);
    $syncService = $app['syncService'];
    $syncService->sync($lastSync, $userId);
    return new JsonResponse($syncService->getResult());
});
Silex Implementation
use Silex\Application;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
// merge is a POST endpoint: apply the client's changes, resolve conflicts, return the outcome
$app->post('/apps/{mApp}/users/{userId}/merge', function (Application $app, Request $request, $mApp, $userId) {
    $lastSync = $request->get('from', null);
    $data = $request->get('data', false);
    $syncService = $app['syncService'];
    $syncService->merge($data, $lastSync, $userId);
    return new JsonResponse($syncService->getResult());
});
Silex Implementation
$app['mongodb'] = new MongoDb(…);
$app['changesRepo'] = new ChangesRepository($app['mongodb']);
$app['syncService'] = new SyncService($app['changesRepo']);
GET /apps/:app/users/:user_id/changes?from=:from
Get changes
timestamp?
timestamps are inaccurate
the server suggests the “from” parameter to be used in the next request
Server suggests the sync time
c1 → server: GET /changes
server → c1: { "next": 12345, "data": […] }
c1 → server: GET /changes?from=12345
server → c1: { "next": 45678, "data": […] }
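One way to let the server suggest the sync time is to stamp every write with a server-side, monotonically increasing sequence number and return the highest one as `next`. The class below is a minimal in-memory sketch under that assumption; `ChangeFeed`, `write()` and `changes()` are hypothetical names, and a real MongoDB-backed repository would need an atomic counter instead of a PHP property.

```php
<?php
// Hypothetical in-memory change feed: the server stamps each write with a
// sequence number, so clients never rely on their own (inaccurate) clocks.
final class ChangeFeed
{
    private int $seq = 0;
    private array $records = []; // guid => ['seq' => int, 'record' => array]

    public function write(string $guid, array $record): void
    {
        // Overwriting keeps only the latest state of each record.
        $this->records[$guid] = ['seq' => ++$this->seq, 'record' => $record];
    }

    // Same payload shape as the slide: ['next' => ..., 'data' => [...]]
    public function changes(?int $from = null): array
    {
        $data = [];
        foreach ($this->records as $entry) {
            if ($from === null || $entry['seq'] > $from) {
                $data[] = $entry['record'];
            }
        }
        return ['next' => $this->seq, 'data' => $data];
    }
}
```

The client simply echoes `next` back as `from` on its next poll, so it receives only records written after its previous request.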
operations:
{"op": "add", "id": "1", "data": […]}
{"op": "update", "id": "1", "data": […]}
{"op": "delete", "id": "1"}
{"op": "add", "id": "2", "data": […]}
states:
{"id": "1", "data": […]}
{"id": "2", "data": […]}
{"id": "3", "data": […]}
what to transfer
We chose to transfer states:
{"id": "1", "type": "measure", "_deleted": true}
{"id": "2", "type": "note"}
{"id": "3", "type": "note"}
PS: soft-delete all the things!
what to transfer
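Transferring states instead of operations means the server can collapse any sequence of operations into the latest state of each record, and soft deletes keep deletions visible as tombstones. The function below is a hypothetical sketch of that collapse; it assumes each `update` carries the full record payload, as the operation examples suggest.

```php
<?php
// Hypothetical helper: fold an operation log into one state per record.
// Deletes become tombstones ('_deleted' => true) instead of disappearing,
// so other devices still learn that the record is gone.
function collapseToStates(array $operations): array
{
    $states = [];
    foreach ($operations as $op) {
        $id = $op['id'];
        switch ($op['op']) {
            case 'add':
            case 'update':
                // assumption: updates carry the full payload, so they replace the state
                $states[$id] = ['id' => $id, 'data' => $op['data']];
                break;
            case 'delete':
                $states[$id] = ['id' => $id, '_deleted' => true];
                break;
        }
    }
    return array_values($states);
}
```

Four operations on two records collapse to two states, which is why state transfer keeps sync payloads small.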
unique identifiers
How do we generate a unique id in a distributed system?
1) UUID (RFC 4122): several implementations in PHP (https://github.com/ramsey/uuid)
2) Local/Global id: only the server generates GUIDs; clients use local ids to manage their records
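For reference, a version-4 (random) UUID as defined by RFC 4122 needs only a secure random source and two bit fixes. This is a dependency-free sketch using PHP 7's `random_bytes()`; in practice a maintained library such as ramsey/uuid is the safer choice, and `uuidV4()` is an illustrative name.

```php
<?php
// Minimal RFC 4122 version-4 UUID generator (illustrative sketch).
function uuidV4(): string
{
    $bytes = random_bytes(16);
    // version field: high nibble of byte 6 becomes 0100 (version 4, random)
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // variant field: top two bits of byte 8 become 10 (RFC 4122 variant)
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // 32 hex digits, grouped 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

With 122 random bits, collisions are negligible, so any node could generate ids; the local/global approach instead keeps GUID generation on the server.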
c1 → server: POST /merge { "data": [ {"lid": "1", …}, {"lid": "2", …} ] }
server → c1: { "data": [ {"guid": "58f0bdd7-1400", "lid": "1", …}, {"guid": "6f9f3ec9-1400", "lid": "2", …} ] }
mobile-generated data is “temporary” until it is synced to the server
the server handles conflict resolution
conflict resolution algorithm (plain data)
conflict resolution:
domain-independent: e.g. last-write-wins
domain-dependent: use domain knowledge to resolve conflicts
function sync(array $data) {
    foreach ($data as $newRecord) {
        $s = findByGuid($newRecord->getGuid());
        if (!$s) {
            // no conflict: the server has never seen this record
            add($newRecord);
            send($newRecord);
            continue;
        }
        if ($newRecord->updated > $s->updated) {
            // remote wins: the client copy is newer
            update($s, $newRecord);
            send($newRecord);
            continue;
        }
        // server wins: send the server copy back to the client
        updateRemote($newRecord, $s);
    }
}
conflict resolution algorithm (plain data)
Before the merge:
c1: {"lid": "1", "guid": "af54d", "data": "AAA", "updated": "100"}
c1: {"lid": "2", "data": "hello!", "updated": "15"}
server: {"guid": "af54d", "data": "BBB", "updated": "20"}
c1 → server: POST /merge
lid 2 has no GUID (no conflict): the server stores it as {"guid": "e324f", "data": "hello!", "updated": "15"}
af54d: the client copy (updated 100) is newer than the server copy (updated 20), so the remote wins and the server now holds {"guid": "af54d", "data": "AAA", "updated": "100"}
server → c1: {"ok": {"guid": "af54d"}}
server → c1: {"update": {"lid": "2", "guid": "e324f"}}
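The walkthrough above can be reproduced with a small runnable version of the `sync()` pseudocode, using plain arrays instead of a MongoDB repository. This is a sketch under stated assumptions: `mergeLww()` is a hypothetical name, GUIDs come from an injected callable, and the server-wins response (returning the stored copy under `update`) is an assumed shape, since the slides only show the `ok` and `update` responses for the other two branches.

```php
<?php
// Hypothetical last-write-wins merge over plain arrays.
// $server maps guid => ['guid' => ..., 'data' => ..., 'updated' => int] and is
// modified in place; $incoming is the list of records posted to /merge.
function mergeLww(array &$server, array $incoming, callable $newGuid): array
{
    $response = [];
    foreach ($incoming as $rec) {
        $guid = $rec['guid'] ?? null;
        $stored = $guid !== null ? ($server[$guid] ?? null) : null;

        if ($stored === null) {
            // No conflict: the server has never seen this record.
            $guid = $guid ?? $newGuid();
            $server[$guid] = ['guid' => $guid, 'data' => $rec['data'], 'updated' => $rec['updated']];
            $response[] = ['update' => ['lid' => $rec['lid'], 'guid' => $guid]];
        } elseif ($rec['updated'] > $stored['updated']) {
            // Remote wins: the client copy is newer.
            $server[$guid]['data'] = $rec['data'];
            $server[$guid]['updated'] = $rec['updated'];
            $response[] = ['ok' => ['guid' => $guid]];
        } else {
            // Server wins (assumed response shape): hand the stored copy back.
            $response[] = ['update' => ['lid' => $rec['lid']] + $stored];
        }
    }
    return $response;
}
```

Feeding it the two client records and the single server record from the walkthrough yields the same `ok`/`update` responses shown on the slide.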
conflict resolution algorithm (hierarchical data)
How do we manage hierarchical data?
1) sync the root record
2) update ids
3) sync the child records
{"lid": "123456", "type": "baby", …}
{"lid": "123456", "type": "temperature", "baby_id": "123456"}
function syncHierarchical(array $data) {
    // parent records first
    sortByHierarchy($data);
    foreach ($data as $newRecord) {
        $s = findByGuid($newRecord->getGuid());
        if ($newRecord->isRoot()) {
            if (!$s) {
                // no conflict: new root record
                add($newRecord);
                updateRecordIds($newRecord, $data);
                send($newRecord);
                continue;
            }
            if ($newRecord->updated > $s->updated) {
                // remote wins
                update($s, $newRecord);
                updateRecordIds($newRecord, $data);
                send($newRecord);
                continue;
            }
            // server wins: children must point at the server copy's id
            updateRecordIds($s, $data);
            updateRemote($newRecord, $s);
        } else {
            // child record: apply the plain-data algorithm to this record only
            sync([$newRecord]);
        }
    }
}
conflict resolution algorithm (hierarchical data)
c1: {"lid": "1", "data": "AAA", "updated": "100"}
c1: {"lid": "2", "parent": "1", "data": "hello!", "updated": "15"}
c1 → server: POST /merge
1) the server stores the root record and assigns it a GUID: {"lid": "1", "guid": "32ead", "data": "AAA", "updated": "100"}
2) the child's parent reference is rewritten from the local id to the GUID: {"lid": "2", "parent": "32ead", "data": "hello!", "updated": "15"}
3) the child record is synced
server → c1: {"update": {"lid": "1", "guid": "32ead"}}
server → c1: {"update": {"lid": "2", "guid": "e324f"}}
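The three steps above (parents first, then rewrite ids, then children) can be sketched as a depth-first ordering followed by a single pass that swaps local parent ids for server GUIDs. This runnable sketch only covers the no-conflict case for brand-new records; `sortParentsFirst()` and `mergeNewHierarchy()` are hypothetical names, and it assumes the hierarchy has no cycles and that `parent` holds a local id.

```php
<?php
// Order records so that every parent precedes its children.
// Assumes no cycles; 'parent' holds the parent's local id (lid).
function sortParentsFirst(array $records): array
{
    $byLid = [];
    foreach ($records as $r) {
        $byLid[$r['lid']] = $r;
    }
    $sorted = [];
    $visit = function ($lid) use (&$visit, &$sorted, $byLid): void {
        if (isset($sorted[$lid])) {
            return;
        }
        $parent = $byLid[$lid]['parent'] ?? null;
        if ($parent !== null && isset($byLid[$parent])) {
            $visit($parent); // place the parent before this record
        }
        $sorted[$lid] = $byLid[$lid];
    };
    foreach (array_keys($byLid) as $lid) {
        $visit($lid);
    }
    return array_values($sorted);
}

// Assign GUIDs to new records and rewrite parent references to GUIDs.
function mergeNewHierarchy(array $records, callable $newGuid): array
{
    $lidToGuid = [];
    $stored = [];
    foreach (sortParentsFirst($records) as $r) {
        // Parents were processed first, so their GUID is already known.
        if (isset($r['parent'], $lidToGuid[$r['parent']])) {
            $r['parent'] = $lidToGuid[$r['parent']];
        }
        $r['guid'] = $newGuid();
        $lidToGuid[$r['lid']] = $r['guid'];
        $stored[] = $r;
    }
    return $stored;
}
```

Even if the client sends the child before its parent, the sort guarantees the parent's GUID exists by the time the child is rewritten.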
enforcing domain constraints
e.g. “only one temperature can be registered in a given day”
How do we enforce domain constraints on data?
1) relax constraints
2) integrate constraints into the sync algorithm
from findByGuid to findSimilar
first look up by GUID, then by domain rules
“two measures are similar if they refer to the same date”
enforcing domain constraints
c1: {"lid": "1", "when": "20141005"}
server: {"guid": "af54d", "when": "20141005"}
c1 → server: POST /merge
there is no GUID match, but af54d is similar (same date), so the server treats both as the same record instead of creating a duplicate
c1 now holds: {"guid": "af54d", "when": "20141005"}
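The switch from findByGuid to findSimilar can be sketched as a two-stage lookup over the stored records. This is an illustrative version over plain arrays; `findSimilar()` here is a hypothetical standalone function, and the similarity rule (same `type`, when present, and same `when` date) is an assumed encoding of "two measures are similar if they refer to the same date".

```php
<?php
// Hypothetical findSimilar(): look a record up by GUID first, then fall back
// to a domain rule so a constraint like "one temperature per day" holds.
function findSimilar(array $serverRecords, array $incoming): ?array
{
    if (isset($incoming['guid'])) {
        foreach ($serverRecords as $s) {
            if ($s['guid'] === $incoming['guid']) {
                return $s; // exact identity match
            }
        }
    }
    foreach ($serverRecords as $s) {
        // assumed domain rule: same type (if given) and same day => same logical record
        if (($s['type'] ?? null) === ($incoming['type'] ?? null)
            && $s['when'] === $incoming['when']) {
            return $s;
        }
    }
    return null; // genuinely new record
}
```

When a similar record is found, the merge can run the usual conflict resolution against it instead of inserting a second measure for the same day.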
Binary data is uploaded via a custom endpoint
Sync data remains small
Uploads can be resumed
dealing with binary data
Two steps*:
1) data is synchronized
2) related images are uploaded
* this means a record can exist without its file for a while
dealing with binary data
c1 → server: POST /merge { "lid": 1, "type": "baby", "image": "myimage.jpg" }
server → c1: { "lid": 1, "guid": "ac435-f8345" }
c1 → server: POST /upload/ac435-f8345/image
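Resumable uploads need the server to tell the client how many bytes it already holds. The function below is a hypothetical sketch of that idea, not the project's upload endpoint: the client sends a chunk together with the offset it believes the server has reached, the server appends only when the offset matches, and it always returns its current size so the client knows where to resume.

```php
<?php
// Hypothetical resumable-upload step: append a chunk only if it continues
// exactly where the stored file ends; always report the current size.
function appendChunk(string $path, int $offset, string $chunk): int
{
    clearstatcache(true, $path); // filesize() results are cached by PHP
    $size = file_exists($path) ? filesize($path) : 0;
    if ($offset === $size) {
        file_put_contents($path, $chunk, FILE_APPEND | LOCK_EX);
        $size += strlen($chunk);
    }
    // a repeated or out-of-order chunk is ignored; the client resumes from here
    return $size;
}
```

Because retries after a dropped connection are harmless, the record created in step 1 can safely stay without its file until the upload completes.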
Implementing this stuff is tricky
Explore existing solutions if you can
Understanding the domain is important
What we learned
vector clocks
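Vector clocks replace a single timestamp with one counter per replica, which lets a server distinguish "newer" from "concurrent" (the A1/A2 case from the conflicts slide). The function below is a minimal illustrative comparison, not part of the talk's implementation; clocks are arrays mapping a replica id to its counter, and a missing entry counts as 0.

```php
<?php
// Minimal vector-clock comparison (illustrative).
// Returns 'before', 'after', 'equal' or 'concurrent' for clock $a relative to $b.
function compareClocks(array $a, array $b): string
{
    $aAhead = false;
    $bAhead = false;
    foreach (array_unique(array_merge(array_keys($a), array_keys($b))) as $node) {
        $x = $a[$node] ?? 0;
        $y = $b[$node] ?? 0;
        if ($x > $y) {
            $aAhead = true;
        }
        if ($y > $x) {
            $bAhead = true;
        }
    }
    if ($aAhead && $bAhead) {
        return 'concurrent'; // a real conflict: neither version saw the other
    }
    if ($aAhead) {
        return 'after';
    }
    return $bAhead ? 'before' : 'equal';
}
```

Only the `concurrent` outcome needs a resolution policy; the other outcomes can be applied automatically, which last-write-wins on timestamps cannot tell apart.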
Conflict-free Replicated Data Types (CRDTs)
Constraining the types of operations in order to:
- ensure convergence of changes to shared data by uncoordinated, concurrent actors
- eliminate network failure modes as a source of error
CRDT
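A grow-only counter (G-Counter) is one of the simplest state-based CRDTs and shows how constraining the allowed operations guarantees convergence. This is an illustrative PHP 8 sketch, not something the talk implemented: each replica increments only its own slot, and merging takes the per-slot maximum, which is commutative, associative and idempotent.

```php
<?php
// Grow-only counter (G-Counter), a state-based CRDT (illustrative sketch).
final class GCounter
{
    private array $counts = []; // replica id => count contributed by that replica

    public function __construct(private string $replicaId) {}

    // Only increments are allowed, and only on this replica's own slot.
    public function increment(int $by = 1): void
    {
        $this->counts[$this->replicaId] = ($this->counts[$this->replicaId] ?? 0) + $by;
    }

    // Per-slot maximum: merging in any order, any number of times, converges.
    public function merge(GCounter $other): void
    {
        foreach ($other->counts as $id => $n) {
            $this->counts[$id] = max($this->counts[$id] ?? 0, $n);
        }
    }

    public function value(): int
    {
        return array_sum($this->counts);
    }
}
```

Two replicas that increment independently and then exchange state end up with the same value, with no conflict resolution step at all.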
Sync Gateway handles sync
Data flows through channels
- partition data set
- authorization
- limit the data
Use revision trees
Couchbase Mobile
Distributed DB, eventual or strong consistency
Data Types
Configurable conflict resolution:
- db level for built-in data types
- application level for custom data
Riak