InSTEDD Mesh4x Platform


Upload: taha-kass-hout-md-ms

Post on 22-Jun-2015


DESCRIPTION

InSTEDD’s Mesh4x (http://code.google.com/p/mesh4x) allows for data synchronization among different data sources regardless of technology platform or network connectivity. Users can make their data available to all users in their distributed project team or across different jurisdictions. We describe the utility and architecture of Mesh4x to share data over the Internet cloud where users determine which subset of their data are exchanged. This technology raises the potential to share data (e.g., during outbreak investigation, disaster recovery or humanitarian relief efforts) where multiple people are then allowed access to see each other’s data, update the information as the event unfolds, and securely exchange data with one another.

TRANSCRIPT

Page 1: InSTEDD Mesh4x Platform

InSTEDD’s Mesh4x Synchronization Platform: http://code.google.com/p/mesh4x

Taha Kass-Hout, MD, MS, Eduardo Jezierski, and Juan Marcelo Tondato
InSTEDD, Palo Alto, USA

Mesh Properties

Data meshes have some interesting properties, including:

Symmetric: Data meshes allow data to exist in a concurrent multi-master environment where updates can be applied at any node in the mesh.

Asynchronous: Data meshes allow offline updates to information and synchronization with other nodes without requiring data locks, essential for occasionally connected applications.

Dynamic: Data meshes allow synchronization to occur even in constantly changing connection topologies. A user can sync to a server, and later the sync can be done between the user’s client and another client, which could then sync with another server, and so on.

A data mesh allows two-way synchronization of information in a symmetric way. This guarantees certain versioning behaviors regardless of the path the data have taken. In this symmetrical exchange, there is an 'adapter' on each end. Adapters can be created for synchronizing different forms of data, including text files, XML files, web pages, and small relational databases. Adapters can also be created to synchronize data over various protocols and methods, such as Hypertext Transfer Protocol (HTTP) (e.g., when an Internet connection is available) or Short Message Service (SMS) exchange (e.g., in the field using cellular telephones or in austere conditions). Therefore, an adapter that enables near real-time data synchronization using the fastest available technology (Internet/HTTP, cell phone/SMS, satellite communication, or flash drive/pen drive) would be of significant value, especially for a field team of epidemiologists or humanitarians in different localities working together to collect and share disease outbreak data from a common event.
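The symmetric exchange between two adapters can be illustrated with a minimal Python sketch. It is not the Mesh4x API: the names `Item`, `Adapter`, `InMemoryAdapter`, and `synchronize` are hypothetical, and the sketch assumes that a simple per-item version counter is enough to decide which copy is newer. Concurrent edits of the same item are deliberately ignored here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Item:
    sync_id: str   # identifier shared by every copy of the record
    version: int   # incremented on each edit (assumed versioning scheme)
    payload: dict  # the record itself


class Adapter(Protocol):
    """Hypothetical contract that each end of the exchange fulfils."""

    def items(self) -> Iterable[Item]: ...

    def apply(self, item: Item) -> None: ...


class InMemoryAdapter:
    """Stand-in for a file, spreadsheet, or database adapter."""

    def __init__(self) -> None:
        self._store: Dict[str, Item] = {}

    def items(self) -> List[Item]:
        return list(self._store.values())

    def apply(self, item: Item) -> None:
        # Accept an incoming item only if it is newer than the local copy.
        current = self._store.get(item.sync_id)
        if current is None or item.version > current.version:
            self._store[item.sync_id] = item

    def edit(self, sync_id: str, payload: dict) -> None:
        # A local edit bumps the version so other nodes can recognise it.
        current = self._store.get(sync_id)
        version = 1 if current is None else current.version + 1
        self._store[sync_id] = Item(sync_id, version, payload)


def synchronize(a: Adapter, b: Adapter) -> None:
    """Two-way exchange: each side offers all of its items to the other."""
    for item in list(a.items()):
        b.apply(item)
    for item in list(b.items()):
        a.apply(item)


# Changing topology: laptop <-> phone, then phone <-> server, then server <-> laptop.
laptop, phone, server = InMemoryAdapter(), InMemoryAdapter(), InMemoryAdapter()
laptop.edit("case-1", {"county": "Lewis"})
synchronize(laptop, phone)
phone.edit("case-1", {"county": "Lewis", "status": "confirmed"})
synchronize(phone, server)
synchronize(server, laptop)
```

Because `synchronize` treats both ends identically, the same function serves client-to-server and client-to-client exchanges, which is the property the mesh relies on.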

Mesh Requirements

Mesh synchronization requires the following components: Storage abstraction, offline work, conflict resolution, SMS integration, security, identity, and adapters and transformers.

Mesh Components

Storage abstraction: This term describes the persistence of information unaltered across systems, regardless of the application using the information.

Offline work: Most data collection during an outbreak investigation is done in the field with no access to the Internet and rarely with access to cellular telephone networks. Systems intended for data acquisition should be designed to accommodate disconnected users at the fringe of the infrastructure. However, these systems should also synchronize seamlessly with Internet systems when connectivity is reestablished. Such techniques are commonplace within the information technology industry (at Allianz-Indonesia, the US Federal Aviation Administration, Steelcase Corporation, Monsanto Chemicals, Pennsylvania Office of the Attorney General, the European Railway Agency, Statoil-Norway, and many more), but they are rarely integrated into public health tools.

Conflict resolution: This term describes a synchronization technique that ensures that data entered over time by two people in an offline environment are updated with each other’s data in a graceful manner when connectivity is established. The mathematics required for managing conflict resolution are difficult but understood, and the software techniques for conflict resolution have been maturing for more than a decade.

SMS integration: As above, cellular telephones, using SMS text messaging to minimize bandwidth and cost, can be the most ubiquitous and resilient form of electronic communication available, especially during disasters or in austere conditions. We consider incorporating SMS to be a design mandate in any primary reporting tool.

Security: Since we are conveying health information, international security standards apply. Since this information might, if made public, result in social and economic disturbance regardless of the veracity of the report, data encryption is required.

Identity: We consider it imperative for quality control and accountability that all epidemiological reporting system submissions be identified by source, date, and time. Changes may be allowed, but the identity of the person making the updates must be known or documented as clearly as that of the creator of the original data.

Adapters and transformers: Decisions about IT resources are most commonly made internally and are rarely affected by outside influences. Well-established and accepted industry standards often influence some IT decisions, although adherence to such standards cannot be a requirement of a particular platform or application that is enforced across sovereign countries, agencies, and organizations. Such an attempt is impractical and unlikely to succeed. However, through broad collaboration and the sharing of knowledge, tools and techniques to bridge technical divides can be developed. As these are used, improved, and widely adopted, they lead to the acceptance of new standards and industry best practices. Collaboration should be strongly encouraged at every opportunity, with information shared to the greatest degree acceptable. Tools should permit information sharing across all platform and application boundaries (from spreadsheets to databases to GIS layers to document handlers to communications devices to presentation software to statistical programs). Users should need negligible additional technical support except to establish levels of permission.

Mesh Devices

In a data mesh, devices (Figures 1 and 2) can share information whenever they are connected. If the mesh is standards-based, access to data is guaranteed, and a larger number of implementations can arise to address the diverse challenges in the space. Furthermore, if the mesh implementations include open-source projects, they can easily evolve to incorporate specific applications including those of little market value but highly used in public health and humanitarian scenarios.

Mobile mesh applications take information where a user needs it most. Unlike data collection solutions, a user can update information and see her team’s work while making sure everyone has current, near real-time data.

Personal mesh applications are applications a user uses every day, but are connected to a mesh of data. Spreadsheets, MS Access databases, and almost any application a user has can be a mesh application.

Mesh servers can be hosted online, inside an organization, or in a dedicated device. Mesh servers scale as much as the hardware dedicated to them. They can act as a central information hub for a team or a building.

Cloud services can host a user’s mesh data and are built to scale. They are reliable, always online, and a secure relay point for a user’s data. They can also store all previous versions of a user’s data if needed.

Figure 1: Data Mesh Services and Devices

Mesh4x Architecture

The Mesh4x server architecture consists of the following components (see Figure 2):

a. Update APIs: Application Programming Interfaces (APIs) allow other applications to change the data in the service. A mesh endpoint allows edits (FeedSync or AtomPub styles) using “traditional” (WSDL-based) services, HTTP POST services (for simpler updates), and “non-traditional” (RESTful) services that can be managed from JavaScript calls. A JavaROSA endpoint exposes the necessary metadata to JavaROSA or AndroidROSA mobile phone handsets and accepts updates from them. An SMS bridge allows sending and receiving semi-structured updates for applications such as GeoChat [Ref] or FrontlineSMS [Ref].

b. Storage: This is the storage layer for all the data and the configuration, security information, etc. needed to keep the service running. In our web-based instance, all this data is stored in Amazon S3 or Google Cloud; however this could also be hosted by a local health department, State public health agency or CDC using an instance of MySQL or other database engine. The services' information is managed by Mesh4x itself so the actual configuration data can be stored with an adapter.

c. Ontology support: An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. The Web Ontology Language (OWL) is a recognized standard for defining ontologies, and there are ongoing efforts to define ontologies in the geospatial realm, as well as within the various public health agencies. Defining a “complete” ontology is a very resource-intensive effort, which is why ontology development efforts often encourage a communal approach. A more generalized, Web 2.0-friendly approach to defining relationships between different types of entities is the use of Resource Description Framework (RDF) descriptors. Data can be harvested through a log-style service (API) that records RDF triples; this is especially useful for user-generated content in a Web 2.0 environment (e.g., wiki, discussion board, document library, workflow). The resulting RDF triples can then be stored separately in triple stores. The separation of the actual data store and the metadata store decouples the service from other regular content or system components, resulting in the ubiquitous collection of information. While individual applications or software components may have ontologies (implicit or explicit, but a given tool only generates a particular limited set of triples), the triple store is RDF-based and can accept any valid triples from any source. Using common identifiers in that system then allows the capture of the overall network of relationships and supports the recommender-style features of the mesh.

Additionally, while most information is geared towards human-readable content, the practice of epidemiologic modeling is heavily structured in nature (databases, spreadsheets). The potential benefits of representing such information in a computer-processable format would be greatly improved if the public health community adopted a standard (or developed a public health-specific XML-based markup language) for publishing and augmenting (tagging) models, such as models for health behavior risk and intervention, climate change, biodiversity, resource management, pollution, and land-use change. This can be done by separating the representation of the model from the program used to simulate its behavior.

1. Ontology Extraction: This service differs from a database in that the user does not need to specify the schema of the information ahead of time. Very little information about the format of the data is needed. This enables applications to change and evolve the data used without requiring developers to change database structures or write specific code for each case. Knowing just a little about the data structure helps with the definition of mappings and filters, and the Ontology Extraction service attempts to infer as much as possible. The Ontology Extraction component allows the user to submit Resource Description Framework (RDF) information (or information based on XForms or any other format that has a transformer); the fields that make up the entities, for example, are then tracked. If a user supplies such ontologies (in RDF or as an XForms definition), Mesh4x can keep this information (e.g., 'Patient’s Date of Birth is a Date/Time field'). RDF is the default standard in Mesh4x for representing data and ontologies.

2. Ontology Mapping: Ontology mapping allows users to map fields and entities of different ontologies in order to make sense of the data being exchanged. For example, to map the data, a user provides a descriptive summary, position, and timestamp associated with the entity: which field should provide the timestamp, which address or coordinate fields should be used to geocode the information, and how the description should be composed from the data. A mapping service allows users to input these parameters, but mapping is also provided through the user interface, as we did with the Epi Info™ data synchronization tool (http://code.google.com/p/mesh4x/wiki/EpiInfoMesh4x).

d. Filtering: This is an essential component in a mesh where small and big devices coexist. For example, a user could have disease outbreak records for an entire state or country in one data mesh, but may want to keep only a subset of those records on a mobile device (e.g., her cell phone). As soon as filters are exposed, the phone can synchronize that subset of the records.

e. Format Transformers: Format transformers are components built to translate data into specific formats. GeoRSS, KML (Keyhole Markup Language), and Shapefile are de facto standards for displaying the spatial location of objects, and standardization has already made it possible to create large repositories and clearinghouses for geospatial data. The user can view the KML in Google Earth, for example, and items would appear on a map as other users synchronize their data with the server. Transformers for XForms models and XForms forms allow users to translate the information of their entities and ontologies into an XForms format. We see the utility and pragmatism of XForms models both as a way of exchanging records and as a way to define the user interface model of the forms that users see in XForms. These transformers also allow transformation of RDF-centric representations in Mesh4x to these broadly adopted formats.


f. Synchronization Adapters: If a user wants to work with the data elsewhere, Mesh4x allows this extended functionality by providing the following data endpoints:

1. Google Spreadsheets: As it does for Microsoft Excel, Mesh4x provides an adapter for Google Spreadsheets. Google Spreadsheets offers enhanced features, such as creating an online form, the ability to add analytics and visualization objects (called Gadgets), and collaborative features throughout the design, data collection, and analysis phases of a questionnaire.

2. Zoho: Zoho offers a variety of useful applications and tools for data collection. Mesh4x offers an adapter for Zoho so that multiple users can synchronize their Zoho application with a data table in MySQL or an MS Access database.

3. MySQL: As many public health applications are now provided as open source, MySQL becomes a natural choice for the backend database. MySQL instances can be exposed on an open network port by simply providing a connection string that is supported by Mesh4x.

Through a user interface, users can easily schedule synchronization updates, define mappings between schemas or ontologies, and resolve conflicts. For example, these mappings can be part of the mesh, where an offline user marks an Excel spreadsheet as 'shared' and the data will be synchronized the next time the user exchanges her data. The server would also create a Google Spreadsheet endpoint (or other endpoints) with the same information for others to use in their collaboration spaces.
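The ontology mapping and filtering steps described in the architecture list above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: records are plain dictionaries, a mapping is a source-to-target field table, and a filter is a predicate. The names `map_record`, `select`, and `FIELD_MAP`, as well as the target field names, are hypothetical and are not part of Mesh4x.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical mapping from one tool's field names to another's.
FIELD_MAP: Dict[str, str] = {
    "firstn": "first_name",
    "lastn": "last_name",
    "county": "district",
}


def map_record(record: Record, mapping: Dict[str, str]) -> Record:
    """Rename source fields to target fields; unmapped fields are dropped."""
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }


def select(records: Iterable[Record], keep: Callable[[Record], bool]) -> List[Record]:
    """Return only the records a device has asked to keep."""
    return [record for record in records if keep(record)]


# All records held in the mesh (illustrative data only).
mesh_records = [
    {"firstn": "Patient", "lastn": "19", "county": "Lewis"},
    {"firstn": "Patient", "lastn": "20", "county": "King"},
]

# A phone that only wants the records for one county.
mapped = [map_record(record, FIELD_MAP) for record in mesh_records]
phone_subset = select(mapped, lambda record: record["district"] == "Lewis")
```

Keeping the mapping as data rather than code is what lets a user interface edit it, and keeping the filter as a predicate lets each device declare its own subset without the server knowing about device types.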

Synchronization Interoperability Standards

The following table lists the synchronization interoperability standards supported by Mesh4x:

Standards

Versioning: One powerful feature of a data mesh is the ability to update information anywhere, anytime, on any device regardless of the platform, or whether the device is connected. To achieve this, there is a versioning standard that tracks updates and detects and preserves conflicts in a way that ensures ‘eventual consistency’.

Data Formats: This is needed so applications can let a user modify information the right way. Since applications may treat information differently, this is a standard that is dynamic and extensible. Existing data standards are supported along with microformats, images, videos, large files, geospatial, and time metadata. Representation of altitudes has not yet been standardized.

Endpoints: This supports sharing information and making it discoverable so that users and programs can find, access (if authorized), and share any information updates from the source.
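How a versioning standard can track updates and both detect and preserve conflicts may be easier to see in code. The following Python sketch is loosely modeled on the per-item update history used by FeedSync-style synchronization, but it is a simplification and not the specification's algorithm: the names `Versioned`, `edit`, and `merge` are hypothetical, and ties between concurrent edits are broken by endpoint name purely so that every node picks the same winner.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Stamp = Tuple[int, str]  # (update number, endpoint that made the edit)


@dataclass(frozen=True)
class Versioned:
    sync_id: str
    payload: str
    history: Tuple[Stamp, ...] = ()           # newest edit first
    conflicts: Tuple["Versioned", ...] = ()   # losing versions kept for review

    @property
    def updates(self) -> int:
        return self.history[0][0] if self.history else 0


def edit(item: Versioned, by: str, payload: str) -> Versioned:
    """Record a local edit; editing is assumed to resolve stored conflicts."""
    stamp = (item.updates + 1, by)
    return replace(item, payload=payload,
                   history=(stamp,) + item.history, conflicts=())


def merge(local: Versioned, incoming: Versioned) -> Versioned:
    """Pick a deterministic winner and preserve the loser if it conflicts."""
    if local.history == incoming.history:
        return local
    # Higher update number wins; endpoint name breaks ties (an assumption).
    winner, loser = sorted((local, incoming),
                           key=lambda item: item.history[:1], reverse=True)
    # No conflict if the winner already contains the loser's latest edit.
    if not loser.history or loser.history[0] in winner.history:
        return winner
    preserved = (replace(loser, conflicts=()),) + loser.conflicts
    extra = tuple(c for c in preserved if c not in winner.conflicts)
    return replace(winner, conflicts=winner.conflicts + extra)


# Two offline edits of the same record, later merged in either order.
base = edit(Versioned("case-1", ""), "laptop", "suspected")
on_laptop = edit(base, "laptop", "confirmed")
on_phone = edit(base, "phone", "ruled out")
merged_on_laptop = merge(on_laptop, on_phone)
merged_on_phone = merge(on_phone, on_laptop)
```

Both nodes arrive at the same winning payload regardless of merge order, which is the sense of 'eventual consistency' used above, and the losing edit stays attached to the item so that a person can review it rather than having it silently discarded.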


Figure 2: Mesh4x Server Architecture

Supporting Centralized or Distributed Applications

Mesh4x can support centralized, distributed (federated) architectures or a hybrid of both. The following diagram outlines the high level architecture inherent in a mesh-type design.


Figure 3: Putting it all Together

Shared Formats for Data Exchange

To achieve interoperability and reuse the human capital of having trained users, mobile applications should share conventions on message content for diverse uses, such as:

Free text with agreed-upon language
Free text with explicit tags
Geo locations (latitude and longitude, place names, village PCodes, etc.)
Delimited data (e.g., Patient19, Lewis County)
Self-describing data (e.g., firstn=Patient|lastn=19|county=Lewis)
Multi-message batching, order-agnostic or sequenced
Message batch retries
Compression
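Two of the conventions listed above, self-describing data and order-agnostic multi-message batching, can be sketched in Python as follows. The sketch assumes `key=value` pairs separated by `|` and a hypothetical part header of the form `<batch>:<index>/<total>:`; neither is a published Mesh4x or GeoChat wire format. The 160-character limit is that of a single-part GSM text message.

```python
from typing import Dict, Iterable, List, Optional

SMS_LIMIT = 160      # characters in a single-part GSM text message
HEADER_BUDGET = 12   # room reserved for "<batch>:<index>/<total>:" (assumed layout)


def encode(record: Dict[str, str]) -> str:
    """Serialize a record as self-describing key=value pairs."""
    return "|".join(f"{key}={value}" for key, value in record.items())


def decode(text: str) -> Dict[str, str]:
    """Parse key=value pairs back into a record."""
    return dict(pair.split("=", 1) for pair in text.split("|"))


def split_batch(text: str, batch_id: str, limit: int = SMS_LIMIT) -> List[str]:
    """Cut a long payload into numbered parts that each fit one message."""
    size = limit - HEADER_BUDGET
    chunks = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    total = len(chunks)
    messages = [f"{batch_id}:{index}/{total}:{chunk}"
                for index, chunk in enumerate(chunks, 1)]
    if any(len(message) > limit for message in messages):
        raise ValueError("header does not fit the reserved budget")
    return messages


def join_batch(messages: Iterable[str]) -> Optional[str]:
    """Reassemble parts received in any order; None while parts are missing."""
    parts: Dict[int, str] = {}
    total: Optional[int] = None
    for message in messages:
        _batch, position, chunk = message.split(":", 2)
        index, count = position.split("/")
        parts[int(index)] = chunk   # a retried duplicate simply overwrites
        total = int(count)
    if total is None or len(parts) < total:
        return None
    return "".join(parts[i] for i in range(1, total + 1))


record = {"firstn": "Patient", "lastn": "19", "county": "Lewis"}
parts = split_batch(encode(record), batch_id="b7", limit=30)
received = join_batch(reversed(parts))
```

Returning `None` for an incomplete batch gives the receiver a natural point at which to request a retry of the missing parts, and carrying the index in every part is what makes arrival order irrelevant.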

Leaders in this domain, such as Twitter, are already adopting some of these conventions (e.g., @user (Patient19), #tag (fever)) in common ways where applicable. For example, InSTEDD is currently building an Avian Influenza hotline for Cambodia that will implement batching and self-describing data over SMS. Recent developments in social networking technologies and applications endorse an open approach with an underlying strategy for a "hub" or one-stop shop. This approach ensures interoperability as various vendors follow a common approach. An example of this approach is the work InSTEDD is doing to implement an application where users collect structured data in JavaROSA (an open source project) and send this data over a GeoChat channel (InSTEDD GeoChat is an open source project for collaboration over SMS). If successful, this project will allow other clients (e.g., Nokia EpiSurveyor or RapidSMS) to synchronize information bi-directionally, phone-to-phone or phone-to-server, following the same approach.

Competing interests

InSTEDD is supported by research funding from Google.org and the Rockefeller Foundation.