SpreadSheet Space seminar at ICSI
Spreadsheet Composition for Collaborative Data Analysis
Michele Stecca (ICSI & CIPI)
Berkeley, January 15th, 2015
Outline
1. Introduction
2. Spreadsheet Composition
3. The SpreadSheet Space Software Platform
4. Information System Access from the SpreadSheet Space
5. Discussion
6. Summary
Introduction
Why Spreadsheet Composition?
• Data manipulation/visualization/sharing is becoming more and more important (Open Data, Big Data, Data Scientists, etc.)
• Collaborative work is becoming more and more important
• Microsoft Excel is a widely used tool for data analysis (1.1B users according to Microsoft)
• Microsoft Excel is not the best tool to use in a collaborative environment
New paradigms and tools are needed to face new challenges like real-time reporting, easy data publishing, collaborative data analysis, etc.
Spreadsheet Composition (1/7)
Base Services: Excel spreadsheets
We need to define the «basic functionalities» provided by a spreadsheet:
• Information Publication. A Source User publishes a data range, called a View, to a set of Target Users. (Watch Video: https://www.youtube.com/watch?v=hM5bsdgF4Mc)
• Information Collection. A User provides a data range, called a Form, to one or more Users to have them fill it out, update it and submit it. (Watch Video: https://www.youtube.com/watch?v=41IV2pNJqy0)
• In both cases, at every update of a Worksheet/Cell-Range/Table in the Source User Spreadsheet, the Target User Spreadsheets are automatically synchronized and the personalized analyses and presentations change accordingly. This is very important for real-time data analysis in Excel.
Spreadsheet Composition (2/7)
Base Services: Excel spreadsheets
Information Publication steps:
• A Source User Exposes a View on a Spreadsheet to a set of Target Users, i.e., grants the Target Users read access rights on that View;
• The Target Users Link their Spreadsheets to the View.
Information Collection steps:
• A Source User Prepares a Form on a Spreadsheet to be filled out by a set of Target Users;
• Each Target User Installs the Form in their Spreadsheet;
• The Source User receives the updates from the Target Users.
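The Expose/Link and Prepare/Install/submit steps can be sketched as a small in-memory model. This is only an illustration of the roles and access rights described on the slide, not the SpreadSheet Space implementation: the class names `View` and `Form`, their methods, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical model of a published data range (read access only)."""
    owner: str
    cells: list[list[object]]
    readers: set[str] = field(default_factory=set)
    linked: set[str] = field(default_factory=set)

    def expose(self, *target_users: str) -> None:
        # Publication step 1: the Source User grants read access.
        self.readers.update(target_users)

    def link(self, user: str) -> list[list[object]]:
        # Publication step 2: a Target User links a spreadsheet to the View.
        if user not in self.readers:
            raise PermissionError(f"{user} has no read access to this View")
        self.linked.add(user)
        # Independent copy: edits by the target never reach the source.
        return [row[:] for row in self.cells]


@dataclass
class Form:
    """Hypothetical model of a data range to be filled out by Target Users."""
    owner: str
    fields: list[str]
    installed: set[str] = field(default_factory=set)
    submissions: dict[str, dict[str, object]] = field(default_factory=dict)

    def install(self, user: str) -> None:
        # Collection step 2: a Target User installs the Form.
        self.installed.add(user)

    def submit(self, user: str, values: dict[str, object]) -> None:
        # Collection step 3: the Source User receives the update.
        if user not in self.installed:
            raise PermissionError(f"{user} has not installed this Form")
        unknown = set(values) - set(self.fields)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        self.submissions[user] = dict(values)


# Usage with made-up users and data.
view = View(owner="alice", cells=[["Q1", 100], ["Q2", 120]])
view.expose("bob")
snapshot = view.link("bob")

form = Form(owner="alice", fields=["hours"])
form.install("bob")
form.submit("bob", {"hours": 8})
```

The asymmetry of roles is visible in the model: only the owner's data is copied outward for a View, and only submissions flow back for a Form.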
Spreadsheet Composition (3/7)
A New Paradigm: the SpreadSheet Space
[Figure: analogy between Hypertext / Internet / World Wide Web and Spreadsheet / Internet / SpreadSheet Space]
Spreadsheet Composition (5/7)
Composite Services: the Distributed Spreadsheet
• Definition
  • Associated with a single Virtual Spreadsheet consisting of elements belonging to different users;
  • Evolving over time;
  • Based on cross Spreadsheet links.
• 2 types of Spreadsheet Compositions:
  • Spontaneous compositions: they are the results of peer-2-peer interactions among Excel users;
  • Graphical compositions: there is a Spreadsheet Composition creator (e.g., an Excel consultant) who defines the set of relationships among spreadsheets through a graphical tool.
Spreadsheet Composition (6/7)
Spontaneous Composition: the composition is a result of peer-2-peer interactions
Spreadsheet Composition (7/7)
The Graphical Tool used by the Composite Service Creator
- the composition is defined BEFORE the interaction among users
- the relationships/links are created by the platform automatically
The SpreadSheet Space Software Platform (1/6)
• How can the links between two Spreadsheets be implemented if it cannot be guaranteed that the Source Spreadsheet and the Target Spreadsheet are simultaneously open?
• Persistence must be provided by a Server
-> The SpreadSheet Space requires a Software Platform
[Figure labels: Spreadsheet Aligner, Persistency Service]
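Why a server is needed can be shown with a minimal sketch: the source publishes while the target is closed, and the target catches up later. The class `PersistenceServer`, its methods, and the version counter are assumptions made for illustration; the slides do not describe the actual storage scheme.

```python
class PersistenceServer:
    """Hypothetical store keeping the latest state of each View plus a version."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[int, list[list[object]]]] = {}

    def publish(self, view_id: str, cells: list[list[object]]) -> int:
        # Each publication replaces the stored state and bumps the version.
        version = self._store.get(view_id, (0, []))[0] + 1
        self._store[view_id] = (version, [row[:] for row in cells])
        return version

    def fetch(self, view_id: str, known_version: int = 0):
        """Return (version, cells) if newer than known_version, else None."""
        version, cells = self._store[view_id]
        if version <= known_version:
            return None
        return version, [row[:] for row in cells]


# The Source Spreadsheet publishes and is then closed.
server = PersistenceServer()
server.publish("budget", [["Q1", 100]])
server.publish("budget", [["Q1", 110]])

# The Target Spreadsheet is opened later and still obtains the latest state.
latest = server.fetch("budget")
```

Because the state lives on the server, neither spreadsheet has to be open when the other one reads or writes.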
The SpreadSheet Space Software Platform (2/6)
High Level Architecture
[Figure: architecture diagram with three planes]
• Data Plane: Range/Table/Sheet Synchronizer
• Control Plane: Link Controller, Form/Link Repository
• Composition Plane: Graphical Tool, Composite Spreadsheet Repository, Spontaneous Interactions
The SpreadSheet Space Software Platform (3/6)
Data Plane… i.e., how to Synchronize the Spreadsheets
• Persistence is provided by a Cloud Platform
• The Add-in communicates with the Server through Web APIs
• There is a publish/subscribe mechanism based on HTTP long polling for automatic updates
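The long-polling idea behind the publish/subscribe mechanism can be sketched in-process: a subscriber's request blocks until a newer version exists or a timeout expires, and is then re-issued. No HTTP is involved here; the class `LongPollChannel` and its methods are hypothetical stand-ins for the server endpoint, which the slides do not specify.

```python
import threading


class LongPollChannel:
    """In-process stand-in for a long-polling endpoint (illustrative only)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._payload = None

    def publish(self, payload) -> int:
        with self._cond:
            self._version += 1
            self._payload = payload
            self._cond.notify_all()  # wake every waiting subscriber
            return self._version

    def poll(self, known_version: int, timeout: float):
        """Block until a version newer than known_version exists, or time out."""
        with self._cond:
            updated = self._cond.wait_for(
                lambda: self._version > known_version, timeout=timeout
            )
            if not updated:
                return None  # a real client would simply re-issue the request
            return self._version, self._payload


# One subscriber waits; the publisher's update releases it.
channel = LongPollChannel()
results = []
subscriber = threading.Thread(
    target=lambda: results.append(channel.poll(0, timeout=5.0))
)
subscriber.start()
channel.publish([["Q1", 100]])
subscriber.join()
```

The subscriber receives the update as soon as it is published, without repeatedly asking whether anything has changed.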
The SpreadSheet Space Software Platform (4/6)
Control Plane… i.e., how to establish Links among Spreadsheets
(A) Spontaneous Compositions (a.k.a. User Explicit Configuration)
• Information Publication. The Source User publishes Spreadsheet ranges/tables/sheets to a set of Target Users.
• Information Collection. A User provides a form to one or more Users to have them fill it out, update it and submit it.
The SpreadSheet Space Software Platform (5/6)
Control Plane… i.e., how to establish Links among Spreadsheets
(B) Through a Graphical Composite Spreadsheet Creation Environment
ROLES:
• Composite Service Creator
  • Creates a Composite Spreadsheet by specifying
    • The Users (A, B, …) involved
    • The Views/Forms exposed
• Users
  • Enter the Composite Spreadsheet Environment
  • Create the appropriate Views and Forms
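A Composite Spreadsheet specification of this kind can be pictured as data from which the links follow mechanically. The sketch below assumes a hypothetical `Exposure` record and a `derive_links` helper; the real format used by the graphical tool is not given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    """Hypothetical record: one View or Form declared by the creator."""
    kind: str                 # "view" or "form"
    name: str
    source: str               # user who exposes the View / prepares the Form
    targets: tuple[str, ...]  # users who link to it / install it


def derive_links(
    users: set[str], exposures: list[Exposure]
) -> list[tuple[str, str, str]]:
    """Return the (source, target, name) links implied by the composition."""
    links = []
    for exp in exposures:
        for user in (exp.source, *exp.targets):
            if user not in users:
                raise ValueError(f"{user} is not part of the composition")
        for target in exp.targets:
            links.append((exp.source, target, exp.name))
    return links


# A made-up composition with users A, B, C.
users = {"A", "B", "C"}
exposures = [
    Exposure("view", "sales", "A", ("B", "C")),
    Exposure("form", "forecast", "B", ("A",)),
]
links = derive_links(users, exposures)
```

Here the creator only declares who is involved and what is exposed; the link list is computed, which mirrors the idea that links are set up before the users interact.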
The SpreadSheet Space Software Platform (6/6)
Implementation insights
Microsoft Excel Add-in
• Developed as an Office Add-in Framework component (C#)
• Downloaded and installed on the user terminal
• Fully integrated with Excel
SpreadSheetSpace Server
• REST Web Services technology
• Apache Tomcat
• Deployed in the cloud or on-premises
• Scalable and Elastic architecture
Information System access through Spreadsheets
Information Systems: a special type of Base Service
• Companies and organizations
  • Expose Views of company data in the form of Worksheets;
• Spreadsheet users
  • Link spreadsheets to exposed views
Users Mash-Up data exposed by different sources and keep the combined analyses/presentations synchronized with the corporate data.
[Figure label: SpreadSheet Space]
Information System Integration at the Desktop
The SpreadSheet Space Platform for Information System Access
[Figure: architecture diagram. Labels: SpreadSheetSpace Server, Firewall/Proxy Aware Network, SSS Addin (x2), Public APIs, Adaptor (x2), Information System 1, Information System 2]
Google Sheets vs. SpreadSheetSpace
• Google Sheets is about sharing
  • It is not Excel! (Limited functionalities and compatibility problems)
  • Symmetry – The users that share a Spreadsheet have the same access rights. They can read it and write it freely.
  • Spreadsheet level granularity – Sharing applies to whole Spreadsheets and not to parts of them. Either a Spreadsheet is shared or it is not.
• SpreadSheetSpace is about linking
  • It’s Excel!
  • Asymmetry – The user roles are complementary. By exposing a View a source user grants the target users read access rights on it. By linking to a View the target users create an image of it in their Spreadsheets.
  • Cell level granularity – Users are allowed to expose worksheets, cell ranges and tables while keeping the rest of the Spreadsheet private.
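Cell level granularity can be illustrated by extracting only an A1-style range from a sheet, so that everything outside the range never leaves the source. The helpers `column_index` and `extract_range` and the sample sheet are hypothetical; this is a sketch of the idea, not the add-in's code.

```python
import re


def column_index(letters: str) -> int:
    """Convert a column label such as 'A' or 'AB' to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def extract_range(sheet: list[list[object]], a1_range: str) -> list[list[object]]:
    """Return a copy of the cells inside an A1-style range such as 'B2:C3'."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", a1_range)
    if match is None:
        raise ValueError(f"not an A1-style range: {a1_range!r}")
    col1, row1, col2, row2 = match.groups()
    c1, c2 = sorted((column_index(col1), column_index(col2)))
    r1, r2 = sorted((int(row1) - 1, int(row2) - 1))
    return [row[c1:c2 + 1] for row in sheet[r1:r2 + 1]]


# Made-up sheet: only region and sales are exposed, the margin stays private.
sheet = [
    ["Region", "Sales", "Margin"],
    ["North", 100, 0.2],
    ["South", 80, 0.1],
]
exposed = extract_range(sheet, "A2:B3")
```

The exposed View contains the two selected columns only; the Margin column is never part of what the target users receive.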
Dynamic Data
• SpreadSheet Space focuses on Dynamic Data, i.e., on data that evolve over time.
• One specific case of Dynamic Data is Open Data.
• With Dynamic Data, the «Export to Excel» functionality offered by most Information Systems is meaningless: saving a view provided by an Information System in Excel format means taking a snapshot of the Information System's state at the time of saving.
• The Link functionality offered in the SpreadSheet Space enriches the Excel analysis tools by guaranteeing synchronization between the Excel views and the actual state of the Information System.
Scalable Information System Access (1/2)
• Excel can access external Information Systems through built-in query functionalities.
• Dynamic data evolution can be captured through polling, which places a heavy load on the Information Systems (the load grows with the number of users and the polling frequency).
• SpreadSheet Space provides a Publish/Subscribe service which eliminates polling.
• The load to support interaction is transferred from the Information Systems to the SpreadSheet Space Platform.
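A back-of-the-envelope comparison makes the load argument concrete. The numbers below (500 users, a 30-second polling interval, 12 data changes per hour) are invented for illustration, and the model assumes that with publish/subscribe the Information System is read once per change while the platform handles the fan-out.

```python
def polling_queries(users: int, poll_interval_s: float, window_s: float) -> float:
    """Queries reaching the Information System when every user polls it directly."""
    return users * (window_s / poll_interval_s)


def pubsub_queries(changes_in_window: int) -> int:
    """Queries when the source is read once per change and the platform fans out."""
    return changes_in_window


# Illustrative figures only: one hour of activity.
direct = polling_queries(users=500, poll_interval_s=30, window_s=3600)
via_platform = pubsub_queries(changes_in_window=12)
```

Under these assumptions the Information System answers 60,000 queries per hour with direct polling and 12 with publish/subscribe; the remaining work is delivering updates to subscribers, which is carried by the platform.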
Scalable Information System Access (2/2)
[Figure: two panels. Native Excel Functionality: each spreadsheet issues its own "SELECT * FROM Table1" query. With SpreadSheetSpace: a single "SELECT * FROM Table1" query reaches the Information System and the SpreadSheet Space delivers View 1 to each spreadsheet.]
Spreadsheet Ecosystems
Combining Information System Access and direct Excel links
Some Users may expose personalized views of corporate data to other end users.
[Figure label: SYNC + DSS]
Manual vs. Automatic Spreadsheet Update
• Manual Update
  • The Target Users
    • are requested to confirm acceptance of View updates, and
    • can scan the update history.
• Automatic Update
  • All the target users are “in sync” with the exposed Views
  • Data Integrity (no different data versions) is guaranteed.
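The difference between the two update modes can be sketched with a small state holder on the target side. The class `LinkedView` and its fields are hypothetical; the sketch only mirrors the behaviour listed above (confirmation and history in manual mode, immediate alignment in automatic mode).

```python
from dataclasses import dataclass, field


@dataclass
class LinkedView:
    """Hypothetical target-side state of a linked View."""
    mode: str                                    # "manual" or "automatic"
    current: object = None                       # what the spreadsheet shows
    pending: list = field(default_factory=list)  # updates awaiting confirmation
    history: list = field(default_factory=list)  # every update received

    def receive(self, update) -> None:
        self.history.append(update)
        if self.mode == "automatic":
            self.current = update        # target is always in sync
        else:
            self.pending.append(update)  # wait for the user's confirmation

    def accept_latest(self) -> None:
        """Manual mode: the Target User confirms the most recent update."""
        if self.pending:
            self.current = self.pending[-1]
            self.pending.clear()


auto = LinkedView("automatic")
manual = LinkedView("manual")
for update in ["v1", "v2"]:
    auto.receive(update)
    manual.receive(update)

shown_before_confirm = manual.current
manual.accept_latest()
```

In automatic mode every target shows the same version as the source; in manual mode a target may lag until the user accepts, but can always inspect the history of what was received.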
Easy Publication of Tabular Contents and of Graphical Presentations
• Very important for Open Data
• Although Excel already offers functionalities to publish data, a certain degree of publishing experience is necessary.
• The SpreadSheet Space Platform turns out to be an easy-to-use Content Management System for tabular data and graphical presentations.
• A TabularData/Presentation repository enables the development and the diffusion of Data Marketplaces in the SpreadSheet Space.
Summary
• The SpreadSheet Space is a space in which Excel files connected to each other and/or to external Information Systems can live.
• Spreadsheet Composition is a special case of Service Composition.
• Spreadsheet Composition was developed in two directions, namely Excel to Excel interconnection and Excel to Information System interconnection.
• Special features: Composite Spreadsheets, Linking vs Sharing, Dynamic Data, Ease of Publication.
Email: [email protected]
Twitter: @steccami / @sss_excel
We are looking for early adopters
www.spreadsheetspace.net