Spreadsheet Composition for Collaborative Data Analysis
Michele Stecca (ICSI & CIPI), Berkeley, January 15th, 2015


1

Spreadsheet Composition for Collaborative Data Analysis

Michele Stecca (ICSI & CIPI)
Berkeley, January 15th, 2015

2

Outline

1. Introduction
2. Spreadsheet Composition
3. The SpreadSheet Space Software Platform
4. Information System Access from the SpreadSheet Space
5. Discussion
6. Summary

3

Introduction

Why Spreadsheet Composition?

• Data manipulation/visualization/sharing is becoming more and more important (Open Data, Big Data, Data Scientists, etc.)

• Collaborative work is becoming more and more important

• Microsoft Excel is a widely used tool for data analysis (1.1B users according to Microsoft)

• Microsoft Excel is not the best tool to use in a collaborative environment

New paradigms and tools are needed to face new challenges like real-time reporting, easy data publishing, collaborative data analysis, etc.

4

Outline

1. Introduction
2. Spreadsheet Composition
3. The SpreadSheet Space Software Platform
4. Information System Access from the SpreadSheet Space
5. Discussion
6. Summary

5

Spreadsheet Composition (1/7)

Base Services: Excel spreadsheets

We need to define the «basic functionalities» provided by a spreadsheet:
• Information Publication. A Source User publishes a data range called View to a set of Target Users. (Watch Video: https://www.youtube.com/watch?v=hM5bsdgF4Mc)
• Information Collection. A User provides a data range called Form to one or more Users to have them fill it out, update it and submit it. (Watch Video: https://www.youtube.com/watch?v=41IV2pNJqy0)
• In both cases, at every update of a Worksheet/Cell-Range/Table in the Source User Spreadsheet the Target User Spreadsheets get automatically synchronized and the personalized analyses and presentations change accordingly. Very important for real-time data analysis in Excel.

6

Spreadsheet Composition (2/7)

Base Services: Excel spreadsheets

Information Publication steps:
• A Source User Exposes a View on a Spreadsheet to a set of Target Users, i.e., grants the Target Users read access rights on that View;
• The Target Users Link their Spreadsheets to that View.

Information Collection steps:
• A Source User Prepares a Form on a Spreadsheet to be filled out by a set of Target Users;
• Each Target User Installs the Form in his or her Spreadsheet;
• The Source User receives the updates from the Target Users.
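The Expose/Link and Prepare/Install/Submit steps can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class names `View` and `Form` follow the slide's terminology, but every method name and field here is hypothetical and is not taken from the SpreadSheet Space Add-in or Server.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical model of a View: a data range exposed read-only."""
    owner: str
    cells: dict                                   # e.g. {"A1": 100}
    readers: set = field(default_factory=set)     # Target Users with read access
    images: dict = field(default_factory=dict)    # target user -> local copy

    def link(self, user):
        # A Target User can only link to a View that was exposed to them.
        if user not in self.readers:
            raise PermissionError(f"{user} has no read access to this View")
        self.images[user] = dict(self.cells)
        return self.images[user]

    def update(self, cell, value):
        # Every update by the Source User is propagated to all linked images.
        self.cells[cell] = value
        for image in self.images.values():
            image[cell] = value


@dataclass
class Form:
    """Hypothetical model of a Form: a data range to be filled out."""
    owner: str
    fields: tuple
    installed: set = field(default_factory=set)
    submissions: dict = field(default_factory=dict)

    def install(self, user):
        self.installed.add(user)

    def submit(self, user, values):
        if user not in self.installed:
            raise PermissionError(f"{user} has not installed this Form")
        unknown = set(values) - set(self.fields)
        if unknown:
            raise ValueError(f"unknown form fields: {sorted(unknown)}")
        # The Source User sees the latest submission of each Target User.
        self.submissions[user] = dict(values)


view = View(owner="alice", cells={"A1": 100}, readers={"bob"})
bob_image = view.link("bob")
view.update("A1", 120)
print(bob_image)  # {'A1': 120}
```

In this sketch the Target User's image is updated in place, which mirrors the automatic synchronization described on the previous slide; a real deployment would go through the server instead of sharing memory.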

7

Spreadsheet Composition (3/7)

A New Paradigm: the SpreadSheet Space

[Figure: analogy diagram. Hypertext + Internet = World Wide Web; Spreadsheet + Internet = SpreadSheet Space]

8

A New Paradigm: the SpreadSheet Space

Spreadsheet Composition (4/7)

9

Spreadsheet Composition (5/7)

Composite Services: the Distributed Spreadsheet

• Definition
  • Associated with a single Virtual Spreadsheet consisting of elements belonging to different users;
  • Evolving over time;
  • Based on cross Spreadsheet links.
• 2 types of Spreadsheet Compositions:
  • Spontaneous compositions: they are the results of peer-2-peer interactions among Excel users;
  • Graphical compositions: there is a Spreadsheet Composition creator (e.g., an Excel consultant) who defines the set of relationships among spreadsheets through a graphical tool.

Spreadsheet Composition (6/7)

Spontaneous Composition: the composition is a result of peer-2-peer interactions

Spreadsheet Composition (7/7)

The Graphical Tool used by the Composite Service Creator

- the composition is defined BEFORE the interaction among users
- the relationships/links are created by the platform automatically
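To make the idea of a composition defined before any interaction more concrete, here is a minimal sketch of how a platform could derive the links from a declarative description. The dictionary layout, the `Link` record and `build_links` are hypothetical names chosen for illustration; the slides do not describe the graphical tool's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical link record: target_user reads `view` of source_user."""
    source_user: str
    view: str
    target_user: str


def build_links(composition):
    """Expand {(source_user, view_name): [target_user, ...]} into Link records."""
    links = []
    for (source, view), targets in composition.items():
        for target in targets:
            if target == source:
                raise ValueError(f"{source} cannot link to their own view {view!r}")
            links.append(Link(source, view, target))
    return links


# A composition as the creator might draw it: A exposes "Sales" to B and C,
# B exposes "Forecast" to C.
composition = {
    ("A", "Sales"): ["B", "C"],
    ("B", "Forecast"): ["C"],
}
links = build_links(composition)
for link in links:
    print(link)
```

The point of the sketch is the ordering: the whole set of links exists as data before any user opens a spreadsheet, in contrast with spontaneous compositions where each link is created by a user action.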

12

Outline

1. Introduction
2. Spreadsheet Composition
3. The SpreadSheet Space Software Platform
4. Information System Access from the SpreadSheet Space
5. Discussion
6. Summary

13

The SpreadSheet Space Software Platform (1/6)

• How can the links between two Spreadsheets be implemented if it cannot be guaranteed that the Source Spreadsheet and the Target Spreadsheet are simultaneously open?
  • Persistence must be provided by a Server

-> The SpreadSheet Space requires a Software Platform

[Figure labels: Spreadsheet Aligner; Persistency Service]

14

The SpreadSheet Space Software Platform (2/6)

High Level Architecture

[Figure: architecture diagram with three planes (Data Plane, Control Plane, Composition Plane) and the components Range/Table/Sheet Synchronizer, Link Controller, Form/Link Repository, Graphical Tool, Composite Spreadsheet Repository and Spontaneous Interactions]

15

The SpreadSheet Space Software Platform (3/6)

Data Plane, i.e., how to synchronize the Spreadsheets

• Persistence is provided by a Cloud Platform
• The Add-in communicates with the Server through Web APIs
• There is a publish/subscribe mechanism based on HTTP long polling for automatic updates
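The core of a long-polling publish/subscribe mechanism can be sketched without any HTTP machinery: the subscriber's request blocks on the server until a newer version of the data exists or a timeout expires. The class below is an in-process illustration only; `ViewChannel`, `publish` and `poll` are hypothetical names, and the real platform exposes this behaviour through its Web APIs rather than through a shared object.

```python
import threading


class ViewChannel:
    """In-process sketch of one long-polled View (hypothetical names)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._data = None

    def publish(self, data):
        # Called when the Source User updates the View.
        with self._cond:
            self._version += 1
            self._data = data
            self._cond.notify_all()   # wake every waiting subscriber
            return self._version

    def poll(self, last_seen, timeout=30.0):
        """Block until a version newer than last_seen exists.

        Returns (version, data), or None if the timeout expires first;
        on None the client would simply issue the next long poll.
        """
        with self._cond:
            fresh = self._cond.wait_for(
                lambda: self._version > last_seen, timeout
            )
            if not fresh:
                return None
            return self._version, self._data


channel = ViewChannel()
# Simulate a Source User publishing 0.1 s after the subscriber starts waiting.
publisher = threading.Timer(0.1, channel.publish, args=[{"A1": 42}])
publisher.start()
print(channel.poll(last_seen=0, timeout=2.0))  # (1, {'A1': 42})
publisher.join()
```

Tracking a version number lets a subscriber that reconnects after a missed update receive it immediately, since the predicate is already true when the next poll arrives.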

16

The SpreadSheet Space Software Platform (4/6)

Control Plane, i.e., how to establish Links among Spreadsheets

(A) Spontaneous Compositions (a.k.a. User Explicit Configuration)
• Information Publication. The Source User publishes Spreadsheet ranges/tables/sheets to a set of Target Users.
• Information Collection. A User provides a form to one or more Users to have them fill it out, update it and submit it.

17

The SpreadSheet Space Software Platform (5/6)

Control Plane, i.e., how to establish Links among Spreadsheets

(B) Through a Graphical Composite Spreadsheet Creation Environment

ROLES:
• Composite Service Creator
  • Creates a Composite Spreadsheet by specifying
    • The Users (A, B, …) involved
    • The Views/Forms exposed
• Users
  • Enter the Composite Spreadsheet Environment
  • Create the appropriate Views and Forms

18

The SpreadSheet Space Software Platform (6/6)

Implementation insights

Microsoft Excel Add-in
• Developed as an Office Add-in Framework component (C#)
• Downloaded and installed on the user terminal
• Fully integrated with Excel

SpreadSheetSpace Server
• REST Web Services technology
• Apache Tomcat
• Deployed in-the-cloud or on-premises
• Scalable and Elastic architecture

19

Outline

1. Introduction
2. Spreadsheet Composition
3. The SpreadSheet Space Software Platform
4. Information System Access from the SpreadSheet Space
5. Discussion
6. Summary

20

Information System access through Spreadsheets

Information Systems: a special type of Base Service

• Companies and organizations
  • Expose Views of company data in the form of Worksheets;
• Spreadsheet users
  • Link spreadsheets to exposed views

Users Mash-Up data exposed by different sources and maintain the combined analyses/presentations synchronized with the corporate data.

[Figure label: SpreadSheet Space]

21

Information System Integration at the Desktop

22

The SpreadSheet Space Platform for Information System Access

[Figure: architecture diagram showing the SpreadSheetSpace Server, a Firewall/Proxy Aware Network, two SSS Add-ins, the Public APIs, two Adaptors, Information System 1 and Information System 2]

23

Outline

1. Introduction
2. Spreadsheet Composition
3. The SpreadSheet Space Software Platform
4. Information System Access from the SpreadSheet Space
5. Discussion
6. Summary

24

Google Sheets vs. SpreadSheetSpace

• Google Sheets is about sharing
  • It is not Excel! (Limited functionalities and compatibility problems)
  • Symmetry – The users that share a Spreadsheet have the same access rights. They can read it and write it freely.
  • Spreadsheet level granularity – Sharing applies to whole Spreadsheets and not to parts of them. Either a Spreadsheet is shared or it is not.

• SpreadSheetSpace is about linking
  • It’s Excel!
  • Asymmetry – The user roles are complementary. By exposing a View a source user grants the target users read access rights on it. By linking to a View the target users create an image of it in their Spreadsheets.
  • Cell level granularity – Users are allowed to expose worksheets, cell ranges and tables while keeping the rest of the Spreadsheet private.
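Cell level granularity boils down to a containment check: a Target User may read a cell only if it falls inside a range that was exposed to that user. The sketch below illustrates the check on A1-style references; the function names and the `exposed` list layout are hypothetical, and the platform's actual access-control logic is not described in the slides.

```python
import re


def parse_cell(ref):
    """Convert an A1-style reference such as 'C5' into (column, row) numbers."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref)
    if not match:
        raise ValueError(f"not a cell reference: {ref!r}")
    column = 0
    for letter in match.group(1):            # 'A' -> 1, 'Z' -> 26, 'AA' -> 27
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, int(match.group(2))


def parse_range(ref):
    """Convert 'B2:D10' (or a single cell) into (col1, row1, col2, row2)."""
    start, _, end = ref.partition(":")
    c1, r1 = parse_cell(start)
    c2, r2 = parse_cell(end or start)
    return min(c1, c2), min(r1, r2), max(c1, c2), max(r1, r2)


def can_read(user, cell, exposed):
    """exposed: list of (range_ref, set_of_target_users) for one spreadsheet."""
    column, row = parse_cell(cell)
    for range_ref, readers in exposed:
        c1, r1, c2, r2 = parse_range(range_ref)
        if user in readers and c1 <= column <= c2 and r1 <= row <= r2:
            return True
    return False                              # everything else stays private


exposed = [("B2:D10", {"bob"})]
print(can_read("bob", "C5", exposed))    # True: inside the exposed range
print(can_read("bob", "A1", exposed))    # False: outside the exposed range
print(can_read("carol", "C5", exposed))  # False: range not exposed to carol
```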

25

Dynamic Data

• SpreadSheet Space focuses on Dynamic Data, i.e., on data that evolve over time.
• One specific case of Dynamic Data is Open Data.
• For Dynamic Data the «Export to Excel» functionality, offered by most Information Systems, is of little use: saving a view provided by an Information System in Excel format only takes a snapshot of the Information System state at the time of saving.
• The Link functionality offered in the SpreadSheet Space enriches the Excel analysis tools by guaranteeing synchronization between the Excel views and the actual Information System state.

26

Scalable Information System Access (1/2)

• Excel can access external Information Systems through built-in query functionalities.
• Dynamic data evolution can be captured through polling, which puts a heavy load on the Information Systems.
• SpreadSheet Space provides a Publish/Subscribe service which eliminates polling of the Information Systems.
• The load to support interaction is transferred from the Information Systems to the SpreadSheet Space Platform.
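A back-of-the-envelope model shows why this shift matters. It is a simplification under two assumptions that are not stated in the slides: with polling, every client queries the Information System at a fixed interval; with publish/subscribe, the Information System sends each change once to the platform, which then notifies every subscriber. The function names and the numbers are purely illustrative.

```python
def polling_queries_per_hour(clients, poll_interval_s):
    """Queries hitting the Information System when every client polls it."""
    return clients * 3600 / poll_interval_s


def pubsub_messages_per_hour(clients, updates_per_hour):
    """Messages per hour under the assumed publish/subscribe model."""
    return {
        # One message per change, independent of the number of clients.
        "information_system": updates_per_hour,
        # The fan-out to subscribers is carried by the platform.
        "platform": clients * updates_per_hour,
    }


# Illustrative numbers: 1000 spreadsheets, polling once a minute,
# while the underlying data changes 4 times per hour.
print(polling_queries_per_hour(1000, 60))   # 60000.0
print(pubsub_messages_per_hour(1000, 4))    # {'information_system': 4, 'platform': 4000}
```

Under these assumptions the Information System's load no longer grows with the number of spreadsheets; the per-client cost moves to the platform.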

27

Scalable Information System Access (2/2)

[Figure: two panels. "Native Excel Functionality": each spreadsheet sends its own "SELECT * FROM Table1" query. "With SpreadSheetSpace": a single "SELECT * FROM Table1" query, and the SpreadSheet Space delivers "View 1" to each spreadsheet.]

Some Users may expose personalized views of corporate data to other end users.

28

SYNC + DSS

Spreadsheet Ecosystems

Combining Information System Access and direct Excel links

29

Manual vs. Automatic Spreadsheet Update

• Manual Update
  • The Target Users
    • are requested to confirm acceptance of View updates, and
    • can scan the update history.
• Automatic Update
  • All the target users are “in sync” with the exposed Views
  • Data Integrity (no different data versions) is guaranteed.
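The difference between the two update modes can be illustrated with a small sketch of the Target User's side of a link. `LinkedView` and its methods are hypothetical names; in particular, recording the history in both modes is an assumption made here for simplicity, since the slide only mentions the history for manual updates.

```python
class LinkedView:
    """Hypothetical target-side image of a View with a selectable update mode."""

    def __init__(self, mode="automatic"):
        if mode not in ("automatic", "manual"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.current = {}    # cell values visible in the target spreadsheet
        self.pending = []    # updates waiting for confirmation (manual mode)
        self.history = []    # every update received, in arrival order

    def receive(self, update):
        self.history.append(dict(update))
        if self.mode == "automatic":
            self.current.update(update)        # always in sync with the View
        else:
            self.pending.append(dict(update))  # wait for the user's decision

    def accept_pending(self):
        # Manual mode: the Target User confirms acceptance of the updates.
        for update in self.pending:
            self.current.update(update)
        self.pending.clear()


auto = LinkedView("automatic")
auto.receive({"A1": 10})
print(auto.current)      # {'A1': 10}

manual = LinkedView("manual")
manual.receive({"A1": 10})
print(manual.current)    # {}
manual.accept_pending()
print(manual.current)    # {'A1': 10}
```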

30

Easy Publication of Tabular Contents and of Graphical Presentations

• Very important for Open Data
• Although Excel already offers functionalities to publish data, a certain degree of publishing experience is necessary.
• The SpreadSheet Space Platform turns out to be an easy-to-use Content Management System for tabular data and graphical presentations.
• A TabularData/Presentation repository enables the development and the diffusion of Data Marketplaces in the SpreadSheet Space.

31

Outline

1. Introduction
2. Spreadsheet Composition
3. The SpreadSheet Space Software Platform
4. Information System Access from the SpreadSheet Space
5. Discussion
6. Summary

32

Summary

• The SpreadSheet Space is a space in which Excel files connected to each other and/or to external Information Systems can live.
• Spreadsheet Composition is a special case of Service Composition.
• Spreadsheet Composition was developed in two directions, namely Excel to Excel interconnection and Excel to Information System interconnection.
• Special features: Composite Spreadsheets, Linking vs Sharing, Dynamic Data, Ease of Publication.

33

Email: [email protected]
Twitter: @steccami / @sss_excel

We are looking for early adopters

www.spreadsheetspace.net