
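[Figure: network architecture with a shared representation Φ, outcome heads h_0 and h_1 for t = 0 and t = 1, and an IPM penalty between the representation distributions of the two treatment groups]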

Failing to adjust for the distributional difference between treatment groups may lead to biased predictions and policies. To overcome this problem, we proposed to learn representations of patients that are balanced [5, 6]: they retain information that is both predictive and similar across groups. This is done by trading off predictive accuracy for distributional balance in representations.

We realize this idea as a deep neural network with imbalance regularization, achieving state-of-the-art results on counterfactual inference benchmarks [6].
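Below is a minimal sketch of such an objective. It assumes a one-layer ReLU representation and two linear outcome heads, one per treatment, and all weight names are hypothetical. The factual loss scores each unit only with the head for the treatment it actually received. The imbalance term is the distance between the treated and control means of the representation, which is a linear-kernel MMD. This is a simple stand-in for the integral probability metrics used in [6], not the authors' implementation, and training the weights by gradient descent is omitted.

```python
import numpy as np


def mean_distance(phi_treated: np.ndarray, phi_control: np.ndarray) -> float:
    """Euclidean distance between the group means of the representation.

    This is a linear-kernel MMD, used here as a simple stand-in for an IPM.
    """
    return float(np.linalg.norm(phi_treated.mean(axis=0) - phi_control.mean(axis=0)))


def balanced_objective(x, t, y, w_phi, w_h0, w_h1, alpha):
    """Factual loss plus alpha times representation imbalance.

    x: (n, d) covariates, t: (n,) treatments in {0, 1}, y: (n,) observed outcomes.
    w_phi, w_h0, w_h1: hypothetical weights of a one-layer representation and two linear heads.
    """
    phi = np.maximum(x @ w_phi, 0.0)  # representation Phi(x): one ReLU layer
    # Each unit is scored only by the head for the treatment it actually received.
    pred = np.where(t == 1, phi @ w_h1, phi @ w_h0)
    factual_loss = float(np.mean((pred - y) ** 2))
    imbalance = mean_distance(phi[t == 1], phi[t == 0])
    return factual_loss + alpha * imbalance, factual_loss, imbalance
```

Increasing `alpha` trades factual accuracy for balance between the groups. This is the trade-off described above.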

Counterfactual inference with deep learning

Learning with geometric embeddings of graphs

Graphs are used to represent problems and data in diverse fields. In chemoinformatics, graphs represent molecules, with atoms as nodes and bonds as edges, and an important problem is to predict the toxicity of a compound. This is an instance of graph classification.

Most classification algorithms operate only on continuous, vector-valued data. In my research, I have shown how to leverage both the expressive power of graphs and the strength of standard machine learning tools by representing graphs as geometric objects [2, 4] and developing theory and algorithms for the use of these graph embeddings in classification [1, 3] and clustering [4].

Learning from graphs | Counterfactual inference

PhD advisor: Devdatt Dubhashi. CSE, Chalmers University of Technology

Fredrik D. Johansson, PhD. Currently a postdoc at MIT.

Recent success stories in deep learning have made policy makers eager to apply such tools for decision support in diverse areas like medicine, education and advertising. Unfortunately, these developments have failed to address one of the most crucial aspects of policy making: causality. What effect will a choice of medication have on a patient? How will job training change unemployment rates? Can tutoring improve the grades of high-school students?

Measuring similarity of graphs

Counterfactual inference from observational data

[Figure: a molecule drawn as a graph of atoms (O, H, CH3) and bonds, asking "Toxic?"]

Classifying graphs using geometric embeddings

A graph is made up of nodes and edges and is inherently combinatorial in nature. Traditionally, graphs are compared and classified by comparing substructures such as walks, triangles, paths, and subtrees with similarity functions called graph kernels, often counting the number of occurrences of such patterns.
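As a concrete instance of a substructure-counting kernel, here is a sketch of a simplified shortest-path kernel for unlabelled, unweighted graphs. Each graph is summarized by a histogram of shortest-path lengths over node pairs, and the kernel is the inner product of the two histograms. The adjacency-dict input format and the function names are our own choices for illustration. Practical shortest-path kernels usually also compare node labels.

```python
from collections import Counter, deque


def bfs_distances(adj: dict, source) -> dict:
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_histogram(adj: dict) -> Counter:
    """Count unordered, connected node pairs by their shortest-path length."""
    counts = Counter()
    nodes = list(adj)
    for i, s in enumerate(nodes):
        dist = bfs_distances(adj, s)
        for t in nodes[i + 1:]:
            if t in dist:
                counts[dist[t]] += 1
    return counts


def shortest_path_kernel(adj1: dict, adj2: dict) -> int:
    """Inner product of the two path-length histograms."""
    h1, h2 = shortest_path_histogram(adj1), shortest_path_histogram(adj2)
    return sum(h1[d] * h2[d] for d in h1)
```

For example, a triangle has three pairs at distance 1. A three-node path has two pairs at distance 1 and one at distance 2, so their kernel value is 3 · 2 = 6. A kernel like this only sees local path statistics, which is the limitation the next paragraph addresses.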

Traditional graph kernels are inherently local and fail to capture global properties of graphs. In my thesis, I developed graph kernels based on geometric embeddings [2, 3], which were shown to capture global properties such as girth and chromatic number. A geometric embedding of a graph is a set of vectors, one for each node, that captures the structure of the graph. These embeddings were used to define two new families of kernels [2, 3] that achieved state-of-the-art results on several graph classification benchmarks.
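To make "a set of vectors, one for each node" concrete, here is a sketch that assumes the classical Lovász orthonormal representation. We understand this representation to underlie the kernels in [2]. Each node gets a unit vector, and non-adjacent nodes must get orthogonal vectors. For any such embedding U and unit "handle" vector c, the value max_i 1/(c·u_i)² is an upper bound on the Lovász number ϑ(G). ϑ(G) itself is the minimum of this value over all valid pairs (U, c). The function names are hypothetical. Computing ϑ itself, which is a semidefinite program, is not shown.

```python
import numpy as np


def is_orthonormal_representation(U: np.ndarray, non_edges, tol: float = 1e-9) -> bool:
    """U has one row per node. Rows must be unit vectors, and every
    non-adjacent pair (i, j) must receive orthogonal vectors."""
    unit = np.allclose(np.linalg.norm(U, axis=1), 1.0, atol=tol)
    orthogonal = all(abs(U[i] @ U[j]) <= tol for i, j in non_edges)
    return bool(unit and orthogonal)


def theta_upper_bound(U: np.ndarray, c: np.ndarray) -> float:
    """max_i 1 / (c . u_i)^2 for the normalized handle c.

    c must not be orthogonal to any u_i.
    """
    c = c / np.linalg.norm(c)
    return float(np.max(1.0 / (U @ c) ** 2))
```

For the empty graph on three nodes, the standard basis with an all-ones handle gives 3, which equals ϑ of that graph. For the complete graph K3, giving every node the same vector gives 1.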

a) Geometric embedding of a graph [2]

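[Figure: context x is mapped to a representation Φ; the outcome error loss(h(Φ, t), y) depends on Φ and the treatment t, and an imbalance term dist(·, ·) measures the distance between the representation distributions of the treatment groups]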

Example of graph classification

Learning balanced representations for counterfactual inference

Publications


[1] Hermansson, L., Kerola, T., Johansson, F., Jethava, V. and Dubhashi, D., 2013. Entity disambiguation in anonymized graphs using graph kernels. In Proceedings of the 22nd International Conference on Information & Knowledge Management, CIKM 2013

[2] Johansson, F., Jethava, V., Dubhashi, D. and Bhattacharyya, C., 2014. Global graph kernels using geometric embeddings. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014

[3] Johansson, F.D. and Dubhashi, D., 2015. Learning with similarity functions on graphs using matchings of geometric embeddings. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2015

[4] Johansson, F.D., Chattoraj, A., Bhattacharyya, C. and Dubhashi, D., 2015. Weighted theta functions and embeddings with applications to max-cut, clustering and summarization. In Advances in Neural Information Processing Systems, NIPS 2015

[5] Johansson, F.D., Shalit, U. and Sontag, D., 2016. Learning representations for counterfactual inference. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016

[6] Shalit, U., Johansson, F.D. and Sontag, D., 2017. Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017

Embedding & clustering of weighted graphs

Many problems are naturally represented by weighted graphs: graphs whose edge connections vary in strength. In [4], we generalized a well-known geometric embedding to weighted graphs and gave a clustering (community detection) algorithm that operates on the resulting embedding. The algorithm automatically finds the number of clusters, in contrast to traditional methods that require it to be specified in advance.

[Figure: substructures compared by traditional graph kernels: a) shortest-path kernel, b) graphlet kernel, c) random-walk kernel]

b) Embed-and-match comparison of graphs [3]


Clustering of nodes in a weighted graph—community detection

[Figure: patient Anna (age 54, female, Asian; blood pressure 150/95; WBC count 6.8 × 10⁹/L; temperature 36.7 °C; blood sugar high) can receive Medication A ("control", t = 0) or Medication B ("treated", t = 1); her blood sugar under each option is unknown (?). Timeline: May 15 to Sep 15.]

Estimating the effect a medical treatment had on a single individual requires estimating what would have happened had she not received it. Doing so from observational (non-experimental) data is challenging, because patients who received a treatment are often different from patients who did not. For example, patients who receive a treatment might be younger on average.
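The following small simulation illustrates this bias under assumed data-generating rules. The assignment probability, outcome model and effect size are all hypothetical. Younger patients are more likely to be treated, the outcome rises with age, and the true treatment effect is -1 for everyone. A naive difference in group means then mixes the treatment effect with the age difference between the groups.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = -1.0  # assumed: the treatment lowers the outcome by 1 unit for every patient

age = rng.uniform(20.0, 80.0, size=n)
# Assumed assignment rule: younger patients are more likely to be treated.
p_treat = 1.0 / (1.0 + np.exp((age - 50.0) / 5.0))
treated = rng.random(n) < p_treat
# Assumed outcome model: the outcome increases with age, plus Gaussian noise.
y = 0.1 * age + true_effect * treated + rng.normal(0.0, 0.5, size=n)

# Naive comparison of group means confounds the effect with the age gap.
naive_estimate = y[treated].mean() - y[~treated].mean()

# Adjusting for age (here with a correctly specified linear regression) removes that bias.
design = np.column_stack([np.ones(n), age, treated.astype(float)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted_estimate = coef[2]
```

With these settings, the naive estimate is several times larger in magnitude than the true effect, while the age-adjusted regression recovers it closely. The regression works here only because its linear form matches the simulated outcome model exactly. Balanced representations [5, 6] are an alternative approach to the same problem.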

Counterfactual prediction: what would Anna’s blood sugar be had we treated her differently?

[Figure: outcome y plotted against age x for the treatment group and the control group]

What would the outcome have been for a treated patient had she not received treatment?

Learning balanced representations

Generalization bounds for counterfactual inference

We rigorously motivate our framework by developing the first learning bounds on the error in predicting counterfactual outcomes based on observational data. They take the following form (simplified for readability):

$$\text{Counterfactual error} \le \text{Factual error} + \alpha \cdot \text{Treatment group distance}$$

Neural network architecture for counterfactual prediction

Privacy Preserving Data Collection

Hamid Ebadi ([email protected]) http://www.cse.chalmers.se/~hamide/

Department of Computer Science and Engineering, Chalmers University of Technology

Privacy Matters

The General Data Protection Regulation (GDPR) aims to give control over personal data back to citizens.

Problem

Privacy-preserving data collection and analysis.

Anonymisation Failures

Removing personally identifying fields has been shown to be ineffective:

• AOL search data leak [a]
• The re-identification of Governor William Weld’s medical information [b]
• Netflix settles privacy lawsuit, cancels prize sequel [c]

[a] https://en.wikipedia.org/wiki/AOL_search_data_leak
[b] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2076397
[c] https://www.forbes.com/sites/firewall/2010/03/12/netflix-settles-privacy-suit-cancels-netflix-prize-two-sequel/

Differential Privacy

Differential privacy is a mathematical definition that protects the privacy of user data by requiring an analysis to produce similar answers regardless of whether any particular individual is present in the database.

Plausible deniability: the ability to deny answers or data.
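For reference, the standard ε-differential privacy definition that this description paraphrases is given below. A randomized analysis (mechanism) 𝓜 is ε-differentially private if the inequality holds for every pair of databases D and D′ that differ in one individual's data and for every set S of possible outputs. A smaller ε means stronger privacy.

$$
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
$$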

Centralised Model

Local Model

Our Contributions

• “Proper” [1] framework for the centralised setting.
• Requires trust in the aggregator!
• “PreTPost” [2] framework for the local setting.
• Aggregator is not trusted.
• No pre-defined set of analyses.
• Participants control their privacy.
• Flexible policies:
  • Personalised
  • Event-level differential privacy
  • Heterogeneous privacy
  • RAPPOR permanent randomised response
• Protects against side channels.

Randomised Response

Have you ever cheated on your tax declaration?
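The classic randomised-response protocol for a question like this can be sketched as follows. This is a textbook coin-flip variant, not necessarily the exact mechanism used in PreTPost. Each participant flips a coin and either answers truthfully or answers at random, so any single "yes" is deniable, yet the population rate can still be estimated. The function names and the simulated 30% rate are illustrative.

```python
import random


def randomised_response(truth: bool, rng: random.Random) -> bool:
    """First coin: heads -> answer truthfully; tails -> answer with a second fair coin."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(answers: list) -> float:
    """Invert P(yes) = 0.5 * pi + 0.25 to estimate the true proportion pi."""
    p_yes = sum(answers) / len(answers)
    return 2.0 * (p_yes - 0.25)


def simulate(true_rate: float, n: int, seed: int = 1) -> float:
    """Simulate n participants, a true_rate fraction of whom would honestly answer yes."""
    rng = random.Random(seed)
    answers = [randomised_response(rng.random() < true_rate, rng) for _ in range(n)]
    return estimate_true_rate(answers)
```

A truthful "yes" is reported as "yes" with probability 0.75, and a truthful "no" is reported as "yes" with probability 0.25. Each answer therefore satisfies ε-differential privacy with ε = ln 3.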

Differential privacy in the real world

Demonstrator

• Running on embedded devices
• Collects data on network traffic and configuration

Project Funding

Data-driven Secure Business Intelligence

[1] Differential Privacy: Now it’s Getting Personal, Hamid Ebadi, David Sands, Gerardo Schneider
[2] PreTPost: A Local Differential Privacy Framework, Hamid Ebadi, David Sands


PRIVACY POLICIES FOR SOCIAL NETWORKS

Raúl Pardo – www.cse.chalmers.se/~pardo

Gerardo Schneider – www.cse.chalmers.se/~gersch

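The Problem: Empirical studies show that the privacy policies implemented by social networks such as Facebook, Twitter or Instagram do not meet users’ requirements. For example, when Bob creates a post that includes Alice’s location, Bob alone decides its audience and Alice cannot choose.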

Our Approach: We have developed a framework in which users define privacy policies for their own information, as opposed to the traditional approach where the person uploading the information decides the audience. Our framework defines an enforcement mechanism. The framework is based on formal methods, which allow us to mathematically prove that no policy will be violated.

Raúl Pardo & Gerardo Schneider. “A Formal Privacy Policy Framework for Social Networks”. In SEFM’14. Pages 378–392.

Raúl Pardo, Musard Balliu & Gerardo Schneider. “Formalising Privacy Policies in Social Networks”. In JLAMP’17. Pages 2352–2208.

Raúl Pardo, Christian Colombo, Gordon J. Pace & Gerardo Schneider. “An Automata-Based Approach to Evolving Privacy Policies for Social Networks”. In RV’16. Pages 85–301.

Raúl Pardo, Ivana Kellyérová, César Sánchez, & Gerardo Schneider. “Specification of Evolving Privacy Policies for Online Social Networks”. In TIME’16. Pages 70–79.

Pablo Picazo-Sanchez, Raúl Pardo, & Gerardo Schneider. “Secure Photo Sharing in Social Networks”. In IFIP SEC’17. Pages 79–92.

This research has been supported by the Swedish funding agency SSF under the grant Data Driven Secure Business Intelligence.

[Figure: social network with friends, follow and friendRequest relationships]

Alice’s location is included in this post.

Since the post was created by Bob, he decides its audience. Alice cannot choose!

INTRODUCTION

REFERENCES

KNOWLEDGE-BASED PRIVACY POLICIES

We track the information that users know in their knowledge bases.

Knowledge bases also include implicit knowledge that users can infer from explicit information. Here Bob can infer Alice’s location because he has access to one of her pictures.

STATIC · DYNAMIC · DATA-DRIVEN

Users can define static privacy policies that delimit who can know their information.

Using our formal language, the policy is expressed as follows:

$$\left[\neg S_{Ag \setminus \{Alice\}}\; location(Alice)\right]_{Alice}$$

The subscript on the wrapper indicates that Alice defined the policy. Literally, the policy should be read as “it is not the case that someone in the group of everyone (Ag) minus Alice knows Alice’s location.”
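The following toy, set-based reading of this static policy may help. It is an illustration only, not the framework's formal semantics. Each user's knowledge base is a set of (kind, owner) facts, extended with implicit knowledge through a hypothetical inference rule: a picture of someone reveals their location, as in the example of Bob and Alice's picture. The policy is violated if anyone other than Alice knows her location.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str]  # (kind of information, owner), e.g. ("location", "Alice")

# Hypothetical inference rule for illustration: a picture of someone reveals their location.
INFERENCE_RULES: Dict[str, str] = {"picture": "location"}


def with_implicit_knowledge(kb: Set[Fact]) -> Set[Fact]:
    """Extend a knowledge base with facts derivable by one application of the rules."""
    derived = set(kb)
    for kind, owner in kb:
        if kind in INFERENCE_RULES:
            derived.add((INFERENCE_RULES[kind], owner))
    return derived


def violates_nobody_knows(kbs: Dict[str, Set[Fact]], owner: str, kind: str) -> bool:
    """True if any agent other than the owner knows kind(owner), explicitly or implicitly."""
    return any(
        (kind, owner) in with_implicit_knowledge(kb)
        for agent, kb in kbs.items()
        if agent != owner
    )
```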

Nobody can know

Users can define dynamic privacy policies that determine when and how often other users are allowed to know their information.

Here the policy includes a superscript that specifies the recurrence:

$$\left[\neg K_{Bob}\; picture(Alice)\right]_{Alice}^{[weekends]}$$

We have developed policy automata, which can be used to automatically generate enforcement monitors.

[Figure: policy automaton for "Bob cannot know during the weekend": the state switches from "Bob can know Alice’s picture" to "Bob cannot know Alice’s picture" on Saturday at 00:00, and back on Monday at 00:00]
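Here is a hand-written sketch of the monitor that the weekend policy describes. The actual framework generates monitors automatically from policy automata, so this code is illustrative only, and the access-log format is hypothetical.

```python
from datetime import datetime
from enum import Enum


class State(Enum):
    CAN_KNOW = "Bob can know Alice's picture"
    CANNOT_KNOW = "Bob cannot know Alice's picture"


def state_at(now: datetime) -> State:
    """Saturday 00:00 moves the automaton to CANNOT_KNOW; Monday 00:00 moves it back.

    datetime.weekday() returns Monday == 0, ..., Saturday == 5, Sunday == 6.
    """
    return State.CANNOT_KNOW if now.weekday() >= 5 else State.CAN_KNOW


def violations(access_log):
    """Return the (time, user) accesses to Alice's picture that the weekend policy forbids."""
    return [
        (when, who)
        for when, who in access_log
        if who == "Bob" and state_at(when) is State.CANNOT_KNOW
    ]
```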

We use Attribute-Based Encryption (ABE) to merge policy and data.

In this way, we guarantee that information always travels with the corresponding policy among the nodes of the distributed data-storage system of social networks.

[Figure: a post carrying the policy "Nobody can know" stored across nodes Node0, Node1, Node2 and Node3 of a distributed data store]

ACKNOWLEDGMENTS

Malware on Browser Extensions

Pablo Picazo-Sanchez¹

¹ Dept. of Computer Science and Engineering, Chalmers | Gothenburg University, Gothenburg, Sweden.

Introduction

Web browsers manage and collect private information about their users. When Alice surfs the web, the browser has access to her bank account, every key she presses, the coordinates of her mouse, and all the personal information she enters. The sensitive data that web browsers manage is extremely valuable, so the need for secure web browsers is clear.

Browser extensions are usually coded in web languages such as HTML, CSS or JavaScript and can include as many resources as needed (e.g., images, fonts, JSON, databases, etc.).


Figure 1: Browser Extension Architecture

Extensions can access a large amount of sensitive information due to two main factors. First, they can be seen as part of the browser (known as background pages). Second, they can access web content by injecting JavaScript files that are executed as part of the page's HTML (known as content scripts).

[Figure: the user John Doe is logged in to a secure bank page (https://bank.example); (1) Extension A reads the page, (2) passes the data to Extension B, and (3) the data is sent to an external server]

Figure 2: Collusion attack among browser extensions

Figure 2 depicts an example of a collusion attack. Extension A can access the user’s bank information, whereas Extension B was not supposed to get this sensitive information. However, in this example Extension A shares the private information with Extension B without the user’s knowledge, and the information is sent to an external server.

[Figure: number of domains per classification (x-axis 0 to 6000): very malicious 3, malicious 128, suspicious 396, unusual 5280]

Figure 3: Classification of domains used by browser extensions

We have detected more than 500 different browser extensions that might send sensitive information to (potentially) insecure external servers.

Proposed Solution

[Figure: two-stage pipeline (preparation and static analysis) with the steps: find extensions; save paths to a JSON file; extract extension information; extract domains; validate domains; save domains and extension; query all domains to Recorded Future]

Figure 4: Architecture of the proposed solution

Our focus is to solve the problem of privacy leakage to external servers through browser extensions. On the one hand, content scripts can send information using JavaScript's XMLHttpRequest. On the other hand, in addition to XMLHttpRequest, background pages can also use the Chrome API for the same purpose. Our aim is to detect and mitigate such cases by introducing a policy system in which users can set their own privacy settings and thus avoid sharing sensitive information with external servers.

Figure 5: Recorded Future dashboard

Figure 6: Recorded Future extended information

For the domain analysis, we used the Recorded Future API to gather as much information about the domains as possible and to check whether their IP addresses are considered malicious. Figures 5 and 6 show an example of Recorded Future’s output when we looked up the address 2.5.29.37.
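The poster does not say how domains were extracted from extension code. One simple possibility for the "extract domains" step of the pipeline is a regular-expression scan for absolute http(s) URLs. The sketch below uses hypothetical function names and a hypothetical JSON output format. Validation and the Recorded Future query are deliberately left out, since we do not reproduce that API here.

```python
import json
import re
from urllib.parse import urlparse

# Absolute http(s) URLs in JavaScript/HTML/JSON text; a match stops at whitespace, quotes or brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>()]+")


def extract_domains(source: str) -> set:
    """Collect the host names of all absolute http(s) URLs found in the source text."""
    domains = set()
    for url in URL_PATTERN.findall(source):
        host = urlparse(url).hostname  # urllib returns the host lower-cased
        if host:
            domains.add(host)
    return domains


def extension_report(extension_id: str, files: dict) -> str:
    """Serialize the domains found across an extension's files as JSON."""
    found = set()
    for text in files.values():
        found |= extract_domains(text)
    return json.dumps({"extension": extension_id, "domains": sorted(found)})
```

This scan misses hosts that are built at runtime or written without a scheme, such as a bare address like 2.5.29.37. A real pipeline would need to handle those cases as well.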

Figure 7: Overall statistics of the syntactic analysis

References

A Runtime Monitoring System to Secure Browser Extensions. R. Pardo, P. Picazo-Sanchez,G. Schneider, J. Tapiador. Security Principles and Trust Hotspot 2017

Acknowledgement

This research has been supported by the Swedish funding agency SSF under the grant Data Driven Secure Business Intelligence.

November 9, 2017 [email protected]

[Figure: covariance-variance ratio (CVR, y-axis 0 to 20) vs. mean squared error (MSE, x-axis 0.030 to 0.075) for a convolutional autoencoder on CIFAR-10 as covariance regularization is increased; curves for LΣ and L1]

From documents to characters: neural approaches to language

Olof Mogren

http://mogren.one/publications

[Figure: model architecture with demo relation RNN encoders 1 and 2, a query RNN encoder, an FC merge layer, an FC relation layer with a relation-class output, and a decoder RNN with attention]

Learning to inflect words based on analogies

• Open-vocabulary morphological relational reasoning with analogies: see is to sees as eat is to what?
• Predict a target word form of a given query word w_query (above: eat).
• The target form is specified using a demo relation, a pair of word forms w_demo-source, w_demo-target (above: see, sees).
• Our solution:
  • A character-level recurrent neural network (RNN) trained to predict the missing form of w_query
  • Two encoder RNNs to embed the demo relation w_demo-source, w_demo-target
  • One encoder RNN for the query word w_query
  • One decoder RNN to generate the target word form
• Does not rely on explicit enumeration of morphological features
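To make the input and output of the task concrete, here is a deliberately naive, non-neural baseline. It is not the RNN described above. It copies the suffix change observed in the demo pair onto the query word, and all function names are hypothetical. It handles regular patterns but cannot learn irregular ones, which a character-level model trained on data can.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def suffix_analogy(demo_source: str, demo_target: str, query: str) -> str:
    """Apply the demo pair's suffix rewrite (e.g. "" -> "s") to the query word."""
    k = common_prefix_length(demo_source, demo_target)
    removed = demo_source[k:]  # suffix dropped from the demo source form
    added = demo_target[k:]    # suffix appended to form the demo target
    if removed and query.endswith(removed):
        stem = query[: len(query) - len(removed)]
    else:
        stem = query  # nothing to remove, or the rule does not match the query
    return stem + added
```

For example, `suffix_analogy("see", "sees", "eat")` gives "eats", and the Swedish pair hoppa : hoppade maps dansa to "dansade". The irregular pair see : saw, however, turns eat into "eataw" instead of "ate".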

Results

• Exact form accuracy for English: 94.7%
• Exact form accuracy for Swedish: 89.3% (when trained on both languages). Swedish has a more complex morphology, with more inflectional patterns for nouns and verbs.
• Training on both English and Swedish at the same time improves performance on Swedish data.

Exact form accuracy by training data (rows) and evaluation data (columns):

Training          English   Swedish   Both
One language      94.7%     88.3%     N/A
Both languages    90.6%     89.3%     89.9%

Extractive multi-document summarization is the process of selecting the most representative sentences in a collection of documents. Existing methods use different approaches to balance the coverage of the summary against its redundancy, and most of them rely heavily on measuring the similarity of two sentences.

Sentence similarity for extractive multi-document summarization

Our system improves over the original submodular optimization-based system on all ROUGE variants on DUC 04.

[Figure: relative improvement (0% to 7%) on ROUGE-1, ROUGE-2 and ROUGE-SU4]

Results

Contribution

• We present new ways of measuring the similarity between sentences, based on sentiment analysis and neural word embeddings.
• We show that combining these with similarity measures from existing methods helps to create better summaries. The combination is done by multiplying the scores, an idea from kernel methods.

The findings are demonstrated with MULTSUM, a novel summarization method that uses submodular optimization (Lin & Bilmes 2011) to produce summaries. Our method improves over the state of the art on standard datasets; it is also fast and scales to large document collections.
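The following minimal sketch illustrates the two ideas above under stated assumptions. Sentence-similarity matrices are combined by element-wise multiplication. Sentences are then chosen greedily to maximize the coverage term of Lin & Bilmes (2011), L(S) = Σ_i min(C_i(S), α·C_i(V)), where C_i(S) is the total similarity of sentence i to the summary S and V is the whole collection. The full Lin & Bilmes objective also has a diversity reward and a length budget. Here we use only coverage and a fixed number of sentences k, and all names are hypothetical.

```python
import numpy as np


def combine_similarities(*sims: np.ndarray) -> np.ndarray:
    """Element-wise product of several sentence-similarity matrices."""
    combined = np.ones_like(sims[0], dtype=float)
    for s in sims:
        combined = combined * s
    return combined


def coverage(selected: list, w: np.ndarray, alpha: float) -> float:
    """Lin & Bilmes coverage: sum_i min(C_i(S), alpha * C_i(V))."""
    if not selected:
        return 0.0
    c_s = w[:, selected].sum(axis=1)  # similarity of each sentence to the summary
    c_v = w.sum(axis=1)               # similarity of each sentence to the whole collection
    return float(np.minimum(c_s, alpha * c_v).sum())


def greedy_summary(w: np.ndarray, k: int, alpha: float = 0.5) -> list:
    """Greedily add the sentence with the largest coverage value until k are selected."""
    selected = []
    for _ in range(k):
        candidates = [j for j in range(w.shape[0]) if j not in selected]
        best = max(candidates, key=lambda j: coverage(selected + [j], w, alpha))
        selected.append(best)
    return selected
```

With nonnegative similarities, this coverage function is monotone submodular. Greedy selection under a cardinality limit therefore carries the usual (1 − 1/e) approximation guarantee.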

Joint work with Richard Johansson.

Joint work with Mikael Kågebäck, Nina Tahmasebi, and Devdatt Dubhashi.

Joint work with Simon Almgren and Sean Pavlov.

Extracting medical terms from Swedish health records

• Tag each occurrence of medical terms, such as (1) disorders and findings, (2) pharmaceutical drugs, and (3) body structures, in Swedish medical health records.

• Character-based deep bidirectional LSTM model:

• Open-vocabulary

• Requires no tokenization

• Evaluation on the Stockholm EPR corpus.

Category                 P (precision)   R (recall)   F1
Disorders & findings     0.72            0.18         0.29
Pharmaceutical drugs     0.69            0.43         0.53
Body structure           0.46            0.28         0.35
Total                    0.67            0.24         0.35
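The F1 column is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The short check below recomputes it from the P and R columns and reproduces the reported values after rounding to two decimals. It is useful for seeing that the low F1 scores are driven mainly by recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (category, P, R, reported F1), copied from the results table
rows = [
    ("Disorders & findings", 0.72, 0.18, 0.29),
    ("Pharmaceutical drugs", 0.69, 0.43, 0.53),
    ("Body structure", 0.46, 0.28, 0.35),
    ("Total", 0.67, 0.24, 0.35),
]
recomputed = {name: round(f1_score(p, r), 2) for name, p, r, _ in rows}
```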

Results

Disentanglement by penalizing correlation

• Deep neural networks automatically learn representations of data at several levels of abstraction, disentangling the data.
• We propose a novel regularization method that penalizes covariance between dimensions of the hidden layers in a network.

• Learns linearly independent (uncorrelated) features, improves disentanglement.

• No decrease in performance.
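Below is a minimal NumPy sketch of a covariance penalty of this kind. It assumes the penalty is the sum of absolute off-diagonal entries of the batch covariance of a hidden layer's activations. The exact form of LΣ, its normalization, and its weight relative to the reconstruction loss are not given here. In training, the penalty would be added with some weight to the network's loss.

```python
import numpy as np


def covariance_penalty(h: np.ndarray) -> float:
    """Sum of |off-diagonal| entries of the batch covariance of hidden activations.

    h has shape (batch_size, num_units); mutually uncorrelated units give a penalty of 0.
    """
    centered = h - h.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (h.shape[0] - 1)  # unbiased sample covariance
    off_diagonal = cov - np.diag(np.diag(cov))
    return float(np.abs(off_diagonal).sum())
```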

Method                 CVR     UD90%   MSE
LΣ                     6.56    35.18   0.0398
L1                     4.03    20.59   0.0569
W/O (no regularizer)   20.00   41.69   0.0365

Covariance-variance ratio (CVR), utilized dimensions (UD) for 90% of the variability, and mean squared error (MSE) for a convolutional autoencoder with covariance regularization on CIFAR-10.

Joint work with Mikael Kågebäck.

1. Character-based recurrent neural networks for morphological relational reasoning, SCLeM 2017. Olof Mogren, Richard Johansson

2. Extractive Summarization using Continuous Vector Space Models, CVSC at EMNLP 2014. Mikael Kågebäck, Olof Mogren, Nina Tahmasebi, Devdatt Dubhashi

3. Extractive Summarization by Aggregating Multiple Similarities, RANLP 2015. Olof Mogren, Mikael Kågebäck, Devdatt Dubhashi

4. Named Entity Recognition in Swedish Health Records with Character-Based Deep Bidirectional LSTMs, BioTxtM at COLING 2016. Simon Almgren, Sean Pavlov, Olof Mogren

5. Disentanglement by penalizing correlation, in submission, 2017. Mikael Kågebäck, Olof Mogren

Research utilization

Machine Intelligence Sweden [email protected]

Visiting address: Teatergatan 19, 411 35 Göteborg

Problem

Access to world-leading competence is paramount in today's competitive knowledge economy and a prerequisite for innovation. Collaboration between academia and industry creates opportunities for both parties and is generally encouraged through various initiatives. Nevertheless, 92% of Sweden's largest export companies consider that industry-academia collaborations must increase (Novus 2011).

We too!

[Figure: example collaboration network linking Company A and Company B with University X, University Y and University Z through activities such as a strategic discussion about the near-future state of augmented reality in the automotive industry, a discussion about materials and technology for windscreen displays, a question ("We have a problem with cabin sound. Suggestions?") connected to sound simulation and material design, and a two-month active-safety prototype project]

Solution

By developing an innovative service (ScienceRouter), based on research results, we want to digitalize, and thus scale up, Swedish companies' knowledge networks, thereby strengthening their international competitiveness and innovation capabilities. We aim to achieve this by developing an international, internet-based and innovative competence-matching platform for researchers and companies, with a powerful automatic matchmaking system that has a deep understanding of individuals' knowledge portfolios. In addition, we will develop effective administrative processes for collaborations in order to stimulate more collaborations and reduce the friction of knowledge transfer from academia into society. The service thus helps researchers reach companies with their research more effectively, and helps companies move their innovation processes closer to the research frontier.

● The platform will be in a closed beta testing starting in March 2018. Please let us know if you want to be in the evaluation group!

● We deeply appreciate any feedback that you might have! Give us a call or send us your feedback via email: [email protected]
