[ieee 2009 international conference on advances in social network analysis and mining (asonam) -...
Developing Compelling Social-Enabled Applications with Context-based Social Interaction Analysis
Ryan Skraba, Mathieu Beauvais, Johann Stan, Abderrahmane Maaradji and Johann Daigremont
Alcatel-Lucent Bell Labs France, Centre de Villarceaux, Route de Villejust
94160 Nozay, [email protected]
Abstract—We present in this paper a new approach for constructing implicit social networks from electronic sources, like emails, SMS and phone calls. The main novelty is the use of Social Interaction Analysis to assist end-users in their communication needs. We discuss how a social proximity can be calculated between two persons in a social network and show how a weighted, directed network is constructed based on interactions between people. After a description of the architecture of the framework, we show how a contextual, weighted social network can help in automatically finding the contact with highest probability to know the whereabouts of a person who is currently not reachable. The implemented social helper application uses a specific contextual view of the social network, exploiting only interactions that occurred in the last 48 hours.
Keywords-Social Network Analysis, Social Proximity, Context, Interactions, Communications
I. INTRODUCTION
With the modern growth of online social networking sites,
there has been a corresponding explosion in social net-
working applications. For the most part, these applications
are simple and fun, and propagate among users’ friends.
However, with the wide deployment of mobiles in developed
countries, there is an opportunity to develop social network-
ing applications that truly act on the users’ social networks
to enable efficient social communications. Our approach for
constructing an end-to-end Social Interaction Analysis (SIA)
system involves three stages: collection from different user
devices and identities, ongoing analysis in the network and
a semantic user profile database for applications.
Social Interaction Analysis (SIA) attempts to construct
and qualify the users’ social networks by examining the
implicit interactions between users instead of the explicit
declarations of “friendship” or community. Users can have
a contextual view of their interactions (Which of my inter-
actions are related to cinema? Who is the person I called
most often in the last 48 hours?), serving as a back-end for
many innovative services. By processing social interactions
and by logging the content and context of interactions, a
matrix of interpersonal communications can be constructed.
We present the architecture of our framework and show
how context-based filtering can be applied in the spe-
cific scenario of an “emergency call” service, a contact
recommendation scenario. Social Interaction Analysis and
semantic user profiling can be combined in a framework that
allows efficiently answering similar scenarios. We present
the design of an end-to-end system that enables innovative
social applications by automatically constructing context-
based social graphs from end-users’ daily communications.
The remainder of this paper is organized as follows: in
Section 2, we detail our proposed SIA framework. Section
3 presents preliminary results to illustrate the interest of our
approach. In Section 4, we demonstrate the SIA concept in a
case study called Social Helper in which we show a potential
real usage of our framework using a mobile social network.
In Section 5, we discuss difficulties in tuning parameters
to experimentally validate our system. Section 6 discusses
related work. We conclude and discuss future work in Section 7.
II. SOCIAL INTERACTION ANALYSIS FRAMEWORK
Many types of social interactions are already performed
in a manner that can be captured by a SIA framework.
Direct communications between actors, such as email, text
messages (SMS), and phone calls can already be logged
by the user, in the network or on their devices. These
direct interactions are typically between two actors that can
be identified by the endpoints of the communication. An
interaction can be considered direct even if it is mediated by
a communication object or technology. For example, face-to-
face interactions can be inferred using Bluetooth technology
to pinpoint users in a location. An indirect interaction can
be between a person and an object, but have the same
targeted communication intention as a direct interaction,
such as leaving a voice message when a call cannot be
completed. A social proximity can also be attached to the
communication object to profile how two users prefer to
communicate, which has implications on their calculated
social proximity. Interactions via objects can define an
indirect social relationship between actors without any direct
communications between them by implying a shared interest
or community, such as by exchanging media (downloading
a photo set that another user uploaded) or by participating
in the same internet forums. A social proximity can also be
2009 Advances in Social Network Analysis and Mining
978-0-7695-3689-7/09 $25.00 © 2009 IEEE
DOI 10.1109/ASONAM.2009.7
inferred indirectly from direct social interactions between
multiple recipients, such as when two people are always
included together as recipients of an interaction despite never
communicating directly (forums, mailing lists).
In order to create a weighted, directed edge between
two actors in a social graph, a number of variables are
taken into account. The number of interactions between
each user (and for each context) is counted separately for
each type of interaction, and the type of interaction can
have a different subjective impact for a given user. For
example, for some users, email is used almost exclusively
for professional relationships and SMS are considered more
intimate and immediate for personal use. There are no fixed
rules for every user, so the values for social proximity are
normalized independently for each type of interaction, and
then weighted according to the individual’s usage patterns to
calculate a global social proximity between any two actors.
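The normalization and weighting step described above can be sketched as follows. This is a minimal illustration, not the implemented computation: the function name `global_proximity`, the choice to normalize by each sender's busiest contact per interaction type, and the per-user weight table are all assumptions made for the example.

```python
from collections import defaultdict


def global_proximity(counts, type_weights):
    """Combine per-type interaction counts into one directed weight.

    counts:       {(sender, receiver, interaction_type): number_of_interactions}
    type_weights: {sender: {interaction_type: weight}}  (hypothetical per-user weights)
    Returns {(sender, receiver): proximity}.
    """
    # Assumed normalization: divide by the sender's largest count
    # for that interaction type, so each type is scaled to [0, 1].
    max_per_type = defaultdict(int)
    for (src, _dst, kind), n in counts.items():
        max_per_type[(src, kind)] = max(max_per_type[(src, kind)], n)

    proximity = defaultdict(float)
    for (src, dst, kind), n in counts.items():
        peak = max_per_type[(src, kind)]
        if peak == 0:
            continue  # no interactions of this type, nothing to add
        weight = type_weights.get(src, {}).get(kind, 0.0)
        proximity[(src, dst)] += weight * (n / peak)
    return dict(proximity)
```

Because the weights are looked up per sender, the edge from one actor to another can differ from the edge in the opposite direction, which matches the directed nature of the graph.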
In order to validate the usefulness of this approach, the
experimental framework implements a system for context-
based Social Interaction Analysis. Figure 1 shows an ar-
chitecture for an end-to-end system that consumes social
interaction data from the end user (collection), handles and
processes it (analysis) and provides a context-based view
of a social graph to innovative social applications (presenta-
tion). The architecture is flexible; future types of interactions
and analysis functions can be integrated dynamically.
A. Collection
Many diverse sources exist for social interaction data,
and the social interaction collection layer is responsible
for observing, reporting and storing them to the system as
required.
One way to acquire data is to run a collector in the
background on the users’ end terminal. In this use case,
when an SMS is sent or received, a log is kept on the
terminal and periodically sent to the collector. In the initial
implementation, the log format is a simple XML construc-
tion containing all of the attributes belonging to an SMS
and is transmitted using the HTTP protocol. In order to
be successful, the user needs to see a benefit to providing
this information, and they have the opportunity to configure
the information that is sent to the collection layer from that
device.
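As an illustration of the kind of log entry a terminal-side collector could produce, the sketch below builds a small XML record for one SMS. The element and attribute names (`interaction`, `sender`, `receiver`, `timestamp`, `length`) are hypothetical; the paper does not specify the actual schema, and the HTTP transmission step is left out.

```python
import xml.etree.ElementTree as ET


def sms_log_entry(sender, receiver, timestamp, length, direction):
    """Serialize one SMS interaction as an XML string (assumed schema)."""
    entry = ET.Element("interaction", {"type": "sms", "direction": direction})
    ET.SubElement(entry, "sender").text = sender
    ET.SubElement(entry, "receiver").text = receiver
    ET.SubElement(entry, "timestamp").text = timestamp
    ET.SubElement(entry, "length").text = str(length)
    # The message text is deliberately omitted here; a user could
    # opt in to sending it, as the configuration step above suggests.
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(sms_log_entry("+33100000001", "+33100000002",
                        "2009-07-20T10:15:00Z", 42, "sent"))
```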
Another method is to acquire data using components on
the network side, without any local device installation for
each end user, and in certain cases, without requiring their
permission. A responsible privacy policy would ensure that
the end user is aware of how this information is collected and
used in social applications, preferably by having them opt in
to the system. Like SMS interactions, email interactions are
reported to the collection layer using an XML format, and
some attributes are common to both types such as: sender
and receiver identifiers, length, timestamp and potentially
the complete text contents. Email interactions, however,
have a richer set of header information and can be shared
among multiple recipients, with different recipient classes
(“To:”, “Cc:” and “Bcc:”). This information is summarized
and captured at the collection layer, and stored in a large
relational database.
Figure 1. System Architecture
B. Analysis
The social interaction analysis layer acquires informa-
tion from the intermediary database provided by the social
interaction collection layer, and uses it to compute social
closeness, or social proximities, between end-users. Given
the ephemeral and complex nature of human relationships,
it is difficult to define exactly what “social closeness” means.
However, a useful model can be created by assuming certain
properties. First of all, social relationships are certainly not
equal or interchangeable. Unlike modern social networking
sites, declaring someone a “friend” doesn’t put them in an
unordered list at the same rank as every other “friend”.
Likewise, it is extremely difficult to order the list; some
“friends” inspire affection but not confidence, while others
have your respect although you don’t share important views.
Finally, relationships are not symmetrical or reciprocal; the
regard you hold for another is not necessarily the regard they
hold for you.
Given these basic observations, we can assume that if
two actors are nodes in a graph, their social proximity is
a directed, weighted value between them, and that different
social proximities can exist between two actors for different
contexts. The analysis layer deduces these social proximities
from the basic attributes of social interactions from the
collection layer, refining them with any content or context
that is available. For example, given the SMS and email
interaction attributes, the system can calculate the frequency
of contact between two people, the direction of contact and
potentially keywords or topics (from the content, or subject
in the case of email). The user can also have reported
some contextual information such as when and where the
interaction took place, defined as either the GPS or cell id
coordinates of the interaction, or a looser definition such
as the sphere - the users can be in the home or work
environment, as declared explicitly or deduced by the time,
the terminal (professional or personal), or the location.
Email interactions have more information about the shape
of the conversation, such as threads of replies and forwards
between multiple recipients. A long thread of back and
forth exchanges between two people implies more intimacy
than one message broadcast to a large number of people
without any response. Likewise, between any two actors,
the system can calculate who takes the initiative in creating
a conversation, or the probability of responding (qualities
which have been experimentally named “sendiness” and
“replyness”). In fact, given the nature of email, a deduction
can be made about whether two actors belong to some social
group if they are consistently included as recipients together
in email messages, even if they never communicate directly
(a quality which has been investigated under the name of
“groupiness” between actors). These qualities are calculated
using the interactions from the collection layer, and weighted
according to the usage patterns of each individual end user,
and a weighted, directed edge is computed between the two
nodes representing two end-users.
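The paper does not give formulas for these qualities, so the following is only one plausible reading. In this sketch, “replyness” is taken as the fraction of messages received from a contact that were answered, and “groupiness” as the Jaccard overlap of the messages on which two actors appear as recipients. The `Email` record and both definitions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Email:
    msg_id: str
    sender: str
    recipients: frozenset
    in_reply_to: Optional[str] = None  # msg_id of the message being answered


def replyness(emails, a, b):
    """Fraction of b's messages to a that a replied to (assumed definition)."""
    received = {e.msg_id for e in emails if e.sender == b and a in e.recipients}
    if not received:
        return 0.0
    answered = {e.in_reply_to for e in emails
                if e.sender == a and e.in_reply_to in received}
    return len(answered) / len(received)


def groupiness(emails, a, b):
    """Jaccard overlap of messages listing a and b as recipients (assumed definition)."""
    with_a = {e.msg_id for e in emails if a in e.recipients}
    with_b = {e.msg_id for e in emails if b in e.recipients}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0
```

Note that `groupiness` can be high for two actors who never write to each other, which is the situation the text describes for mailing lists and shared recipient lists.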
Once a social graph has been automatically constructed
between actors, social network analysis techniques can be
applied to identify clusters/cliques of socially related users,
to identify key users/hubs/bridges for a context or a topic,
or to analyze graph topological attributes over time.
C. Social interaction presentation and API
The presentation layer stores the computed social prox-
imity graphs in a fluid, accessible way for social applica-
tions. By using semantic web technologies with computer-
understandable vocabularies, a wide range of social appli-
cations can be targeted. The current implementation uses
a format based on the FOAF/RDF representation of social
data stored in a semantic social user profile database, and
provides access through SPARQL semantic queries as well
as a lighter web service interface.
III. PRELIMINARY RESULTS
An implementation of the social interaction system was
used to investigate how a model of a social network is
constructed from interactions between actors, and to demonstrate
an example social application that takes advantage of
the resulting graph to provide an advanced social communication
service.
Figure 2. SIA constructed social graphs.
A. Automatically constructed social graphs
As the social interaction collection layer acquires infor-
mation from the actors, it connects those who have had
interactions in the past. To test the automatic construction of
a social graph from historical email logs, the email collection
tools were run on volunteers’ professional email accounts
(from emails collected over a period from six months to
four years). These data sets have the benefits of involving
actors for the most part from a closed community (work
colleagues) and interactions that have a degree of content
continuity (ongoing projects).
The constructed graph linked two actors if an email was
exchanged between them. The social graph could be filtered
by running the collection tools on a subset of the exchanged
messages, based on keyword matching in the subject line.
Figure 2 shows two social graphs constructed in this manner,
and demonstrates that standard social network analysis techniques
can be applied to these graphs to determine clusters,
key players, actor centrality, etc. [1]. In both social graphs,
some links are greyed out based on a measure of centrality
of their nodes, identifying related clusters of users as a
consequence. Since email interactions contain recipients that
are not part of the experiment, these results show that it is
possible to make intelligent inferences about a larger social
community, even when only a subset of actors participate.
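A small sketch of this construction is given below: actors are linked when an email was exchanged, an optional keyword filter on the subject line restricts the graph to one topic, and a centrality score is computed per node. The paper does not say which centrality measure was used; degree centrality is assumed here because it is the simplest, and the function names are hypothetical.

```python
from collections import defaultdict


def build_graph(emails, keyword=None):
    """Link two actors if an email was exchanged between them.

    emails:  iterable of (sender, recipients, subject)
    keyword: if given, keep only emails whose subject contains it
    Returns {actor: set_of_neighbours} (undirected adjacency).
    """
    graph = defaultdict(set)
    for sender, recipients, subject in emails:
        if keyword is not None and keyword.lower() not in subject.lower():
            continue
        for recipient in recipients:
            if recipient != sender:
                graph[sender].add(recipient)
                graph[recipient].add(sender)
    return dict(graph)


def degree_centrality(graph):
    """Share of the other actors each actor is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in graph.items()}
```

Recipients who did not take part in the collection still appear as nodes, because they are named in the participants' messages; this is what allows inferences about the larger community.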
B. Social proximity in a directed graph
When automatically constructing the social graph, the
system weights each link with a value representing the social
strength or proximity between two actors. In general, the
more that two actors communicate, the higher the social
proximity between them; the simplest computation is to
count the number of interactions that involve the two actors.
A more refined computation uses the attributes that can
be collected for different types of social interactions. For
example, an email sent to a community is not as personal
as an email sent to a single recipient. The social proximity
that one email adds to a link is inversely proportional to the
number of recipients.
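The recipient-count rule can be written down directly. In this sketch each email contributes 1/k to every sender-to-recipient edge, where k is the number of recipients; the proportionality constant of 1 and the function name are assumptions.

```python
from collections import defaultdict


def email_edge_weights(emails):
    """Accumulate directed edge weights from emails.

    emails: iterable of (sender, recipients)
    Each email adds 1 / len(recipients) to every (sender, recipient) edge,
    so a one-to-one message counts more than a broadcast.
    """
    weights = defaultdict(float)
    for sender, recipients in emails:
        if not recipients:
            continue  # nothing to attribute
        share = 1.0 / len(recipients)
        for recipient in recipients:
            weights[(sender, recipient)] += share
    return dict(weights)
```

With one personal email to George and one email to a group of four that includes him, the edge from the sender to George receives 1 + 1/4, while the other three recipients receive 1/4 each.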
The conversational aspect of the interactions is also rel-
evant. Consider an actor who composes ten unanswered
emails to another actor versus a back-and-forth conversation
of ten emails. Although the number of interactions between
the two actors is the same, only the second scenario shows
an engagement between the two.
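One way to make this distinction measurable is a reciprocity score. The formula below is an assumption chosen for illustration and is not taken from the paper: it is 0 when all messages flow one way and 1 when both actors send equally.

```python
def engagement(sent_a_to_b, sent_b_to_a):
    """Reciprocity of an exchange, between 0.0 (one-sided) and 1.0 (balanced).

    Assumed formula: twice the smaller direction divided by the total.
    """
    total = sent_a_to_b + sent_b_to_a
    if total == 0:
        return 0.0
    return 2 * min(sent_a_to_b, sent_b_to_a) / total
```

For the example in the text, ten unanswered emails give `engagement(10, 0) == 0.0`, while a back-and-forth of ten emails gives `engagement(5, 5) == 1.0`, even though both cases involve the same number of interactions.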
C. Context-based social interaction
Further depth to the links between actors can be extracted
from the context of the interaction (and the content, where
available). For example, emails between two actors using
email identifiers from the same business domain, that occur
during work hours, that concern typically professional sub-
jects and are sent from a work location very strongly suggest
that the two actors are professional colleagues. Thus, content
and context (such as time, location, content of messages,
etc.) can be a useful characterizer for different types of social
proximity.
Other studies have shown that observing interactions over
a long period does not always result in a social graph
with immediate relevancy [8]. Therefore, the Social
Interaction Analysis system looks at two types of social
proximity based on the interaction period: a general “all-time”
social proximity and a “last 48 hours” social proximity
that only considers recent interactions. The “all-time”
proximity is useful for observing long-term relationships and
long-term usage patterns of an actor, and social applications
can take advantage of the “last 48 hours” proximity for
immediacy.
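The two views differ only in which interactions are counted. The sketch below uses the simplest proximity (a count of interactions) and an optional time window; the function name and the tuple layout of the log are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def proximity_counts(interactions, now, window=None):
    """Count interactions per directed pair.

    interactions: iterable of (sender, receiver, timestamp)
    window:       None for the "all-time" view, or a timedelta such as
                  timedelta(hours=48) for the recent view
    """
    counts = Counter()
    for sender, receiver, timestamp in interactions:
        if window is not None and now - timestamp > window:
            continue  # too old for the contextual view
        counts[(sender, receiver)] += 1
    return counts
```

Calling the function twice on the same log, once with `window=None` and once with `window=timedelta(hours=48)`, yields the long-term and the immediate view respectively; a contact who dominates the first can be absent from the second.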
IV. CASE STUDY: SOCIAL HELPER
The purpose of the Social Interaction Analysis system is
to enable innovative social applications, initially targeting
a social helper mobile application running on end-users’
devices. The Social Helper finds a social contact in an
emergency context.
The use case is as follows: Bob and George are youths
organizing a birthday party for a mutual friend. Although
they have never communicated in the past, George becomes
Bob’s best social contact for the “last 48 hours” social
proximity as they interact via SMS, email and telephone
calls. At one point, Bob’s mother, Jena, is unable to get
in touch with him on his phone for a family emergency,
even through their mutual contacts. She then launches the
Social Helper on her phone, and is able to discover George
as Bob’s best “last 48 hours” social link thanks to their recent
communications about the birthday party, despite not having
George in her own address book.
This use case was implemented with a Social Interaction
Analysis server running in the network, with collection
components deployed on end-users’ terminals. In the case of
Bob and George, this application is monitoring their SMS
and telephone call usage and reporting the interactions to
the collection layer. As well, a collector in the network
is reporting their email interactions to the collection layer.
The analysis layer incorporates all new interactions into the
calculation of an overall social proximity between the actors
of the use case, and into a “last 48 hours” rolling window.
Figure 3. SIA Viewer. Numbers represent the number of exchanged calls.
Figure 3 shows the weighted, directed social graph con-
structed in the use case, centered on Bob. This view is
only available to the mobile operator; individual actors are
only allowed to see their own social links. Both Bob and
George have used the Social Helper to set their mothers as
emergency contacts, and this information has been stored in
the semantic social user profile database. Therefore, via the
Social Helper, Bob has given his mother, Jena, permission
to see his social links, so she is eventually able to find
George and his contact information. Figure 4 shows Jena’s
Social Helper when George has become Bob’s new best
contact. She remains unable to see farther into the social
network to George’s social links, but she is able to place
an emergency call to the person he has configured as his
emergency contact.
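The lookup performed by the Social Helper can be sketched as a permission check followed by a search for the strongest outgoing edge. The data layout (a dictionary of recent proximities and a dictionary of declared emergency contacts) and the function name are assumptions; the real application reads this information from the semantic user profile database.

```python
def best_recent_contact(requester, target, recent_proximity, emergency_contacts):
    """Return the target's strongest recent contact, if the requester may see it.

    recent_proximity:   {(source, destination): weight} for the last 48 hours
    emergency_contacts: {user: set of users allowed to see that user's links}
    """
    if requester not in emergency_contacts.get(target, set()):
        raise PermissionError(f"{requester} may not view the links of {target}")

    # Only the target's own outgoing links are visible; the requester
    # cannot look further into the network.
    candidates = {dst: weight
                  for (src, dst), weight in recent_proximity.items()
                  if src == target and dst != requester}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

In the scenario above, Jena is listed as Bob's emergency contact, so a call such as `best_recent_contact("jena", "bob", recent, contacts)` would return George when his edge from Bob carries the highest recent weight.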
The Social Helper mobile application is an example of
an end-to-end social application that can take advantage of
the Social Interaction Analysis system to provide a new
communication service to the end user.
V. DISCUSSION
One of the most important criticisms of the proposed
Social Interaction Analysis system is that though the compu-
tations for social proximity appear useful, they are difficult
to validate. In order to experimentally confirm the results,
the system needs to be run against a large set of real users,
and obtain feedback about whether the calculated social
proximity corresponds to the expectations and assumptions
of the end-users. This could either be obtained by performing
a user survey on the social applications delivered to the
system, or incorporated into the collection stage by soliciting
ongoing user feedback. Likewise, the value of “last 48
hours” as a measure of relevant social proximity was chosen
as a plausible value; an experiment on a set of live social
interactions is needed to confirm an appropriate value.
Figure 4. Social helper mobile application.
The studies that calculate quantitative statistics on email
database logs show that “sendiness” (i.e. given the past
interactions, the probability that one user will compose
or reply to another) is sufficient to pick the top social
contact for a user. However, it is unknown how the other
calculated values can be used to sort the remaining social
contacts in an intelligent and useful order. Another difficulty
in collecting historical email interactions is that end users
have already pre-processed their emails over time, deleting
unimportant or trivial messages. It is unclear how the results
are affected, given that the retained emails are more likely
to be important, or whether more precise results could be
obtained by considering every email on arrival.
Further consideration needs to be given to the subjective
impact of the type of the interactions for a given user. An
email can reflect a professional relationship for some users,
while an SMS can be considered something more intimate
and immediate for others, but there is no fixed rule for
everyone. When calculating a global social proximity, there
should be a justifiable way to weigh the contribution of the
interaction type based on global or individual actor patterns.
Another important aspect to raise is scalability. A network
operator can handle between 2000 and 5000 calls
per second in a network of a quarter of a billion unique
telephone numbers [2]. The storage capacity management
through incremental storage and the duration and periodicity
of analysis shall be addressed with the right architectural
design. Finally, the experimental framework for Social In-
teraction Analysis was designed to be very flexible to permit
adding new functions to its three layers. In order to remain
viable in a real-life deployment (in a mobile network, for
example), the system will require tuning for specifically tar-
geted social applications and services. The social interaction
user manager is a component that exists outside of the three
layers, and provides a provisioning interface to the mobile
operator. The mobile operator can determine which users are
of interest for social interaction collection, and can use the
interface to provide further identity information from their
subscriber databases, for example correlating multiple phone
numbers and/or email addresses to a single user identity.
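Correlating several endpoints to one user identity amounts to rewriting interaction endpoints through a lookup table before analysis. The table contents and the function name below are hypothetical; in a deployment the mapping would come from the operator's subscriber databases.

```python
def resolve_identities(interactions, identity_map):
    """Replace phone numbers and email addresses by canonical user IDs.

    interactions: iterable of (sender_endpoint, receiver_endpoint, kind)
    identity_map: {endpoint: user_id}; unknown endpoints are kept unchanged
    """
    return [
        (identity_map.get(sender, sender), identity_map.get(receiver, receiver), kind)
        for sender, receiver, kind in interactions
    ]
```

After this step, an SMS sent from Bob's phone number and an email sent from Bob's address both count toward the same node in the social graph.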
The mobile operator can also use the social interaction
user manager to limit the significant computation resources
required by the social interaction analysis layer. For exam-
ple, one user may have subscribed to an enterprise appli-
cation that calculates social proximity in the work sphere,
while another user uses an application that only considers
social interactions with certain keyword content. The
necessary resources can be reduced by only performing the
calculations required for mapping social proximity data to
the selected vocabulary in the presentation layer. In addition,
by provisioning the interesting users and required calcula-
tions via the social interaction user manager, the analysis and
presentation layers of the system can be distributed across
the network, where different machines are responsible for
subsets of users and/or specific context-based calculations.
VI. RELATED WORK
Social networking applications are, for the most part,
based on a constructed social graph. There are several ways
to build a social graph. The most widely used method is the
web declarative mode (explicit declaration of relationships):
users of Social Networking Sites fill in their profiles by
inviting contacts. In addition, aggregation and incentive
mechanisms are used.
A second method for social graph construction is the
analysis of content, mostly known as web crawling
(implicit declaration of relationships). The targeted contents
can be simple web pages, scientific publications or email
exchanges [3],[4]. The analysis can be applied to one or
more content types at a time. In [5], the authors present an
end-to-end system for mining and building a social network based
on exchanged email and web crawling. [6] proposes a new
approach to study the dynamics of key players in social
networks from an experiment using 57,158 emails exchanged
over 113 days in a large university. The Flink system [7]
extracts knowledge about social relationships from email,
web pages and publications, performs several computations
on the data and consolidates what is learned using a common
RDF vocabulary based on the FOAF user profile ontology
[8]. A very complete description of this system, with a
special focus on the application of semantic technologies
for social network analysis is given in [9].
The third method is the collection and the analysis of
interactions based on telecommunications means such as
voice calls, SMS, MMS, instant messaging, etc. In [1], the
authors have constructed a social graph of 3.9 million nodes
based on telephone call logs. This graph allowed calculating
and identifying the properties of a large-scale, weighted
social graph.
In our case, we combine several interaction sources from
the telecommunication world (SMS and phone calls) and
Internet communications (notably email) to build a social
network that is closest to the real social network, and an
enabler for compelling applications. Moreover, we build several
social networks depending on the considered context. Our
case study, the Social Helper, uses the interactions of the
last 48 hours to build a social network describing the
relationships of that period.
VII. CONCLUSION AND FUTURE WORK
We have presented in this paper a complete framework
for building a comprehensive social network from a large
set of interaction sources. We show how this network
can be the entry point for compelling applications on a
mobile device, by using contextual filters on the network.
The Social Helper application addresses the
specific case of an emergency call by leveraging social
proximities to find the closest person based on interactions
in the last 48 hours. Our framework is designed to deal
with almost any kind of use case that needs social network
information. This application uses a simple definition of
the social strength, primarily based on the number of email
or SMS messages received. Since social proximity may be
positive or negative depending on the interaction content and
context, we are currently working on the improvement of
the social strength calculation by taking into account those
features. Indexing interactions with content and context will
enable rich information retrieval through social data mining
based on semantic tagging rather than keyword matching to
identify content topics.
The long term goal is to enrich the set of social inter-
actions that the system can capture and analyze (from the
Internet or telecommunication networks) in order to build
a social network that closely models reality. The area of
media sharing is an example of a rich source of interactions
that can be exploited to determine social proximity between
actors [10]. More importantly, capturing proximity or other
indicators of face-to-face communications is an interesting
way to reach this goal since up to two thirds of social
interactions occur face-to-face [11] as opposed to digital
message exchanges.
ACKNOWLEDGMENTS
The authors would like to thank the following department
members for their contribution to this framework: Lionel
Natarianni, Denis Leclerc, Ronan Daniellou, Adrien Joly,
Linas Maknavicius and Hakim Hacid. This work is being
performed as part of a collaborative research project called
HERMES within the European Eureka cluster programme
CELTIC for telecommunications [10] and is partially funded
by the French Ministry of Economy, Industry and Labor, DGCIS
Directorate.
REFERENCES
[1] J. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, M. de Menezes, K. Kaski, A. Barabasi, and J. Kertesz, “Analysis of a large-scale weighted network of one-to-one human communication,” New Journal of Physics, vol. 9, no. 6, p. 179, 2007.
[2] N. Easter, “What would you do with the telephone call network of an entire country?” 2006. [Online]. Available: http://www.iq.harvard.edu/blog/
[3] A. L. Barabasi, Linked: The New Science of Networks. Perseus Publishing, 2002.
[4] A. L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek, “Evolution of the social network of scientific collaborations,” Physica A: Statistical Mechanics and its Applications, vol. 311, no. 3-4, pp. 590–614, Aug 2002.
[5] A. Culotta, R. Bekkerman, and A. McCallum, “Extracting social networks and contact information from email and the web,” in CEAS, 2004.
[6] Y. Matsuo, J. Mori, M. Hamasaki, K. Ishida, T. Nishimura, H. Takeda, K. Hasida, and M. Ishizuka, “Polyphonet: an advanced social network extraction system from the web,” in WWW ’06: Proceedings of the 15th international conference on World Wide Web. New York, NY, USA: ACM Press, 2006, pp. 397–406.
[7] P. Mika, “Flink: Semantic web technology for the extraction and analysis of social networks,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 3, no. 2-3, pp. 211–223, October 2005.
[8] D. Brickley and L. Miller, “The Friend Of A Friend (FOAF) vocabulary specification,” November 2007.
[9] P. Mika, T. Elfring, and P. L. M. Groenewegen, “Application of semantic technology for social network analysis in the sciences,” Scientometrics, vol. 68, no. 1, pp. 3–27, 2006.
[10] “Celtic eureka cluster programme, integrated telecommunications systems, numeric referencing: Project information.” 2009. [Online]. Available: http://www.celtic-initiative.org/Projects/HERMES/
[11] S. Farnham, S. U. Kelly, W. Portnoy, and J. L. Schwartz, “Wallop: Designing social software for co-located social networks,” Hawaii International Conference on System Sciences, vol. 4, p. 40107a, 2004.