advances in fca-based applications for social networks...

17
Advances in FCA-based applications for Social Networks Analysis Marie-Aude Aufaure Ecole Centrale Paris, France Bénédicte Le Grand Université Paris 1 Panthéon-Sorbonne, France ABSTRACT Concept lattices have been widely used for various purposes in many different applications since the 1980s. Recent applications of Formal Concept Analysis include extensions of traditional FCA applications such as data and text mining, machine learning and knowledge management. Progress has also recently been made in software engineering, Semantic Web and databases. New applications have also emerged in the fields of healthcare, ecology, biology, agronomy, business and social networks. This article presents example of successful applications of FCA for Social Networks Analysis. We show the benefit of FCA solutions, as well as their combination with semantics and topology-based approaches. We conclude by presenting FCA-based visualization solutions and open challenges for FCA in the context of large and dynamic data. Keywords: Formal Concept Analysis applications, Social Networks Analysis. 1. INTRODUCTION Since the development of Birkhoff’s lattice theory in 1940 and the birth of Formal Concept Analysis in the 1980s, concept lattices have been widely used for various purposes in many different applications. Recent applications of Formal Concept Analysis include extensions of traditional FCA applications such as data and text mining, machine learning and knowledge management. Progress has also recently been made in software engineering, Semantic Web and databases. New applications have also emerged in the fields of healthcare, ecology, biology, agronomy, business and social networks. This article reviews recent examples of FCA applications for Social Networks Analysis (SNA). We show that FCA is highly relevant to this field, through various use cases such as buzz monitoring on twitter and identification of leaders (or isolated members) in various online social networks. Our results demonstrate that conceptual metrics derived from FCA are complementary to traditional centrality-based SNA metrics. In addition, FCA is a great tool to identify (overlapping) social communities, as shown in Section 2. Moreover, the expressiveness of Galois lattices may easily be combined to any kinds of semantic representations, such as ontologies; the benefit of these 2 powerful approaches leads to very interesting results, as illustrated in Section 3. The identification of users communities in social networks generally relies on metrics based on the underlying social graphs topology; we present in Section 4 another promising approach,

Upload: dangdieu

Post on 11-Jul-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Advances in FCA-based applications for Social Networks Analysis

 

Marie-Aude Aufaure Ecole Centrale Paris, France

Bénédicte Le Grand Université Paris 1 Panthéon-Sorbonne, France

     ABSTRACT Concept lattices have been widely used for various purposes in many different applications since the 1980s. Recent applications of Formal Concept Analysis include extensions of traditional FCA applications such as data and text mining, machine learning and knowledge management. Progress has also recently been made in software engineering, Semantic Web and databases. New applications have also emerged in the fields of healthcare, ecology, biology, agronomy, business and social networks. This article presents example of successful applications of FCA for Social Networks Analysis. We show the benefit of FCA solutions, as well as their combination with semantics and topology-based approaches. We conclude by presenting FCA-based visualization solutions and open challenges for FCA in the context of large and dynamic data. Keywords: Formal Concept Analysis applications, Social Networks Analysis. 1. INTRODUCTION Since the development of Birkhoff’s lattice theory in 1940 and the birth of Formal Concept Analysis in the 1980s, concept lattices have been widely used for various purposes in many different applications. Recent applications of Formal Concept Analysis include extensions of traditional FCA applications such as data and text mining, machine learning and knowledge management. Progress has also recently been made in software engineering, Semantic Web and databases. New applications have also emerged in the fields of healthcare, ecology, biology, agronomy, business and social networks.

This article reviews recent examples of FCA applications for Social Networks Analysis (SNA). We show that FCA is highly relevant to this field, through various use cases such as buzz monitoring on twitter and identification of leaders (or isolated members) in various online social networks. Our results demonstrate that conceptual metrics derived from FCA are complementary to traditional centrality-based SNA metrics. In addition, FCA is a great tool to identify (overlapping) social communities, as shown in Section 2. Moreover, the expressiveness of Galois lattices may easily be combined to any kinds of semantic representations, such as ontologies; the benefit of these 2 powerful approaches leads to very interesting results, as illustrated in Section 3. The identification of users communities in social networks generally relies on metrics based on the underlying social graphs topology; we present in Section 4 another promising approach,

which consists in combining FCA and the traditional topology-based techniques of Social Networks Analysis. Indeed, the solutions coming from those both worlds are complementary –and not contradictory as one might think. Finally, we conclude a FCA-based Visual Analytics tool and present open challenges Formal Concept Analysis to cope with large and dynamic data. 2. APPLICATIONS OF FCA FOR SOCIAL NETWORKS ANALYSIS 2.1. FCA and buzz monitoring on Twitter Formal Concept Analysis has been applied to buzz monitoring, opinion analysis and e-reputation observation on Twitter, in a tool called EVARIST (Cuvelier and Aufaure, 2011). Like other large online social networks, information is broadcast in Twitter in an almost instantaneous way. EVARIST aims at measuring the popularity of a given tweet, and its evolution over time.

One way to study this popularity could be to consider all retweets1 generated by the initial tweet. However, the content of a retweet is not necessarily identical to the reference tweet. Figure 1 shows an example of this phenomenon, which we call polymorphism: we see that the initial tweet (tweet N.0) has been retweeted in many ways. Indeed, we may see a first group of retweets (N.1.1 and N.1.2) in which the initial tweet is unchanged, and a second group of retweets (N.2.1 and N.2.2), with slight changes. Furthermore, one of the retweets from this second group is itself a retweet’s retweet N.2.2.1). This polymorphism is a real problem when we want to measure the popularity of a piece of information. If we only consider unchanged retweets, we will neglect the whole set of modified retweets which, in spite of their modifications, may carry the same information. Figure 2 illustrates the shared versus specific content of the tweets from Figure 1, in which stop words and punctuations have been discarded (in our example: more, of, and, are, by, us).

 

Figure 1 Figure 2

Another way to address this issue could be to search for the most commonly used words. However, it would be more interesting to track the sets of jointly used words, as they contain the most frequently shared advice about the initially queried set of words. This is precisely what EVARIST does by using Formal Concept Analysis on Twitter; the various inclusions in Figure 2                                                                                                                          1  Retweet  is  used  on  the  Twitter  Web  site  to  show  you  are  tweeting  content  that  has  been  posted  by  another  user.  The  format  is  RT  @username  where  username  is  the  twitter  name  of  the  person  you  are  retweeting.  

allow us to identify the core of the carried information, i.e. the terms shared by all tweets. These inclusions also define a partial order which can be represented by a Hasse diagram as illustrated in Figure 3. The main idea of EVARIST is therefore to transform a set of tweets about a subject into such a Galois lattice. This lattice stresses the common words in a group of tweets, as well as the different forms under which the same information was carried.

The lattice shown in Figure 3 provides a representation where the various tweets are connected to another according to their actual content and not to their chronology (as in Figure 1).

 Figure 3

 EVARIST has been used in practice for e-reputation and buzz monitoring on Twitter. Some

research issues remain open with regard to scalability issues, as shown in Figure 4 (see Section 5 for performance issues).

Figure 4  

This application of Formal Concept Analysis illustrates its interest for the analysis of information carried in a social network. Another interesting example is given in (Ducassé et al., 2011) and in (Eklund and Wray, 2010). The following section describes another application, which focuses on the members of the network.

2.2. FCA-based social networks metrics In (Riadh et al. 2009), we proposed a method relying on Formal Concept Analysis and Galois lattices for Complex Systems analysis. This approach provides an intuitive visual characterization of the systems under study through the computation of conceptual metrics generated from Galois lattices. These footprints help users better understand the data's overall structure and features, as well as identify significant elements. This method also enables the automation of outliers' filtering. Moreover, this approach is generic and may therefore be applied to any type of complex system; it is particularly adapted to social networks analysis. The major contribution of this work is to propose new methods for social networks analysis based on Formal Concept Analysis and Galois Lattices, complementary to existing methods such as centrality-based methods.

Given a social network, we define objects as members and their attributes as their contacts. They constitute the formal context from which we build a Galois lattice. This Galois lattice is then used to calculate statistics, which we call Conceptual Relatedness and Closeness, about every member of the social network. These metrics enable the identification of interesting objects (members), as we explain below. FCA can also be used to identify interesting objects in other contexts than social networks, e.g. for requirement modeling (Bogatyrev and Nuriahmetov, 2011).

Conceptual Relatedness and Closeness, from which we derive the Conceptual Distribution of

the system, are computed for each object according to the extents and intents of the lattices' concepts, as detailed below:

• Conceptual relatedness Let an object o. Let C the set of concepts from the lattice which contain the object o in their

extents. Let C' the subset of concepts from C which contain at least another object than o in their extents and at least one property in their intents. The value of o's Relatedness (noted Relatedness(o)) corresponds to the average number of objects with which o is clustered in concepts from C', divided by the total number of objects in the social network. The Relatedness value indicates whether o is connected to many other objects.

• Conceptual closeness Let S the set of objects with which o is clustered in at least one concept of the lattice (i.e. the

set of objects with which o is connected); these objects have at least one property in common with o (by construction). The object o's Closeness value (noted Closeness(o)) is the average number of properties o shares with the other objects from S, divided by the total number of o's properties. This parameter indicates whether o is strongly connected to the other objects from S (i.e. whether it shares a high proportion of properties with them).

• Conceptual Distribution The (Relatedness(o), Closeness(o)) pair constitutes the Conceptual Distribution of the

object o.

Figure 5 illustrates these conceptual metrics on the toy network made of 20 members shown in Figure 4. Conceptual Relatedness and Closeness appear on Figure 5 on the X and the Y axis respectively; this is the representation of what we call the conceptual distribution of the network.

These conceptual metrics are complementary to traditional centrality measures (see Figure 6), as they provide different (and not contradictory) information about members of the considered social network.

We recall that centrality measures are traditionally used in SNA, in particular:

• Closeness Centrality: this value is high for the nodes which are close to many other nodes in the network (in terms of number of links);

• Betweenness Centrality: this value is high for nodes which are on a high proportion of shortest paths to the other nodes in the network.

 

Figure 5 Figure 6

 

Figure 7 The conceptual distribution deduced from these values enables the social network’s global

characterization in terms of homogeneity or heterogeneity, as well as an automatic filtering of outliers, called marginal objects (i.e. members). The filtering process, based on this conceptual characterization, consists in removing members with low conceptual relatedness and closeness values, which are “conceptually isolated” or “marginal”. Typically in this case, there is no

marginal member in the network under conceptual criteria, whereas members 1 and 12 would be filtered under centrality criteria. In (Riadh et al. 2009), we have performed this conceptual filtering and compared its result with filtering based on centrality measures, i.e. which removes members with low closeness and betweenness centrality values. When we compute the lattices from the new set of objects, after the filtering process, the conceptual structure is better preserved with the conceptual filtering than with the centrality-based filtering.

Another very interesting feature of the social network may be deduced from the conceptual

distribution on Figure 5: its (conceptual) community structure. Indeed, two distinct communities may be identified: C1, which contains nodes with low Relatedness and high Closeness, and C2, with high Relatedness and low Closeness. Members of C2 are therefore connected to few other members, but they have a high proportion of common contacts with them. They represent several social groups with common interests. We can say that there are several interest groups, as if it was a single interest group, then the conceptual relatedness value would be high too. On the other hand, C1 gathers members who are connected to many other members, but who have a small proportion of common contacts. In our examples, members 1 and 12 are “leaders” who are referred to by many other members but who themselves do not refer to other members.

Figure 6 displays another type of information about the network, based on centrality measures; we may not say that one representation is “better” than the other, but rather that they offer different perspectives of the network. Formal Concept analysis therefore provides additional ways of analysis social networks.

In order to be able to compare FCA and centrality-based results, we have used topological

information only for our formal context, i.e. information about contacts within the social network. FCA has an advantage on centrality-based metrics for communities detection as it may take into account additional information, e.g. semantics (see section 3), or information about users’ environment or context, as described in the following section.

2.3. FCA-based contextual search In this section, we describe an application based of FCA which performs a dynamic community building, coupled to recommendations to join a group (Vanrompay et al., 2013). This work has been conducted in the context of MANETs (Mobile Ad Hoc Networks). The challenge for the applications in such ubiquitous computing environments consists in adapting to users context, acquired by multiple context sensors distributed over the environment. Nevertheless, applications are not interested in all available context information. Context distribution mechanisms have to cope with the dynamicity that characterizes MANETs and also prevent context information to be delivered to nodes (and applications) that are not interested in it. FCA can provide interesting mechanisms; it has also been applied successfully to pervasive applications in (Chollet et al., 2012).

Our approach is based on a grouping mechanism which organizes the distribution of context

information in groups (i.e. user communities) whose definition is context based: the composition of each contextual community is based on a criteria set (e.g. the shared location and interest) and has a dissemination set, which controls the information that can be shared within the community. Traditional work on context grouping relies on a rather static perception of groups, being defined explicitly by the developer at design time (Kirsch Pinheiro et al., 2008). In order to provide a

spontaneous, personalized and dynamic way of defining and joining these groups, we use a lattice-based classification and recommendation mechanism that analyzes the interrelations between communities and users, and recommends new communities, based on user interests and preferences. Another approach relying on FCA to perform recommendation has been proposed in (Ignatov et al., 2008).

The combination of contextual and conceptual information allows us to discover and analyze

relations between different communities and the users they contain. Based on a user profiles’ similarity with the elements specifying the group and the current situation of the user, the system can decide to give a recommendation to the user to join a community. This leads to a dynamic, proactive and personalized user experience. The methodology we used for the construction of the lattice is very close from the one described in Section 3. We therefore do not detail it here.

We have shown in this Section three successful applications of FCA for social networks

analysis, for the analysis of social content information and for the identification of users’ communities. FCA’s power may be further enhanced by its joint use with external information such as ontologies, as explained in the following section.

3. CONCEPTUAL NAVIGATION AND SEMANTIC SEARCH IN SOCIAL NETWORKS Combination of FCA with semantics has already been studied in the past. In particular, some works aim at using FCA to build or maintain ontologies (Sertkaya, 2010; Kirchberg et al., 2012; Bendaoud et al., 2008; Völker and Rudolph,2008), or to find underlying –hidden- semantics in data (Watmough et al., 2013). In this section, we present another successful application of Formal Concept Analysis in the context of social networks, which combines conceptual clustering with semantic information, namely ontologies. This has been also the approach followed by (Alam et al., 2012; Codocedo et al., 2012 and Guerin et al., 2012) in other contexts.

Our application aims at enhancing spoken dialogue systems by providing a more personalized

answer tailored to the user’s preferences and need. This work is part of the EU FP7 funded project PARLANCE2 that aims to design and build mobile applications which approach human performance in conversational interaction.

In order to achieve these goals, we have proposed a spoken dialogue system where user

interests are expressed as scores in modular ontologies. The use of ontologies enables us to cover multiple domains (e.g. searching for restaurant, housing, ...) because each ontology module corresponds to a specific search domain. This approach allows for a dynamic and evolving representation of user interests. We use techniques borrowed from formal concept analysis (FCA) to build ad-hoc communities of users with similar interest, within which information can be shared amongst and recommended to users3.

                                                                                                                         2  The  Parlance  project  Web  site  is  located  at  the  following  URL:  https://sites.google.com/site/parlanceprojectofficial/    3  Note  that  this  objective  is  similar  to  the  one  presented  in  Section  XXX,  applied  here  in  a  different  context.  

We have designed a novel approach to efficiently represent dynamically evolving user preferences and interests (Vanrompay et al., 2012). By analyzing the dialogue history of the user, interests are inferred and ontology modules for different domains are annotated with scores. In this way, our spoken dialogue system can provide a more personalized experience to the user by providing answers tailored to the interests and preferences of the user. Second, the identified interests are used to perform formal concept analysis and to construct ad-hoc communities of users sharing similar interests, allowing a form of social search. By collaborative filtering we can share and recommend possibly interesting information and additional communities to users.

More precisely, the user interests and preferences are expressed in ontology modules as

entities and attributes that are annotated with scores wi indicating a user (dis-)interest in a particular item. Each ontology module is domain-specific, representing for example interests in houses for sale or restaurants. For example, the restaurant sub-module contains attributes like location, food type and dress code. Scores associated with these attributes indicate whether the user prefers Indian or Italian restaurants, or whether the dress code should be formal or casual. Each scored ontology module serves as input for the construction of a formal context (G, M, I). The set G holds all users, while M contains relevant attributes (interests). I is a function defined as follows:

I(gi, mj): G x M → {0, 1} = 1 if wj(ui) ≥ T, otherwise 0 wj(ui) is the score assigned for user i to attribute j, and T is a threshold between 0 and 1. Figure 8 shows an example formal context, in which crosses indicate a value of 1 for I. Based

on this formal context, we build a concept lattice (also illustrated in Figure 8) allowing us to identify the sets of interests which are best suited to be taken into account for the definition of user communities. Within these communities it is then possible to propose recommendations. The lattice assigns users to (sets of) interests which are the building blocks for the communities. A user ui is interested in all concepts and attribute(s) values that are connected to and hierarchically higher in the lattice than the node the user is assigned to. From the Galois lattice we can automatically infer association rules like the ones shown in Figure 9. Several works have successfully applied Formal Concept Analysis to the extraction of relevant association rules (Hamrouni et al., 2008; Belohlavek et al., 2011) or to perform sequential pattern mining (Egho et al., 2011).

For example, rule 1 states that 7 users are interested in restaurants with formal dress code, and

6 users out of these 7 prefer French food. From the association rules, we infer suitable communities, which balance generality and specificity. Communities should not be too general because information of interest for the user should be disseminated only to users who are interested in that information. Users should not be flooded with all kinds of information. On the other hand, communities should not be too specific, because this would lead to a large number of sparsely populated communities. Intuitively, this corresponds to concepts in the lattice on a not too high and not too low level, i.e. somewhere in the middle.    

 Figure 8: Formal context and Galois lattice

 Figure 9: Association rules

The association rules allow us to identify a balanced number of communities with users who

share common interests. Two basic metrics, support and confidence, are used to find the sets of interests defining the communities:

• Support s: this metric denotes the proportion of users who expressed their interest in a set of attributes. For example, association rule 4 indicates that 6 users (out of 15) have an interest in restaurants serving French food with formal dress code; the value of support is therefore 0.4.

• Confidence c: this metric represents the proportion of users having an interest in consequent rules, given they have an interest in antecedent rules. In rule 4 for example, 5 out of 6 users who are interested in French restaurants with formal dress code are interested in Parisian restaurants. So confidence in this case is equal to 0.83.

As soon as the appropriate candidate interest sets for the communities are identified, a second

filtering step takes place using the stability index metric (Kuznetsov et al., 2007). Informally, a concept (i.e. community in our case) is considered stable if its intent (set of interests) does not

depend much on each particular object of the extent (set of users). Intentional stability expresses to what extent the set of interests forming a community depends on the interests of its individual users. If a community is persistent, it does not depend on a few users: some users leaving the community should not change the set of interests the community represents. It should be noted that we already minimize this risk by incorporating the minimal confidence metric in step 1. A stable community also does not merge with a different community or split into several independent sub-communities when some users leave the community. Apart from the intentional stability, we also use the extensional stability, which measures how the users of a particular community depend on the particular interests. Extensional stability indicates to what extent the community of users depends on a particular interest.

This innovative application of FCA allows to efficiently represent dynamically evolving user

preferences and interests. By analyzing the dialogue history of the user, interests are inferred and ontology modules for different domains are annotated with scores. In this way:

1. our spoken dialogue system can provide a more personalized experience to the user by providing answers tailored to her/his interests and preferences ;

2. these interests are used to perform formal concept analysis and to construct ad-hoc communities of users sharing similar interests. Through collaborative filtering we can share and recommend possibly interesting information and additional communities to users.

In this Section, we have shown an example of successful application of FCA, combined with

semantics, for the identification of relevant user communities. In the next Section, we propose another approach, still relying on FCA, to perform this task.

4. FCA AND TOPOLOGY-BASED SOLUTIONS APPLIED TO SOCIAL NETWORKS ANALYSIS The efficient identification of communities in a network is a traditional research area in complex networks analysis. Various approaches have been proposed in this area, a very popular one being the Louvain method (Blondel et al., 2008). This method consists in partitioning a graph in a way that maximizes a specific quality function called modularity. Maximizing this function leads to the identification of groups of nodes -called communities- which are tightly connected to one another within the group, and loosely connected to nodes from other groups. This approach is extremely efficient in time and can handle graphs with millions of nodes. However, the obtained communities are disjoint and this method suffers from non-determinism. Moreover, communities are not labeled and only topological information is taken into account for their computation. On the other hand, Galois lattices perform an overlapping clustering and produced deterministic communities, which may be labeled with extent/intent. However, their complexity is very high.

We therefore propose to combine an approach relying on modularity optimization with Galois lattices in order to exploit the strengths of both worlds (Melo and al., 2012b). We apply our work to the identification of user communities in an enterprise environment, in order to enhance Business Process Management. The goal of this application is to support market analysts in their task of Customer Relations Management (CRM) marketing and management, through the analysis of all available and relevant customer information is a major task.

We have proposed measures based on Formal Concept Analysis to determine conceptual proximity between people and identify user communities. Other approaches for conceptual similarity may be found in (Formica, 2008; Lee et al., 2011). The temporal evolution of our proximity measure is analyzed, and provides significant insights on trends and market behavior. This approach has been exemplified with a case study on Twitter, to analyze content dynamics within user communities. Users of Twitter are linked in the network through the ‘follower” relationship.

We briefly describe the proposed method as follows: 1. A community detection algorithm based on modularity optimization identifies clusters of

densely-connected users. We use the resulting community structure relying on the social network topology as a basis of our analysis. These topology-based communities are reflected by the color of nodes in the various representations. Three communities have been identified in this sample (represented with red, purple and green colors).

2. Formal Concept Analysis is then applied, through the computation of a Galois lattice, to study the evolution of these communities, based on the content of users tweets. Instead of using traditional data analysis approaches, we use a FCA approach as we know from our previous work that is takes into account an implicit context which is not captured with traditional approaches. We then compute, for all users, a conceptual similarity value with other users, based on the generated Galois lattice. The similarity matrix, containing all similarity values between users, allows us to draw 2D maps, using Multi-Dimensional Scaling (MDS) projections. The evolution of these maps over time shows how users get closer to one another within their own community, and how they may get closer –or more distant- to other communities.

The evolution of content can be observed by snapshots of MDS maps for conceptual

similarity as illustrated in the example in Figures 10 and 11. The sequence of snapshots illustrates the “conceptual movement” of users, approaching and moving away from each other, eventually to form clusters around a “buzz” (like A and B areas of Figure 10). More precisely, the A area in Figure 10 corresponds to a small group of people in the same topological community (as they are represented with the same color) publishing topics related to “parties” such as “Saturday night”, “disco”, “dj”, etc. The C area in Figure 10 is an interesting region of the map where people are conceptually similar, i.e. close to one another on the map, although they belong to different communities (red, purple and green points may indeed be found in this area). Such event may indicate future collaboration between communities, with nodes switching from one community to the other. These users are not necessarily following each other, i.e. are not necessarily connected in the graph, which may be interesting for link prediction models. A more accurate analysis of similar conceptual groups and of network linking is one of the perspectives of this work.

A subsequent timespan is illustrated by Figure 11, showing an unusual mutual gathering of users from all topological communities. This happened to be October 2011 when people were moved by the death of Steve Jobs who hence became a common subject of discussion.  

 

Figure 10: MDS map for the conceptual similarity between users from each community in timespan tn

 

 

Figure 11: MDS map for the conceptual similarity between users from each community in timespan tn+m

In this Section, we have shown how FCA could be used in conjunction with traditional

community detection approaches from graph analysis, in order to provide additional information. In the example above, FCA has allowed to assign coordinates to users on a map and to analyze the evolution of topological communities, which is an open issue for complex networks analysis. A presentation of research challenges related to dynamic socio-semantic networks may be found in (Yavorsky, 2011). One promising application of the proposed approach could be link prediction.

Formal Concept analysis may also be combined with other types of topological communities.

For example, the authors of (Danisch et al, 2013) compute overlapping (multi-)ego-centered

communities on very large datasets, such as the set of 2 million Wikipedia pages. Galois lattices would be a natural representation for these communities to enhance navigation within them.

5. CONCLUSION We have shown successful recent applications of Formal Concept Analysis to Social Networks Analysis, focusing both on content and actors, on static and dynamic behaviors of the networks. However FCA still faces many challenges, in particular in terms of usability and visualization capabilities, and more generally to cope with the large, heterogeneous and highly dynamic data generated daily (in online social networks but also in complex systems in general).

5.1. FCA-based visualization solutions Various tools have been developed for the visualization and the navigation within Galois lattices (Andrews et al., 2011; Villerd et al., 2008). In this section, we will focus on a recent tool, call Cubix, which is a FCA-based visual analytics solution developed recently for Business Intelligence applications (Melo and al., 2012a). Business Intelligence (BI) is an umbrella term which essentially describes the ability for an organization to take all relevant, available data and convert them into knowledge. BI uses technologies and applications to analyze mostly internal, structured data, condense and aggregate that data, and present the aggregated data in a form to a user such that she can conduct better decision making. Some works have proposed other approaches relying on FCA to address business issues (Macko, 2012; Polovina, 2012; Guédi et al., 2012; Domenach and Tayari, 2012), but Cubix is the most elaborated one in terms of visualization capabilities.

CUBIX has been developed during the CUBIST4 (Combining and Uniting Business Intelligence with Semantic Technologies) project, which is a EU FP7 funded project aiming at going beyond the state-of-the-art by integrating semantics in data warehouses and providing efficient visual analytics tools.

Similar to traditional BI systems, in CUBIST data from different sources (structured and

unstructured) is gathered and stored in a central repository. With respect to traditional BI systems, CUBIST has though two distinguishing features:

1. The repository used in CUBIST is not a database or a data warehouse, but a semantic repository, a so-called triple store.

2. The analytics in CUBIST follow a different approach: Instead of conducting the quantitative show me the numbers approach, it follows a qualitative show me the clusters and dependencies approach based on mathematical theory called Formal Concept Analysis.

We followed a user-centered approach in Cubix. The implemented techniques allow selecting,

comparing, filtering, detailing and getting an overview of concept lattice features. The novelty of our work consists of (i) the combination of FCA with visual analytics data exploration techniques, (ii) new algorithms for condensing and filtering conceptual data, and (iii) integration with semantic data from semantic repositories (triple stores).

                                                                                                                         4  http://www.cubist-­‐project.eu/  and  the  YouTube  channel:  http://www.youtube.com/user/CUBISTFP7ICT  

Typical uses of Cubix include semantic data analysis and pattern detection, anomaly detection, comparisons, information classification, and knowledge discovery. Cubix’s workflow allows users to carry out an analysis starting from a real data set, converting it into a formal context, simplifying the context to make it manageable, and visualizing the result as a concept lattice (Figure 11). Cubix is a step forward in relation to existing FCA tools in terms of number of techniques employed and of its ability to deal with much larger data.

Figure 12. Cubix User Interface

5.2. Open challenges for FCA-based applications Many challenges remain open for FCA-based applications, due to the very large volume of heterogeneous and highly dynamics data generated daily in all types of systems around us.

The complexity of the construction of Galois lattices is such that solutions have to be found to

cope with this volume of data. Efficient solutions (Ferré, 2008) have been proposed to manage large datasets (Andrews et al., 2010) or big dynamical text collections (Kuznetsov et al., 2012).

The interpretation of resulting lattice also becomes a challenge (Andrews, 2011), and we have

proposed distributed solutions for the computation of our conceptual metrics (Riadh et al., 2010). Another way to deal with large lattices is to reduce them by selecting specific concepts (Kuznetsov et al., 2007; Jay et al., 2008), or by trees extracted from them in order to benefit from tree-based visualizations (Melo et al., 2011).

Formal Concept Analysis has succeeded so far to show its potential in many different areas

and to face many challenges. No doubt we will keep hearing from it for many more years…

REFERENCES Alam, M, Coulet, A., Napoli, A. & Smail-Tabbone, M. (2012). Formal Concept Analysis Applied to Transcriptomic Data, in proceedings of The Ninth International Conference on Concept Lattices and Their Applications - CLA 2012, Fuengirola (Málaga) : Spain. Andrews, S. (2011). In-Close2, a High Performance Formal Concept Miner, In: Andrews, S., Polovina, S., Hill, R., Akhgar, B. (eds.) ICCS-ConceptStruct 2011. LNCS, vol. 6828, pp. 50–62. Springer, Heidelberg. Andrews, S., Orphanides, C. & Polovina, S. (2011). Visualising Computational Intelligence through Converting Data into Formal Concepts. Next Generation Data Technologies for Collective Computational Intelligence pp. 139-165. Andrews, A. & Orphanides, C. (2010). Analysis of Large Data Sets using Formal Concept Lattices, In: Kryszkiewicz, M. and Obiedkov, S. (eds.) Proceedings of the 7th International Conference on Concept Lattices and Their Applications (CLA) 2010, ISBN 978-84614-4027-6, Seville: University of Seville, pp. 104-115. Belohlavek, R., Grissa, D., Guillaume, S., Mephu Nguifo, E. & Outrata, J. (2011). Boolean factors as a means of clustering of interestingness measures of association rules, in proceedings of The Eighth International Conference on Concept Lattices and Their Applications, Nancy, France, October 17-20. Bendaoud, R., Toussaint, Y. & Napoli, A. (2008). PACTOLE: A Methodology and a System for Semi-automatically Enriching an Ontology from a Collection of Texts, in proceedings of ICCS 2008, pp. 203-216. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. (2008). Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment (10), P10008. Bogatyrev, M. & Nuriahmetov, V. (2011). Application of Conceptual Structures in Requirements Modeling, in proceedings of CDUD'11 (Concept Discovery in Unstructured Data), workshop co-located with the 13th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC-2011), Moscow, Russia. Chollet, S., Lestideau, V., Maurel, Y., Gandrille, E., Lalanda, P. & Raynaud, O. (2012). Practical Use of Formal Concept Analysis in Service-Oriented Computing, in proceedings of ICFCA 2012, pp. 61-76. Codocedo, V., Lykourentzou, I. & Napoli, A. (2012). Semantic Querying of Data Guided by Formal Concept Analysis, in proceedings of FCA4AI 2012 (What can FCA do for Artificial Intelligence?), Moscow, Russia. Cuvelier, E. & Aufaure, M.-A. (2011). A buzz and e-reputation monitoring tool for Twitter based on Galois Lattices, in proceedings of ICCS'2011: Conceptual Structures for Discovering Knowledge, pp. 91-103, Derby, UK. Danisch, M., Guillaume, J.-L. & Le Grand, B. (2013). Towards multi-ego-centered communities: a node similarity approach, International Journal of Web-based Communities (IJWBC) 9(2). Domenach, F. & Tayari, A. (2012). Decision Aiding Software Using FCA, in proceedings of the 10th International Conference, ICFCA 2012, Leuven, Belgium. Ducassé, M, Ferré, S. & Cellier, P. (2011). Building up Shared Knowledge with Logical Information Systems, in proceedings of CLA 2011, pp. 31-42. Ducassé, M & Ferré, S. (2008). Fair(er) and (almost) serene committee meetings with Logical and Formal Concept Analysis, in Proceedings of the 16th International Conference on

Conceptual Structures ICCS 2008, volume 5113 of Lecture Notes in Computer Science, page 217-230. Springer. Egho, E., Jay, N., Raissi, C. & Napoli, A. (2011). A FCA-based analysis of sequential care trajectories, in proceedings of the 8th International Conference on Concept Lattices and their Applications, CLA 2011, pp. 363-376, Nancy, France. Eklund, P. W. & Wray, T. (2010). Social Tagging for Digital Libraries using Formal Concept Analysis, in proceedings of CLA 2010, pp. 139-150. Ferré, S. (2008). Implementations of Logical Information Systems based on the Decomposition of Contexts, in proceedings of the 6th International Conference, ICFCA 2008, Montreal, Canada, February 25-28. Formica, A. (2008). Concept similarity in Formal Concept Analysis: An information content approach. Knowledge-Based Systems 21, pp. 80–87. Guédi, 1. O., Miralles, A., Huchard, M., Libourel, T. & Nebut, C. (2012). How Formal Concept Analysis can help to Observe the Evolution of a Business Model, in proceedings of the 10th International Conference, ICFCA 2012, Leuven, Belgium. Guerin, C., Bertet, K. & Revel, A. (2012). An Approach to Semantic Content Based Image Retrieval Using Logical Concept Analysis. Application to Comicbooks, in proceedings of FCA-ECAI, La Rochelle, France. Hamrouni, T., Ben Yahia, S. & Mephu Nguifo, E. (2008). GARM: Generalized Association Rule Mining, in proceedings of CLA 2008, pp. 145-156. Ignatov, D. I. & Kuznetsov, S. O. (2008). Concept-based Recommendations for Internet Advertisement, In proceedings of The Sixth International Conference Concept Lattices and Their Applications (CLA'08), Olomouc, Czech Republic, ISBN 978-80-244-2111-7. Jay, N., Kohler, F. & Napoli, A. (2008). Analysis of Social Communities with Iceberg and Stability-based Concept Lattices, In proceeding of: Formal Concept Analysis, 6th International Conference, ICFCA 2008, Montreal, Canada, February 25-28. Kirchberg, M., Leonardi, E., Tan, Y. S., Link, S., Ko, R. & Lee, B. S. (2012). Formal Concept Analysis for Semantic Web Data, in proceedings of 10th International Conference on Formal Concept Analysis, Leuven, Belgium. Kuznetsov, S.O., Neznanov, A. A. & Poelmans, J. (2012). A System for Knowledge Discovery in Big Dynamical Text Collections, in proceedings of FCA4AI 2012 (What can FCA do for Artificial Intelligence?), Moscow, Russia. Kuznetsov, S. O., Obiedkov, S. & Roth, C. (2007). Reducing the Representation Complexity of Lattice-Based Taxonomies, In: Conceptual Structures: Knowledge Architectures for Smart Applications. Springer, Berlin / Heidelberg , (pp. 241-254). Lee, M. C., Chen, H. H. & Li Y.-S. (2011). FCA based concept constructing and similarity measurement algorithms, International Journal of Advancements in Computing Technology (IJACT) 3(1), pp. 97–105. Macko, J. (2012). Formal Concept Analysis as a framework for Business Intelligence technologies, in the proceedings of the 10th International Conference, ICFCA 2012, Leuven, Belgium, pp., 195-210. Melo, C., Mikheev, A., Le Grand, B. & Aufaure, M.-A. (2012). Cubix: A Visual Analytics Tool for Conceptual and Semantic Data, IEEE International Conference on Data Mining (ICDM 2012), Demo paper, Brussels. Melo, C., Le Grand, B. & Aufaure, M.-A. (2012). A Conceptual Approach to Characterize Dynamic Communities in Social Networks: Application to Business Process Management, 5th

Int. Workshop on Business Process Management and Social Software (BPMS2’12), collocated with the 10th International Conference on Business Process Management, Tallinn, Estonia, Sept. 3-6, 2012. Melo, C., Le Grand, B., Aufaure, M.-A. & Berezianos, A. (2011). Anastazia Berezianos, Extracting and Visualizing Tree-Like Structures from Concept Lattices, IV’2011 conference on Information Visualization, London. Polovina, S. (2012). The Transaction Concept in Enterprise Systems, In: ANDEWS, Simon and DAU, Frithjof, (eds.) The 10th International Conference on Formal Concept Analysis (ICFCA 2012). Proceedings of the 2nd CUBIST workshop. Aachen, Sun SITE Central Europe, pp. 43-52. Riadh, T. M., Le Grand, B., Aufaure, M.-A. & Soto, M. (2010). Powerconcept: Conceptual Metrics' Distributed Computation, in proceedings of the 8th International Conference on Formal Concept Analysis (ICFCA), Agadir, Morroco. Riadh, T. M., Le Grand, B., Aufaure, M.-A. & Soto, M. (2009). Conceptual and statistical footprints for social networks' characterization, in proceeding of SNAKDD 2009: 8. Sertkaya, B. (2010). A Survey on how Description Logic Ontologies Benefit from Formal Concept Analysis. In M. Kryszkiewicz and S. Obiedkov editors, Proceedings of the 7th International Conference on Concept Lattices and Their Applications, (CLA 2010), volume 672 of CEUR Workshop Proceedings, pp. 2-21. Vanrompay, Y., Kirsch Pinheiro, M., Ben Mustapha, N. & Aufaure, M.-A. (2013). Context-based Grouping and Recommendation in MANETs, in Intelligent Technologies and Techniques for Pervasive Computing, IGI Global, eds. Kostas Kolomvatsos, Christos Anagnostopoulos, Stathes Hadjiefthymiades. Vanrompay, Y., Ben Mustapha, N. & Aufaure, M.-A. (2012). Ontology-based User Preferences and Social Search for Spoken Dialogue Systems, in proceedings of the 7th International workshop on Semantic and Social Media Adaptation and Personalization (SMAP), pp. 113-118, Luxembourg. Villerd, J., Ranwez, S. & Crampes, M. (2008). Using Concept Lattice for Visual Navigation Assistance and Attribute Selection, In proceeding of: Supplementary Proceedings of the 16th International Conference on Conceptual Structures, ICCS 2008, Toulouse, France. Völker, J. & Rudolph, S. (2008). Lexico-Logical Acquisition of OWL DL Axioms; An Integrated Approach to Ontology Refinement. In: Medina, R., Obiedkov, S. (eds.) ICFCA 2008. LNCS (LNAI), vol. 4933, pp. 62–77. Springer, Heidelberg. Watmough, M., Polovina, S. & Andrews, S. (2013). Designing Learning to Research the Formal Concept Analysis of Transactional Data, in proceedings of ICCS 2013, pp. 231-238. Yavorsky, R. (2011). Research Challenges of Dynamic Socio-Semantic Networks, In: Ignatov, D., Poelmans, J., Kuznetsov, S. (eds.) CDUD 2011 - Concept Discovery in Unstructured Data, CEUR Workshop Proceedings, vol. 757, pp. 119–122.