
The Fifth International Conference on

Network Analysis NET 2015

May 18-20, 2015

Laboratory of Algorithms and Technologies for Networks Analysis

(LATNA),

National Research University Higher School of Economics, Nizhny

Novgorod, Russia

May 18th - May 20th 2015, Nizhny Novgorod, Russia

Monday, May 18

Room 209 HSE, 136 Rodionova Str.

09:00-09:30 Registration

09:30-10:00 Panos M. Pardalos

The Fifth International Conference on Network Analysis NET 2015

10:00-10:50 Fedor Fomin

Graph Modification Problems: A modern perspective

10:50-11:10 Coffee Break

11:10-12:00 Konstantin Avratchenkov

Graph-based semi-supervised learning methods

12:00-13:00 Session 1

Liudmila Ostroumova Prokhorenkova

Global clustering coefficient in scale-free networks

Alexander Krot

Local clustering coefficient in preferential attachment graphs

Akmaljon Artikov

Factorization threshold models for scale-free networks generation

13:00-14:30 Lunch Break

14:30-15:20 Alexander Kononov

Energy-Efficient Scheduling problems


15:40-16:40 Session 2

Dmitriy Malyshev

A complexity dichotomy and a new boundary class for the dominating set problem

Anna Pirova

Parallel algorithm of graph ordering for minimizing sparse Cholesky factor fill-in


Margarita Pankratova

Hybrid methods for mapping a parallel program onto computing network


17:00-18:00 Session 3

Andrey Murashov

The text network analysis: What does strategic documentation tell us about regional integration?

Alexander Semenov

Threshold selection for pseudo-bimodal networks of retweets via different metrics of network centrality

Sergey Bastrakov

An Algorithm for Constraint/Generator Removal from Double Description of Polyhedra



Tuesday, May 19


09:30-10:20 Oleg Prokopyev

Finding maximum subgraphs with relatively large vertex connectivity


10:40-11:30 Oleg Burdakov

A bi-criteria approach to solving huge hop-constrained Steiner tree problems

11:30-12:30 Session 4

Yury Maximov

Improved polynomial time approximation guarantees for well structured quadratic optimization problems

Theodore Trafalis

Kernel methods in Natural Gas Storage Valuation

Dmitry Zhelonkin

Sentiment Analysis in Russian Social Networks


14:00-14:50 Andrey Leonidov

Network effects in economics and finance


15:10-16:10 Session 5

Mario R. Guarracino

On Laplacian regularization for generalized eigenvalue classifiers

Artem Ryblov

Network analysis of Mass spectroscopy medical data

Alexander Karsakov

Network analysis of methylation data for cancer diagnostics



16:30-17:30 Session 6

Alexandr Maximenko

Comparing Complexity of Combinatorial Polytopes

Dmitry Mokeev

König graphs for 4-path. Full description

Dmitry Gribanov

Integer programming in simplices



Wednesday, May 20


10:00-10:50 Andrey Raigorodskii

Small Subgraphs in Preferential Attachment Networks


11:10-12:30 Session 7

Dima Kamzolov

Computationally efficient PageRank algorithm exploiting graph sparsity

Irina Utkina

Branch and bound algorithm for cell formation problem

Alexander Gasnikov

Semi-Supervised PageRank Model Learning with Gradient-Free Optimization Methods

Alexander Gagloev

Sparsity and randomization based techniques in huge scale traffic matrix estimation problems


14:00-14:50 Nelly Litvak

Ranking in large scale-free networks


15:10-16:10 Session 8

Alexander Nikolaev

Bayesian Evidence Cascades and Seed-Initiated Marketing Campaigns in Social Networks

Petr Koldanov

Identification of concentration graph in Gaussian graphical model

Alexey Kazakov

Dynamics of the ensemble of inhibitory coupled neuron-like Rulkov map

http://nnov.hse.ru/en/latna/conferences/net2015/Raigorodskii


Graph Modification Problems: A modern perspective

Fedor Fomin

Bergen University, Norway and St Petersburg department of Steklov Mathematical

Institute

In a network (or graph) modification problem, we have to modify (repair, improve, or
adjust) a network so that it satisfies specific required properties while keeping the cost
of the modification to a minimum. The commonly adopted mathematical model in the study
of such network problems is the graph modification problem. This is a fundamental unifying
problem with a tremendous number of applications in disciplines such as machine
learning, networking, sociology, data mining, computational biology, computer vision,
numerical analysis, and many others.

In this talk we give an overview of recent results and techniques in parameterized

algorithms for graph modification problems.



Graph-based semi-supervised learning methods

Konstantin Avratchenkov

INRIA, France

Semi-supervised learning methods constitute a category of machine learning methods

which use labelled points together with the similarity graph for classification of data

points into predefined classes. For each class a semi-supervised method provides a

classification function. The main idea of the semi-supervised methods is based on the

assumption that the classification function should change smoothly over the similarity

graph. This idea can be formulated as an optimization problem. Some particularly well

known semi-supervised learning methods are the Standard Laplacian (or transductive

learning) method and the Normalized Laplacian (or diffusion kernel) method. Different

semi-supervised learning methods have different kernels which reflect how the

underlying similarity graph influences the values of the classification functions. In the

present work, we analyse a general family of semi-supervised methods, explain the

differences between the methods and provide recommendations for the choice of the

kernel parameters and labelled points. In particular, it appears that it is preferable to

choose a method and a kernel based on the properties of the labelled points. Our

general framework gives a particularly promising PageRank-based method. We illustrate
our general theoretical conclusions with a typical benchmark example, a clustered
preferential attachment model, and two applications. One application is about

classification of Wikipedia pages and another application is about classification of

content in P2P networks. (This talk is based on joint work with P. Goncalves, A.
Mishenin and M. Sokol.)
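
As a rough illustration of a PageRank-style semi-supervised classifier, the sketch below runs a random walk with restart from the labelled points of each class and assigns every node to the class with the largest score. This is only one simple variant written for illustration and not necessarily the method analysed in the talk; the function names, the damping value 0.85 and the toy graph are all assumptions.

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iterations=200):
    """Power iteration for PageRank with restart to the seed set.

    adj maps each node to a list of neighbours (undirected, no isolated nodes).
    alpha is an assumed damping factor, not a value taken from the talk.
    """
    nodes = sorted(adj)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {v: (1.0 - alpha) * restart[v] for v in nodes}
        for u in nodes:
            share = alpha * score[u] / len(adj[u])  # mass sent along each edge
            for v in adj[u]:
                nxt[v] += share
        score = nxt
    return score


def classify(adj, labelled):
    """labelled maps class name -> set of labelled nodes; returns node -> class."""
    scores = {c: personalized_pagerank(adj, seeds) for c, seeds in labelled.items()}
    return {v: max(scores, key=lambda c: scores[c][v]) for v in adj}


# Toy similarity graph: two triangles joined by the edge 2-3,
# with one labelled point per class.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = classify(adj, {"A": {0}, "B": {5}})
```

On this toy graph each triangle ends up in the class of its own labelled point, which matches the smoothness assumption stated in the abstract.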


Energy-Efficient Scheduling problems

Alexander Kononov

Sobolev Institute of Mathematics, Novosibirsk

Scheduling problems have long been at the center of interest of researchers in
Computer Science, Business Analytics, Operations Research and Engineering because of
their wide range of applications, such as timetabling, transportation and air traffic
control. Classical scheduling problems usually optimize a performance metric, such as
the schedule length (makespan) or the number of jobs completed by their due dates,
subject to capacity or resource constraints, such as the number of concurrent jobs
that a processor can handle or bandwidth constraints.

Energy is now one of the most important resources, and energy conservation is a major
concern. It has been realized recently that energy is not just a regular resource
similar to processor capacity. We address one of the main mechanisms for reducing
energy consumption in modern computer systems, which is based on the use of
speed-scalable processors. This relatively new technique saves energy by utilizing the
full speed/frequency spectrum of a processor and applying low speeds whenever possible.
The dependence of energy consumption on system performance is highly non-linear, and as
a result new techniques are required to assign tasks to processors and execute them in
an optimal or near-optimal manner.

We are given a set of jobs, each one specified by its release date, its deadline and its

processing volume (work), and a single (or a set of) speed-scalable processor(s). We

adopt the standard speed-scaling model, in which a processor running at speed $s$ consumes
energy $s^\alpha$ per time unit, where $\alpha > 1$. Our goal is to find a schedule

respecting the release dates and the deadlines of the jobs so that the total energy

consumption is minimized.
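
To make the energy model concrete, here is a minimal sketch for a single job on one processor. Because the power function is convex in the speed, running at the constant speed work/(deadline - release) uses less energy than any uneven schedule that finishes the same work in the same window. The helper name `energy` and the value alpha = 3 are assumptions chosen for illustration.

```python
def energy(work, duration, alpha=3.0):
    """Energy used to process `work` units at constant speed over `duration`.

    Power at speed s is s**alpha per time unit; alpha = 3 is an assumed value.
    """
    speed = work / duration
    return duration * speed ** alpha


# One job: 6 units of work, release date 0, deadline 3.
constant = energy(6.0, 3.0)                   # speed 2 for the whole window
uneven = energy(4.0, 1.0) + energy(2.0, 2.0)  # speed 4 for 1 unit, then speed 1
```

Here the constant-speed schedule consumes 24 energy units and the uneven one 66, although both complete the job exactly at its deadline.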

Dynamic speed scaling leads to many interesting complicated scheduling problems. At

any time a scheduler has to decide not only which job to execute but also which speed

to use. Consequently, there has been considerable research interest in the design and

analysis of efficient scheduling algorithms. We survey recent research that has appeared

in the theoretical computer science literature on algorithmic problems related to off-line

energy-efficient scheduling problems.



Finding maximum subgraphs with relatively large vertex connectivity

Oleg Prokopyev

University of Pittsburgh, USA

We consider a clique relaxation model based on the concept of relative vertex

connectivity. It extends the classical definition of a k-vertex-connected subgraph by

requiring that the minimum number of vertices whose removal results in a disconnected

(or a trivial) graph is proportional to the size of this subgraph, rather than fixed at k.

Consequently, we further generalize the proposed approach to require vertex-

connectivity of a subgraph to be some function f of its size. We discuss connections of

the proposed models with other clique relaxation ideas from the literature and

demonstrate that our generalized framework, referred to as f-vertex-connectivity,

encompasses other known vertex-connectivity-based models, such as s-bundle and k-

block. We study related computational complexity issues and show that finding

maximum subgraphs with relatively large vertex connectivity is NP-hard. An interesting

special case that extends the R-robust 2-club model recently introduced in the literature

is also considered. In terms of solution techniques, we first develop general linear mixed

integer programming (MIP) formulations. Then we describe an effective exact

algorithm that iteratively solves a series of simpler MIPs, along with some

enhancements, in order to obtain an optimal solution for the original problem. Finally,

we perform computational experiments on several classes of random and real-life

networks to demonstrate performance of the developed solution approaches and

illustrate some properties of the proposed clique relaxation models.
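
The relative vertex connectivity requirement can be checked by brute force on very small graphs. The sketch below computes the vertex connectivity of an induced subgraph as defined in the abstract (the minimum number of vertices whose removal leaves a disconnected or trivial graph) and tests whether it is at least a fraction gamma of the subgraph size. It is exponential in the subgraph size and is not the MIP-based algorithm of the talk; all function names and the parameter `gamma` are illustrative assumptions.

```python
from itertools import combinations


def is_connected(adj, vertices):
    """Depth-first search restricted to `vertices`."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def vertex_connectivity(adj, vertices):
    """Smallest number of removed vertices leaving a disconnected or trivial graph."""
    vertices = set(vertices)
    for k in range(len(vertices)):
        for removed in combinations(sorted(vertices), k):
            rest = vertices - set(removed)
            if len(rest) <= 1 or not is_connected(adj, rest):
                return k
    return len(vertices)


def is_relatively_connected(adj, vertices, gamma):
    """True if connectivity is at least gamma times the subgraph size."""
    return vertex_connectivity(adj, vertices) >= gamma * len(set(vertices))


complete4 = {i: [j for j in range(4) if j != i] for i in range(4)}
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
```

With gamma = 0.5 the complete graph on four vertices qualifies (connectivity 3, threshold 2), while the 5-cycle does not (connectivity 2, threshold 2.5).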


A bi-criteria approach to solving huge hop-constrained Steiner tree

problems

Oleg Burdakov

Linköping University, Sweden

We consider the directed Steiner tree problem (DSTP) with a constraint on the total

number of arcs (hops) in the tree. This problem is known to be NP-hard. For instances
whose size is beyond the capacity of existing exact algorithms, only heuristics can be
applied. The hop-constrained DSTP is viewed as a bi-criteria problem

in which the tree cost and the number of hops are minimized. We derive optimality

conditions and use them for developing an approach aimed at approximately solving

hop-constrained DSTP. The approach can also be used for improving approximate

solutions produced by other heuristic algorithms or as a part of exact algorithms.

Specific label-correcting-type algorithms based on this approach will be presented, and

preliminary results of their performance on a set of test problems will be reported. The

test instances originate from 3D placement of unmanned aerial vehicles used for
multi-target surveillance. They are characterized by a relatively small number of
terminal nodes, a very large number of nodes overall, and a huge number of arcs
(above $10^8$).



Network effects in economics and finance

Andrey Leonidov

Theoretical Physics Department, P.N. Lebedev Physical Institute, Moscow; Chair of
Discrete Mathematics, Moscow Institute of Physics and Technology; Laboratory of
Social Analysis, Russian Endowment for Education and Science

The talk reviews network-related effects in economics and finance using the examples
of interbank, input-output and international trade networks. We start by discussing
systemic risks in interbank networks related to the propagation of default contagion.
After describing the main characteristics of interbank networks, we discuss
probabilistic models of default contagion that take into account the bow-tie structure,
scale-free degree distributions and disassortativity of the corresponding directed graph.
We continue with the role of topological characteristics, in particular Bonacich
centrality, of the input-output networks underlying dynamic macroeconomic multi-sector
models of real business cycles. We conclude by considering topological properties of
international trade networks, viewed as directed weighted graphs, and studying the
spillover propagation of import demand shocks.


Small Subgraphs in Preferential Attachment Networks

Andrey Raigorodskii

Yandex and Moscow State University, Moscow

Real-world networks such as web-graphs, social networks, biological networks, etc.

have many important characteristics including the degree distribution (which usually

follows a power law), the degree correlations, the diameter (which is usually small), the

robustness to random attacks on vertices and the vulnerability to attacks on hubs, and so

on. Also, a source of very important properties is given by counting small subgraphs in

the networks: the most well-known such property is "high clustering", but there are

many others. In our talk, we shall mainly concentrate on such properties.

On the other hand, many good and simple models of complex networks are provided by

the now classical principle of preferential attachment. So in the talk, we shall define

some of them and discuss the corresponding distributions of small subgraphs. We shall

give a survey of results including the most recent ones.



Ranking in large scale-free networks

Nelly Litvak

University of Twente, Netherlands

Ranking algorithms are crucial for assessing the importance of a node in a network, and

have a wide range of applications, from clustering of networks to link prediction. An

example is the famous Google PageRank algorithm for ranking web pages. In this talk I

will discuss several topics related to the mathematical properties of ranking algorithms

in large networks.

One of the examples is the distribution of a family of rankings, which includes Google's
PageRank, in random graphs. Many empirical studies have observed that the distributions
of PageRank and in-degree in directed networks are closely related; however, the
literature did not provide any rigorous explanation for this phenomenon. We make an
important step further by obtaining a complete characterization of the PageRank
distribution in a random graph created by a Directed Configuration Model. Our results
show remarkable accuracy when compared to the PageRank distribution on Wikipedia.

Next, I will discuss the problem of finding the nodes with the highest in-degrees when
the network is unknown. This is the case, for example, in the Twitter follower network,
which can only be accessed via the Twitter API. We propose Monte Carlo algorithms that
find the most important nodes using only a very small number of API requests. These
methods are surprisingly efficient because of the high variability in the nodes’ degrees.


Global clustering coefficient in scale-free networks

Liudmila Ostroumova Prokhorenkova

Yandex, Moscow

I will present a detailed analysis of the global clustering coefficient in scale-free graphs.

Many observed real-world networks of diverse nature have a power-law degree

distribution. Moreover, the observed degree distribution usually has an infinite variance.

Therefore, I will focus on such degree distributions.

There are two well-known definitions of the clustering coefficient of a graph: the global

and the average local clustering coefficients. There are several models proposed in the

literature for which the average local clustering coefficient tends to a positive constant

as a graph grows. On the other hand, there are no models of scale-free networks with an

infinite variance of the degree distribution and with an asymptotically constant global

clustering coefficient. Models with constant global clustering and finite variance were

also proposed. Therefore, in this talk I will focus only on the most interesting case and

analyze the global clustering coefficient for graphs with an infinite variance of the

degree distribution.
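
The two definitions can differ noticeably on graphs with hubs. The following sketch computes both for an undirected simple graph: the global coefficient as the fraction of paths of length two that are closed into triangles, and the average local coefficient as the mean over vertices of the fraction of adjacent neighbour pairs. Treating vertices of degree below two as having local coefficient zero is an assumed convention, and the function name is illustrative.

```python
from itertools import combinations


def clustering_coefficients(adj):
    """Return (global, average local) clustering of an undirected simple graph.

    adj maps each node to the set of its neighbours.
    """
    closed = 0    # neighbour pairs that are themselves adjacent (3 per triangle)
    triples = 0   # all neighbour pairs, i.e. paths of length two
    local = []
    for v, nbrs in adj.items():
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        closed += links
        triples += pairs
        # Assumed convention: degree < 2 contributes a local coefficient of 0.
        local.append(links / pairs if pairs else 0.0)
    global_cc = closed / triples if triples else 0.0
    return global_cc, sum(local) / len(local)


# A triangle 0-1-2 whose vertex 0 is also a hub with three extra leaves.
hub = {0: {1, 2, 3, 4, 5}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}, 5: {0}}
global_cc, average_local_cc = clustering_coefficients(hub)
```

On this example the global coefficient is 0.25 while the average local coefficient is 0.35: the high-degree vertex dominates the global count of length-two paths but counts only once in the average.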

For unweighted graphs, I will show that the global clustering coefficient tends to zero

with high probability and I will also estimate the largest possible clustering coefficient

for such graphs. On the contrary, for weighted graphs, the constant global clustering

coefficient can be obtained even for the case of an infinite variance of the degree

distribution.



Local clustering coefficient in preferential attachment graphs

Alexander Krot

Moscow Institute of Physics and Technology (MIPT), Moscow

In our study we analyze the local clustering coefficient for the PA-class of models, a
wide class of models defined in terms of constraints that are sufficient for the study
of the degree distribution and the clustering coefficient. We analyze the behavior of
C(d), the average local clustering coefficient of vertices of degree d.

This talk is a continuation of https://events.yandex.ru/lib/talks/1919/ with new
results on the local clustering coefficient.


Factorization threshold models for scale-free networks generation

Akmaljon Artikov, Yana Kashinskaya, Aleksandr Dorodnykh


Egor Samosvat

Yandex, Moscow

Many real networks, such as the World Wide Web and financial, biological, citation and
social networks, have a power-law degree distribution. Networks with this feature are
also called scale-free. Several models for producing scale-free networks have been
proposed, and most of them are based on the preferential attachment approach. In this
approach, old vertices of higher degree gain the edges added to a network more rapidly,
in a “rich-get-richer” manner. We offer a model with a different explanation of the
scale-free property.

Let us define our model for scale-free network generation. The model has $n$ vertices
denoted by $v_i$ ($1 \le i \le n$). We assume that the network is embedded in a
$d$-dimensional Euclidean space and that the vertices’ coordinate vectors $x_i$ are
random variables, uniformly and independently distributed over the surface of the
sphere $S^{d-1}$. In addition, each vertex has a weight $w_i$ (all the weights are
i.i.d. random variables with a preset density function). An edge between vertices $v_i$
and $v_j$ is drawn if and only if

$$(x_i, x_j) \cdot w_i \cdot w_j \ge \theta, \qquad (1)$$

where θ is a fixed threshold for the existence of an edge between a pair of vertices.
Models with a preset threshold for edge existence are usually called threshold models.
Threshold models have been actively investigated recently and have shown good results
in scale-free network generation [1], [2]. The coordinates of the vertices can be
considered as latent feature vectors, which brings us to the matrix factorization
approach that has been successfully used in the link prediction problem [3]. Combining
these approaches, we obtain a generative factorization threshold model for complex
networks.
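
The edge rule (1) is simple to simulate. The sketch below samples coordinates uniformly on the unit sphere (by normalising Gaussian vectors), draws Pareto-distributed weights and connects every pair that passes the threshold. It reads (x_i, x_j) as the inner product; the function names, the Pareto shape, the threshold value and the quadratic all-pairs loop are assumptions made for illustration rather than the authors' implementation.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform point on the sphere S^(d-1), via normalised Gaussian coordinates."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def generate_graph(n, d, theta, pareto_shape, seed=0):
    """Return coordinates, weights and edges (i, j) with <x_i, x_j> w_i w_j >= theta."""
    rng = random.Random(seed)
    xs = [random_unit_vector(d, rng) for _ in range(n)]
    ws = [rng.paretovariate(pareto_shape) for _ in range(n)]  # Pareto weights >= 1
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(xs[i], xs[j]))
            if dot * ws[i] * ws[j] >= theta:
                edges.append((i, j))
    return xs, ws, edges


# Hypothetical parameters: 60 vertices in 3 dimensions, threshold 2, shape 2.
xs, ws, edges = generate_graph(60, 3, 2.0, 2.0, seed=1)
```

Raising `theta` makes the graph sparser, which corresponds to the threshold tuning mentioned in the abstract.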

An overview of our results is as follows. First, we tune the threshold θ in order to
obtain a sparse graph. Then we show that our model produces scale-free networks with a
fixed power-law exponent if the vertices’ weights are distributed according to the
Pareto distribution. Moreover, we generalize our model to generate directed networks
with tunable power-law exponents.

Finally, we demonstrate our results using computer simulation.

[1] Naoki Masuda, Hiroyoshi Miwa, and Norio Konno. Geographical threshold graphs with small-world and scale-free properties. Physical Review E, 71(3):036108, 2005.

[2] Yukio Hayashi. A review of recent studies of geographical scale-free networks. IPSJ Digital Courier, 2:155–164, 2006.

[3] Aditya Krishna Menon and Charles Elkan. Link prediction via matrix factorization. In Machine Learning and Knowledge Discovery in Databases, pages 437–452. Springer, 2011.



A complexity dichotomy and a new boundary class for the dominating

set problem

Dmitriy Malyshev

Laboratory of Algorithms and Technologies for Networks Analysis (LATNA),

National Research University Higher School of Economics, Nizhny Novgorod

We study the computational complexity of the dominating set problem for hereditary

graph classes, i.e., classes of simple graphs closed under isomorphism and deletion of

vertices. Every hereditary class can be defined by a set of its forbidden induced

subgraphs. There are numerous open cases for the complexity of the problem even for

hereditary classes with small forbidden structures. We completely determine the

complexity of the problem for classes defined by forbidding a five-vertex path and any

set of fragments with at most five vertices.

The notion of a boundary class is a helpful tool for analyzing the computational

complexity of graph problems in the family of hereditary classes. Three boundary

classes were known for the dominating set problem prior to this study. We present a

new boundary class for it.


Parallel algorithm of graph ordering for minimizing sparse Cholesky

factor fill-in

Anna Pirova

Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod

This work deals with the NP-complete problem of finding an ordering of graph vertices
that minimizes the fill-in of the Cholesky factor of the sparse matrix associated with
the graph. For this purpose, heuristic approaches based on graph algorithms are applied.
The nested dissection algorithm is one such approach, commonly used because of its
potential for parallel processing. In this talk we address the parallelization of the
multilevel nested dissection scheme for shared-memory systems. The existing libraries
for parallel graph ordering have MPI-based implementations; however, they do not take
into account the architectural features of modern multicore systems. Our work considers
a new parallel ordering algorithm for shared-memory systems. Parallel processing is done
in a task-based fashion. We present and analyze two ways of implementing the algorithm.
The first approach employs a concurrent queue to store subgraphs that can be ordered
separately, while the second uses OpenMP 3.0 task parallelism, relying on the dynamic
load balancing implemented in the OpenMP runtime. The modified multilevel nested
dissection algorithm from the recently presented MORSy library is used for the ordering.
Experimental results on symmetric positive definite matrices from the University of
Florida Sparse Matrix Collection show that our implementation on shared-memory systems
is competitive with the widely used ParMetis library in terms of both the Cholesky
factor fill-in and performance. In our experiments the parallel version of MORSy
outperforms ParMetis on 8 matrices out of 14, with close quality of the resulting
ordering.
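
To show why the ordering matters, the sketch below counts fill-in with the classical elimination game: eliminating a vertex connects all of its not-yet-eliminated neighbours, and every edge added this way corresponds to a new nonzero in the Cholesky factor. This only evaluates a given ordering; it is not nested dissection and not the MORSy or ParMetis implementation, and the function name is an assumption.

```python
def fill_in(adj, order):
    """Count fill edges created by eliminating vertices in the given order.

    adj maps each vertex to the set of its neighbours (the sparsity pattern of
    a symmetric matrix without the diagonal).
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    fill = 0
    for v in order:
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
        nbrs = sorted(nbrs)
        # Eliminating v turns its remaining neighbours into a clique.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return fill


# Star graph: centre 0 joined to four leaves (an "arrowhead" matrix pattern).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
centre_first = fill_in(star, [0, 1, 2, 3, 4])
centre_last = fill_in(star, [1, 2, 3, 4, 0])
```

Eliminating the centre first creates six fill edges, while eliminating it last creates none, so the same matrix can have a dense or a perfectly sparse factor depending on the ordering.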



Hybrid methods for mapping a parallel program onto computing

network

Margarita Pankratova


We study the architecture-dependent graph decomposition problem, that is, the problem
of decomposing a parallel program and mapping it onto a multiprocessor system.

We propose a mathematical model of this problem. A weighted graph represents the
parallel program: the nodes represent the parallel parts of the program, and the edges
represent the data communications between these parts. A weighted hypergraph represents
the computing system: the nodes represent the processors of the system, and the
hyperedges represent the physical links between processors. We offer an algorithm for
transforming the hypergraph model into a matrix model of the computing system. The graph
of the parallel program is usually much larger than the number of processors in the
computing system, so we need to decompose this graph taking balance restrictions into
account. The goal is to assign the parts of the decomposed graph to the processors so as
to minimize the total cost of communications.

This problem is known to be NP-hard, and program graphs have about $10^6$ to $10^9$
nodes. We propose two hybrid algorithms for solving it.

The first algorithm is based on a recursive bisection scheme. We use an iterated
multilevel algorithm for graph bisection and a spectral algorithm for matrix bisection.
The proposed recursive algorithm can be used either to decompose the original problem
into a number of problems of reduced size or to find a solution of the original problem.

The second algorithm is based on reducing the original problem to the quadratic
assignment problem by k-decomposition of the program graph. This algorithm consists of
three steps: decomposing the program graph, solving the quadratic assignment problem,
and restoring the solution followed by local optimization.

We have implemented the proposed algorithms and examined them on test benchmarks.


The text network analysis: What does strategic documentation tell us

about regional integration?

Andrey Murashov


The values and attitudes of the Russian political elite towards the regional integration
process are considered an indication of what regional integration (RI) tends to be and
how it evolves over time. Our paper suggests how to systematically capture the elite’s
values and attitudes and integrate them into the analysis of RI by means of text network
analysis (TNA). The data analyzed are regional strategies of socio-economic development,
a central and most capacious source of information about the political elite’s views on
RI. From a methodological perspective, we apply an approach that combines two methods,
comparative text mining and graph analysis, into “text network analysis”.

The TNA allows us to visualize the meanings and agendas present within political
manifestos. We build a network of terms based on their co-occurrence in the same text
segments (paragraphs) extracted from the documents. There is an edge between two terms
if they appear in the same text segment. The weight of an edge is the frequency of this
co-occurrence. Such a network (or conceptual map) visualizes logical associations
between the concepts presented in the political manifestos.
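
The construction of the term network can be sketched in a few lines. The version below is written in Python for illustration rather than with the R packages used by the author; splitting on whitespace, the optional stop-word set and the function name are assumptions. Each unordered pair of terms is weighted by the number of segments in which both terms occur.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(segments, stopwords=frozenset()):
    """Map each unordered term pair to the number of segments containing both."""
    weights = Counter()
    for segment in segments:
        # Naive tokenisation (an assumption): lower-case and split on whitespace.
        terms = {t for t in segment.lower().split() if t not in stopwords}
        for pair in combinations(sorted(terms), 2):
            weights[pair] += 1
    return weights


segments = [
    "regional integration strategy",
    "regional development strategy",
    "integration policy",
]
network = cooccurrence_network(segments)
```

In this toy corpus the pair ("regional", "strategy") receives weight 2 because the two terms share two segments, and every other pair that occurs has weight 1.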

The TNA is performed with R, specifically, with packages {igraph} (plots the graphs),

{tm} (provides functions for text mining) and {topicmodels} (classifies a corpus into

topics).

First we review general graph statistics, followed by an analysis of the networks’
content. After removing the rarest and random terms (concepts) from the networks, we try
to detect communities in the graph (with the fast greedy algorithm). We also remove
major articulation points so that the layout of the networks is rearranged and new
concepts and links between them are revealed. Topic modeling is used to estimate the
similarity between documents.

We found the TNA to be a valuable method for extracting the elite’s attitudes towards
the regional integration process from public strategic documentation.



Threshold selection for pseudo-bimodal networks of retweets via

different metrics of network centrality

Alexander Semenov

International Laboratory for Applied Network Research, National Research University

Higher School of Economics, Moscow

We present a novel approach to clustering Twitter users and characterizing their
preferences based on graph features of communication networks extracted from their
tweets. We show that network clustering on Twitter can be observed more distinctly on
unimodal projections of artificially created bimodal networks, where the most popular
users in the networks constructed from the @retweet relationship are considered as nodes
of the second mode. The theoretical assumption behind this approach is that the central
users in this network can be considered “power users”, and other users’ retweet behavior
towards them differs from their retweets of each other. For this purpose, we select a
subset of the top n users based on their centrality value and iteratively assign them to
the second mode of our pseudo-bimodal networks, adding one user at each step in
descending order of centrality score. At each step we then create two projections of the
obtained pseudo-bimodal network: one for “top” users and one for “bottom” users. As a
result, we get unimodal networks with a more distinct cluster structure for each class
of users, which allows us to show indirect connections among users from both classes. We
developed our approach on a dataset gathered during the Russian protest meetings on
24 December 2011 and tested it with different centrality measures: Degree, Closeness,
Betweenness, PageRank, Eigenvector, Katz, Bonacich’s Alpha and Power centralities, and
Kleinberg’s hub and authority scores. For each measure we calculate the optimal
threshold for the number of top nodes to be converted to the second mode in the
pseudo-bimodal network, namely the one that maximizes the modularity of the graphs in
both projections. We found that PageRank gives the best results, and we discuss
performance issues of our approach and its further applications.
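
One step of the procedure can be sketched as follows, under several assumptions that the abstract does not fix: centrality is approximated by the number of retweets received, only retweets from ordinary users to top users are kept as bimodal edges, and two users are linked in a projection when they share a neighbour in the other mode. The threshold search over n by modularity is not implemented, and all names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pseudo_bimodal_projections(retweets, n_top):
    """retweets is a list of (retweeter, author) pairs.

    Returns the top users and the edge sets of the "top" and "bottom" projections.
    """
    # Assumed centrality: in-degree, i.e. how often a user was retweeted.
    indegree = Counter(author for _, author in retweets)
    top = {user for user, _ in indegree.most_common(n_top)}
    audience = defaultdict(set)  # top user -> bottom users who retweeted them
    follows = defaultdict(set)   # bottom user -> top users they retweeted
    for retweeter, author in retweets:
        if author in top and retweeter not in top:
            audience[author].add(retweeter)
            follows[retweeter].add(author)
    bottom_proj = {frozenset(p) for users in audience.values()
                   for p in combinations(sorted(users), 2)}
    top_proj = {frozenset(p) for tops in follows.values()
                for p in combinations(sorted(tops), 2)}
    return top, top_proj, bottom_proj


retweets = [("a", "X"), ("b", "X"), ("c", "X"),
            ("b", "Y"), ("c", "Y"), ("d", "Y"), ("a", "b")]
top, top_proj, bottom_proj = pseudo_bimodal_projections(retweets, n_top=2)
```

In this toy data X and Y become the second mode; they are linked in the top projection because b and c retweeted both, while a and d remain unlinked in the bottom projection because they share no top user.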


An Algorithm for Constraint/Generator Removal from Double Description of Polyhedra

Sergey Bastrakov


A convex polyhedron in general dimension can be represented in two ways: as the set of
solutions to a system of linear inequalities (facet representation) and as the
convex-conical hull of a set of vectors (vertex representation). The facet and vertex
representations together form the double description of a polyhedron. We consider the
problem of removing elements from one of the representations (constraints or generators)
given an irreducible double description of the original polyhedron. Namely, the problem
is to compute the vertex representation of a polyhedron defined by a subsystem of
inequalities, or the facet representation of a polyhedron generated by a subset of
vectors. One application of the problem is the automatic analysis, verification and
optimization of software.

The naive approach is to solve the dual description problem for the resulting polyhedron
directly, ignoring the information in the double description of the original polyhedron.
However, this information can be used to construct a more efficient algorithm,
particularly when the number of removed elements is low. In 2014 Amato, Scozzari &
Zaffanella presented a new algorithm of this kind, called incremental.

We present a new algorithm. As in the incremental algorithm, we construct a set of
facets adjacent to the facets being removed, but instead of solving the dual description
problem for this subset we find intersection points between those facets and a set of
specially constructed rays. The rays are continuations of edges of the polyhedron with
exactly one vertex lying on a removed facet. The complexity of the proposed algorithm
for removing one constraint is the product of the squared sizes of the facet and vertex
representations of the original polyhedron. Computational experiments show that the
proposed algorithm outperforms the incremental algorithm by a factor of 1.2 to 2 on
most test problems used by Amato, Scozzari & Zaffanella.



Improved polynomial time approximation guarantees for well

structured quadratic optimization problems

Yury Maximov

The Institute for Information Transmission Problems Russian Academy of Sciences,

National Research University Higher School of Economics, Moscow

Semidefinite programming arises as a relaxation for a wide variety of combinatorial

optimization problems. For most of them it is tight in the class of polynomial algorithms

under the unique games conjecture. Nevertheless, for well structured quadratic

programming problems, approximation guarantees can be significantly improved even

if the problem itself is still NP-hard. In this talk, we introduce a new approach to

overcome the semidefinite programming approximation barrier by introducing a low

complexity pattern into the semidefinite dual and combining semidefinite programming

with dynamic programming techniques to solve the problem. We provide some new

approximation guarantees as well as numerical experiments for practical problems

(max-cut, max-k-cut, correlation clustering). Some applications to coding theory and

network optimization are also mentioned. The talk is based on joint research with

Yu. Nesterov.


Kernel methods in Natural Gas Storage Valuation

Theodore B. Trafalis, Alexander M. Malyscheff

School of Industrial and Systems Engineering, University of Oklahoma, USA

The valuation of natural gas storage contracts has recently received significant attention

in the energy management community. Least-Squares Monte Carlo (LSMC) represents

one approach to valuing such contracts. We apply kernel-based machine learning

techniques to derive the regression function required in the LSMC method.



Sentiment Analysis in Russian Social Networks

Dmitry Zhelonkin

National Research University Higher School of Economics, Nizhny Novgorod

The present research investigates sentiment analysis in Russian

social networks based on opinion mining with machine learning techniques. We treat a

single labeled message as a network node; the set of such texts forms the network. One of the

main features of the present work is that the sentiment, or tone, of a text is considered not in

the conventional linguistic sense but as the mental perception or tonality of

the document as understood by a person in light of the events during some period.

Another novelty of the research is a new feature creation method that uses delta-TFIDF to

accelerate the learning process. The result of the work is a software implementation of the

prototyped algorithm, aimed at working with big data, which can evaluate the general mood

in a social network within the framework of some topic.
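The delta-TFIDF idea mentioned above can be illustrated with a minimal sketch. It assumes two small, already tokenized training sets (positive and negative) and weights each term by its count times the difference between its smoothed inverse document frequency in the two sets. The function names, the add-one smoothing and the sign convention (terms typical of positive texts get positive weights) are assumptions made for this illustration, not details taken from the abstract.

```python
import math
from collections import Counter

def delta_tfidf_weights(pos_docs, neg_docs):
    """Return {term: idf_neg - idf_pos} from add-one smoothed document frequencies."""
    pos_df = Counter(t for doc in pos_docs for t in set(doc))
    neg_df = Counter(t for doc in neg_docs for t in set(doc))
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    weights = {}
    for term in set(pos_df) | set(neg_df):
        idf_pos = math.log2((n_pos + 1) / (pos_df[term] + 1))
        idf_neg = math.log2((n_neg + 1) / (neg_df[term] + 1))
        # Positive weight: the term occurs in relatively more positive documents.
        weights[term] = idf_neg - idf_pos
    return weights

def featurize(doc, weights):
    """Map a tokenized document to {term: count * weight} for known terms."""
    counts = Counter(doc)
    return {t: c * weights[t] for t, c in counts.items() if t in weights}

# Toy data (hypothetical): two positive and two negative messages.
pos = [["good", "film"], ["good", "plot"]]
neg = [["bad", "film"], ["bad", "plot"]]
w = delta_tfidf_weights(pos, neg)
features = featurize(["good", "good", "film"], w)
```

Terms that occur equally often in both classes, such as "film" here, receive weight zero, so the resulting feature vector emphasizes class-discriminating terms only.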


On Laplacian regularization for generalized eigenvalue classifiers

Mario R. Guarracino, Mara Sangiovanni

High Performance Computing and Networking Institute, National Research Council,

Naples - Italy

Marco Viola, Gerardo Toraldo

Department of Mathematics and Applications, University of Naples “Federico II”,

Naples – Italy

Generalized Eigenvalue Classifiers are a class of supervised learning techniques derived

from Support Vector Machines. In this talk we describe some recent progress

regarding the regularization of the classifiers in the case of problems with more features

than samples (p>>n). We motivate the adoption of a novel regularization term that takes

into account the network structure of the training data and describe the advantages. We

provide some comparisons in terms of classification accuracy with other de facto

standard methods on real world datasets.



Network analysis of Mass spectroscopy medical data

Artem Ryblov


The amount of available medical data is dramatically increasing: in the last 10 minutes

we generated more data than from prehistoric times until 2003! The same is true of

proteomic data, and we need to develop new methods to analyse it. There

are many well-established ways to predict the risk of disease by analysing

proteomic biomarkers, but recent investigations have shown that observed biomarkers

do not cover the whole set of disease data. In this situation it is very promising to

discover and analyse network biomarkers, which take into account the changes in the

topology of interrelations between different parameters. Many well-established methods

borrowed from graph analysis can then be used, but what can be done if the links between

parameters are unknown? The recently developed parenclitic network analysis is very

useful in this setting. Our method is based on the classical approach of searching for

the best model by multivariate logistic regression, but instead of doing regression on the

original data we preprocess the data by building the parenclitic network and analysing

its topology.
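A minimal sketch of how a parenclitic network might be built for one subject is given below. It assumes that the reference relation between every pair of parameters is a straight line fitted on a control group, and that the edge weight is the subject's deviation from that line in units of the residual standard deviation. The linear model, the function names and the toy data are assumptions for illustration only; the abstract does not specify them.

```python
import math
from itertools import combinations

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept and residual standard deviation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, sigma

def parenclitic_network(controls, subject, eps=1e-9):
    """Edge weight (i, j): how far the subject lies from the control-group line."""
    weights = {}
    for i, j in combinations(range(len(subject)), 2):
        xs = [row[i] for row in controls]
        ys = [row[j] for row in controls]
        slope, intercept, sigma = fit_line(xs, ys)
        deviation = abs(subject[j] - (slope * subject[i] + intercept))
        weights[(i, j)] = deviation / (sigma + eps)
    return weights

def node_strengths(weights, n_features):
    """A simple topological feature: sum of incident edge weights per node."""
    strengths = [0.0] * n_features
    for (i, j), w in weights.items():
        strengths[i] += w
        strengths[j] += w
    return strengths

# Toy control group (hypothetical): parameter 1 is roughly twice parameter 0.
controls = [[1.0, 2.1, 5.0], [2.0, 3.9, 4.0], [3.0, 6.1, 6.0], [4.0, 7.9, 5.0]]
typical = parenclitic_network(controls, [2.5, 5.0, 5.0])
atypical = parenclitic_network(controls, [2.5, 9.0, 5.0])
```

Features such as `node_strengths(atypical, 3)` could then serve as inputs to a logistic regression in place of the raw measurements.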


Network analysis of methylation data for cancer diagnostics

Alexander Karsakov


DNA methylation patterns are now established as having a fundamental

role in the development of cancer. Indeed, it is well known that methylation

values of specific genes differ between normal and cancer cells. In this work we

use the mathematical theory of complex network analysis to investigate some features

of methylation regulation. Information obtained from patients with various kinds of

oncological diseases is represented as networks. Each node in a network represents a specific

gene, while edges connecting nodes indicate an abnormal relation between their

methylation levels or other measures. The analysis of network topology allows us to

detect which topological indices are associated with cancer development. Moreover,

the networks of control and oncology subjects differ, and there are numerical metrics

that can be used to distinguish and then classify them. After analyzing

data on 12 cancer diseases, we obtained a classification accuracy comparable to that of

classical machine learning algorithms. In addition, the described approach allows

discovering important functional relationships between specific genes. Knowledge of

significant genes and their relationships can substantially help biologists and clinicians

to study possible ways of cancer treatment. Beyond the results obtained in the study of

this specific disease, the proposed algorithm may be used for the analysis of any other clinical

data, where the relationships between different features are more important than their

values.



Comparing Complexity of Combinatorial Polytopes

Alexandr Maximenko

P.G. Demidov Yaroslavl State University, Yaroslavl

We consider 0-1 polytopes associated with NP-hard problems. It is known that the face

lattice of such a polytope reflects the structure of the feasible solutions set of the

corresponding discrete optimization problem. Therefore we can measure the complexity of a

problem in terms of combinatorial characteristics of its polytope. The simplest examples

of such characteristics are the number of vertices, the number of facets and the

dimension of a polytope. More interesting are the diameter of the graph of a polytope, its clique

number and the extension complexity of the polytope. In this talk we compare the complexity

of combinatorial polytopes associated with well-known NP-hard problems: Boolean

quadratic programming, graph coloring, the knapsack problem, the travelling salesman problem

and many others. In particular, we show that Boolean quadratic polytopes are faces of

the mentioned polytopes. Hence, in this sense they contain no extra details in comparison

with other polytopes associated with NP-hard problems.


König graphs for 4-path. Full description.

Dmitry Mokeev

Laboratory of Algorithms and Technologies for Networks Analysis (LATNA), National

Research University Higher School of Economics, Nizhny Novgorod

Let F be a class of graphs. A König graph for F is a graph in which every induced

subgraph G has the property that the minimum cardinality of a set of vertices meeting every

induced F-subgraph of G equals the maximum number of vertex-disjoint induced

F-subgraphs in G.

The aim of this work is to characterize König graphs for the set consisting of one simple

path with 4 vertices (4-path). There are two approaches to the description of this class. One

of them is constructive: we show how to construct a graph of the given class by operations

of edge subdivision and replacement of vertices and terminal paths with cographs. In

the other approach we look for a standard description of a hereditary class by forbidden

subgraphs.



Integer programming in simplices

Dmitry Gribanov



We investigate integer programming problems on polyhedra that are restricted to be

simplices. Note that deciding whether a simplex contains an integer point is trivially

NP-complete, since the set of feasible solutions of the knapsack problem is a simplex.

So integer programming in a simplex is an NP-hard problem. Papadimitriou

(1981) showed, using dynamic programming, that if the n-dimensional polyhedron is

induced by a system of inequalities with a fixed number of rows and bounded elements,

then the integer programming problem can be solved in polynomial time. Thus one can

show that the integer programming problem in a simplex can be solved in polynomial

time if the maximum absolute value of the sub-determinants of the system is fixed. We

developed an algorithm for integer programming, based on the unimodular cone

decomposition procedure, that can find all vertices of the integer hull of the simplex. Our

algorithm has polynomial complexity under the previous assumptions and does not depend

exponentially on the right-hand side coefficients of the system. We also show how to

generalise this approach to a more general class of polytopes.

Finally, we show the existence of a much more efficient algorithm if the width of the

simplex is large enough. Unfortunately, computing the width of a simplex is

NP-hard, as shown by András Sebő (1999).


Computationally efficient PageRank algorithm exploiting graph sparsity

Dmitry Kamzolov


In this work, we explore various mechanisms for ranking web sites in terms of their

computational efficiency. The set of Internet sites and the links between them is represented as a

weighted graph whose vertices correspond to the sites and whose edges correspond to the

links between sites. The rapid growth of the Internet motivates the creation of new efficient

algorithms. The main difficulty in the ranking problem is the huge number of sites that

need to be ranked. Even a method that works in time linear in the number of sites is

computationally expensive and inefficient in a space of dimension 10^8 and more. In this work, we

consider an algorithm for ranking web pages based on sparsity ideas. The key idea

of the method is the use of component-wise descent in the 1-norm for a sparse matrix. In

contrast to gradient descent, it increases the number of steps of the algorithm, but

each step requires only a small number of arithmetic operations. Using this idea we can

solve a large class of ranking problems in logarithmic time with respect to the number

of sites. We also provide a computational experiment that checks the theoretical estimates of

the running time of the algorithm. It is shown that the theoretical estimate of the number of steps

matches the experiment.
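For orientation, the sketch below shows the standard PageRank power iteration on a sparse adjacency-list graph. This is the baseline whose cost per sweep is proportional to the number of sites and links, not the component-wise descent method of the abstract. The damping factor 0.85, the uniform treatment of pages without outgoing links and the toy graph are assumptions of this illustration.

```python
def pagerank_power_iteration(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank; out_links[i] lists the pages that page i links to."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [(1.0 - damping) / n] * n
        dangling = 0.0
        for src, targets in enumerate(out_links):
            if targets:
                # Only existing links are visited, so one sweep costs O(sites + links).
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # A page without outgoing links spreads its rank uniformly.
                dangling += damping * rank[src] / n
        if dangling:
            new_rank = [r + dangling for r in new_rank]
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank

# Toy web graph (hypothetical): page 3 has no incoming links.
graph = [[1, 2], [2], [0], [2]]
ranks = pagerank_power_iteration(graph)
```

Every sweep of this baseline touches all pages, which is what becomes expensive at the dimensions discussed above and what a component-wise scheme with cheap individual steps aims to avoid.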



Semi-Supervised PageRank Model Learning with Gradient-Free

Optimization Methods

Alexander Gasnikov

Moscow Institute of Physics and Technology (MIPT), The Institute for Information

Transmission Problems Russian Academy of Sciences, Moscow

In our work we consider the problem of web page relevance to a search query. We

work in the framework called Semi-Supervised PageRank, which can account for

some properties that are not considered by classical approaches such as the PageRank and

BrowseRank algorithms. We introduce a graphical parametric model for web page

ranking. The goal is to identify the unknown parameters using the information about

page relevance to a number of queries given by some experts (assessors). The resulting

problem is formulated as an optimization problem. Due to the hidden huge dimension of this

problem, we use random gradient-free methods with oracle error to solve it. We prove

a convergence theorem and give the number of arithmetic operations needed

to solve it with a given accuracy.
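To illustrate what a random gradient-free method looks like, here is a generic two-point scheme: a random Gaussian direction is drawn, the directional derivative is estimated from two function values, and a step is taken along that direction. This is a textbook-style sketch on a toy quadratic with an exact oracle, not the authors' method or their ranking objective; the function name, step size and smoothing parameter are assumptions.

```python
import random

def random_gradient_free_minimize(f, x0, step, smoothing=1e-6, iters=3000, seed=0):
    """Minimize f using only function values along random Gaussian directions."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + smoothing * ui for xi, ui in zip(x, u)]
        # Finite-difference estimate of the directional derivative along u.
        slope = (f(shifted) - f(x)) / smoothing
        x = [xi - step * slope * ui for xi, ui in zip(x, u)]
    return x

# Toy objective (hypothetical): squared distance to a known target point.
target = [1.0, -2.0, 0.5]

def objective(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

solution = random_gradient_free_minimize(objective, [0.0, 0.0, 0.0], step=1.0 / 56.0)
```

Each iteration needs only two evaluations of the objective and no gradient, which is the property that makes such methods attractive when the gradient is unavailable or too costly to compute.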


Sparsity and randomization based techniques in huge scale traffic

matrix estimation problems

Alexander Gagloev, Nazar Buzun, Yuriy Dorn, Alexander Gasnikov, Andrey

Golov, Aydar Gubaydullin, Yury Maximov, Mikhail Mendel


The problem is to recover the unknown Traffic Matrix, which is a high dimensional

ill-posed inverse problem. The typical dimension of the problems we are dealing with is

10^6 - 10^8. To solve this we propose to reduce the problem to a linearly constrained

quadratic convex optimization problem. The main goal of the work is to compare the

properties of different optimization techniques depending on the problem structure. We

focus on gradient type methods such as gradient descent, stochastic gradient descent and

componentwise descent in primal and dual spaces. We show that some special

properties of the incidence matrix (column sparsity, row sparsity) allow us to improve

convergence guarantees for the algorithms above. Finally, we provide some numerical

experiments on real and synthetic data.



Bayesian Evidence Cascades and Seed-Initiated Marketing Campaigns

in Social Networks

Alexander Nikolaev

University at Buffalo, the State University of New York

The influence maximization problem, as defined in social network science, lies in

finding a set of seeds that can initiate a diffusion-driven cascade in an optimal way. We

explore flexible, time-dependent seed activation solutions for long-term

intervention/campaign planning on networks. We model influence propagation as

parallel Bayesian evidence cascades. The investigations with the model shed light on

the phenomena of belief reinforcement and viral spread of innovations, rumors,

opinions, etc., in social networks.

The NP-hard problem of selecting a set of influential nodes to generate a maximal

cascade of "positive" subjective evidence (in support of a hypothesis claim preferred by

the decision-maker) is solved as a mixed-integer program.


Identification of concentration graph in Gaussian graphical model

Petr Koldanov, Alexander Koldanov, Panos Pardalos



The concentration graph is an important structure in Gaussian graphical models. The problem of

identifying the concentration graph from observations has attracted growing attention over the last

decade.

This problem is interesting both from a theoretical point of view and for practical

applications. Different identification procedures have been studied in the literature.

Few results are known about the optimality of the considered procedures. In this talk we will

prove the optimality of a multiple testing identification procedure based on simultaneous

inference of optimal two-decision tests.
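As background, in a Gaussian graphical model the concentration graph has an edge between variables i and j exactly when their partial correlation given all other variables is nonzero, and the partial correlation equals -k_ij / sqrt(k_ii * k_jj) for the inverse covariance matrix K. The sketch below reads the graph off a known covariance matrix with a fixed cut-off. The cut-off of 0.1 is a placeholder assumption that stands in for a statistical decision rule; it is not the multiple testing procedure discussed in the talk.

```python
import math

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small dense matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def concentration_graph(cov, threshold=0.1):
    """Return {(i, j): partial correlation} for pairs above the cut-off."""
    k = invert(cov)
    edges = {}
    for i in range(len(k)):
        for j in range(i + 1, len(k)):
            partial = -k[i][j] / math.sqrt(k[i][i] * k[j][j])
            if abs(partial) > threshold:
                edges[(i, j)] = partial
    return edges

# Covariance of a three-variable chain 0 - 1 - 2 (inverse is tridiagonal).
cov = [[0.75, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.75]]
edges = concentration_graph(cov)
```

In practice the covariance matrix is estimated from observations, so deciding which partial correlations are truly nonzero becomes a statistical testing problem rather than a simple cut-off.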



Dynamics of the ensemble of inhibitory coupled neuron-like Rulkov

maps

Alexey Kazakov, Tatiana Levanova

Lobachevsky State University of Nizhny Novgorod, National Research University

Higher School of Economics, Nizhny Novgorod

We study three neuron-like Rulkov maps [1-3] with mutual inhibitory couplings. In

order to obtain a more biologically relevant description of the couplings, we take into account the main

features of real biological inhibitory couplings, such as the dependence of the postsynaptic

element's activity level on the presynaptic element's activity level and the inertia of couplings.

The model constructed in this way is discrete, so it is well suited to numerical analysis.

We study numerically (using the NN HSE cluster) the different dynamical regimes that can be

obtained in this ensemble by varying the coupling parameters, including chaotic regimes,

and the bifurcation transitions from one regime to another.

[1] N.F. Rulkov, Phys. Rev. E 65 041922 (2002)

[2] A.L. Shilnikov, N.F. Rulkov, Bifurcations and Chaos 13(11), (2003)

[3] A.L. Shilnikov, N.F. Rulkov, Physics Letters A 328, 177 (2004)
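A minimal simulation sketch of three mutually inhibiting map-based neurons is given below. It assumes a commonly quoted form of the chaotic Rulkov map, x(n+1) = alpha / (1 + x(n)^2) + y(n), y(n+1) = y(n) - mu * (x(n) - sigma), and a deliberately simple threshold-type inhibitory coupling without inertia. The map variant, the coupling rule, the parameter values and the initial conditions are all assumptions for illustration; the abstract does not specify them.

```python
import math

def simulate_rulkov_ensemble(steps, alpha=4.1, mu=0.001, sigma=-1.0,
                             coupling=0.1, threshold=0.0):
    """Iterate three Rulkov-type maps with mutual threshold inhibition."""
    xs = [-1.0, -0.9, -1.1]   # fast variables (membrane-potential-like)
    ys = [-3.0, -3.0, -3.0]   # slow variables
    trajectory = []
    for _ in range(steps):
        # A neuron inhibits the others while its fast variable exceeds the threshold.
        active = [1.0 if x > threshold else 0.0 for x in xs]
        total = sum(active)
        new_xs, new_ys = [], []
        for i in range(3):
            inhibition = coupling * (total - active[i])
            new_xs.append(alpha / (1.0 + xs[i] ** 2) + ys[i] - inhibition)
            new_ys.append(ys[i] - mu * (xs[i] - sigma))
        xs, ys = new_xs, new_ys
        trajectory.append(tuple(xs))
    return trajectory

trajectory = simulate_rulkov_ensemble(steps=5000)
```

Because the model is a discrete map, one step costs a few arithmetic operations per neuron, so scanning the coupling parameter over a grid and recording the resulting regime is computationally cheap.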
