STATISTICAL AND MACHINE LEARNING APPROACHES FOR NETWORK ANALYSIS

Edited by

MATTHIAS DEHMER
UMIT – The Health and Life Sciences University, Institute for Bioinformatics and Translational Research, Hall in Tyrol, Austria

SUBHASH C. BASAK
Natural Resources Research Institute
University of Minnesota, Duluth
Duluth, MN, USA

Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

ISBN: 978-0-470-19515-4

Printed in the United States of America

To Christina

CONTENTS

Preface

Contributors

1 A Survey of Computational Approaches to Reconstruct and Partition Biological Networks
Lipi Acharya, Thair Judeh, and Dongxiao Zhu

2 Introduction to Complex Networks: Measures, Statistical Properties, and Models
Kazuhiro Takemoto and Chikoo Oosawa

3 Modeling for Evolving Biological Networks
Kazuhiro Takemoto and Chikoo Oosawa

4 Modularity Configurations in Biological Networks with Embedded Dynamics
Enrico Capobianco, Antonella Travaglione, and Elisabetta Marras

5 Influence of Statistical Estimators on the Large-Scale Causal Inference of Regulatory Networks
Ricardo de Matos Simoes and Frank Emmert-Streib

6 Weighted Spectral Distribution: A Metric for Structural Analysis of Networks
Damien Fay, Hamed Haddadi, Andrew W. Moore, Richard Mortier, Andrew G. Thomason, and Steve Uhlig

7 The Structure of an Evolving Random Bipartite Graph
Reinhard Kutzelnigg

8 Graph Kernels
Matthias Rupp

9 Network-Based Information Synergy Analysis for Alzheimer Disease
Xuewei Wang, Hirosha Geekiyanage, and Christina Chan

10 Density-Based Set Enumeration in Structured Data
Elisabeth Georgii and Koji Tsuda

11 Hyponym Extraction Employing a Weighted Graph Kernel
Tim vor der Bruck

Index

PREFACE

An emerging trend in many scientific disciplines is a strong tendency toward being transformed into some form of information science. One important pathway in this transition has been via the application of network analysis. The basic methodology in this area is the representation of the structure of an object of investigation by a graph representing a relational structure. It is because of this general nature that graphs have been used in many diverse branches of science including bioinformatics, molecular and systems biology, theoretical physics, computer science, chemistry, engineering, drug discovery, and linguistics, to name just a few. An important feature of the book “Statistical and Machine Learning Approaches for Network Analysis” is to combine theoretical disciplines such as graph theory, machine learning, and statistical data analysis and, hence, to arrive at a new field to explore complex networks by using machine learning techniques in an interdisciplinary manner.

The age of network science has definitely arrived. Large-scale generation of genomic, proteomic, signaling, and metabolomic data is allowing the construction of complex networks that provide a new framework for understanding the molecular basis of physiological and pathological states. Networks and network-based methods have been used in biology to characterize genomic and genetic mechanisms as well as protein signaling. Diseases are looked upon as abnormal perturbations of critical cellular networks. Onset, progression, and intervention in complex diseases such as cancer and diabetes are analyzed today using network theory.

Once the system is represented by a network, methods of network analysis can be applied to extract useful information regarding important system properties and to investigate its structure and function. Various statistical and machine learning methods have been developed for this purpose and have already been applied to networks. The purpose of the book is to demonstrate the usefulness, feasibility, and the impact of the methods on the scientific field. The 11 chapters in this book written by internationally reputed researchers in the field of interdisciplinary network theory cover a wide range of topics and analysis methods to explore networks statistically.

The topics we are going to tackle in this book range from network inference and clustering, graph kernels to biological network analysis for complex diseases using statistical techniques. The book is intended for researchers, graduate and advanced undergraduate students in the interdisciplinary fields such as biostatistics, bioinformatics, chemistry, mathematical chemistry, systems biology, and network physics. Each chapter is comprehensively presented, accessible not only to researchers from this field but also to advanced undergraduate or graduate students.

Many colleagues, whether consciously or unconsciously, have provided us with input, help, and support before and during the preparation of the present book. In particular, we would like to thank Maria and Gheorghe Duca, Frank Emmert-Streib, Boris Furtula, Ivan Gutman, Armin Graber, Martin Grabner, D. D. Lozovanu, Alexei Levitchi, Alexander Mehler, Abbe Mowshowitz, Andrei Perjan, Ricardo de Matos Simoes, Fred Sobik, Dongxiao Zhu, and apologize to all who have mistakenly not been named. Matthias Dehmer thanks Christina Uhde for giving love and inspiration. We also thank Frank Emmert-Streib for fruitful discussions during the formation of this book.

We would also like to thank our editor Susanne Steitz-Filler from Wiley who has always been available and helpful. Last but not least, Matthias Dehmer thanks the Austrian Science Funds (project P22029-N13) and the Standortagentur Tirol for supporting this work.

Finally, we sincerely hope that this book will serve the scientific community of network science reasonably well and inspire people to use machine learning-driven network analysis to solve interdisciplinary problems successfully.

Matthias Dehmer
Subhash C. Basak

CONTRIBUTORS

Lipi Acharya, Department of Computer Science, University of New Orleans, New Orleans, LA, USA

Enrico Capobianco, Laboratory for Integrative Systems Medicine (LISM) IFC-CNR, Pisa (IT); Center for Computational Science, University of Miami, Miami, FL, USA

Christina Chan, Departments of Chemical Engineering and Material Sciences, Genetics Program, Computer Science and Engineering, and Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA

Ricardo de Matos Simoes, Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen’s University Belfast, UK

Frank Emmert-Streib, Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen’s University Belfast, UK

Damien Fay, Computer Laboratory, Systems Research Group, University of Cambridge, UK

Hirosha Geekiyanage, Genetics Program, Michigan State University, East Lansing, MI, USA

Elisabeth Georgii, Department of Information and Computer Science, Helsinki Institute for Information Technology, Aalto University School of Science and Technology, Aalto, Finland

Hamed Haddadi, Computer Laboratory, Systems Research Group, University of Cambridge, UK

Thair Judeh, Department of Computer Science, University of New Orleans, New Orleans, LA, USA

Reinhard Kutzelnigg, Math.Tec, Heumühlgasse, Wien, Vienna, Austria

Elisabetta Marras, CRS4 Bioinformatics Laboratory, Polaris Science and Technology Park, Pula, Italy

Andrew W. Moore, School of Computer Science, Carnegie Mellon University, USA

Richard Mortier, Horizon Institute, University of Nottingham, UK

Chikoo Oosawa, Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan

Matthias Rupp, Machine Learning Group, Berlin Institute of Technology, Berlin, Germany, and Institute of Pure and Applied Mathematics, University of California, Los Angeles, CA, USA; currently at the Institute of Pharmaceutical Sciences, ETH Zurich, Zurich, Switzerland

Kazuhiro Takemoto, Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan; PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan

Andrew G. Thomason, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, UK

Antonella Travaglione, CRS4 Bioinformatics Laboratory, Polaris Science and Technology Park, Pula, Italy

Koji Tsuda, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology AIST, Tokyo, Japan

Steve Uhlig, School of Electronic Engineering and Computer Science, Queen Mary University of London, UK

Tim vor der Bruck, Department of Computer Science, Text Technology Lab, Johann Wolfgang Goethe University, Frankfurt, Germany

Xuewei Wang, Department of Chemical Engineering and Material Sciences, Michigan State University, East Lansing, MI, USA

Dongxiao Zhu, Department of Computer Science, University of New Orleans; Research Institute for Children, Children’s Hospital; Tulane Cancer Center, New Orleans, LA, USA