Computational intelligence for big data analytics bda 2013

Post on 26-May-2015





Big Data Analytics - a presentation by Dr Dom Heger at the 2013 International Conference on Knowledge, Innovation and Enterprise, London UK


<ul><li> 1. Big Data Analytics: Challenges and What Computational Intelligence Techniques May Offer. Ah-Hwee Tan, School of Computer Engineering, Nanyang Technological University. Big Data Analytics Symposium, London, UK, 13 September 2013</li></ul> <p> 2. Outline
Big Data Analytics
Computational Intelligence Techniques
Web Data Analytics
Flexible Organizer for Competitive Intelligence (FOCI)
Web Information Fusion and Associative Discovery
Analytics for Active Living for Elderly

3. The Era of Big Data
Big data refers to collection of data sets so large and complex that they exceed the competence of commonly used IT systems in terms of processing space and/or time.

4. Sources of Big Data
Traditionally, mostly produced in scientific fields such as astronomy, meteorology, genomics, physics, biology, and environmental research.
With rapid development of IT technology and the consequent decrease of cost on collecting and storing data, big data has been generated from almost every industry and sector as well as governmental department, including retail, finance, banking, security, audit, electric power, healthcare.
Recently, big data over the Web (big Web data for short), which includes all the context data, such as user generated contents, browser/search log data, deep web contents, etc.

5. Examples of Big Data (Source: Wikipedia)
Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Facebook handles 50 billion photos from its user base.
FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide.
Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work throughout various times of the day.

6. Examples of Big Data (Source: Wikipedia)
NASA Center for Climate Simulation (NCCS) stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster.
Utah Data Center is a data center currently being constructed by the United States National Security Agency. When finished, the facility will handle yottabytes of information collected by NSA over the Internet.
Value    Metric
1000     kB  kilobyte
1000^2   MB  megabyte
1000^3   GB  gigabyte
1000^4   TB  terabyte
1000^5   PB  petabyte
1000^6   EB  exabyte
1000^7   ZB  zettabyte
1000^8   YB  yottabyte

7. Money of Big Data (Source: Wikipedia)
"Big data" have increased the demand of information management specialists.
Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, and HP have spent more than $15 billion on software firms specializing in data management and analytics.
In 2010, this industry on its own was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole.

8. Market of Big Data (Source: Wikipedia)
Developed economies make increasing use of data-intensive technologies.
There are 4.6 billion mobile-phone subscriptions worldwide and there are between 1 billion and 2 billion people accessing the internet.
The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, 65 exabytes in 2007[14] and it is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013.[5]

9. Big Data Market Segments (Report by Transparency Market Research)
Segmentation of the big data market by components, by applications and by geography.
The different components included are software and services, hardware and storage.
Software and services segment dominates the components market whereas storage segment will be the fastest growing segment for the next 5 years owing to the perpetual growth in the data generated.

10. Big Data Market Segment by Applications
Covered eight applications namely financial services, manufacturing, healthcare, telecommunication, government, retail and media &amp; entertainment and others in the application segment.
Financial Services, healthcare and the government sector are the top three contributors of the big data market and together held more than 55% of the big data market in 2012.
Media and Entertainment and the healthcare sectors will grow at high CAGR of nearly 42% from 2012 to 2018.
The growth in data in the form of video, images, and games is driving the media and entertainment segment.
Read more:

11. Challenges of Big Data
Volume: Size in the order of petabytes, exabytes
Velocity: Time sensitive data, data that grow exponentially or even in rates that overwhelm the well-known Moore's Law
Variety: From structured data into semi-structured and completely unstructured data of different types, such as text, image, audio, video, click streams, log files
Value    Metric
1000     kB  kilobyte
1000^2   MB  megabyte
1000^3   GB  gigabyte
1000^4   TB  terabyte
1000^5   PB  petabyte
1000^6   EB  exabyte
1000^7   ZB  zettabyte
1000^8   YB  yottabyte

12. Deeper Issues of Big Data (The additional 3Vs)
Validity: Is the data correct and accurate for the intended usage?
Veracity: Are the results meaningful for the given problem space?
Volatility: How long do you need to look/store this data?

13.
Computational Intelligence
Neural Networks (IJCNN): Brain-like mathematical models for pattern recognition, memory, and association discovery. Examples: Perceptron, BP, SVM, SOM, ART
Fuzzy Systems (IEEE-FUZZ): Fuzzy operators for handling non-discrete reasoning. Examples: FNN, Fuzzy C-Means

14. Computational Intelligence
Evolutionary Computing (CEC): Classes of heuristic algorithms repeatedly search for good solutions by mimicking the process of natural evolution. Commonly used for optimization and search problems. Examples: Genetic Algo, Memetic Algo

15. Flagship Events of Computational Intelligence
World Congress on Computational Intelligence (Australia 2012, Beijing 2014)
IEEE Symposium on Computational Intelligence (Singapore 2013, Florida, USA 2014)
IEEE Symposium on Computational Intelligence in Big Data (IEEE CIBD'2014)

16. Examples of Use of CI in Big Data
Data size and feature space adaptation
Uncertainty modeling in learning from big data
Distributed learning techniques in uncertain environment
Uncertainty in cloud computing
Distributed parallel computation
Feature selection/extraction in big data
Sample selection based on uncertainty
Incremental Learning
Manifold Learning on big data
Uncertainty techniques in big data classification/clustering
Imbalance learning on big data
Active learning on big data
Random weight networks on big data
Transfer learning on big data

17. Self-Organizing Neural Networks for Personalized Web Intelligence
Towards Personalized Web Intelligence. Ah-Hwee Tan, Hwee-Leng Ong, Hong Pan, Jamie Ng, Qiu-Xiang Li. Knowledge and Information Systems 18 (2004) 297-306

18. Workflow for Web Data Analytics
Search: Getting the information
Organize (clustering/categorizing): Putting things in perspectives
Analyze (data mining): Discover hidden knowledge
Share (knowledge management): Saving for reference and sharing
Track: Constant monitoring

19.
Approaches to Organizing/Analyzing
Clustering: Organizing information into groups based on similarity functions and thresholds, e.g. BullsEye, NorthernLight, Vivisimo
Categorization: Organizing information into a predefined set of classes, e.g. Yahoo!, Autonomy Knowledge Server
Which is better?

20. Clustering
Pros:
Unsupervised/self-organizing, require no training or predefinition of classes
Able to identify new themes
Cons:
Users have no control
Ever changing cluster structure
Difficult to navigate and track

21. Categorization
Pros:
Good control on classes
Every info assigned to one or more classes of interests
Cons:
Require learning (supervised) and/or definition of classification rules/knowledge
Every info has to be assigned to one or more classes
Good control but lack flexibility to handle new information

22. User-configurable Clustering (Tan &amp; Pan, PAKDD 2002)
Information organization and content management
Online incremental clustering + user-defined structure (preferences)
Reduces to a clustering system if no user indication given
Allows personalization in a direct, intuitive, and interactive manner
Control + flexibility

23. ARAM for Personalized Information Management
[Figure: ARAM network diagram. Recoverable labels: Information Clusters (F2), fields F1a and F1b, Information Vector (A), Preference Vector (B).]

24. Flexible Organizer for Competitive Intelligence (FOCI)
A platform for gathering, organizing, tracking, analyzing, and sharing competitive information
Natural way of turning raw search results into personalized CI portfolios
Multilingual enabled with Multilingual Efficient Analyzer
Domain localization (Technology)
Patented and licensed to many companies

25. FOCI User Interface

26. FOCI Architecture
[Figure: FOCI architecture diagram. Recoverable labels: Intranet/Internet, Users, CI Portfolio, Domain-Specific Knowledge, Content Management, Content Publishing, Content Analysis, Visualization, Front End, Content Gathering.]

27.
Personalized Content Management
Portfolio created through Search
Unsupervised clustering (ARAM Pattern Channel A)
Loop:
Personalization by users (ARAM Pattern Channel B)
Reorganization of clusters (ARAM Pattern Channel A&amp;B)
Saving of personalized portfolio
Tracking of new information

28. Personalization Functions
Marking/labeling (selected) clusters: Personal interpretation
Inserting clusters: Indicate preference on groupings
Merging clusters: Indicate preferences on similarities
Splitting clusters: Indicate preferences on differences
...

29. Information Clustering
A portfolio created by a meta-search of 4 search engines with a query on Text Mining

30. A Personalized Portfolio after</p>
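
The idea on slides 19, 22 and 28 (grouping items by a similarity function and a threshold, doing so online and incrementally, and letting the user merge clusters) can be illustrated with a small sketch. This is not the ARAM model that the slides attribute to FOCI; it is a plain leader-style incremental clusterer over word-count vectors with cosine similarity, swapped in for illustration. The names `IncrementalClusterer`, `threshold`, `add` and `merge` are hypothetical and do not come from the slides.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Leader-style online clustering (illustrative stand-in, not ARAM)."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # summed term counts, one Counter per cluster
        self.members = []           # documents assigned to each cluster

    def add(self, text: str) -> int:
        """Assign one document to its most similar cluster, or open a new one."""
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            self.centroids[best].update(vec)
            self.members[best].append(text)
            return best
        self.centroids.append(vec)
        self.members.append([text])
        return len(self.centroids) - 1

    def merge(self, i: int, j: int) -> None:
        """User-requested merge: fold cluster j into cluster i, then drop j."""
        if i == j:
            raise ValueError("cannot merge a cluster with itself")
        self.centroids[i].update(self.centroids[j])
        self.members[i].extend(self.members[j])
        del self.centroids[j]
        del self.members[j]


clusterer = IncrementalClusterer(threshold=0.3)
docs = [
    "text mining tools",
    "text mining software",
    "neural network models",
    "fuzzy neural network",
]
labels = [clusterer.add(d) for d in docs]
```

With no user action the sketch behaves as an ordinary clustering system; calling `clusterer.merge(0, 1)` afterwards imposes a user preference on top of the automatically formed groups, which is the "control + flexibility" trade-off the slides describe.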