information processing architectures

Information Processing Architectures Raji Gogulapati, Sep 2014

Upload: raji-gogulapati

Post on 20-Jun-2015



TRANSCRIPT

Page 1: Information processing architectures

Information Processing Architectures

Raji Gogulapati, Sep 2014

Page 2: Information processing architectures

Information Search

Information Acquisition

Information Processing

Information Maintenance

Information Retention

Information System Management

Page 3: Information processing architectures

Information Processing

Online Transaction Processing (OLTP)

Online Analytical Processing (OLAP)

Complex Event Processing (CEP)

Massively Parallel Processing (MPP)

Legacy

Random

Page 4: Information processing architectures

Infrastructure Essentials for Information Processing

Page 5: Information processing architectures

Infrastructure Models of Databases

Shared Nothing
• OLAP
• BI, DW, Big Data

Shared Disk
• Traditional RDBMS
• OLTP

Shared Everything
• Traditional RDBMS
• OLTP

Page 6: Information processing architectures

Database Architectures

[Figure: two layouts built from Process and Disk boxes, labelled Shared Everything and Shared Disk]

Relational database management systems for OLTP information

Page 7: Information processing architectures

Shared Nothing, Massively Parallel Architecture Layout

[Figure: a Master and four Process nodes, each Process paired with its own Disk]

For data warehousing, business intelligence, and Big Data loads of information
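
The shared-nothing layout can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the slides: rows are placed on nodes by hashing a key, each node aggregates only the rows on its own "disk", and a master merges the partial results. The names `Node`, `partition_rows`, and `master_sum` are hypothetical, and everything runs in one process for illustration.

```python
from collections import defaultdict
from zlib import crc32


class Node:
    """One shared-nothing node: its own process and its own 'disk' (a list)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.disk = []  # rows stored only on this node

    def local_sum(self):
        # Each node aggregates only the rows it owns.
        totals = defaultdict(int)
        for key, amount in self.disk:
            totals[key] += amount
        return dict(totals)


def partition_rows(rows, nodes):
    # Hash partitioning (an assumed strategy): a stable hash of the key picks the node.
    for key, amount in rows:
        index = crc32(key.encode("utf-8")) % len(nodes)
        nodes[index].disk.append((key, amount))


def master_sum(nodes):
    # The master merges the partial results returned by every node.
    merged = defaultdict(int)
    for node in nodes:
        for key, subtotal in node.local_sum().items():
            merged[key] += subtotal
    return dict(merged)


nodes = [Node(i) for i in range(4)]
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
partition_rows(rows, nodes)
totals = master_sum(nodes)
print(totals)
```

Because all rows with the same key hash to the same node, no node needs to read another node's disk to compute its subtotal.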

Page 8: Information processing architectures

Trade offs

Assigning tasks at the proper time and in the determined order

Batch and online scheduling algorithms

Priority based, First come first served, Round Robin

Load balancing across nodes

Serializing data transfer

Data Transfer, computation delays

Data overflow, underflow

Reference: Chapter 3, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
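
The three scheduling policies named on this slide can be compared with a small Python sketch. The task tuple layout `(name, priority, work_units)`, the convention that a lower number means higher priority, and the `quantum` value are all assumptions made for illustration; they do not come from the referenced chapter.

```python
from collections import deque


def first_come_first_served(tasks):
    # Run tasks in arrival order; each task is (name, priority, work_units).
    return [name for name, _priority, _work in tasks]


def priority_based(tasks):
    # Lower number means higher priority (an assumed convention);
    # sorted() is stable, so ties keep arrival order.
    return [name for name, _priority, _work in sorted(tasks, key=lambda t: t[1])]


def round_robin(tasks, quantum=2):
    # Give each task at most `quantum` work units per turn; return completion order.
    queue = deque((name, work) for name, _priority, work in tasks)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # not done: back of the queue
        else:
            finished.append(name)
    return finished


tasks = [("load", 2, 5), ("clean", 1, 2), ("report", 3, 3)]
print(first_come_first_served(tasks))
print(priority_based(tasks))
print(round_robin(tasks))
```

The same three tasks finish in a different order under each policy, which is the trade-off the slide points at: the policy decides which work waits.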

Page 9: Information processing architectures

Map Reduce Approach For Big Data Processing

Chapter 2, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.

Step 1 - Split the big data among multiple parallel map tasks

Step 2 - Merge and reduce the data by grouping

Distributed memory system

Dynamic job scheduling

Scalable

Key-value pairs

Fault tolerant

Page 10: Information processing architectures

Map Reduce Concept - Key Value Pairs

Input: A B C D D C A D B

Split: A B C | D D C | A D B

Map: A - 1, B - 1, C - 1 | D - 1, D - 1, C - 1 | A - 1, D - 1, B - 1

Shuffle/Sort: A - 1, A - 1 | B - 1, B - 1 | C - 1, C - 1 | D - 1, D - 1, D - 1

Reduce: A - 2 | B - 2 | C - 2 | D - 3

Output: A - 2, B - 2, C - 2, D - 3
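
The key-value flow on this slide can be reproduced with a minimal Python sketch that uses the slide's own input. It is a single-process illustration, not Hadoop code: the function names `map_phase`, `shuffle_sort`, and `reduce_phase` are hypothetical, and the splits are processed one after another rather than in parallel.

```python
from collections import defaultdict


def map_phase(split):
    # Emit a (key, 1) pair for every token in one input split.
    return [(token, 1) for token in split]


def shuffle_sort(mapped_splits):
    # Group all emitted values by key, with keys in sorted order.
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    # Sum the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}


data = "A B C D D C A D B".split()
splits = [data[i:i + 3] for i in range(0, len(data), 3)]  # three splits of three
mapped = [map_phase(split) for split in splits]
grouped = shuffle_sort(mapped)
counts = reduce_phase(grouped)
print(counts)
```

In a real framework each call to `map_phase` could run on a different node, since no split depends on another.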

Page 11: Information processing architectures

Information Processing – Focus and Changes

Map Reduce Framework and Hadoop Distributed File System

Parallelism
• To perform analytics in parallel
• Map & Reduce functions run in parallel

Fault Tolerance
• Share nothing
• Compute nodes

Scalability
• Scale CPU, memory. Robust data management techniques to optimize data retrieval and storage.

Data Locality
• Assign the data processing workload to the server where the data is stored, as per Map Reduce.
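
Data locality can be sketched in a few lines of Python. This is an illustrative toy, not the Hadoop scheduler: the `block_locations` mapping, the node names, and the least-loaded tie-breaking rule are all assumptions introduced here.

```python
def assign_tasks(block_locations):
    """Assign each block's task to a node that already stores that block."""
    load = {}
    assignment = {}
    for block, holders in block_locations.items():
        for node in holders:
            load.setdefault(node, 0)
        # Among the nodes holding a replica, pick the least-loaded one;
        # ties go to the first holder listed.
        chosen = min(holders, key=lambda node: load[node])
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


block_locations = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-a", "node-c"],
    "block-3": ["node-b", "node-c"],
}
assignment = assign_tasks(block_locations)
print(assignment)
```

Every task lands on a node that already holds its block, so the computation moves to the data instead of the data moving across the network.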

Page 12: Information processing architectures

A Few Basics

Page 13: Information processing architectures

ACID, BASE, CAP

Relational database management systems follow ACID rules – Atomicity, Consistency, Isolation, Durability

What to expect from Search – BASE

Yes, Search returns innumerable pages of data

Only one page is basically available - BA

Rest of the data is in soft state - S

Rest of the data becomes eventually consistent - E

According to database theory, distributed NoSQL big databases can satisfy only two of the three CAP properties and have to relax expectations on the third. CAP – Consistency, Availability, Partition Tolerance
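
The "A" in ACID can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions not in the slides: the `accounts` table, the `transfer` helper, and the balances are hypothetical. A transfer that would overdraw an account fails a CHECK constraint, and the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, source, target, amount):
    # `with conn` commits if the block succeeds and rolls back if it raises.
    try:
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer leaves both balances exactly as the first transfer left them: either both updates happen or neither does.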

Page 14: Information processing architectures

Distributed Information management

C. J. Date’s Rules (12) for Distributed Databases

Local autonomy

No reliance on a central site for any particular service

Continuous operation

Location independence

Fragmentation independence

Replication independence

Distributed query processing

Distributed transaction management

Hardware independence

Operating system independence

Network independence

DBMS independence

Page 15: Information processing architectures

Multiple Models For Data Architectures

Legacy, traditional RDBMS

Object oriented

Distributed

Client Server

Data Warehouses

Parallel and Massively Parallel

Partitioning

Active Databases - Intelligence

Spatial

Multimedia

Temporal

Page 16: Information processing architectures

Client Server Databases, Middleware - Drivers

Remote Database Access (RDA)

Distributed Relational Database Architecture (DRDA)

Integrated Database Application Programming Interface (IDAPI)

Data Access Language (DAL)

Open Database Connectivity (ODBC)

1990s

Page 17: Information processing architectures

Client Server basic model in the ’80s

Adapted from Figure 3.2, mid-’80s client/server environment, Chapter 3, Client Server Databases and Middleware

[Figure: a Client PC and Server applications, each with an Interface, exchanging Request and Data]

Page 18: Information processing architectures

Data Warehouse – Applications

Non volatile

Time variant

Integrated

Subject oriented

Page 19: Information processing architectures

Data warehousing Models for analytical applications – pre-web

Star

Snowflake

Constellation
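
Of the three models, the star is the simplest to show: one central fact table joined by foreign keys to dimension tables. The Python sketch below uses the built-in `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the 'what' and 'when' of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table holds measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
    INSERT INTO dim_date VALUES (10, 2013), (11, 2014);
    INSERT INTO fact_sales VALUES (1, 10, 20), (1, 11, 30), (2, 11, 15);
    """
)

# A typical analytical query: join the fact table to its dimensions and aggregate.
rows = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
print(rows)
```

A snowflake schema would further normalize the dimension tables into sub-dimensions, and a constellation would let several fact tables share the same dimensions.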

Page 20: Information processing architectures

Data warehousing Models for analytical applications – complex web data

Use XML to model data warehouses

Combining OLAP tools with Data mining

Rule based multi dimensional model

Page 21: Information processing architectures

Next generation data warehouse

Analytics

Semantic interfaces/ Rules engines, Hadoop/ NoSQL, RDBMS

Data layer: OLTP, legacy data, web data

Page 22: Information processing architectures

Source: http://www.sybase.com/files/White_Papers/TDWI_BPR_NextGenDWPlatforms_Q409.pdf

Page 23: Information processing architectures

Business Intelligence – Models

Source: www.beyenetwork.com, http://www.b-eye-network.com/view/8385.

DSS 2.0 architecture

Page 24: Information processing architectures

Multi tier distributed enterprise applications – Y2K period

Information system tier

Client tier

Presentation (Web) Tier

Frameworks such as J2EE, .NET

Database

Business logic tier

Database server

Application Server

Client server

Page 25: Information processing architectures
Page 26: Information processing architectures

Mobile data progress

Adapted from gsma.com, Mena, Jesus. "Chapter 3 - Mobile Data". Data Mining Mobile Devices. Auerbach Publications, © 2013

1G 2G 2.5G 3G 4G

Analog Digital

GSM GPRS EDGE WCDMA

Page 27: Information processing architectures

Legacy Migrations Cloud environment – Suitability

Ongoing discussions and debates

Page 28: Information processing architectures

Social, Mobile, Cloud environments for enterprise applications

Page 29: Information processing architectures

Cloud Infrastructures for processing information

In the context of Big Data, this topic is reserved for more comprehensive coverage separately.

Bandey, D. (2012), Doctor of Law, says: "When a Corporation mines the Big Data within its IT infrastructure a number of laws will automatically be in play. However, if that Corporation wants to analyze the same Big Data in the cloud, a new tier of legal obligations and restrictions arise. Some of them quite foreign to a management previously accustomed to dealing with its own data within its own infrastructure."

Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 2 - Big Data Computing and the Reference Architecture". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.

Page 30: Information processing architectures

Topics for cloud and information processing

Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 9 - Cloud Database Systems: NoSQL, NewSQL, and Hybrid". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.

Several terms and topics in this area:

Cloud database systems

Cloud storage

Data as a Service

Database as a Service

Data models

Cloud computing has five crucial characteristics against which databases are evaluated for fitness for a cloud environment: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

Page 31: Information processing architectures

Big Data Case Studies

Conversions – Traditional Mainframe to Hadoop, NoSQL db

Recommendation Engine

Video Streaming Analytics

Real Time Traffic monitoring

Social behaviors log processing

Page 32: Information processing architectures

References:

Dow, K. E., Hackbarth, G., & Wong, J. (2013). Data architectures for an organizational memory information system. Journal Of The American Society For Information Science & Technology, 64(7), 1345-1356. doi:10.1002/asi.22848

Chessell, Mandy & Smith, Harald C. (© 2013). Patterns of Information Management.

Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.

Simon, Alan R. Strategic Database Technology: Management for the Year 2000.

http://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/

Krishnan, Krish. (© 2013). Data Warehousing in the Age of Big Data.