Information Processing Architectures

Raji Gogulapati, Sep 2014

Information Search

Information Acquisition

Information Processing

Information Maintenance

Information Retention

Information System Management

Information Processing

Online Transaction Processing (OLTP)

Online Analytical Processing (OLAP)

Complex Event Processing (CEP)

Massively Parallel Processing (MPP)

Legacy

Random

Infrastructure Essentials for Information Processing

Shared Nothing
• OLAP
• BI, DW, Big Data

Shared Disk
• Traditional RDBMS
• OLTP

Shared Everything
• Traditional RDBMS
• OLTP

Infrastructure Models of Databases

[Figure: process and disk layouts for the Shared Everything and Shared Disk models]

Database Architectures

Relational Data management systems for OLTP information

[Figure: a master node in front of four nodes, each with its own process and its own disk]

Shared Nothing, Massively Parallel Architecture Layout

For Data Warehousing, Business Intelligence, Big Data loads of information
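
The shared nothing layout can be sketched in a few lines. This is a minimal illustration, assuming rows are hash-partitioned on a key and that the master scatters an aggregate to every node and merges the partial results. The `Node` and `Master` classes and the sample sales rows are hypothetical and do not come from any particular MPP product.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Node:
    """One shared-nothing node: private storage, private compute."""
    rows: list = field(default_factory=list)

    def local_sum(self, column):
        # Each node aggregates only the rows held in its own storage.
        return sum(row[column] for row in self.rows)


class Master:
    """Routes rows to nodes and merges per-node partial results."""

    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def insert(self, key, row):
        # crc32 is a stable hash, so a key always maps to the same node.
        index = crc32(key.encode("utf-8")) % len(self.nodes)
        self.nodes[index].rows.append(row)

    def total(self, column):
        # Scatter the aggregate to every node, then gather and combine.
        return sum(node.local_sum(column) for node in self.nodes)


# Illustrative data only.
master = Master(node_count=4)
sales = [("order-1", 120), ("order-2", 80), ("order-3", 200), ("order-4", 50)]
for order_id, amount in sales:
    master.insert(order_id, {"order_id": order_id, "amount": amount})

grand_total = master.total("amount")
rows_per_node = [len(node.rows) for node in master.nodes]
```

Because no node reads another node's rows, adding nodes adds both storage and compute, which is the property that makes this layout attractive for warehouse-style scans.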

Trade-offs

Assigning tasks at proper time in the determined order

Batch and online scheduling algorithms

Priority based, First come first served, Round Robin

Load balancing across nodes

Serializing data transfer

Data Transfer, computation delays

Data overflow, underflow

Reference: Chapter 3, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
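
The three scheduling policies named above can be compared with a small sketch. It assumes each task is a hypothetical `(name, priority, units of work)` tuple, that a lower priority number runs first, and that round robin uses a fixed time quantum. It is meant only to show how the resulting order differs, not how any specific cluster scheduler is implemented.

```python
from collections import deque


def first_come_first_served(tasks):
    """Run tasks to completion in arrival order."""
    return [name for name, _priority, _work in tasks]


def priority_based(tasks):
    """Run lower priority numbers first; sorted() is stable, so ties keep arrival order."""
    return [name for name, _priority, _work in sorted(tasks, key=lambda t: t[1])]


def round_robin(tasks, quantum):
    """Give each task `quantum` units of work per turn; return completion order."""
    queue = deque((name, work) for name, _priority, work in tasks)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # not done yet, back of the line
        else:
            finished.append(name)
    return finished


# (name, priority, units of work); values are illustrative only.
tasks = [("load", 2, 3), ("index", 1, 1), ("report", 3, 2)]

fcfs_order = first_come_first_served(tasks)
priority_order = priority_based(tasks)
rr_order = round_robin(tasks, quantum=1)
```

With these inputs the short `index` task finishes first under both priority and round robin scheduling, while first come, first served makes it wait behind `load`.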

Map Reduce Approach For Big Data Processing

Reference: Chapter 2, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.

Step 1 - Split big data among multiple parallel map tasks

Step 2 - Merge and reduce data by grouping

Distributed memory system

Dynamic job scheduling

Scalable

Key value pairs

Fault tolerant

Map Reduce Concept - Key Value Pairs

Input: A B C D D C A D B

Split: A B C | D D C | A D B

Map: A - 1, B - 1, C - 1 | D - 1, D - 1, C - 1 | A - 1, D - 1, B - 1

Shuffle/Sort: A - 1, A - 1 | B - 1, B - 1 | C - 1, C - 1 | D - 1, D - 1, D - 1

Reduce: A - 2 | B - 2 | C - 2 | D - 3

Output: A - 2, B - 2, C - 2, D - 3
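
The key value example above can be reproduced in a single process. This is a minimal sketch of the map, shuffle/sort, and reduce phases, assuming whitespace-separated tokens. The function names are illustrative and are not the Hadoop API; a real framework would run the map and reduce calls on different nodes.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (key, 1) pair for every token in one input split."""
    return [(token, 1) for token in split.split()]


def shuffle(mapped_pairs):
    """Group values by key and order the groups by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["A B C", "D D C", "A D B"]
mapped = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle(mapped)
counts = reduce_phase(grouped)
```

Each call to `map_phase` depends only on its own split, and each key in `grouped` can be reduced independently, which is what lets both phases run in parallel.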

Information Processing – Focus and Changes

Map Reduce Framework and Hadoop Distributed File system

Parallelism
• To perform analytics in parallel
• Map & Reduce functions run in parallel

Fault Tolerance
• Share nothing
• Compute nodes

Scalability
• Scale CPU, memory. Robust data management techniques to optimize data retrieval and storage.

Data Locality
• Assign the data processing workload to the server where the data is stored, as per Map Reduce.
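
Data locality can be illustrated with a toy task assigner. The sketch below assumes a hypothetical mapping from block names to the nodes holding a replica, and it picks the least-loaded replica holder for each block. The names `assign_tasks`, `block-1`, and `node-a` are made up for the example; this is not how any specific Hadoop scheduler is implemented.

```python
def assign_tasks(block_locations, node_load):
    """Assign each block's map task to the least-loaded node holding a replica."""
    assignments = {}
    load = dict(node_load)  # copy so the caller's dict is not mutated
    for block, replicas in block_locations.items():
        # Only nodes that already store the block are candidates,
        # so no block has to cross the network before processing.
        chosen = min(replicas, key=lambda node: (load[node], node))
        assignments[block] = chosen
        load[chosen] += 1
    return assignments


# Hypothetical replica placement: block -> nodes that store a copy.
block_locations = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-a", "node-c"],
    "block-3": ["node-b", "node-c"],
}
assignments = assign_tasks(
    block_locations, {"node-a": 0, "node-b": 0, "node-c": 0}
)
```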

A Few Basics

ACID, BASE, CAP

Relational database management systems follow ACID rules – Atomicity, Consistency, Isolation, Durability
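
Atomicity and consistency can be seen in a small sketch using Python's built-in `sqlite3` module. The `account` table, the `transfer` helper, and the balances are assumptions made for the example. A `CHECK` constraint stands in for a business rule, and the connection's context manager rolls the whole transaction back when the rule is violated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would break the CHECK rule, so the credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits `bob` before the debit fails, yet the final balances show no trace of that credit: either both updates happen or neither does.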

What to expect from Search – BASE

Yes, Search returns innumerable pages of data

Only one page is basically available - BA

Rest of the data is in Soft State - S

Rest of the data becomes eventually consistent - E

According to the CAP theorem, a distributed database such as a NoSQL big data store can guarantee only two of the three CAP properties at once and has to relax expectations on the third. CAP – Consistency, Availability, Partition Tolerance
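
The "eventually consistent" part of BASE can be sketched with two in-memory replicas. This example assumes last-write-wins conflict resolution based on a caller-supplied timestamp and a simple pull-based sync. The `Replica` class is hypothetical and far simpler than a real NoSQL store. It shows a replica staying available for reads while temporarily returning stale data.

```python
class Replica:
    """Key-value replica that keeps the value with the newest timestamp."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Pull every entry from a peer, keeping whichever copy is newer."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


replica_a, replica_b = Replica(), Replica()
replica_a.write("page", "v1", timestamp=1)
replica_b.sync_from(replica_a)

replica_a.write("page", "v2", timestamp=2)
stale_read = replica_b.read("page")  # replica_b answers, but has not seen v2 yet

replica_b.sync_from(replica_a)
fresh_read = replica_b.read("page")  # after syncing, the replicas agree
```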

Distributed Information management

C. J. Date’s 12 Rules for Distributed Databases

Local autonomy

No reliance on a central site for any particular service

Continuous operation

Location independence

Fragmentation independence

Replication independence

Distributed query processing

Distributed transaction management

Hardware independence

Operating system independence

Network independence

DBMS independence

Multiple Models For Data Architectures

Legacy, traditional RDBMS

Object oriented

Distributed

Client Server

Data Warehouses

Parallel and Massively Parallel

Partitioning

Active Databases - Intelligence

Spatial

Multimedia

Temporal

Client Server Databases, Middleware - Drivers

Remote Database Access (RDA)

Distributed Relational Database Architecture (DRDA)

Integrated Database Application Programming Interface (IDAPI)

Data Access Language (DAL)

Open Database Connectivity (ODBC)

1990s

Client Server basic model in the ‘80s

Adapted from figure 3.2 mid ‘80s client/ server environment, chapter 3, client server databases and middleware

[Figure: a client PC and server applications connected through an interface on each side; the client sends a request and the server returns data]

Data Warehouse – Applications

Non volatile

Time variant

Integrated

Subject oriented

Data warehousing Models for analytical applications – pre-web

Star

Snowflake

Constellation
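
As a minimal sketch of the star model, the example below builds one fact table surrounded by two dimension tables in an in-memory SQLite database and runs a typical analytical query. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions made for illustration. A snowflake model would further normalize the dimensions, and a constellation would add more fact tables sharing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table holds measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
    INSERT INTO dim_date VALUES (10, 2013), (11, 2014);
    INSERT INTO fact_sales VALUES (1, 10, 20), (1, 11, 30), (2, 11, 15);
    """
)

# Join the fact table to its dimensions and aggregate the measure.
query = """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
"""
sales_by_category_year = conn.execute(query).fetchall()
```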

Data warehousing Models for analytical applications – complex web data

Use XML to model data warehouses

Combining OLAP tools with Data mining

Rule based multi dimensional model

Next generation data warehouse

Analytics

Semantic interfaces/ Rules engines, Hadoop/ NoSQL, RDBMS

Data layer: OLTP, legacy data, web data

Source: http://www.sybase.com/files/White_Papers/TDWI_BPR_NextGenDWPlatforms_Q409.pdf

Business Intelligence – Models

Source: www.beyenetwork.com, http://www.b-eye-network.com/view/8385.

DSS 2.0 architecture


Multi tier distributed enterprise applications – Y2k period

Client tier

Presentation (Web) tier

Business logic tier

Information system tier

Frameworks such as J2EE, .NET

Database

Database server, Application server, Client server

Mobile data progress

Adapted from gsma.com, Mena, Jesus. "Chapter 3 - Mobile Data". Data Mining Mobile Devices. Auerbach Publications, © 2013

Generations: 1G, 2G, 2.5G, 3G, 4G

Signal: analog (1G), digital (2G onward)

Technologies: GSM, GPRS, EDGE, WCDMA

Legacy Migrations Cloud environment – Suitability

On going discussions and debates

Social, Mobile, Cloud environments for enterprise applications

Cloud Infrastructures for processing information

In the context of Big Data, this topic is reserved for more comprehensive coverage separately.

Bandey, D. (2012), Doctor of Law, says: "When a corporation mines the Big Data within its IT infrastructure a number of laws will automatically be in play. However, if that corporation wants to analyze the same Big Data in the cloud - a new tier of legal obligations and restrictions arise. Some of them quite foreign to a management previously accustomed to dealing with its own data within its own infrastructure."

Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 2 - Big Data Computing and the Reference Architecture". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.

Topics for cloud and information processing

Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 9 - Cloud Database Systems: NoSQL, NewSQL, and Hybrid". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.

Several terms and topics in this area.

Cloud database systems

Cloud storage

Data as a service

Database as a service

Data models

Cloud computing has five essential characteristics against which a database's fit for a cloud environment can be evaluated: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

Big Data Case Studies

Conversions – Traditional mainframe to Hadoop, NoSQL db

Recommendation engine

Video streaming analytics

Real Time Traffic monitoring

Social behaviors log processing

References:

Dow, K. E., Hackbarth, G., & Wong, J. (2013). Data architectures for an organizational memory information system. Journal Of The American Society For Information Science & Technology, 64(7), 1345-1356. doi:10.1002/asi.22848

Chessell, Mandy & Smith, Harald C. (© 2013). Patterns of Information Management.

Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.

Alan R. Simon, Strategic Database Technology: Management for the year 2000.

http://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/

Krishnan, Krish. (© 2013). Data Warehousing in the Age of Big Data.
