Information Processing Architectures
Raji Gogulapati, Sep 2014
Information Search
Information Acquisition
Information Processing
Information Maintenance
Information Retention
Information System Management
Information Processing
Online Transaction Processing (OLTP)
Online Analytical Processing (OLAP)
Complex Event Processing (CEP)
Massively Parallel Processing (MPP)
Legacy
Random
Infrastructure Essentials for Information Processing
Shared Nothing
• OLAP
• BI, DW, Big Data
Shared Disk
• Traditional RDBMS
• OLTP
Shared Everything
• Traditional RDBMS
• OLTP
Infrastructure Models of Databases
[Diagram: process and disk layouts for the Shared Everything and Shared Disk models]
Database Architectures
Relational database management systems for OLTP information
[Diagram: a master node and four nodes, each with its own process and disk]
Shared Nothing, Massively Parallel Architecture Layout
For data warehousing, business intelligence, and Big Data workloads
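A minimal sketch of the shared nothing layout above, assuming rows are hash-partitioned by a key so that each node owns its own slice of the data, and a master merges per-node partial results. The names `node_for`, `partition`, `local_total`, and `master_query`, the `customer`/`amount` fields, and the cluster size of four are illustrative assumptions, not part of the original slides.

```python
import zlib
from collections import Counter

NUM_NODES = 4  # assumed cluster size, mirroring the four process/disk pairs in the layout


def node_for(key, num_nodes=NUM_NODES):
    """Map a key to its owning node with a stable hash (CRC32), so placement is repeatable."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes=NUM_NODES):
    """Place every row on exactly one node; no node sees another node's rows."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row["customer"], num_nodes)].append(row)
    return nodes


def local_total(node_rows):
    """Work done independently on one node: sum amounts per customer."""
    totals = Counter()
    for row in node_rows:
        totals[row["customer"]] += row["amount"]
    return totals


def master_query(nodes):
    """The master only merges the partial results returned by each node."""
    merged = Counter()
    for node_rows in nodes:
        merged.update(local_total(node_rows))
    return dict(merged)


SAMPLE_ROWS = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 20},
    {"customer": "carol", "amount": 7},
]

print(master_query(partition(SAMPLE_ROWS)))
```

Because all rows for a given key land on the same node, each node can aggregate without talking to its neighbours; only the small partial results travel to the master.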
Trade offs
Assigning tasks at proper time in the determined order
Batch and online scheduling algorithms
Priority based, First come first served, Round Robin
Load balancing across nodes
Serializing data transfer
Data Transfer, computation delays
Data overflow, underflow
Reference: Chapter 3, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
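The three scheduling policies named above can be sketched as follows. This is a toy illustration under stated assumptions: each task is a hypothetical `(name, priority, work_units)` tuple, a lower priority number means more urgent, and round robin gives each task one unit of work per turn. None of these names or conventions come from the cited chapter.

```python
from collections import deque


def first_come_first_served(tasks):
    """Run tasks to completion in arrival order."""
    return [name for name, _priority, _units in tasks]


def priority_based(tasks):
    """Run the most urgent task first; sorted() is stable, so ties keep arrival order."""
    return [name for name, _priority, _units in sorted(tasks, key=lambda t: t[1])]


def round_robin(tasks, quantum=1):
    """Give each task `quantum` units per turn, re-queueing it until its work is done."""
    queue = deque((name, units) for name, _priority, units in tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        timeline.append(name)
        if remaining > quantum:
            queue.append((name, remaining - quantum))
    return timeline


TASKS = [("load", 2, 3), ("index", 1, 1), ("report", 3, 2)]

print(first_come_first_served(TASKS))
print(priority_based(TASKS))
print(round_robin(TASKS))
```

The trade-off shows up in the timelines: first come first served is simplest but lets a long task block short ones, priority ordering can starve low-priority work, and round robin shares time fairly at the cost of more switching.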
Map Reduce Approach For Big Data Processing
Reference: Chapter 2, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
Step 1 - Split Big Data among multiple parallel map tasks
Step 2 - Merge and reduce data by grouping
Distributed memory system
Dynamic job scheduling
Scalable
Key-value pairs
Fault tolerant
Map Reduce Concept - Key Value Pairs
Input: A B C D D C A D B
Split:
A B C
D D C
A D B
Map:
A - 1, B - 1, C - 1
D - 1, D - 1, C - 1
A - 1, D - 1, B - 1
Shuffle/Sort:
A - 1, A - 1
B - 1, B - 1
C - 1, C - 1
D - 1, D - 1, D - 1
Reduce:
A - 2
B - 2
C - 2
D - 3
Output: A - 2, B - 2, C - 2, D - 3
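The key-value flow above can be reproduced in a few lines of plain Python. This is a single-process sketch of the idea only, not Hadoop code; the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (key, 1) pair for every token in one input split."""
    return [(token, 1) for token in split.split()]


def shuffle(mapped_splits):
    """Shuffle/Sort: group the emitted values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["A B C", "D D C", "A D B"]          # the three splits from the slide
mapped = [map_phase(split) for split in splits]  # each split could run on its own node
grouped = shuffle(mapped)
counts = reduce_phase(grouped)
print(counts)
```

In a real cluster the map calls run in parallel on separate nodes and the shuffle moves data across the network; here they simply run one after another.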
Information Processing – Focus and Changes
Map Reduce Framework and Hadoop Distributed File system
Parallelism
• To perform analytics in parallel
• Map & Reduce functions run in parallel
Fault Tolerance
• Share nothing
• Compute nodes
Scalability
• Scale CPU, memory. Robust data management techniques to optimize data retrieval and storage.
Data Locality
• Assign data processing workload to the server where the data is stored, as per Map Reduce.
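Data locality can be pictured as a placement decision: send each task to a server that already holds the data block it needs. The sketch below assumes a hypothetical `block_locations` mapping from block name to the nodes holding a copy, and picks the least loaded of those nodes. It is an illustration of the principle, not the actual Hadoop scheduler.

```python
def assign_tasks(block_locations):
    """Assign one task per block to a node that stores that block locally.

    Among the nodes holding a copy, choose the one with the fewest tasks so far
    (ties broken by node name) so work is also spread across the cluster.
    """
    load = {node: 0 for nodes in block_locations.values() for node in nodes}
    assignment = {}
    for block, nodes in block_locations.items():
        chosen = min(nodes, key=lambda node: (load[node], node))
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


# Hypothetical cluster: three blocks, each stored on two of three nodes.
BLOCK_LOCATIONS = {
    "block-1": ["node-1", "node-2"],
    "block-2": ["node-1", "node-3"],
    "block-3": ["node-2", "node-3"],
}

print(assign_tasks(BLOCK_LOCATIONS))
```

Every task runs where its data already lives, so no block has to be copied over the network before processing starts.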
A Few Basics
ACID, BASE, CAP
Relational database management systems follow ACID rules – Atomicity, Consistency, Isolation, Durability
What to expect from Search – BASE
Yes, Search returns innumerable pages of data
Only one page is basically available - BA
Rest of the data is in Soft State - S
Rest of the data becomes eventually consistent - E
According to database theory (the CAP theorem), a distributed NoSQL big database can satisfy only two of the three CAP properties and has to relax expectations on the third.
CAP – Consistency, Availability, Partition Tolerance
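To make the "A" in ACID concrete, here is a small sketch using Python's built-in `sqlite3` module. The `account` table, the `make_bank`, `transfer`, and `balances` helpers, and the starting balances are assumptions chosen for illustration. A transfer that would overdraw an account fails, and the credit already applied in the same transaction is rolled back with it.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two accounts; balances may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected the debit; the credit was undone


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))


bank = make_bank()
print(transfer(bank, "a", "b", 150), balances(bank))
print(transfer(bank, "a", "b", 40), balances(bank))
```

A BASE-style store would instead accept the write on whichever replica is reachable and reconcile the replicas later, trading this immediate guarantee for availability.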
Distributed Information management
C J Date’s Rules (12) for Distributed Databases
Local autonomy
No reliance on a central site for any particular service
Continuous operation
Location independence
Fragmentation independence
Replication independence
Distributed query processing
Distributed transaction management
Hardware independence
Operating system independence
Network independence
DBMS independence
Multiple Models For Data Architectures
Legacy, traditional RDBMS
Object oriented
Distributed
Client Server
Data Warehouses
Parallel and Massively Parallel
Partitioning
Active Databases - Intelligence
Spatial
Multimedia
Temporal
Client Server Databases, Middleware - Drivers
Remote Database Access (RDA)
Distributed Relational Database Architecture (DRDA)
Integrated Database Application Programming Interface (IDAPI)
Data Access Language (DAL)
Open Database Connectivity (ODBC)
1990s
Client Server basic model in the ’80s
Adapted from Figure 3.2, mid-’80s client/server environment, Chapter 3, Client Server Databases and Middleware
[Figure: a client PC and server applications, each with an interface; the client sends a request and receives data]
Data Warehouse – Applications
Non volatile
Time variant
Integrated
Subject oriented
Data warehousing Models for analytical applications – pre-web
Star
Snowflake
Constellation
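As an illustration of the star model, the sketch below builds one fact table surrounded by two dimension tables using Python's built-in `sqlite3` module, then runs a typical analytical query. The table names, columns, and rows are invented for the example and do not come from the original slides.

```python
import sqlite3


def build_star(conn):
    """Create a tiny star schema: one fact table keyed to two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount INTEGER
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
        INSERT INTO dim_date VALUES (10, 2013), (11, 2014);
        INSERT INTO fact_sales VALUES (1, 10, 5), (1, 11, 7), (2, 11, 3);
        """
    )


def sales_by_category_and_year(conn):
    """Join the fact table to its dimensions and aggregate the measure."""
    return conn.execute(
        """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()


warehouse = sqlite3.connect(":memory:")
build_star(warehouse)
print(sales_by_category_and_year(warehouse))
```

A snowflake model would normalize a dimension further (for example, moving `category` into its own table), and a constellation would have several fact tables sharing the same dimensions.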
Data warehousing Models for analytical applications – complex web data
Use XML to model data warehouses
Combining OLAP tools with Data mining
Rule based multi dimensional model
Next generation data warehouse
Analytics
Semantic interfaces/ Rules engines, Hadoop/ NoSQL, RDBMS
Data layer: OLTP, legacy data, web data
Source: http://www.sybase.com/files/White_Papers/TDWI_BPR_NextGenDWPlatforms_Q409.pdf
Business Intelligence – Models
Source: www.beyenetwork.com, http://www.b-eye-network.com/view/8385.
DSS 2.0 architecture
Multi tier distributed enterprise applications – Y2k period
Information system tier
Client tier
Presentation (Web) Tier
Frameworks such as J2EE, .NET
Database
Business logic tier
Database server
Application server
Client server
Mobile data progress
Adapted from gsma.com, Mena, Jesus. "Chapter 3 - Mobile Data". Data Mining Mobile Devices. Auerbach Publications, © 2013
1G 2G 2.5G 3G 4G
Analog Digital
GSM GPRS EDGE WCDMA
Legacy Migrations Cloud environment – Suitability
Ongoing discussions and debates
Social, Mobile, Cloud environments for enterprise applications
Cloud Infrastructures for processing information
In the context of Big Data, this topic is reserved for more comprehensive coverage separately.
Bandey, D. (2012), Doctor of Law, says: "When a Corporation mines the Big Data within its IT infrastructure a number of laws will automatically be in play. However, if that Corporation wants to analyze the same Big Data in the cloud - a new tier of legal obligations and restrictions arise. Some of them quite foreign to a management previously accustomed to dealing with its own data within its own infrastructure."
Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 2 - Big Data Computing and the Reference Architecture". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.
Topics for cloud and information processing
Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 9 - Cloud Database Systems: NoSQL, NewSQL, and Hybrid". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.
Several terms and topics in this area.
Cloud database systems
Cloud storage
Data as a service
Database as a service
Data models
Cloud computing has five essential characteristics against which a database's fit for a cloud environment can be evaluated:
On-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
Big Data Case Studies
Conversions – Traditional mainframe to Hadoop, NoSQL DB
Recommendation engine
Video streaming analytics
Real Time Traffic monitoring
Social behaviors log processing
References:
Dow, K. E., Hackbarth, G., & Wong, J. (2013). Data architectures for an organizational memory information system. Journal Of The American Society For Information Science & Technology, 64(7), 1345-1356. doi:10.1002/asi.22848
Chessell, Mandy, and Harald C. Smith. Patterns of Information Management. © 2013.
Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
Simon, Alan R. Strategic Database Technology: Management for the Year 2000.
http://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/
Krishnan, Krish. Data Warehousing in the Age of Big Data. © 2013.