distributed systems cs 15-440
DESCRIPTION
Distributed Systems CS 15-440. Distributed System Architecture and Introduction to Networking Lecture 3, Sep 12, 2011 Majd F. Sakr, Vinay Kolar , Mohammad Hammoud. Today…. Last Session: Trends and challenges in Distributed Systems Today’s session: - PowerPoint PPT PresentationTRANSCRIPT
Distributed SystemsCS 15-440
Distributed System Architecture
and
Introduction to Networking
Lecture 3, Sep 12, 2011
Majd F. Sakr, Vinay Kolar, Mohammad Hammoud
Today…
Last Session: Trends and challenges in Distributed Systems
Today’s session: Part I: Distributed System Architectures Part II: Introduction to Networking
Announcements: Project 1 design report is due on Wednesday at midnight
Project handout has been updated with a “design deliverable” section
Part I
Distributed Systems Architecture
A Distributed System
A distributed system is simply a collection of hardware or software components that communicate to solve a complex problem
Each component performs a “task”
Communication mechanism
Componentsperforming
tasks
Bird’s eye view of some Distributed Systems
How would one classify these distributed systems?
Google Server
Search Client 1
Search Client 2
Search Client 3
Req
uest
Res
pons
e
Expedia
Peer 1
Peer 2
Peer 3
Peer 4
Google Search Bit-torrent
Reservation Client 1
Reservation Client 2
Reservation Client 3
Airline Booking Skype
Classification of Distributed Systems
What are the entities that are communicating in a DS?a) Communicating entities
How do the entities communicate?b) Communication paradigms
What roles and responsibilities do they have?c) Roles and responsibilities
How are they mapped to the physical distributed infrastructure?d) Placement of entities
Classification of Distributed Systems
What are the entities that are communicating in a DS?a) Communicating entities
How do the entities communicate?b) Communication paradigms
What roles and responsibilities do they have?c) Roles and responsibilities
How are they mapped to the physical distributed infrastructure?d) Placement of entities
Communicating Entities
What entities are communicating in a DS?System-oriented entities
Processes
Threads
Nodes
Problem-oriented entitiesObjects (in object-oriented programming based approaches)
Classification of Distributed Systems
What are the entities that are communicating in a DS?a) Communicating entities
How do the entities communicate?b) Communication paradigms
What roles and responsibilities do they have?c) Roles and responsibilities
How are they mapped to the physical distributed infrastructure?d) Placement of entities
Communication Paradigms
Three types of communication paradigmsInter-Process Communication (IPC)
Remote Invocation
Indirect Communication
Internet ProtocolsInternet Protocols
IPC Primitives
Remote Invocation, Indirect Communication
Applications, ServicesApplications, Services
Middleware layers
Inter-Process Communication (IPC)Relatively low-level support for communication
e.g., Direct access to internet protocols (Socket API)
AdvantagesEnables seamless communication between processes on heterogeneous operating systems
Well-known and tested API adopted across multiple operating systems
DisadvantagesIncreased programming effort for application developers
Socket programming: Programmer has to explicitly write code for communication (in addition to program logic)
Space Coupling (Identity is known in advance): Sender should know receiver’s ID (e.g., IP Address, port)
Time Coupling: Receiver should be explicitly listening to the communication from the sender
Remote Invocation
An entity runs a procedure that typically executes on an another computer without the programmer explicitly coding the details for this remote interaction
A middleware layer will take care of the raw-communication
ExamplesRemote Procedure Call (RPC) – Sun’s RPC (ONC RPC)
Remote Method Invocation (RMI) – Java RMI
Remote Invocation
Advantages:Programmer does not have to write code for socket communication
Disadvantages:
Space Coupling: Where the procedure resides should beknown in advance
Time Coupling: On the receiver, a process should be explicitly waiting to accept requests for procedure calls
Space and Time Coupling in RPC and RMI
doOperation..
(wait)..
(continuation)
getRequest.
Select operation
.Execute operation
.Send reply
Request Message
Reply Message
The sender knows the Identity of the receiver (space coupling)
Time Coupling
Sender Receiver
RMI strongly resembles RPC but in a world of distributed objects
Indirect Communication ParadigmIndirect communication uses middleware to:
Provide one-to-many communication
Some mechanisms eliminate space and time couplingSender and receiver do not need to know each other’s identities
Sender and receiver need not be explicitly listening to communicate
Approach used: IndirectionSender A middle-man Receiver
Types of indirect communication1. Group communication
2. Publish-subscribe
3. Message queues
1. Group Communication
One-to-many communicationMulticast communication
Abstraction of a groupGroup is represented in the system by a groupId
Recipients join the group
A sender sends a message to the group which is received by all the recipients
Sender
Recv 2 Recv 1
Recv 3
1. Group Communication (cont’d)
Services provided by middlewareGroup membership
Handling the failure of one or more group members
AdvantagesEnables one-to-many communication
Efficient use of bandwidth
Identity of the group members need not be available at all nodes
DisadvantagesTime coupling
2. Publish-SubscribeAn event-based communication mechanism
Publishers publish events to an event service
Subscribers express interest in particular events
Large number of producers distribute information to large number of consumers
Publish-subscribe Event Service
Publish (Event1)
Publish (Event2)
Publish (Event3)
Subscribe (Event1, Event2)
Subscribe (Event1)
Subscribe (Event3)
Publishers Subscribers
2. Publish-Subscribe (cont’d)
Example: Financial trading
Dealer process
3. Message Queues
A refinement of Publish-Subscribe where Producers deposit the messages in a queue
Messages are delivered to consumers through different methods
Queue takes care of ensuring message delivery
AdvantagesEnables space decoupling
Enables time decoupling
Recap: Communication Entities and Paradigms
Communicating entities(what is communicating)
System-oriented
Problem-oriented
• Nodes• Processes• Threads
• Objects
Communication Paradigms (how they communicate)
IPC Remote Invocation
Indirect Communication
• Sockets • RPC• RMI
• Group communication• Publish-subscribe• Message queues
Classification of Distributed Systems
What are the entities that are communicating in a DS?a) Communicating entities
How do the entities communicate?b) Communication paradigms
What roles and responsibilities do they have?c) Roles and responsibilities
How are they mapped to the physical distributed infrastructure?d) Placement of entities
Roles and Responsibilities
In DS, communicating entities take on roles to perform tasks
Roles are fundamental in establishing overall architectureQuestion: Does your smart-phone perform the same role as Google Search Server?
We classify DS architectures into two types based on the roles and responsibilities of the entities
Client-Server
Peer-to-Peer
Client-Server Architecture
Approach:Server provides a service that is needed by a client
Client requests to a server (invocation), the server serves (result)
Widely used in many systemse.g., DNS, Web-servers
Client
Client
Server
Server
Client-Server Architecture: Pros and Cons
Advantages:Simplicity and centralized control
Computation-heavy processing can be offloaded to a powerful server, Clients can be “thin”
DisadvantagesSingle-point of failure at server
Scalability
Peer to Peer (P2P) Architecture
In P2P, roles of all entities are identicalAll nodes are peers
Peers are equally privileged participants in the application
e.g.: Napster, Bit-torrent, Skype
Peer to Peer Architecture
Example: Downloading files from bit-torrent
Peer 2
Peer 6
Peer 1
Peer 3
Peer 4
Peer 5
Peer 1 wants a file. Parts of the file is present at peers 3,4 and 5
Peer 3 wants a file stored at peers 1,2 and 6
Architectural Patterns
Primitive architectural elements can be combined to form various patterns
Tiered Architecture
Layering
Tiered architecture and layering are complementaryLayering = vertical organization of services
Tiered Architecture = horizontal splitting of services
Tiered Architecture
A technique to:1. Organize the functionality of a service, and
2. Place the functionality into appropriate servers
Airline Search Application
Get user Input
Get data from
database
Rank the offers
ServerServerServerServer ServerServer
Display UI screen
ClientClient
Application and data
management
Application and data
management
A Two-Tiered Architecture
User view, controls and
data manipulation
Application and data
management
Application and data
management
Application and data
management
Application and data
management
User view, controls and
data manipulation
Personal computer or mobile devices Server
How do you design an airline search application:
Organize functionality of a given layer
A Three-Tiered Architecture
EXPEDIA Airline Search Application
Rank the offers
Display result to
user
Get user Input
Display user input
screenAirline
Database
Tier 1 Tier 2 Tier 3
Application logic
Application logic
A Three-Tiered Architecture
Database managerDatabase manager
User view, and
controls
Application logic
Application logic
User view, and
control
Personal computer or mobile devices
Database Server
Application Server
LayeringA complex system is partitioned into layers
Upper layer utilizes the services of the lower layer
A vertical organization of services
Layering simplifies design of complex distributed systems by hiding the complexity of below layers
Control flows from layer to layer
Layer 1
Layer 3
Requestflow
Responseflow
Layering – Platform and middleware
Distributed Systems can be organized into three layers
1. PlatformLow-level hardware and software layers
Provides common services for higher layers
2. MiddlewareMask heterogeneity and provide convenient programming models to application programmers
Typically, it simplifies application programming by abstracting communication mechanisms
3. Applications
Operating system
Applications
Classification of Distributed Systems
What are the entities that are communicating in a DS?a) Communicating entities
How do the entities communicate?b) Communication paradigms
What roles and responsibilities do they have?c) Roles and responsibilities
How are they mapped to the physical distributed infrastructure?d) Placement of entities
Placement
Observation:A large number of heterogonous hardware (machines, network).
Smart mapping of entities (processes, objects) to hardware helps performance, security and fault-tolerance.
“Placement” maps entities to underlying physical distributed infrastructure.
Placement should be decided after a careful study of application characteristics
Example strategies:Mapping services to multiple servers
Moving the mobile code to the client
Placement
Hi-performance Server
Hi-performance Server DesktopDesktop Smart-phoneSmart-phone
User-interface
Web search indexing
Mobile Code
Entities
Physical infrastructure
Recap
So far, we have covered primitive architectural elements
Communicating entities
Communication paradigms of entitiesIPC, RMI, RPC, Indirect Communication
Roles and responsibilities that entities assume, and resulting architectures
Client-Server, Peer-to-Peer, Hybrid
Placement of entities
Part II
Introduction to Networking
Introduction to Networking – Learning objectives
You will identify how computers over Internet communicate.
After today’s classYou will be able to identify different types of networks
After the next classDescribe networking principles such as layering, encapsulation and packet-switching
Examine how packets get routed and how congestion is avoided
Analyze scalability, reliability and fault-tolerance of Internet
Networks in Distributed Systems
Distributed System is simply a collection of components that communicate to solve a problem
Why should distributed systems programmers know about networks?
Networking issues severely affect performance, fault-tolerance and security in Distributed Systems.
e.g., Gmail outage on Sep 1, 2010 – Google Spokesman said “we had slightly underestimated the load which some recent changes placed on the request routers. … . few of the request routers became overloaded… causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded.”
Networks in Distributed Systems
Networking Issue Comments on Distributed System design
Affects latency and data-transfer-rate of messages.
Size of Internet is increasing. Expect greater traffic in future.
Detect communication errors and perform error-checks at the application layer
Install firewalls. Deploy end-to-end authentication, privacy and security modules.
Expect intermittent connection for mobile devices.
Internet is best-effort. It is hard to ensure strict QoS guarantees for, say, multimedia messages.
Performance
Scalability
Reliability
Security
Mobility
Quality-of-service
Network Classification
Important ways to classify networks1. Based on size
Body Area Networks (BAN)
Personal Area Networks (PAN)
Local Area Networks (LAN)
Wide Area Networks (WAN)
2. Based on technologyEthernet Networks
Wireless Networks
Cellular Networks
Network classification – BANs and PANs
Network Classification – LANComputers connected by single communication medium
e.g., twisted copper wire, optical fiber
High data-transfer-rate and low latency
LAN consists of
1.SegmentUsually within a department/floor of a building
Shared bandwidth, no routing necessary
2.Local NetworksServes campus/office building
Many segments connected by a switch/hub
Typically, represents a network within an organization
Network classification – WAN
Generally covers a wider area (cities, countries,...)
Consists of networks of different organizations
Traffic is routed from one organization to anotherRouters
Bandwidth and delayVaries
Worse than a LAN
Largest WAN = Internet
Brief Summary of Important Networks (Based on Size)
A Segment
A Network
Types of Networks – Based on Technology
Ethernet NetworksPredominantly used in the wired Internet
Wireless LANsPrimarily designed to provide
wireless access to the Internet
Low-range (100s of m), High-bandwidth
Cellular networks (2G/3G)Initially, designed to carry voice
Large range (few kms)
Low-bandwidth
Typical Performance for Different Types of Networks
Network Example Range Bandwidth (Mbps)
Latency (ms)
Wired LAN Ethernet 1-2 km 10 – 10,000 1 – 10
Wired WAN Internet Worldwide 0.5 – 600 100 – 500
Wireless PAN Bluetooth 10 – 30 m 0.5 – 2 5 – 20
Wireless LAN WiFi 0.15 – 1.5 km 11 – 108 5 – 20
Cellular 2G – GSM 100m – 20 km 0.270 – 1.5 5
Modern Cellular
3G 1 – 5 km 348 – 14.4 100 – 500
Next Class
Describe networking principles such as layering, encapsulation and packet-switching
Examine how packets get routed and how congestion is avoided
Analyze scalability, reliability and fault-tolerance of Internet
Referenceshttp://en.wikipedia.org/wiki/Remote_procedure_call
http://www.generalsoftwares.co.uk/remote-services.html
http://gmailblog.blogspot.com/2009/09/more-on-todays-gmail-issue.html
http://innovation4u.wordpress.com/2010/08/17/why-we-dont-share-stuff/
http://essentiawhipsfloggers.wordpress.com/2010/05/08/waiting-times-queue-jumping/
http://www.cdk5.net/