Distributed Software Engineering Lecture 2 Communication Fundamentals Middleware Solutions Sam Malek SWE 622, Fall 2012 George Mason University


Distributed Software Engineering

Lecture 2
Communication Fundamentals

Middleware Solutions

Sam Malek
SWE 622, Fall 2012

George Mason University

SWE 622 – Distributed Software Engineering

© Malek Lecture 2 – 2

outline

Networking fundamentals

OSI layers

Java sockets

Middleware solutions

Remote Procedure Calls (RPC)

Remote Method Invocation (RMI)

OSI layer 1

1 physical
2 data link
3 network
4 transport
5 session
6 presentation
7 application

specifies: pin layout, voltages, modulation

does: establish & terminate access to medium, flow control, contention resolution

at this level: hubs, repeaters, network adapters

OSI layer 2

[figure: the seven-layer OSI stack; layer 2 is the data link layer]

specifies: how to transfer data in a LAN

does: detect and correct errors

at this level: MAC addresses (flat, HW-based)

OSI layer 3

[figure: the seven-layer OSI stack; layer 3 is the network layer]

specifies: how to transfer data sequences across LANs (e.g., IP)

does: routing

at this level: hierarchical address scheme, routers (bridges and ordinary switches forward frames at layer 2; layer-3 switches route)

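IP Internet

Concatenation of Networks
[figure: hosts H1 to H8 attached to Network 1 (Ethernet), Network 2 (Ethernet), Network 3 (FDDI), and Network 4 (point-to-point), joined by routers R1, R2, and R3]

Protocol Stack
[figure: the path from H1 to H8; H1 and H8 run TCP over IP over ETH, and each router runs IP over the two link types it joins: R1 ETH/FDDI, R2 FDDI/PPP, R3 PPP/ETH]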

IP Service Model

Connectionless (datagram/packet-based)

Best-effort delivery (unreliable service)
• packets are lost
• packets are delivered out of order
• duplicate copies of a packet are delivered
• packets can be delayed for a long time

Datagram format (bit offsets 0, 4, 8, 16, 19, 31)

Version | HLen | TOS | Length
Ident | Flags | Offset
TTL | Protocol | Checksum
SourceAddr
DestinationAddr
Options (variable) | Pad (variable)
Data

Datagram Forwarding Strategy

• every datagram contains the destination’s address
• if directly connected to the destination network, then forward to the host
• if not directly connected to the destination network, then forward to some router
• the forwarding table maps a network number into a next hop
• each host has a default router
• each router maintains a forwarding table
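The lookup described above can be sketched as a map from network number to next hop, with the default router as the fallback. This is only an illustration: the class name `ForwardingTable`, its methods, the integer network numbers, and the router names are assumptions, not part of the lecture.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy forwarding table; all names here are made up for illustration. */
class ForwardingTable {
    private final Map<Integer, String> nextHopByNetwork = new HashMap<>();
    private final String defaultRouter;

    ForwardingTable(String defaultRouter) {
        this.defaultRouter = defaultRouter;
    }

    /** Records that datagrams for the given network number go to nextHop. */
    void addRoute(int networkNumber, String nextHop) {
        nextHopByNetwork.put(networkNumber, nextHop);
    }

    /** Looks up the destination's network number; unknown networks go to the default router. */
    String nextHop(int destinationNetwork) {
        return nextHopByNetwork.getOrDefault(destinationNetwork, defaultRouter);
    }

    public static void main(String[] args) {
        ForwardingTable table = new ForwardingTable("R1");
        table.addRoute(3, "deliver directly on interface 0"); // directly connected network
        table.addRoute(4, "R3");                              // reachable through another router
        System.out.println(table.nextHop(3));
        System.out.println(table.nextHop(4));
        System.out.println(table.nextHop(7)); // no entry, so the default router is used
    }
}
```

Keying the table by network number rather than by full host address is what keeps it small: one entry covers every host on that network.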

Forwarding Tables

Suppose there are n possible destinations. How many bits are needed to represent an address in a routing table?

log2(n)

So, do we need to store and search n * log2(n) bits in routing tables?

We’re smarter than that!

Global Addresses

Properties
• globally unique
• hierarchical: network + host

Dot Notation
• 10.3.2.4
• 128.96.33.81
• 192.12.69.77

Address classes (field widths in bits)

Class   Leading bits   Network   Host
A       0              7         24
B       1 0            14        16
C       1 1 0          21        8
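As a sketch of how the class layout splits an address, the code below classifies the three dotted-notation examples by their leading bits and extracts the host field. The class `ClassfulAddress` and its method names are hypothetical; addresses outside classes A to C are reported as '?' because the slide does not cover them.

```java
/** Splits a classful IPv4 address into network and host parts (illustrative names). */
class ClassfulAddress {
    /** Parses dotted-decimal text into a 32-bit value held in a long. */
    static long parse(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Class A starts with bit 0, class B with bits 10, class C with bits 110. */
    static char addressClass(long address) {
        if ((address >>> 31) == 0b0) return 'A';
        if ((address >>> 30) == 0b10) return 'B';
        if ((address >>> 29) == 0b110) return 'C';
        return '?'; // outside classes A to C
    }

    /** Width of the host field in bits: 24, 16, or 8. */
    static int hostBits(char addressClass) {
        switch (addressClass) {
            case 'A': return 24;
            case 'B': return 16;
            case 'C': return 8;
            default: throw new IllegalArgumentException("not a class A, B, or C address");
        }
    }

    /** The low-order host field of the address. */
    static long hostPart(long address) {
        int bits = hostBits(addressClass(address));
        return address & ((1L << bits) - 1);
    }

    /** The high-order bits above the host field (class prefix bits included). */
    static long networkPart(long address) {
        return address >>> hostBits(addressClass(address));
    }

    public static void main(String[] args) {
        for (String text : new String[] {"10.3.2.4", "128.96.33.81", "192.12.69.77"}) {
            long address = parse(text);
            System.out.println(text + " class " + addressClass(address)
                    + " network " + networkPart(address) + " host " + hostPart(address));
        }
    }
}
```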

OSI layer 4

[figure: the seven-layer OSI stack; layer 4 is the transport layer]

specifies: end-to-end transfer of data (e.g., TCP, which is reliable, and UDP, which is not)

does: flow control, segmentation, error control, retransmission

UDP vs TCP

UDP (User Datagram Protocol)
• connectionless - sends independent packets of data, called datagrams, from one computer to another with no guarantees about arrival
• each time a datagram is sent, the local and receiving socket addresses need to be sent as well

TCP (Transmission Control Protocol)
• connection-oriented - provides a reliable flow of data between two computers: data sent from one end of the connection gets to the other end in the same order
• in order to communicate using TCP, a connection must first be established between the pair of sockets
• once two sockets have been connected, they can be used to transmit data in both (or either one of the) directions

UDP vs. TCP

TCP header (bit offsets 0, 4, 10, 16, 31)

SrcPort | DstPort
SequenceNum
Acknowledgment
HdrLen | 0 | Flags | AdvertisedWindow
Checksum | UrgPtr
Options (variable)
Data

UDP header (bit offsets 0, 16, 31)

SrcPort | DstPort
Length (bytes) | Checksum
Data

Which protocol to use?

Overhead
• UDP - every time a datagram is sent, the local and receiving socket addresses need to be sent along with it
• TCP - a connection must be established before communications between the pair of sockets start (i.e. there is a connection setup time in TCP)

Packet Size
• UDP - there is a size limit of 64 kilobytes per datagram
• TCP - there is no limit; the pair of sockets behaves like streams

Reliability
• UDP - there is no guarantee that the sent datagrams will be received in the same order by the receiving socket
• TCP - it is guaranteed that the sent packets will be received in the order in which they were sent

Which protocol to use? (cont.)

TCP - useful when an indefinite amount of data needs to be transferred ‘in order’ and reliably

otherwise, we end up with jumbled files or invalid information

examples: HTTP, ftp, telnet, …

UDP - useful when data transfer should not be slowed down by the extra overhead of the reliable connection

examples: real-time applications

e.g. consider a clock server that sends the current time to its client

• if the client misses a packet, it doesn't make sense to resend it because the time will be incorrect when the client receives it on the second try

• the reliability of TCP is unnecessary - it might cause performance degradation and hinder the usefulness of the service

Examples

Some Internet applications and their underlying transport protocols

Application            App. protocol   Transp. protocol
e-mail                 SMTP            TCP
remote access          Telnet          TCP
Web                    HTTP            TCP
file transfer          FTP             TCP
streaming media        proprietary     TCP or UDP
domain name service    DNS             TCP or UDP
internet telephony     proprietary     UDP

OSI layer 5

[figure: the seven-layer OSI stack; layer 5 is the session layer]

specifies: establishing long-lived connections

does: checkpointing, adjournment, restart

OSI layer 6

[figure: the seven-layer OSI stack; layer 6 is the presentation layer]

specifies: data formats and transformation (e.g., MIME)

does: serialization, compression, encryption, encoding transformation (EBCDIC/ASCII)

OSI layer 7

[figure: the seven-layer OSI stack; layer 7 is the application layer]

specifies: application-specific protocols (e.g., http, smtp, ftp, telnet)

does: support app-specific functionality

the Open Systems Interconnection is a reference model

[figure: the seven-layer OSI stack]

Goal: separation of concerns enables good implementation at each level

each layer is independent of the ones on top

layer n depends on the spec of n-1, but not on its implementation/manufacturer

the OSI reference model is roughly adhered to

[figure: the seven-layer OSI stack with annotations]
• realm of middleware: app-specific (SMTP, http…) or independent (RMI, CORBA…)
• TCP/IP protocol stack
• data units: bits (physical), frames (data link), packets (network), segments (transport)

outline

Networking fundamentals

OSI layers

Java sockets

Middleware solutions

Remote Procedure Calls (RPC)

Remote Method Invocation (RMI)

What is a port?

Generally, a computer has a single physical connection to the network
• this connection is identified by the computer’s 32-bit IP (IPv4) address
• all data destined for a particular computer arrives through this connection

TCP and UDP use ports to identify a particular process/application
• port = abstract destination point at a particular host
• each port is identified by an unsigned 16-bit number, in the range 0 - 65,535
• port numbers 0 - 1023 are reserved for well-known services (HTTP - 80, telnet - 23)

What is a socket?

socket = basic abstraction for network communication

“end-point of communication” uniquely identified with IP address and port

• example: Socket MyClient = new Socket("Machine name", PortNumber);

gives a file-system-like abstraction to the capabilities of the network
• two end-points communicate by “writing” into and “reading” out of a socket

there are two types of transport via sockets
• reliable, byte-stream oriented
• unreliable datagram

Socket programming with TCP

Server side:
• the server runs on a specific computer and has a socket bound to a specific port number
• the server listens on that socket for a client to make a connection request

Client side:
• the client tries to rendezvous with the server on the server's machine and port

Server side:
• the server accepts the connection by creating a new socket dedicated to that client; the new socket keeps the server's local port, and the listening socket stays free for further connection requests

Client side:
• if the connection is accepted, the client's socket is connected and the client uses it to communicate with the server

Socket programming with TCP cont.

Socket programming with UDP

All clients communicate with the server through the same server socket
• packets of data (datagrams) are exchanged
• no new sockets need to be created

Socket programming with UDP cont.

C- vs. Java- socket programming

C- vs. Java- socket programming cont.

Java keeps all the socket complexity “under the cover”
• it does not expose the full range of socket possibilities
• but it enables sockets to be opened/used as easily as a file would be opened/used

By using the java.net.Socket class instead of relying on native code, Java programs can communicate over the network in a platform-independent fashion

Java socket programming

all classes related to sockets are in the java.net package

• Socket class - implements client sockets (also called just "sockets")
• ServerSocket class - implements server sockets
– A server socket waits for requests to come in over the network. It performs some operation based on that request, and then possibly returns a result to the requester.
• DatagramSocket class - socket for sending and receiving datagram packets
• DatagramPacket class - represents a datagram packet
– Datagram packets are used to implement a connectionless packet delivery service. Multiple packets sent from one machine to another might be routed differently, and might arrive in any order.
• InetAddress class - represents an Internet Protocol (IP) address
• MulticastSocket class - useful for sending and receiving IP multicast packets
– A MulticastSocket is a (UDP) DatagramSocket, with additional capabilities for joining "groups" of other multicast hosts on the internet. A multicast group is specified by a class D IP address.
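As a small illustration of the InetAddress class and of class D (multicast) addresses, the sketch below checks whether a literal address falls in the multicast range. The class name `AddressDemo` and the sample addresses are assumptions chosen for the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class AddressDemo {
    /** True if the literal IP address lies in the multicast (class D) range. */
    static boolean isMulticast(String literalAddress) throws UnknownHostException {
        // For a literal IP address, getByName only checks the format; it makes no DNS lookup.
        return InetAddress.getByName(literalAddress).isMulticastAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("224.0.0.1 multicast? " + isMulticast("224.0.0.1"));
        System.out.println("192.12.69.77 multicast? " + isMulticast("192.12.69.77"));
    }
}
```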

Java client/server example

• A client reads a line from its standard input (keyboard) and sends the line to the server
• The server reads the line
• The server converts the line to uppercase
• The server sends the modified line back to the client
• The client reads the modified line, and prints the line on its standard output

Implement the above client/server scenario using both TCP and UDP!

Java TCP-Server (TCP Echo Server)

Java TCP-Server (cont.)
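A minimal sketch of the uppercase server over TCP, under stated assumptions: the class name `TcpUpperCaseServer`, port 6789, UTF-8 text, and serving one client at a time are choices made for this example, not taken from the lecture.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class TcpUpperCaseServer {
    /** Serves one client: replies to each line with its uppercase form until the client closes. */
    static void serve(Socket connectionSocket) throws IOException {
        try (Socket socket = connectionSocket;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line.toUpperCase(Locale.ROOT)); // autoflush sends each reply at once
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int port = 6789; // assumed port for the example
        try (ServerSocket welcomeSocket = new ServerSocket(port)) {
            while (true) {
                // accept() blocks until a client connects and returns a socket for that client
                serve(welcomeSocket.accept());
            }
        }
    }
}
```

A server meant for many simultaneous clients would typically hand each accepted socket to its own thread; that is left out to keep the sketch short.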

Java TCP-Client

Java TCP-Client (Cont.)
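A matching sketch of the TCP client, again under assumptions: the class name `TcpUpperCaseClient`, the helper `exchange`, host "localhost", and port 6789 are illustrative, and a server speaking the same one-line protocol is assumed to be listening.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class TcpUpperCaseClient {
    /** Connects, sends one line, and returns the server's one-line reply. */
    static String exchange(String host, int port, String line) throws IOException {
        try (Socket clientSocket = new Socket(host, port); // connection is established here
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(clientSocket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(clientSocket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(line);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
        String sentence = keyboard.readLine();
        if (sentence != null) {
            // host and port are assumptions; they must match wherever the server runs
            System.out.println("FROM SERVER: " + exchange("localhost", 6789, sentence));
        }
    }
}
```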

Java UDP-Server (UDP Echo Server)

Java UDP-Server (Cont.)
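A minimal sketch of the same uppercase service over UDP. The class name `UdpUpperCaseServer`, port 9876, the 1024-byte buffer, and UTF-8 text are assumptions for the example.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class UdpUpperCaseServer {
    /** Receives one datagram and sends its uppercased text back to the sender. */
    static void handleOne(DatagramSocket serverSocket) throws IOException {
        byte[] buffer = new byte[1024];
        DatagramPacket request = new DatagramPacket(buffer, buffer.length);
        serverSocket.receive(request); // blocks until a datagram arrives
        String sentence = new String(
                request.getData(), request.getOffset(), request.getLength(), StandardCharsets.UTF_8);
        byte[] reply = sentence.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        // There is no connection, so the reply must carry the client's address and port itself.
        serverSocket.send(new DatagramPacket(
                reply, reply.length, request.getAddress(), request.getPort()));
    }

    public static void main(String[] args) throws IOException {
        int port = 9876; // assumed port for the example
        try (DatagramSocket serverSocket = new DatagramSocket(port)) {
            while (true) {
                handleOne(serverSocket); // the same socket serves every client
            }
        }
    }
}
```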

Java UDP-Client

Java UDP-Client (Cont.)
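A matching sketch of the UDP client. The class name `UdpUpperCaseClient`, the helper `exchange`, port 9876, and the timeout are assumptions. The timeout is there because a lost datagram would otherwise leave `receive` blocked forever.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class UdpUpperCaseClient {
    /** Sends one datagram and waits up to timeoutMillis for the reply datagram. */
    static String exchange(InetAddress server, int port, String line, int timeoutMillis)
            throws IOException {
        try (DatagramSocket clientSocket = new DatagramSocket()) { // any free local port
            clientSocket.setSoTimeout(timeoutMillis);
            byte[] data = line.getBytes(StandardCharsets.UTF_8);
            // Every datagram is addressed individually; nothing is set up beforehand.
            clientSocket.send(new DatagramPacket(data, data.length, server, port));
            byte[] buffer = new byte[1024];
            DatagramPacket reply = new DatagramPacket(buffer, buffer.length);
            clientSocket.receive(reply); // throws SocketTimeoutException if no reply arrives in time
            return new String(
                    reply.getData(), reply.getOffset(), reply.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
        String sentence = keyboard.readLine();
        if (sentence != null) {
            // host and port are assumptions; they must match wherever the server runs
            InetAddress server = InetAddress.getByName("localhost");
            System.out.println("FROM SERVER: " + exchange(server, 9876, sentence, 2000));
        }
    }
}
```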

take 10

outline

• Networking fundamentals

– OSI layers

– Java sockets

• Middleware solutions

– Remote Procedure Calls (RPC)

– Remote Method Invocation (RMI)

middleware offers a conceptual model for communication

[figure: components c1 and c2 of a distributed app communicate through a conceptual model; under the hood, each runs on middleware on its own device, and the two devices are connected by a network]

different styles of conceptual model address different problems

[figure: styles placed along two axes, data volume and interaction complexity: protocols, call/return, read/write, messages, RPC/RMI, streaming, data store]

• middleware is more generic
• app writer works harder

• middleware is more specialized
• app writer is more constrained

different styles have different data sharing assumptions

[figure: one panel per style]
• RPC: C and S share an address space (memory)
• RMI: C and S share object refs (middleware)
• messages: c1 and c2 exchange messages
• data store: c1 and c2 share files / objects in a persistent store
• data stream: C sends a req to S, a data store/source

different styles have different control flow assumptions

[figure: one panel per style]
• RPC/RMI: C calls S; S returns
• messages: c1 and c2 exchange m1, m2, m3 under an app-specific protocol
• data store: C reads/writes (r/w) S, repeatedly
• data stream: C sends a req to S; data flows under a stream-control protocol

outline

• Networking fundamentals

– OSI layers

– Java sockets

• Middleware solutions

– Remote Procedure Calls (RPC)

– Remote Method Invocation (RMI)

lifting the hood on RPC

• putting the R in Procedure Calling

• how to pass parameters?

• how about shared memory?

• handling limitations in practice

[figure: on a single device, C calls S and S returns]

idea: stubs hide communication

• client stub, aka server proxy, appears to C like a server running on the client device

• server stub, aka client skeleton, appears to S like a client running on the server device

[figure: the local call/return between C and S on one device becomes, across two devices, a call/return between C and the client stub Cs, communication between Cs and the server stub Ss through the middleware, and a call/return between Ss and S]

app-specific stubs need to be generated

• globally unique interface ID (machine, timestamp)
• procedure signatures
• parameter marshalling
• shipping bits

[figure: the stubs Cs and Ss, with the side label "the usual"]

RPC is implemented by… sending messages

[figure: on the client device C calls Cs; on the server device Ss calls S; Cs and Ss exchange messages]
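The stub idea can be sketched in a few classes: the caller sees an ordinary interface, while the client stub turns the call into a request message and the server stub turns the message back into a call. Everything named here (`Shouter`, `ClientStub`, `ServerStub`, the one-byte procedure ID, and the in-memory `transport` that stands in for the network) is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** The procedure as the caller sees it. */
interface Shouter {
    String shout(String text);
}

/** S: the real implementation on the server device. */
class ShouterImpl implements Shouter {
    public String shout(String text) {
        return text.toUpperCase(Locale.ROOT);
    }
}

/** Ss: unpacks a request message, calls S like a local client would, packs the reply. */
class ServerStub {
    private final Shouter server;

    ServerStub(Shouter server) {
        this.server = server;
    }

    byte[] handle(byte[] request) {
        if (request.length == 0 || request[0] != ClientStub.SHOUT) {
            throw new IllegalArgumentException("unknown procedure");
        }
        String argument = new String(request, 1, request.length - 1, StandardCharsets.UTF_8);
        return server.shout(argument).getBytes(StandardCharsets.UTF_8);
    }
}

/** Cs: looks like the server to C, but only builds, sends, and decodes messages. */
class ClientStub implements Shouter {
    static final byte SHOUT = 1; // made-up procedure identifier
    private final UnaryOperator<byte[]> transport; // stand-in for "send request, await reply"

    ClientStub(UnaryOperator<byte[]> transport) {
        this.transport = transport;
    }

    public String shout(String text) {
        byte[] argument = text.getBytes(StandardCharsets.UTF_8);
        byte[] request = new byte[argument.length + 1];
        request[0] = SHOUT;
        System.arraycopy(argument, 0, request, 1, argument.length);
        byte[] reply = transport.apply(request);
        return new String(reply, StandardCharsets.UTF_8);
    }
}

class RpcDemo {
    public static void main(String[] args) {
        ServerStub serverStub = new ServerStub(new ShouterImpl());
        Shouter remote = new ClientStub(serverStub::handle); // in-memory "network"
        System.out.println(remote.shout("fred"));
    }
}
```

Replacing `serverStub::handle` with code that writes the request to a socket and reads the reply would move S to another device without changing the caller.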

where to send the messages?

• hardwired for fixed deployment
• some RPC environments support dynamic binding (more to come during the lecture on Service Discovery)

[figure: on the client device C calls Cs; the destination of the message is shown as "?"]

marshalling parameters is type-specific and platform-specific

[figure: C calls Cs on one device and Ss calls S on the other; the message travels from the OS send buffer over the wire to the OS receive buffer]

caller (C):
char *myString;
…
someProc(257,"Fred",myString);

callee (S):
void someProc(int d, char *n, char *m){…

notes on the figure: id, invert big/little endian, copy?
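To make the marshalling step concrete, here is a sketch in Java of packing the arguments of the `someProc` call into a message and unpacking them on the other side. It is an illustration rather than any real RPC wire format: the class `SomeProcMarshaller`, the constant `PROC_ID`, and the message layout are assumptions. `DataOutputStream` always writes integers in big-endian order, which removes the endianness question for this sketch, and strings are copied by value because a pointer would be meaningless on the other device.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class SomeProcMarshaller {
    static final int PROC_ID = 1; // made-up identifier for someProc

    /** Client stub side: procedure id, then each parameter in a type-specific encoding. */
    static byte[] marshal(int d, String n, String m) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(PROC_ID);
            out.writeInt(d);  // four bytes, big-endian
            out.writeUTF(n);  // two-byte length followed by the characters
            out.writeUTF(m);
        }
        return bytes.toByteArray();
    }

    /** Server stub side: reads the parameters back in the same order. */
    static Object[] unmarshal(byte[] message) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            int procId = in.readInt();
            if (procId != PROC_ID) {
                throw new IOException("unknown procedure id " + procId);
            }
            return new Object[] { in.readInt(), in.readUTF(), in.readUTF() };
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] message = marshal(257, "Fred", "some text");
        Object[] params = unmarshal(message);
        System.out.println(message.length + " bytes carry: "
                + params[0] + ", " + params[1] + ", " + params[2]);
    }
}
```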

simulate shared address space to some extent

• references to simple, small structures resolved by copy/restore

• complex data structures not supported (structure contains pointers, e.g., linked lists)

[figure: RPC, where C and S appear to share an address space (memory); on the client device, C calls Cs, which copies the contents out with the call and restores the contents on return]
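Copy/restore can be sketched with a plain array standing in for a small structure passed by reference. The names `CopyRestore`, `increment`, and `remoteIncrement` are hypothetical, and the "server" is an ordinary local method, so only the copy-out and copy-back steps are being illustrated.

```java
class CopyRestore {
    /** Server procedure: modifies its parameter in place, as a by-reference callee would. */
    static void increment(int[] counter) {
        counter[0]++;
    }

    /** What a client stub would do for a by-reference parameter. */
    static void remoteIncrement(int[] callerArray) {
        int[] serverCopy = callerArray.clone(); // copy: the contents travel with the request
        increment(serverCopy);                  // the server only ever touches its own copy
        // restore: the reply carries the contents back into the caller's structure
        System.arraycopy(serverCopy, 0, callerArray, 0, callerArray.length);
    }

    public static void main(String[] args) {
        int[] counter = {41};
        remoteIncrement(counter);
        System.out.println(counter[0]);
    }
}
```

The approach breaks down once the structure contains pointers to other structures, since a flat copy of the contents no longer captures everything the callee can reach.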

outline

• Networking fundamentals

– OSI layers

– Java sockets

• Middleware solutions

– Remote Procedure Calls (RPC)

– Remote Method Invocation (RMI)

solution: increase granularity from bytes to objects

• both local objects and references to remote objects are passed by value (serialization)

• the result of the called method is also serialized and passed back to the caller

[figure: RMI, where C and S share object refs through the middleware]
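The effect of passing by value through serialization can be shown without any network: the argument is serialized, the "remote" method works on the deserialized copy, and the result is serialized back. This is a sketch using standard Java object serialization; the class `PassByValueDemo`, the `Point` type, and `callRemotely` are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PassByValueDemo {
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Serializes the object to bytes and reads a fresh copy back. */
    static <T extends Serializable> T copyViaSerialization(T object)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    /** Stand-in for the remote method: it mutates whatever object it is given. */
    static Point moveRight(Point p) {
        p.x += 1;
        return p;
    }

    /** Argument and result both cross the "wire" as serialized copies. */
    static Point callRemotely(Point argument) throws IOException, ClassNotFoundException {
        Point serverSideCopy = copyViaSerialization(argument);
        return copyViaSerialization(moveRight(serverSideCopy));
    }

    public static void main(String[] args) throws Exception {
        Point original = new Point(1, 2);
        Point result = callRemotely(original);
        System.out.println("original.x = " + original.x + ", result.x = " + result.x);
    }
}
```

The caller's object is left unchanged; only the returned copy reflects the server's work.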


RMI uses similar ideas to RPC

• communication facilitated by local stubs (proxy/skeleton)

• stubs define/support an interface for method calling

• calling and return implemented by message passing

• separate mechanisms for dynamic binding (object registry)

RMI is different from RPC in a number of ways

• doesn’t try to hide distribution in the language: remote objects are declared “remote”

• marshalling is simplified
– by passing by value only (object references can be used in nested RMIs)
– (in Java) by having JVMs hide platform dependencies in data representation

• serialization could be much heavier by having to pass the code for the objects with every call, but that can be avoided by passing URLs for downloading the code, rather than the code itself
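In Java RMI, declaring an object "remote" means giving it an interface that extends `java.rmi.Remote`, with every method declared to throw `RemoteException`. The sketch below shows that declaration plus one possible way to export and look up the object in a single JVM. The names `UpperCaseService`, `UpperCaseServiceImpl`, and `RmiDemo`, the binding name "upper", and registry port 1099 are assumptions for the example.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Locale;

/** Distribution is visible in the type: the interface is Remote and calls may fail remotely. */
interface UpperCaseService extends Remote {
    String toUpper(String line) throws RemoteException;
}

/** The server-side object; its methods are ordinary Java. */
class UpperCaseServiceImpl implements UpperCaseService {
    @Override
    public String toUpper(String line) {
        return line.toUpperCase(Locale.ROOT);
    }
}

class RmiDemo {
    public static void main(String[] args) throws Exception {
        UpperCaseServiceImpl impl = new UpperCaseServiceImpl();
        Registry registry = LocateRegistry.createRegistry(1099); // assumed free port
        // exportObject returns the stub; port 0 lets RMI pick any free port for the object
        UpperCaseService stub = (UpperCaseService) UnicastRemoteObject.exportObject(impl, 0);
        registry.rebind("upper", stub);

        // A client would normally obtain the registry with LocateRegistry.getRegistry(host, port).
        UpperCaseService remote = (UpperCaseService) registry.lookup("upper");
        System.out.println(remote.toUpper("fred"));

        // Unexport so the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```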

reasons to escape the call-return style

• no result needs to be returned

• a server may not be available at the time of the request

• make the client more responsive to other events/user

• allow any component to initiate communication

[figure: RPC/RMI, where C calls S and S returns]

some middleware push the envelope

• dealing with errors: idempotent, at-least-once, at-most-once…

• the promised simplicity of procedure calling sometimes hinders more sophisticated solutions

[figure: RPC/RMI, where C calls S and S returns]

• when does it make sense?

• who is responsible for reissuing?
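One way to picture these error-handling options: a client that reissues a request until it gets a reply provides at-least-once delivery, which is safe only for idempotent operations. A server that remembers request IDs and replays the stored reply for duplicates gets at-most-once execution even for a non-idempotent operation such as a deposit. The sketch is in-memory only, and `AtMostOnceServer`, `RetryingClient`, and the simulated lost replies are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Server that executes each request ID at most once by caching replies. */
class AtMostOnceServer {
    private final Map<Long, Integer> repliesByRequestId = new HashMap<>();
    private int balance = 0;
    int executions = 0; // how many times the operation really ran

    /** Non-idempotent operation: running it twice would deposit twice. */
    int deposit(long requestId, int amount) {
        Integer cachedReply = repliesByRequestId.get(requestId);
        if (cachedReply != null) {
            return cachedReply; // duplicate request: replay the earlier reply, do not re-execute
        }
        balance += amount;
        executions++;
        repliesByRequestId.put(requestId, balance);
        return balance;
    }
}

class RetryingClient {
    /**
     * Reissues the same request (same ID) until a reply gets through.
     * repliesLost simulates how many replies vanish on the way back.
     */
    static int depositWithRetry(AtMostOnceServer server, long requestId, int amount,
                                int repliesLost) {
        while (true) {
            int reply = server.deposit(requestId, amount);
            if (repliesLost > 0) {
                repliesLost--; // the reply was lost, so the client times out and reissues
                continue;
            }
            return reply;
        }
    }

    public static void main(String[] args) {
        AtMostOnceServer server = new AtMostOnceServer();
        int balance = depositWithRetry(server, 1L, 10, 2); // request is sent three times
        System.out.println("balance " + balance + " after " + server.executions + " execution(s)");
    }
}
```

In this sketch the client is responsible for reissuing and the server for filtering duplicates; without the reply cache, the same retries would have deposited three times.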

when to use the call-return style

• the server is ready to process each request

• components and network are mostly reliable

• not many concurrent events in the caller: it is fine to block the caller

• one component (client) has the initiative, others (servers) wait for requests

[figure: RPC/RMI, where C calls S and S returns]