Electrical & Computer Engineering
Team 5: Virtual Online Blackjack
17-654: Analysis of Software Artifacts
18-841: Dependability Analysis of Middleware
Philip Bianco
John Robert
Vorachat Tamarree
Lutz Wrage
Gene Wilson
Team Members
Phil Bianco
John Robert
http://www.ece.cmu.edu/~ece841/team5/index.html
Vorachat Tamarree
vtamarree@yahoo.com
Lutz
mu.edu
Gene
Virtual Online Blackjack
An interactive client-server application that allows multiple players to play blackjack in a virtual casino.
The server performs all the functions of a dealer in a Las Vegas casino, including:
– Selling chips to the players
– Taking bets
– Dealing the initial hand to each player
– Presenting options to the player (Hit or Stay)
– Officiating the game
This is an interesting application because:
– Multiple server-side elements (Casino Floor, Bank, Table)
– Clear fault tolerance and performance requirements
– Completely Java solution
The application uses the Sun Microsystems IDL ORB for the following reasons:
– The price was certainly well within our budget
– The team wanted to play with CORBA
Baseline Architecture
Fault Tolerance Goals
Fault tolerance goals
– Client automatically connects to a backup casino server.
– New backup is automatically started.
– Minimize transient state and store all state in the database.
Replicated Components
– Casino Server (IDL interfaces for Casino Floor, Bank and Table)
All state is stored in a magnificently designed database.
– Installed MS SQL Server on a PC in the Cave
– Shared server with at least one other team
Sacred functions
– Database
– Naming Service
– Players (clients)
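The first goal, a client that reconnects to a backup server on its own, can be sketched as a proxy that retries a call on the next replica when the current one fails. This is a minimal sketch, not the team's code: `Bank`, `BankProxy` and `ServerUnavailableException` are hypothetical names, and a plain Java interface stands in for the CORBA stub. Retrying is assumed to be safe here because the servers keep their state in the database.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Sketch of a client-side proxy that fails over between casino server replicas. */
class FailoverProxySketch {

    /** Hypothetical stand-in for the remote Bank interface; not the team's IDL. */
    interface Bank {
        int buyChips(String player, int amount);
    }

    /** Hypothetical unchecked exception representing a communication failure. */
    static class ServerUnavailableException extends RuntimeException {
        ServerUnavailableException(String message) {
            super(message);
        }
    }

    /** Wraps several replicas and retries a call on the next one after a failure. */
    static class BankProxy implements Bank {
        private final Deque<Bank> replicas;

        BankProxy(List<Bank> replicas) {
            this.replicas = new ArrayDeque<>(replicas);
        }

        @Override
        public int buyChips(String player, int amount) {
            while (!replicas.isEmpty()) {
                Bank current = replicas.peekFirst();
                try {
                    return current.buyChips(player, amount);
                } catch (ServerUnavailableException e) {
                    // Drop the dead reference and try the next replica.
                    replicas.removeFirst();
                }
            }
            throw new IllegalStateException("no live replicas left");
        }

        int liveReplicas() {
            return replicas.size();
        }
    }

    public static void main(String[] args) {
        Bank dead = (player, amount) -> {
            throw new ServerUnavailableException("primary down");
        };
        Bank backup = (player, amount) -> amount;
        BankProxy proxy = new BankProxy(List.of(dead, backup));
        System.out.println(proxy.buyChips("alice", 100)); // prints 100
        System.out.println(proxy.liveReplicas());         // prints 1
    }
}
```

The application code only sees `Bank`, so the retry logic stays out of the game logic. What to do once the list is empty (here: give up with an exception) is a design choice the sketch leaves open.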
Fault Tolerant Elements
Replication Manager
– Pings all Casino servers (once per second) to detect service faults
– Automatically starts new servers to maintain the number of Casino servers (2)
– User interface enables injecting faults (killing a server)
– Very configurable using configuration files
Proxy classes for all communication
– Isolate most fault tolerance functions from the application
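The Replication Manager's monitoring loop described above can be sketched as follows. This is a rough illustration under assumptions: `Replica`, `checkOnce` and the `launcher` supplier are hypothetical names, and starting a server is reduced to calling a supplier rather than launching a process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch of a replication manager that pings replicas and restores the target count. */
class ReplicationManagerSketch {

    /** Hypothetical handle on one casino server replica. */
    interface Replica {
        boolean ping(); // true if the server answered
    }

    private final List<Replica> replicas = new ArrayList<>();
    private final Supplier<Replica> launcher; // hypothetical: starts a new server
    private final int targetCount;

    ReplicationManagerSketch(int targetCount, Supplier<Replica> launcher) {
        this.targetCount = targetCount;
        this.launcher = launcher;
    }

    /** One monitoring round: drop dead replicas, then start replacements. Returns how many were started. */
    synchronized int checkOnce() {
        replicas.removeIf(r -> !isAlive(r));
        int started = 0;
        while (replicas.size() < targetCount) {
            replicas.add(launcher.get());
            started++;
        }
        return started;
    }

    private static boolean isAlive(Replica replica) {
        try {
            return replica.ping();
        } catch (RuntimeException e) {
            return false; // a failed ping counts as a dead server
        }
    }

    synchronized int replicaCount() {
        return replicas.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Replicas that always answer; a real launcher would start a server process.
        ReplicationManagerSketch manager = new ReplicationManagerSketch(2, () -> () -> true);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(manager::checkOnce, 0, 1, TimeUnit.SECONDS); // one ping round per second
        Thread.sleep(1500);
        timer.shutdownNow();
        System.out.println("replicas running: " + manager.replicaCount());
    }
}
```

Injecting a fault then amounts to making one replica's `ping` fail; the next round notices and starts a replacement.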
FT-Baseline Architecture
FT-Baseline Architecture
FT-Baseline Architecture
Mechanisms for Fail-Over
How do you accomplish fail-over?
How do you detect a fault?
Which exceptions do you handle (mention the names)?
What do you do upon catching one of these exceptions?
When do you obtain the names of the server references?
What if you run out of live references?
Local method call
Local methods, fault free, standard garbage collection
Local Method call with failover to remote server
Fault Free Remote method call with standard GC
Remote Method call with failover to local server
Remote with server running IGC Fault Free
Remote with client and server running IGC
Timing of failover and activation time during failover
Active Replication Timing Data
[Figure: Active Replication Timing Data. x-axis: number of buyChips calls (1 to 2251); y-axis: time in ms (0 to 80); series: buyChips call; annotations: "Othello up", "Go up".]
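Timing data of this kind (one latency value per call) could be collected with a small harness like the one below. It is a sketch only: `timeCalls` and `indexOfSlowest` are hypothetical helper names, and a trivial local computation stands in for the remote buyChips call.

```java
import java.util.function.IntConsumer;

/** Sketch of a harness that records the latency of each call in milliseconds. */
class CallTimingSketch {

    /** Invokes the call the given number of times and returns one latency per invocation. */
    static double[] timeCalls(int calls, IntConsumer call) {
        double[] millis = new double[calls];
        for (int i = 0; i < calls; i++) {
            long start = System.nanoTime();
            call.accept(i);
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    /** Returns the index of the largest latency, e.g. to locate a fail-over spike. */
    static int indexOfSlowest(double[] millis) {
        int slowest = 0;
        for (int i = 1; i < millis.length; i++) {
            if (millis[i] > millis[slowest]) {
                slowest = i;
            }
        }
        return slowest;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would invoke the remote buyChips operation here.
        double[] millis = timeCalls(2251, i -> Math.sqrt(i));
        int slowest = indexOfSlowest(millis);
        System.out.printf("slowest call: #%d at %.3f ms%n", slowest + 1, millis[slowest]);
    }
}
```

Writing the array out as one row per call gives the series plotted in a chart like the one above.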
Fail-Over Measurements
Show your graphs, one by one, over the next few slides
– Place one graph per slide
– Select at most one graph (out of the entire set that you have)
– Pick the most interesting graph
  • Showing at least 15-20 fail-overs
  • Showing the “spike” of the Naming Service (or Replication Manager) communication
RT-FT-Baseline Architecture
Should describe your strategy for reducing the fail-over time, in the interests of obtaining “real-time” bounded behavior under faults
Bounded “Real-Time” Fail-Over Measurements
Show your RT-FT-baseline graphs
– Select at most one graph (out of the entire set that you have)
– Pick the most interesting graph
  • Showing at least 15-20 fail-overs
  • Showing the “spike” of the Naming Service (or Replication Manager) communication being mitigated
  • Include on the slide the percentage by which you’ve reduced the “spike”
  • Tell us what the bounds for the fail-over now are
RT-FT-Performance Strategy
Used an active replication strategy to address performance
– The proxy classes handle everything.
– During player startup the AR proxies get references to all running replicas.
– Each method call is sent to all replicas. The sequence numbers for one call are identical across replicas.
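A minimal sketch of such an active replication proxy follows. The names (`ActiveBankProxy`, `BankReplica`, `Ledger`) are hypothetical. The sketch calls the replicas one after another for simplicity, and it assumes that the shared sequence number is what lets the common data store apply each call only once; the slides do not spell out either detail.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of active replication: every call goes to all replicas with one sequence number. */
class ActiveReplicationSketch {

    /** Hypothetical shared store standing in for the database; applies each sequence number once. */
    static class Ledger {
        private final Set<Long> applied = new HashSet<>();
        private int chipsSold = 0;

        synchronized void recordSale(long sequenceNumber, int amount) {
            if (applied.add(sequenceNumber)) { // add returns false for a duplicate
                chipsSold += amount;
            }
        }

        synchronized int chipsSold() {
            return chipsSold;
        }
    }

    /** Hypothetical replica interface; every operation carries the caller's sequence number. */
    interface BankReplica {
        int buyChips(long sequenceNumber, String player, int amount);
    }

    /** Sends each call to all replicas and returns the first successful reply. */
    static class ActiveBankProxy {
        private final List<BankReplica> replicas;
        private long nextSequenceNumber = 0;

        ActiveBankProxy(List<BankReplica> replicas) {
            this.replicas = List.copyOf(replicas);
        }

        synchronized int buyChips(String player, int amount) {
            long sequenceNumber = nextSequenceNumber++; // same number for every replica
            Integer firstReply = null;
            for (BankReplica replica : replicas) {
                try {
                    int reply = replica.buyChips(sequenceNumber, player, amount);
                    if (firstReply == null) {
                        firstReply = reply;
                    }
                } catch (RuntimeException e) {
                    // A failed replica is skipped; the remaining replicas mask the fault.
                }
            }
            if (firstReply == null) {
                throw new IllegalStateException("all replicas failed");
            }
            return firstReply;
        }
    }

    public static void main(String[] args) {
        Ledger ledger = new Ledger();
        BankReplica healthy = (seq, player, amount) -> {
            ledger.recordSale(seq, amount);
            return amount;
        };
        BankReplica crashed = (seq, player, amount) -> {
            throw new RuntimeException("replica down");
        };
        ActiveBankProxy proxy = new ActiveBankProxy(List.of(crashed, healthy, healthy));
        System.out.println(proxy.buyChips("alice", 100)); // prints 100
        System.out.println(ledger.chipsSold());           // prints 100, not 200
    }
}
```

Because a reply is available as long as any replica answers, no fail-over step sits on the call path; the cost is that every call does the work several times.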
What mechanisms did you need in addition to what your system has?
Performance Measurements
Show your performance graphs for active replication or load balancing
– Select at most one graph (out of the entire set that you have)
– Pick the most interesting graph
  • For load balancing, show the system performance under several clients (try to scale up to more than 20 clients)
  • For active replication, show what the fail-over times vs. run-time performance trade-offs are, as compared to cold passive replication
Other Features
List other features that you used
– Used CVS throughout the project
– Moved to ant (from makefiles) after baseline
– Some use of a scripting language for automated players (clients)
– Gene’s tool to find current usage of cluster machines
Explored garbage collection
– Incremental GC
– Turned off
How to design for testability (fault injection)
This is where you get to show off about how you’ve gone the extra mile in this project!
– Performance at different times of the day
– Extensive use of configuration files enabled greater flexibility for implementation and testing
Insights from Measurements
What insights did you gain from the three sets of measurements, and from analyzing the data?
– Java garbage collection is the dominant factor for performance
  • Time is double for remote clients.
  • Changing garbage collection impacts the performance by ???
– Replication tradeoffs
– In the tested configuration, network latency impact on performance is negligible compared to the database access time.
– How did you use each set of insights in the next phases of the project?
Open Issues
List any issues that you still need to resolve, and that you might want to see discussed openly
– Profiler testing with Java?
– Impact of other JVMs?
If you had the time, what are the 2-3 additional features that you would have liked to have implemented for your system?
– Improve the user interface
– Examine impact of security requirements on FT, RT and performance
Analyzing why the performance edges up over time
Conclusions
Lessons Learned:
Technical Lessons:
– Fault tolerance rapidly increases the complexity of the system.
– Active replication is non-trivial.
– In our Java application, garbage collection was the largest performance bottleneck.
– Name server lookups contributed only a very minor amount of delay to failover recovery compared to state recovery from the database.
Configuration Issues for Remote Development:
– We did most of our development independently and remotely.
– Linux, CVS, ssh and AFS all made this much easier.
– Simple scripts can make life much easier: clusterload, project.env.
– E-mail, instant messaging and shared file system space help communication.
– Separate databases for each developer.
– Design still needs face-to-face meetings to be effective.
Conclusions
Accomplishments:
– Met objective to create a distributed middleware application and demonstrated improvements at each milestone.
– Our Fault-Tolerant design worked very well: fast, scalable, robust.
– Replication Manager with fault injection, powerful user interface.
– Player manager could start and control many clients on multiple hosts at once.
– Active Replication.
– Name caching.
– Automatic player script with Expect.
– Found the sources of the worst performance bottlenecks.
– 1.3 babies
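The name caching item can be illustrated with a small lookup cache. This is only a guess at the general idea, not the team's implementation: `NameCacheSketch`, `lookup`, `invalidate` and the `resolver` function are hypothetical, and the resolver stands in for a real Naming Service call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Sketch of a cache that remembers resolved server references by name. */
class NameCacheSketch<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> resolver; // hypothetical: performs the real Naming Service lookup

    NameCacheSketch(Function<String, T> resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached reference, resolving it only on the first request. */
    T lookup(String name) {
        return cache.computeIfAbsent(name, resolver);
    }

    /** Forgets a reference, for example after a call on it has failed. */
    void invalidate(String name) {
        cache.remove(name);
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        NameCacheSketch<String> names = new NameCacheSketch<>(name -> {
            lookups.incrementAndGet();
            return "ref:" + name;
        });
        names.lookup("CasinoServer1");
        names.lookup("CasinoServer1");
        System.out.println(lookups.get()); // prints 1: the second lookup hit the cache
    }
}
```

A fail-over path would call `invalidate` for the failed server so that the next `lookup` goes back to the Naming Service.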
Conclusions
What would you do differently, if you could start the project from scratch now?
– Focus on state management earlier in the development.
– Design for active replication much earlier.
– Using the clients as callback servers is a very bad idea; it makes Active Replication and/or load balancing hard to impossible.
– Have root access to the development machines.
– Better configuration management, keeping better track of milestone versions.
– Better test plans.
– A little more structure to our team:
  • Better meeting scheduling.
  • Agendas for meetings.
  • Minutes for meetings.
  • Maybe a team leader, possibly as a rotating assignment (e.g. 3 weeks/person)