Prepared by
Craig Taylor ([email protected])
Christopher Russo ([email protected])
Presentation to 2002 OSISoft Users Conference
Large PI System Redundancy, Performance
and Security Strategies
Agenda
California ISO System Overview
PI-UDS Hardware Cutover
PI-UDS Network Monitoring Tool
Large PI-UDS Primary Design Goals
Tips/Potential Problems for Large Systems
Discussion
California ISO Factoids
Territories covered:
  Pacific Gas and Electric
  Southern California Edison
  Comision Federal de Electricidad
Covers 124,000 square miles
21,000 circuit miles of transmission
Approximately 600+ generators
45,000 Megawatt summer peak load
$23 billion energy consumed annually
PI System Cutover
Energy Management System Cutover: December 2001
From ABB Spider to ABB Ranger
Ranger system improved reliability
Included Universal Data Server hardware upgrades
150,000+ points
  Currently world’s largest single system
PI System Hardware Changes
Component        Spider System                RANGER System
System           Compaq 6400R                 Compaq DL 580
CPUs             4 x 500 MHz - 512 Kbyte      4 x 700 MHz - 1 Mbyte
Memory           1 Gigabyte                   1 Gigabyte
Controller       Smart Array 221 Controller   Built into Mother Board
Outside Network  100 Megabit Full Duplex      100 Megabit Full Duplex
Backup Network   FDDI                         FDDI
Disk Storage     EMC Symmetrix 3000           Compaq HSG80 Controller
                 300 Gigabytes                Brocade Silkworm 2800
                                              300 Gigabytes
PI-UDS Issues Description and Containment
ISO experienced client disconnects
Potential service denial causes:
  Large DataLink data queries tied up PI Archiving Subsystem
  Network suspected in some cases
  Disk subsystem suspected
Question was: How do we identify and fix the cause?
We decided to get more network use information and identify PI “hogs”
We wrote a program to organize PICONFIG data in a web page
Monitoring UDS Use with PICONFIG
Visual Basic program ran PICONFIG Command (every 5 minutes)
PICONFIG commands:
@login UDS_SERVER,piadmin,password,5450
@mode list
@table pinetmgrstats
@ostru ID,ConStatus,ConTime,ConType,MsgRecv,MsgSent,Name,PeerAddress,PeerName,PID,RecvErrors,SendErrors,BytesRecv,BytesSent
@select ID=*
@ends
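The polling step could be sketched in Python roughly as follows (the original tool was written in Visual Basic, and its code is not shown in the slides). This sketch only assembles the command script listed above. The `run_piconfig` helper rests on assumptions: that an executable named `piconfig` is on the PATH and that it reads commands from standard input. The function names and the `COLUMNS` constant are hypothetical.

```python
import subprocess

# Column order taken from the @ostru line on the slide.
COLUMNS = [
    "ID", "ConStatus", "ConTime", "ConType", "MsgRecv", "MsgSent", "Name",
    "PeerAddress", "PeerName", "PID", "RecvErrors", "SendErrors",
    "BytesRecv", "BytesSent",
]


def build_script(server, user, password, port=5450):
    """Return the PICONFIG command script from the slide as one string."""
    lines = [
        f"@login {server},{user},{password},{port}",
        "@mode list",
        "@table pinetmgrstats",
        "@ostru " + ",".join(COLUMNS),
        "@select ID=*",
        "@ends",
    ]
    return "\n".join(lines) + "\n"


def run_piconfig(script, exe="piconfig"):
    """Feed the script to piconfig on stdin and return its stdout.

    Assumption: `exe` is on PATH and accepts commands on standard input.
    """
    result = subprocess.run(
        [exe], input=script, capture_output=True, text=True, check=True
    )
    return result.stdout
```

A scheduler (or a simple loop with a 300-second sleep) would call `run_piconfig(build_script(...))` to match the 5-minute interval mentioned above.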
Monitoring UDS Use with PICONFIG
PICONFIG Output:
1948,[0] Success,20-Feb-02 07:46:24,PI-API connection,224.,224.,pideE,IP.IP.IP.IP,PeerName,-1,0.,0.,8620.,42437
56,[0] Success,13-Feb-02 10:56:06,PI-API connection,107.,107.,pideE, IP.IP.IP.IP, PeerName,-1,0.,0.,4752.,19628
3000,[0] Success,4-Mar-02 05:55:29,PI-API connection,30.,30.,pideE, IP.IP.IP.IP, PeerName,-1,0.,0.,1600.,1081.
2727,[0] Success,4-Mar-02 02:07:11,PI-API connection,1282.,1282.,pideE, IP.IP.IP.IP, PeerName,-1,0.,0.,1.0027E+005,2.614E+005
2932,[0] Success,26-Feb-02 10:40:58,PI-API connection,1.0431E+005,1.0432E+005,pideE, IP.IP.IP.IP, PeerName,-1,0.,0.,4.2131E+005,2.5783E+006
3202,[0] Success,4-Mar-02 08:29:19,PI-API connection,15054,42290,pideE, IP.IP.IP.IP, PeerName,-1,0.,0.,7.0155E+005,1.2982E+008
8625,[0] Success,1-Mar-02 13:54:18,PI-API connection,79446,79446,PIPeE, IP.IP.IP.IP, PeerName,-1,0.,0.,4.1275E+007,1.934E+007
The important information is:
  PeerName (the person’s computer)
  Connection Time
  BytesRecv and BytesSent (data amount transferred)
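As an illustration of how such output might be organized, here is a minimal Python sketch that parses the comma-separated rows and ranks connections by total bytes transferred. It is not the authors' program. The function names are hypothetical, and the column order is assumed to match the @ostru list in the PICONFIG commands.

```python
import csv
from io import StringIO

# Assumed to match the @ostru column order used when the data was exported.
COLUMNS = [
    "ID", "ConStatus", "ConTime", "ConType", "MsgRecv", "MsgSent", "Name",
    "PeerAddress", "PeerName", "PID", "RecvErrors", "SendErrors",
    "BytesRecv", "BytesSent",
]
NUMERIC = ("MsgRecv", "MsgSent", "BytesRecv", "BytesSent")


def parse_rows(text):
    """Turn PICONFIG list output into dictionaries, skipping malformed lines."""
    rows = []
    for fields in csv.reader(StringIO(text), skipinitialspace=True):
        if len(fields) != len(COLUMNS):
            continue  # blank line or unexpected column count
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        for key in NUMERIC:
            row[key] = float(row[key])  # handles "224." and "7.0155E+005"
        rows.append(row)
    return rows


def top_talkers(rows, n=3):
    """Return the n connections with the most bytes sent plus received."""
    return sorted(
        rows, key=lambda r: r["BytesRecv"] + r["BytesSent"], reverse=True
    )[:n]


# Three rows copied from the output above.
SAMPLE = """\
1948,[0] Success,20-Feb-02 07:46:24,PI-API connection,224.,224.,pideE,IP.IP.IP.IP,PeerName,-1,0.,0.,8620.,42437
3202,[0] Success,4-Mar-02 08:29:19,PI-API connection,15054,42290,pideE, IP.IP.IP.IP, PeerName,-1,0.,0.,7.0155E+005,1.2982E+008
8625,[0] Success,1-Mar-02 13:54:18,PI-API connection,79446,79446,PIPeE, IP.IP.IP.IP, PeerName,-1,0.,0.,4.1275E+007,1.934E+007
"""

if __name__ == "__main__":
    for row in top_talkers(parse_rows(SAMPLE)):
        print(row["ID"], row["PeerName"], row["ConTime"],
              row["BytesRecv"] + row["BytesSent"])
```

On these three rows, connection 3202 ranks first because its BytesSent value (1.2982E+008) dominates the totals.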
Monitoring UDS Use with PICONFIG
PICONFIG Output:
pinetmgrstats  Value
ID             1948
ConStatus      [0] Success
ConTime        4-Mar-02 02:07:11
ConType        PI-API connection
MsgRecv        1282
MsgSent        1282
Name           pideE
PeerAddress    IP.IP.IP.IP
PeerName       PeerName
RecvErrors     -1
SendErrors     0.
BytesRecv      1.00E+05
BytesSent      2.61E+05
Monitoring UDS Use with PICONFIG
[Figure: list of connections shown as IP address and peer name pairs]
NOTE: IP addresses and peer names removed for security
Monitoring Tool Benefits
Enabled quick PI user “hog” identification
Helped to identify and optimize abusive queries
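One way to turn periodic snapshots into "hog" identification is to compare consecutive polls. The sketch below assumes that BytesRecv and BytesSent are running totals since the connection time, so the difference between two polls approximates traffic during the interval. That reading of the counters, the function names, and the numbers in the example are all assumptions for illustration.

```python
def byte_deltas(previous, current):
    """Bytes moved per connection between two polls.

    Each argument maps a connection ID to (ConTime, total_bytes), where
    total_bytes is BytesRecv + BytesSent at the time of the poll.
    """
    deltas = {}
    for conn_id, (con_time, total) in current.items():
        before = previous.get(conn_id)
        if before is None or before[0] != con_time:
            # New connection (or a reused ID): count everything seen so far.
            deltas[conn_id] = total
        else:
            deltas[conn_id] = max(total - before[1], 0.0)
    return deltas


def hogs(deltas, threshold_bytes):
    """Connection IDs at or above the threshold, heaviest first."""
    heavy = [cid for cid, moved in deltas.items() if moved >= threshold_bytes]
    return sorted(heavy, key=lambda cid: deltas[cid], reverse=True)


if __name__ == "__main__":
    # Hypothetical totals for two polls taken 5 minutes apart.
    earlier = {"3202": ("4-Mar-02 08:29:19", 1.0e8)}
    later = {"3202": ("4-Mar-02 08:29:19", 1.3e8),
             "3000": ("4-Mar-02 05:55:29", 2681.0)}
    print(hogs(byte_deltas(earlier, later), threshold_bytes=1e6))
```

Comparing ConTime as well as ID guards against an ID being reused by a different connection between polls.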
Christopher Russo
Russo & Associates
www.russoandassociates.com
Large PI-UDS Primary Design Goals
Tips/Potential Problems for Large Systems
Designing Large PI Systems: Primary Goals
Reliability & Robustness
  Avoiding single-element failure
Redundancy
  Clustering Solutions
    • Hardware (GeoCluster, Marathon, Legato, EMC)
    • Software (MSCS, Unix)
Performance
  Server tweaking
    • Archive parameters, bottlenecks
  Redundant Solutions
    • IP Load Balancing
    • Different Servers
    • PI to PI Distribution
Peak Performance Tips
Decide what you’re realistically going to need
Consider dedicated systems for specific applications
OS/Hardware Specific
  Network performance and latency
  Disk performance: disk striping, fiber-channel, dedicated hardware
  Processor, network subsystem
PI-Specific Parameters
  Archive Subsystem Tuning
    • Archive cache record tuning
  Update subsystem tuning
    • Tuning for real-time performance versus historical retrieval
    • Considerations for totalizer users
“Pre-digesting” of specific calculations
Large System Potential Problems
Performance monitoring only manages some bottlenecks
  Certain requests can still “nuke” the server
Loading and unloading records into memory is time-consuming
Shutdown times increase linearly with archive ratio size
Memory image cannot exceed 2 GB
Microsoft IP Load-Balancing doesn’t help
  PI Connections are “stateful” and are not all equal
Achieving “bumpless” transfer is difficult without hardware solutions
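The point that connections "are not all equal" can be illustrated with a toy model. The sketch below spreads connections evenly across servers by count and then sums the load each server ends up carrying. It is not a description of how Microsoft's load balancer assigns connections. The function name and the load figures are hypothetical.

```python
def spread_by_count(loads, n_servers):
    """Assign connections to servers in turn, ignoring how heavy each one is.

    Returns (connection_counts, total_loads) per server.
    """
    counts = [0] * n_servers
    totals = [0.0] * n_servers
    for i, load in enumerate(loads):
        server = i % n_servers
        counts[server] += 1
        totals[server] += load
    return counts, totals


if __name__ == "__main__":
    # Hypothetical per-connection loads: heavy queries alternate with light ones.
    loads = [1000.0, 1.0, 1000.0, 1.0, 1000.0, 1.0]
    counts, totals = spread_by_count(loads, n_servers=2)
    print("connections per server:", counts)
    print("load per server:", totals)
```

Both servers hold the same number of connections, yet one carries nearly all of the load. Because the connections are stateful, a heavy one cannot simply be moved to the idle server after it has been established.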
Archive Subsystem Bottlenecks
[Trend chart: Archive Cache, 9/14/2001 9:05:00 to 9/14/2001 9:20:00, 15.00 Min(s)]
Traces (legend labels truncated):
  Rate at which archive cach...
  Rate at which dirty archiv...
  Archive cache records in m...
  Archive cache disk reads
  Archive cache disk writes
  Archive cache memory hits
Our PI 4 Wish List
A true multi-threaded archive subsystem
A connection and request logging facility
  Not just who and how much, but what
A way to restrict expensive API queries or users
A point-database change-log feature
  The issue of “meta-data”
  Some current workarounds with scripts
  Implemented in PI 3.3
Questions & Discussion