Empirical Results from the Transformation of a large Commercial Technical Computing Environment
Dr. Rahul Razdan
Outline
• Context
• Requirements
• Analysis and Project Launch
• Solution Architecture
• Results
• Future Work
Context
• Industry: Electronics Design Automation ($4B):
– Design tools for semiconductor design ($250B)
– Industry Norms: complexity, release management, M&A
• Group: Hardware Virtual Modeling
– Features: Run-time Performance, Infrastructure for Virtual Experiments (Modeling, Testing, Coverage, Project Management)
– Four Sites, 250 developers, most with advanced degrees, 3 recent mergers
Compelling Reasons for Change
• Acquisitions and Growth Impact
• Globally-distributed software development teams
• Expanded Product Line with new capability
• Introduction of newly-supported platforms
• Not organized for growth:
– Internally – product focused versus infrastructure focused
– Externally – new verification languages, OS changes
Environment: Technical Computing
• Run-Time Performance
• Customer Responsiveness
• Innovation
• Multiple Platforms/Release Cycles
• Limited Skilled Developers
Outline
• Context
• Requirements
• Analysis and Project Launch
• Solution Architecture
• Results
• Future Work
SW Development Heartbeat
• Core SW Development:
– Build-Link-Validate Cycle
– Scaled for developers across geographies
– Scaled for multiple streams
• Accelerating time-to-market or increasing quality requires fundamentally restructuring this core process across the enterprise.
• Measure with metrics
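The Build-Link-Validate loop above can be sketched as a small driver that times each stage and stops at the first failure. This is a minimal illustrative Python sketch (the deck's environment standardized on Perl); the function and metric names are hypothetical, not the group's actual tooling.

```python
import time

def run_heartbeat(build, link, validate):
    """Run one Build-Link-Validate cycle and report per-stage timings.

    `build`, `link`, and `validate` are callables returning True on
    success; the stage names and metric keys are illustrative only.
    """
    metrics = {}
    for name, stage in (("build", build), ("link", link), ("validate", validate)):
        start = time.monotonic()
        ok = stage()
        metrics[name] = {"seconds": time.monotonic() - start, "ok": ok}
        if not ok:                      # stop the cycle at the first failure
            break
    return metrics

# Usage: stub stages standing in for real compile/link/test steps.
result = run_heartbeat(lambda: True, lambda: True, lambda: True)
print(sorted(result))                   # all three stages ran
```

Capturing per-stage timings like this is what makes "measure with metrics" possible across streams and geographies.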
[Diagram: multiple per-user (U) C–B–T (code/build/test) loops feeding a shared source (S) and a central B–T merge stream, with TriageBugs and CollectReqs closing the loop]
Enterprise-Wide Process Metrics
Optimizing All Aspects of Productivity
• Productivity:
– time to first test
– incremental time to create new tests
– coverage/day
– gates/functions verified per engineer-month
– time to derivative environments
• Predictability:
– total coverage
– coverage convergence rate
– bug convergence rate
– project resource & convergence stats
• to plan the next project better
• Quality:
– # respins
– # functional bugs ID'd in post silicon
– # field recalls
– breakout of hardware vs. software bugs
• Human Resource Utilization:
– % reuse of verification plans
– % reuse of verification environments
– % reuse of verification components
• Compute Resource Utilization:
– % of sims running 24x7
– cycles used for last 10% coverage
• Best Practices Deployment:
– Automation deployment level
• block, chip, system, project levels
– Verification maturity scale
• Directed testing
• Automated testing
• Coverage driven
• Scalable coverage driven
– Reuse
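The convergence metrics above can be computed directly from periodic samples. As an illustrative Python sketch (the sample data and function name are hypothetical, not figures from the deck), the coverage convergence rate is simply coverage gained per day:

```python
def coverage_convergence_rate(samples):
    """Average coverage gained per day from (day, coverage%) samples.

    A simple first-difference estimate between the first and last
    samples; a real project would fit a curve to predict closure.
    """
    (d0, c0), (dn, cn) = samples[0], samples[-1]
    return (cn - c0) / (dn - d0)

samples = [(1, 40.0), (5, 60.0), (9, 72.0)]   # illustrative data points
print(coverage_convergence_rate(samples))      # (72-40)/(9-1) = 4.0 %/day
```

The same first-difference shape applies to bug convergence rate, substituting open-bug counts for coverage.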
Observations
• Issues:
– Release Planning Ad-hoc
– Test infrastructure large, complex, and difficult to manage within the IT environment (e.g., performance)
– Coordination between sites very difficult
– Release management error-prone and high stress
– PV, CM, and PM not tier-one career paths
• Solutions: No commercial solutions available
Project Launch
• Organizational Decisions:
– Integration of PV, CM, and PM under GM (not popular)
– Develop a separately resourced project for infrastructure
• Project Launch:
– Pull two key architect-level individuals for this work (very unpopular)
– Resource the project appropriately to do the job (capital, services)…many skeptics in the finance organization
– Explicitly manage the process of change
Outline
• Context
• Requirements
• Analysis and Project Launch
• Solution Architecture
• Results
• Future Work
Environment Maturity Model
• Motivation (the why) drives downward from the upper layers to trigger change
• Implementation and process schema (the how) provide the foundation for the model
[Layer diagram of the maturity model: Business Processes, Development Processes, Development Environment, Infrastructure Solutions]
Foundations
• Central storage structure
• Common language (Perl)
• Core modules
– Command-line processing
– Messaging and logging
– Common parsing framework
– Site customization
– Object Data Definitions
– Platform classification
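A core module such as messaging/logging with site customization might look like the following. This is an illustrative Python sketch only (the environment's actual modules were in Perl); `make_logger`, the `overrides` dict, and the site names are hypothetical.

```python
import logging

def make_logger(site, overrides=None):
    """Create a project logger with per-site customization.

    `site` tags every record so multi-site logs stay attributable;
    `overrides` is a hypothetical dict of site-specific settings
    (here only the log level).
    """
    logger = logging.getLogger(f"infra.{site}")
    level = (overrides or {}).get("level", logging.INFO)
    logger.setLevel(level)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(f"[{site}] %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

# Usage: one site quiets routine messages via its override.
log = make_logger("site_a", {"level": logging.WARNING})
```

Centralizing conventions like this in shared modules is what keeps command-line processing, messaging, and parsing consistent across four sites.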
Infrastructure and Policies
• Infrastructure
– Fault-tolerant central storage
– Robust local network
– Controlled image configurations
– Dedicated servers
– Monitoring & Management
• Policies
– Managed growth
– Defined API for tools
– Resource management
Configuration Management
• Tools
– ClearCase
– MultiSite
• Resources
– Central VOB/View servers
– Central registry and licensing servers
• Methodology
– Branching and Merging
– Trigger conventions
Project/Variant
• Context for developer activity
– Policy-based control
– Standard build/install
• Managed data
– Dependency kits
– User environment
– Build components
– Project policy
– Configuration Management
– Testing environments
Server Farm
• Foundation
– Hardware
– Tools
– DRM (e.g., LSF)
• Services
– Meeting the user need
• Bridging the gap
– Management services
– BuildJob
– TestJob
– AutoControl sequencing
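The BuildJob/TestJob sequencing can be sketched as a tiny dependency-ordered scheduler. This is a minimal illustrative Python stand-in for AutoControl-style sequencing, not the actual service; the job names and callables are hypothetical.

```python
def run_sequence(jobs, deps):
    """Run jobs in dependency order (e.g., BuildJob before TestJob).

    `jobs` maps name -> callable; `deps` maps name -> prerequisite
    names. A tiny depth-first topological scheduler; no cycle
    detection, which a production sequencer would need.
    """
    done, order = set(), []

    def visit(name):
        if name in done:
            return
        for dep in deps.get(name, ()):
            visit(dep)                 # run prerequisites first
        jobs[name]()
        done.add(name)
        order.append(name)

    for name in jobs:
        visit(name)
    return order

# Usage: TestJob declares a dependency on BuildJob.
log = []
jobs = {"TestJob": lambda: log.append("test"),
        "BuildJob": lambda: log.append("build")}
order = run_sequence(jobs, {"TestJob": ["BuildJob"]})
print(order)   # ['BuildJob', 'TestJob']
```

In the real farm the callables would submit work to the DRM (e.g., LSF) rather than run inline.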
KitExchange
• Inter-project collaboration
– Software integration
– Distributed build support
– Development merge support
• Managed Data
– KitExchange meta-data
– Content depots
• Flexible Architecture
– Communication plugins
– Fall-back data sources
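The fall-back data-source behavior can be sketched as trying each content depot in order. This is an illustrative Python sketch; `fetch_kit`, the depot callables, and the kit name are hypothetical stand-ins for KitExchange's actual mechanics.

```python
def fetch_kit(kit_name, sources):
    """Fetch a kit, trying each content depot in order (fall-back).

    `sources` is a list of callables that return the kit payload or
    raise OSError when a depot is unreachable.
    """
    errors = []
    for source in sources:
        try:
            return source(kit_name)
        except OSError as exc:          # depot unreachable: try the next one
            errors.append(exc)
    raise LookupError(f"{kit_name}: all sources failed ({len(errors)} tried)")

def down(_name):
    raise OSError("primary depot unreachable")

# Usage: the primary depot is down, so the mirror serves the kit.
payload = fetch_kit("sim-kit", [down, lambda name: f"{name}@mirror"])
print(payload)   # sim-kit@mirror
```

Ordering sources by proximity is one way a flexible architecture keeps distributed builds running when a site link drops.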
Development Processes
• Coordination
– Merge schedules
– External dependency validation
• Quality
– Perpetual release readiness
• Release Engineering
– Decision criteria
– Unified Release
• Applied Governance
– Control and measurement
– Policies to address internal and external compliance
– Drive consistency and best practices
– Benefits-driven model
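"Perpetual release readiness" with explicit decision criteria can be sketched as a gate check over current metrics. This is an illustrative Python sketch; the metric names and thresholds are hypothetical, not the group's actual release criteria.

```python
def release_ready(metrics, criteria):
    """Check release readiness: every decision criterion must be met.

    `criteria` maps metric name -> minimum acceptable value; a metric
    missing from `metrics` counts as a failure.
    """
    failures = [name for name, minimum in criteria.items()
                if metrics.get(name, 0) < minimum]
    return (not failures, failures)

# Usage: pass rate misses its bar, so the stream is not releasable yet.
ok, failed = release_ready({"pass_rate": 0.97, "coverage": 0.88},
                           {"pass_rate": 0.99, "coverage": 0.85})
print(ok, failed)   # False ['pass_rate']
```

Running a check like this on every merge is what turns release engineering from a high-stress event into a continuous measurement.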
Business Processes
• Early Adopter engagements
• Requirements gathering
• Product release model
• Solutions integration
Results after 6 years
Before → After:
• Avg daily tests: 200,000 → 6.6 million [10.2 million peak]
• Projects: unknown → 349
• Sites: 1 → 7 [plus other satellite locations]
• R&D/PV engineers: 80 → 270
• Release streams: 3 → 7 [current and future]
• Server farm: ? cpu [? hosts] → 2187 cpu [1041 hosts]
Scaling capability while growing customer satisfaction!
And less chaos…
Outline
• Context
• Requirements
• Analysis and Project Launch
• Solution Architecture
• Results
• Future Work
SW Integration Efforts Outweigh Pure HW
Factors into productivity, quality, predictability risks
[Diagram: whole-product view of the solution stack (Application, Middleware, OS, Firmware, HW) under the SW and HW managers, with team ratios of 10 : 4 : 1]
The Desired Process
[Diagram: concurrent flow in the design phase; software (Design, Code, System Integration & SW Debug) runs in parallel with hardware (Design, Build, HW Integration & Debug) and chip (Design, Fab, Chip Debug)]
The Process Today
[Diagram: sequential flow in the design phase; chip (Design, Fab, Chip Debug, Respins) precedes hardware (Design, Build, Hardware Integration & Debug), which precedes software (Design, Code, System Integration & Debug)]
Cadence & IBM Joint Customer Solution View
[Diagram: lifecycle automation spanning Design to Silicon; System Level Design & Verification; Design & Verification Plan to Closure; HW Design, Verification, Implementation; SW Design, Debug, and Environment; System Validation, Logic Signoff; SoC Verification Methodology; Embedded Software; and System Wide Management. Roles shown: System Engineers, Verification Engineers, HW Design Engineers, System Validation Engineers, Embedded SW Developers, Exec & Project Manager]
Lessons Learned
• Treat as a whole system – cannot look at piece parts
• Swallow hard and make the decision to go for it – it must be central to the business
• Need to make the investment with the right focus
• Processes developed internally can open the door for an infrastructure element in the products delivered to your customers – leads to an opportunity for IBM and Cadence to partner further to deliver to the industry
• It’s not pie-in-the-sky. It works!
Questions
NC-Verilog P0 Activity on Previous Releases
Customer Support Summary
• Overall volume is decreasing and resolution time is improving.
– The total number of days that support cases were outstanding in August was just over a third of the June figure (6,400 vs. 17,000).
– Average resolution for calls not resulting in a PCR is down to 7.96 days; calls resulting in PCRs remain high at 57.2 days.
• Satisfaction survey results improved slightly in August.
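The "about a third" claim can be checked from the two figures quoted on the slide:

```python
# Outstanding case-days quoted on the slide.
june, august = 17_000, 6_400
print(round(august / june, 2))   # 0.38, i.e. roughly a third of June's backlog
```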
[Charts: "INCOMING YEAR TO YEAR VOLUME, SFV 2001-2002" (monthly call volume, Jun-01 through Aug-02, broken out by Other, SourceLink, Manual, and Phone, with an Avg Calls/Day line) and "SFV CUSTOMER SUPPORT SATISFACTION SURVEY" (Jan-02 through Aug-02, monthly breakdown of Poor <=33, Good 34-43, and Excellent >=44 responses, with an Average SRSI line)]