The Role of Java in High Energy Physics
SLAC Colloquium, March 8th 1999
Tony Johnson
Outline of Talk
• Introduction to Java
  – Features, Pros and Cons
• Applicability to HEP
  – GUIs and Event Displays
    • WIRED, BaBar event display, CMS event display
  – Offline Simulation and Reconstruction
    • hep.lcd package for Linear Collider Detector studies
  – Data Analysis
    • Java Analysis Studio and related tools
• Conclusion
History of Java
• 1991: James Gosling at Sun creates the Java language (née Oak)
  – Targeted at consumer electronics: cable set-top boxes, VCRs, TVs, etc.
  – Goal was reliability, not speed
• 1994: HotJava web browser written (in Java)
  – Supports applets: downloadable programs that run inside a web browser
  – Java licensed by Netscape, Oracle, Microsoft and many others
• Huge hype surrounding the “Web programming language”
• 1997: Java 1.1 released with many standard libraries
  – Sun’s mantra becomes “Write Once, Run Anywhere”
  – Enthusiastically supported by all major hardware and many software vendors
  – Microsoft begins to have second thoughts
• 1998: Java 2 released, with even more standard libraries
  – Now a truly general-purpose language
  – Sun (and DOJ) sue Microsoft
  – Java returns to its roots? Jini
Java Architecture
• More than just a Web tool
  – Java is a fully functional, platform-independent, object-oriented language
  – Powerful set of machine-independent libraries, including a GUI library
• Totally Buzzword Compliant
  – Simple, Object-Oriented, Distributed, Dynamic, Robust, Secure, Architecture Neutral, Portable, High Performance, Multithreaded
• Interpreted?
[Diagram: Java source code → compiler → Java “bytecodes”; on Mac, Unix and PC the bytecodes are run by a bytecode interpreter or translated to machine code by a JIT compiler]
  – Compiled + interpreted
  – HotSpot may make Java faster than statically compiled languages
Java Features
• Simple
  – But not trivial… you need to read a book
• Syntax very close to C++
  – No backwards-compatibility issues
  – Some features of C++ that add undue complexity have been dropped
  – Good stepping stone to (or from) C++
• Clean and efficient object-oriented language
  – Language features guide the programmer toward reliable programming habits
• Robust
  – Extensive compile-time checking of code
  – Second level of run-time checking of code
  – Memory management done by the system, not by the programmer
  – No pointers to mess up (Java uses references rather than pointers)
  – The chance of a program running as designed, without time-consuming debugging, is greatly increased
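The following minimal sketch makes the robustness points concrete. It is not taken from the talk. An off-by-one loop that would silently read past the end of an array in C or C++ instead raises `ArrayIndexOutOfBoundsException` in Java. The array is also reclaimed by the garbage collector rather than freed by hand. The class and method names are illustrative only.

```java
// Minimal sketch: Java's run-time checks turn a classic C/C++ memory bug into
// an exception raised at the faulty access. Names here are illustrative only.
class BoundsCheckDemo {

    /** Sums the first n entries of data; n may be wrong on purpose. */
    static double sumFirst(double[] data, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += data[i]; // every array access is bounds-checked by the JVM
        }
        return sum;
    }

    /** Returns true if reading past the end of an array is caught. */
    static boolean overrunIsCaught() {
        double[] energies = new double[3]; // no free/delete: the garbage collector reclaims it
        try {
            sumFirst(energies, 4); // off-by-one: index 3 does not exist
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true; // the error surfaces at the bad access, not later
        }
    }

    public static void main(String[] args) {
        System.out.println("Overrun caught: " + overrunIsCaught());
    }
}
```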
Java Features (continued)
• Highly portable
  – Java works today on NT, Win95/98, Unix (including Linux), Mac, VMS
    • PersonalJava: Windows CE, Palm Pilot
  – Programs written in Java are very portable
    • Move to another platform and it just works
  – Care needed with AWT GUI components (obsolete) and web browsers
  – Lifetime of HEP experiments > OS lifetime
    • Lifetime of Java > lifetime of HEP experiment??
• Encourages true modularity
  – Build entire framework for HEP experiment in Java
  – Abstract away underlying systems (batch system, IO system, etc.)
Java Features (continued)
• Distributed
  – Built-in support for Internet protocols, URLs, HTTP, Remote Method Invocation, CORBA, database access, etc.
• Secure
  – Bytecode “verifier”, padded cell (cf. web browser)
• Multithreaded
  – Language has direct support for multithreading
• Dynamic
  – Libraries can change without recompiling the programs that use them
  – Code can be loaded and unloaded dynamically during program execution
  – Objects can be moved across the network (agents), or stored in databases and retrieved later
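The "dynamic" point can be illustrated with standard reflection. A program that knows only an interface can load an implementation by class name while it runs, for example a processor named in a configuration file. The sketch uses the real `Class.forName` and `getDeclaredConstructor().newInstance()` calls. `Processor`, `EventPrinter` and `DynamicLoadingDemo` are hypothetical names invented for this example. Unloading happens only when the defining class loader itself becomes unreachable, and the sketch does not show it.

```java
// Sketch of run-time class loading: the caller only knows the Processor
// interface; the implementation is chosen by name while the program runs.
// Processor and EventPrinter are hypothetical names, not from hep.lcd.
interface Processor {
    String process(int eventNumber);
}

class EventPrinter implements Processor {
    public String process(int eventNumber) {
        return "processed event " + eventNumber;
    }
}

class DynamicLoadingDemo {

    /** Loads and instantiates a Processor given its fully qualified class name. */
    static Processor load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);               // loaded on demand
        Object instance = cls.getDeclaredConstructor().newInstance();
        return (Processor) instance;                            // checked cast
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // The class name would normally come from a configuration file or the user.
        String name = args.length > 0 ? args[0] : EventPrinter.class.getName();
        Processor p = load(name);
        System.out.println(p.process(1));
    }
}
```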
Java Libraries and APIs
• Standard libraries and APIs
  – 2D + 3D graphics + GUI (Swing) + Imaging + Printing
  – Database connectivity (JDBC) + ODMG
  – Collections, IO (Serialization), Data Compression
  – Networking, Sockets, SSL, CORBA, RMI
  – JavaBeans (components), Help
  – Multimedia, Sound, Speech
  – Security, Code Signing, Cryptography
  – Math, Arbitrary-Precision Math
  – Shared Data (Collaborative Applications)
• Huge “community-ware” software archive
  – IBM alone has hundreds of Java resources on its alphaWorks site
Java Tools
• Popularity of Java = many tools
  – And they are cheap (or even free)
  – Development environments (IDEs)
    • Editor, compiler, debugger, WYSIWYG GUI designer, source control
  – Automatic documentation generators
  – Memory and CPU optimizers
    • Since debugging time is minimal, you might actually have time to use them
  – Object modelers
• Many commercial sets of components
Java Limitations?
• No operator overloading
  – Annoying for complex numbers, matrices, 3- and 4-vectors
  – Perhaps more often abused than sensibly used
• Floating-point performance
  – Requirement for identical results on all platforms can be a problem
  – Being addressed by the Java Grande Forum + Sun
    • http://www.javagrande.org/
• Bugs sometimes slow to be fixed
  – Printing and imaging bugs have existed for more than a year
  – Perhaps the “Community Source License” will help
• Little control over memory allocation
• Integration with C++ could be better
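This minimal sketch shows what the missing operator overloading means in practice. 4-vector arithmetic has to be written as method calls, `a.plus(b).mass()` rather than C++'s `(a + b).mass()`. `FourVector` is a hypothetical class written for this example. It uses c = 1, with energy and momentum in the same units.

```java
// Sketch: without operator overloading, 4-vector arithmetic is written with
// method calls. FourVector is a hypothetical class, not a standard library type.
final class FourVector {
    final double e, px, py, pz; // energy and momentum components (c = 1)

    FourVector(double e, double px, double py, double pz) {
        this.e = e; this.px = px; this.py = py; this.pz = pz;
    }

    /** Returns this + other; in C++ this could be written a + b. */
    FourVector plus(FourVector o) {
        return new FourVector(e + o.e, px + o.px, py + o.py, pz + o.pz);
    }

    /** Invariant mass squared, E^2 - |p|^2. */
    double massSquared() {
        return e * e - (px * px + py * py + pz * pz);
    }

    /** Invariant mass; non-positive m^2 (e.g. from rounding) is treated as zero. */
    double mass() {
        double m2 = massSquared();
        return m2 > 0 ? Math.sqrt(m2) : 0.0;
    }
}

class FourVectorDemo {
    public static void main(String[] args) {
        // Two back-to-back massless particles of 45 GeV each.
        FourVector a = new FourVector(45.0, 0.0, 0.0, 45.0);
        FourVector b = new FourVector(45.0, 0.0, 0.0, -45.0);
        // C++ with overloading: (a + b).mass();  Java: a.plus(b).mass()
        System.out.println("Invariant mass = " + a.plus(b).mass());
    }
}
```

The cost is verbosity rather than capability. Immutable value classes like this one also avoid accidental aliasing of shared vectors.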
Where would HEP use Java?
[Diagram: candidate uses of Java in HEP, spanning offline and online]
• GUI systems for online + control (not really any alternative)
• Event display
• Reconstruction + simulation packages?
• Data analysis tasks
• Event generators
Event Displays
• WIRED
  – Experiment-independent event display framework developed at CERN by Mark Donszelmann
  – Good example of modular design and code reusability
  – Client-server model
    • Event display runs on desktop
    • Uses Swing and 2D API
    • Has rich library of projections (e.g. fisheye)
  – Being used by BaBar
    • Fetches data using CORBA
• CMS
  – Uses Java binding to Objectivity
  – Java 3D API
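WIRED's projection library is only named on this slide. The following is therefore a hedged sketch of the general idea, not WIRED code. Projections are interchangeable objects behind one interface. The fisheye mapping compresses large radii so that inner detail and the outer detector fit on one screen. `Projection` and `FisheyeProjection` are hypothetical names. The mapping r' = r / (1 + r/r0) is one simple choice assumed for illustration, and WIRED's actual formula may differ.

```java
// Sketch of a pluggable 2D projection for an event display.
// Projection and FisheyeProjection are hypothetical names; the mapping
// r' = r / (1 + r / r0) is an assumed example, not WIRED's formula.
interface Projection {
    /** Maps a point (x, y) in detector coordinates to display coordinates. */
    double[] project(double x, double y);
}

class FisheyeProjection implements Projection {
    private final double r0; // scale radius: points with r << r0 are nearly unchanged

    FisheyeProjection(double r0) {
        if (r0 <= 0) throw new IllegalArgumentException("r0 must be positive");
        this.r0 = r0;
    }

    @Override
    public double[] project(double x, double y) {
        double r = Math.hypot(x, y);
        if (r == 0.0) return new double[] {0.0, 0.0};
        double scale = 1.0 / (1.0 + r / r0); // shrinks large radii; all points land inside r0
        return new double[] {x * scale, y * scale};
    }
}

class FisheyeDemo {
    public static void main(String[] args) {
        Projection p = new FisheyeProjection(100.0);
        for (double r : new double[] {10.0, 100.0, 1000.0}) {
            double[] q = p.project(r, 0.0);
            System.out.println("r = " + r + " -> r' = " + q[0]);
        }
    }
}
```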
Java for Reconstruction/Simulation
• Dual goals:
  – Contribute to Linear Collider Detector/Physics studies
  – Experiment with using Java for a full offline reconstruction and analysis package
LC Detector Studies in the US
• Goals:
  – Detailed study of physics processes in a variety of possible LC detectors
    • Reference small and large detectors
  – Full simulation with GISMO
    • Switch to Geant4 when ready
  – Analysis using
    • PAW
    • C++ & Root
    • Java & JAS
  – Software requirements
    • Flexibly handle different detector geometries and technologies
    • Rapid development of a variety of reconstruction and analysis algorithms
Why Java for HEP Computing?
• Previous generation of experiments used Fortran + a data management system (Jazelle, Zebra, BOS)
• The data management system solves three problems:
  – Ability to represent complex data structures
  – Persistence (i.e. reading in and writing out complex structures)
  – Run-time access to named data in structures (for analysis)
• Time has marched on, and modern experiments use C++
  – C++ represents complex data structures natively
  – But persistence and run-time access to data still need a data management system, built (or bought and deployed), e.g. Root, Objectivity
• Java
  – Represents complex data structures
  – Persistence (serialization) and run-time access to data (reflection)
  – Support built into the language
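The two language features named above can be sketched with the standard library alone. `ObjectOutputStream` and `ObjectInputStream` give persistence of an object graph. `java.lang.reflect.Field` gives run-time access to data by name, the kind of lookup an analysis tool needs when a user types a cut on `pt`. The `Track` class and the helper names are hypothetical. A real experiment would write to files or a database rather than to a byte array.

```java
import java.io.*;
import java.lang.reflect.Field;

// Sketch: persistence via built-in serialization and run-time access to named
// data via reflection. Track is a hypothetical data class, not from any HEP library.
class Track implements Serializable {
    private static final long serialVersionUID = 1L;
    double pt;   // transverse momentum
    int charge;

    Track(double pt, int charge) { this.pt = pt; this.charge = charge; }
}

class PersistenceDemo {

    /** Writes an object graph to bytes and reads it back (a round trip through "storage"). */
    static Object roundTrip(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Looks up a field by name at run time, as an analysis tool might for a user's cut. */
    static Object getNamed(Object obj, String fieldName) throws ReflectiveOperationException {
        Field f = obj.getClass().getDeclaredField(fieldName);
        f.setAccessible(true); // fields are package-private here; allow access regardless
        return f.get(obj);     // primitive values come back boxed (Double, Integer)
    }

    public static void main(String[] args) throws Exception {
        Track copy = (Track) roundTrip(new Track(12.5, -1));
        System.out.println("pt = " + getNamed(copy, "pt") + ", charge = " + getNamed(copy, "charge"));
    }
}
```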
Java Package hep.lcd
• Reconstruction processors
  – Track finder written, track fitter in progress
  – Several clustering algorithms
• Parameterized MC processors
  – Can read generator input or GISMO output
  – Track and cluster smearing
• Analysis utilities
  – Event shape + thrust utilities
  – Jet finder (JADE, Durham)
  – Histogramming
• Event display
  – Simple 2D event display currently
  – Plan to use WIRED later
• Framework
  – Driver framework
    • Interactively control calling of processors and debugging/histogramming
  – Parameter (constant) access
    • Driven by detector geometry
  – MC event input (StdHEP format)
  – IO system based on Java IO
    • Random-access files
  – Can be run inside JAS or standalone
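The following is a minimal Durham (kT) clustering sketch that illustrates the JADE and Durham jet finders listed above. It is not the hep.lcd code. It uses the standard Durham measure y_ij = 2 min(E_i², E_j²)(1 - cos θ_ij) / E_vis². The closest pair is merged repeatedly by adding four-momenta (the E-scheme, an assumption here). Merging stops when every remaining pair has y_ij ≥ y_cut. Replacing 2 min(E_i², E_j²) by 2 E_i E_j gives the JADE measure. `DurhamJetFinder` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an exclusive e+e- jet finder with the Durham (kT) measure
//   y_ij = 2 min(E_i^2, E_j^2) (1 - cos theta_ij) / E_vis^2
// and E-scheme recombination. Illustration only; not the hep.lcd implementation.
class DurhamJetFinder {

    /** Each particle/jet is {E, px, py, pz}. Returns the jets left once all y_ij >= yCut. */
    static List<double[]> findJets(List<double[]> particles, double yCut) {
        List<double[]> jets = new ArrayList<>();
        double eVis = 0.0;
        for (double[] p : particles) {
            jets.add(p.clone()); // copy so the caller's particles are not modified
            eVis += p[0];
        }
        double eVis2 = eVis * eVis;

        while (jets.size() > 1) {
            int bestI = -1, bestJ = -1;
            double bestY = Double.MAX_VALUE;
            for (int i = 0; i < jets.size(); i++) {
                for (int j = i + 1; j < jets.size(); j++) {
                    double y = durhamY(jets.get(i), jets.get(j), eVis2);
                    if (y < bestY) { bestY = y; bestI = i; bestJ = j; }
                }
            }
            if (bestY >= yCut) break; // every pair is resolved: stop merging
            double[] a = jets.get(bestI), b = jets.get(bestJ);
            double[] merged = {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
            jets.remove(bestJ);       // bestJ > bestI, so bestI stays valid
            jets.set(bestI, merged);
        }
        return jets;
    }

    /** Durham distance between two objects, given the squared visible energy. */
    static double durhamY(double[] a, double[] b, double eVis2) {
        double pa = Math.sqrt(a[1] * a[1] + a[2] * a[2] + a[3] * a[3]);
        double pb = Math.sqrt(b[1] * b[1] + b[2] * b[2] + b[3] * b[3]);
        // A zero-momentum object is treated as collinear (y = 0), so it is merged first.
        double cos = (pa > 0 && pb > 0)
                ? (a[1] * b[1] + a[2] * b[2] + a[3] * b[3]) / (pa * pb) : 1.0;
        double minE2 = Math.min(a[0] * a[0], b[0] * b[0]);
        return 2.0 * minE2 * (1.0 - cos) / eVis2;
    }

    public static void main(String[] args) {
        List<double[]> particles = new ArrayList<>();
        particles.add(new double[] {10, 0, 0, 10});
        particles.add(new double[] {10, 0, 0, -10});
        particles.add(new double[] {10, 0, 0, 10});
        System.out.println(findJets(particles, 0.01).size() + " jets");
    }
}
```

The pair search makes this O(n³) overall. That is fine for a few hundred particles. Faster implementations cache the pair distances between iterations.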
Is Java Fast Enough for HEP Offline?
• Current performance (266 MHz Pentium II, JDK 1.1.7)
  – Clustering: 0.6 s/event (13.5 million calorimeter cells)
  – Fast MC: 6 ms/event
  – Track finding: ~2 s/event
  – Very competitive with Root implementation
• Getting even better!!!
  – JDK 1.2
  – HotSpot: run-time optimization
• In real life may be faster than C++ (cf. BaBar)
  – Better, cheaper analysis tools
  – Manageable complexity
Java for Reconstruction/Simulation
• Looks very promising
  – Have been able to develop framework very fast
  – People have no problem learning and using it
  – Performance looks good
• Future
  – Standard Java interface to Geant4?
  – The real difficulty in porting offline code from platform to platform is not the core reconstruction code, but the framework
    • Batch system, scripting languages, setup
  – Java could also be used for this
    • Abstract interfaces to batch system, etc.
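The following minimal sketch shows what an abstract batch-system interface could look like. Reconstruction code submits work through `BatchSystem` and never refers to a particular queueing system. Moving to a new site then means writing one new implementation. `BatchSystem`, `Job`, `InProcessBatch` and the job-id format are all hypothetical. A real implementation would hand the job to the site's batch system instead of running it in-process.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an abstract batch-system interface. All names here
// (BatchSystem, Job, InProcessBatch) are hypothetical.
interface BatchSystem {
    /** Submits a job and returns an identifier for it. */
    String submit(Job job);
}

/** A named unit of work, e.g. reconstructing one input file. */
final class Job {
    final String name;
    final Runnable work;

    Job(String name, Runnable work) { this.name = name; this.work = work; }
}

/** Trivial implementation that runs each job immediately in the current JVM. */
class InProcessBatch implements BatchSystem {
    private final List<String> log = new ArrayList<>();
    private int counter = 0;

    @Override
    public String submit(Job job) {
        String id = "local-" + (++counter);
        job.work.run();                   // a site-specific class would queue it instead
        log.add(id + ": " + job.name);
        return id;
    }

    List<String> log() { return log; }
}

class BatchDemo {
    public static void main(String[] args) {
        BatchSystem batch = new InProcessBatch(); // swap in another implementation per site
        String id = batch.submit(new Job("reco-run-1", () -> System.out.println("reconstructing...")));
        System.out.println("submitted " + id);
    }
}
```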
What if I just want to use Java for data analysis?
Java Analysis Studio
• Experiment-independent analysis tools for High Energy Physics data
What is Java Analysis Studio?
• Set of experiment-independent analysis tools for event-oriented (High Energy Physics) data
  – Data access classes provide access to many common HEP data formats
  – Histogram accumulation + manipulation classes
  – Plot display classes
  – Lightweight framework for users to create physics analysis applications in Java
• Tools work alone, in combination, or within the Java Analysis Studio GUI, which gives:
  – Integrated editor and compiler
  – Efficient access to local and remote data
  – Extensibility via plug-ins, fitters, functions, etc.
Data Access Classes
• Currently supported
  – PAW n-tuples, Histoscope files, Hippo n-tuples, any SQL database, flat-file n-tuples, StdHep MC files, random-access Java files
  – Any data format via a user-supplied Data Interface Module (DIM)
• Experimenting with
  – Objectivity: Caltech (CMS), DESY
  – Root: see http://www-sldnt.slac.stanford.edu/tony/root
  – Direct access to MC generators
• Future
  – XML, CDF/HDF
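A Data Interface Module is the point where an unsupported format plugs in. The JAS interface itself is not described here, so the sketch below invents a small hypothetical `NTupleSource` interface. It implements that interface for a whitespace-separated flat-file n-tuple whose first line holds the column names. Both the interface and the file layout are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

// Sketch of a user-supplied data interface module for a flat-file n-tuple:
// the first line holds column names, each further line one row of numbers.
// NTupleSource and FlatFileNTuple are hypothetical names, not the JAS API.
interface NTupleSource {
    List<String> columnNames();
    /** Returns the next row, or null when the data are exhausted. */
    double[] nextRow() throws IOException;
}

class FlatFileNTuple implements NTupleSource {
    private final BufferedReader in;
    private final List<String> columns;

    FlatFileNTuple(Reader reader) throws IOException {
        this.in = new BufferedReader(reader);
        String header = in.readLine();
        if (header == null) throw new IOException("empty n-tuple file");
        this.columns = Arrays.asList(header.trim().split("\\s+"));
    }

    @Override
    public List<String> columnNames() { return columns; }

    @Override
    public double[] nextRow() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue; // skip blank lines
            String[] fields = line.split("\\s+");
            if (fields.length != columns.size())
                throw new IOException("expected " + columns.size() + " columns: " + line);
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) row[i] = Double.parseDouble(fields[i]);
            return row;
        }
        return null; // end of data
    }
}

class NTupleDemo {
    public static void main(String[] args) throws IOException {
        NTupleSource src = new FlatFileNTuple(new StringReader("px py\n1.5 2.0\n-3.0 4.0\n"));
        double[] row;
        while ((row = src.nextRow()) != null) {
            System.out.println(src.columnNames() + " = " + Arrays.toString(row));
        }
    }
}
```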
Data Access Classes (continued)
• Supports both n-tuples and structured data
  – n-tuples are fast and allow simplifications in the GUI
    • Simple interactive cuts
    • Simple plot generation
    • But n-tuples are ultimately limiting
  – Arbitrarily structured data provides maximum flexibility
    • Requires slightly more work from the end user
    • Complete object-oriented analysis environment
    • Flexible enough to write (or prototype) reconstruction code
Data Access Classes (continued)
• Analyze local or remote data
  – User interface independent of data location
  – Does not assume a fast network (works well at 28.8 kbps)
  – Analysis code moves (transparently) to the data
[Diagram: desktop client with a DIM reading local data; network data server with a DIM reading remote data]
Remote Data Analysis
[Diagram: remote data analysis over a TCP/IP network. Components shown: GUI, data analysis engine, user's Java code, experiment interface, Java compiler + debugger, experiment extensions (event display), padded cell, C++ code, and data in Zebra, Jazelle, PAW, Root or Objectivity format]
Distributed Data Analysis
[Diagram: a desktop client connected to a network data server and network data controller, with the distributed data read by several data server DIMs]
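Distributed data analysis follows a simple pattern. The same analysis code runs next to each share of the data, and only a small result, here a partial histogram, is sent back and summed. The sketch imitates this inside one JVM, with threads from `java.util.concurrent` standing in for the network data servers. `DistributedFillDemo` and its methods are hypothetical names. Real servers would exchange results over the network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of distributed analysis: each "server" (a thread here) histograms its
// own share of the data, and only the partial histograms are merged.
// All names are hypothetical.
class DistributedFillDemo {

    /** Fills a fixed-width histogram of nBins over [lo, hi); out-of-range values are ignored. */
    static long[] fill(double[] values, int nBins, double lo, double hi) {
        long[] bins = new long[nBins];
        for (double v : values) {
            if (v < lo || v >= hi) continue;
            int bin = (int) ((v - lo) / (hi - lo) * nBins);
            bins[Math.min(bin, nBins - 1)]++; // guard against rounding at the upper edge
        }
        return bins;
    }

    /** Runs fill() on every data share in parallel and adds the partial results. */
    static long[] fillDistributed(List<double[]> shares, int nBins, double lo, double hi)
            throws Exception {
        ExecutorService servers = Executors.newFixedThreadPool(Math.max(1, shares.size()));
        try {
            List<Future<long[]>> partials = new ArrayList<>();
            for (double[] share : shares) {
                partials.add(servers.submit(() -> fill(share, nBins, lo, hi)));
            }
            long[] total = new long[nBins];
            for (Future<long[]> f : partials) {
                long[] part = f.get(); // waits for that "server" to finish
                for (int i = 0; i < nBins; i++) total[i] += part[i];
            }
            return total;
        } finally {
            servers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<double[]> shares = new ArrayList<>();
        shares.add(new double[] {0.5, 1.5});
        shares.add(new double[] {1.5, 2.5, 9.9});
        System.out.println(Arrays.toString(fillDistributed(shares, 10, 0.0, 10.0)));
    }
}
```

Histograms merge by simple addition of bin contents. That is why only they, and not the events, need to cross the network.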
Histogram Filling + Manipulation
• Histogram delegates binning to “partition classes” [idea stolen from LHC++], which handle:
  – Mapping from X, Y to bin number
    • Supports reals, integers, strings, dates, etc.
  – Calculation of contents and errors in each bin
    • Allows efficiency plots, mean/rms plots, etc.
  – Data storage method
    • Immediate binning (cf. HBOOK)
    • Delayed binning, allowing rebinning and axis changes via the GUI
  – Many standard partitions provided
  – Users can provide their own partition functions
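The partition idea can be sketched as an interface that a histogram delegates to, combined with delayed binning. Raw values are kept, so changing the partition rebins without refilling. `Partition`, `FixedWidthPartition` and `DelayedHistogram` are hypothetical names, not the JAS classes. Errors are taken as √N for unweighted entries.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a histogram that delegates binning to a pluggable partition and
// keeps the raw values (delayed binning), so it can be rebinned later.
// Partition, FixedWidthPartition and DelayedHistogram are hypothetical names.
interface Partition {
    int binCount();
    /** Returns the bin for x, or -1 if x is outside the partition. */
    int binOf(double x);
}

class FixedWidthPartition implements Partition {
    private final int n;
    private final double lo, hi;

    FixedWidthPartition(int n, double lo, double hi) {
        if (n <= 0 || !(hi > lo)) throw new IllegalArgumentException("bad binning");
        this.n = n; this.lo = lo; this.hi = hi;
    }

    public int binCount() { return n; }

    public int binOf(double x) {
        if (x < lo || x >= hi) return -1;
        return Math.min((int) ((x - lo) / (hi - lo) * n), n - 1);
    }
}

class DelayedHistogram {
    private final List<Double> values = new ArrayList<>(); // raw data, kept for rebinning
    private Partition partition;

    DelayedHistogram(Partition partition) { this.partition = partition; }

    void fill(double x) { values.add(x); }

    /** Changing the partition (e.g. from a GUI) needs no refill: bins are recomputed on demand. */
    void setPartition(Partition p) { this.partition = p; }

    double[] contents() {
        double[] c = new double[partition.binCount()];
        for (double x : values) {
            int b = partition.binOf(x);
            if (b >= 0) c[b] += 1.0;
        }
        return c;
    }

    /** Poisson error on each unweighted bin: sqrt(N). */
    double[] errors() {
        double[] c = contents();
        for (int i = 0; i < c.length; i++) c[i] = Math.sqrt(c[i]);
        return c;
    }
}

class PartitionDemo {
    public static void main(String[] args) {
        DelayedHistogram h = new DelayedHistogram(new FixedWidthPartition(4, 0.0, 4.0));
        for (double x : new double[] {0.1, 0.2, 1.5, 3.9}) h.fill(x);
        System.out.println(java.util.Arrays.toString(h.contents()));
        h.setPartition(new FixedWidthPartition(2, 0.0, 4.0)); // rebin without refilling
        System.out.println(java.util.Arrays.toString(h.contents()));
    }
}
```

Immediate binning, as in HBOOK, would store only the bin counts. That uses far less memory, but the data cannot be rebinned afterwards.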
Plot Display Package
• 1-D/2-D histogram/scatter plot display
  – Multiple axes, direct user interaction, overlays, fitting
JAS Availability
• 1.0 Beta currently available
  – Windows (NT, 95, 98) + Unix (Solaris + Linux)
  – Installed on Solaris at SLAC (/usr/local/bin/jas)
  – Limitations
    • Detailed documentation still under development
    • May still be some changes to user API
  – Download from: http://www-sldnt.slac.stanford.edu/jas
• 1.0 final release in April
  – More plot types
  – More flexible control of histograms
    • Ability to easily compare multiple datasets
  – More n-tuple handling tools (cf. HippoDraw)
  – Greatly improved printing
Collaboration
• CERN
  – WIRED (M. Donszelmann)
    • Integrate WIRED event display as a “plug-in”
  – LHC++
    • Will use JAS as an alternative GUI to Explorer
    • Will provide an interface to the Gemini fitter
• DESY, Caltech: Objectivity interface
• FNAL: interface to Histoscope, online control
More Info
• Java Analysis Studio: http://www-sldnt.slac.stanford.edu/jas
• Please give us feedback: [email protected]
• Mailing list: http://www.slac.stanford.edu/cgi-bin/lwgate/JAS-L/
• Also a general mailing list for Java in HEP: http://www.slac.stanford.edu/cgi-bin/lwgate/HEP-JAVA/
Acknowledgments
• JAS programming
  – Kevin Garwood, Jonas Gifford, Azhar Zuberi
• JAS ideas borrowed from
  – LHC++ (Yemi Adesanya), Histoscope (Paul LeBrun et al.), HippoDraw (Paul Kunz et al.)
• BaBar event display/CORBA/WIRED
  – Joe Perl, Serge Du, Mark Donszelmann
• CMS event display
  – Julian Bunn, Rick Wilkinson
• hep.lcd programming
  – Joanne Bogart, Mike Ronan, Gary Bower
• LCD GISMO/Root programming
  – Richard Dubois, Rob Shanks
Conclusions
• Java is a very useful language + environment that could be very beneficial to HEP in many areas
• Could Java be used for the entire offline system of a major experiment?
  – Technically: yes
  – Will Java survive long enough?
    • Need ISO standard
    • Need to see how market forces play out
• Programming in Java is fun!!
  – Spend time architecting an elegant solution to the problem to be solved
  – Not:
    • Reinventing the wheel
    • Debugging someone else’s problem
    • Porting to different platforms