2005 07 19 ivt integration techniques
http://www.amphora-research.com/
Integration Techniques for ELNs
• My background
• Why do we need to integrate ELNs?
• What kinds of integration do we need to do?
• What prerequisites are there?
• Some examples of technologies and techniques
• Summary
• You can download copies of this presentation from our web site
My background
• MEng in Information Systems Engineering
• First “ELN” was a consulting project for Kodak
  • Started in 1996
  • Completely electronic, fully integrated
  • Thousands of users, worldwide
• This grew into Amphora
• Merged with PatentPad in 2003
  • Paper or electronic records according to legal preference
  • Scientists still get an “Electronic” system
• Partner with a wide variety of “ELN” vendors
• Member of CENSA, working on long term records, serving on Steering Team
Experience
• Primarily in ELNs for discovery
  • Where patents are a major concern
  • I am sure some of this is relevant to regulated areas, but that’s not my focus
• Work a lot with other “ELN” vendors
  • Seldom do you buy one system
  • Which means we end up seeing a lot of integration!
• In a variety of industries, all sizes of deployment
  • Pharma
  • Biotech
  • Chemicals
• Customers around the world, offices in the US & the UK
What’s an ELN?
• The term “ELN” is now used to describe a wide variety of systems
  • Science specific
    • Reaction planning tools, Cheminformatics databases, structure drawing tools
    • Analysis packages, LIMS
    • Workflow tools
  • General
    • Knowledge/Document Management
    • Scientific data management
    • Laptop/Tablet computers
Observations
• The term “ELN”
  • Is so ambiguous it can mean almost anything (especially to a marketing person)
  • Doesn’t help us much from a systems architecture perspective
• A company is unlikely to have just one system that could be called an “ELN”
• Those ELNs will need to integrate with your existing & future systems
• Your needs will change with time, so you need to be able to protect your investment
  • In data
  • In tools
  • In processes
Deconstructing “ELN”
[Figure: “Broad” aspects (Security, Collaboration, Patent Protection etc.) and niche systems A, B, C, D]
• At first sight, a successful ELN project can look very complex
• ELN functionality can be split into two dimensions
  • Some aspects are common to everyone
  • Other requirements are specific to a particular group of scientists
• Splitting out the functionality into these dimensions really helps to keep you sane
Benefits
• The corporate functions (Legal, Records, etc.) can buy/provide a system that provides a service to the niche-specific systems
  • Meet corporate requirements for records etc.
  • Provide cross-discipline collaboration
• The individual niches can buy/find systems to support their specific needs
  • Leverage existing investments
  • Justified according to the benefits they bring
  • Removes any need to balance competing requirements
  • Reduce the need
• Systems can be acquired/purchased in a phased approach tailored to the needs & requirements of the business
• Life is a lot less stressful
Different levels of abstraction
[Figure: “Broad” aspects and niche systems A, B, C, D, with levels of abstraction: Projects, Experiments, Reports, Raw Data]
The “Experiment” is generally the boundary between Broad Vs Deep systems
Types of integration
[Figure: “Broad” aspects and niche systems A, B, C, D]
The Broad/Deep boundary is often exposed as network-level services, which are relatively standardized
Integration between different niche systems is generally custom
What prerequisites are there?
• From your ELN product(s)
  • Open Interfaces
  • Open Data
• Plumbing
  • Various technologies, some simple, some more complex
  • Expertise - often in-house, sometimes consultants
• Good news - the Open Source movement is really helpful
  • Tools & techniques
  • Drive for openness
• Remember: you need to ask your vendor for all of the “Open” stuff before you sign the order
Open Interfaces
• What’s an “Interface”?
  • Where one system “prods” another to do something
  • Or get some information out
  • Or put some information in
  • Generally some data is passed back & forth
• What’s “open”?
  • Something you can use without undue burden or barrier
  • This covers both commercial and technical aspects
  • Concerns are very similar to those involved with Open Data
Open Data
• This is currently a bit of a blind spot for purchasers of IT systems
• Unfortunately, Open Data is absolutely critical
  • For long term records
  • For your ability to build up an integrated system
  • To protect your IP (partly from a patent perspective, but mainly from a re-use aspect)
  • To maintain a balanced relationship with your vendors
• This absolutely needs to be part of the ELN purchasing process
“Good” (open) file formats
• Publicly documented
• Legally unencumbered
  • No patents, copyright concerns etc.
  • Any patents or copyright must be in the public domain
• Ideally, self documenting (XML is a good start)
• Degrade gracefully
  • If you can’t read the data, at least you can see a picture
• Based on more open, primitive formats where possible
• At least two implementations of readers, one of which is Open Source
• Widely used (W3C or IETF standards are good signs)
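As a rough illustration of "self documenting" and "degrade gracefully", here is a minimal Python sketch. The record layout, element names and file names are all invented for this example and are not taken from any real ELN product. The idea is that a record points at both its structured data and a plain picture, so a reader that does not understand the data format can still show something.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element and attribute names are assumptions for illustration.
RECORD = """<experiment id="EXP-0001">
  <title>Example reaction</title>
  <data format="text/xml" href="EXP-0001-data.xml"/>
  <picture format="image/png" href="EXP-0001.png"/>
</experiment>"""


def best_rendition(record_xml, known_formats=frozenset({"text/xml"})):
    """Return ("data", href) if we can read the data, else fall back to the picture."""
    root = ET.fromstring(record_xml)
    data = root.find("data")
    if data is not None and data.get("format") in known_formats:
        return ("data", data.get("href"))
    picture = root.find("picture")
    if picture is not None:
        return ("picture", picture.get("href"))  # graceful degradation
    return ("none", None)


print(best_rendition(RECORD))
print(best_rendition(RECORD, known_formats=frozenset()))
```

The second call simulates a future reader that no longer understands the data format and so falls back to the picture.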
Data formats for the long term
• Good
  • For text: Plain ASCII, Unicode, HTML, possibly RTF
  • For graphics: PNG, SVG
  • For structured data: XML
  • To preserve appearance: PDF
• Worry about
  • Storing files in databases
    • The database file format is probably undocumented
    • Store objects on the file system and use the database to point to them
  • Anything that is proprietary - there’s no excuse for it, and it dramatically increases your risk
  • Binary files generally
  • Mixing content in files (e.g. embedding XML in PDF)
  • Proprietary digital signatures
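The "store objects on the file system and use the database to point to them" advice can be sketched roughly as follows in Python. The table layout, the function names and the SHA-256 checksum column are assumptions made for this example (the checksum is an addition, not something the slide asks for); SQLite is used only because it ships with Python.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_record(db, store_dir, name, content):
    """Write the content to an ordinary file and index it in the database."""
    digest = hashlib.sha256(content).hexdigest()
    path = Path(store_dir) / name
    path.write_bytes(content)  # the record itself stays a plain file on disk
    db.execute(
        "INSERT INTO records (name, path, sha256) VALUES (?, ?, ?)",
        (name, str(path), digest),
    )
    db.commit()
    return path


def load_record(db, name):
    """Follow the database pointer to the file and check it is unchanged."""
    row = db.execute(
        "SELECT path, sha256 FROM records WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    content = Path(row[0]).read_bytes()
    if hashlib.sha256(content).hexdigest() != row[1]:
        raise ValueError("file no longer matches its recorded checksum")
    return content


store_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (name TEXT PRIMARY KEY, path TEXT, sha256 TEXT)")
store_record(db, store_dir, "experiment-0001.xml", b"<experiment id='0001'/>")
print(load_record(db, "experiment-0001.xml").decode("ascii"))
```

If the database is ever lost or becomes unreadable, the records are still there as ordinary files in an open format.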
IP concerns & data formats
• Companies have always used Proprietary Data Formats as a competitive weapon
• Companies are waking up to the use of IP tools (licenses, patents, copyrights) to reinforce their control over data formats
• Just because a format is published doesn’t mean it is open
  • The Microsoft Office XML formats are a particularly bad example
  • Right now it looks positively radioactive
  • They’re being very careful what they say which indicates to me they’re planning something
  • http://www.groklaw.net/article.php?story=20050330133833843
  • (see section: 4. Dissecting Microsoft’s “Patent License”)
Standards
• There are so many to choose from!
• Two key ways of generating “Standards”
  • De Facto - dominant supplier/format
  • De Jure - committee based
• Who gets to “bless” a standard?
• What makes a “good standard”?
  • De Jure process has difficulty keeping up with the real world
  • De Facto process has risk of lock-in
• Pragmatic approach
  • Expect your suppliers to use open file formats
  • If there is an acceptable standard, use it
  • Make sure you are using the right kind of format for each purpose
Technologies and techniques
• There is a wide variety of tools you can use to integrate IT systems
  • Tight Vs Loose coupling
  • Synchronous Vs Asynchronous
  • Text Vs Binary
  • Proprietary Vs Open
  • Simple Vs Complex
• As a rule
  • Loose is cheaper than Tight coupling
  • Asynchronous is easier to manage than Synchronous
  • Text is easier to work with, and more flexible than Binary
  • Open interfaces are always better than Proprietary
  • Simple approaches are better than Complex ones
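To make "loose, asynchronous, text" a little more concrete, here is a minimal Python sketch of two systems that exchange results through a shared drop folder of XML files. The folder, the element names and the function names are invented for this example. Neither side calls the other directly, and either side can be replaced as long as the file format stays the same.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def export_result(outbox, sample_id, value):
    """Producer side: drop one result into the shared folder as XML."""
    root = ET.Element("result", {"sample": sample_id})
    ET.SubElement(root, "value").text = str(value)
    tmp_path = Path(outbox) / (sample_id + ".xml.tmp")
    final_path = Path(outbox) / (sample_id + ".xml")
    ET.ElementTree(root).write(str(tmp_path), encoding="utf-8", xml_declaration=True)
    # Rename only once the file is complete, so a reader never sees half a file.
    tmp_path.replace(final_path)
    return final_path


def import_results(outbox):
    """Consumer side: pick up whatever has arrived, whenever it suits."""
    results = {}
    for path in sorted(Path(outbox).glob("*.xml")):
        root = ET.parse(str(path)).getroot()
        results[root.get("sample")] = float(root.findtext("value"))
        path.unlink()  # processed; remove it from the drop folder
    return results


outbox = tempfile.mkdtemp()
export_result(outbox, "S-001", 4.2)
print(import_results(outbox))
```

Because the exchange is plain text on disk, a stuck integration can be debugged by opening the files in any editor.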
Considerations when picking tools
• Use stable interfaces
  • Get a commitment from the vendor about what they’ll keep stable across version upgrades
• Use public, documented interfaces
• Sample code is really really useful
• Pick language-neutral interfaces where possible
• Platform-neutrality
  • Don’t worry (too much) about locking yourself into Windows on the client
  • But if you lock yourself to Windows on the server, it is going to hurt
Glue Languages
• There are a number of really useful “Glue” languages around
  • Python (and Jython, and other relatives)
  • Perl (although I have some concerns about maintainability)
  • Groovy, Beanshell, etc.
• All of them
  • Play well with XML, http, SOAP etc.
  • Play well with OLE
  • Are cross platform
• My personal preference is Python
  • You can learn it in a matter of hours
  • You can read other people’s code
  • It does everything I need it to do
Cool stuff
• SOAP/Web Services
  • Valuable in many areas
  • But don’t treat it as a religion
  • There are lighter alternatives which bring most of the benefits for much less effort
  • The whole WS-* effort seems to have got out of control
• REST (XML over http) - a lighter alternative to SOAP
• File swapping (generally, in XML)
• HTTP GET/POST
  • Wonderfully easy to debug!
  • Very flexible
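A minimal Python sketch of the "XML over http" style, using only the standard library. The host name eln.example.com, the /experiments/ path and the element names are placeholders invented for this example, not a real service. The request is built separately from sending it, so it can be inspected (or logged) before anything goes over the wire.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_submission(base_url, experiment_id, summary):
    """Build (but do not send) an HTTP POST carrying a small XML document."""
    root = ET.Element("experiment", {"id": experiment_id})
    ET.SubElement(root, "summary").text = summary
    body = ET.tostring(root, encoding="utf-8")
    return urllib.request.Request(
        base_url + "/experiments/" + experiment_id,  # hypothetical URL layout
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return (status code, response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


request = build_submission("http://eln.example.com", "EXP-0001", "Yield 82%")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The same request can be reproduced from a command line or a browser tool, which is a large part of why this style is so easy to debug.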
Nice things to see
• Integration points exposed as stable URLs
  • For example, in our PatentSafe product, we have committed to stable URL formats to
    • Submit a record via http (content & metadata)
    • Get a record for display to the user
  • These can be used by other systems
  • And also embedded in Word documents...
• Lack of wheel re-invention
  • e.g. LDAP is The One True place for user information
  • e.g. RSS/Atom is The One True alerting mechanism
• Example code
  • In multiple languages
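One way a client system might take advantage of stable URLs is to keep the URL formats in a single place, as in this small Python sketch. The server name and both URL layouts below are invented for illustration; they are not PatentSafe's actual formats, which are not given in these slides.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://records.example.com"  # hypothetical server


def record_url(record_id):
    """URL another system (or a link in a document) can use to display a record."""
    return BASE_URL + "/records/" + quote(record_id, safe="")


def submit_url(author, title):
    """URL to which content would be POSTed, with metadata in the query string."""
    return BASE_URL + "/submit?" + urlencode({"author": author, "title": title})


print(record_url("EXP 0001/a"))
print(submit_url("sjcoles", "Run 1"))
```

If the vendor keeps its side of the bargain and the formats never change, these two functions are the only integration code that has to know about them.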
Here be dragons
• OLE - sometimes it is unavoidable (e.g. UI stuff), but avoid it when you can
  • Tight coupling
  • Buggy
  • Proprietary
  • Reduces your platform options
  • File format issues are awful
  • Version-to-version compatibility is “interesting”
• Direct database access
  • Tight coupling
  • Difficult to guarantee system integrity
  • If you wrote both systems you might want to do this
Open Source
• Definitely one to watch
• Not the “Free” lunch you might think, but a pragmatic business tool
• Examples
  • Linux
  • Postgres
  • JBoss, Tomcat etc.
  • Ghostscript
• Open Source is part of everyone’s infrastructure
• Make sure you can run your systems on a variety of platforms
Why?
• Good for records
  • Gives you top-to-bottom control
• Good for TCO
  • We’re finding the Open Source infrastructure easier to set up and more reliable than proprietary alternatives
• Enables a better solution
  • Transparent systems mean you can do things the original designers didn't think of
  • This is especially important for ELNs
Other stuff to watch
• XML generally (what did we ever do without it)
• Jabber (as computer messaging and IM framework)
• Portals & Portlets
  • Especially JSR168, WSRP
  • Remember you may well want to portalize any useful application
• AJAX
  • Google is my hero
  • You can build usable, functional Web Applications
  • If you haven’t seen GMail I can send you an “invite”
• VMWare - virtualize your world
  • Wow
  • Great for server consolidation, great for testing, great for development
• Wikis
  • Beginning to turn into a lightweight application environment
Trends to watch
• File format nasties
• Closed/Private interfaces
  • Unlikely to be stable
• DMCA and other copyright legislation
Summary
• You’ll be assembling an “ELN System” from a series of components
  • Some you have, some you’ll build, some you’ll buy
• Get the open stuff before you sign the deal
  • Open, documented, stable interfaces
  • Open file formats
• Use open, loosely coupled approaches where possible
• If you can, keep the capability to own the integration issues in-house
Contact information
• Web site: http://www.amphora-research.com
• EMail: [email protected]
• Phone (US): (513) 697 4764
• Phone (UK): +44 (0)845 2300160 x2001
• AIM: [email protected]
• Skype: sjcoles