Advanced Excel Technologies in Early Development Applications

Brian Bissett, Molecular Properties Group, Pfizer Global Research & Development, Groton, CT

DESCRIPTION

Advanced Excel Technologies in Early Development Applications presented at ISLAR 2003. Introduction to Automated Data Analysis Using Excel.

TRANSCRIPT

Page 1: Advanced Excel Technologies In Early Development Applications

[email protected]

Advanced Excel Technologies in Early Development Applications

Brian Bissett

Molecular Properties Group

Pfizer Global Research & Development

Groton CT

Page 2: Advanced Excel Technologies In Early Development Applications

Presentation Overview

• Advantages/Disadvantages of Excel
  – The Good, the Inadequate, the Aborted.
• Example Applications
  – ELogD Assay
  – Solubility Assay
  – Stability Assay
• Proven Techniques
  – Developing Specifications
  – Windowing
  – 3 σ’s and you’re out.
  – DDE, SendKeys, OLE
• Demos & Wrap Up

Page 3: Advanced Excel Technologies In Early Development Applications

The Good

• Excel is Ubiquitous.
• Excel can read a wide variety of File Formats.
• Excel can write a wide variety of File Types (*.xls, *.csv, *.txt, etc.).
• Excel can communicate with other applications through both DDE (although no longer “officially” supported) and the more up-to-date OLE protocol.
• Excel has the VBA macro language built in and has the most comprehensive “toolkit” of properties and methods available in the commercial spreadsheet market.

Page 4: Advanced Excel Technologies In Early Development Applications

The Inadequate

• The “Object” model has “holes” in it, especially with regard to the autocomplete feature.
• Higher Math is a problem.
• It Crashes (a lot).
• Misbehavior (worse than Crashing): Properties and Methods which should be available suddenly no longer function. The Cure: Reboot Windows.

Page 5: Advanced Excel Technologies In Early Development Applications

The Aborted

Memory Leaks in Excel necessitate numerous reboots of the Windows Operating System when doing intensive development work.

An example of one of the many instances when Excel self aborted its operation due to memory issues present in the application.

Page 6: Advanced Excel Technologies In Early Development Applications

What Tasks should be Automated with Excel?

• Data
  – Analysis
  – Extraction
  – Parsing
  – Reporting
  – Uploading
• Reports
• Laboratory Notebook Entries

Page 7: Advanced Excel Technologies In Early Development Applications

Specifications – The Basics

• Inputs
  – Files
    • Data Files from Machines
    • Files with Information from In-house.
  – GUI
• Parameters to be Calculated
  – What needs to be calculated.
  – Where do I get the information required to calculate it?
• Outputs
  – Reports
  – Files
    • Database Upload
    • Laboratory Notebooks
    • Reports

Page 8: Advanced Excel Technologies In Early Development Applications

ELogD Automated Analysis Macro - Requirements

• Load a comma delimited data file from an Agilent HPLC.
• Sort the data in the file.
• Determine the largest peak in a data series.
• Extract the retention time corresponding to the largest peak.
• Perform some calculations (regression, formulas).
• Prepare a Report.
• Prepare an Uploadable file for the corporate database.
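
The first five requirements above can be sketched in a few lines. This is a Python illustration, not the VBA macro itself. The column names (peak, ret_time, area) and the sample text are hypothetical and do not reflect the actual Agilent export layout, and the hand-written least-squares fit merely stands in for whatever regression the macro performs.

```python
import csv
import io

# Hypothetical comma delimited layout; a real instrument export will differ.
SAMPLE_CSV = """peak,ret_time,area
3,4.45,202
1,1.24,1024
2,2.25,768
"""

def load_peaks(text):
    """Parse comma delimited text into a list of peak records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"peak": int(r["peak"]), "ret_time": float(r["ret_time"]), "area": float(r["area"])}
        for r in reader
    ]

def sort_by_retention_time(peaks):
    """Return the peaks ordered by ascending retention time."""
    return sorted(peaks, key=lambda p: p["ret_time"])

def largest_peak(peaks):
    """Return the record with the greatest area."""
    return max(peaks, key=lambda p: p["area"])

def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

peaks = sort_by_retention_time(load_peaks(SAMPLE_CSV))
top = largest_peak(peaks)
print(top["ret_time"], top["area"])
```

Report preparation and the database upload file are left out because their formats are site specific.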

Page 9: Advanced Excel Technologies In Early Development Applications

ELogD Automated Analysis Macro GUI

Page 10: Advanced Excel Technologies In Early Development Applications

A Sample ELogD Report – Columns Shown

Page 11: Advanced Excel Technologies In Early Development Applications

A Sample ELogD Report – Columns Hidden

Page 12: Advanced Excel Technologies In Early Development Applications

Kinetic Solubility Macro - Requirements

• Load in several comma delimited data files from a Labsystems Platereader.

• Load an Excel Spreadsheet which contains information from the Candidate Enhancement Group about the Compounds to be Assayed.

• Average successive well readings and remove “outlier” values.

• Determine wells where light scattering indicates the compound has come out of solution.
• Determine Corresponding Solubility.
• Prepare a Report.
• Prepare an Uploadable file for the corporate database.
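
One way the "scattering indicates precipitation" step could work is sketched below in Python. Everything specific here is an assumption made for illustration: the wells are taken to be ordered by ascending concentration, a single fixed threshold decides whether a reading indicates scattering, and the reported solubility is the highest concentration tested before the first well that exceeds it. The slides do not state the actual rule used by the macro.

```python
def kinetic_solubility(concentrations, readings, threshold):
    """Return the highest concentration that stayed in solution.

    concentrations: ascending concentrations, one per well (hypothetical units).
    readings: averaged scattering reading for each well.
    threshold: reading above which the compound is assumed to have precipitated.
    Returns None when even the lowest concentration scatters.
    """
    last_soluble = None
    for conc, reading in zip(concentrations, readings):
        if reading > threshold:
            return last_soluble
        last_soluble = conc
    return last_soluble

concs = [1, 3, 10, 30, 100]
reads = [0.010, 0.012, 0.011, 0.200, 0.500]
print(kinetic_solubility(concs, reads, threshold=0.05))
```

If no well exceeds the threshold, the function returns the highest concentration tested, which should be read as "at least this soluble" rather than as a measured limit.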

Page 13: Advanced Excel Technologies In Early Development Applications

Kinetic Solubility Assay Data Analysis GUI

Page 14: Advanced Excel Technologies In Early Development Applications

Kinetic Solubility Data Report Worksheet

Page 15: Advanced Excel Technologies In Early Development Applications

Stability Assay Macro - Requirements

• Load in several comma delimited data files from an ESA CoulArray instrument.

• Load an Excel Spreadsheet which contains information from the Candidate Enhancement Group about the Compounds to be Assayed.

• Check for the presence of a UV signal for each sample.
• Find Dominant Potentials (DP’s) and Potential Dominant Potentials (PDP’s).
• Assign a Rank in terms of stability.
• Prepare a Report.
• Prepare an Uploadable file for the corporate database.

Page 16: Advanced Excel Technologies In Early Development Applications

Stability Assay Graphical User Interface

Page 17: Advanced Excel Technologies In Early Development Applications

Stability Assay Data Report

Page 18: Advanced Excel Technologies In Early Development Applications

Developing Specifications

• Every Assay begins with an Idea.
• The Idea is tested to check its validity.
• If the Idea is feasible, it will go through a period of refinement until a process has been developed.

Page 19: Advanced Excel Technologies In Early Development Applications

Common Pitfall

The macro meets the designed specification but it does not extract the parameters the user wanted.

Page 20: Advanced Excel Technologies In Early Development Applications

The Basic Problem

“People by their nature tend to be flexible in interpreting data while algorithms tend to be very rigid (by design) in analyzing data.”

Page 21: Advanced Excel Technologies In Early Development Applications

Example

Agreed upon specification:

Extract the Area of the peak whose retention time is within ± 1.0% of the given retention time.

peak   Ret. Time   Area
1      1.24        1024
2      2.25        768
3      4.45        202

Further suppose the 2nd peak is the peak of interest, in this case with an ideal retention time of 2.25 and an area of 768.

Page 22: Advanced Excel Technologies In Early Development Applications

Rigid Window Limits

If the “given” retention time is above 2.2275 and below 2.2725, then the area of 768 will be extracted by the program.

Hence, a rigid window has been formed based on a value ± 1.0% of the ideal retention time.

limit   Ret. Time
lower   2.2275
upper   2.2725

peak   Ret. Time   Area
1      1.24        1024
2      2.25        768
3      4.45        202

Page 23: Advanced Excel Technologies In Early Development Applications

The Problem with Rigid Window Limits

But what happens when the given RT = 2.2265 or = 2.2735 ?

Since the RT falls outside the Rigid Window Limits, no Area will be extracted.

While this meets the specification, a scientist will invariably respond with the complaint on the next slide.

limit   Ret. Time
lower   2.2275
upper   2.2725

peak   Ret. Time   Area
1      1.24        1024
2      2.25        768
3      4.45        202
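
The rigid window can be reproduced with the numbers from the table. The Python sketch below is an illustration rather than the macro's VBA. It assumes the window is centered on each tabulated (ideal) retention time, which is what yields the 2.2275 and 2.2725 limits shown. The function names are hypothetical.

```python
# (peak, ideal retention time, area) rows from the slide's table.
PEAKS = [(1, 1.24, 1024), (2, 2.25, 768), (3, 4.45, 202)]

def window_limits(ideal_rt, pct):
    """Lower and upper limits of a window of +/- pct percent around ideal_rt."""
    delta = ideal_rt * pct / 100.0
    return ideal_rt - delta, ideal_rt + delta

def extract_area_rigid(peaks, given_rt, pct=1.0):
    """Return the area of the peak whose rigid window contains given_rt, else None."""
    for _, ideal_rt, area in peaks:
        lower, upper = window_limits(ideal_rt, pct)
        if lower < given_rt < upper:
            return area
    return None

print(window_limits(2.25, 1.0))            # approximately (2.2275, 2.2725)
print(extract_area_rigid(PEAKS, 2.25))     # 768
print(extract_area_rigid(PEAKS, 2.2265))   # None: just below the lower limit
print(extract_area_rigid(PEAKS, 2.2735))   # None: just above the upper limit
```

A miss of 0.001 on either side returns nothing, which is exactly what the specification asks for and exactly what the user will object to.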

Page 24: Advanced Excel Technologies In Early Development Applications

The Classic Complaints

“Your Macro doesn’t work. I would have extracted the area corresponding to Retention Time X.”

To the scientist it doesn’t matter that Retention Time X falls outside of the specified window; if that’s what he or she would have chosen, that’s what he or she expects to see.

Algorithms, however, don’t care what you want to see; they merely report that which falls within the given specifications or parameters.

Page 25: Advanced Excel Technologies In Early Development Applications

The Solution: Window Widening

• Rather than have a fixed window for a parameter to fall into, a field of ranges can be set up for a parameter to fall within. If the parameter falls within any of the ranges it will be picked up.

• An example of such ranges could be:
  – Ideal
  – High
  – Max

• The program would first scan for the parameter to be extracted within the “Ideal” range.

Page 26: Advanced Excel Technologies In Early Development Applications

The Solution: Window Widening

• If the parameter cannot be found within the “Ideal” range, the program begins searching for an appropriate parameter by Widening the Window.

• A maximum window size must be set as well as a delta (or increment) for the window to be widened on each successive pass of the search.

• Recursive calls are made to the searching subroutine, widening the window on each successive pass by the corresponding delta.

• The search is complete when a parameter is found to extract or the maximum window size is reached.

• The extracted parameter can be color coded in the report to reflect the Range from which it was extracted.
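
The recursive search described above can be sketched as follows, in Python rather than the macro's VBA. The starting width (1%), the delta (1%), and the maximum (10%) are assumed values chosen for illustration, and picking the closest peak when several fall inside the window is also an assumption. The function returns the window width at which the match was found so that a report could color code the result.

```python
# (peak, ideal retention time, area) rows from the earlier example table.
PEAKS = [(1, 1.24, 1024), (2, 2.25, 768), (3, 4.45, 202)]

def widening_search(peaks, given_rt, pct=1.0, delta_pct=1.0, max_pct=10.0):
    """Search for a peak matching given_rt, widening the window on each pass.

    Returns (area, pct_used) for the closest peak inside the current window,
    or None once the maximum window size has been searched without a match.
    """
    hits = [
        (abs(given_rt - ideal_rt), area)
        for _, ideal_rt, area in peaks
        if abs(given_rt - ideal_rt) <= ideal_rt * pct / 100.0
    ]
    if hits:
        return min(hits)[1], pct
    if pct >= max_pct:
        return None  # maximum window reached, nothing to extract
    # Recursive call: widen the window by delta, capped at the maximum.
    return widening_search(peaks, given_rt, min(pct + delta_pct, max_pct), delta_pct, max_pct)

print(widening_search(PEAKS, 2.25))     # (768, 1.0): found in the ideal window
print(widening_search(PEAKS, 2.2265))   # (768, 2.0): found after one widening
print(widening_search(PEAKS, 3.3))      # None: no peak within the maximum window
```

The recursion depth is bounded by (max_pct - pct) / delta_pct, so a loop would serve equally well; recursion is used here only because the slide describes it that way.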

Page 27: Advanced Excel Technologies In Early Development Applications

The Solution: Window Widening

limits       Ret. Time
low max      2.0250
low flag     2.1375
low ideal    2.2275
high ideal   2.2725
high flag    2.3625
high max     2.4750

[Figure: Window Range. Bands labeled i, h, and m mark the ideal, high, and max windows.]
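
The limits in the table correspond to ± 1%, ± 5%, and ± 10% of the ideal retention time of 2.25. Those percentages are inferred from the tabulated values, not stated on the slide. A short Python sketch under that assumption recomputes the limits and classifies a retention time by the narrowest range that contains it, which is the information a report would need for color coding. The names are hypothetical.

```python
IDEAL_RT = 2.25

# Range name and half-width in percent, narrowest first (inferred from the table).
RANGES = [("ideal", 1.0), ("flag", 5.0), ("max", 10.0)]

def limits(center, pct):
    """Lower and upper limits of a window of +/- pct percent around center."""
    return center * (1 - pct / 100.0), center * (1 + pct / 100.0)

def classify(rt, center=IDEAL_RT):
    """Return the name of the narrowest range containing rt, or None."""
    for name, pct in RANGES:
        low, high = limits(center, pct)
        if low <= rt <= high:
            return name
    return None

for name, pct in RANGES:
    low, high = limits(IDEAL_RT, pct)
    print(f"{name:<6} {low:.4f} {high:.4f}")

print(classify(2.25))   # ideal
print(classify(2.30))   # flag
print(classify(2.45))   # max
print(classify(2.60))   # None
```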

Page 28: Advanced Excel Technologies In Early Development Applications

3 σ’s and You’re Out.

Outlier Removal

When analyzing multiple data series it is best to remove outlier values, those more than 3 σ’s (standard deviations) from the mean. This is an especially useful tool when analyzing solubility data (scattering).
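
A minimal Python sketch of the 3 σ rule follows. It assumes a single pass using the sample standard deviation; whether the original macro iterates or uses the population standard deviation is not stated on the slide.

```python
from statistics import mean, stdev

def remove_outliers(values, n_sigma=3.0):
    """Drop values more than n_sigma sample standard deviations from the mean."""
    if len(values) < 2:
        return list(values)
    center = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - center) <= n_sigma * sigma]

readings = [1.0] * 19 + [50.0]
cleaned = remove_outliers(readings)
print(len(readings), len(cleaned))   # 20 19
print(mean(cleaned))                 # 1.0
```

One caveat with this form of the rule: with the sample standard deviation, no point in a set of n values can lie more than (n - 1)/√n standard deviations from the mean, so at least 11 readings are needed before anything can be rejected at 3 σ.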

Page 29: Advanced Excel Technologies In Early Development Applications

For the tasks Excel doesn’t Excel At.

• Excel, like Lotus, evolved as a “bean counting” application, not an application for scientific development.

• As Excel began to be utilized for scientific development, more and more add-in (or third party) applications became available to enhance Excel’s limited capabilities.

• Some of the better third party products can be found at these links:

http://www.octavian.com/excel.html

http://www.add-ins.com/assistnt.htm

http://j-walk.com/ss/

Page 30: Advanced Excel Technologies In Early Development Applications

Using DDE and OLE

• In addition to third party applications it is also possible to control another application remotely by Excel if it supports DDE or OLE automation.

• Utilizing a third party application is a must for tasks such as:

  – Curve Fitting
  – Generating “Nice” Plots and Graphs
  – “Higher” Math, FFT’s, Matrices, Complex Numbers
  – Statistical Functions such as ANOVA
  – Linear Programming

Page 31: Advanced Excel Technologies In Early Development Applications

DDE Example

Here is an example in which Excel uses the program Origin to curve fit some sample data.

http://www.originlab.com/

Page 32: Advanced Excel Technologies In Early Development Applications

DDE Curve Fit in Origin Initiated from Excel

http://www.originlab.com/

Page 33: Advanced Excel Technologies In Early Development Applications

More Information Available in My Textbook

http://www.crcpress.com/

http://www.pharmalabauto.com/