KNIME and Command Line Tools: The Best of Two Worlds
Man-Ling Lee
KNIME User Group Meeting 03/07/2013
M. Lee KNIME User Group Meeting 2013
Autocorrelator package: A. Gobbi, M. Lardy, M. Lee (http://code.google.com/p/autocorrelator/)
Aestel package: M. Lee, A. Gobbi (http://sourceforge.net/projects/aestel/)
Genentech package: A. Gobbi, J. Feng, M. Lee, B. Sellers
Genentech’s Command Line Tools
Data Manipulation: sdf2HtmlTab.csh, sdf2Tab.csh, sdfAggregator.csh, sdfAlign.csh, sdfBinning.csh, sdfDataPivoter.csh, sdfFromImage.py, sdfSdfMerger.csh, sdfSliceByRe.pl, sdfSorter.csh, sdfSplicer.csh, sdfTabMerger.csh, sdfTagTool.csh, tab2sdf.csh, tabTabMerger.pl
Chemical Structure: sdfMDLSSSMatcher.csh, sdfEnumerator.csh, sdfNormalizer.csh, sdfSmarsGrep.csh, sdfStructureTagger.csh, sdfSubRMSD.csh, sdfTransformer.csh
Properties: sdf2name.py, sdf2Omega.py, sdfCalcProps.py, sdfCNS_MPO.grvy, sdfGiniCalculator.csh, sdfRingSystemExtraction.csh, sdfSelectivityCalculator.csh
Diversity: sdfCFP.csh, sdfFingerprinter.csh, sdfCluster.pl, sdfFPCluster.pl, sdfFPNNFinder.csh, sdfMCSSNNFinder.csh, sdfFPSphereExclusion.csh, sdfMCSSSphereExclusion.csh
Database: AEREAExporter.csh, dataLoader.csh, sdfExport.pl, sdfSdfExporter.csh, sdfUserData.pl, tabExport.pl
Modeling: sdfRRandomForestCreator, sdfRSVMCreator, sdfRModelPredictor
Others: sdfGroovy.csh, sdfPPilot.pl, sdfSubRMSD.csh, sdfMultiplexer.pl
AEREA UI for Compilation of SQL Statements
Specify compounds of interest + display data of interest
[Screenshot of the AEREA query UI; visible labels: Components, Commercial Compounds]
AEREA UI for Compilation of SQL Statements (continued)
Table report with compounds and data of interest
[Screenshot of the table report]
Command Line Program for Retrieving Data from the Database
AEREAExporter.csh -uName manle \
-treeName smdiRes \
-hitlistLevel "Base Compound" \
-qName "demo_searchQuery" \
-rName "demo_reportTemplate" \
-out .sdf
☺ AEREA enables users to retrieve data from the database
☺ No need to know SQL or the data model
☺ The data export function can be called from the command line
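Because the export is an ordinary command line call, it can also be assembled from a script. The Python sketch below (an illustration, not part of AEREA) builds the same AEREAExporter.csh call from a dictionary of options and lets the standard shlex module quote values that contain spaces, such as "Base Compound". Straight quotes matter: typographic (curly) quotes would be passed to the tool literally. The helper name build_command is hypothetical, and the sketch assumes the remote csh handles POSIX-style single quoting the same way for these values.

```python
import shlex


def build_command(program, options):
    """Return a shell-quoted command line for `program`.

    `options` maps flag names (without the leading '-') to values.
    shlex.join quotes any value containing spaces, e.g. 'Base Compound',
    so the shell receives it as one argument. Assumption: the remote csh
    treats this POSIX-style single quoting the same way for these values.
    """
    argv = [program]
    for flag, value in options.items():
        argv.extend(["-" + flag, value])
    return shlex.join(argv)


# Same options as the AEREAExporter.csh example above.
cmd = build_command("AEREAExporter.csh", {
    "uName": "manle",
    "treeName": "smdiRes",
    "hitlistLevel": "Base Compound",
    "qName": "demo_searchQuery",
    "rName": "demo_reportTemplate",
    "out": ".sdf",
})
print(cmd)
# prints: AEREAExporter.csh -uName manle -treeName smdiRes -hitlistLevel 'Base Compound' -qName demo_searchQuery -rName demo_reportTemplate -out .sdf
```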
A Unix Pipe Example
( AEREAExporter.csh -out .sdf -hitlistLevel "Base Compound" -uName manle \
-qName "project cmpds" -rName "report" -treeName smdiRes \
;AEREAExporter.csh -out .sdf -hitlistLevel "Base Compound" -uName albertgo \
-qName "tested in assay" -rName "report" -treeName smdiRes \
) | sdfTagTool.csh -in .sdf -out .sdf -rmRepeatTag 'G-Number=1' \
| sdfTagTool.csh -in .sdf -out .sdf -rename rename_list.txt \
| sdfGiniCalculator.csh -in .sdf -out .sdf -idField "G-Number" -conc 1 \
| sdfStructureTagger.csh -in .sdf -out .sdf -smarts projectCore_SMARTS.tab \
-sets projectCores -tag_info "firstTag" \
| sdfSelectivityCalculator.csh -in .sdf -out .sdf \
-denominator "Target-1 Ki" -nominator "Target-2 Ki" \
-outputMode separate -selectivity "Target-1/Target-2" \
| sdfGroovy.csh -in .sdf -out .sdf -f calcLigandEfficiency.grvy \
| sdfTabMerger.csh -sdf .sdf -out .sdf \
-tab $userDump/PK_data.tab -mergeMode multiRecordKeepTemplate \
-mergeTag "G-Number" -mergeCol "G-Number" -quiet \
| sdfTagTool.csh -in .sdf -out .sdf -reorder reorder_list.txt \
> projectVortex_SAR.sdf
A Unix Pipe Example (continued)
…but it is not convenient to debug Unix pipes
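A common, KNIME-independent workaround is to save each intermediate SD stream with the standard Unix tee command, so a failing stage can be inspected without rerunning the whole pipe. The Python sketch below only rewrites the pipe text; the helper name add_tee_checkpoints and the stepN.sdf file names are hypothetical.

```python
def add_tee_checkpoints(stages, prefix="step"):
    """Join pipe stages with '|', inserting 'tee <prefix>N.sdf' after every
    stage except the last, so each intermediate SD stream is also saved."""
    parts = []
    for number, stage in enumerate(stages, start=1):
        parts.append(stage)
        if number < len(stages):
            parts.append(f"tee {prefix}{number}.sdf")
    return " | ".join(parts)


stages = [
    "sdfTagTool.csh -in .sdf -out .sdf -rename rename_list.txt",
    'sdfGiniCalculator.csh -in .sdf -out .sdf -idField "G-Number" -conc 1',
    "sdfTagTool.csh -in .sdf -out .sdf -reorder reorder_list.txt",
]
print(add_tee_checkpoints(stages))
```

Each stepN.sdf file then holds the records exactly as they left stage N, which is the same information the KNIME nodes later make visible interactively.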
2011: Imagine Using KNIME to Debug Unix Pipes
[Workflow mock-up with nodes: AEREAExporter (x2), sdfTagTool -rmRepeatTag, sdfGiniCalculator, Unix Script Compiler]
☺ Pass Unix command syntax through the pipe
☺ Compiler node writes out the Unix script
☺ Provide a node for inline execution via ssh, to be used for debugging, data checking, etc.
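As an illustration of what such a compiler node could emit, the Python sketch below assembles a csh script in the same shape as the pipe example: generator commands run in a subshell, their combined output flows through the processor commands, and the result is redirected to a file. The function compile_pipe and its parameters are hypothetical; this is not the actual node implementation.

```python
def compile_pipe(generators, processors, output_file):
    """Assemble a csh script shaped like the pipe example: generator commands
    run in a subshell, their combined stdout flows through each processor,
    and the final SD stream is redirected to output_file."""
    if not generators:
        raise ValueError("at least one generator command is required")
    if len(generators) == 1:
        source = generators[0]
    else:
        source = "( " + " ; ".join(generators) + " )"
    # Backslash-newline continues the command on the next line in csh.
    body = " \\\n  | ".join([source, *processors])
    return f"#!/bin/csh\n{body} \\\n  > {output_file}\n"


script = compile_pipe(
    generators=[
        'AEREAExporter.csh -out .sdf -hitlistLevel "Base Compound" -uName manle -qName "project cmpds" -rName "report" -treeName smdiRes',
        'AEREAExporter.csh -out .sdf -hitlistLevel "Base Compound" -uName albertgo -qName "tested in assay" -rName "report" -treeName smdiRes',
    ],
    processors=["sdfTagTool.csh -in .sdf -out .sdf -rename rename_list.txt"],
    output_file="projectVortex_SAR.sdf",
)
print(script)
```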
2012: Results of Three Days of Programming at KNIME.com
☺ Created 51 “Command Line” nodes
☺ “Command Line” ports handle
• Unix command text
• Data in SD file format
☺ “Command Line” node categories:
• Generator
• Processor
• Consumer
• Others
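A minimal model of how command text could accumulate along “Command Line” ports is sketched below in Python. It assumes, hypothetically, that Generators have only an output port, Processors have an input and an output port, and Consumers have only an input port, and it ignores the SD data that the real ports also carry. The consumer command "cat > projectVortex_SAR.sdf" is a placeholder built from the standard cat command. Each entry in the result corresponds to the Unix command compiled up to that node.

```python
# Hypothetical port layout per node category: (has input port, has output port).
PORTS = {
    "Generator": (False, True),
    "Processor": (True, True),
    "Consumer": (True, False),
}


def compile_chain(nodes):
    """Walk a linear chain of (category, command_text) nodes and return, for
    each node, the Unix command compiled up to and including that node."""
    upstream = None  # command text arriving on the current node's input port
    compiled_views = []
    for position, (category, text) in enumerate(nodes):
        has_input, has_output = PORTS[category]
        if has_input != (upstream is not None):
            raise ValueError(f"{category} node at position {position} cannot be connected here")
        compiled = text if upstream is None else f"{upstream} | {text}"
        compiled_views.append(compiled)
        # Only nodes with an output port hand the command text downstream.
        upstream = compiled if has_output else None
    return compiled_views


views = compile_chain([
    ("Generator", 'AEREAExporter.csh -out .sdf -hitlistLevel "Base Compound" -uName manle'),
    ("Processor", "sdfTagTool.csh -in .sdf -out .sdf -rename rename_list.txt"),
    ("Consumer", "cat > projectVortex_SAR.sdf"),
])
for view in views:
    print(view)
```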
Dynamic Command Line Node Configuration
XML file with the definitions of the command line programs
• Generate one node per <command> element
• Deduce ports from the <ports> element
• The node set is initialized during startup of KNIME/Eclipse
<commands>
  <config>
    <exchangeDir local='\\resfiles….' remote='/gnet/…'/>
    <ssh remoteHost='rescomp2' timeout='1000'
         initFileTemplate='~cdduser/bin/knimerc.$mode'/>
  </config>
  <command name='AEREAExporter.csh'>
    <IO out="-out .sdf"/>
    <default>-hitlistLevel 'Base Compound' -uName XXXX -rName 'XVortex Project Subst' ….</default>
    <ports out='sdf'/>
  </command>
  <command name='sdfGrep.pl'>
    <IO in="" out=""/>
    <default>-i GNum</default>
    <ports in='sdf' out='sdf'/>
  </command>
  …
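The node generation described on this slide can be sketched with Python's standard xml.etree.ElementTree: one definition per <command> element, with ports read from its <ports> element. The category rule (no input port means Generator, no output port means Consumer, and so on) is an assumption for illustration, and the embedded XML is a trimmed, hypothetical copy of the configuration shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the node configuration file.
CONFIG = """<commands>
  <config>
    <ssh remoteHost='rescomp2' timeout='1000'
         initFileTemplate='~cdduser/bin/knimerc.$mode'/>
  </config>
  <command name='AEREAExporter.csh'>
    <IO out="-out .sdf"/>
    <default>-hitlistLevel 'Base Compound' -uName XXXX</default>
    <ports out='sdf'/>
  </command>
  <command name='sdfGrep.pl'>
    <IO in="" out=""/>
    <default>-i GNum</default>
    <ports in='sdf' out='sdf'/>
  </command>
</commands>"""


def categorize(has_input, has_output):
    """Assumed mapping from port layout to node category."""
    if has_output and not has_input:
        return "Generator"
    if has_input and has_output:
        return "Processor"
    if has_input:
        return "Consumer"
    return "Other"


def node_definitions(xml_text):
    """Return one node definition (a dict) per <command> element."""
    root = ET.fromstring(xml_text)
    definitions = []
    for command in root.findall("command"):
        ports = command.find("ports")
        io = command.find("IO")
        default = command.find("default")
        in_port = ports.get("in") if ports is not None else None
        out_port = ports.get("out") if ports is not None else None
        definitions.append({
            "name": command.get("name"),
            "category": categorize(in_port is not None, out_port is not None),
            "in_port": in_port,
            "out_port": out_port,
            "io_out": io.get("out") if io is not None else None,
            "default_args": (default.text or "").strip() if default is not None else "",
        })
    return definitions


for definition in node_definitions(CONFIG):
    print(definition["name"], definition["category"], definition["default_args"])
```

Adding a new program then only means adding a <command> element; no node code changes, which is the point of the dynamic generation.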
Dynamic Command Line Node Generation
Configure the Command Line Node
Copy and paste the text from the Node Description window
Setting up the SSH Connection
• All Generator nodes additionally have the “SSH Connection” configuration tab
• Mode: switch between the production and development environments on the Unix host
• Remote Directory: if specified, “cd <directory>” is executed
• Execute in each node: enables a compile-only mode, in which the nodes only assemble the command text and the assembled command is executed once
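The Python sketch below shows one way these settings could be turned into an ssh invocation: the $mode placeholder in initFileTemplate is replaced by the selected mode, the init file is sourced, and the command runs after an optional cd. Sourcing the init file with csh's source command, the mode name "dev", the remote directory, and the helper name ssh_argv are all assumptions; the actual node behavior may differ.

```python
import shlex


def ssh_argv(remote_host, init_file_template, mode, command, remote_dir=None):
    """Build an argument list for running `command` on `remote_host` via ssh.

    Assumptions: `$mode` in the template is replaced by the selected mode,
    the resulting init file is read with csh's `source`, and `cd` runs only
    when a remote directory is given. `&&` stops the chain if a step fails.
    """
    init_file = init_file_template.replace("$mode", mode)
    # The init file path is left unquoted so that ~user still expands remotely.
    steps = [f"source {init_file}"]
    if remote_dir:
        steps.append(f"cd {shlex.quote(remote_dir)}")
    steps.append(command)
    return ["ssh", remote_host, " && ".join(steps)]


argv = ssh_argv(
    remote_host="rescomp2",
    init_file_template="~cdduser/bin/knimerc.$mode",
    mode="dev",                      # hypothetical mode name
    command="sdfTagTool.csh -in .sdf -out .sdf -rename rename_list.txt",
    remote_dir="/tmp/vortex_debug",  # hypothetical directory
)
print(argv)
```

The returned list could be passed to subprocess.run(argv, capture_output=True, text=True); that call is left out here because it needs a reachable host.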
The Different Views of a Command Line Node
Vertical listing of columns allows sorting by column name
Display of the “compiled” Unix command up to the current node
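In an SD file the “columns” are the data-item names. The Python sketch below collects them from the "> <name>" header lines of all records and sorts them case-insensitively, which is the kind of listing the column view offers; the two-record sample and its placeholder values are hypothetical.

```python
import re

# An SD data header line starts with '>' and names the item in angle brackets,
# e.g. "> <G-Number>" or ">  <Target-1 Ki>".
DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")

# Hypothetical two-record SD file with empty connection tables.
SAMPLE = """cmpd-a


  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <Target-2 Ki>
placeholder

> <G-Number>
placeholder

$$$$
cmpd-b


  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <Target-1 Ki>
placeholder

> <G-Number>
placeholder

$$$$
"""


def sorted_column_names(sdf_text):
    """Return the distinct data-item names found in an SD file,
    sorted case-insensitively, i.e. a vertical listing of its columns."""
    names = {match.group(1)
             for match in map(DATA_HEADER.match, sdf_text.splitlines())
             if match}
    return sorted(names, key=str.lower)


print(sorted_column_names(SAMPLE))
```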
Unix Pipe Compilation: Past, Present, and Future
Creating this workflow took about one hour
• Pipe command is copied manually into a csh script
• Future: automated insertion with KNIME
Generate Project Vortex Files
Computational chemists provide Vortex sessions to their projects. The Unix scripts are run on the cluster once a day using cron.
Take-Home Messages
☺ KNIME as a Unix pipe editor
• Facilitates the maintenance of Unix pipes
☺ Converter nodes allow integration with other KNIME nodes
☺ Dynamic KNIME node generation
• Rapid integration of new command line programs
Future Directions
• Assemble Unix scripts within the KNIME workbench
• Handle tab-delimited files
• Provide an easy way to run commands in parallel
• Convince software vendors to offer programs that
  • run from the command line
  • read data from stdin and write data to stdout
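A program that follows these two conventions can be chained into a Unix pipe or wrapped as a command line node. The Python sketch below is a hypothetical filter (add_tag.py is an illustrative file name, and the tag and value are placeholders) that streams an SD file from stdin to stdout and appends a fixed data item to every record before its $$$$ terminator.

```python
import sys


def add_tag(lines, tag, value):
    """Yield the lines of an SD file unchanged, inserting the data item
    <tag> = value into every record just before its '$$$$' terminator."""
    for line in lines:
        if line.rstrip("\r\n") == "$$$$":
            yield f"> <{tag}>\n"
            yield f"{value}\n"
            yield "\n"  # a blank line ends the data item
        yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_tag.py TAG VALUE < in.sdf > out.sdf
    # Records are streamed from stdin to stdout one line at a time.
    sys.stdout.writelines(add_tag(sys.stdin, sys.argv[1], sys.argv[2]))
```

Because it never loads the whole file, such a filter can sit anywhere in a pipe, e.g. between two sdfTagTool.csh calls.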
Acknowledgements
Command Line Nodes: Alberto Gobbi, Thomas Gabriel (KNIME.com), Bernd Wiswedel (KNIME.com)
Command Line Programs: JW Feng, Alberto Gobbi, Benjamin Sellers
Matthew Lardy (Takeda San Diego)
Chemical Computing Group, OpenEye & Schroedinger
Support: Jeff Blaney, Michael Berthold (KNIME.com), Slaton Lipscomb
CompChem/Cheminformatics Group