KNIME and Command Line Tools - The Best of Two Worlds

Man-Ling Lee

KNIME User Group Meeting 03/07/2013

M. Lee KNIME User Group Meeting 2013

Autocorrelator package A. Gobbi, M. Lardy, M. Lee http://code.google.com/p/autocorrelator/

Aestel package M. Lee, A. Gobbi http://sourceforge.net/projects/aestel/

Genentech package A. Gobbi, J. Feng, M. Lee, B. Sellers

Genentech’s Command Line Tools

Data Manipulation: sdf2HtmlTab.csh, sdf2Tab.csh, sdfAggregator.csh, sdfAlign.csh, sdfBinning.csh, sdfDataPivoter.csh, sdfFromImage.py, sdfSdfMerger.csh, sdfSliceByRe.pl, sdfSorter.csh, sdfSplicer.csh, sdfTabMerger.csh, sdfTagTool.csh, tab2sdf.csh, tabTabMerger.pl

Chemical Structure: sdfMDLSSSMatcher.csh, sdfEnumerator.csh, sdfNormalizer.csh, sdfSmarsGrep.csh, sdfStructureTagger.csh, sdfSubRMSD.csh, sdfTransformer.csh

Properties: sdf2name.py, sdf2Omega.py, sdfCalcProps.py, sdfCNS_MPO.grvy, sdfGiniCalculator.csh, sdfRingSystemExtraction.csh, sdfSelectivityCalculator.csh

Diversity: sdfCFP.csh, sdfFingerprinter.csh, sdfCluster.pl, sdfFPCluster.pl, sdfFPNNFinder.csh, sdfMCSSNNFinder.csh, sdfFPSphereExclusion.csh, sdfMCSSSphereExclusion.csh

Database: AEREAExporter.csh, dataLoader.csh, sdfExport.pl, sdfSdfExporter.csh, sdfUserData.pl, tabExport.pl

Modeling: sdfRRandomForestCreator, sdfRSVMCreator, sdfRModelPredictor

Other: sdfGroovy.csh, sdfPPilot.pl, sdfSubRMSD.csh, sdfMultiplexer.pl

AEREA UI for Compilation of SQL Statements

Specify compounds of interest + display data of interest

[Screenshot; visible labels: Components, Commercial Compounds]

AEREA UI for Compilation of SQL Statements

Table report with compounds and data of interest

Command Line Program for Retrieving Data from Database

AEREAExporter.csh -uName manle \
    -treeName smdiRes \
    -hitlistLevel "Base Compound" \
    -qName "demo_searchQuery" \
    -rName "demo_reportTemplate" \
    -out .sdf

☺ AEREA enables users to retrieve data from the database
☺ No need to know about SQL and data model
☺ The data export function can be called from command line
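
Because the export is an ordinary command line call, a wrapper script can assemble it. The following is a minimal Python sketch that uses only the flags shown on the slide. The helper name `build_aerea_command` is hypothetical, and the command is only printed, not executed.

```python
import shlex


def build_aerea_command(user, tree, hitlist_level, query, report, out=".sdf"):
    """Return the AEREAExporter.csh call as one shell-quoted string."""
    args = [
        "AEREAExporter.csh",
        "-uName", user,
        "-treeName", tree,
        "-hitlistLevel", hitlist_level,
        "-qName", query,
        "-rName", report,
        "-out", out,
    ]
    # shlex.join quotes arguments that contain spaces, such as "Base Compound".
    return shlex.join(args)


if __name__ == "__main__":
    print(build_aerea_command("manle", "smdiRes", "Base Compound",
                              "demo_searchQuery", "demo_reportTemplate"))
```

Quoting is left to `shlex.join` (Python 3.8+) so that values with spaces survive as single arguments.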

A Unix Pipe Example

( AEREAExporter.csh -out .sdf -hitlistLevel "Base Compound" -uName manle \
      -qName "project cmpds" -rName "report" -treeName smdiRes \
; AEREAExporter.csh -out .sdf -hitlistLevel "Base Compound" -uName albertgo \
      -qName "tested in assay" -rName "report" -treeName smdiRes \
) | sdfTagTool.csh -in .sdf -out .sdf -rmRepeatTag 'G-Number=1' \
  | sdfTagTool.csh -in .sdf -out .sdf -rename rename_list.txt \
  | sdfGiniCalculator.csh -in .sdf -out .sdf -idField "G-Number" -conc 1 \
  | sdfStructureTagger.csh -in .sdf -out .sdf -smarts projectCore_SMARTS.tab \
      -sets projectCores -tag_info "firstTag" \
  | sdfSelectivityCalculator.csh -in .sdf -out .sdf \
      -denominator "Target-1 Ki" -nominator "Target-2 Ki" \
      -outputMode separate -selectivity "Target-1/Target-2" \
  | sdfGroovy.csh -in .sdf -out .sdf -f calcLigandEfficiency.grvy \
  | sdfTabMerger.csh -sdf .sdf -out .sdf \
      -tab $userDump/PK_data.tab -mergeMode multiRecordKeepTemplate \
      -mergeTag "G-Number" -mergeCol "G-Number" -quiet \
  | sdfTagTool.csh -in .sdf -out .sdf -reorder reorder_list.txt \
  > projectVortex_SAR.sdf

…but Unix pipes are not convenient to debug.
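
One manual way to debug such a pipe is to run ever longer prefixes of it and inspect the intermediate SD output. The Python sketch below illustrates that idea. The stage strings are shortened from the example, and `pipe_prefixes` is a hypothetical helper, not part of the Genentech tools.

```python
def pipe_prefixes(stages):
    """Yield the pipe truncated after stage 1, 2, ..., n."""
    for end in range(1, len(stages) + 1):
        yield " | ".join(stages[:end])


if __name__ == "__main__":
    # Shortened stages taken from the pipe example (most flags omitted).
    stages = [
        "AEREAExporter.csh -out .sdf -uName manle",
        "sdfTagTool.csh -in .sdf -out .sdf -rmRepeatTag 'G-Number=1'",
        'sdfGiniCalculator.csh -in .sdf -out .sdf -idField "G-Number" -conc 1',
    ]
    for number, command in enumerate(pipe_prefixes(stages), start=1):
        # Each printed command can be pasted into a shell to check one more stage.
        print(f"# after stage {number}")
        print(f"{command} | head -40")
        print()
```

Every failure then points to the last stage that was added, at the cost of re-running the upstream stages each time.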

2011: Imagine Using KNIME to Debug Unix Pipes

[Workflow diagram: AEREAExporter, AEREAExporter → sdfTagTool -rmRepeatTag → sdfGiniCalculator → Unix Script Compiler]

☺ Pass Unix command syntax through the pipe
☺ Compiler node writes out Unix script
☺ Provide a node for inline execution via ssh to be used for debugging, data checking, etc.
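
The idea of passing command text from node to node and compiling it at the end can be sketched in a few lines of Python. This is an illustration under assumptions, not the KNIME implementation: `CommandNode` and `compile_script` are hypothetical names, only a linear chain is handled (no merging of two exporters), and the AEREAExporter flags are shortened.

```python
from dataclasses import dataclass


@dataclass
class CommandNode:
    """One workflow node that contributes a single command to the pipe."""
    command: str

    def execute(self, upstream_text=""):
        # The port carries plain text: the upstream command plus this node's command.
        if upstream_text:
            return f"{upstream_text} \\\n| {self.command}"
        return self.command


def compile_script(command_text, shell="/bin/csh -f"):
    """Wrap the accumulated command text in a script body (compiler node)."""
    return f"#!{shell}\n{command_text}\n"


if __name__ == "__main__":
    nodes = [
        CommandNode("AEREAExporter.csh -out .sdf"),
        CommandNode("sdfTagTool.csh -in .sdf -out .sdf -rmRepeatTag 'G-Number=1'"),
        CommandNode('sdfGiniCalculator.csh -in .sdf -out .sdf -idField "G-Number" -conc 1'),
    ]
    text = ""
    for node in nodes:
        text = node.execute(text)
    print(compile_script(text))
```

Since every node sees the full command text up to its own position, any node could also hand that text to a remote shell for inline execution.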

2012: Results of Three Days Programming at KNIME.com

☺ Created 51 “Command Line” nodes

☺ “Command Line” ports handle
• Unix command text
• Data in SD file format

☺ “Command Line” node categories:
• Generator
• Processor
• Consumer
• Others

Dynamic Command Line Node Configuration

XML file with definition of the command line programs
• Generate one node per <command> element
• Deduce ports from <ports> element
• During startup of KNIME/Eclipse, the node set is initialized

<commands>
  <config>
    <exchangeDir local='\\resfiles….' remote='/gnet/…'/>
    <ssh remoteHost='rescomp2' timeout='1000'
         initFileTemplate='~cdduser/bin/knimerc.$mode'/>
  </config>
  <command name='AEREAExporter.csh'>
    <IO out="-out .sdf"/>
    <default>-hitlistLevel 'Base Compound' -uName XXXX
             -rName 'XVortex Project Subst' ….</default>
    <ports out='sdf'/>
  </command>
  <command name='sdfGrep.pl'>
    <IO in="" out=""/>
    <default>-i GNum</default>
    <ports in='sdf' out='sdf'/>
  </command>
  …
</commands>
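
Reading such a definition file and deriving one node description per <command> element could look like the Python sketch below. The sample XML is a shortened, well-formed variant of the slide (elided values are left out). `load_node_specs` is a hypothetical helper, and the rule that maps ports to a node category is an assumption, since the slides do not state how categories are assigned.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the definition file shown on the slide.
SAMPLE = """
<commands>
  <command name='AEREAExporter.csh'>
    <IO out="-out .sdf"/>
    <default>-hitlistLevel 'Base Compound' -uName XXXX</default>
    <ports out='sdf'/>
  </command>
  <command name='sdfGrep.pl'>
    <IO in="" out=""/>
    <default>-i GNum</default>
    <ports in='sdf' out='sdf'/>
  </command>
</commands>
"""


def load_node_specs(xml_text):
    """Return one dict per <command> element."""
    specs = []
    for cmd in ET.fromstring(xml_text).iter("command"):
        ports = dict(cmd.find("ports").attrib)
        has_in, has_out = "in" in ports, "out" in ports
        # Assumed rule: the category follows from which ports are declared.
        if has_out and not has_in:
            category = "Generator"
        elif has_in and has_out:
            category = "Processor"
        elif has_in:
            category = "Consumer"
        else:
            category = "Others"
        specs.append({
            "name": cmd.get("name"),
            "default": (cmd.findtext("default") or "").strip(),
            "ports": ports,
            "category": category,
        })
    return specs


if __name__ == "__main__":
    for spec in load_node_specs(SAMPLE):
        print(spec["name"], spec["category"], spec["ports"])
```

With this layout, integrating a new command line program only requires a new <command> entry, not new node code.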

Dynamic Command Line Node Generation

Configure the Command Line Node

Copy and paste the text from the Node Description window

Setting up SSH Connection

• All Generator nodes additionally have an “SSH Connection” configuration tab
• Mode: switch between production and development environment on the Unix host
• Remote Directory: specifying it results in execution of “cd <directory>”
• Execute in each node: toggles between executing the command in every node and the compile-only mode, in which nodes only assemble the command text and the assembled command is executed once
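
How these settings might combine into a single ssh call is sketched below in Python. Several points are assumptions: that the init file is sourced before the command, that the mode name replaces `$mode` in the `initFileTemplate` value from the XML configuration, and that the mode is called "production". `build_ssh_command` is a hypothetical helper.

```python
import shlex


def build_ssh_command(remote_host, init_file_template, mode, remote_dir, command_text):
    """Compose an ssh argument list that prepares the environment, then runs the command."""
    # Assumption: the mode name is substituted for $mode in the template.
    init_file = init_file_template.replace("$mode", mode)
    steps = [f"source {init_file}"]
    if remote_dir:
        # The "Remote Directory" setting becomes a cd before the command.
        steps.append(f"cd {shlex.quote(remote_dir)}")
    steps.append(command_text)
    return ["ssh", remote_host, "; ".join(steps)]


if __name__ == "__main__":
    argv = build_ssh_command(
        "rescomp2",
        "~cdduser/bin/knimerc.$mode",
        "production",          # assumed mode name
        "/tmp/knime_work",     # hypothetical remote directory
        "sdfGrep.pl -i GNum",
    )
    print(argv)
```

The list form can be passed directly to `subprocess.run`, which avoids a second layer of local shell quoting.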

The Different Views Of A Command Line Node

Vertical listing of columns allows sorting by column names

Display the “compiled” Unix command up to the current node

Unix Pipe Compilation – Past, Present, and Future

Creating this workflow took about one hour

• Pipe command manually copied into a csh script
• Future: Automated insertion with KNIME

Generate Project Vortex Files

Comp chemists provide Vortex sessions to their projects. The Unix scripts are run on the cluster once a day using cron.

Take Home Messages

☺ KNIME as a Unix pipe editor
• Facilitate the maintenance of Unix pipes

☺ Converter nodes allow integration with other KNIME nodes

☺ Dynamic KNIME node generation
• Rapid integration of new command line programs

Future Directions
• Assemble Unix scripts within KNIME workbench
• Handle tab-delimited files
• Provide an easy way to run commands in parallel
• Convince software vendors to offer programs that
  • Run from the command line
  • Read data from stdin and write data to stdout
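
A program that reads from stdin and writes to stdout can be dropped into any position of a pipe. The Python sketch below shows such a filter for SD files: it passes through only the records that carry a given data field. The default tag `G-Number` is taken from the pipe example. The script itself is a hypothetical illustration, not one of the tools listed in this talk.

```python
import sys


def iter_sd_records(lines):
    """Yield SD file records as lists of lines; each record ends with a '$$$$' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield record
            record = []
    # Lines after the last '$$$$' form an incomplete record and are dropped.


def has_tag(record, tag):
    """True if the record contains a data field header such as '> <G-Number>'."""
    header = f"<{tag}>"
    return any(line.startswith(">") and header in line for line in record)


def main(tag):
    # Stream record by record so the filter works on inputs of any size.
    for record in iter_sd_records(sys.stdin):
        if has_tag(record, tag):
            sys.stdout.writelines(record)


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "G-Number")
```

Saved under a hypothetical name such as `sdfHasTag.py`, it would be used as `... | python sdfHasTag.py G-Number | ...` between two other stages.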

Acknowledgements

Command Line Nodes: Alberto Gobbi, Thomas Gabriel (KNIME.com), Bernd Wiswedel (KNIME.com)

Command Line Programs: JW Feng, Alberto Gobbi, Benjamin Sellers

Matthew Lardy (Takeda San Diego)

Chemical Computing Group, OpenEye & Schroedinger

Support: Jeff Blaney, Michael Berthold (KNIME.com)

Slaton Lipscomb

CompChem/Cheminformatics Group
