DBBM CESMG G. Paolella CEINGE. CSI INTERNET CEINGE University Campus

Post on 27-Mar-2015

TRANSCRIPT

  • Slide 1

DBBM CESMG G. Paolella CEINGE

  • Slide 2

CSI INTERNET CEINGE University Campus

  • Slide 3

CAPRI
Image restoration and analysis
Comparative Genomics
Francesco Salvatore 0503
Research and Services in Bioinformatics

  • Slide 4

- Comparative genomics
- DG-CST
- KinWeb
- Non Coding RNAs
- Bacterial
- Eukaryotic
- Cell motility
Research subjects

  • Slide 5

Conserved Sequence Tags (CST)

  • Slide 6

DG-CST

  • Slide 7

DG-CST DB

  • Slide 8

Genome browser

  • Slide 9

KinWeb

  • Slide 10

[Figure panels (a) to (e)]
KinWeb DB

  • Slide 11

Three genes
[Figure panels a), b), c): gene structures labelled Ig-I, Ig-II, Ig-III, TM, Tyr Kinase and Ser-Thr Kinase, with CST positions marked]

  • Slide 12

Selection of homologous chromosome regions from human and mouse genomes.
Comparison of selected regions using BLASTZ, a program based on a local similarity algorithm.
Further analysis of the dataset, looking for subpopulations sharing specific characteristics, using different programs, such as:
- BLAST of CSTs vs EST, human and other species genomes
- Program for calculation of the CPS score (Coding Potential Score)
- RNA structure prediction programs
Selection of the definitive set of CSTs based on specified thresholds (identity >= 70%; length >= 100 bp) using StrongHits.
Insertion of selected CSTs into the DB and extensive annotation for:
- Type (e.g. intergenic, exonic etc.) according to Ensembl
- Coding capability according to Ensembl
- Distances from other genes and coding regions
- Calculation of Log Score according to the UCSC comparison of human and mouse genomes
Masking of repetitive elements with RepeatMasker, to reduce the noise inevitably introduced by repeated sequences.
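
The threshold step on this slide (identity >= 70%; length >= 100 bp) can be illustrated with a minimal sketch. This is not the StrongHits program itself: the `Alignment` record, its field names and the sample values are assumptions made up for illustration, and only the two threshold values come from the slide.

```python
from dataclasses import dataclass

# Thresholds quoted on the slide.
MIN_IDENTITY_PERCENT = 70.0
MIN_LENGTH_BP = 100


@dataclass(frozen=True)
class Alignment:
    """One human/mouse local alignment (hypothetical record layout)."""
    name: str
    identity_percent: float  # percentage of identical positions
    length_bp: int           # alignment length in base pairs


def select_csts(alignments):
    """Keep only alignments that meet both thresholds."""
    return [
        a for a in alignments
        if a.identity_percent >= MIN_IDENTITY_PERCENT
        and a.length_bp >= MIN_LENGTH_BP
    ]


# Made-up records, for illustration only.
candidates = [
    Alignment("aln1", 85.2, 240),
    Alignment("aln2", 69.9, 500),  # fails the identity threshold
    Alignment("aln3", 92.0, 99),   # fails the length threshold
    Alignment("aln4", 70.0, 100),  # exactly on both thresholds, so it is kept
]

selected = select_csts(candidates)
print([a.name for a in selected])
```

Both comparisons are inclusive, matching the ">=" written on the slide, so an alignment sitting exactly on a threshold is retained.
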
Pipeline

  • Slide 13

Pipeline units

  • Slide 14

Non coding RNAs
[Diagram labels: DNA, transcription, reverse transcription, mRNA, translation, Proteins; ncRNA: tRNA, rRNA, Antisense, miRNA, snoRNA, Self-splicing intron, snRNA; transcription/maturation, maturation; Imprinting: H19, AIR; X inactivation: XIST; Chromatin structure dynamics: small RNAs; DNA demethylation: KHPS1a]

  • Slide 15

Bacterial SLSs

  • Slide 16

SLS Families

  • Slide 17

Position in the genome
Position

  • Slide 18

Alignment

  • Slide 19

RNAz P = 0.99
PFOLD
Secondary structures

  • Slide 20

Processing time

  • Slide 21

4 x 14 x 2 = 112 procs, 2.8 GHz
4 x 14 x 2 = 112 GB RAM
2 GB/s per card, 4 GB/s aggregate
Cluster

  • Slide 22

Bioinfo portal

  • Slide 23

Bioinformatics services for research already active
Francesco Salvatore 0503
- About 100 databases of biological interest accessible through SRS (nucleotide sequences, genomes, mutations, hereditary diseases, enzymes, etc.)
- Integrated system for the analysis of biological data, with over 150 programs for sequence analysis, evolutionary models, mutation studies, proteins, etc.
- Databases built within research projects (DG-CST, KinWEB, etc.)
- Systems for the management of experimental data (biological samples, sequences, microscopy images, etc.)

  • Slide 24

Research and services
Research and Services in Bioinformatics
CAPRI
Image restoration and analysis
Comparative Genomics

  • Slide 25

CEINGE, DBBM, IIGB, BIOGEM, Faculty of Medicine, Faculty of Biotechnology, other faculties, public (limited access)
Francesco Salvatore 0503
Services: who has access?

  • Slide 26

[Diagram labels: WEB SERVER, CAPRI, SRS, PISE, Other, Emboss, Fasta, Blast, User Data, DB, Primary remote databases, ENSEMBL]
Services organization

  • Slide 27

Graphic interface to programs

  • Slide 28

CAPRI

  • Slide 29

Various operations in a row: Complement -> Translation -> Isoelectric point of the resulting protein.
DNA -> Complement -> Translation -> Isoelectric point
CAPRI workflow

  • Slide 30

CGI Plugin Object Pise Plugin Object CLI Simple Programs Plugin Object CURL Base Obj.
Plugin Object SOAP Plugin Object JEMBOSS Program Object Tasks Obj. Menu Table Disk Buffering BLAST FASTA EMBOSS HMMer Genscan ClustalW Programs Server disks Phylip CLIENT SERVER CAPRI Program Object Program Object
Legend: relationship between objects (use, inheritance); program execution; data transfer; temporal relationship
CAPRI architecture

  • Slide 31

[Diagram labels: Cluster, Cluster Nodes, Access Server (three)]
For each user request, a process is launched on a different node.
Distributed execution

  • Slide 32

[Diagram labels: Cluster Broker, Web application server, DB server, Cluster Manager (two), Relational DB, http]
1 - Run a command
2 - Request a node IP
3 - Request the status of the cluster
4 - Search for the best resource and return the corresponding node IP
5 - Launch the command on the node
6 - Return the result
Cluster activity

  • Slide 33

  • Slide 34

[Diagram labels: Broker, virtual node, virtual node, DB, Grid node]

  • Slide 35

RESEARCH PROJECT
--------------
*Cell line
*Culture conditions
*Fixation and embedding methods, stainings, etc.
*Objective
*Focus position
*Stage position x/y
*Project title
*Experiment name
*Author, group, group leader, etc.
WEB INTERFACE
*Exposure time
*Resolution, etc.
DB
Image archival and management

  • Slide 36

Image-DB interface

  • Slide 37

[Image labels: timelapse at 6 positions; timelapse actin; wound healing; timelapse 2; adhesion; actin staining]
IPROC

  • Slide 38

[Diagram labels: HPC on Cluster nodes, Gateway, iPage, image area, data + images, page, iPane, proc-steps]
IPROC architecture

  • Slide 39

[Diagram labels: Cluster, Cluster Nodes, Access Server (three)]
A tool can require the execution of multiple, simultaneous processes.
Distributed execution of parallel requests

  • Slide 40

- PHP internal routines (basic drawing, processing)
- ImageMagick (more advanced processing)
- Image converters
- Special tools (PDL, deconvolution)
- Tools developed in-house (cell tracking)
- ......
What software may be linked

  • Slide 41

- Convenient graphic interface
- Access to a vast library of image processing steps
- No specific interface requirements
- Remote processing on parallel hardware
- Support for a large number of concurrent users
- System independent (works on Mac, PC, Linux etc.)
- No need to install. A browser is enough.
Advantages
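
Returning to the "Cluster activity" slide (Slide 32), the six numbered steps can be sketched as follows. This is an illustration under stated assumptions, not the broker described in the slides: `NodeStatus`, `pick_best_node`, `run_command`, the `load` metric and the sample IP addresses are all hypothetical, and "best resource" is assumed here to mean the available node with the lowest load, which the slides do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    """Status of one cluster node as seen by the broker (hypothetical fields)."""
    ip: str
    load: float      # assumed metric: lower means less busy
    available: bool


def pick_best_node(statuses):
    """Step 4: return the IP of the best available node, or None if none is free.

    'Best' is assumed to mean lowest load; the real ranking rule is not
    given in the slides.
    """
    usable = [s for s in statuses if s.available]
    if not usable:
        return None
    return min(usable, key=lambda s: s.load).ip


def run_command(command, statuses, launch):
    """Steps 1 to 6 in miniature: choose a node, launch there, return the result."""
    ip = pick_best_node(statuses)   # steps 2 to 4
    if ip is None:
        raise RuntimeError("no cluster node available")
    return launch(ip, command)      # steps 5 and 6


# Made-up cluster status, for illustration only.
cluster = [
    NodeStatus("10.0.0.1", 0.90, True),
    NodeStatus("10.0.0.2", 0.15, True),
    NodeStatus("10.0.0.3", 0.05, False),  # least loaded, but not available
]


def fake_launch(ip, command):
    """Stand-in for remote execution; a real system would contact the node."""
    return f"{command} ran on {ip}"


result = run_command("example-command", cluster, fake_launch)
print(result)
```

Passing `launch` in as a parameter keeps the node-selection logic separate from the transport used to reach the node, so the selection rule can be checked without a cluster.
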