Predicting lncRNA Transcripts Out of Comprehensive Rat Renal Cell type-specific Transcriptome Libraries Gui Chen 11/20/2015


Page 1: Practicum Pressentation PDF

Predicting lncRNA Transcripts Out of Comprehensive Rat Renal Cell type-specific Transcriptome Libraries

Gui Chen 11/20/2015

Page 2: Practicum Pressentation PDF

WHY LONG NON-CODING RNA?

➤ Many long non-coding transcripts (lncRNAs) function in a variety of responses, including differentiation, the cell cycle, and maintenance of stem-cell-like phenotypes, and their expression is cell-type specific. Yet very little is known about their regulation or their roles in disease states.

➤ A newly established rat renal gene expression database and the recently assembled rn6 genome sequence have paved the way for such a study.

Page 3: Practicum Pressentation PDF

WHAT EXACTLY IS THE DATA SOURCE?

➤ 110 (renal tubule segments) + 5 (glomeruli) renal cell-type-specific gene expression profiles, produced by the work described in the paper shown at left.

➤ 7 polyadenylated mRNA-seq (PA-seq) libraries & cortical collecting duct libraries (4 control rats and 4 water-loaded rats)

➤ 125 libraries in total

Page 4: Practicum Pressentation PDF
Page 5: Practicum Pressentation PDF

WHAT IS THE FORMAT OF THE DATA?

➤ The original transcript data are stored in GTF format, a flat tab-delimited file format that can be loaded directly into Excel.

➤ Next is a real example of what GTF records look like.
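To make the tab-delimited layout concrete, here is a minimal Python sketch that splits one GTF line into its nine standard columns and unpacks the attribute column. The sample line, its IDs, and the function name `parse_gtf_line` are made up for illustration and are not taken from the rat renal libraries.

```python
# Names of the nine standard GTF columns, in order.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]


def parse_gtf_line(line):
    """Parse one tab-delimited GTF line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record must have exactly 9 tab-separated fields")
    record = dict(zip(GTF_COLUMNS[:8], fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # The ninth column holds 'key "value";' pairs separated by semicolons.
    attributes = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    record["attributes"] = attributes
    return record


# Hypothetical record, for illustration only.
example_line = ('chr1\tCufflinks\texon\t1000\t1500\t.\t+\t.\t'
                'gene_id "G1"; transcript_id "T1"; exon_number "1";')
example_record = parse_gtf_line(example_line)
print(example_record["feature"], example_record["attributes"]["transcript_id"])
```

Keeping the attributes in their own dictionary makes it easy to group exon records by `transcript_id` later on.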

Page 6: Practicum Pressentation PDF

GTF FILE EXAMPLE

Page 7: Practicum Pressentation PDF

How can we pick out potential long non-coding RNA transcripts from thousands of transcripts?

1. What are the characteristics of lncRNAs, based on preliminary data and experience?

➤ Less conserved than protein-coding genes (PhyloCSF).

➤ A much shorter ORF (open reading frame) than that of protein-coding genes. (They do not necessarily have one; if they do, is it short and present by chance, or were they originally genes?)

➤ When forcibly translated into protein, they have no counterpart in the nr database (non-redundant protein database) (BLASTX).

➤ They are consistently and significantly expressed in at least one cell type.

2. Extract records satisfying all the characteristics above.

A pipeline is established based on this idea.

Page 8: Practicum Pressentation PDF

Theoretically the pipeline works like this…

➤ The biggest circle represents the whole search space.

➤ The small rectangles inside the big circle represent subsets of records in the search space, each satisfying a certain lncRNA characteristic.

➤ The intersection of all the small rectangles represents the predicted set of lncRNA transcripts.

Diagram labels: all the transcripts; less conserved ones; no counterpart in nr database; short ORF; true positive expression; predicted lncRNAs
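The intersection idea can be sketched with Python sets. This is only an illustration of the logic, not the pipeline's actual code: the function name `predict_lncrnas` and the transcript IDs are hypothetical.

```python
def predict_lncrnas(all_ids, *criteria_sets):
    """Return the IDs that appear in every per-criterion subset."""
    predicted = set(all_ids)
    for subset in criteria_sets:
        predicted &= set(subset)
    return predicted


# Hypothetical transcript IDs, for illustration only.
all_transcripts = {"t1", "t2", "t3", "t4", "t5"}
less_conserved = {"t1", "t2", "t3"}
no_nr_counterpart = {"t2", "t3", "t4"}
short_orf = {"t2", "t3", "t5"}
truly_expressed = {"t3", "t5"}

predicted = predict_lncrnas(all_transcripts, less_conserved,
                            no_nr_counterpart, short_orf, truly_expressed)
print(sorted(predicted))
```

Because set intersection is commutative, the filters can run in any order; in practice it may be cheaper to run the fastest filter first so the slower ones see fewer transcripts.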

Page 9: Practicum Pressentation PDF

What do we get at each step? (taking multi-exon transcripts as an example)

➤ Find transcripts with a short ORF (length < 150)

Because each record in the FASTA file spans two lines, a file with n lines actually contains n/2 records.
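A minimal sketch of this step in Python follows. It rests on several assumptions that the slide does not state: the ORF is searched on the forward strand only, an ORF runs from ATG to the first in-frame stop codon, and the cutoff of 150 is measured in nucleotides. The function names and sequences are hypothetical, and the tool actually used in the pipeline is not named in the source.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Longest forward-strand ORF (ATG through stop codon), in nucleotides."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def read_two_line_fasta(lines):
    """Read a FASTA file where each record is a header line plus one sequence line."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    return {header[1:]: seq for header, seq in zip(lines[0::2], lines[1::2])}


# Hypothetical sequences, for illustration only.
fasta_lines = [">t1", "ATGAAATAA", ">t2", "ATG" + "AAA" * 60 + "TAA"]
records = read_two_line_fasta(fasta_lines)
MAX_ORF_LENGTH = 150  # assumed to be in nucleotides
short_orf_ids = [name for name, seq in records.items()
                 if longest_orf_length(seq) < MAX_ORF_LENGTH]
print(len(fasta_lines), "lines,", len(records), "records, short ORF:", short_orf_ids)
```

The four input lines yield two records, which matches the n/2 relationship noted above.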

Page 10: Practicum Pressentation PDF

What do we get at each step? (taking multi-exon transcripts as an example)

➤ Find transcripts with no counterpart in the nr database (E-value threshold > 10E-4)
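One way to apply this filter, sketched in Python, is to read BLASTX results in the BLAST+ tabular format (`-outfmt 6`), where the E-value is the 11th of the 12 default columns, and keep every transcript that has no hit at or below the cutoff. The slide does not show the BLAST command or output format, so the tabular layout is an assumption, as are the function name, the example rows, and the reading of the slide's "10E-4" as `1e-4`.

```python
def transcripts_without_nr_hit(all_ids, blast_rows, evalue_cutoff):
    """Keep transcripts that have no BLASTX hit with E-value <= evalue_cutoff."""
    ids_with_hit = set()
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            ids_with_hit.add(query_id)
    return [tid for tid in all_ids if tid not in ids_with_hit]


# Hypothetical tabular BLASTX rows, for illustration only.
blast_rows = [
    "t1\tprotA\t90.0\t100\t5\t0\t1\t300\t1\t100\t1e-50\t200",
    "t2\tprotB\t30.0\t20\t10\t1\t1\t60\t5\t25\t0.5\t25",
]
candidate_ids = ["t1", "t2", "t3"]
no_counterpart = transcripts_without_nr_hit(candidate_ids, blast_rows,
                                            evalue_cutoff=1e-4)  # assumed cutoff
print(no_counterpart)
```

Transcripts that never appear in the BLAST output (such as `t3` here) are kept, since having no reported hit at all also means no counterpart was found.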

Page 11: Practicum Pressentation PDF

What do we get at each step? (taking multi-exon transcripts as an example)

➤ Find transcripts that are consistently and significantly expressed across all replicates in at least one cell type (FPKM > 0.1)
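The expression criterion can be written as a short Python check: a transcript passes if there is at least one cell type in which every replicate exceeds the FPKM threshold. This is a sketch under the assumption that FPKM values are already grouped by cell type; the function name, the cell-type labels, and the numbers are hypothetical.

```python
def expressed_in_some_cell_type(fpkm_by_cell_type, threshold=0.1):
    """True if all replicates of at least one cell type have FPKM > threshold."""
    return any(
        len(replicates) > 0 and all(value > threshold for value in replicates)
        for replicates in fpkm_by_cell_type.values()
    )


# Hypothetical FPKM values per cell type, one number per replicate.
transcript_a = {"cell_type_1": [0.5, 0.05], "cell_type_2": [0.3, 0.2, 0.4]}
transcript_b = {"cell_type_1": [0.5, 0.05], "cell_type_2": [0.0, 0.9]}

print(expressed_in_some_cell_type(transcript_a))
print(expressed_in_some_cell_type(transcript_b))
```

Requiring all replicates to pass, rather than the mean, guards against a single high replicate carrying an otherwise unexpressed transcript through the filter.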

Page 12: Practicum Pressentation PDF

Classification of lncRNAs

➤ sense and antisense lncRNAs

➤ sense lncRNAs can be classified into intergenic, cons, incs, ponds lncRNAs

Page 13: Practicum Pressentation PDF

RESULT

Page 14: Practicum Pressentation PDF

THANK YOU & Happy Thanksgiving!