David Goldberg CS 1950 Directed Study

Post on 21-Jan-2016

Page 1: David Goldberg CS 1950 Directed Study

David Goldberg
CS 1950 Directed Study

Page 2: David Goldberg CS 1950 Directed Study

RNA Sequence

Up Intron | Exon | Down Intron

CCCACTCCATGATTACAC

CATGCCGTAGCTCATGCC

GATTACACATGCCGTAG

GCCACGTCTTTTGCTCTTTGCAGGATTACATCACTGGAAACTTTAGCCACGTAAACTTTA

Pattern 1: ACATCAC
Pattern 2: ACGT
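
The slide pairs one long sequence with two patterns. As an illustration only, here is a minimal sketch of finding Pattern 1 and then the nearest Pattern 2 after it. It is written in Python rather than the Perl the project uses, and the function name `find_pair` is hypothetical, not taken from the original program.

```python
import re

# Sequence and patterns copied from the slide.
SEQUENCE = "GCCACGTCTTTTGCTCTTTGCAGGATTACATCACTGGAAACTTTAGCCACGTAAACTTTA"

def find_pair(sequence, pattern1, pattern2):
    """Return (start1, between, start2) for the first pattern1 that is
    followed somewhere later by pattern2, or None if there is no such pair."""
    # The lazy gap (.*?) picks the nearest pattern2 after pattern1.
    regex = re.compile(
        "(" + re.escape(pattern1) + ")(.*?)(" + re.escape(pattern2) + ")"
    )
    match = regex.search(sequence)
    if match is None:
        return None
    return match.start(1), match.group(2), match.start(3)

print(find_pair(SEQUENCE, "ACATCAC", "ACGT"))
```

For the slide's sequence this reports Pattern 1 at offset 27 and the next Pattern 2 at offset 48 (0-based), with the 14 bases `TGGAAACTTTAGCC` between them. The earlier `ACGT` at offset 3 is not used because it comes before Pattern 1.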

Page 3: David Goldberg CS 1950 Directed Study

Desired Upgrades

Page 4: David Goldberg CS 1950 Directed Study

Possible Problems

• Programming in Perl
• Extensive use of regular expressions
• Trouble figuring out exactly what needs to be done
• Don’t know whether what we want done can be done

Page 5: David Goldberg CS 1950 Directed Study

Old Program:
• Command line arguments only (2 patterns)
• Cannot use Y, R, or N
• Only checks Human RNA for patterns
• Has a static search length
• Result file displays Human RNA id, Mouse RNA id, and last 75 characters
• Only searches the down intron

New Program:
• Prompts user for inputs:
  – Path of database, with a default
  – 2 patterns
  – Minimum and maximum distance between patterns
  – Searches from either the 3' splice site (beginning) or the 5' splice site (end)
  – Length from beginning or end to search
  – Which part to search (down intron, exon, up intron)
• Will find matches in the Human RNA, the Mouse RNA, or both
• Result file displays Human RNA id, Mouse RNA id, sequence searched, 1st pattern found, sequence in between 1st pattern and 2nd pattern, and 2nd pattern
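
The inputs listed for the new program (two patterns, a minimum and maximum distance, and a search length measured from one end) map naturally onto a bounded regular expression. The following is a rough Python sketch under stated assumptions, not the author's Perl code: the names `to_regex`, `search_window`, and `find_patterns` are hypothetical, and the slides do not say whether the new program actually accepts Y, R, or N, so the ambiguity-code table is included only to show how that limitation of the old program could be lifted.

```python
import re

# Standard nucleotide ambiguity codes (assumed support; not confirmed by the slides).
IUPAC = {"Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

def to_regex(pattern):
    """Translate a pattern that may contain Y, R, or N into a regex fragment."""
    return "".join(IUPAC.get(base, re.escape(base)) for base in pattern.upper())

def search_window(sequence, length, from_end):
    """Return the first or last `length` bases of the sequence."""
    if length <= 0:
        return ""
    return sequence[-length:] if from_end else sequence[:length]

def find_patterns(sequence, pattern1, pattern2, min_gap, max_gap,
                  length, from_end=False):
    """Find pattern1 followed by pattern2 with min_gap..max_gap bases between.

    Returns (searched, matched1, between, matched2), or None if not found.
    """
    searched = search_window(sequence, length, from_end)
    regex = re.compile(
        "(%s)(.{%d,%d}?)(%s)"
        % (to_regex(pattern1), min_gap, max_gap, to_regex(pattern2))
    )
    match = regex.search(searched)
    if match is None:
        return None
    return (searched,) + match.groups()

# One searched sequence taken from the new program's results file.
print(find_patterns("CACGTCTTTGGTTTTTGTAG", "ACG", "TT", 0, 20, 20, from_end=True))
```

With that 20-base input the sketch returns `ACG`, `TC`, `TT` for the two patterns and the bases between them, which agrees with the corresponding row of the results file. Because the input is exactly 20 bases long, the `from_end` choice makes no difference in this example.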

Page 6: David Goldberg CS 1950 Directed Study

Old Program Results File

Pattern1=ACG Pattern2=TT
humanID mouseID

ENSG00000124721_61 ENSMUSG00000033826_64 CENA GTAAGTTTTTATTTTTATTTATATCTACGTAGAAAGAGTTCCTTATTTAAAGGTGCTTAGTTTGCCTTCTCTGAT

ENSG00000113569_8 ENSMUSG00000022142_8 CENA GTAAGTAGAAAACAATAAATTTGGCAAGTACAACTAATTTCTAACACATTGTTCCCTCAACGTTTTCTTCAGAAA

ENSG00000105323_14 ENSMUSG00000040725_13 CENA GTGAGAGAATGAGTGTGTGTTTGTATGTAGTGATCGCACGTGTGCTTTTGAACCTGAGCAAGTTAGGTGGAGGCG

...

Page 7: David Goldberg CS 1950 Directed Study

New Program Results File

Pattern1=ACG Pattern2=TT Search=up SITE=3'
humanID
mouseID

ENSG00000134690_4 SE CTACAACGTTCTTTTTAAAG ACG TT
ENSMUSG00000028873_3 SE Not Found

ENSMUSG00000026954_6 CENA TTTTATTCATACGCTTACAG ACG C TT
ENSG00000115145_5 CENA Not Found

ENSG00000124721_67 CENA CCACGTCTTCTTCTTTTCAG ACG TC TT
ENSMUSG00000033826_70 CENA Not Found

ENSG00000052126_20 CENA ACGTTTTCTAATATTCCCAG ACG TT
ENSMUSG00000030231_11 CENA Not Found

ENSG00000138468_2 SE CACGTCTTTGGTTTTTGTAG ACG TC TT
ENSMUSG00000022591_2 SE TACGTCTTTCATTTTTGTAG ACG TC TT

ENSG00000151376_4 CENA ACGTGTTTTATTTCTTTTAG ACG TG TT
ENSMUSG00000030621_4 CENA Not Found

...

Page 8: David Goldberg CS 1950 Directed Study

Exon, Intron Program

• Wanted a program that searches the end of the down intron and the beginning of the exon.
• The first pattern would be in the intron.
• The second pattern would be in the exon.
• Exons usually start with a GT pattern, so if the exon starts with GT the program should ignore it during pattern matching; if the GT is not present, it should still try to match the 2 patterns.
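
The rule about the leading GT can be sketched as follows. This is an illustrative Python version, not the original Perl. The helper names are hypothetical, and choosing the occurrence of each pattern nearest the intron/exon boundary is an assumption the slides do not confirm.

```python
def strip_leading_gt(exon):
    """Split an exon into (gt, rest); gt is "GT" when present, else ""."""
    if exon.startswith("GT"):
        return "GT", exon[2:]
    return "", exon

def match_across_boundary(intron, exon, pattern1, pattern2):
    """Look for pattern1 in the intron and pattern2 in the exon.

    A leading GT on the exon is set aside before matching pattern2.
    Returns (pos1, gt, pos2), or None if either pattern is missing.
    """
    pos1 = intron.rfind(pattern1)   # assumed: last occurrence, nearest the boundary
    gt, rest = strip_leading_gt(exon)
    pos2 = rest.find(pattern2)      # assumed: first occurrence, nearest the boundary
    if pos1 < 0 or pos2 < 0:
        return None
    return pos1, gt, pos2

# Exon without a leading GT, then one with it (sequences from the slides).
print(match_across_boundary("ACTCCATGATTACAC", "GATTACACATG", "GATT", "ACAT"))
print(match_across_boundary("ACTCCATGATTACAC", "GTATTACACATGCCGTA", "GATT", "ACAT"))
```

In both calls Pattern 1 is found at offset 7 of the intron fragment. Pattern 2 is found at offset 6 of the first exon, and at offset 5 of the second exon once its leading GT has been set aside.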

Page 9: David Goldberg CS 1950 Directed Study

RNA Sequence

Up Intron | Exon | Down Intron

CCCACTCCATGATTACAC

CATGCCGTAGCTCATGCC

GATTACACATGCCGTAG

ACTCCATGATTACAC GATTACACATG

Pattern 1: GATT
Pattern 2: ACAT

Page 10: David Goldberg CS 1950 Directed Study

Exon, Intron Program
• Prompts user for inputs:
  – Path of database, with a default
  – 2 patterns
  – Minimum distance between patterns
  – Will find matches in the Human RNA, the Mouse RNA, or both
• Result file displays:
  – Human RNA id
  – Mouse RNA id
  – Small part of the down intron before the 1st pattern
  – 1st pattern found
  – Sequence between the 1st pattern and the end of the down intron
  – The GT sequence, if it was at the start of the exon
  – The beginning of the exon up to the 2nd pattern
  – 2nd pattern
  – Small part of the exon after the 2nd pattern
  – Length of the sequence between the 1st pattern and the end of the intron
  – Length of the sequence between the start of the exon and the 2nd pattern
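
To make the result-file layout concrete, here is a small Python sketch that assembles the sequence fields and the two lengths for one intron/exon pair (the two RNA ids are left out). It is an approximation under assumptions, not the original Perl: the function name `exon_intron_fields` and the `context` width of 10 bases are hypothetical, and the distance limit is omitted for brevity.

```python
def exon_intron_fields(intron, exon, pattern1, pattern2, context=10):
    """Build the result fields for one intron/exon pair, or None if no match.

    Field order follows the result-file description: context before
    pattern 1, pattern 1, rest of intron, GT (if any), exon up to
    pattern 2, pattern 2, context after pattern 2, then the two lengths.
    """
    pos1 = intron.rfind(pattern1)
    gt = "GT" if exon.startswith("GT") else ""
    body = exon[len(gt):]            # exon with a leading GT set aside
    pos2 = body.find(pattern2)
    if pos1 < 0 or pos2 < 0:
        return None
    before = intron[max(0, pos1 - context):pos1]
    intron_gap = intron[pos1 + len(pattern1):]
    exon_gap = body[:pos2]
    after = body[pos2 + len(pattern2):][:context]
    return [before, pattern1, intron_gap, gt, exon_gap, pattern2, after,
            str(len(intron_gap)), str(len(exon_gap))]

fields = exon_intron_fields("ACTCCATGATTACAC", "GTATTACACATGCCGTA", "GATT", "ACAT")
print(" ".join(fields))
```

For these example fragments the line printed is `ACTCCAT GATT ACAC GT ATTAC ACAT GCCGTA 4 5`: four bases remain in the intron after Pattern 1, and five bases precede Pattern 2 in the exon once the GT is set aside.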

Page 11: David Goldberg CS 1950 Directed Study

RNA Sequence

Up Intron | Exon | Down Intron

CCCACTCCATGATTACAC

CATGCCGTAGCTCATGCC

GTATTACACATGCCGTA

Pattern 1: GATT
Pattern 2: ACAT

ACTCCATGATTACAC GTATTACACATGCCGTA

54

Page 12: David Goldberg CS 1950 Directed Study

Exon, Intron Program Results File

Pattern1=ACTG Pattern2=TTAC Max Space:15
humanID mouseID
ENSG00000163872_13 ENSMUSG00000041215_12 CENA TGTAACATCT ACTG TCAAG GT AACATTC TTAC TGCGTT 5 7
ENSG00000135390_3 ENSMUSG00000010371_3 CENA GGAGAT ACTG ACAGATGAG GT ACC TTAC AGTGGAGTTG 9 3
ENSG00000103876_8 ENSMUSG00000030630_8 CENA CTTATGAACG ACTG GAGTG GT AA TTAC TGGAGCTCTGC 5 2
ENSG00000156253_3 ENSMUSG00000041079_3 SE TGCCTGAAATT ACTG TCAG GT ACG TTAC AGAAGCTCTG 4 3
ENSG00000151490_18 ENSMUSG00000030223_18 CENA AGAAGAGGAA ACTG ACAAA GT AAGTTTTTC TTAC TATG 5 9
...