Applying AI to Human Genome
Part 1 : Collecting data
Prof. M. Embrechts, Robert Bress, Bram Heyns
Overview
• Basics of DNA
• Collecting the data
• Collection : my application
• Perl
• Goal
Basics of DNA
• DNA = polymer of 4 molecules : bases (nucleotides)
• A = Adenine, C = Cytosine, G = Guanine, T = Thymine
• Replication (copying) and translation (reading)
• Double helix with base pairs A-T and G-C => copying
• 3-letter combination = codon
• RNA : U = Uracil in place of T => transcription
• Protein = polymer built from 20 amino acids => reading; more complex structure than DNA
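The pairing and transcription rules above can be sketched in a few lines of Perl. This is only an illustration: the function names `complement`, `transcribe` and `codons` and the toy sequence are assumptions made here, not part of the original slides.

```perl
use strict;
use warnings;

# Complement a DNA strand using the pairing rules A-T and G-C.
sub complement {
    my ($dna) = @_;
    (my $comp = uc $dna) =~ tr/ACGT/TGCA/;
    return $comp;
}

# Write a DNA sequence as RNA: U (uracil) takes the place of T (thymine).
sub transcribe {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;
    return $rna;
}

# Split a sequence into codons (groups of 3 letters).
# An incomplete group at the end is dropped.
sub codons {
    my ($seq) = @_;
    return $seq =~ /(...)/g;
}

my $dna = 'ATGGCCTAA';    # toy sequence, made up for this sketch
print complement($dna), "\n";
print transcribe($dna), "\n";
print join(' ', codons(transcribe($dna))), "\n";
```

The `tr///` operator replaces characters one for one, which fits the base-pairing rules directly.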
Transition : DNA -> RNA -> Protein
Intron – Exon – Splice junction
• Exon : about 200 characters; intron : thousands of characters
• 30,000 genes identified out of a possible 100,000
• Identification of a gene -> patent
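A splice junction is the boundary between an exon and an intron. As a minimal sketch of what splicing means for the data, the following Perl joins the exons of a toy gene and drops the intron between them. The function name `splice_exons`, the coordinate convention and the sequence itself are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Join the exons of a gene, dropping the introns between them.
# $exons is a list of [start, end] pairs, 0-based and inclusive
# (a convention assumed for this sketch).
sub splice_exons {
    my ($gene, $exons) = @_;
    return join '', map { substr($gene, $_->[0], $_->[1] - $_->[0] + 1) } @$exons;
}

# Toy gene: exon "ATGGCC", intron "gtaagtttcag", exon "TGGTAA".
# The splice junctions sit between positions 5|6 and 16|17.
my $gene  = 'ATGGCC' . 'gtaagtttcag' . 'TGGTAA';
my @exons = ([0, 5], [17, 22]);

print splice_exons($gene, \@exons), "\n";
```

In real data the exon coordinates are exactly what is unknown, which is why locating the junctions is the hard part.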
Summary
• Human : 23 pairs of chromosomes
• Chromosome -> thousands of genes
• Gene -> info : exons, comments : introns
• Exons and introns -> codons
• Codon -> 3 bases
Data collection
• Human Genome Project
• NCBI website : http://www.ncbi.nlm.nih.gov
• Entrez-Nucleotide.htm
• NCBI Sequence Viewer.htm
Data collection : my application
BioBrowser
• Download HTML -> ExtractLinks() -> Download HTML data
• ExtractData()
• TranslateData()
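The slides name the steps of BioBrowser but do not show how they work. The Perl below is a hypothetical sketch of such a pipeline, run on made-up HTML strings instead of real downloads. It borrows the names `ExtractLinks`, `ExtractData` and `TranslateData` from the slide, but the bodies, the assumed `<pre>` layout and the numeric encoding are all assumptions, not the authors' code.

```perl
use strict;
use warnings;

# Collect the href targets of all <a> tags in an HTML string.
sub ExtractLinks {
    my ($html) = @_;
    return $html =~ /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
}

# Pull the sequence out of a page. Assumption: it sits inside
# <pre>...</pre>, mixed with position numbers and whitespace.
sub ExtractData {
    my ($html) = @_;
    my ($block) = $html =~ /<pre>(.*?)<\/pre>/is;
    return '' unless defined $block;
    $block =~ s/[^acgtACGT]//g;    # keep only the base letters
    return lc $block;
}

# Encode each base as a number (an encoding made up for illustration).
sub TranslateData {
    my ($seq) = @_;
    my %code = (a => 0, c => 1, g => 2, t => 3);
    return map { $code{$_} } split //, $seq;
}

# Made-up pages standing in for downloaded HTML.
my $index = '<a href="seq1.htm">Gene 1</a> <A HREF=\'seq2.htm\'>Gene 2</A>';
my $page  = "<html><pre>  1 atggcc taa\n 11 ggt</pre></html>";

print join(', ', ExtractLinks($index)), "\n";
my $seq = ExtractData($page);
print "$seq\n";
print join(' ', TranslateData($seq)), "\n";
```

Regex-based HTML handling like this is fragile on real pages; it is shown here only because it matches the regex-driven approach the slides describe.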
Perl
• Practical Extraction and Report Language
• POD files -> web
• Portability
• Free, with CPAN modules
• String manipulation
• Extremely powerful regex engine
• Glue language designed for short and simple tasks, which does not mean it lacks power or “serious” features
Tutorial : http://www.netcat.co.uk/rob/perl/win32perltut.html
Regular Expression – Pattern Matching
• Practical Extraction and Report Language
• Scan through data and extract useful information
• m/PATTERN/
• s/PATTERN/REPLACEMENT/
• 1 line of Perl can replace roughly 100 lines of C or Java
• Complex, but easy
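As a small illustration of the two operators, the Perl below uses `m//` to capture a value and `s///` to strip HTML tags. The helper names `get_length` and `strip_tags` and the input line are invented for this sketch.

```perl
use strict;
use warnings;

# m/PATTERN/ : return the number that follows "length=", or undef.
sub get_length {
    my ($line) = @_;
    return $line =~ m/length=(\d+)/ ? $1 : undef;
}

# s/PATTERN/REPLACEMENT/ : return a copy of the text with HTML tags removed.
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]+>//g;
    return $text;
}

my $line = '<b>length=1200</b> <i>demo gene</i>';    # made-up input
print get_length($line), "\n";
print strip_tags($line), "\n";
```

The parentheses in the pattern capture the matched digits into `$1`, and the `g` modifier makes the substitution apply to every tag on the line.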
Regex examples
• /[KCZ]arl^sa/
• /<I>(.*?)<\/I>/i
• Captures : $1, $2, …
• Modifiers : i, g, c, …
• Metacharacters : . * + ?
• /([0-9a-zA-Z])+/ or /([\w])+/ (\w also matches the underscore)
• s/us[^a-z]/them/g or s/us\W/them/g
• /((?:acc|act)(?:ttt|ttc|att))/
• TIMTOWTDI (There Is More Than One Way To Do It)
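Square brackets define a character class, not a choice between words, so a pattern such as /([acc|act][ttt|ttc|att])/ matches any two characters drawn from the letters a, c, t and the bar. Matching one whole codon followed by another needs grouped alternation. The sketch below compares the two forms on made-up sequences; the variable names are assumptions of this example.

```perl
use strict;
use warnings;

# Character classes: each [...] matches ONE character from the set.
my $class_re = qr/([acc|act][ttt|ttc|att])/;

# Alternation: (?:...|...) matches one of the whole three-letter codons.
my $alt_re = qr/((?:acc|act)(?:ttt|ttc|att))/;

# Return the captured text, or '-' when the pattern does not match.
sub first_match {
    my ($re, $seq) = @_;
    my ($hit) = $seq =~ $re;
    return defined $hit ? $hit : '-';
}

for my $seq ('ca', 'acctt', 'ggacttttgg') {
    printf "%-12s class: %-4s alternation: %s\n",
        $seq, first_match($class_re, $seq), first_match($alt_re, $seq);
}
```

Only the alternation form requires two complete codons in a row; the character-class form already succeeds on the two-letter string 'ca'.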
Part 2 : Applying AI
• Our choice : evolutionary computing
• First part : identify the exon parts
• Second part : identify the splice junctions
• Third part : combine the previous parts
• Hope to reach over 90% accuracy
Questions
?