METHODS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE

RADIAL BASIS NETWORKS APPLICATION

EGFR GENE MUTATIONS IDENTIFIER

Prof. dr Zikrija Avdagić, dipl.ing.el. [email protected]

2015/2016

LUNG CANCER

1. NON-SMALL CELL LUNG CANCER (NSCLC)
1.1 Adenocarcinomas are often found in an outer area of the lung.
1.2 Squamous cell carcinomas are usually found in the center of the lung next to an air tube (bronchus).
1.3 Large cell carcinomas can occur in any part of the lung. They tend to grow and spread faster than the other two types.

2. SMALL CELL LUNG CANCER (SCLC)
2.1 Small cell carcinoma (oat cell cancer).
2.2 Combined small cell carcinoma.

3. CARCINOIDS (COIDS) form a distinct histologic tumor subtype.

EGFR mutations

The tyrosine kinase (TK) domain is a region of the EGFR gene that is prone to mutation in patients with NSCLC. The TK domain has 7 exons (exons 18-24), of which exons 18-21 carry somatic mutations in patients with NSCLC.

Data on microdeletions in the EGFR gene were collected from an online database, and the NCBI nucleotide sequence NG_007726 was used to build the training data sets. The transformed statistical table is given in the table and was used to generate the sequences of mutated exons. For this purpose an algorithm, the “approximative predictor”, was developed in MATLAB. It generates the required number of mutated exons from a sample of healthy exons, statistical data on the type of mutation, the span of nucleotides affected, and the number of patients with that type of mutation, as listed in the table.

Transformed statistical table in MATLAB software package
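A generator of this kind can be sketched as follows. This is only a minimal Python illustration (the original tool was written in MATLAB and its code is not reproduced in these slides). The function name `generate_mutated_exons`, the row layout `(start, end, patients)`, and the toy sequence and numbers are all assumptions made for the example, not the authors' data.

```python
import random

def generate_mutated_exons(healthy_exon, deletion_stats, count, seed=0):
    """Sample `count` mutated copies of `healthy_exon`.

    deletion_stats: hypothetical rows (start, end, patients), where start/end
    are 0-based, end-exclusive positions of a microdeletion and `patients` is
    the number of patients carrying it (used here as a sampling weight).
    """
    rng = random.Random(seed)
    weights = [patients for _, _, patients in deletion_stats]
    rows = rng.choices(deletion_stats, weights=weights, k=count)
    # Cut the affected span out of the healthy sequence.
    return [healthy_exon[:start] + healthy_exon[end:] for start, end, _ in rows]

toy_exon = "ACGTACGTACGTACGTACGT"      # toy sequence, NOT taken from NG_007726
toy_stats = [(2, 5, 30), (8, 17, 5)]   # hypothetical statistics rows
mutated = generate_mutated_exons(toy_exon, toy_stats, count=6)
```

Weighting by patient count makes frequent deletions appear more often in the generated set, which is one plausible reading of how the statistical table was used.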

Approach based on exact identifier

Because the approximative predictor gave poor results (training, validation, and test errors), we developed a model based on generators of “predictive” combinations of all microdeletion mutations on exons 18, 19 and 20 (shown in Figure).

The aim of developing this EGFR mutations identifier was to achieve exact identification of mutated exons 18, 19 and 20 in the EGFR gene and thus create a basis for the discovery of new treatments for Non-Small Cell Lung Cancer (NSCLC).

We have developed an integrated software suite that uses two levels of ensembling: 1. ensembling of the EGFR gene into exons, and 2. ensembling of global exon combinations into partial exon combinations.

EGFR GENE MUTATIONS IDENTIFIER - PREMISES

Microdeletions in mutated exons: each exon nucleotide can be in one of two states, deleted or normal, so there are 2^n possible mutated exons, where n is the length of the exon.

Instead of generating this large number of combinations for each exon (which would take a long time in real time), we generate combinations of partial parts of exons. The mutations identified in these parts are then integrated into the exon, which gives complete information about its mutations.

Ensembling of combinations of exon 19 into groups of 10 nucleotides
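The saving behind this premise is simple arithmetic, sketched below in Python. The function names and the exon length of 100 are hypothetical and chosen only to show the orders of magnitude; the group size of 10 follows the slide.

```python
def full_combinations(exon_length):
    """Deletion patterns when every nucleotide is either deleted or normal."""
    return 2 ** exon_length

def partial_combinations(exon_length, group_size=10):
    """Patterns to enumerate when the exon is cut into groups of `group_size`."""
    full_groups, remainder = divmod(exon_length, group_size)
    total = full_groups * 2 ** group_size
    if remainder:
        total += 2 ** remainder   # a shorter last group has fewer patterns
    return total

n = 100                           # hypothetical exon length
print(full_combinations(n))       # 2**100, far too many to enumerate
print(partial_combinations(n))    # 10 groups * 2**10 = 10240
```

If every group shares the same identifier, only the 2**10 = 1024 patterns of a single group need to be generated at all.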

EGFR GENE MUTATIONS IDENTIFIER - STRUCTURE

This model consists of three modules:

1. The first module performs data preprocessing (extraction, encoding, and normalization).
2. The second module contains the functions for training the radial basis (radbas) neural network ensemble on the “predictive” mutations training set.
3. The third module is intended for exploitation in two modes, on-line and off-line.
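The training step of the second module can be pictured with an exact-design radial basis network, sketched here in plain Python. This is an assumption-laden illustration, not the authors' MATLAB code: the names `radbas`, `solve`, `train_exact_rbf` and `predict` are defined locally, the output layer has no bias term, the vectors have 4 elements instead of 10 to keep the pure-Python solve small, and the target (number of deleted positions) is chosen only for the example.

```python
import math
from itertools import product

def radbas(dist, spread):
    """Gaussian radial basis that equals 0.5 when dist == spread."""
    return math.exp(-math.log(2) * (dist / spread) ** 2)

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def train_exact_rbf(inputs, targets, spread=0.5):
    """One hidden unit per training vector; output weights fit the targets exactly."""
    phi = [[radbas(math.dist(x, c), spread) for c in inputs] for x in inputs]
    return solve(phi, [[float(v) for v in t] for t in targets])

def predict(x, centers, weights, spread=0.5):
    acts = [radbas(math.dist(x, c), spread) for c in centers]
    return [sum(a * w[k] for a, w in zip(acts, weights))
            for k in range(len(weights[0]))]

# All 4-element binary vectors (0 = mutation, 1 = no mutation).
patterns = [list(p) for p in product([0, 1], repeat=4)]
targets = [[p.count(0)] for p in patterns]    # illustrative target only
weights = train_exact_rbf(patterns, targets)
print(round(predict([1, 0, 0, 1], patterns, weights)[0]))  # 2
```

Because every possible pattern is used as a center, the network reproduces each training target exactly, which matches the "exact identifier" idea of enumerating all combinations of a short group.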

DIAGNOSIS PREDICTIVE SYSTEM

EGFR GENE MUTATIONS IDENTIFIER

In on-line mode, the identifier uses sample patients’ data with microdeletion mutations extracted on-line from the EGFR mutation database; in off-line mode (simulation mode), it uses “predictive” microdeletion mutations from our own database. The steps of operation are: masking, ensembling of exon combinations, radbas identification of mutations with conversion from binary to decimal format, re-ensembling of the partial parts of exons, and counting.
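The steps of operation can be walked through with a small Python sketch. All function names are hypothetical, positions and the 30-nucleotide exon are made up, and `identify_group` is a plain run-length scan standing in for the trained radbas network; it also performs the binary-to-decimal conversion by returning (start, count) pairs.

```python
def make_mask(exon_length, deleted_positions):
    """Masking: 1 = nucleotide normal, 0 = nucleotide deleted (0-based positions)."""
    deleted = set(deleted_positions)
    return [0 if i in deleted else 1 for i in range(exon_length)]

def split_into_groups(mask, group_size=10):
    """Ensembling: cut the exon mask into consecutive groups."""
    return [mask[i:i + group_size] for i in range(0, len(mask), group_size)]

def identify_group(group):
    """Stand-in identifier: (start, count) for each run of zeros, start 1-based."""
    runs, start = [], None
    for i, bit in enumerate(group + [1]):     # trailing 1 closes an open run
        if bit == 0 and start is None:
            start = i
        elif bit == 1 and start is not None:
            runs.append((start + 1, i - start))
            start = None
    return runs

def reassemble(per_group_runs, group_size=10):
    """Re-ensembling: shift group-local starts to exon positions, merge touching runs."""
    merged = []
    for g, runs in enumerate(per_group_runs):
        for start, count in runs:
            start += g * group_size
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + count)
            else:
                merged.append((start, count))
    return merged

mask = make_mask(30, range(7, 13))            # hypothetical deletion across two groups
runs = reassemble([identify_group(g) for g in split_into_groups(mask)])
deleted_total = sum(count for _, count in runs)   # counting
print(runs, deleted_total)                    # [(8, 6)] 6
```

The merge in `reassemble` matters because a deletion that crosses a group boundary is seen as two separate runs by the per-group identifiers.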

Radial Basis Network

Test of the radbas identifier with 10 input vectors of 10 binary elements each (0 = mutation, 1 = no mutation), and outputs giving the positions and numbers of mutations in decimal form

Identification of mutations: an odd-numbered output of the radbas network indicates the position where a mutation sequence begins, and the following even-numbered output indicates the number of mutations from that position
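Reading the outputs back can be sketched as below. The sketch assumes that "odd" and "even" refer to the order of the outputs (1st, 3rd, ... are start positions; 2nd, 4th, ... are counts), that positions are 1-based, and that unused output pairs are padded with a count of 0; none of these details is confirmed by the slides, and `decode_outputs` is a hypothetical name.

```python
def decode_outputs(outputs, length=10):
    """Rebuild the binary vector (0 = mutation, 1 = no mutation) from the
    (start, count) pairs read off the identifier outputs."""
    vector = [1] * length
    # 1st, 3rd, ... outputs are start positions; 2nd, 4th, ... are counts.
    for start, count in zip(outputs[0::2], outputs[1::2]):
        if count == 0:
            continue                  # assumed padding for an unused pair
        for pos in range(start, start + count):
            vector[pos - 1] = 0       # positions assumed to be 1-based
    return vector

print(decode_outputs([3, 3, 8, 1]))   # [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]
```

In this example the outputs 3, 3 mean "a mutation sequence starts at position 3 and covers 3 nucleotides", and 8, 1 mean "a single mutation at position 8".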