modeling biological variables using neural networks

H. Anwar Ahmad, PhD, MBA, MCIS, Associate Professor; Luma Akil, PhD-C; Dept. Biology, Jackson State University


Post on 25-Jan-2016


Page 1: Modeling Biological Variables using Neural Networks

H. Anwar Ahmad, PhD, MBA, MCIS

Associate Professor

Luma Akil, PhD-C

Dept. Biology, Jackson State University

Modeling Biological Variables using Neural Networks

Page 2: Modeling Biological Variables using Neural Networks

Two Research Projects

• NIH Biostatistics Support Unit

• DOS Biostatistics Consulting Center

• ACKNOWLEDGEMENT: The project described was supported by Grant Numbers G12RR013459 from the National Center of Research Resources and PGA-P210944 from DOS.

Page 3: Modeling Biological Variables using Neural Networks

Broiler Growth Modeling

Page 4: Modeling Biological Variables using Neural Networks

Broiler Growth Modeling

• Determining optimal nutrient requirements in broilers requires an adequate description of bird growth and body composition using growth curves.

• One way to calculate nutrient requirements and predict the feed intake of growing broilers is to start from an understanding of their genetic potential.

Page 5: Modeling Biological Variables using Neural Networks

Broiler Growth Modeling

• Typically, nutritionists vary one nutrient at a time while keeping everything else constant, and monitor body growth over a certain period on graded levels of that nutrient. This ignores the overall system approach and misses the bigger picture.

Page 6: Modeling Biological Variables using Neural Networks

Gompertz Nonlinear Regression Equation

• W = A exp[−log(A/B) exp(−Kt)]

• W is the weight at age t, with 3 parameters:

• A is the asymptotic or maximum growth response,

• B is the intercept, or weight when age t = 0,

• K is the rate constant.
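The Gompertz curve above can be evaluated directly. The following is a minimal Python sketch, not code from the presentation; the function name `gompertz_weight` and the parameter values are hypothetical and chosen only to show the shape of the curve.

```python
import math

def gompertz_weight(t, A, B, K):
    """Weight W at age t from W = A * exp(-log(A/B) * exp(-K*t))."""
    return A * math.exp(-math.log(A / B) * math.exp(-K * t))

# Hypothetical parameters for illustration only (not fitted to any data):
# A = asymptotic weight (g), B = weight at age 0 (g), K = rate constant (1/day).
A, B, K = 6000.0, 43.0, 0.04

for age in (0, 10, 35, 70):
    print(age, round(gompertz_weight(age, A, B, K), 1))
```

At t = 0 the inner exponential equals 1, so the expression reduces to W = B; as t grows, W approaches A. This agrees with the parameter definitions on the slide.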

Page 7: Modeling Biological Variables using Neural Networks

Broiler Growth Modeling

• The current research project is a step towards a holistic system approach.

• Using all available information from the published literature, together with simulated and experimental data, artificial intelligence (AI) models of body growth were developed as a basis for estimating future energy requirements in broilers.

Page 8: Modeling Biological Variables using Neural Networks

Neural Networks

• A neural network is a mathematical model of an information-processing structure that is loosely based on the workings of the human brain.

• An artificial neural network consists of a large number of simple processing elements connected to each other in a network.
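To make the idea of connected processing elements concrete, here is a minimal Python sketch of one forward pass through a small network. It is an illustration under assumptions, not the architecture used in the presentation: the layer sizes, the sigmoid activation, the random weights, and the names `sigmoid` and `forward` are all hypothetical.

```python
import math
import random

def sigmoid(x):
    """Logistic activation applied by each hidden processing element."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """Inputs -> sigmoid hidden layer -> single linear output.

    Each weight list carries its bias as the last entry.
    """
    hidden = []
    for ws in hidden_weights:
        total = sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1]
        hidden.append(sigmoid(total))
    return sum(w * h for w, h in zip(output_weights[:-1], hidden)) + output_weights[-1]

# Assumed sizes: 4 inputs, 3 hidden elements, 1 output; weights are random, not trained.
random.seed(0)
n_inputs, n_hidden = 4, 3
hidden_weights = [[random.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
output_weights = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]

print(forward([0.1, 0.2, 0.3, 0.4], hidden_weights, output_weights))
```

Training would adjust the weights so that the output matches known targets; this sketch only shows how values flow from the input layer through the hidden layer to the output.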

Page 9: Modeling Biological Variables using Neural Networks

Neural Networks-Neurons

Page 10: Modeling Biological Variables using Neural Networks
Page 11: Modeling Biological Variables using Neural Networks

Schematic View of Artificial Neural Network

Output Layers

Hidden Layers

Input Layers

Page 12: Modeling Biological Variables using Neural Networks

Data Collection

• Average daily body weights of 25 male broiler chicks (Ross x Ross 308) from day 1 to day 70 were selected from a recently published report (Roush et al., 2006).

• These averages were converted into 14 interval classes of 5 days each, with means and standard deviations.

• These five day-interval classes of broiler growth reflect accurate growth patterns and curves in broilers.
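The conversion of daily averages into 5-day interval classes can be sketched as follows. This is a minimal Python illustration, assuming each class summarizes five consecutive daily averages with a mean and a sample standard deviation; the function name `interval_stats` is hypothetical. The ten input values are the first ten daily body weights listed in the test-data table of this presentation.

```python
import statistics

def interval_stats(daily_weights, interval=5):
    """Group daily values into consecutive intervals; return (mean, sample SD) per interval."""
    stats = []
    for start in range(0, len(daily_weights) - interval + 1, interval):
        chunk = daily_weights[start:start + interval]
        stats.append((statistics.mean(chunk), statistics.stdev(chunk)))
    return stats

# Daily average body weights (g) for days 1-10, from the test-data table.
first_ten_days = [43, 43, 51, 62, 77, 93, 113, 134, 159, 192]

for mean, sd in interval_stats(first_ten_days):
    print(round(mean, 2), round(sd, 2))
# prints:
# 55.2 14.46
# 138.2 38.8
```

These two results match the first two rows of the summary table (means 55.20 and 138.20, standard deviations 14.46 and 38.80), which supports the assumption about how the classes were formed.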

Page 13: Modeling Biological Variables using Neural Networks

5-D Average     Mean       Standard    95% confidence   Lower      Upper      RiskNormal
BWt Interval               Deviation   interval         95% CI     95% CI     Distribution

 1                55.20     14.46        6.34             48.86      61.54      55.20
 2               138.20     38.80       17.01            121.19     155.21     138.20
 3               312.20     70.95       31.09            281.11     343.29     312.20
 4               588.80    104.71       45.89            542.91     634.69     588.80
 5               945.40    120.20       52.68            892.72     998.08     945.40
 6              1354.00    135.07       59.20           1294.80    1413.20    1354.00
 7              1826.40    162.98       71.43           1754.97    1897.83    1826.40
 8              2334.60    166.98       73.18           2261.42    2407.78    2334.60
 9              2828.60    147.32       64.57           2764.03    2893.17    2828.60
10              3292.40    152.40       66.79           3225.61    3359.19    3292.40
11              3765.80    137.95       60.46           3705.34    3826.26    3765.80
12              4159.80    110.91       48.61           4111.19    4208.41    4159.80
13              4488.20     94.15       41.26           4446.94    4529.46    4488.20
14              4708.80     32.87       14.40           4694.40    4723.20    4708.80

Page 14: Modeling Biological Variables using Neural Networks

Data Simulation

• The 14 means and standard deviations were used to simulate broiler growth data from day 1 to day 70, using the Normal distribution function of @Risk software (Palisade Corporation).

• Six simulations with one thousand iterations each were performed.
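The simulation step can be approximated outside @Risk by drawing from a normal distribution for each interval. The sketch below is a stand-in written with Python's standard library, not a reproduction of the @Risk workflow; the function name `simulate` and the seed are hypothetical, and only the first three intervals of the summary table are used.

```python
import random

# (mean, SD) in grams for the first three 5-day intervals, from the summary table.
interval_params = [(55.20, 14.46), (138.20, 38.80), (312.20, 70.95)]

def simulate(params, iterations=1000, seed=None):
    """Draw `iterations` normal samples per interval; returns one list of draws per interval."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, sd) for _ in range(iterations)] for mean, sd in params]

# One simulation of one thousand iterations; the seed is arbitrary and only fixes the draws.
draws = simulate(interval_params, iterations=1000, seed=1)

for (mean, sd), sample in zip(interval_params, draws):
    sample_mean = sum(sample) / len(sample)
    print(mean, round(sample_mean, 1))
```

Repeating the call with six different seeds would mimic the six simulations mentioned on the slide. Note that a normal draw can be negative when the standard deviation is large relative to the mean, so a real workflow may need to check for that.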

Page 15: Modeling Biological Variables using Neural Networks

Data Manipulation and Simulation

• From each of the 14 five-day interval classes, 100 data points were randomly drawn for a total of 1400 observations, representing 20 observations for each day up to 70 days.

• Only 50 days of data were used for this research. They were arranged in one row of a spreadsheet to generate training examples for the neural network.

Page 17: Modeling Biological Variables using Neural Networks

Neural Networks Training

• Out of 1400 data points, 750 training examples (epochs) were generated for NN training.

• Starting from day one, the first four body weight observations were used as inputs and the fifth body weight observation was used as the output, constituting one training example (epoch).

Page 18: Modeling Biological Variables using Neural Networks

Neural Networks Training

• The second training example consisted of the second, third, fourth, and fifth observations of day one body weight as inputs, with the sixth observation of day one as the output.

• A total of 750 such examples (epochs) were generated from the simulated data.
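The example-generation procedure described on these slides is a sliding window: four consecutive observations form the inputs and the next observation is the output. A minimal Python sketch follows; the function name `make_examples` is hypothetical, and the eight values are the first eight body weights from the test-data table in this presentation.

```python
def make_examples(series, n_inputs=4):
    """Slide a window over `series`: each example is (n_inputs values, the next value)."""
    examples = []
    for i in range(len(series) - n_inputs):
        examples.append((series[i:i + n_inputs], series[i + n_inputs]))
    return examples

# Body weights (g) for days 1-8, from the test-data table.
weights = [43, 43, 51, 62, 77, 93, 113, 134]

for inputs, output in make_examples(weights):
    print(inputs, "->", output)
# prints:
# [43, 43, 51, 62] -> 77
# [43, 51, 62, 77] -> 93
# [51, 62, 77, 93] -> 113
# [62, 77, 93, 113] -> 134
```

A series of N observations yields N - 4 examples with this windowing, and the four printed lines correspond to the first four rows of the test-data table.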

Page 19: Modeling Biological Variables using Neural Networks

Neural Networks Testing

• For NN testing, the same procedure of epoch generation was applied.

• However, instead of simulated data, the original body weight data were used, for a total of 50 epochs.

Page 20: Modeling Biological Variables using Neural Networks

Age(d)  BW(g)   I1     I2     I3     I4     O      Epoch

 1        43      43     43     51     62     77    1
 2        43      43     51     62     77     93    2
 3        51      51     62     77     93    113    3
 4        62      62     77     93    113    134    4
 5        77      77     93    113    134    159    5
 6        93      93    113    134    159    192    6
 7       113     113    134    159    192    227    7
 8       134     134    159    192    227    265    8
 9       159     159    192    227    265    308    9
10       192     192    227    265    308    355   10
11       227     227    265    308    355    406   11
12       265     265    308    355    406    460   12
13       308     308    355    406    460    520   13
14       355     355    406    460    520    586   14
15       406     406    460    520    586    654   15
16       460     460    520    586    654    724   16
17       520     520    586    654    724    796   17
18       586     586    654    724    796    868   18
19       654     654    724    796    868    944   19
20       724     724    796    868    944   1018   20
21       796     796    868    944   1018   1101   21
22       868     868    944   1018   1101   1185   22
23       944     944   1018   1101   1185   1270   23
24      1018    1018   1101   1185   1270   1349   24
25      1101    1101   1185   1270   1349   1438   25
26      1185    1185   1270   1349   1438   1528   26
27      1270    1270   1349   1438   1528   1632   27
28      1349    1349   1438   1528   1632   1704   28
29      1438    1438   1528   1632   1704   1830   29
30      1528    1528   1632   1704   1830   1936   30
31      1632    1632   1704   1830   1936   2030   31
32      1704    1704   1830   1936   2030   2122   32
33      1830    1830   1936   2030   2122   2230   33
34      1936    1936   2030   2122   2230   2335   34
35      2030    2030   2122   2230   2335   2442   35
36      2122    2122   2230   2335   2442   2544   36
37      2230    2230   2335   2442   2544   2650   37
38      2335    2335   2442   2544   2650   2739   38
39      2442    2442   2544   2650   2739   2801   39
40      2544    2544   2650   2739   2801   2942   40
41      2650    2650   2801   2942   3011   3089   41
42      2739    2739   2801   2942   3011   3089   42
43      2801    2801   2942   3011   3089   3207   43
44      2942    2942   3011   3089   3207   3295   44
45      3011    3011   3089   3207   3295   3395   45
46      3089    3089   3207   3295   3395   3476   46
47      3207    3207   3295   3395   3476   3568   47
48      3295    3295   3395   3476   3568   3706   48
49      3395    3395   3476   3568   3706   3770   49
50      3476    3476   3568   3706   3770   3867   50

Page 21: Modeling Biological Variables using Neural Networks

Results- Neural Networks Comparisons

Parameter                      BP3         BP5         Ward

R squared:                     0.998       0.967       0.973
r squared:                     0.998       0.994       0.9986
Mean squared error:            3514        47376       38949
Mean absolute error:           51.198      190.369     182.532
Min. absolute error:           0.453       0.058       35.495
Max. absolute error:           118.192     516.575     415.5
Correlation coefficient r:     0.999       0.997       0.9993
Percent within 5%:             64.000      4.000       0
Percent within 5% to 10%:      20.000      34.000      28
Percent within 10% to 20%:     2.000       24.000      38
Percent within 20% to 30%:     2.000       32.000      12
Percent over 30%:              12.000      6.000       22
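The fit statistics in this comparison can be computed from paired actual and predicted values. The Python sketch below is a hedged illustration, not the software's own routine. It assumes that "R squared" is the coefficient of determination, 1 - SSE/SST, which the slides do not state. It takes "r squared" to be the square of the correlation coefficient r, which agrees with the reported figures (for example, 0.997 squared is about 0.994). The function name `fit_metrics` and the sample values are hypothetical.

```python
import math

def fit_metrics(actual, predicted):
    """Return R squared, r squared, MSE, and MAE for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))   # residual sum of squares
    sst = sum((a - mean_a) ** 2 for a in actual)                 # total sum of squares
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(sst * var_p)                             # Pearson correlation
    return {
        "R_squared": 1.0 - sse / sst,   # assumed definition: coefficient of determination
        "r_squared": r * r,
        "mse": sse / n,
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
    }

# Hypothetical values for illustration only.
actual = [77, 93, 113, 134, 159]
predicted = [80, 95, 110, 140, 150]
print(fit_metrics(actual, predicted))
```

Under these definitions, a network whose predictions track the shape of the data but are systematically offset keeps a high r squared while its R squared drops. That is one possible reading of rows where the two values differ.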

Page 22: Modeling Biological Variables using Neural Networks

BP3 NN Body Weight Prediction with Roush Test Data

[Figure: BWt (g) against Age (d), days 1-49. Series: Actual, BP3 Network, Act-BP3Net.]

Page 23: Modeling Biological Variables using Neural Networks

BP5- NN Body Weight Prediction with Roush Test Data

[Figure: BWt (g) against Age (d), days 1-49. Series: Actual, BP5 Network, Act-BP5Net.]

Page 24: Modeling Biological Variables using Neural Networks

Ward NN Body Weight Prediction with Roush Test Data

[Figure: BWt (g) against Age (d), days 1-49. Series: Actual, Ward Network, Act-WardNet.]

Page 25: Modeling Biological Variables using Neural Networks

Conclusion

• A holistic system approach encompasses all aspects of the problem instead of offering a fragmented solution.

• A simulation approach is efficient and feasible once the underlying variables and parameters of the problem are clearly understood and defined.

Page 26: Modeling Biological Variables using Neural Networks

Conclusion

• Ahmad, H. A., 2009. Poultry Growth Modeling using Neural Networks and Simulated Data. J. Appl. Poultry Res. 18:440-446.

• Ahmad, H. A., M. Mariano, 2006. Comparison of Forecasting Methodologies using Egg Price as a Test Case. Poultry Science 85:798-807.

Page 27: Modeling Biological Variables using Neural Networks

Conclusion

• Of all the NN architectures used, the BP3 NN gives the best prediction lines, with a near-perfect R² value (0.998).

• The NN modeling approach is an efficient and better alternative to its traditional statistical counterparts.

• Further research on energy and amino acid modeling will enhance our understanding of these modeling approaches.

Page 28: Modeling Biological Variables using Neural Networks

Egg Production Modeling

Page 29: Modeling Biological Variables using Neural Networks

Egg Production Modeling

Mathematical and Stochastic Egg Production Models

• Compartmentalization model (Grossman and Koops, 2001)

• Very complex

• Stochastic model (Alvarez and Hocking, 2007)

• Require too many variables to determine egg production.

• Impractical under most commercial situations.

Page 30: Modeling Biological Variables using Neural Networks

Data Collection

• Average weekly egg production data of 240 layers from 22 US commercial strains were graciously provided by David A. Roland of Auburn University.

• The data were collected in three phases: wk 21-36; wk 37-52; wk 53-66.

• Daily egg production data of the 22 strains were summarized as 7-day hen production means and standard deviations.

Page 31: Modeling Biological Variables using Neural Networks

Egg Production Curve

• The egg production curve in commercial layers follows a unique pattern:

• Production begins around 20 weeks of age, starting slowly at around 5%, and increases weekly for the next 7-8 weeks until it attains peak production of 95-97% around week 28.

• During the next 8-10 weeks, egg production maintains a plateau around its peak.

• During weeks 38-52 (phase II), egg production slowly starts to decline.

• During phase III (wk 53-72), production declines further until it reaches the point of non-profitability, around 60%.

Page 32: Modeling Biological Variables using Neural Networks

Data Collection

• The original data was used to map egg production curves in three different phases, compare curves among 22 commercial strains and compare strains laying white eggs with those laying brown eggs.

Page 33: Modeling Biological Variables using Neural Networks

Egg Production Curve

[Figure: Egg Production % (axis 0-120) against Age from wk 21.]

Page 34: Modeling Biological Variables using Neural Networks

Egg Production Curve-phase I

[Figure: Phase I. Egg Production % (axis 0-120) against Age from wk 21.]

Page 35: Modeling Biological Variables using Neural Networks

Egg Production Curve-phase II

[Figure: Phase II. Egg Production % (axis 91-99) against Age from wk 37.]

Page 36: Modeling Biological Variables using Neural Networks

Egg Production Curve-phase III

[Figure: Phase III. Egg Production % (axis 80-96) against Age from wk 53.]

Page 37: Modeling Biological Variables using Neural Networks

Comparative Egg Production

Brown vs. White Layers

[Figure: EP% (axis 0-120) against Age from wk 22, weeks 1-47. Series: Brown Av, White Av.]

Page 38: Modeling Biological Variables using Neural Networks

Comparative Egg Production Curves of US Commercial Strains-phase I

[Figure: Egg Production % (axis 0-120) against Age from wk 22, weeks 1-16.]

Only strain 1 was different from 13

Page 39: Modeling Biological Variables using Neural Networks

Egg Production Modeling

• For the current project, data from one of the brown-egg-laying strains were chosen for simulation and training of neural networks.

• Mean weekly egg weights and standard deviations from wk 22-36 were computed.

• These parameters were fed into the Normal distribution function of @Risk 4.0 software (Palisade Corporation).

• Six simulations with one thousand iterations each were performed.

Page 40: Modeling Biological Variables using Neural Networks

Data Simulation and Manipulation

• For each of the 15 sets of mean and standard deviation, one per week of production from week 22 to week 36, 20 data points were randomly drawn, giving a total of 300 observations for training the neural networks.

• Similarly, ten random data points were drawn for testing the neural networks.

Page 41: Modeling Biological Variables using Neural Networks

Data Simulation and Manipulation

• The 20 training observations and the 10 test observations for each week of egg production were each arranged in one row of a spreadsheet to determine 105 neural network examples, respectively.

Page 42: Modeling Biological Variables using Neural Networks

Neural Networks Training

• Starting from wk 22, the first four egg production observations were used as inputs and the fifth observation was used as the output, constituting one training example (epoch).

Page 43: Modeling Biological Variables using Neural Networks

Neural Networks Training

• The second training example consisted of the second, third, fourth, and fifth observations of egg production as inputs, with the sixth observation as the output.

• A total of 105 such examples (epochs) were generated from the simulated data.

• For NN testing, the same procedure of epoch generation was applied.

Page 44: Modeling Biological Variables using Neural Networks

NN Examples (Epoch)

Age        Eg.       I1      I2      I3      I4      O

wk22 d1    epoch1     6.67   12.62   21.11   54.44   10.86
wk22 d2    epoch2    12.62   21.11   54.4    10.86   24.06
wk22 d3    epoch3    21.11   54.44   10.86   24.06   14.16
wk22 d4              54.44   10.86   24.06   14.16   19.67
wk22 d5              10.86   24.06   14.16   19.67   40.96
wk22 d6              24.06   14.16   19.67   40.96   54.21
wk22 d7              14.16   19.67   40.96   54.21   46.47
wk23 d8    epoch8    32.09   24.4    33.3    48.34   78.37
wk23 d9              24.4    33.3    48.34   78.37   47.75
wk23 d10             33.3    48.34   78.37   47.75   52.19
wk23 d11             48.34   78.37   47.75   52.19   73.86
wk23 d12             78.37   47.75   52.19   73.86   50.92
wk23 d13             47.75   52.19   73.86   50.92   49.9
wk23 d14             52.19   73.86   50.92   49.9    44.69

Page 45: Modeling Biological Variables using Neural Networks

BP-3 Model

[Figure: BP-3 NN Strain 1, phase 1. EP% against Age (d). Series: Actual, BP-3NN, Act-Net. R² = 0.68]

Page 46: Modeling Biological Variables using Neural Networks

GRNN Model

[Figure: GRNN Strain 1, phase I. EP% against Age (d). Series: Actual, GRNN, Act-Net. R² = 0.72]

Page 47: Modeling Biological Variables using Neural Networks

WARD-5 Model

[Figure: Ward-5 NN Strain 1, phase I. EP% against Age (d). Series: Actual, Ward-5NN, Act-Net.]

Page 48: Modeling Biological Variables using Neural Networks

Results- Neural Networks Comparisons

Parameter BP-3 GRNN Ward-5

R squared: 0.681 0.715 0.697

r squared: 0.78 0.75 0.76

Mean squared error: 136.09 121.51 129.38

Mean absolute error: 7.81 7.28 7.40

Min. absolute error: 0.04 0.12 0.09

Max. absolute error: 46.54 43.54 48.40

Correlation coefficient r: 0.88 0.86 0.87

Percent within 5%: 45.71 49.52 50.48

Percent within 5% to 10%: 25.71 24.76 20.95

Percent within 10% to 20%: 16.19 15.24 18.10

Percent within 20% to 30%: 5.71 2.86 4.76

Percent over 30%: 6.67 7.62 5.71
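The "Percent within" rows partition the test examples by relative prediction error. As a check on that reading, 45.71% of 105 examples corresponds to 48 examples. The Python sketch below shows one way to compute such bands. The band boundaries (lower bound inclusive, upper bound exclusive), the use of the actual value as the denominator, and the function name `error_bands` are assumptions, not the software's documented behavior.

```python
def error_bands(actual, predicted):
    """Percent of examples whose relative error |actual - predicted| / |actual| falls in each band.

    Assumes every actual value is nonzero.
    """
    edges = [(0.00, 0.05, "within 5%"), (0.05, 0.10, "5% to 10%"),
             (0.10, 0.20, "10% to 20%"), (0.20, 0.30, "20% to 30%")]
    counts = {label: 0 for _, _, label in edges}
    counts["over 30%"] = 0
    for a, p in zip(actual, predicted):
        rel = abs(a - p) / abs(a)
        for low, high, label in edges:
            if low <= rel < high:
                counts[label] += 1
                break
        else:
            # No band matched, so the error is 30% or more.
            counts["over 30%"] += 1
    n = len(actual)
    return {label: 100.0 * c / n for label, c in counts.items()}

# Hypothetical values: one example per band.
actual = [100, 100, 100, 100, 100]
predicted = [102, 108, 115, 125, 140]
print(error_bands(actual, predicted))
```

Because every example lands in exactly one band, the percentages sum to 100, as they do (up to rounding) in each column of the table.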

Page 49: Modeling Biological Variables using Neural Networks

GRNN comparison with brown strains, phase I

[Figure: GRNN Brown strains, phase I. EP% (axis 0-120) against Age from wk 22, weeks 1-15.]

Page 50: Modeling Biological Variables using Neural Networks

GRNN comparison with white strains, phase I

[Figure: GRNN White strains, phase I. EP% (axis 0-120) against Age from wk 22, weeks 1-15. Series: White strains average, GRNN.]

Page 51: Modeling Biological Variables using Neural Networks

GRNN comparison with commercial strains, phase I

[Figure: Strains comparisons with GRNN, phase I-EP%. EP% (axis 0-120); x-axis labeled Strains, ticks 1-15. Series: Series1 through Series25, GRNN.]

Page 52: Modeling Biological Variables using Neural Networks

Conclusion

• Data simulation may offer a feasible alternative to expensive original data collection once the underlying variable’s parameters are defined and understood.

• Compared to other mathematical and statistical models, neural networks predicted the egg production curve accurately and efficiently.

Page 53: Modeling Biological Variables using Neural Networks

Conclusion

• All three NN architectures produced comparable results in terms of R², which varied from 0.681 to 0.715.

• All the networks over-predicted during the initial period when egg production was peaking.

Page 54: Modeling Biological Variables using Neural Networks

Conclusion

• Once the initial over-prediction anomaly is corrected, the NN modeling approach will be an efficient and superior alternative to its traditional mathematical and statistical counterparts for egg production prediction.

• Further research on energy and amino acid modeling will enhance our understanding of these modeling approaches.

Page 55: Modeling Biological Variables using Neural Networks

Conclusion

• Ahmad, H. A., 2011. Egg Production Forecasting: Determining efficient modeling approaches. J. Appl. Poultry Res. 20(#4):463-473. NIHMSID # 365792

Page 56: Modeling Biological Variables using Neural Networks

CD4+ Pattern Recognition using GRNN in HIV Patients

Page 57: Modeling Biological Variables using Neural Networks

CD4+ prediction with Viral Load using BP-3NN

Page 58: Modeling Biological Variables using Neural Networks

Immunology 101

• CD4 is a protein present on T-helper lymphocytes (blood cells that respond to foreign substances).

• T-helper cells do not kill the cells that carry the antigen; instead, they interact with B-lymphocytes or T-killer lymphocytes to respond to antigens.

• Simple tests are available for CD4 proteins, which in turn identify and count T-helper lymphocytes.

Page 59: Modeling Biological Variables using Neural Networks

BPNN for CD4 Prediction with VL

[Figure: NN Prediction. CD4 (axis 0-200) against Test data. Series: NN predicted CD4, Actual CD4. R² = .1392]

Page 60: Modeling Biological Variables using Neural Networks

GRNN for CD4 Prediction with VL

[Figure: CD4 (axis 0-1400) against Test Data. Series: Actual CD4, GRNN pred. CD4. R² = 0.194]

Page 61: Modeling Biological Variables using Neural Networks

GRNN Prediction for Viral Load with CD4

[Figure: Viral load (axis 0-1200000) against Test data. Series: Actual viral load, GRNN pred VL.]

Page 62: Modeling Biological Variables using Neural Networks

BP-3NN CD4 Prediction using Pattern

[Figure: CD4 (axis 0-1200) against Test data. Series: Actual CD4, NN CD4. R² = .84]

Page 63: Modeling Biological Variables using Neural Networks

High Blood Pressure vs. Obesity

Page 64: Modeling Biological Variables using Neural Networks

Ozone Modeling

Page 65: Modeling Biological Variables using Neural Networks

Salmonella Outbreaks