
Determination of wheat quality during the development of the grain using

MALDI-TOF mass spectrometry and multivariate data analysis

M.Sc. Thesis

Andrea Ghirardo

BioCentrum-DTU

Department of Biochemistry and Nutrition

Technical University of Denmark


UNIVERSITÀ DEGLI STUDI DI TORINO

Scuola Universitaria Interfacoltà per le Biotecnologie

DEGREE PROGRAMME IN BIOTECHNOLOGY, INDUSTRIAL TRACK

DEGREE THESIS

Determination of wheat quality during the development of the grain using MALDI-TOF mass spectrometry and multivariate data analysis

Supervisor: Prof. Rosa Pia FERRARI
Candidate: Andrea GHIRARDO

Co-supervisor: Prof. Massimo MAFFEI

Academic Year 2003-2004


Summary and introduction to the thesis

This thesis is the final project of my studies and was carried out entirely in Denmark, in the biochemistry and nutrition laboratory of BioCentrum-DTU, the biotechnology centre of the DTU engineering faculty (Technical University of Denmark, Copenhagen), through the European Erasmus exchange programme. The project grew out of a collaboration between Professor Rosa Pia Ferrari of the University of Turin and Professors Ib Søndergaard and Susanne Jacobsen of the Danish university. My decision to go abroad arose from several needs: above all, I wanted to acquire new scientific skills in a high-level laboratory at a major university such as DTU, and to have the opportunity to carry out an independent research project of my own, designing and developing it myself. This work yielded scientifically important results: I wrote up part of the project in an article that was published in Rapid Communications in Mass Spectrometry in January 2005 (appendix M1). The thesis was carried out over eleven months and was presented and defended at the final examination, earning the Danish Master Thesis with full marks (appendix N1).

The project concerns chemometric studies, by MALDI-TOF mass spectrometry, of the proteins of the gluten complex extracted from wheat. The aim of the project is to create a new, fast method for determining wheat quality during grain development. A further aim is to investigate the changes in the protein composition of gluten in varieties suitable for breadmaking. The characteristics that make a wheat variety suitable for breadmaking are linked to the gluten complex, which is formed by gliadins and glutenins. The gluten complex forms a network in the bread dough, and the viscoelastic properties of the gliadins and glutenins allow the dough to expand appropriately during leavening. Gluten composition is probably linked to bread quality. Gluten proteins are synthesised during grain development and are already present fifteen days after pollination (dpa). The grain continues to accumulate gliadins and glutenins until about 45 dpa, when it is ready for harvest. The aim is therefore a method that can determine wheat quality at an early stage of ripening. The work is based on the extraction and separation of gluten proteins from the flours of different wheat varieties: some varieties have the properties required for breadmaking (and therefore belong to the same class, called "baking quality"), whereas others lack the properties required for bread production (the "feeding quality" class) but can still be used for animal feed. The two quality classes were compared, and the subtle differences that distinguish them were sought. Since the goal was to determine quality at an early stage of grain development, samples were collected from 15 dpa up to 45 dpa.

MALDI-TOF mass spectrometry is the fast technique used to separate the proteins, and the spectra were compared by multivariate data analysis. Even if visual observation and comparison were possible, the analysis would still be subjective. Multivariate analysis, by contrast, offered objectivity and a very powerful tool for analysing and manipulating an enormous data table such as the one generated by the mass spectra. Using multivariate analysis and mass spectrometry, a method based on partial least squares regression (PLS-R) or on soft independent modelling of class analogy (SIMCA) was created that correctly predicts the unknown quality of new wheat samples. The results were outstanding and hold great promise for determining the quality of unknown samples as much as a month before harvest. This method could one day replace traditional methods, such as gel electrophoresis, which require long waiting times. The result is important because many industries have long been looking for a fast method that can be used while loading grain into the correct silo, thereby cutting the storage costs caused by long analysis times. Furthermore, because quality can be determined in advance, the method could be of practical use to farms that want information on the quality of a field under cultivation, by analysing a very small amount of sample.

During the analyses between 15 dpa and 45 dpa, it was also interesting to study how the composition of the gluten proteins changes during grain development. The 15 dpa and 45 dpa stages are clearly distinguishable: during this month the protein content changes profoundly and becomes more complex. Gliadins and glutenins with molecular weights in the ranges 30-42 kDa, 49-55 kDa and 60-72 kDa were found to characterise the late stage of grain ripening. Mass spectrometry combined with multivariate data analysis has thus proven to be a fast, powerful and effective method for predicting wheat quality and investigating changes in gluten composition.


Andrea Ghirardo S030340 45 points assignment

Determination of wheat quality during the development of the grain using MALDI-TOF mass spectrometry and multivariate data analysis.

The aim of this project is to develop a fast method for determining wheat quality before the grain is harvested. The quality of wheat depends on its end use. For breadmaking, the quantity and quality of the gluten proteins determine wheat quality. Gluten proteins are the major component of the storage proteins, which accumulate during grain development. Mass spectrometry is an efficient and fast technique for protein separation, and multivariate data analysis is suited to analysing complex data tables such as those generated by mass spectrometry. By selecting known wheat varieties that are suitable or unsuitable for breadmaking, extracting their gluten proteins and separating them by mass spectrometry, it is expected that multivariate data analysis can extract the differences between the two qualities. These differences can then be used to develop a method for determining the quality of unknown samples. Moreover, analysing extracts from different stages of development may make it possible to trace changes in the gluten protein composition of the grain. The assignment:

• Explorative multivariate analysis of the gluten proteins extracted from wheat of different qualities during grain filling

• Determination of the wheat quality at different stages of the grain development using MALDI-TOF MS and multivariate data analysis

• Study of the changes in gluten protein composition during the development of the grain
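Before any multivariate method can be applied, each spectrum has to become one row of a common data table, with the same columns for every sample. This section does not describe how that table is built, so the sketch below is only an illustration under stated assumptions: each spectrum is a hypothetical list of (m/z, intensity) peak pairs, the m/z axis is cut into equal-width bins (the window and bin width are arbitrary example values), and each row is scaled to unit total intensity. That simple normalisation is a stand-in chosen for brevity; it is not the MSC pre-processing listed among the abbreviations.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical peak format: (m/z, intensity) pairs from one MALDI-TOF spectrum.
Peak = Tuple[float, float]


def bin_spectrum(peaks: Iterable[Peak], mz_min: float, mz_max: float,
                 bin_width: float) -> List[float]:
    """Sum intensities into equal-width m/z bins, then scale the row to unit total."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    row = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:  # peaks outside the chosen window are ignored
            idx = int((mz - mz_min) // bin_width)
            row[min(idx, n_bins - 1)] += intensity  # guard against float edge cases
    total = sum(row)
    return [v / total for v in row] if total > 0 else row


def build_data_table(spectra: Sequence[Iterable[Peak]], mz_min: float,
                     mz_max: float, bin_width: float) -> List[List[float]]:
    """One row per spectrum (sample), one column per m/z bin."""
    return [bin_spectrum(p, mz_min, mz_max, bin_width) for p in spectra]


if __name__ == "__main__":
    # Two made-up spectra over an assumed m/z window of 10 000 to 80 000, bins 1000 wide.
    spectra = [
        [(30500.0, 2.0), (30900.0, 2.0), (50200.0, 4.0)],
        [(31200.0, 1.0), (65000.0, 3.0)],
    ]
    table = build_data_table(spectra, 10000.0, 80000.0, 1000.0)
    print(len(table), len(table[0]))  # 2 70
```

Binning trades mass resolution for a fixed column layout; a narrower bin keeps more detail but makes the table wider and more sensitive to small calibration shifts between spectra.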


Abstract

Breadmaking quality is related to the gluten complex, which is composed of gliadins and glutenins. The gluten complex forms a network in the dough, and the viscoelastic properties of gliadins and glutenins allow the dough to expand during fermentation. Gluten composition is likely related to breadmaking properties. Gluten proteins are synthesised during grain development and are already present fifteen days after pollination (dpa). The grain continues to accumulate gluten proteins until harvest, at around 45 dpa. The aim of this project is to develop a fast method for determining wheat quality before the grain is harvested. The work is based on separating the gluten proteins of varieties suitable and unsuitable for breadmaking and comparing them in order to find the differences between the two qualities. Samples were collected at different stages of grain development between 15 and 45 dpa, i.e. during the period in which the grain contains gluten proteins. Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) is the fast technique chosen for separating the proteins, and the comparison of the mass spectra is supported by multivariate data analysis. When many mass spectra must be compared at the same time, visual inspection becomes impossible, and it is in any case subjective. Multivariate data analysis offers objective and powerful tools for managing and handling large data tables such as those generated by MALDI-TOF MS. The mass spectra collected from the two wheat qualities are therefore examined by multivariate analysis in order to find differences that can be used to build a model for determining wheat quality.
Using multivariate analysis, two models for determining wheat quality were created, based on discriminant partial least squares regression (PLS-R) and on soft independent modelling of class analogy (SIMCA). The results are encouraging for predicting quality at 15 dpa (i.e. around one month before the grain is harvested). The analyses between 15 dpa and 45 dpa also made it possible to study how the gluten proteins of a variety or of a quality class change during grain development. The 15 dpa and 45 dpa stages are clearly distinguished; after one month of grain filling the gluten composition has become more complex. Multivariate analysis of mass spectra has thus proven to be a fast method for investigating changes in protein composition.
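Discriminant PLS-R turns classification into regression: each training spectrum receives a numeric class code, a PLS model is fitted between the data table and that code, and a new sample is assigned by the sign of its predicted value. The sketch below illustrates the idea with a single-response PLS (PLS1) fitted by the NIPALS algorithm in plain Python. The algorithm choice, the class coding (+1 for baking quality, -1 for feeding quality), the single latent variable and the toy four-bin "spectra" are all assumptions for illustration; they are not the settings or data of this thesis. The `rmsep` helper follows the usual definition of the root mean squared error of prediction between Y-pred and Y-ref.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def fit_pls1(X: Matrix, y: Vector, n_components: int):
    """Fit PLS1 by NIPALS on mean-centred data; return (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                               # y residuals
    components: List[Tuple[Vector, Vector, float]] = []
    for _ in range(n_components):
        # Weight vector w is proportional to E^T f (covariance with the response).
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this latent variable.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - t[i] * q for i in range(n)]
        components.append((w, p_load, q))
    return x_mean, y_mean, components


def predict_pls1(model, x: Vector) -> float:
    """Predict y for one sample by repeating the centring and deflation steps."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_load, q in components:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += t * q
        e = [a - t * b for a, b in zip(e, p_load)]
    return y_hat


def classify(model, x: Vector) -> str:
    """Assumed decision rule: a positive prediction means baking quality."""
    return "baking" if predict_pls1(model, x) > 0 else "feeding"


def rmsep(y_ref: Vector, y_pred: Vector) -> float:
    """Root mean squared error of prediction."""
    return (sum((a - b) ** 2 for a, b in zip(y_pred, y_ref)) / len(y_ref)) ** 0.5


if __name__ == "__main__":
    # Toy 4-bin "spectra": baking samples peak in bin 1, feeding samples in bin 2.
    X = [[1.0, 5.0, 1.0, 0.2], [1.2, 4.8, 0.9, 0.1], [0.9, 5.2, 1.1, 0.3],
         [1.0, 1.0, 4.9, 0.2], [1.1, 1.2, 5.1, 0.1], [0.8, 0.9, 5.0, 0.3]]
    y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
    model = fit_pls1(X, y, n_components=1)
    print(classify(model, [1.0, 4.9, 1.0, 0.2]))  # baking
    print(classify(model, [1.0, 1.1, 5.0, 0.2]))  # feeding
```

In practice the number of latent variables would be chosen by validation, for example by comparing RMSEP on held-out samples, rather than fixed by hand as in this toy case.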


Preface and acknowledgements

This thesis presents the final project of my studies. It was carried out entirely in Denmark as an Erasmus project, following a background in biotechnology at the University of Turin (U.N.I.TO), Italy. My decision to come to Denmark for my thesis came from the wish to acquire scientific knowledge and the experience of planning and carrying out a research project. Denmark was also a perfect destination to improve my English and to learn about a new culture. The laboratory is at BioCentrum-DTU, Department of Biochemistry and Nutrition, and the work was carried out over eleven months under the supervision of Ib Søndergaard, Susanne Jacobsen and Helle Aagaard Sørensen at DTU in Denmark and of Rosa Pia Ferrari in Italy. The scope of the project is chemometric studies, based on MALDI-TOF mass spectrometry, of storage proteins from the wheat gluten complex. The aim of the project is to create a fast method for determining wheat quality at different stages of grain development. In addition, the changes in gluten protein composition during grain filling are investigated for the varieties suitable for breadmaking. Thanks to Helle, for explaining wheat grain theory and for helping with the construction of the thesis. Thank you for your forbearance with me; I will always owe you a few boxes of chocolates! Special thanks to Marianne Peterson, for her collaboration, technical support and company during my project. She helped me keep my morale high during the worst period, and it was really nice to work in her company! Thanks to Susanne Blune for lending me your desk and for your nice company. Thanks to Nina, who threw my food out of the fridge every week: she helped me lose weight! Thanks to Haiko, for explanations about wheat proteins.
Thanks to all the staff of the department, who introduced me to Danish culture. Thanks to Ib and Susanne for their professional support and especially for giving me the opportunity to work professionally on scientific research. Thanks to my family: without your support I would not have had the opportunity to study. Last but not least, thanks to all my friends, especially those who supported me during the final period of writing my thesis. Thanks to Laura, for always being by my side; Jenny, Kate, Shih Fu, in some way you have all contributed to this project. Thanks to Denmark, the fabulous northern country that captivated me.


List of abbreviations

2-D: Two-dimensional
2-D PAGE: See [2-D] and [PAGE]
CA: Correspondence analysis
Da: Dalton
DTT: Dithiothreitol
g: Gyrate
HAc: Acetic acid
HMW: High molecular weight
HPLC: High-performance liquid chromatography
IEF: Isoelectric focusing
kDa: 1000 Da
LMW: Low molecular weight
M: Molarity
MALDI: Matrix-assisted laser desorption/ionisation
MALDI-TOF MS: See [MALDI], [TOF] and [MS]
min: Minute
MDS: Multidimensional scaling
MLR: Multiple linear regression
MS: Mass spectrometry
MSC: Multiplicative scatter correction
MW: Molecular weight
m/z: Mass-to-charge ratio
NaCl: Sodium chloride
PAGE: Polyacrylamide gel electrophoresis
PC: Principal component
PCs: Principal components
PCA: Principal component analysis
PCO: Principal coordinate analysis
PLS: Partial least squares
PLS-R: Partial least squares regression
R: Cophenetic correlation coefficient
RMSEP: Root mean squared error of prediction
RVYV: Residual validation Y variance
SA: Sinapinic acid
Sdev: Standard deviation
SDS: Sodium dodecyl sulphate


SDS PAGE: See [SDS] and [PAGE]
SIMCA: Soft independent modelling of class analogy
TOF: Time-of-flight
Y-pred: Predicted Y-values
Y-ref.: Reference (measured) Y-values


Index

Summary and introduction to the thesis
Assignment
Abstract
Preface and acknowledgements
List of abbreviations
Index

1. INTRODUCTION
1.1 Information about the project
1.2 Construction of the thesis

THEORY

2. WHEAT
2.1 General information
2.1.1 Introduction
2.1.2 Historical origin
2.1.3 Uses of wheat
2.1.4 Classification of wheat: species
2.1.5 Nutritional value
2.1.6 Production and economic importance
2.1.7 Criteria of wheat quality
2.2 Microscopic structure of the wheat
2.2.1 Main components of grain
2.2.2 Morphology of the grain
2.2.3 Wheat proteins
2.2.4 Storage proteins
2.2.5 Gluten proteins
2.2.6 Gliadins and glutenins
2.3 The development of the grain
2.3.1 Germination
2.3.2 Early growth
2.3.3 Stem elongation
2.3.4 Flowering and fertilisation - Pollen release
2.3.5 Grain growth after anthesis
2.3.6 Grain filling during the first 10 days (1-10 dpa)
2.3.7 Grain filling 11 to 16 days
2.3.8 Grain filling 17-21 dpa
2.3.9 Grain filling 21 to 30 days
2.3.10 From 30 to 45 dpa
2.3.11 Changes in protein composition during grain development

3. MULTIVARIATE DATA ANALYSIS
3.1 Introduction: Why multivariate data analysis?
3.2 Principal component analysis
3.3 Soft Independent Modelling of Class Analogy
3.4 Partial least squares

4. MALDI-TOF MASS SPECTROMETRY
4.1 Introduction
4.2 Basic principles of MALDI-TOF MS
4.3 Sample preparation
4.4 Matrix solution for MALDI-TOF

5. EXPERIMENTAL WORK
5.1 Experimental background
5.2 The dataset: varieties used
5.3 Extraction
5.4 Procedures of MALDI-TOF MS analyses
5.5 Multivariate data analysis
5.6 Pre-processing
5.7 Correlation of the peaks
5.8 Outliers

6. RESULTS AND DISCUSSION
6.1 Determination of wheat quality
6.1.1 Determination of baking and feeding quality and outlier detection by PCA
6.1.2 Determination of wheat quality by PLS-R
6.1.3 Prediction of unknown samples by PLS-R and SIMCA
6.2 Gluten proteins development
6.2.1 Study on development of varieties
6.2.2 Study on development of quality

7. CONCLUSION

8. REFERENCES

The Appendix is enclosed separately as a supplement to the thesis.


1. INTRODUCTION

• 1.1 Information about the project

• 1.2 Construction of the thesis


1. Introduction


1.1 Information about the project

The project was carried out at BioCentrum-DTU, Department of Biochemistry and Nutrition, Technical University of Denmark. Wheat is one of the most important cereals for human use: it sustains part of Western culture and is economically important for the Danish market. Wheat of a quality suitable for breadmaking is distinguished from wheat of unsuitable quality, which can be used for other purposes (e.g. animal feed). The project described in this thesis is based on the gluten complex, which is formed by two groups of storage proteins that are important for determining quality: the gliadins and the glutenins. Knowledge of the proteins responsible for wheat quality has been used to develop a fast method, based on MALDI-TOF MS and multivariate data analysis, for predicting the quality of unknown wheat samples at different stages of grain development. The method makes it possible to replace traditional, time-consuming methods such as gel electrophoresis. It was furthermore of interest to study the changes in gluten protein composition during grain filling. The overall purposes of the thesis are to:

• Develop a fast method for identifying wheat quality at different stages of development

• Follow the changes in gluten protein composition during grain filling.

The gliadins and glutenins are extracted from wheat flour and separated by MALDI-TOF MS. Multivariate data analysis is used to support data handling in all the analyses carried out in this project.


1.2 Construction of the thesis

The thesis consists of the following main chapters:

• Theoretical part
  o Wheat: chapter 2
  o MALDI-TOF mass spectrometry: chapter 3
  o Multivariate data analysis: chapter 4

• Experimental work: chapter 5

• Results and discussion: chapter 6

• Conclusion: chapter 7

The theoretical part provides the basis for the work and is divided into three chapters. Chapter 2 is devoted to wheat, the object of analysis: general information is introduced in section 2.1, the morphology, composition and proteins of the grain are described in section 2.2, and the development of the grain is analysed in section 2.3. The method for determining wheat quality is based on MALDI-TOF MS, a fast technique for protein separation, whose fundamental principles are described in chapter 3. The mass spectra are handled, managed and analysed by multivariate data analysis, also termed chemometrics; the methods used are given in chapter 4. The experimental work (chapter 5) describes the analytical procedures: extraction of gliadins and glutenins, separation of the proteins by MALDI-TOF MS, and pre-processing of the mass data before the chemometric analyses. The results and discussion part (chapter 6) presents the experiments and explains the results. The chapter is divided into sections according to the aims of the project. The determination of wheat quality (section 6.1) is first examined by principal component analysis (PCA) in section 6.1.1, and quality is then investigated by partial least squares regression (PLS-R) in section 6.1.2; the prediction methods, based on discriminant PLS-R and soft independent modelling of class analogy (SIMCA), are described in section 6.1.3. The results of the studies on changes in gluten protein composition during grain development are described in section 6.2. Finally, the conclusion summarises the results and compares them with the expectations of the project. References used throughout the thesis are listed in chapter 8. An appendix is enclosed separately as a supplement to the thesis.
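SIMCA, named above as one of the prediction methods, takes a different route from discriminant PLS-R: it builds a separate principal component model for each class and assigns a new sample to the class whose model describes it best. The sketch below is a deliberately simplified version under stated assumptions: one principal component per class, estimated by power iteration in plain Python, and assignment to the class with the smallest residual distance. A full SIMCA implementation also compares each residual with a statistical critical limit, so that a sample can belong to one class, several classes or none; that step is omitted here, and the toy data are invented for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
ClassModel = Tuple[Vector, Vector]  # (class mean, unit loading of PC1)


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_class_model(X: List[Vector], n_iter: int = 200) -> ClassModel:
    """Centre one class and estimate its first principal component by power iteration."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[v - m for v, m in zip(row, mean)] for row in X]
    v = [1.0] * p  # arbitrary start vector
    for _ in range(n_iter):
        scores = [_dot(row, v) for row in E]                                # E v
        v = [sum(E[i][j] * scores[i] for i in range(n)) for j in range(p)]  # E^T E v
        norm = _dot(v, v) ** 0.5
        if norm == 0.0:
            break  # no within-class variation along v; loading stays zero
        v = [x / norm for x in v]
    return mean, v


def residual_distance(x: Vector, model: ClassModel) -> float:
    """Norm of the part of x that the one-component class model cannot describe."""
    mean, v = model
    xc = [a - m for a, m in zip(x, mean)]
    score = _dot(xc, v)
    residual = [a - score * b for a, b in zip(xc, v)]
    return _dot(residual, residual) ** 0.5


def classify(x: Vector, models: Dict[str, ClassModel]) -> str:
    """Assign x to the class whose model leaves the smallest residual (no critical limit)."""
    return min(models, key=lambda name: residual_distance(x, models[name]))


if __name__ == "__main__":
    baking = [[1.0, 5.0, 1.0, 0.2], [1.2, 4.8, 0.9, 0.1], [0.9, 5.2, 1.1, 0.3]]
    feeding = [[1.0, 1.0, 4.9, 0.2], [1.1, 1.2, 5.1, 0.1], [0.8, 0.9, 5.0, 0.3]]
    models = {"baking": fit_class_model(baking), "feeding": fit_class_model(feeding)}
    print(classify([1.0, 4.9, 1.0, 0.2], models))  # baking
    print(classify([1.0, 1.1, 5.0, 0.2], models))  # feeding
```

Because each class is modelled on its own, adding a new quality class only requires fitting one more class model, without refitting the existing ones.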


THEORY

• 2. Wheat

• 3. MALDI TOF MS

• 4. Chemometrics classification and multivariate data analysis


2. WHEAT

• 2.1 General information about wheat

§ 2.1.1 Introduction § 2.1.2 Historical origin § 2.1.3 Uses of wheat § 2.1.4 Classification of wheat: species § 2.1.5 Nutritional value § 2.1.6 Production and economic importance § 2.1.7 Criteria of wheat quality

• 2.2 Microscopic structure

§ 2.2.1 Main components of the grain § 2.2.2 Morphology of the grain § 2.2.3 Wheat proteins § 2.2.4 Storage proteins § 2.2.5 Gluten proteins § 2.2.6 Gliadins and glutenins

• 2.3 Development of the grain

§ 2.3.1 Germination § 2.3.2 Early growth § 2.3.3 Stem elongation § 2.3.4 Flowering and fertilisation - Pollen release § 2.3.5 Grain growth after anthesis § 2.3.6 Grain filling during the first 10 days (1-10 dpa) § 2.3.7 Grain filling 11 to 16 days § 2.3.8 Grain filling 17-21 dpa § 2.3.9 Grain filling 21 to 30 days § 2.3.10 Dry down from 30 to 45 dpa § 2.3.11 Changes in protein composition during grain development


2. Wheat


2.1 General information

2.1.1 Introduction

Wheat, barley, rice and corn are the most widely grown cereals in the world, with more than 70% of the world's farmland devoted to the cultivation of cereal grains. Wheat and barley sustained Western cultures, rice dominated the Far East, and corn was predominantly used by the pre-Columbian cultures of the New World. Their widespread cultivation is connected to the importance of cereals in the human diet, the ease of farming them and their adaptation to different latitudes (1). Cereal grains provide more than half of the total calories consumed by humans (1). They feed humans and domestic animals, and several industrial products are also derived from them. Many varieties of cereals are adapted to local conditions almost everywhere in the world. Cereals have become so significant in human society that they support a large economic sector. Over the last two centuries, this interest has led to extensive investigation of cereals; in particular, the biochemical and genetic properties of cereal proteins have been researched. This thesis centres on one of the most studied cereals: wheat. The gluten complex plays a key role in breadmaking and has been studied intensively. Differences between varieties and qualities are used to select for optimal breeding and production. In Europe especially, wheat is mainly used for baking, and there are many studies on the wheat proteins involved in baking quality.

2.1.2 Historical origin

Wheat is believed to have originated in southwest Asia. Some of the earliest remains of the crop have been found in Syria, Jordan and Turkey. Primitive relatives of wheat have been discovered in some of the oldest excavations in the world, specifically in eastern Iraq, dating back 9,000 years. Other archaeological findings show that bread wheat was grown in the Nile Valley around 5,000 B.C., as well as in India, China and even England at about the same time. Wheat was first grown in the United States in 1602, on an island off the Massachusetts coast. Humans have depended on the wheat plant, for themselves and their animals, for thousands of years.

Figure 2.1 Wheat farmland at Bornholm, Denmark. (Andrea Ghirardo 2003)

2.1.3 Uses of wheat

Wheat has always been associated with everyday human food. It is nutritious, concentrated, easily stored and transported, and easily processed into various types of food (3). Today wheat is used in many foods: pasta, noodles, rolls, breads, biscuits, cakes, crackers, cookies, steamed bread, doughnuts, muffins, pancakes, waffles, pie crust, croissants, bagels, flat bread and chapattis, ice cream cones, macaroni, spaghetti, pizza and many other products. Much of the wheat used for livestock and poultry feed is a by-product of the flour milling industry. Wheat straw is used for livestock bedding, and the green forage may be grazed by livestock or used as hay or silage. Industrial uses of wheat grain include starch for paste, alcohol, oil and gluten (3), and the straw may be used for newsprint, paperboard and other products. In general, wheat uses can be divided into (4):

• Human food

• Animal feed

• Industrial use

2.1.4 Classification of wheat: species

Genetically, wheat can be divided into different groups, each derived from separate ancestors: Einkorn, Emmer and Spelt (16). The most important crops are four species of the genus Triticum:

• T. monococcum (diploid)

• T. turgidum (tetraploid)

• T. aestivum (hexaploid)

• T. compactum (hexaploid)

The species most used for baking purposes are T. aestivum and T. turgidum. T. turgidum also includes durum and dicoccum wheats; durum wheat is used to make pasta and is also called “pasta wheat”. The features of T. compactum are optimal for cake and cookie production (1). T. aestivum originated from a cross between a tetraploid species, T. turgidum, and a diploid species, Aegilops tauschii (2). In this project the experiments were carried out on different varieties of T. aestivum.

2.1.5 Nutritional value

Wheat is an important source of essential nutrients. Food produced from wheat contains carbohydrates, proteins, B vitamins, iron, calcium, phosphorus, zinc, potassium and magnesium, which are located in different parts of the grain (table 2.1) (2,1).

Table 2.1 Nutrients and their location in the grain fractions (1).

The main component of wheat is carbohydrate, mostly in the form of starch, which is embedded in a protein matrix. The composition of wheat may vary with variety and with growing and ripening conditions: temperature, water, sunlight and, more generally, weather and climate all have an influence. Soil composition is also relevant, and human intervention through fertiliser application is another important factor (2). The protein content of wheat is higher than that of other cereals, but it is characterised by a low amount of some essential amino acids, such as lysine, methionine and threonine (table 2.2) (2).

Table 2.2 Amino acid composition of prolamins: comparison between wheat, rye and barley (2).

The deficiency of these essential amino acids led civilizations to develop diets in which cereals were combined with small quantities of legumes, fish or meat. The fat content of cereals is very low. Barley, rice and wheat are major sources of fibre, B vitamins and minerals (2).

2.1.6 Production and economic importance

Wheat is the most widely grown cereal in the world and has the largest global acreage. World production was around 580 million tonnes in 2001 (figure 2.2) (5). The 1996 United States wheat crop alone was worth 9.7 billion dollars (NASS estimate).

Figure 2.2 Wheat is the most extensively grown and the oldest cultivated of all crops (5).

As mentioned, one reason for the large production of wheat is its importance in the diet of humans and domestic animals. Wheat is also used by industry to make several products, and it sustained Western culture. In Denmark wheat growing is a considerable part of the economy, several investments have been made in it, and cereal production has increased during the last decade. Denmark is self-sufficient in wheat. Table 2.3 shows the usable production, domestic use and percentage of self-sufficiency for each European Union member state.

Table 2.3 Usable production, domestic use and self-sufficiency for each European Union member state (6).

Wheat and barley are the two most economically important cereals for the Danish market: wheat accounts for approximately half of the total cereal harvest in Denmark.

2.1.7 Criteria of wheat quality

This thesis focuses on wheat quality, specifically breadmaking quality. This section describes what wheat quality means in general. The concept of quality is related to the end-use of the wheat. For breadmaking, the endosperm is milled into flour, and quality is related to the ability of the flour to make good bread. Wheat quality is therefore defined on the basis of physical, botanical or chemical characteristics that can separate grains into two groups: those suitable for a specific end-use and those unsuitable for it. For example, one of several ways of classifying wheat for baking and cooking end-uses is based on the physical properties of the kernel: hard red spring and hard red winter wheat flours are used for making bread, whereas soft red winter and soft white wheat flours are used for making cakes, cookies and pastries. Physical characteristics of the kernel were the first markers of quality to be used. For breadmaking, the relevant characteristics concern mainly the endosperm, and modern molecular studies have identified a new criterion for classifying breadmaking quality: the gluten complex (3). The role of the gluten complex in breadmaking quality has been demonstrated (3), and it is used in this thesis as the criterion for identifying wheat quality. The following section briefly describes the general criteria used to distinguish wheat quality, divided into two categories: physical and chemical characteristics.

Physical and chemical characteristics that determine wheat quality

The first parameters used for wheat quality classification were based on physical characteristics (table 2.4), such as weight or hardness. Other parameters are based on chemical characteristics, such as protein content, which is responsible for baking quality.

• PHYSICAL CHARACTERISTICS

• Weight per unit volume

• Kernel weight

• Kernel size and shape

• Kernel hardness

• Colour

• Impurities

• CHEMICAL CHARACTERISTICS

• Moisture content

• Protein content

• Protein quality

• α-Amylase activity

• Fat acidity

• Crude fibre and ash

Table 2.4 Physical and chemical characteristics (based on ref. (1)).

Moisture content and protein content are very important in wheat quality. Moisture content determines the ‘keeping quality’ of the wheat: dry wheat can be stored for years, whereas wet wheat may deteriorate within a few days. Moisture also has economic importance, because it is inversely related to the amount of dry matter in the wheat (1). Protein content varies between varieties and depends on the genome and on the environmental conditions during growth; depending on these conditions it may range between 8 and 20% (1). Abundant water during grain development leads to low protein content, while dry conditions favour high protein content. Nitrogen is needed for protein synthesis, and heavy nitrogen fertilisation increases the protein content of wheat (1). Another important chemical characteristic is the type of protein. Wheat proteins are generally low in essential amino acids such as lysine, threonine and methionine. Gluten quality is also a varietal characteristic, and high temperature and relatively low humidity during maturation of the wheat in the field have a deleterious effect on it. Variations in gliadin or glutenin composition are responsible for differences in the gluten complex, and gliadins and glutenins are implicated in breadmaking (7). Some gliadins have been found in varieties unsuitable for baking; although their role in breadmaking is still unknown, they may be used to recognise wheat quality. Among other characteristics, fat acidity indicates deterioration of the grain and hence a loss of quality. For example, when storage conditions are unfavourable, rapid breakdown of fats by lipases releases free fatty acids, which deteriorate the grain.
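The inverse relation between moisture and dry matter can be made concrete with a short calculation. The Python sketch below assumes that contents are expressed as percentages of the fresh (as-is) sample: dry matter is 100 minus moisture, and a component measured on the as-is sample is re-expressed on a dry-matter basis by dividing by the dry-matter fraction. The function names and the example sample are hypothetical illustrations, not measurements from this thesis.

```python
def dry_matter_percent(moisture_percent: float) -> float:
    """Dry matter as a percentage of the fresh (as-is) sample weight."""
    if not 0.0 <= moisture_percent < 100.0:
        raise ValueError("moisture must be in the range [0, 100)")
    return 100.0 - moisture_percent


def to_dry_basis(component_as_is: float, moisture_percent: float) -> float:
    """Re-express a component (e.g. protein, % of fresh weight) on a dry-matter basis."""
    return component_as_is * 100.0 / dry_matter_percent(moisture_percent)


# Hypothetical sample: 12.0% protein measured at 14.0% moisture.
print(dry_matter_percent(14.0))            # 86.0
print(round(to_dry_basis(12.0, 14.0), 2))  # 13.95
```

Read the other way round, a tonne of grain at 20% moisture contains only 800 kg of dry matter, which is why moisture carries a direct economic cost.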

Quality of wheat for breadmaking and importance of the gluten complex

Generally, the quality of wheat is established on the basis of its suitability for breadmaking; wheat that is not used for this purpose may be used for animal feed. Throughout this thesis the two qualities are referred to as baking quality (suitable for breadmaking) and feeding quality (not suitable). Differences in the dough are a sign of breadmaking quality, and the dough should be soft for bread. The characteristics of the dough are closely correlated with the gluten complex (8). Monomeric gluten proteins (gliadins) show viscous behaviour, and polymeric gluten proteins (glutenins) confer elasticity (9). The viscoelastic property of the gluten complex therefore allows the dough to expand during fermentation (8). The quantity and quality of the gluten proteins distinguish baking quality from feeding quality (3). Generally, when the end-use is breadmaking, the gluten must be strong and extensible (3), whereas minimal gluten is needed when the end-use is making cookies or cakes (3). The quality and quantity of these proteins, together with their structure, size and subunit composition, determine the differences in breadmaking quality.

2.2 Microscopic structure of the wheat

2.2.1 Main components of the grain

The main components in the structure of the mature spikelet are the rachilla, glume, palea, lemma, floret and awn (figure 2.3).

Figure 2.3 Main components of wheat in schematic representation (1).

Wheat grains are borne on a spike, or ear (1). The rachilla is the axis in the centre of a spikelet, and spikelets are arranged alternately on the main axis, or rachis. The mature spikelet consists of glumes at its base and florets arranged alternately along its length. In each floret, a grain is enclosed between two pales, or fertile glumes (1) (figure 2.4).

Figure 2.4 Spikelets are arranged on alternate sides of the rachis. The collar is a rudimentary spikelet which only rarely sets grain. Image (10).

2.2.2 Morphology of the grain

The wheat grain is botanically a single-seeded fruit called a caryopsis, in which the ripened ovary wall is fused to the seed (1). The main components in the seed are:

• Pericarp

• Seed coat

• Endosperm

• Aleurone

• Embryo

The characteristic morphological features of the grain are shown in figure 2.5, a cross-section of the wheat kernel.

Figure 2.5 The wheat kernel in cross section (11).

The pericarp, or fruit coat, is the ripened ovary wall, which is dead at harvest ripeness. It is composed of an outer epidermis, hypodermis, parenchyma, intermediate cells, cross cells and tube cells, and its function is to protect the seed (1). In the mature grain, pericarp protein accounts for less than 4% of the total protein in the grain. These proteins are highly insoluble and bound together (12).

The seed coat, sometimes called the “testa”, provides a covering around the seed; it is the outermost layer of the true seed and is fused to the pericarp. The testa regulates water absorption by the grain and also contains pigments, which give the grain its characteristic colour, from yellow to red-brown (2). The endosperm and aleurone tissues contain the storage reserves of protein and starch, and they play different roles during germination. The aleurone layer consists of a single layer of cells that provides an immediate source of energy and amino acids for the synthesis of enzymes during germination; these enzymes break down the starch and protein reserves of the endosperm. Starch granules and proteins are bound together in a matrix, and the strength of this binding influences the hardness of the grain. Grain hardness is also genetically influenced and is characteristic of each wheat variety. The protein content of the aleurone is about 20% (2). The embryo (also called the germ) is composed of the embryonic axis and the scutellum. It protects the shoot during germination and acts as a storage organ providing reserves during germination. The embryo is the potential new plant. The scutellum has a high concentration of enzymes and a high potential for enzyme synthesis, which can be activated during germination. The embryo contains proteins, lipids and sugars but no starch. The protein content of the germ is around 30% (2).

The proteins stored in the starchy endosperm act as stores of nitrogen, carbon and sulphur and are called storage proteins. They are deposited in protein bodies and are much more concentrated in the subaleurone region than in the rest of the endosperm. The endosperm is the tissue from which white flour is produced (figure 2.5) (1). The relationships between the physiological parts of the wheat kernel and its technological fractions are shown in figure 2.5.

Figure 2.5 Connection between the physiological parts of the wheat kernel and its grain fractions (11).

2.2.3 Wheat proteins

This section describes wheat proteins, which are the subject of this thesis. The first part gives an overview of the protein content of wheat, starting with Osborne’s nomenclature (2). Proteins are classified on the basis of protein solubility and biological functionality. The last part of the section concentrates on the gluten complex, giving specific references and more information about the gluten complex and its importance in breadmaking.

Wheat proteins are synthesised during the fruiting period of the plant. The amount and composition of the proteins in the grain are influenced by the availability of nitrogen during the fruiting period: if nutrients are scarce, the plant reduces the synthesis of storage proteins in order to maintain its metabolic proteins (13). Weather conditions, variety and environment influence the protein content of the wheat grain (14).

Wheat proteins are commonly classified by two systems. The first, traditional and older, is based on solubility and extractability in a series of solvents. The second, more modern, is based on functionality and molecular/biochemical relationships. The traditional classification was introduced by T.B. Osborne (1924) (2), and his classification and fractionation of proteins by solubility in different solvents is still relevant today. He concluded that the major proteins present in seed tissues fall into four groups: the water-soluble albumins, the salt-soluble globulins, the alcohol-soluble gliadins and the glutenins, which are soluble (or at least dispersible) in dilute acid or alkali solutions (table 2.5).

Table 2.5 Wheat proteins classified by solubility in different solvents, according to Osborne’s nomenclature (2).

Albumins: Soluble in water and coagulated by heating.

Globulins: Insoluble in water but soluble in saline solutions.

Gliadins: Soluble in ‘relatively strong’ (i.e. 60-70% v/v) alcohol but not in water or in saline solution.

Glutenins: Not soluble in neutral aqueous solutions or in saline or alcohol but may be extractable in alkali.
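The solubility criteria of Table 2.5 can be read as a sequential extraction in which each fraction is defined by the first solvent that dissolves it. The Python sketch below encodes that reading. The solvent labels and the function `osborne_fraction` are hypothetical names chosen for illustration, and the fixed extraction order (water, salt solution, 60-70% alcohol, dilute acid or alkali) is an assumption consistent with the definitions above, not a laboratory protocol from this thesis.

```python
from typing import Optional, Set

# Solvents in the order a sequential (Osborne-type) extraction would apply them,
# paired with the fraction defined by first dissolving in that solvent.
EXTRACTION_ORDER = [
    ("water", "albumins"),
    ("salt", "globulins"),
    ("alcohol_60_70", "gliadins"),
    ("dilute_acid_or_alkali", "glutenins"),
]


def osborne_fraction(soluble_in: Set[str]) -> Optional[str]:
    """Return the Osborne fraction of a protein, given the solvents it dissolves in.

    The first solvent in EXTRACTION_ORDER that dissolves the protein decides the
    fraction; None means no solvent in the series extracts it.
    """
    for solvent, fraction in EXTRACTION_ORDER:
        if solvent in soluble_in:
            return fraction
    return None


print(osborne_fraction({"water", "salt"}))          # albumins
print(osborne_fraction({"salt"}))                   # globulins
print(osborne_fraction({"alcohol_60_70"}))          # gliadins
print(osborne_fraction({"dilute_acid_or_alkali"}))  # glutenins
print(osborne_fraction(set()))                      # None
```

The `None` case corresponds to the insoluble residue that the solvent series fails to extract, and proteins with overlapping solubility are one reason the classes are not completely separated in practice.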

Most commonly, wheat proteins are divided into gluten proteins (generally about 75% of total wheat protein) and nongluten proteins (generally about 25% of total wheat protein) (11). Albumins and globulins are nongluten proteins; gliadins and glutenins are gluten proteins (table 2.6).

Nongluten proteins: albumins, globulins

Gluten proteins: gliadins, glutenins

Table 2.6 Classification into gluten and nongluten proteins.

Gluten proteins have low solubility in water and in dilute salt solutions. Factors that contribute to this low solubility are a low content of amino acids with ionisable side chains and a high content of non-polar amino acids and glutamine (8). Albumins and globulins have a globular (monomeric) form, and most of them have a low molecular weight (below 40 kDa) (13). This group accounts for approximately 25% of the total protein content (11). Its functional composition is heterogeneous: it contains metabolic enzymes that the plant needs during grain development, hydrolytic enzymes required for germination of the seed, enzyme inhibitors, and storage proteins used to store nitrogen and sulphur. Storage proteins are deposited in protein bodies located in the endosperm of the grain. Metabolic and functional proteins are located in the cytoplasm of the cells, especially in the aleurone layer and the embryo, but they may also be found in the endosperm (15). Gliadins and glutenins are storage proteins and account for about 75% of the total protein content (16). The classification based on solubility is undoubtedly the easiest to use, but it has some disadvantages. First, the classes are not completely separated: a fraction from an extraction procedure also contains proteins from other classes, so the limits between the classes are ambiguous. Second, the solvents cannot extract all the proteins (sometimes as much as 35% of the protein remains insoluble).

The second classification was introduced later and is based on two criteria: functionality and molecular/biochemical relationships (2). Proteins are divided on the basis of their biological function into metabolically active cytoplasmic proteins (e.g. enzymes), structural proteins, protective proteins and storage proteins:

• Storage protein: The major function is to act as a store of nitrogen, carbon, and sulphur.

• Structural and metabolic proteins: They are essential for the growth and structure of the seed.

• Protective proteins: These proteins play a role in providing resistance to microbial pathogens, invertebrate pests, or desiccation.

Some proteins serve more than one purpose. For example, β-amylase in cereals is a storage protein, but it also acts as a metabolic protein.

2.2.4 Storage proteins

Storage proteins provide a store of nitrogen and sulphur for use during germination. They make up about 80% of the protein content of the grain (17). Storage proteins are deposited in specific compartments of the cell called protein bodies, which are full of dense deposits and keep the proteins separated from the metabolic compartment of the cell (3). In cereals, protein bodies are generally located in the perisperm, endosperm and embryo. The majority of wheat storage proteins are located in endosperm cells, and gluten proteins form the major class of wheat storage proteins.

2.2.5 Gluten proteins

The importance of gluten proteins in breadmaking quality is largely related to their ability to form a viscoelastic network in the dough, the gluten complex, which is responsible for expansion during fermentation. Gluten proteins are distinguished either on the basis of solubility (two groups, gliadins and glutenins) or on the basis of amino acid composition and structure (S-rich, S-poor and HMW prolamins). This section discusses the gluten complex and the proteins that constitute it.

Gluten complex

After gentle washing of a flour-water dough in an excess of water or dilute salt solution, a rubbery mass containing about 80% of the total protein of the flour is obtained. This mass is called gluten, and this is the traditional method of gluten preparation (1). Gluten was first described by the Italian scientist Beccari in 1728 (18). The first systematic studies of wheat grain proteins were carried out by Osborne in 1907, and Dill later published a number of gluten studies (18). Gluten is the elastic, cohesive mass formed by two groups of proteins, gliadins and glutenins, collectively called prolamins (figure 2.6) (9).

Figure 2.6 Gliadin and glutenin in the gluten complex (19).

Gliadins and glutenins are distinguished by the pattern of disulphide bonds in their polypeptide chains. Gliadins are typically single polypeptide chains with intramolecular disulphide bonds, whereas glutenins consist of polymers stabilised by inter-chain disulphide bonds. In the glutenin polymers the high molecular weight glutenin subunits (HMW-GS) are linked head-to-tail via inter-chain disulphide bonds to form a linear backbone (4). The low molecular weight glutenin subunits (LMW-GS) form clusters, which are linked to the linear backbone via disulphide bonds. Glutenins contain a central repetitive domain and two non-repetitive domains at the N- and C-termini. The polypeptide chains of gliadins and glutenins are structurally related. Genetic studies indicate that HMW-GS are present in these polymers and that the proportion of glutenins in such polymers is related to breadmaking quality (20).

2.2.6 Gliadins and glutenins

Gliadins and glutenins are the proteins that form the gluten complex. Gluten is divided into monomeric gliadins, with MWs between 30 and 80 kDa, and a heterogeneous mixture of glutenin polymers, with MWs ranging from about 80 to several thousand kDa (21). Gliadins are separated on the basis of their mobility on SDS-PAGE at low pH and can be divided by their cysteine residues into three structurally distinct groups, α/β-, γ- and ω-type (11) (figure 2.7):

Figure 2.7 The classification and nomenclature of wheat gluten proteins separated by SDS-PAGE at low pH (2).

• α-type (six cysteine residues)

• β-type (six cysteine residues)

• γ-type (eight cysteine residues)

• ω-type (no cysteine residues)

The α-, β- and γ-types are also called “sulphur-rich prolamins” and the ω-type “sulphur-poor prolamins”.

The glutenins consist of polymers stabilised by inter-chain disulphide bonds. Treatment with disulphide-reducing agents (such as β-mercaptoethanol or dithiothreitol) releases the glutenin subunits (GS), which fall into two classes on the basis of molecular weight (SDS-PAGE):

• High molecular weight glutenin subunits (HMW-GS) (80-160 kDa)

• Low molecular weight glutenin subunits (LMW-GS) (30-45 kDa)

Gliadins and glutenins both have unusually high levels of proline and glutamine, which is why they are also called prolamins.
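Because this thesis analyses gluten proteins by mass, the two size classes above can be expressed as a simple rule on apparent molecular weight. The Python sketch below uses the SDS-PAGE ranges quoted above (80-160 kDa for HMW-GS, 30-45 kDa for LMW-GS). The function name is hypothetical, and returning "unassigned" for masses outside both windows is an assumption, since SDS-PAGE estimates and true masses need not coincide.

```python
# Apparent molecular-weight windows (kDa) for reduced glutenin subunits,
# taken from the SDS-PAGE ranges quoted in the text.
HMW_GS_RANGE_KDA = (80.0, 160.0)
LMW_GS_RANGE_KDA = (30.0, 45.0)


def glutenin_subunit_class(mw_kda: float) -> str:
    """Assign a reduced glutenin subunit to HMW-GS or LMW-GS by apparent MW.

    Values outside both windows are returned as 'unassigned' rather than
    being forced into the nearest class.
    """
    low, high = HMW_GS_RANGE_KDA
    if low <= mw_kda <= high:
        return "HMW-GS"
    low, high = LMW_GS_RANGE_KDA
    if low <= mw_kda <= high:
        return "LMW-GS"
    return "unassigned"


for mw in (38.0, 95.0, 60.0):
    print(mw, glutenin_subunit_class(mw))
# 38.0 LMW-GS
# 95.0 HMW-GS
# 60.0 unassigned
```

Leaving the gap between 45 and 80 kDa unassigned keeps the rule honest about what the two published ranges actually cover.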

Gliadins and glutenins can also be classified together into three groups: sulphur-rich (S-rich), sulphur-poor (S-poor) and HMW prolamins (22). This reclassification of the prolamins does not correspond directly to the gliadin and glutenin fractions; it is based on the amino acid composition and structure of the prolamins (22) (table 2.7).

Table 2.7 Table of prolamins (9).

Prolamin is the generic name for cereal storage proteins rich in proline and glutamine.

S-rich prolamins

In wheat, S-rich prolamins are classified into three groups (table 2.7). They are the quantitatively major component, accounting for approximately 80-90% of the total prolamin fraction. Their MW ranges from about 30 to 45 kDa, and they include both monomeric and polymeric components (23). The amino acid composition of S-rich prolamins is characterised by high levels of cysteine and methionine in addition to high levels of glutamine and proline (2). Their structures share several common features: they all consist of two domains, a domain of repeated sequences close to the N-terminus and a non-repetitive domain near the C-terminus (figure 2.8).

Figure 2.8 Schematic structures of typical S-rich, S-poor and HMW prolamins compared. Simplified from ref. (9).

S-poor prolamins

S-poor prolamins consist of monomeric proteins, also called ω-gliadins. They account for about 11% of the proteins in wheat. S-poor prolamins contain little or no cysteine and methionine and are predominantly monomeric (23). They contain high levels of glutamine and proline (70 mol%) and of phenylalanine (8-9 mol%), and their MW is reported to range from about 30 to 80 kDa (23).

HMW prolamins

HMW prolamins are relatively minor components, accounting for about 10% of the total prolamin fraction in wheat. They are characterised by a high glycine content in addition to high proline and glutamine. Their MW varies from about 65 to 90 kDa, and they are present only in high-MW polymers stabilised by inter-chain disulphide bonds (2). HMW prolamins are the HMW subunits of glutenin; four or five different HMW subunits are present in hexaploid bread wheat. HMW subunits are important for breadmaking quality (9). Their relationship to the other groups lies in regions A, B and C (figure 2.8), which are clearly related to those present in the C-terminal domains of the S-rich prolamins.

2.3 The development of the grain

This section describes the life cycle of wheat, from seed germination to harvest ripeness. Five short sections describe the development, followed by a longer section subdivided into parts describing grain filling. The grain after pollination (fertilisation) and its changes in protein content are discussed in greater detail.

• Germination (section 2.3.1)

• Early growth (section 2.3.2)

• Stem elongation (section 2.3.3)

• Flowering and fertilisation - pollen release (section 2.3.4)

• Grain growth after anthesis (section 2.3.5)

• Grain filling during the first 10 days, 1-10 dpa (section 2.3.6)

• Grain filling 11 to 16 dpa (section 2.3.7)

• Grain filling 17 to 21 dpa (section 2.3.8)

• Grain filling 21 to 30 dpa (section 2.3.9)

• Dry down from 30 to 45 dpa (section 2.3.10)

The release of pollen by the anthers is called anthesis, or flowering. Grain age is commonly measured in days after anthesis (dpa).
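Since grain age is expressed in dpa, a sampling day can be mapped onto the grain-filling phases listed above. In the Python sketch below the phase labels follow the section titles; because those titles share their boundary days (21 and 30 dpa), the sketch assumes that a boundary day belongs to the earlier phase. The function `grain_phase` and this boundary convention are illustrative assumptions, not definitions from the thesis.

```python
from typing import List, Tuple

# (last dpa of the phase, phase label); phases follow sections 2.3.6-2.3.10.
# Shared boundary days (21 and 30 dpa) are assigned to the earlier phase by assumption.
PHASES: List[Tuple[int, str]] = [
    (10, "grain filling, first 10 days (1-10 dpa)"),
    (16, "grain filling, 11-16 dpa"),
    (21, "grain filling, 17-21 dpa"),
    (30, "grain filling, 21-30 dpa"),
    (45, "dry down, 30-45 dpa"),
]


def grain_phase(dpa: int) -> str:
    """Return the development phase of a grain sampled `dpa` days after anthesis."""
    if dpa < 1:
        raise ValueError("dpa must be at least 1 (day 0 is anthesis itself)")
    for last_day, label in PHASES:
        if dpa <= last_day:
            return label
    return "after 45 dpa (beyond the phases described here)"


for day in (7, 15, 21, 25, 40):
    print(day, grain_phase(day))
```

Keeping the phases as an ordered list of upper bounds makes the boundary convention explicit in one place, so it can be changed if a different assignment of days 21 and 30 is preferred.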

2.3.1 Germination

A mature grain contains the embryo of the wheat plant and, in the endosperm, sufficient protein for the plant to start growing (1). The shoot apex of the embryo has three or four leaf primordia and a tiller bud, protected by the coleoptile (figure 2.9). The root pole has a structure called the coleorhiza, which protects the root until it has broken through the seed coat (1) (figure 2.9).

After sowing, germination starts with the absorption of water (imbibition), which triggers a series of events. The embryo sends out hormonal signals that induce the synthesis of hydrolytic enzymes in the aleurone, and the reserves stored in the endosperm are mobilised to the embryo (figure 2.10). The hydrolytic enzymes degrade the cell walls, starch and storage proteins of the endosperm; lipids, nucleic acids and mineral reserves are also degraded. The simple sugars and peptides released by these enzymes are absorbed by the scutellum and used by the growing embryo until the leaves emerge and photosynthesis begins (1) (figure 2.10).

Figure 2.9 The coleoptile protects the shoot, and the coleorhiza is the sheath that protects the root (1). Image (10); scale in mm.


2.3.2 Early growth

In this phase the leaves of the plant capture light energy from the sun and convert it into chemical energy through photosynthesis (figure 2.11). This energy supports the growth of the shoot and is ultimately stored in the harvested seed (figure 2.12) (1). The coleoptile is the robust structure that protects the first true leaf until it reaches the soil surface (figure 2.12).

The long vegetative phase produces a plant with several side shoots, called tillers. Tillers form at the base (axil) of the first-formed leaves of the main stem and of the coleoptile. Tillers in the axils of Leaf 1, Leaf 2 and Leaf 3 emerge first and are usually strong enough to grow to full canopy height and to set grain (figure 2.13). These tillers produce fewer leaves than the main stem, which has the effect of synchronising ear emergence, pollen release, grain growth and ripening for all the shoots in the canopy. Tillers in the axil of Leaf 4 and above, and any secondary tillers that form in the axils of the leaves of the tillers themselves, die back during the phase of rapid shoot growth in the spring. These late tillers contribute to the stored reserves of the plant (1).

Figure 2.10 Main components of the grain in section. The stored reserves are moved from the endosperm to the embryo, and the scutellum absorbs the sugars and peptides that feed the embryo. Image (10).

Figure 2.11 The leaves of the plant. Image (10).

Figure 2.12 The coleoptile and the first two leaves of the plant. Image (10).

Figure 2.13 Tillering in wheat. Image (10).

2.3.3 Stem Elongation

Stem elongation is the phase in which wheat in the field starts to extend its stems. The stem is still hidden inside the upper leaves, which continue to grow before they sequentially unfurl. The point at which the growing point (shoot apex) stops producing leaves and begins producing spikelets is termed "floral initiation" (figure 2.14). The spikelet is a sub-unit of the ear consisting of several florets on a thin axis, the rachilla. Each spikelet differentiates 8-12 floret primordia, each containing a single carpel and three anthers; each floret is a potential grain site (1). The stem is fully elongated at anthesis.


2.3.4 Flowering and Fertilisation - Pollen release

The flower parts mature while the ear is still protected within the leaves. Each spikelet produces 8-12 florets, but only a few of them survive. A few days after the ear emerges from the leaf sheath, when the anthers turn yellow, the flowering (anthesis) phase begins. Two pores appear at the tip of each anther, through which the pollen is shed. The pollen (figure 2.15) falls from the anthers onto the receptive, feathery stigmas of the carpel, which have unfolded to receive it (1). The first visible sign of anthesis is the appearance of spent anthers dangling outside the spikelets (figure 2.15).

Figure 2.14 Wheat plant in diagrammatic form: three tillers, nine main-stem leaves and the coleoptile internode. The ear at the top of the canopy is producing spikelets. Image (10).

Figure 2.15 Anthesis, the phase during which the pollen is released from the anther pores. Image (10).


2.3.5 Grain growth after anthesis

Grain growth is divided into five stages, each described in its own section together with its most significant changes and features. Grain age is counted in days after anthesis (dpa).

• Grain filling during the first 10 days (1-10 dpa) section 2.3.6

• Grain filling 11 to 16 dpa section 2.3.7

• Grain filling 17 to 21 dpa section 2.3.8

• Grain filling 21-30 dpa section 2.3.9

• Dry down from 30 to 45 dpa section 2.3.10

2.3.6 Grain filling during the first 10 days (1-10 dpa)

The first ten days after fertilization are characterized by rapid growth. This period is subdivided into two phases, 1-4 dpa and 4-10 dpa. In the first 3-4 days the nuclei divide synchronously inside the embryo sac that surrounds the fertilised embryo (figure 2.16). The embryo sac contains a large, central, liquid-filled vacuole with the nuclei, which are in a state of 'free nuclear division'. The duration of this period partly determines the final number of cells in the endosperm, into which the storage proteins will later be accumulated (1).

Figure 2.16 Longitudinal section of the whole grain at 4 days after fertilization. Image (10).


From 4 to 10 dpa the endosperm nuclei continue to divide rapidly, and both the outer maternal tissues and the liquid-filled embryo sac inside them grow very rapidly. By 7 days after flowering the embryo starts to show compartmentalization. At 10 days after flowering the grain is ready to enter the grain filling stages. At this time the densely cytoplasmic cellular endosperm, which has been nourishing the embryo at the bottom of the embryo sac, starts to degenerate (1).

2.3.7 Grain filling 11 to 16 dpa

From day 11 the grain starts filling. The meristematic cells of the endosperm continue to divide, and new compartments form within the endosperm. By 16 dpa the first large starch grains and the lipid and protein bodies are visible (figure 2.17) (1).

The protein bodies become visible as the storage proteins accumulate. The typical aleurone cells are first visible near the nucellar projection at 12 dpa (figure 2.17). At 16 dpa the scutellum is clearly defined, and the embryo now uses the endosperm starch reserves near the scutellum for its own development (1). The protein composition at this stage of development is dominated by the water-soluble and salt-soluble albumins and globulins (hereafter called 'non-prolamin proteins' for simplicity) (25). From 7 dpa to 15 dpa the amount of albumins and globulins starts to decrease, with a concomitant increase in gliadins (25). At 15 dpa the storage proteins account for around 20% of the total protein content of the grain (26). Proteins extracted from wheat-grain endosperm at 15 dpa and separated by 2-D electrophoresis show an abundance of the alpha-amylase inhibitor and alpha-amylase/trypsin inhibitor families and of several classes of protein disulphide isomerase (PDI) (26). The abundance of these families is expected at this stage of development: PDI is an enzyme that forms inter- and intra-chain disulphide bonds and is required for the correct folding of newly synthesised proteins. These enzymes catalyse the disulphide-based polymerisation of the gluten subunits, which is important for the dough-forming properties. The alpha-amylase inhibitor and alpha-amylase/trypsin inhibitor families are also synthesised during this phase of development; their high abundance could be explained by a need to protect the starch from premature degradation, since it must be preserved for germination (26). The amount of these proteins in the endosperm gradually decreases during grain filling (26), in agreement with the onset of high gliadin and glutenin synthesis at 15 dpa (26). Another protein that is highly expressed and follows the same pattern is the 60S acidic ribosomal protein. For the same reasons as for the amylase inhibitors and PDI, the concentration of the 60S acidic ribosomal protein class decreases during grain filling (26).

Figure 2.17 The endosperm at 15 dpa, divided into compartments that will later be packed with starch and proteins. nu, nucellus; en, endosperm; vg, ventral groove. Image (10).

2.3.8 Grain filling 17 to 21 dpa

Cell division in the endosperm has stopped, and during this stage the starch is packed into the compartments (figure 2.18). As these storage reserves become tightly packed, the surrounding cell layers are stretched and crushed. The increase in gliadins that started at 7 dpa continues until 21 dpa (25); this period is the most rapid phase of gliadin synthesis (25).

Figure 2.18 The endosperm at about 21 dpa, with the storage reserves starting to become visible within it. Image (10).


2.3.9 Grain filling 21 to 30 dpa

The cells of the aleurone are producing storage proteins, and vacuoles are filling within the cells of the aleurone layer, giving the granular appearance visible in figure 2.19. The embryo is now a fully developed miniature plant and is accumulating its own protein storage reserves, which provide nitrogen and sulphur during germination. The scutellum, essentially a massive storage organ that supports the embryo during germination, has started to adhere to the endosperm; it is the specific organ that transfers nutrients from the endosperm to the embryo (25).

2.3.10 From 30 to 45 dpa

The structure of the embryo is completed during the grain filling period, but it continues to receive storage reserves, as lipid droplets and protein bodies, until later. The grain loses water during the last 15 days; this stage is known as "harvest ripe". Embryo and aleurone now enter a state of dormancy. The grain is visibly ripe when its colour changes from golden to light brown and the ears bend over, allowing the ripe seed to fall away (figure 2.20). The farmer carefully monitors the moisture content of the grain so that the crop is harvested at the optimal time. This is important because, without a controlled process of desiccation, the internal biological processes of pre-germination and germination may start, and the grain can be ruined, with a consequent decrease in wheat quality (25).

Figure 2.19 Vacuoles filling within the cells of the aleurone layer, giving a granular appearance. endo, endosperm; al, aleurone; nu, nucellus. Image (10).


Environmental conditions combined with genetic factors influence wheat quality during the dry-down period. The principal problem that may arise is pre-harvest sprouting, which in wheat causes downgrading of grain quality. Factors implicated include the rate of water uptake and drying of the ear, grain dormancy, and the mobilisation of storage reserves to support germination. High temperature and heavy rain can cause sprouting in the ear. High amylase activity in the flour degrades starch to simple sugars, causing a 'sticky crumb' structure in the bread loaves and caramelisation of the crust during baking (25). At 45 dpa the protein composition has changed from that at 15 dpa: gliadins and glutenins are now the major groups of proteins in the grain (25).

Figure 2.20 Ripe crop with bent-over, light-brown ears. Image (10).


2.3.11 Changes in protein composition during grain development

The protein composition of the endosperm changes during grain development. The most evident shifts involve the gliadins, the glutenin subunits and the non-prolamin proteins (25).

Figure 2.21 Change in protein content during grain development. x-axis: days after anthesis; y-axis: percentage of proteins (25).

The early stage of development (10-15 dpa) is characterized by the water-soluble and salt-soluble albumins and globulins. The rate of glutenin synthesis is lower than that of gliadins until 30 dpa (25). At 7 dpa the percentage of glutenin proteins extracted from the endosperm is very low, even lower than that of gliadins (25). Glutenin synthesis increases at 14 dpa, and between 21 and 28 dpa a rapid increase begins that continues until 35-42 dpa; after this period there is no significant further increase in glutenin synthesis (25). The initially high content of albumins and globulins decreases throughout grain development (25). Conversely, gliadins and glutenins increase concomitantly from 7 to 35 dpa, when they are present in almost equal amounts (25). The most rapid phase of gliadin synthesis falls within this period, and at 21 dpa the proportion of gliadins is similar to that of the 'non-prolamin' proteins. The increase in gliadins, which starts at 7 dpa, shows the maximum rate of synthesis, while the increase in glutenins occurs 6-8 days later (25). HMWGS are synthesised before LMWGS. At 30 dpa the grain is characterized by a high level of gliadins and relatively low levels of glutenins and non-prolamins. At 45 dpa gliadins and glutenins are the major groups of proteins in the grain (25). These ratios may be influenced by the environment (25).


7 dpa: Albumins and globulins predominate (80%).

7-21 dpa: Rapid phase of gliadin synthesis.

21 dpa: High rate of gliadin synthesis and high non-prolamin content, while glutenins are low.

30 dpa: Gliadins, glutenins and non-prolamins are synthesised in almost equal proportions.

45 dpa: Proteins extracted from the endosperm are almost entirely gliadins and glutenins (80%), with a low content of non-prolamins (20%).

Table 2.8 Summary of changes in protein composition during grain development, based on ref. (25).


3. MULTIVARIATE DATA ANALYSIS

• 3.1 Introduction: Why multivariate data analysis?

• 3.2 Principal component analysis

• 3.3 Soft independent modelling of class analogy

• 3.4 Partial least squares


3.1 Introduction: Why multivariate data analysis?

Multivariate data analysis can extract reliable and relevant information from a complex data matrix without losing essential information, even when the dimensionality of the data exceeds what human cognitive capacity can handle (27). It also offers an objective and neutral way to evaluate the results of a complex analysis. Multivariate data analysis serves three main purposes (28):

1. Description of the data
2. Discrimination and classification
3. Regression and prediction

Description of the data is assisted by graphical plots, which are easier to interpret than a matrix of numbers; principal component analysis (PCA) is a method used for this purpose. Discriminant analysis separates data into groups on the basis of their dissimilarity, and new objects can then be classified into these groups. In classification methods such as soft independent modelling of class analogy (SIMCA), the classes must be known a priori, and new objects are classified on the basis of their distance to these classes. In regression, two sets of variables (X-variables and Y-variables) are related to each other to build a model of this relationship; the regression model is then used to predict the properties of new objects. Multivariate analysis has been applied to mass spectra to provide an easier and faster way of handling the data. It can uncover the hidden structure in the data by decomposing the original, very large data matrix into a few components from which the data can be reconstructed, thereby extracting the relevant information.


Fundamentals of multivariate data analysis

In multivariate data analysis an N×K matrix is examined, where the N rows are the objects [obj1…objN] and the K columns are the variables [var1…varK], as shown in figure 3.1 (28).

For mass spectrometric data, the objects (N) are the samples and the variables (K) are the peak intensities at given molecular masses in the mass spectrum (figure 3.2).
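As an illustration of how mass spectra can be arranged as the N×K matrix described above, the sketch below bins each spectrum's (m/z, intensity) pairs onto a shared m/z grid, giving one row per sample and one column per m/z bin. It is a minimal sketch under stated assumptions: the function name `spectra_to_matrix`, the fixed bin width and the summing of intensities within a bin are illustrative choices, not the preprocessing used in this work.

```python
import numpy as np

def spectra_to_matrix(spectra, mz_min, mz_max, bin_width):
    """Bin (m/z, intensity) pairs of each spectrum onto one shared m/z grid.

    Hypothetical helper. Returns an N x K matrix (one row per sample/object,
    one column per m/z bin/variable) and the bin edges. bin_width is assumed
    to divide (mz_max - mz_min) exactly.
    """
    edges = np.arange(mz_min, mz_max + bin_width, bin_width)
    n_bins = len(edges) - 1
    X = np.zeros((len(spectra), n_bins))
    for row, (mz, intensity) in enumerate(spectra):
        # Weighted histogram: sums the intensities whose m/z falls in each bin.
        binned, _ = np.histogram(mz, bins=edges, weights=intensity)
        X[row] = binned
    return X, edges
```

With real MALDI-TOF data the peaks of the same protein drift slightly between spectra, so the bin width trades resolution against robustness to such shifts.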

(Figure 3.2 plot: x-axis m/z, about 11000-71000, with the corresponding flight time in µs; y-axis abundance. Panel title: "Sum of 30 from 38 | Positive Polarity | No Filter".)

Figure 3.2 A mass spectrum: the peak intensities (abundance) are given on the y-axis and m/z on the x-axis.

Figure 3.1 The decomposition of data in a matrix NxK (28).


Multivariate data analysis offers several methods for analysing the matrix. Generally, they can be divided into 'supervised learning' and 'unsupervised learning' methods. Supervised learning methods extract the information in one data table (X-variables) that is relevant to another data table (Y-variables). The regression methods, such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLS-R), belong to supervised learning, as does SIMCA classification. Unsupervised learning methods extract the hidden information from a single data table on the basis of its mathematical structure. This group includes cluster analysis, principal coordinate analysis (PCO), non-linear metric scaling (NMS), correspondence analysis (CA) and principal component analysis (PCA). The methods are listed in table 3.1.

Supervised learning: MLR, PCR, PLS-R, SIMCA

Unsupervised learning: CA, cluster analysis, NMS, PCA, PCO

Table 3.1 Summary of the methods.

The methods used in this project are briefly described in the following subsections:

• Principal component analysis. Section 3.2

• SIMCA classification. Section 3.3

• Partial least squares regression. Section 3.4

The methods were taken from the software package "The Unscrambler", version 8.0 (CAMO A/S, Norway), and applied to the analysis of the mass spectra.


3.2 Principal component analysis

Principal component analysis (PCA) is an unsupervised method used to explore the data matrix. It decomposes a complex multidimensional data set into a smaller set of dimensions that contain the most important information. This reduction of dimensions makes the information in the data easier to investigate; for example, two-dimensional plots give a visual overview of the data structure and reveal relationships between the samples. PCA decomposes the data matrix into a structure part and a noise part (28):

$$X = \underbrace{T P^{T}}_{\text{structure}} + \underbrace{E}_{\text{noise}} \qquad \text{(Equation 3.1)}$$

where X is decomposed into the matrix product T·P^T plus a residual matrix E. T is the score matrix, P^T the transposed loading matrix and E the error component (residuals) (28). In this way a large number of possibly correlated variables is transformed into a smaller number of uncorrelated variables, called principal components (PCs) (28).

Writing the product T·P^T column by column gives the PC model as a sum of A components (Equation 3.2), from which the new PC coordinate system is calculated, as shown in figure 3.3:

$$X = t_1 p_1^{T} + t_2 p_2^{T} + \dots + t_A p_A^{T} + E \qquad \text{(Equation 3.2)}$$

Figure 3.3 The PC model as a sum of individual PCs, each the product of a score vector t_i (one value per sample, i.e. per data row) and a loading vector p_i (one value per variable, i.e. per data column), plus the unmodelled residual E (28). The relationships of the PCs to the samples are called scores, and those to the variables are called loadings (28).
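The decomposition in equations 3.1 and 3.2 can be illustrated with a short NumPy sketch that obtains scores and loadings from the singular value decomposition (SVD) of the data matrix. It assumes column-wise mean-centring (common practice, but not stated in equation 3.1) and uses SVD rather than the NIPALS algorithm; the function name `pca` is hypothetical, and this is not the implementation in The Unscrambler.

```python
import numpy as np

def pca(X, n_components):
    """PCA of an N x K matrix via SVD (hypothetical helper).

    Returns scores T (N x A), loadings P (K x A) and residuals E such that
    the mean-centred matrix equals T @ P.T + E (Equation 3.1).
    """
    X_c = X - X.mean(axis=0)                      # column-wise mean-centring
    U, S, Vt = np.linalg.svd(X_c, full_matrices=False)
    P = Vt[:n_components].T                       # orthonormal loading vectors p_a
    T = U[:, :n_components] * S[:n_components]    # score vectors t_a
    E = X_c - T @ P.T                             # unmodelled residual
    return T, P, E
```

The columns of T are mutually orthogonal (uncorrelated PCs), the loadings are orthonormal, and the first PC carries the largest share of the variation; with as many components as the rank of the centred matrix, E vanishes.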


The first PC covers as much as possible of the variation in the data set, and each subsequent component covers as much as possible of the remaining variation (28). The new variables are uncorrelated. The first PC is the line that best describes the main structure in the multidimensional observations, and this PC axis represents the first dimension of a new coordinate system. The second PC axis follows the maximal variation in the remaining part of the data set and is orthogonal to the first, as shown in figure 3.4 (28).

Figure 3.4 Left: the data in the original coordinate system. After PCA, the first two PC axes replace the 'old' coordinate system (28).

The decomposition of the matrix into new uncorrelated variables continues until all the variation is explained. The information in the data set then becomes accessible by simple visual inspection, because the original data matrix has been reduced to a few dimensions that are ready for interpretation. Figure 3.5 shows an example in which the data set is transformed into a two-dimensional space.

Figure 3.5 Schematic representation of the transformation of a data set into a two-dimensional space by PCA (28).

Validation verifies how well a model will perform on new samples taken from the same population as the calibration samples. It is expressed in terms of variance (equation 3.3), which is in squared units of the original values (28).


$$\mathrm{Var}(X) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad \text{(Equation 3.3)}$$

The relevant number of principal components is chosen on the basis of the explained variance. The explained variance increases progressively with the number of principal components until it levels off, so that adding further components no longer increases it appreciably. This limit is the maximum explained variance; it measures how much of the original variation in the data is described by the model. At this point the residual E has reached the noise level (28).
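Equation 3.3 and the choice of the number of PCs from the explained variance can be sketched as follows. The cumulative explained variance is computed from the squared singular values of the mean-centred matrix, on the calibration data only (no validation). The function names are hypothetical, and `choose_n_components` with its `min_gain` threshold is an illustrative stopping rule, not the criterion used in this work.

```python
import numpy as np

def sample_variance(x):
    """Equation 3.3: sum of squared deviations from the mean divided by n - 1."""
    x = np.asarray(x, dtype=float)
    return ((x - x.mean()) ** 2).sum() / (len(x) - 1)

def explained_variance(X):
    """Cumulative explained variance (%) for 1, 2, ... principal components."""
    X_c = X - X.mean(axis=0)
    s = np.linalg.svd(X_c, compute_uv=False)
    ss = s ** 2                       # sum of squares captured by each PC
    return 100.0 * np.cumsum(ss) / ss.sum()

def choose_n_components(cum_ev, min_gain=1.0):
    """Hypothetical rule: stop when the next PC adds less than min_gain percent."""
    gains = np.diff(np.concatenate(([0.0], cum_ev)))
    for a, g in enumerate(gains):
        if g < min_gain:
            return max(a, 1)
    return len(cum_ev)
```

In practice the explained variance would be taken from validation (e.g. cross-validation) rather than calibration, so that components that only fit noise are not counted.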


The principles and objectives of PCA can be summarized as follows. PCA is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PCs). The first PC accounts for as much of the variability in the data set as possible, and each successive PC is orthogonal to the previous ones and covers as much of the remaining variation as possible. The method is based on an orthogonal rotation to the principal axes (28). Table 3.2 summarizes the main objectives and principles of PCA.

Objectives of Principal Component Analysis

• To discover or to reduce the dimensionality of the data set.

• To identify new meaningful underlying variables.

Principles of Principal Component Analysis

• Principal Component Analysis determines factors from a data table by creating new 'pseudo-variables' (principal components) as linear combinations of the original variables.

• The factors (principal components) are found to explain as much information (variance) in the data as possible.

• The new variables eliminate redundant information (and filter noise) from the original data table.

Table 3.2 Summary of the main objectives and principles of PCA.


3.3 Soft Independent Modelling of Class Analogy

Soft Independent Modelling of Class Analogy (SIMCA) is a method for classifying samples by modelling the similarities between samples of the same class. First, each class is described by a PCA model (class model) of its samples in the training set (28). When the classes are known a priori, the class models can be created easily, and new samples can then be predicted. Each new sample is fitted to the class models: if it is similar enough to the members of a class it is assigned to that class, otherwise it is rejected (28).
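A simplified SIMCA sketch is given below, assuming one PCA model per class and a fixed acceptance limit on the ratio between a sample's residual standard deviation and the class's pooled residual standard deviation. Full SIMCA implementations typically use an F-test and may also take leverage into account; the class name `ClassModel`, the function `simca_classify` and the `limit=2.0` value are hypothetical.

```python
import numpy as np

class ClassModel:
    """Simplified SIMCA class model: a PCA model of one class's training samples."""

    def __init__(self, X, n_components):
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.P = Vt[:n_components].T                 # class loadings (K x A)
        E = Xc - Xc @ self.P @ self.P.T              # training residuals
        n, k = X.shape
        dof = (n - n_components - 1) * (k - n_components)
        self.s0 = np.sqrt((E ** 2).sum() / dof)      # pooled residual SD of the class

    def distance(self, x):
        """Residual standard deviation of a new sample x relative to this class model."""
        xc = x - self.mean
        e = xc - self.P @ (self.P.T @ xc)            # part of x the class PCs cannot explain
        return np.sqrt((e ** 2).sum() / (len(x) - self.P.shape[1]))

def simca_classify(x, models, limit=2.0):
    """Names of all classes whose model accepts x; may be empty or contain several."""
    return [name for name, m in models.items() if m.distance(x) / m.s0 < limit]
```

Because each class is modelled independently, a sample can be accepted by several classes or rejected by all of them, which is the "soft" aspect of SIMCA.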

3.4 Partial least squares

Partial least squares regression (PLS-R) is a supervised method that relates two data matrices (X and Y) by regression. The principle of PLS-R is the same as that of PCA: to find the best straight line through the data points in a multidimensional space, explaining most of the variation (28). In PLS-R the purpose is to build a linear model from one data table in order to predict a desired characteristic in another. Thus, whereas PCA extracts hidden information from a single data table, the X-matrix, PLS-R examines the relationships between two data tables, the X- and Y-matrices (28). The X-matrix has dimension N×K and the Y-matrix has dimension N×J, where N is the number of samples and K and J are the numbers of X- and Y-variables, respectively (27). PLS-R works by performing a PCA-like decomposition of both the X- and the Y-matrix, but the two are not performed independently: during the algorithm the score values are exchanged between the two decompositions, which connects the X- and Y-values in the final model (28).

$$X - \bar{x} = T_x P' + E \qquad \text{(Equation 3.4)}$$

$$Y - \bar{y} = T_x Q' + F \qquad \text{(Equation 3.5)}$$

$$Y = B_0 + X B + F \qquad \text{(Equation 3.6)}$$

Equations 3.4, 3.5 and 3.6 describe the PLS-R model: X − x̄ and Y − ȳ are the mean-centred data matrices, E and F the residuals, P' and Q' the loading matrices (28), T_x the score matrix, B the regression coefficient matrix (K×J) and B0 the intercept (1×J) (27). Combining the equations gives a prediction model in which Y can be predicted from a set of X-variables (28). PLS-R is commonly used in multivariate data analysis to create prediction models, and also for discrimination (discriminant PLS-R).
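The PLS-R model of equations 3.4-3.6 can be sketched for a single Y-variable (PLS1) with the NIPALS algorithm. This is a minimal sketch assuming mean-centred data and one response; the function name `pls1` is hypothetical, and The Unscrambler's own algorithm and options may differ. The returned `B` and `B0` play the roles of B and B0 in equation 3.6.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 regression by NIPALS (hypothetical helper).

    X is N x K, y has length N. Returns (B, B0) such that y_hat = B0 + X @ B.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean           # mean-centred X - x_bar and y - y_bar
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)             # weight vector: direction of max covariance with y
        t = Xk @ w                            # X-scores (a column of T_x)
        tt = t @ t
        p = Xk.T @ t / tt                     # X-loadings (a column of P)
        qa = yk @ t / tt                      # y-loading (an element of Q)
        Xk = Xk - np.outer(t, p)              # deflate X
        yk = yk - qa * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)       # regression coefficients, B = W (P'W)^-1 q
    B0 = y_mean - x_mean @ B                  # intercept
    return B, B0
```

With as many components as X-variables (and a full-rank X), PLS1 reproduces the ordinary least-squares fit; using fewer components gives a more stable model when the X-variables are strongly correlated, as neighbouring m/z intensities usually are.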


The Validation

In PLS-R, validation is assessed with two parameters. The residual validation Y-variance (RVYV) measures the residual between the measured (Yref) and predicted (Ypred) values, that is, the modelling error (figure 3.6, step 3); the optimal dimensionality is the number of components that gives the lowest RVYV (28). Different models are compared using the square root of RVYV, called the root mean square error (RMSE) (28).

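$$\mathrm{RVYV} = \frac{\sum_{i=1}^{n} \left(Y_{\mathrm{pred},i} - Y_{\mathrm{ref},i}\right)^2}{n} \qquad \text{(Equation 3.7)}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(Y_{\mathrm{pred},i} - Y_{\mathrm{ref},i}\right)^2}{n}} \qquad \text{(Equation 3.8)}$$

The X-data are used to build a model, and the X-values are then inserted into the model to predict Y. The modelling error is obtained by subtracting the reference Y-data from the predicted Y-values (figure 3.6) (28).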
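Equations 3.7 and 3.8 translate directly into code. The sketch below uses plain Python; the function names `rvyv` and `rmse` are hypothetical, and any example values (for instance, predicted versus reference grain ages in dpa) are made up for illustration.

```python
import math

def rvyv(y_pred, y_ref):
    """Residual validation Y-variance (Equation 3.7): mean squared modelling error."""
    if len(y_pred) != len(y_ref) or not y_ref:
        raise ValueError("y_pred and y_ref must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(y_pred, y_ref)) / len(y_ref)

def rmse(y_pred, y_ref):
    """Root mean square error (Equation 3.8): square root of RVYV, in the units of Y."""
    return math.sqrt(rvyv(y_pred, y_ref))
```

Because RMSE is expressed in the same units as Y, it is the easier of the two to interpret when comparing models, for example as an average prediction error in dpa.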

Figure 3.6 Schematic explanation of PLS-R: (1) Xref and Yref are used to build the model; (2) Xref is inserted into the model to give Ypred; (3) modelling error = Ypred - Yref.

The deviation expresses how similar the prediction samples are to the calibration samples used to build the model. The deviation is small when the predicted samples are similar to the calibration samples; when it is high, the predicted Y-values cannot be trusted (28). The last important parameter is the correlation coefficient (r), defined as the correlation between X and Y (equation 3.9):


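$$r = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2}\;\sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \qquad \text{(Equation 3.9)}$$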

The correlation is a measure of the linear relationship between two variables. An absolute value of 1 means that an exact linear relationship exists between the variables, whereas a value of 0 means that the variables are not linearly correlated (28):

$$r = \frac{\text{covariance between } x \text{ and } y}{\mathrm{Sdev}(x) \cdot \mathrm{Sdev}(y)}$$
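Equation 3.9 can likewise be computed directly: the sketch divides the sample covariance by the product of the two sample standard deviations, all with the n - 1 denominator, so the denominators cancel exactly as in the formula. The function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient r (Equation 3.9): covariance / (Sdev(x) * Sdev(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)
```

For example, perfectly proportional sequences give r = 1, reversed ones give r = -1, and a symmetric pattern with zero covariance gives r = 0.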


Martens' uncertainty test and significant variables

In a model, a variable is significant when there is only a small probability that its effect is due to chance. The Unscrambler implements a method based on Martens' uncertainty test to identify the significant X-variables; it is available as an option when PLS-R is run with cross-validation. The method was developed by Harald Martens and is a modification of the jack-knifing method. The uncertainty test is based on the sum of squared differences, calculated in cross-validation, between the coefficients of the individual sub-models and those of the full model. It reflects the stability of the model and is computed by removing one or more samples at a time. In this way unreliable variables can be removed, and the model can be recalculated in a simplified and more reliable form. The approximate uncertainty variance of the PLS-R coefficient b is estimated by jack-knifing:

$$s_b^2 = \left(\sum_{m=1}^{M} (b - b_m)^2\right)\frac{N-1}{N} \qquad \text{(Equation 3.10)}$$

where N = number of samples; M = number of cross-validation segments; s²_b = estimated uncertainty variance of b; b = regression coefficient at the cross-validated optimal number of components, using all N samples; b_m = regression coefficient at the cross-validated optimal number of components, using all samples except those left out in cross-validation segment m.

In this work the uncertainty test is used to assess the stability of the models, and the significant variables are used to recalculate the models.
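The jack-knife estimate of equation 3.10 can be sketched as follows. The coefficient-fitting function is passed in as a parameter; for brevity the example uses ordinary least squares (`ols`) as a stand-in for PLS-R and splits the samples into contiguous cross-validation segments. Both choices, and the names `jackknife_uncertainty` and `ols`, are assumptions for illustration. A variable whose coefficient is large compared with its uncertainty, |b|/s_b, is a candidate significant variable.

```python
import numpy as np

def jackknife_uncertainty(X, y, fit, n_segments):
    """Approximate uncertainty variance of regression coefficients (Equation 3.10).

    fit(X, y) must return a coefficient vector b. Samples are split into
    n_segments contiguous cross-validation segments (an assumption).
    """
    N = X.shape[0]
    b = fit(X, y)                                  # full-model coefficients
    segments = np.array_split(np.arange(N), n_segments)
    sq = np.zeros_like(b)
    for seg in segments:
        keep = np.setdiff1d(np.arange(N), seg)     # all samples except segment m
        b_m = fit(X[keep], y[keep])                # sub-model coefficients
        sq += (b - b_m) ** 2
    s2 = sq * (N - 1) / N
    return b, s2

def ols(X, y):
    """Stand-in fit: ordinary least squares with an intercept column (not PLS-R)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]                                # drop the intercept
```

If every sub-model gives the same coefficients as the full model, s²_b is zero and the coefficient is perfectly stable; large scatter between sub-models flags an unreliable variable.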


4. MALDI-TOF MASS SPECTROMETRY

• 4.1 Introduction

• 4.2 Basic principle of MALDI-TOF MS

• 4.3 Samples preparation

• 4.4 Matrix solution for MALDI-TOF MS


4.1 Introduction

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is an efficient separation technique that determines the molecular weight of chemical compounds (29). The physical principle of mass spectrometry is based on the movement of ions of differing mass-to-charge (m/z) ratio in an electric field, which accelerates them so that ions of equal charge acquire the same kinetic energy. The ions then traverse a field-free region, and their flight time through this region depends on their m/z ratio (29). Mass spectrometry has emerged as an important tool in scientific research, and the technique allows the analysis of molecules with masses above 300 kDa.

4.2 Basic principles of MALDI-TOF MS

A mass spectrometer can be divided into several parts: the sample inlet, an ion source, the ion optics, a mass analyser, a detector, a vacuum system, a repeller, an instrument control system and a data system (figure 4.1). The instrument is operated from a computer, which also acquires the data.

Figure 4.1 Scheme and main components of MALDI-TOF MS (29)

Before the analysis starts, a mixture of analyte and matrix is placed on the probe tip and dried to form crystals. The probe carrying the crystals is inserted into the sample inlet, and the proteins are ionised by a laser beam. The ions are predominantly singly charged, and they are brought from the solid state into the gas phase by the pulsed laser light, as illustrated in figure 4.2.


Figure 4.2 Simplified representation of matrix-assisted laser desorption/ionisation (29)

The ions then enter the flight tube, a metal tube that houses the ion optics and the detector and provides a fixed distance over which the ions travel under high vacuum. The ions travel down the flight tube at a constant speed that depends only on their m/z ratio, so heavier ions take longer to reach the detector than lighter ions. The electric field that accelerates the ions (positive or negative) away from the probe tip and starts them on their way down the flight tube is provided by the ion optics. The repeller, a component of the ion optics, applies a voltage of ±28 kV that drives the ions away from the probe tip and into the flight tube. At the end of the flight, a detector converts the arriving ions into a signal, which is read and processed by the computer. The outcome of the analysis is a mass spectrum, in which the x-axis is the mass-to-charge ratio (m/z) and the y-axis an abundance value, measured as relative intensity.

The principle of separation in time-of-flight mass spectrometry is based on the time required by ions that have been given the same kinetic energy to travel the flight tube. This time depends upon their m/z ratio according to:

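$$
\Delta E_{kin} = V \cdot z = \tfrac{1}{2} m v^2 \qquad \text{(Equation 4.1)}
$$

$$
v = \sqrt{\frac{2 \cdot z \cdot V}{m}} \qquad \text{(Equation 4.2)}
$$

where m is the mass, z the charge, V the accelerating voltage and v the velocity of the ion. The ions therefore travel with velocities that are inversely proportional to the square root of their m/z, and since the flight distance is fixed, the flight time is proportional to √(m/z). The lightest ion has the highest velocity and thus reaches the detector first.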
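Equations 4.1 and 4.2 can be turned into flight times with a short Python sketch. Two values are assumptions, not instrument specifications: the 28 kV of the repeller (section 4.2) is treated as the full accelerating potential, and the field-free flight length of 1.0 m is hypothetical. With these assumptions a singly charged 14 kDa ion needs roughly 51 µs, the same order as the microsecond scale printed on the spectrum in figure 5.4.

```python
import math

DALTON_KG = 1.66053906660e-27          # unified atomic mass unit in kg
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def flight_time_us(mass_da, charge, voltage_v, flight_length_m):
    """Field-free flight time in microseconds, from Equations 4.1 and 4.2."""
    m = mass_da * DALTON_KG
    q = charge * ELEMENTARY_CHARGE_C
    v = math.sqrt(2.0 * q * voltage_v / m)  # Equation 4.2
    return flight_length_m / v * 1e6


V = 28_000.0  # assumed accelerating voltage (V)
L = 1.0       # hypothetical flight length (m)
for mass in (14_000, 56_000):
    print(mass, round(flight_time_us(mass, 1, V, L), 1))
```

Quadrupling the mass only doubles the flight time, which is the square-root dependence of Equation 4.2; likewise a doubly charged 28 kDa ion arrives together with a singly charged 14 kDa ion, because only m/z matters.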


4.3 Sample preparation

Good sample preparation is a key point in MALDI-TOF MS analysis (30, 31). The concentrations of analyte and matrix, the choice of matrix, the prior storage of the analyte, contaminants and the compatibility of the solubilities of matrix and analyte are some of the many variables that influence the analysis (30, 31). The amount of sample and the sample-matrix ratio are critical in mass spectrometric analysis (31). A sample at too low a concentration may not be detected, whereas too high a concentration can saturate the detector or cause signal suppression (29). It is therefore important that the sample-matrix ratio lies in the right range.

4.4 Matrix solution for MALDI-TOF

The analysis of peptides and proteins requires a sample preparation in which the extract is mixed with a matrix solution. The matrix solution contains the matrix and the diluents acetonitrile, methanol and water. The matrix has several functions (table 4.1). First, it isolates the analyte molecules from each other. It then plays a key role by absorbing the laser light energy, which causes the material to vaporise and ionise; the vaporised matrix carries some of the analyte with it. Finally, because the matrix absorbs most of the incident energy, it minimises fragmentation of the sample caused by the laser radiation.

Functions of the matrix

• Isolation of the analyte molecules from each other.

• Absorption of laser light energy.

• Vaporisation of the matrix solution.

• Minimisation of sample fragmentation caused by laser radiation.

Table 4.1 Main functions of the matrix

The choice of matrix depends on the sample analysed, in particular on the molecular size and nature of the analyte. Three matrices are suggested for protein analysis: sinapinic acid (SA) for proteins and large peptides; 4HCCA (α-cyano-4-hydroxycinnamic acid) for peptides and multiply charged proteins; and DHB (2,5-dihydroxybenzoic acid) for peptides, proteins, carbohydrates, nucleic acids and polymers. SA was chosen for the preparation of the gliadin and glutenin extracts in this project.


5. EXPERIMENTAL WORK

• 5.1 Experimental Background

• 5.2 The Dataset- Varieties Used

• 5.3 Extraction

§ 5.3.1 Gliadins and glutenins extraction procedure

• 5.4 Procedures of MALDI-TOF MS analyses

§ 5.4.1 Calibration of the MALDI-TOF MS

§ 5.4.2 Preparation of the matrix

§ 5.4.3 Choice of dilution

§ 5.4.4 Analysis

§ 5.4.5 Washing Probe

• 5.5 Multivariate data analysis

§ 5.5.1 From mass spectrometry to Unscrambler

• 5.6 Pre-processing

§ 5.6.1 Deletion of unregistered spectra

§ 5.6.2 Baseline correction and Multiplicative Scatter Correction

§ 5.6.3 Reduction of variables

§ 5.6.4 Smoothing

§ 5.6.5 Logarithmic transformation

§ 5.6.6 Pre-processing 15 dpa

§ 5.6.7 Standardization and Weighting

• 5.7 Correlation of the peaks

• 5.8 Outliers


5.1 Experimental Background

This project is based on chemometric analyses of the gluten complex, i.e. the gliadin and glutenin proteins extracted from wheat. Multivariate data analysis is used to handle the MALDI-TOF MS data. The experimental work was divided into two parts. The first consists of protein extraction followed by MALDI-TOF MS analysis. In the second, the data obtained from mass spectrometry were analysed using the software package 'The Unscrambler 8.0'.

1. Extraction of the gluten proteins from samples collected at 15, 20, 25, 30, 35 and 45 dpa, and their subsequent analysis by MALDI-TOF MS

2. Multivariate analysis of the mass spectral data

The gliadins and glutenins are extracted on the basis of their solubility, according to the Osborne classification. The samples were collected at six stages of grain development, during which the gluten proteins are synthesised. The extracted proteins are separated by MALDI-TOF MS in a fully automated analysis, in order to eliminate any interference from the operator. Multivariate analysis is then applied to manage and handle the data collected by MALDI-TOF MS. The overall purpose is to extract the hidden information in the collected data, namely the information that distinguishes the two wheat qualities and the differences between samples collected during the development of the grain.


5.2 The Dataset- Varieties Used

The composition of the gluten complex is commonly accepted as a criterion for breadmaking quality. The data set consists of MALDI-TOF MS spectra of gluten proteins (gliadin and glutenin extracts) from four wheat (Triticum aestivum L.) varieties, selected according to their suitability for bread production. The four varieties were obtained from a Danish plant breeding company (Sejet Planteforædling, Horsens, DK) and are divided into two quality groups, as table 5.1 shows.

Wheat variety   Quality   Abbreviation
Pentium         Baking    PE
Miller          Baking    MI
Stakado         Feeding   ST
993618          Feeding   99

Table 5.1 The wheat varieties in the data set

Wheat suitable for breadmaking is said to be of baking quality, whereas wheat not suitable for baking purposes is said to be of feeding quality. For each variety, samples were collected at different stages of grain development (table 5.2). Grain development is commonly measured in days post anthesis (dpa), anthesis being the event at which the anthers release their pollen. The wheat quality and the changes in protein composition during grain development are followed from 15 dpa to 45 dpa. At 15 dpa the grain has already started to accumulate storage proteins; at 45 dpa, when the wheat is usually harvested, the storage proteins have been accumulated in the grain.

dpa   Pentium   Miller   Stakado   993618
15    +         +        +         +
20    +         +        missing   +
25    +         +        +         +
30    +         +        +         +
35    +         +        +         +
45    +         +        +         +

Table 5.2 Collection of samples according to dpa (+ = sample collected).


A total of sixty mass spectra were recorded per variety at each dpa, giving 240 mass spectra for each stage (180 for 20 dpa) and a total of 1380 spectra. The gluten proteins are separated in the molecular weight range from 14 kDa to 90 kDa, where the gliadins and glutenins are found. This 14-90 kDa range is spread over 15965 variables. This number of variables, combined with the number of mass spectra, gives a data table far too large to be examined manually. Multivariate data analysis is a powerful tool for extracting useful information from such a complex data table, and it is therefore applied in this study. As examples, one mass spectrum for each variety at each dpa is given in the appendices, according to the following index:

• Appendix D1 15 dpa

• Appendix E1 20 dpa

• Appendix F1 25 dpa

• Appendix G1 30 dpa

• Appendix H1 35 dpa

• Appendix I1 45 dpa
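The spectrum counts given above can be checked against table 5.2 with a short Python sketch; the variable names are illustrative only. It also shows the size of the raw data table (spectra × variables) that motivates the use of multivariate analysis, before any outliers are removed.

```python
# Table 5.2 as data: True = samples available for that variety at that dpa
VARIETIES = ["Pentium", "Miller", "Stakado", "993618"]
AVAILABLE = {
    15: [True, True, True, True],
    20: [True, True, False, True],  # Stakado missing at 20 dpa
    25: [True, True, True, True],
    30: [True, True, True, True],
    35: [True, True, True, True],
    45: [True, True, True, True],
}
SPECTRA_PER_SAMPLE = 60  # mass spectra recorded per variety at each dpa
N_VARIABLES = 15965      # m/z variables per spectrum (14-90 kDa)

spectra_per_dpa = {dpa: SPECTRA_PER_SAMPLE * sum(flags) for dpa, flags in AVAILABLE.items()}
total_spectra = sum(spectra_per_dpa.values())
data_cells = total_spectra * N_VARIABLES

print(spectra_per_dpa)
print(total_spectra, data_cells)
```

The full table therefore holds about 22 million intensity values before pre-processing.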


5.3 Extraction

The gliadins and glutenins are the subject of this thesis and must be extracted from the wheat flour before the analyses can start. The flour is obtained by milling the wheat kernels, after which the kernel components of interest are separated by sifting. According to Osborne's nomenclature (section 2.2.3), gliadins are soluble in alcohol and glutenins in alkaline solution. The proteins should therefore be extractable with an appropriate alcohol or alkali solution; however, while gliadins are easily extracted with ethanol, the extraction of glutenins is difficult. The main problem is that an extract may contain proteins of other classes, because the boundaries between the classes are ambiguous and a fraction obtained in one extraction step may contain proteins belonging to another class. To obtain purer gliadin/glutenin extracts, the water- and salt-soluble proteins were extracted and discarded before the remaining proteins of interest (gliadins and glutenins) were extracted. The extraction procedure is described in the following section.

5.3.1 Gliadins and glutenins extraction procedure

The extraction is performed by mixing 50 mg of flour with 500 µl of buffer in an Eppendorf tube. Several steps and buffers (summarised in table 5.3) are needed to extract the gliadins and glutenins and to remove the other proteins from the extract.

Step   Fraction / operation          Buffer                                         Time (min)
1      Water-salt soluble fraction   0.1 M NaCl, 20 mM DTT                          60
2      Centrifugation, 20,000 g      -                                              10
3      Water-salt soluble fraction   0.1 M NaCl, 20 mM DTT                          30
4      Centrifugation, 20,000 g      -                                              10
5      Water soluble fraction        Water                                          30
6      Centrifugation, 20,000 g      -                                              10
7      Alcohol soluble fraction      50% (v/v) propan-1-ol, 1% (v/v) HAc, 2% DTT    60
8      Centrifugation, 20,000 g      -                                              10

Table 5.3 The steps for gliadin and glutenin extraction.

In the first step, a saline buffer (sodium chloride with added dithiothreitol, DTT) is used to extract the salt-soluble proteins. The flour and the saline solution are put into a test tube and mixed for ten seconds on a vortex mixer. The tube is then placed in an ultrasonic bath for twenty minutes, during which the globulins are extracted into the saline solution. The temperature of the bath must not rise above 30 °C, otherwise protein denaturation may start; ice cubes can be used to keep the temperature low. The proteins extracted in this first phase are of no interest in this context: they are separated from the rest by centrifugation (20,000 g, 10 min) and discarded. The extraction and removal of the salt-soluble proteins is then repeated: the supernatant is removed and fresh extraction solution is added to the pellet. The second group of proteins to be removed are the water-soluble proteins; the same procedure is applied to the pellet, but with water. Finally, the gliadins and glutenins are extracted with a buffer containing 50% (v/v) propan-1-ol, 1% (v/v) acetic acid (HAc) and 2% DTT (table 5.3). The supernatant containing the proteins of interest can be stored in a freezer at -80 °C.
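The logic of table 5.3, that is which phase is discarded and which is carried forward, can be captured as data. This sketch is only an illustration: the `Step` record and its `keep` field are hypothetical, and the times are the nominal values from the table.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int
    action: str
    buffer: str
    minutes: int
    keep: str  # phase carried on to the next step (hypothetical field)


# Table 5.3 encoded as data; 'keep' follows the description in section 5.3.1
PROTOCOL = [
    Step(1, "Water-salt soluble fraction", "0.1 M NaCl, 20 mM DTT", 60, "mixture"),
    Step(2, "Centrifugation, 20,000 g", "", 10, "pellet"),
    Step(3, "Water-salt soluble fraction", "0.1 M NaCl, 20 mM DTT", 30, "mixture"),
    Step(4, "Centrifugation, 20,000 g", "", 10, "pellet"),
    Step(5, "Water soluble fraction", "Water", 30, "mixture"),
    Step(6, "Centrifugation, 20,000 g", "", 10, "pellet"),
    Step(7, "Alcohol soluble fraction", "50% (v/v) propan-1-ol, 1% (v/v) HAc, 2% DTT", 60, "mixture"),
    Step(8, "Centrifugation, 20,000 g", "", 10, "supernatant"),
]

total_minutes = sum(step.minutes for step in PROTOCOL)
for step in PROTOCOL:
    print(f"{step.number}: {step.action:<30} keep {step.keep}")
print("Total time:", total_minutes, "min")
```

The nominal total is 220 min per sample, excluding handling time between steps. Only the last centrifugation keeps the supernatant, since it is the only one whose supernatant holds the gliadins and glutenins.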


5.4 Procedures of MALDI-TOF MS analyses

The gluten proteins are separated on an HP G2025A MALDI-TOF mass spectrometer (Hewlett-Packard, Palo Alto, CA, USA). The analytical procedure consists of five steps. First, the instrument is calibrated before the analyses start (section 5.4.1). Second, the matrix needed for MALDI-TOF MS analysis is prepared daily (section 5.4.2). Third, the gluten protein extract is mixed with the matrix in a suitably chosen ratio (section 5.4.3). Fourth, the analysis is carried out (section 5.4.4). Finally, between one analysis and the next, the probe that holds the samples must be carefully washed (section 5.4.5).

5.4.1 Calibration of the MALDI-TOF MS

Before the analysis, the MALDI-TOF MS is calibrated with three standards whose molecular weights lie in the range of the analysed proteins:

• Cytochrome C 12,361.2 Da

• Apomyoglobin 16,952.6 Da

• Bovine serum albumin 66,431.0 Da
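A TOF instrument is commonly calibrated by fitting the flight time as a linear function of √(m/z), t = a·√(m/z) + b, to known calibrant peaks. The calibration model used by the instrument software is not described here, so the following is only a sketch under that assumption. Because the calibrant flight times are not reported, the paired m/z and flight-time (µs) axis labels that appear on the spectrum in figure 5.4 are used as illustrative calibration points, assuming singly charged ions; the function names are hypothetical.

```python
import math


def fit_tof_calibration(mz_values, times_us):
    """Least-squares fit of t = a * sqrt(m/z) + b; returns (a, b)."""
    xs = [math.sqrt(mz) for mz in mz_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(times_us) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times_us))
    a = sxt / sxx
    b = mean_t - a * mean_x
    return a, b


def time_to_mz(t_us, a, b):
    """Invert the calibration to convert a flight time into m/z."""
    return ((t_us - b) / a) ** 2


# m/z and flight-time axis labels read from figure 5.4
mz_ticks = [11000, 17000, 23000, 29000, 35000, 41000, 47000, 53000, 59000, 65000, 71000]
t_ticks = [46.0, 57.2, 66.6, 74.8, 82.2, 89.0, 95.2, 101.1, 106.6, 112.0, 117.0]

a, b = fit_tof_calibration(mz_ticks, t_ticks)
print(f"a = {a:.4f} us/sqrt(Da), b = {b:.3f} us")
print(round(time_to_mz(66.6, a, b)))
```

With real calibrants, the three standards above would replace the tick values; a fit through points spread over the whole 14-90 kDa range keeps the extrapolation error small at both ends.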

5.4.2 Preparation of the matrix

The matrix solution is prepared as a 0.1 M solution of sinapinic acid in a mixture of acetonitrile, methanol and water in the ratio 64:36:8 (v/v/v) (table 5.4). The solution should be prepared fresh every day.

• 11 mg SA

• 288 µl acetonitrile

• 173 µl methanol

• 39 µl Milli-Q water

Table 5.4 Preparation of 500 µl SA matrix solution.
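As a check of table 5.4, the sinapinic acid concentration can be computed from the recipe. The sketch assumes a molar mass of 224.21 g/mol for sinapinic acid and that the three solvent volumes make up the final 500 µl; the helper names are hypothetical.

```python
SINAPINIC_ACID_G_PER_MOL = 224.21  # molar mass of sinapinic acid, C11H12O5


def molarity(mass_mg, molar_mass_g_per_mol, volume_ul):
    """Molar concentration (mol/l) of a solute dissolved to a final volume."""
    moles = (mass_mg / 1000.0) / molar_mass_g_per_mol
    return moles / (volume_ul * 1e-6)


def scale_recipe(recipe, factor):
    """Scale every component of a recipe by the same factor."""
    return {name: amount * factor for name, amount in recipe.items()}


# Table 5.4, for 500 ul of matrix solution
recipe = {"SA (mg)": 11, "acetonitrile (ul)": 288, "methanol (ul)": 173, "water (ul)": 39}
c = molarity(recipe["SA (mg)"], SINAPINIC_ACID_G_PER_MOL, 500)
print(f"{c:.3f} M")
print(scale_recipe(recipe, 2))  # hypothetical 1 ml batch
```

11 mg in 500 µl corresponds to about 0.098 M, consistent with the nominal 0.1 M.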


5.4.3 Choice of dilution

The matrix solution is added to the extracted proteins in a ratio chosen after preliminary experiments. The concentration, the amount of sample and the sample-matrix ratio are critical in mass spectrometric analysis: a sample at too low a concentration may not be detected, whereas too high a concentration can saturate the detector or cause signal suppression (29). It is therefore important to work in the right range. For MALDI-TOF analysis of proteins in the molecular weight range 14-90 kDa, the instrument manufacturer suggests a sample-matrix ratio between 1:10 and 1:50. Several experiments were carried out to find the best ratio, or at least to decide which ratio to use for all analyses. One sample was first chosen as a tester, and ratios from 1:25 to 1:50 were tried. The mass spectra obtained were similar, and none of the ratios gave a measurable improvement over the others; the only effect observed concerned the ease of crystallisation. However, to keep the analysis objective and to use a single value of this variable for all samples, a sample-matrix ratio of 1:30 was chosen: 1 µl of protein extract is mixed with 29 µl of matrix solution. One µl of the sample-matrix mixture is applied to the stainless steel probe tip, at a spot called a mesa, and dried to obtain crystals. For better crystallisation the mixture is applied in two steps of 0.5 µl each, allowing crystals to form between the applications. The probe holds 10 mesas and is placed in a drying device, where the crystals form; the probe is then ready for the analysis. The crystals are quite stable and can be stored at room temperature in the dark or under vacuum for several days (29).
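The mixing convention can be written as a small helper, which also shows how much extract ends up on each mesa. The function name is hypothetical, and the reading of "1:30" as one part extract in thirty parts of final mixture is taken from the volumes given above (1 µl extract + 29 µl matrix).

```python
def split_volumes(total_ul, ratio_parts):
    """Volumes of extract and matrix for a 1:ratio_parts sample-matrix mixture.

    Follows the convention of section 5.4.3, where 1:30 means 1 part extract
    in 30 parts of final mixture.
    """
    extract = total_ul / ratio_parts
    matrix = total_ul - extract
    return extract, matrix


extract_ul, matrix_ul = split_volumes(30, 30)
print(extract_ul, matrix_ul)
# Extract contained in the 1 ul deposited on one mesa (two 0.5 ul applications)
print(round(split_volumes(1.0, 30)[0], 4))
```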


5.4.4 Analysis

The probe is placed in the vacuum chamber of the instrument, and the pressure is reduced from atmospheric pressure (760 Torr) to high vacuum (~10⁻⁶ Torr). The program is set to acquire spectra in the molecular weight range of interest; for the separation of gliadins and glutenins the optimal range is 14-90 kDa. The analysis is fully automated using a "multi-shot" option: the intensity of the laser beam and the parts of the mesa that the laser will hit are preselected. Each mesa is analysed in the same automated way, and since the technique is fast, several mesas can be analysed within a few minutes. The surface of the probe tip may influence the crystallisation: an old, scratched tip surface and a new stainless steel probe tip may behave differently during crystallisation, and this was taken into account. In initial experiments, one sample was analysed on five probes differing in age and wear, and the two probes that appeared most similar were chosen for the experiments. Multivariate data analysis revealed no particular difference between these probes (results not shown). The result of the analysis is a mass spectrum in which the x-axis is the mass-to-charge ratio (m/z) and the y-axis an abundance value, measured as relative intensity. The "multi-shot" procedure sums up to 30 mass spectra into one final spectrum. MALDI-TOF MS analyses are not quantitative measurements; they indicate only the qualitative presence of proteins at given m/z values. The spectral data are output as TOF files, converted into CSV files and then into Excel spreadsheet files ready to be analysed in "The Unscrambler".
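The "multi-shot" summation can be illustrated as an element-wise sum of single-shot spectra on a common m/z axis. This is only a sketch: the instrument software may select or filter shots in ways not described here (the spectrum in figure 5.4 is labelled "Sum of 30 from 38"), whereas this hypothetical helper simply takes the first 30 shots.

```python
def sum_shots(shots, max_shots=30):
    """Element-wise sum of up to max_shots single-shot spectra into one spectrum.

    All shots must share the same m/z axis (same number of points).
    """
    used = shots[:max_shots]
    n_points = len(used[0])
    if any(len(s) != n_points for s in used):
        raise ValueError("all shots must have the same number of points")
    return [sum(values) for values in zip(*used)]


# Hypothetical example: 38 shots of a 3-point spectrum; only the first 30 are summed
shots = [[1.0, 2.0, 0.5] for _ in range(38)]
print(sum_shots(shots))
```

Summing shots raises the signal roughly in proportion to the number of shots, while random noise grows more slowly, which improves the signal-to-noise ratio of the final spectrum.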


Figure 5.1 Main steps of MS analysis (Hewlett-Packard).

5.4.5 Washing Probe

After the analysis, the crystals must be removed from the probes. Washing the probe is a standard procedure to remove carry-over from previous samples and incidental contaminants from the probe tip. Contaminants may be one of the reasons for poor MALDI mass spectra, and a careful probe wash should reduce them to a level low enough for a good analysis. The wash solution is composed of formic acid, ethanol and water in the ratio 1:1:1. About 350 µl of solution is applied to the probe, which is then wiped dry with Kimwipes. This step is repeated once more before the probes are placed in an ultrasonic bath for 10 minutes. Acetone and then methanol are applied to remove the wash solution and dry the probe. More information about the chemical compounds is given in appendix A1. All solutions are of HPLC quality, except the ethanol, which is of technical quality.



5.5 Multivariate data analysis

Multivariate data analysis was applied in this project to handle the complex data table produced by MALDI-TOF MS. The wheat varieties were selected on the basis of their known quality, in order to obtain mass spectra of both baking and feeding quality. The multivariate data analyses were performed to extract the differences between the two qualities and, on the basis of these differences, to create a model for predicting unknown samples. The mathematical modelling is implemented, hidden from the user, in the commercial software package "The Unscrambler", which was used for the multivariate analyses in this project.

5.5.1 From mass spectrometry to Unscrambler

The MALDI-TOF MS results are given as CSV files, which must be imported into The Unscrambler. For this purpose, the CSV files are first imported into an Excel spreadsheet, where the variables of each spectrum are arranged in a single column of 15965 variables and successive spectra are placed in adjacent columns. The Excel file is saved and imported into "The Unscrambler", where it must be transposed, swapping rows and columns, so that each spectrum becomes a row (sample) and each m/z value a column (variable).
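The same rearrangement could be scripted instead of being done by hand in Excel. The sketch below assumes that each exported CSV file holds two columns (m/z, intensity) without a header; the actual export layout of the instrument software may differ, and the function names and data values are hypothetical. Building the rows directly gives the samples × variables orientation that the transposition in The Unscrambler produces.

```python
import csv
import io


def read_spectrum(csv_text):
    """Parse one spectrum exported as two columns: m/z, intensity (no header assumed)."""
    mz, intensity = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        mz.append(float(row[0]))
        intensity.append(float(row[1]))
    return mz, intensity


def build_sample_matrix(csv_texts):
    """Stack spectra as rows (samples) with m/z values as columns (variables)."""
    axis, rows = None, []
    for text in csv_texts:
        mz, intensity = read_spectrum(text)
        if axis is None:
            axis = mz
        elif mz != axis:
            raise ValueError("spectra do not share the same m/z axis")
        rows.append(intensity)
    return axis, rows


spectrum_a = "14000.0,410\n14004.8,415\n14009.6,430\n"
spectrum_b = "14000.0,402\n14004.8,420\n14009.6,428\n"
axis, X = build_sample_matrix([spectrum_a, spectrum_b])
print(len(X), "samples x", len(axis), "variables")
```

Checking that every spectrum shares the same m/z axis guards against silently misaligned columns, which a manual copy-paste cannot rule out.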


5.6 Pre-processing

The spectral data are pre-processed before the multivariate data analysis starts. All spectra are pre-processed together.

5.6.1 Deletion of unregistered spectra

A preliminary overview of the mass spectra showed that some spectra appear to contain no signal (figure 5.1). Such spectra, which are clearly due to erroneous measurements, could strongly influence the analyses. By simply inspecting line plots of ten to twenty spectra at a time, the spectra that were clearly outliers were removed from the analysis (figure 5.1). An outlier is a sample or variable that is unusual compared to the rest of the data (28). The remaining outlier detection is carried out after pre-processing (section 5.8).

Figure 5.1 Spectra number one and two (9935B53 and 9935B57) appear to contain no signal and are therefore outliers; they are removed from the analysis.
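The visual screening described above could be complemented by a simple automated check. The following sketch is not the procedure used in this work, where spectra were inspected by eye: it flags spectra whose intensity range falls below a threshold, and both this rule and the `min_range` value are assumptions that would need tuning against spectra such as those in figure 5.1.

```python
def flag_flat_spectra(spectra, min_range):
    """Return indices of spectra whose intensity range (max - min) is below min_range.

    A spectrum 'without signals' is nearly flat, so its dynamic range is small.
    """
    return [i for i, s in enumerate(spectra) if max(s) - min(s) < min_range]


# Hypothetical spectra: the second one is almost flat
spectra = [
    [420, 510, 980, 640, 450],
    [421, 423, 422, 420, 421],
    [430, 700, 1050, 600, 440],
]
print(flag_flat_spectra(spectra, min_range=50))
```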


5.6.2 Baseline correction and Multiplicative Scatter Correction

The MALDI-TOF mass spectra from the same samples differ from each other and are not directly comparable (figure 5.2).

Figure 5.2 Each spectrum starts from a different baseline.

The individual spectra start from different baselines and have different slopes. Baseline correction and multiplicative scatter correction (MSC) solved this problem by aligning the spectra. Baseline correction is a transformation used to correct the baseline of the samples. The transformation chosen is the baseline offset, which subtracts the lowest point of the spectrum from each variable, so that spectra starting at different values are aligned. The baseline offset correction is given by:

$$
f(x) = x - \min(X) \qquad \text{(Equation 5.1)}
$$

where x is a variable and X denotes all selected variables for this sample (28).
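Equation 5.1 applied to one spectrum, as a minimal Python sketch (the intensity values are made up):

```python
def baseline_offset(spectrum):
    """Equation 5.1: subtract the lowest value of the spectrum from every variable."""
    lowest = min(spectrum)
    return [x - lowest for x in spectrum]


print(baseline_offset([452.0, 430.0, 980.0, 611.0]))
```

After the correction every spectrum has its minimum at zero, so spectra that started from different baselines become directly comparable.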


MSC is a transformation method used to compensate for additive and/or multiplicative effects in spectral data (28). The spectra are aligned after baseline correction and MSC (figure 5.3).

Figure 5.3 The MALDI-TOF spectra after the baseline correction
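The full-offset form of MSC listed in table 5.5, X = (X - a)/b, can be sketched as follows. Following the usual MSC formulation, each spectrum is regressed against a reference spectrum (by default the mean spectrum) as x ≈ a + b·reference, and the fitted offset a and amplification b are then removed. This is a generic implementation for illustration; the internal details of The Unscrambler may differ.

```python
def msc(spectra, reference=None):
    """Multiplicative scatter correction (full offset): X_corr = (X - a) / b.

    Each spectrum x is fitted by least squares to the reference r as x ~ a + b*r,
    and the fitted offset a and amplification b are removed. By default the
    reference is the mean spectrum of the data set. b must be non-zero.
    """
    n_vars = len(spectra[0])
    if reference is None:
        reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_vars)]
    mean_r = sum(reference) / n_vars
    srr = sum((r - mean_r) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        mean_x = sum(x) / n_vars
        b = sum((r - mean_r) * (xi - mean_x) for r, xi in zip(reference, x)) / srr
        a = mean_x - b * mean_r
        corrected.append([(xi - a) / b for xi in x])
    return corrected


# Two toy spectra that differ only by an offset and a scaling factor
spectra = [[1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]]
for row in msc(spectra):
    print([round(v, 3) for v in row])
```

Because the two toy spectra differ only by additive and multiplicative effects, both are mapped onto the same corrected spectrum.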

5.6.3 Reduction of variables

The separation interval between 14 and 90 kDa is distributed evenly over 15951 variables. This large number of variables slows down the calculations in "The Unscrambler", so it is reduced where this is possible without losing information. The most sensitive peaks are the narrowest ones, which could be lost through an excessive reduction of variables. The reduction factor is therefore chosen between 5 and 50 and is determined separately for each analysis performed.
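Reduction by averaging adjacent variables is one common way to implement this step, and the sketch below assumes that form; how The Unscrambler handles a final incomplete block is not stated here, so averaging it as well is an assumption. The second example illustrates why narrow peaks are the most sensitive to the reduction factor.

```python
def reduce_variables(spectrum, factor):
    """Replace each block of `factor` consecutive variables by its average.

    A shorter final block, if any, is averaged as well (an assumption).
    """
    reduced = []
    for start in range(0, len(spectrum), factor):
        block = spectrum[start:start + factor]
        reduced.append(sum(block) / len(block))
    return reduced


print(reduce_variables([1, 2, 3, 4, 5, 6, 7], 2))
# A peak one variable wide loses most of its height when averaged with its neighbours
print(reduce_variables([0, 0, 0, 0, 12, 0], 3))
```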

5.6.4 Smoothing

Moving-average smoothing was also applied to the data set to smooth the spectral curves. Smoothing is used to reduce the noise in the data without reducing the number of variables (28): each variable is replaced by the average of the adjacent variables (including itself). The number of variables averaged, termed the segment size and chosen by the user, was set in each experiment between 10 and 25, depending on the dpa of the samples and on the analysis performed.
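A moving-average smoother with a user-chosen segment size can be sketched as below. The handling of the spectrum ends (a truncated window) and of even segment sizes such as 10 or 18 are assumptions; other software may treat them differently.

```python
def moving_average(spectrum, segment):
    """Replace each variable by the mean of a window of `segment` variables around it.

    The window is centred (for even segment sizes it extends one point further to
    the left); at the ends it is truncated, so the number of variables is unchanged.
    """
    left = segment // 2
    right = segment - 1 - left
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        window = spectrum[max(0, i - left):min(n, i + right + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 3))
```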


5.6.5 Logarithmic transformation

A logarithmic transformation is applied to reduce the difference between the largest and smallest peaks in the spectra; it is appropriate when this difference exceeds a factor of 10. An arbitrary constant of 50 is added to suppress the random noise at the bottom of the spectra, and the noise level (the baseline) was used to select this constant.

$$
X = \log(X + 50)
$$
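The transformation as a short sketch. The base of the logarithm is not stated in the text; base 10 is assumed here, which only rescales the result compared with the natural logarithm.

```python
import math


def log_transform(spectrum, constant=50.0):
    """X = log(X + constant); a base-10 logarithm is assumed."""
    return [math.log10(x + constant) for x in spectrum]


print(log_transform([50.0, 950.0, 9950.0]))
```

Intensities spanning a factor of 100 after adding the constant (100 to 10,000) are compressed to the range 2 to 4.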

5.6.6 Pre-processing 15 dpa

The pre-processing used for the 45 dpa separation proved unsuitable for 15 dpa, so new pre-processing settings were tried for the 15 dpa samples. Starting from the raw data, different combinations of pre-processing methods were tested: smoothing, logarithmic transformation and reduction of variables were tried with various values. MSC proved necessary, and the baseline offset was always kept. The values used are summarised in table 5.5:

Table 5.5 Values tested for each transformation; combinations of these pre-processing methods and values were tested.

Transformation                Values tested
Logarithmic transformation    $X = \log(X + 10)$;  $X = \log(X + 50)$;  $X = \log(X + 80)$
Reduction of variables        5;  10;  20
Smoothing (segment size)      10;  15;  18
MSC                           Full offset: $X = (X - a)/b$;  Common offset: $X = X - a$;  Common amplification: $X = X/b$
Baseline offset               always applied


5.6.7 Standardization and Weighting

Standardization is the most common technique for weighting the data in order to modify the relative influence of the variables (28). In standardization the data are first centred and then scaled, giving all variables an equal chance to influence the model (28). In this project standardization is performed using the common autoscaling:

$$
\text{weight} = \frac{A}{\mathrm{Sdev} + B}, \quad A = 1.0,\ B = 0.0 \quad (\text{i.e. } 1/\mathrm{Sdev}) \qquad \text{(Equation 5.2)}
$$

In spectrometry, scaling with 1/Sdev is not always considered advantageous, because it may sometimes emphasise the noise (28). Whether to apply standardization must be judged from the relative differences between the ranges of the variables. Owing to the logarithmic transformation, the differences between the variable values are not particularly extreme. The analyses were carried out both with and without standardization, and no significant differences were observed.
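Equation 5.2 as a column-wise weighting, in a minimal sketch. Whether The Unscrambler uses the n - 1 or the n denominator for Sdev is not stated here; the sample standard deviation is assumed. A variable with zero standard deviation would cause a division by zero with B = 0.

```python
import statistics


def standardize(X, A=1.0, B=0.0):
    """Centre each column (variable) and multiply it by the weight A / (Sdev + B).

    With A = 1 and B = 0 this is ordinary autoscaling (Equation 5.2).
    The sample standard deviation (n - 1 denominator) is assumed.
    """
    n_vars = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_vars)]
    means = [statistics.mean(col) for col in columns]
    weights = [A / (statistics.stdev(col) + B) for col in columns]
    return [[(row[j] - means[j]) * weights[j] for j in range(n_vars)] for row in X]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(standardize(X))
```

The two columns differ in scale by a factor of ten, yet after autoscaling they contribute identically, which is the "equal chance to influence the model" described above.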

Summary of the pre-processing methods applied

The following pre-processing methods are applied to the data:

• Baseline correction using the baseline offset function

• Smoothing with a segment size between 10 and 25

• Reduction of the number of variables by a factor between 5 and 50

• Logarithmic transformation: $X = \log(X + 50)$

• Multiplicative scatter correction: $X = (X - a)/b$


5.7 Correlation of the peaks

MALDI-TOF MS analyses are not quantitative measurements; they indicate only the qualitative presence of proteins at given m/z values. In a MALDI-TOF analysis a protein generally acquires a single charge, so the m/z value also indicates its molecular weight. However, there is a high probability that the same protein reaches the flight tube with either one or two charges, or that two proteins together carry only one charge. Such species give different but associated peaks in the mass spectrum. Using an option of the MALDI software, some peaks that may be correlated to the same protein were identified (figure 5.4). Singly charged ions are believed to be the most probable in MALDI analysis, and they therefore have higher relative abundance. In the analyses carried out, the peaks at 15-20 kDa may be correlated to the peaks at 32.5-44 kDa, and those at 25-29 kDa to the peaks at 48-55.5 kDa (figure 5.4).

Figure 5.4 Mass spectrum of a Pentium sample at 45 dpa. The same protein may carry one or two charges, or be bound to another protein with a total of only one charge; this gives a spectrum with correlated peaks. The numbers in the figure indicate which peaks may be correlated with each other. The peaks circled in red and blue may be correlated, as their profiles are identical and correspond exactly to the values given by the software option. For example, the peaks at 15-20 kDa may be related to the proteins with molecular weight 32.5-44 kDa, which have higher relative abundance. However, despite the correlation of peaks mentioned above, some work based on 2-DE gels shows the presence of proteins at 15-20 kDa; these proteins may be hidden by the peaks correlated to the 32.5-44 kDa proteins.

[Figure 5.4 plot: m/z from 11,000 to 71,000 (second axis 46.0-117.0 µs) against Abundance; "Sum of 30 from 38 | Positive Polarity | No Filter". Marked correlated peak pairs: 1) 15-20 kDa and 32.5-44 kDa; 2) 25-29 kDa and 48-55.5 kDa.]


5.8 Outliers

The value of the analysis rests on the reproducibility of the studies, which depends greatly on the detection of outliers. Outliers are samples or variables that are unusual compared with the rest of the data (28). They are badly described by the model or strongly influence it (28), and must be removed before the model is built. Some outliers are the consequence of errors in the data, for example instrumental errors. Sometimes the instrument records a mass spectrum that looks like a flat line, for example because of bad crystallisation, so that no proteins reach the flight tube. Such spectra are clearly visible in the line plots, and during the pre-processing phase, before the analysis started, all mass spectra were carefully inspected in order to remove the abnormal ones. Outliers are then detected with score plots, residual plots and leverage plots. These Unscrambler tools are used to find potential outliers, which are subsequently checked. Figure 5.5 shows the residual variance of each sample; samples with a high residual variance (located far from zero) may be outliers (28).
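A minimal sketch of the two checks described above, assuming the spectra are stored as rows of a NumPy array with one column per m/z bin. The flat-spectrum rule and its 5% threshold are hypothetical, and the residual variance is computed as the mean squared PCA residual of each sample, which may differ in detail (e.g. degrees-of-freedom corrections) from the Unscrambler definition. The data are synthetic.

```python
import numpy as np


def is_flat_spectrum(intensities, min_rel_range=0.05):
    """Hypothetical rule: a spectrum 'looks like a line' when its intensity
    range is smaller than min_rel_range times its mean intensity."""
    y = np.asarray(intensities, dtype=float)
    mean = y.mean()
    return bool(mean == 0 or (y.max() - y.min()) < min_rel_range * abs(mean))


def sample_residual_variance(X, n_components):
    """Mean squared residual of each sample (row) of mean-centred X after
    removing the first n_components principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T            # loadings (variables x PCs)
    E = Xc - (Xc @ P) @ P.T            # residuals left by the PCA model
    return (E ** 2).mean(axis=1)


# Synthetic 'spectra': two peak patterns of varying intensity on a flat baseline
rng = np.random.default_rng(0)
bins = np.arange(50)
peak_a = np.exp(-((bins - 10) ** 2) / 18.0)
peak_b = np.exp(-((bins - 30) ** 2) / 18.0)
scores = rng.uniform(10, 20, size=(20, 2))
X = 100 + scores[:, :1] * peak_a + scores[:, 1:] * peak_b
X += rng.normal(0, 0.01, size=X.shape)
X[7, 45] += 5.0                        # sample 7 carries a feature the model does not describe

rv = sample_residual_variance(X, n_components=2)
print(int(np.argmax(rv)))              # 7
print(is_flat_spectrum(np.full(50, 3.0)), is_flat_spectrum(X[0]))  # True False
```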

Figure 5.5 The sample residual variance plot is used to detect outliers. Samples with a high residual variance are potential outliers and must be examined so that, if necessary, they can be left out of the recalculation of the model.

[Figure 5.5 plot label: ST25A11, potential outlier]


The mass spectra of potential outliers are then checked to see whether they contain erroneous information. For example, the line plot of ST25A11 is compared with those of the other samples (figure 5.6).

Figure 5.6 The line plot of ST25A08 compared with the other Stakado 25 dpa samples. When a mass spectrum differs greatly from the others, the difference may be caused by instrument errors or by the quality of the matrix-sample crystals; the sample will then be badly described by the model, or it may strongly influence it. Sample ST25A11 showed a spectrum relatively different from the others, so it may be an outlier and should be left out of the recalculation of the model.

However, when such differences (typical in MALDI-TOF MS) are not very evident, the spectra have been included in the model, so that these differences are treated as sources of variation that will probably occur again in MALDI-TOF analysis (e.g. from the quality of the crystals) and will also be present when unknown samples are analysed. If these small differences were left out, there would be a risk that the model could not predict some of the spectra of unknown samples. Other tools are used to investigate the presence of outliers in the model and how they influence it. One of these is the leverage plot for samples (figure 5.7), in which the distance between each projected sample and the model centre is plotted against the samples. Samples with a large leverage (larger than 0.4-0.5) and a large residual may be outliers, and they have the strongest influence on the model (28).

[Figure 5.6 plot label: ST25A11 is a potential outlier]
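The sample leverage can be sketched from the PCA scores as h_i = 1/n + Σ_a t_ia²/(t_aᵀt_a), a common definition in chemometrics; it is assumed here rather than taken from the Unscrambler documentation. The 0.4 threshold follows the 0.4-0.5 guideline quoted above, and the data are synthetic.

```python
import numpy as np


def sample_leverage(X, n_components):
    """Leverage of each sample in a PCA model of mean-centred X:
    h_i = 1/n + sum_a t_ia**2 / (t_a . t_a)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]        # scores
    n = X.shape[0]
    return 1.0 / n + (T ** 2 / (T ** 2).sum(axis=0)).sum(axis=1)


rng = np.random.default_rng(0)
bins = np.arange(50)
peaks = np.vstack([np.exp(-((bins - c) ** 2) / 18.0) for c in (10, 30)])  # 2 x 50
scores = rng.uniform(10, 20, size=(20, 2))
scores[3] = [60.0, 60.0]               # sample 3 is extreme along the model directions
X = 100 + scores @ peaks + rng.normal(0, 0.01, size=(20, 50))

h = sample_leverage(X, n_components=2)
print(int(np.argmax(h)))               # 3
print("leverage > 0.4:", np.where(h > 0.4)[0].tolist())
```

With this definition the leverages always sum to 1 + A for A components, so a single sample with a leverage near 1 absorbs most of the model's flexibility.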


Figure 5.7 The leverage plot has been used to check for the absence of outliers. The potential outlier (ST25A11) has a high leverage, but below 0.4-0.5, the range above which samples may behave as outliers (28).

Samples with leverage values higher than 0.4-0.5 may behave as outliers and negatively influence the model, and such samples have been kept out of the recalculation of the model (28). The potential outlier ST25A11 has a leverage lower than 0.3, so it is kept in the model (figure 5.7). Another way to check for outliers is to plot the residual X-variance against the leverage (figure 5.8). When both the residual variance and the leverage are high (i.e. the samples lie in the upper right corner of this influence plot), the samples fit the model poorly and strongly influence it; they are outliers and must be removed from the model. When the residual is high and the leverage is low, the sample may also be an outlier and is checked as well.

[Figure 5.7 plot label: ST25A11 is a potential outlier]
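The four regions of the influence plot can be expressed as a simple rule. This is a hypothetical sketch: the leverage limit follows the 0.4-0.5 guideline above, while the residual limit (three times the median residual variance) is an illustrative assumption, not a value used in this work.

```python
from statistics import median


def influence_category(residual, leverage, residual_limit, leverage_limit=0.4):
    """Region of the influence plot (residual X-variance vs. leverage) for one sample."""
    high_residual = residual > residual_limit
    high_leverage = leverage > leverage_limit
    if high_residual and high_leverage:
        return "outlier"          # upper right: poor fit and large influence
    if high_residual:
        return "high residual"    # upper left: poorly described, check the spectrum
    if high_leverage:
        return "high leverage"    # lower right: influential, check the spectrum
    return "ok"


def classify_samples(names, residuals, leverages, leverage_limit=0.4, residual_factor=3.0):
    """Hypothetical residual limit: residual_factor times the median residual variance."""
    residual_limit = residual_factor * median(residuals)
    return {name: influence_category(r, h, residual_limit, leverage_limit)
            for name, r, h in zip(names, residuals, leverages)}


result = classify_samples(["A", "B", "C", "D", "E"],
                          [1.0, 1.1, 0.9, 5.0, 4.8],
                          [0.10, 0.12, 0.50, 0.10, 0.60])
print(result)
# {'A': 'ok', 'B': 'ok', 'C': 'high leverage', 'D': 'high residual', 'E': 'outlier'}
```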


Figure 5.8 The residual X-variance of the samples plotted against their leverage for the first four principal components, each component shown in a different colour. Samples in the upper left part of the plot have a high residual variance but a low leverage. Sample ST25A11 is an outlier because it moves from a high residual X-variance on PC1 to a relatively high leverage and residual X-variance on PC2, PC3 and PC4; it has been left out of the recalculation of the model.

Sample ST25A11 has a high residual variance on PC1 and a relatively high leverage. Its leverage increases from PC1 to PC4, and the sample has a poor fit and a large influence over the first four PCs. It has been kept out of the recalculation of the model. It is important to stress that the main problem of MALDI-TOF analysis is that identical spectra cannot be recorded from the same sample: each analysis of the same sample gives a spectrum slightly different from the previous one. These small differences must therefore be taken into account by the model, otherwise some unknown mass spectra will not be predicted correctly. Adequate pre-processing reduces the differences between spectra, but the inevitable remaining variation should be included in the model. During these studies, some models were built using all available spectra, in order to retain as much of the spectral variance as possible, and they showed a good predictive ability. Owing to limited time, models built with and without the outliers were not compared. Such a comparison would also require predicting a considerable number of unknown samples to test the ability of the two models (e.g. building one model with the majority of the samples and one with the outliers removed, then testing and comparing their predictive ability).

[Figure 5.8 plot labels: possible outliers; PC1, PC2, PC3, PC4]


Some of the Unscrambler plots are quite powerful for showing which samples must be kept and which are too different from the rest of the population and must be removed. The normal probability plot of residuals is a useful tool for finding possible outliers and assessing how they diverge from the model. It is a statistical way to study the distribution of the residuals, which should lie along a straight line. The plot shows all the residuals for the baking Y-variable. When the residuals lie along the straight line (figure 5.9), the model explains the variation of the variables (i.e. the mass spectra) well; a good model should explain as much as possible of the recorded mass spectra. Samples far from this line may be outliers; they must be checked and, if they have a poor model fit and a large influence on the model, removed before the model is recalculated.
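A normal probability plot pairs the ordered residuals with the corresponding quantiles of a standard normal distribution. The sketch below uses Blom plotting positions, (i - 0.375)/(n + 0.25), and summarises straightness as the correlation between the two coordinates. Both choices are assumptions, since the exact plotting positions used by Unscrambler are not stated here.

```python
from statistics import NormalDist, fmean


def normal_probability_points(residuals):
    """(theoretical normal quantile, observed residual, sample index), ordered by residual."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    z = NormalDist()
    return [(z.inv_cdf((rank + 1 - 0.375) / (n + 0.25)), residuals[i], i)
            for rank, i in enumerate(order)]


def straightness(points):
    """Pearson correlation between quantiles and ordered residuals; close to 1 means a straight line."""
    q = [p[0] for p in points]
    r = [p[1] for p in points]
    mq, mr = fmean(q), fmean(r)
    cov = sum((a - mq) * (b - mr) for a, b in zip(q, r))
    var_q = sum((a - mq) ** 2 for a in q)
    var_r = sum((b - mr) ** 2 for b in r)
    return cov / (var_q * var_r) ** 0.5


well_behaved = [NormalDist().inv_cdf((k + 0.625) / 10.25) for k in range(10)]
with_outlier = well_behaved[:-1] + [10.0]
print(round(straightness(normal_probability_points(well_behaved)), 3))  # 1.0
print(round(straightness(normal_probability_points(with_outlier)), 3))
print(normal_probability_points(with_outlier)[-1][2])  # 9: the sample far from the line
```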

Figure 5.9 The normal probability plot of residuals helps to find possible outliers. The model explains the variation of the variables well when the residuals of the samples lie along the straight line; conversely, samples far from the straight line may be outliers and are badly described by the model. The potential outlier lies on the straight line (25 dpa PLS-R model).

Sample ST25A11 lies on the straight line and the model explains the differences in its spectrum well, so the sample has been kept in the model. The plot of Y-residuals against predicted Y-values (figure 5.10) may help to evaluate whether the potential outliers will be predicted by the model. If the model predicts the variation in Y properly, the residual variation is caused only by noise and the residuals are randomly distributed (28). A model is adequate when the samples have relatively low residuals compared with the variance of Y (considering that the average value is 0.5, which is the limit used for the prediction of the samples) and are randomly distributed around a trend line (28). In figure 5.10 the samples follow a clear trend with small residuals, i.e. the Y-residual variation is due to noise only and the model is satisfactory.

[Figure 5.9 plot labels: straight line; ST25A11, the potential outlier, lies on the straight line]


Figure 5.10 The plot of Y-residuals against the predicted Y-values. The samples have relatively small Y-residuals and all samples are close to the trend of the model, i.e. there are no outliers.

Finally, the 2D scatter plot (figure 5.11) is recommended for detecting outliers and for testing whether the potential outliers are badly described by the model, or influence it so much that the prediction of unknown samples will be erroneous. The validation method used is full cross-validation, in which the same samples are used both for calibration and for validation. One sample at a time is kept out of the calibration, the model is calibrated with the others, and the values of the left-out sample are predicted. The process is repeated by keeping out another sample, and so on until every sample has been left out once; the total validation is then calculated. When a real outlier is left out of the calculation of the model, its predicted value will be close to 0.5, so it must be removed and the model recalculated. In other words, the 'real outliers' lie far from the regression line in figure 5.11 and are badly described by the model. Figure 5.11 shows the 2D scatter plot for the 25 dpa model, when the full cross-validation model is used to predict the Y-values of the left-out (unknown) samples. The model has been calculated without leaving out the potential outliers, in order to test whether they negatively influence the prediction model. The predicted values are close to 1 for feeding samples and close to 0 for baking samples (figure 5.11), so the model is good: either the samples of the calibration set contain no outliers, or these potential outliers describe some variance of the spectra that is likely to occur during MALDI-TOF analysis.

[Figure 5.10 plot label: trend in the model]
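Full (leave-one-out) cross-validation of a discriminative PLS-R model can be sketched as follows. The sketch assumes a single Y-variable coded 1 for feeding and 0 for baking, as in figure 5.11, and uses the 0.5 limit to assign a class. PLS1 is implemented here with the NIPALS algorithm; the data are synthetic and the number of components is an arbitrary choice, so this is not the Unscrambler implementation.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression by NIPALS on mean-centred data.
    Returns (x_mean, y_mean, b) so that y_hat = y_mean + (x - x_mean) @ b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # X-weights
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X-loadings
        qa = (yk @ t) / tt               # y-loading
        Xk = Xk - np.outer(t, p)         # deflate X and y
        yk = yk - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return x_mean, y_mean, b


def full_cross_validation(X, y, n_components):
    """Leave each sample out once, calibrate on the others, predict the left-out sample."""
    n = len(y)
    y_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean, b = pls1_fit(X[keep], y[keep], n_components)
        y_pred[i] = y_mean + (X[i] - x_mean) @ b
    return y_pred


rng = np.random.default_rng(1)
bins = np.arange(40)
y = np.array([1.0] * 12 + [0.0] * 12)    # 1 = feeding, 0 = baking
feeding_peak = np.exp(-((bins - 10) ** 2) / 8.0)
baking_peak = np.exp(-((bins - 30) ** 2) / 8.0)
X = 3.0 * (np.outer(y, feeding_peak) + np.outer(1 - y, baking_peak))
X += rng.normal(0, 0.1, size=X.shape)

y_cv = full_cross_validation(X, y, n_components=2)
print("fraction correct:", np.mean((y_cv > 0.5) == (y == 1)))
print("validation r:", round(float(np.corrcoef(y, y_cv)[0, 1]), 3))
```

A left-out sample whose prediction falls near 0.5 is the 'real outlier' case described above: the model built without it cannot place it in either class.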


Figure 5.11 The 2D scatter plot of predicted vs. measured Y-values for the PLS-R model. Samples lying far from the regression line are badly described by the model. The model fit is measured by how far the validation samples lie from the regression line.

When the potential outliers are kept in the model and the correlation coefficient for the validation is high (0.98), this shows that the potential outliers describe the variance in the spectra of the unknown samples well. Although some outliers are believed to affect the model, they may equally carry some information, i.e. they are not meaningless. Nevertheless, some spectra were judged to be outliers and removed from the analyses (7).


6. RESULTS-DISCUSSION

• 6.1 Determination (or investigation) of wheat quality

§ 6.1.1 Previous explorative analysis and outliers detection by PCA
§ 6.1.2 Determination of wheat quality by PLS-R
§ 6.1.3 Prediction of unknown samples by PLS-R and SIMCA

• 6.2 Gluten proteins development

§ 6.2.1 Study of development of varieties
§ 6.2.2 Study of development of quality


Results and Discussion Determination of wheat quality


6.1 Determination of wheat quality

Section 6.1 focuses on the investigation of the gluten proteins by mass spectrometry and multivariate data analysis at distinct stages of grain development, in order to build a model for determining wheat quality before harvest. First, the differences that distinguish the mass spectra of baking and feeding qualities are searched for by multivariate data analysis; second, models are created on the basis of these differences. A central part of this aim is to evaluate whether wheat quality can be predicted before the grain is harvested; therefore the samples have been collected at different stages of grain development, from 15 dpa to 45 dpa (15, 20, 25, 30, 35 and 45 dpa). At 45 dpa the grain has finished accumulating storage proteins, and this is the stage at which the wheat is harvested. At 15 dpa the grain has already started to synthesise the gluten proteins. Recall that wheat quality here refers to breadmaking and is a criterion based on the composition of the gluten complex.

This section (6.1) is subdivided into three subsections:

• Previous explorative analysis and outliers detection by PCA (section 6.1.1)
• Determination of wheat quality by PLS-R (section 6.1.2)
• Prediction of unknown samples by PLS-R and SIMCA (section 6.1.3)

The first section (6.1.1) focuses on the use of PCA to get an overview of the data, check for potential outliers and estimate whether the qualities can be separated. The second section (6.1.2) centres on PLS-R, the determination of wheat quality and the identification of the variables important for distinguishing the samples. In the last section (6.1.3) two methods, based on discriminative PLS-R and SIMCA, are created to predict the quality of unknown samples at 15 dpa and 45 dpa.

The enclosed Appendix gives the details of the results, divided into sections corresponding to each dpa:

• Appendix D 15 dpa
• Appendix E 20 dpa
• Appendix F 25 dpa
• Appendix G 30 dpa
• Appendix H 35 dpa
• Appendix I 45 dpa


6.1.1 Determination of baking and feeding quality and outlier detection by PCA

Introduction

In this project PCA is used to search for potential outliers and to get an overview of whether the samples can be separated on the basis of quality. This section summarises the results, discussing the main points and the differences between the stages investigated. In the enclosed Appendix the results are divided by dpa and reported according to the following index:

• Appendix D2 PCA conditions and results of 15 dpa

• Appendix E2 PCA conditions and results of 20 dpa

• Appendix F2 PCA conditions and results of 25 dpa

• Appendix G2 PCA conditions and results of 30 dpa

• Appendix H2 PCA conditions and results of 35 dpa

• Appendix I2 PCA conditions and results of 45 dpa


Results

As discussed in section 5.8, the value of the analysis rests on the reproducibility of the studies, which depends greatly on the detection of outliers. The data set initially comprises sixty mass spectra per variety, for a total of 240 mass spectra at each stage investigated (dpa). The mass spectra without signals (i.e. those that look like a flat line) are then removed. More mass spectra are excluded at early stages of development than for mature grains: for example, 58% of the spectra have been eliminated at 15 dpa, against 34% at 45 dpa (table 6.1). Since the extraction, crystallisation and all other analytical procedures have been performed in exactly the same way at every stage, the higher percentage of spectra eliminated at 15 dpa may be due to the lower concentration of proteins.

Variety | Quality | Number of mass spectra, 15 dpa | Number of mass spectra, 45 dpa | Extraction
993618 | Feeding | 17+12=29 | 20+13=33 | A+B
Stakado | Feeding | 22+20=42 | 21+15=33 | A+B
Miller | Baking | 7+7=14 | 22+15=38 | A+B
Pentium | Baking | 6+11=17 | 27+24=51 | A+B
Total | | 52+50=102 | 90+68=158 | A+B
Decrease | | 58% | 34% |

Table 6.1 Number of mass spectra used for the analysis at 15 dpa and 45 dpa, and percentage of spectra eliminated.

The remaining spectra are pre-processed and examined by PCA for outlier detection, which reduces their number further. Outlier detection is first performed by applying PCA to the samples of the same dpa: on the basis of the residual X-variance against the leverage (the influence plot), potential outliers are identified, marked and then checked for abnormal values; those confirmed as abnormal are eliminated.
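The percentages in table 6.1 follow from the totals and the 240 spectra recorded per stage (sixty per variety, four varieties). A short check, using only the totals given in the table:

```python
SPECTRA_PER_STAGE = 60 * 4  # sixty mass spectra per variety, four varieties


def eliminated_fraction(n_kept: int, n_recorded: int = SPECTRA_PER_STAGE) -> float:
    """Fraction of the recorded mass spectra that were removed."""
    return 1 - n_kept / n_recorded


for stage, n_kept in {"15 dpa": 102, "45 dpa": 158}.items():  # totals from table 6.1
    print(f"{stage}: {n_kept}/{SPECTRA_PER_STAGE} kept, {eliminated_fraction(n_kept):.1%} eliminated")
# 15 dpa: 102/240 kept, 57.5% eliminated
# 45 dpa: 158/240 kept, 34.2% eliminated
```

Rounded to whole percentages, these give the 58% and 34% reported in the table.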


PCA is used to explore the data matrix and to get an overview of the samples when they are processed by multivariate data analysis. For this purpose PCA is applied, at each stage of development, to all the samples collected (covering four varieties). The molecular weights (m/z values) are chosen as X-variables. PCA decomposes the complex multidimensional data set into a structure part and a noise part according to equation 3.1. In Unscrambler the structure part is shown by two plots, the score plot (T-values) and the loading plot (P-values), which show the location of the samples and of the X-variables, respectively, in the new axis system. Reducing the dimensionality to a few components that explain the variance in the data allows visual inspection of the samples. By looking at one-, two- and three-dimensional score plots it is possible to get an overview of the data and to recognise whether the location of the samples reflects the quality distinction. For easier inspection, the samples are coloured in the score plot by quality group (figure 6.1). The results of the separation at each dpa are given in the corresponding appendix; in general the separation is difficult and not completely clear at 15 dpa (appendix, figure D2.1), and easier and clear at 45 dpa. The ability of PCA to separate the samples according to their quality increases progressively with the following stages (20, 25, 30, 35, 45 dpa), i.e. as more gluten proteins are stored in the grain.
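Assuming equation 3.1 is the usual PCA model X = TPᵀ + E (with X mean-centred), the structure part TPᵀ and the noise part E can be sketched with a singular value decomposition. The random matrix below is only a stand-in for pre-processed spectra.

```python
import numpy as np


def pca(X, n_components):
    """Split mean-centred X into a structure part T @ P.T and a noise part E."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T      # loadings: one row per X-variable (m/z bin), one column per PC
    T = Xc @ P                   # scores: one row per sample, one column per PC
    E = Xc - T @ P.T             # residuals (noise part)
    return T, P, E, x_mean


rng = np.random.default_rng(2)
X = rng.normal(size=(12, 100))   # stand-in for 12 pre-processed spectra with 100 m/z bins
T, P, E, x_mean = pca(X, n_components=3)
print(T.shape, P.shape, E.shape)                 # (12, 3) (100, 3) (12, 100)
print(np.allclose(x_mean + T @ P.T + E, X))      # True
```

The score plot in Unscrambler corresponds to plotting columns of T against each other, and the loading plot to plotting columns of P against the m/z axis.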

Figure 6.1 The PCA score plot of the 45 dpa samples shows that the four varieties (993618, Stakado, Miller and Pentium) are grouped in four distinct clusters. The third and fourth PCs separate the feeding quality from the baking quality.

[Figure 6.1 plot labels: Miller, Pentium, Stakado, 993618; feeding quality, baking quality]


At 45 dpa the separation is so clear that not only is the baking quality separated from the feeding quality, but the varieties are also distinguished into four distinct clusters (figure 6.1). The differences between the stages are also indicated by the explained variance (shown at the bottom of the figure). The explained variance expresses the importance of a principal component and is measured as a percentage of the total variance in the data, i.e. it measures the proportion of the variation in the data accounted for by the current PCs (28). For the PCs that separate the samples on the basis of wheat quality, the 15 dpa model has low percentages of explained variance (2% and 1%; appendix, figure D2.1), which increase progressively up to 14% and 7% in the analysis at 45 dpa (appendix, figure I2.1). This suggests that quality is already discernible at 15 dpa, but that the distinction is clearer and explains much more of the variance at 45 dpa. At 15 dpa some Stakado samples (feeding quality) are always mixed with the baking quality samples when the PCs are projected along the quality distinction. This suggests that some samples collected from the Stakado variety at 15 dpa may not yet have synthesised the feeding pattern. Since the loading values (P) are related to the score values (T) (equation 3.1), the loading plot is used to identify which variables influence the model. In the loading plot the abscissa (X-variables) shows the m/z of the proteins analysed, while the ordinate shows the loadings, which geometrically represent the cosines of the angles between the variables and the current PC (figure 6.2). The loading plot of the 45 dpa model consists of the line plots of PC3 (blue) and PC4 (red) (figure 6.2).
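Assuming the explained variance of each PC is its share of the total sum of squares of the mean-centred data (equivalently, its squared singular value over the sum of all squared singular values), the percentages shown at the bottom of the score plots can be sketched as below. The loading vectors from the same decomposition have unit length, so each entry is a direction cosine between a variable axis and the PC, as stated above. The data are synthetic.

```python
import numpy as np


def explained_variance_percent(X):
    """Percentage of the total variance of mean-centred X explained by each PC."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    return 100.0 * s ** 2 / np.sum(s ** 2)


rng = np.random.default_rng(3)
X = rng.normal(size=(30, 8)) * np.array([5, 3, 2, 1, 1, 1, 1, 1])
ev = explained_variance_percent(X)
print(np.round(ev, 1), round(float(ev.sum()), 6))

_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
print(np.allclose(np.linalg.norm(Vt, axis=1), 1.0))   # True: unit-length loading vectors
```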

Figure 6.2 The loading plot from the PCA at 45 dpa. The variables (m/z) are on the abscissa and the loading values (between -0.3 and +0.3) on the ordinate. PC3 is shown in blue and PC4 in red. High positive values of PC3 and high negative values of PC4 contribute to the separation of the baking samples (see the score plot in figure 6.1).


High loading values indicate which variables are important and how much they contribute to PC3 and PC4, which separate the samples in the score plot. The location of the feeding quality samples is due to their values in the X-matrix: in figure 6.1 the feeding samples with large scores on PC3 and PC4 have large positive values for the variables with large positive loadings on PC3 and PC4 (figure 6.2). The loading plot may therefore give information about proteins correlated to wheat quality in terms of m/z or MW, bearing in mind that peaks may be correlated to each other (section 5.7). At the same time, the complexity of the PCA loading plot makes the identification of the significant variables very difficult and subjective. The important variables will be suggested by the further PLS-R analysis, on the basis of the Unscrambler significant variables and the uncertainty test carried out in section 6.1.2. Unscrambler calculates the calibration residual variance and the validation residual variance. The calibration variance is calculated by fitting the calibration data to the model, while the validation variance is calculated by testing the model on data not used when building it. The optimal number of PCs is determined by the minimum of the residual variance, i.e. the point at which the explained variance (the variance explained by the model, complementary to the residual variance) stops increasing as PCs are added (figure 6.3). This limit of maximum explained variance separates the useful information in the data from the redundant information, i.e. the noise (equation 3.1).
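The choice of the number of PCs from the calibration and validation residual variances can be sketched as below, assuming a naive leave-one-out validation in which each left-out sample is projected onto the PCs of a model built without it. With this scheme the validation curve never increases, so the stopping rule used here (stop when one more PC lowers the validation residual variance by less than 10%) is a hypothetical stand-in for reading the "stops decreasing" point off a plot like figure 6.3; the Unscrambler validation may differ. The data are synthetic, with two real components.

```python
import numpy as np


def project_residual(x, mean, P):
    """Residual of sample x after projection onto the PCs in the columns of P."""
    xc = x - mean
    return xc - P @ (P.T @ xc)


def residual_variance_curves(X, max_components):
    """Mean squared residual for 0..max_components PCs: calibration (all samples
    fitted) and leave-one-out validation (each sample left out of its own model)."""
    n = X.shape[0]
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loo = []
    for i in range(n):
        keep = np.arange(n) != i
        m_i = X[keep].mean(axis=0)
        _, _, Vt_i = np.linalg.svd(X[keep] - m_i, full_matrices=False)
        loo.append((m_i, Vt_i))
    calib, valid = [], []
    for a in range(max_components + 1):
        E = np.array([project_residual(x, mean, Vt[:a].T) for x in X])
        calib.append(np.mean(E ** 2))
        E_cv = np.array([project_residual(X[i], m_i, Vt_i[:a].T)
                         for i, (m_i, Vt_i) in enumerate(loo)])
        valid.append(np.mean(E_cv ** 2))
    return np.array(calib), np.array(valid)


def choose_n_components(valid, rel_tol=0.1):
    """Smallest number of PCs after which one more PC lowers the validation
    residual variance by less than rel_tol (hypothetical stopping rule)."""
    for a in range(len(valid) - 1):
        if valid[a + 1] > (1.0 - rel_tol) * valid[a]:
            return a
    return len(valid) - 1


rng = np.random.default_rng(4)
bins = np.arange(30)
peaks = np.vstack([np.exp(-((bins - c) ** 2) / 8.0) for c in (8, 20)])
scores = rng.normal(0, 1, size=(20, 2)) * [8.0, 4.0]
X = scores @ peaks + rng.normal(0, 0.05, size=(20, 30))

calib, valid = residual_variance_curves(X, max_components=4)
print(np.round(calib, 5))
print(np.round(valid, 5))
print("chosen number of PCs:", choose_n_components(valid))
```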

Figure 6.3 The residual variance plot of the 45 dpa PCA model. For each PC calculated, the blue bars give the total residual variance of the calibration model and the red bars the total residual variance of the validation model. The optimal number of PCs is the number at which the residual variance stops decreasing with the addition of further PCs. The total calibration and validation residual variances are close to 0 and similar to each other, i.e. the model is properly built and representative of new data.


For each dpa, the models created have a low (close to 0) total residual variance, i.e. the model explains most of the variation in X (figure 6.3). Conversely, when the total residual variance is high the model has not explained much of the information, and much redundant information remains in the spectra. The minimum calibration and validation residual variances of the 15 dpa and 45 dpa models are similar (around 0.00005; figure 6.3 and appendix, figure D2.2), suggesting that this may be the level at which instrumental noise begins. The methods are good and represent the data well (no outliers in the models), because the two residual variances do not differ much and are close to 0, i.e. the calibration set is well fitted and properly describes new samples. The greater difficulty in separating the samples at 15 dpa compared with 45 dpa is anticipated by the visual difference between the 15 dpa and 45 dpa mass spectra (Appendix D1, E1, F1, G1, H1 and I1). The mass spectrum profile changes during grain filling through a gradual increase in the number of peaks. For example, when the extreme stages of 15 dpa and 45 dpa are compared, the differences are clearly visible: approximately three visible peaks at 15 dpa against around fifteen peaks at 45 dpa (figure 6.4). The protein content changes from 15 dpa to 45 dpa, when the grain is harvested, and MALDI-TOF MS then detects a larger number of proteins, which allows a clearer separation of the samples.

[Figure 6.4 panel: abundance vs. m/z (11000-71000); sum of 30 from 38 spectra, positive polarity, no filter]

Figure 6.4 Comparison of mass spectra at 15 and 45 dpa (Pentium variety). The mass spectrum at 15 dpa (left) has few peaks (three evident), compared with the spectrum at 45 dpa (right), which has around fifteen clear peaks.

The separation of the samples on the basis of their quality is more evident at 45 dpa than at 15 dpa. At 15 dpa, the samples have been carefully pre-processed in order to reveal the difference between baking and feeding quality.

[Figure 6.4 panel: abundance vs. m/z (10000-70000); sum of 22 from 441 spectra, positive polarity, no filter]


The results show that differences between the mass spectra of baking and feeding quality already exist at 15 dpa, but are clearer at 45 dpa. The mass spectra at 15 dpa have been investigated thoroughly in order to locate where the discrimination between the two qualities lies. Two molecular weight ranges were found to be the most interesting for wheat quality discrimination, and they are the reference for the development of the discriminative PLS-R and SIMCA models:

1. 14-16.4 kDa
2. 30-45 kDa
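Restricting the X-variables to these two windows is a simple column selection. The sketch below assumes singly charged ions, so that m/z in Da can be compared directly with the molecular weights, and an m/z axis stored alongside the spectra; the names and the 100 Da bin spacing are hypothetical.

```python
import numpy as np

# The two molecular-weight windows found at 15 dpa, in Da
QUALITY_WINDOWS_DA = [(14_000, 16_400), (30_000, 45_000)]


def window_mask(mz, windows=QUALITY_WINDOWS_DA):
    """Boolean mask of the m/z values that fall inside any of the windows (bounds included)."""
    mz = np.asarray(mz, dtype=float)
    mask = np.zeros(mz.shape, dtype=bool)
    for low, high in windows:
        mask |= (mz >= low) & (mz <= high)
    return mask


mz = np.arange(10_000, 70_001, 100)      # hypothetical m/z axis, one bin every 100 Da
spectra = np.ones((5, mz.size))          # stand-in for 5 pre-processed spectra
selected = spectra[:, window_mask(mz)]
print(selected.shape)                    # (5, 176)
```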

The details of wheat quality investigation on samples at 15 dpa are given in appendix D2.


6.1.2 Determination of wheat quality by PLS-R

Introduction

The PCA has revealed that the X-matrix, i.e. the mass spectra, contains meaningful differences between baking quality and feeding quality samples. These differences, although clearer at the last stages of development (35 dpa, 45 dpa), already exist at 15 dpa. In this section PLS-R is used to relate the same set of X-variables (mass spectra) to a second data set composed of the known quality variables, which form the Y-variables. A model is created for each dpa: by establishing a regression model between the two matrices, these analyses aim to find which parts of the mass spectrum are important for distinguishing samples suitable and unsuitable for breadmaking. This section summarises the results obtained for each dpa investigated, discussing the hidden structure of the data, how the X-variables are related to the Y-variables and, for each dpa, which variables influence the quality separation the most. In the enclosed Appendix the results are divided by dpa and reported according to the following index:

• Appendix D3 PLS-R conditions and results of 15 dpa

• Appendix E3 PLS-R conditions and results of 20 dpa

• Appendix F3 PLS-R conditions and results of 25 dpa

• Appendix G3 PLS-R conditions and results of 30 dpa

• Appendix H3 PLS-R conditions and results of 35 dpa

• Appendix I3 PLS-R conditions and results of 45 dpa


The hidden structure of the data

The data are analysed by dpa, i.e. each stage of development is investigated separately. The MALDI-TOF mass spectra are chosen as the X-variables, and the classes of wheat quality and variety as the Y-variables. There are six Y-variables: four for the varieties and two for quality. Samples belonging to a class are given the value one for that class, and samples not belonging to it the value zero. For example, the Pentium samples have the value 1 for the Y-variables Pentium and baking, and the value 0 for the Stakado, Miller, 993618 and feeding classes. Full cross-validation with uncertainty tests is selected, and the models are recalculated on the basis of the significant X-variables. The three-dimensional scatter plot of Y-loadings for the first three PCs from PLS-R (figure 6.5) makes the components of separation easier to interpret and is used to discover the hidden structure of the data. In figure 6.6 the score plot is shown in the same orientation, and the samples are separated by the first three PCs, as in each result given in the appendix. At each stage of development, the first three components separate the samples along a combination of these principal components, which could be called the 'dimension of quality' (red arrow, figure 6.5). The scatter plot of Y-loadings (figure 6.5) and the score plot (figure 6.6) are shown here for 45 dpa; the results of the 15, 20, 30 and 35 dpa analyses are given in the appendix (figures D3.1, E3.1, F3.1 and H3.1, respectively).
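The dummy coding of the six Y-variables can be sketched as follows. The variety-to-quality mapping is taken from table 6.1, while the column order and the function name are assumptions for illustration.

```python
import numpy as np

# Quality of each variety, as in table 6.1
VARIETY_QUALITY = {"993618": "feeding", "Stakado": "feeding", "Miller": "baking", "Pentium": "baking"}
# Hypothetical column order of the Y-matrix: four variety classes, then two quality classes
Y_COLUMNS = ["993618", "Stakado", "Miller", "Pentium", "baking", "feeding"]


def dummy_y(varieties):
    """One row per sample: 1 for the sample's variety and for its quality class, 0 elsewhere."""
    Y = np.zeros((len(varieties), len(Y_COLUMNS)))
    for row, variety in enumerate(varieties):
        Y[row, Y_COLUMNS.index(variety)] = 1.0
        Y[row, Y_COLUMNS.index(VARIETY_QUALITY[variety])] = 1.0
    return Y


print(dummy_y(["Pentium", "Stakado"]))
# [[0. 0. 0. 1. 1. 0.]
#  [0. 1. 0. 0. 0. 1.]]
```

Every row therefore contains exactly two ones, one in the variety block and one in the quality block.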

Figure 6.5 Three-dimensional scatter plot of the Y-loadings for the first three PCs of the 45dpa PLS-R model. The separation is explained by the first three PCs; the red arrow indicates the quality dimension, explained mainly by PC1. Each of the four varieties lies in a different region of the plot.


The “quality dimension” extends along the direction of maximum distance between the two wheat quality classes (feeding and baking), which are well explained by the first three PCs. Each of the four varieties lies approximately in a different region of the loading plot.

Figure 6.6 Three-dimensional score plot for the first three PCs of the 45dpa PLS-R model, oriented in the same direction as the Y-loading plot (figure 6.5) to show the correlation. Each variety is shown in a different colour.

Objects with a large positive score on PC1 generally have higher-than-average X-values for the variables with large positive loadings on PC1, and likewise for the other PCs. The higher X-values of one quality may therefore be related to the presence of peaks, i.e. the presence or absence of proteins. After the models are recalculated with only the significant variables, the explained X-variance remains large: 24%, 28% and 21% for PC1, PC2 and PC3, respectively (73% in total, shown at the bottom of the figure), i.e. these three principal components capture, in a few variables, a substantial part of the information useful for separating the qualities. For every model, the total X-variance explained by the first three PCs is at least 70%; for example, the 15dpa model explains 91% of the variation in the data with 3 PCs (table 6.2).
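The per-component percentages in table 6.2 are fractions of the total sum of squares of the centred X (or Y) that each component reproduces. A minimal NumPy sketch of NIPALS PLS2 that reports these fractions is shown below; the thesis used The Unscrambler, whose implementation details may differ, and the function name `pls2_nipals` is hypothetical.

```python
import numpy as np


def pls2_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Minimal NIPALS PLS2 on mean-centred X and Y.

    Returns the X-scores T (n_samples x n_components) and the fractions of
    the X- and Y-variance explained by each component.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    ss_x, ss_y = np.sum(X ** 2), np.sum(Y ** 2)
    scores, expl_x, expl_y = [], [], []
    for _ in range(n_components):
        # Start from the Y column with the largest remaining sum of squares.
        u = Y[:, [int(np.argmax(np.sum(Y ** 2, axis=0)))]]
        for _ in range(max_iter):
            w = X.T @ u
            w = w / np.linalg.norm(w)          # X-weights
            t = X @ w                          # X-scores
            q = Y.T @ t / (t.T @ t)            # Y-loadings
            u_new = Y @ q / (q.T @ q)          # Y-scores
            done = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = X.T @ t / (t.T @ t)                # X-loadings
        expl_x.append(np.sum((t @ p.T) ** 2) / ss_x)
        expl_y.append(np.sum((t @ q.T) ** 2) / ss_y)
        X = X - t @ p.T                        # deflate X
        Y = Y - t @ q.T                        # deflate Y
        scores.append(t)
    return np.hstack(scores), np.array(expl_x), np.array(expl_y)
```

Because the scores are mutually orthogonal, the per-component fractions simply add up, which is how a total such as 24% + 28% + 21% = 73% for the 45dpa model arises.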


Dpa | Explained X-variance (PC1, PC2, PC3) | Total X-variance after three PCs | Explained Y-variance (PC1, PC2, PC3) | Total Y-variance after three PCs
15 | 77%, 7%, 7% | 91% | 23%, 38%, 8% | 69%
20 | 54%, 27%, 12% | 93% | 56%, 23%, 13% | 92%
25 | 54%, 24%, 9% | 87% | 44%, 19%, 17% | 80%
30 | 53%, 17%, 9% | 79% | 28%, 38%, 6% | 72%
35 | 38%, 12%, 20% | 70% | 56%, 10%, 4% | 70%
45 | 24%, 28%, 21% | 73% | 48%, 19%, 11% | 78%

Table 6.2 Explained variance of the X- and Y-variables, compared across dpa.

For each model, the optimal number of PCs is the one giving the lowest RMSEP (figure 6.7; appendix, figures D3.3, E3.3, F3.3 and H3.3).

Figure 6.7 RMSEP for the 45dpa model. The lowest RMSEP gives the optimal number of dimensions: the model should be built with 5 PCs, where RMSEP reaches its minimum.
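Choosing the number of PCs from an RMSEP curve such as figure 6.7 can be expressed as a one-line rule. The sketch below assumes the curve starts at zero components; the name `optimal_n_components` and the optional `rel_tol` parsimony rule are hypothetical additions, not part of the thesis, and with `rel_tol=0` the function applies exactly the minimum-RMSEP criterion used in the text.

```python
import numpy as np


def optimal_n_components(rmsep, rel_tol=0.0):
    """Pick the number of PLS components from a validation RMSEP curve.

    rmsep[k] is the RMSEP of the model with k components (k = 0, 1, 2, ...).
    rel_tol = 0 returns the global minimum (the criterion used in the text);
    rel_tol > 0 is an optional, hypothetical parsimony rule that accepts the
    smallest k whose RMSEP lies within rel_tol of the minimum.
    """
    rmsep = np.asarray(rmsep, dtype=float)
    best = rmsep.min()
    return int(np.flatnonzero(rmsep <= best * (1.0 + rel_tol))[0])
```

With an illustrative (made-up) curve such as [0.50, 0.31, 0.22, 0.16, 0.13, 0.12, 0.125, 0.13], the minimum rule selects 5 components.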


Important variables

The purpose of this section is to unravel the mass spectra in order to identify, for each stage of development, the molecular weight ranges in which the gluten proteins with the largest influence on breadmaking quality are found. These intervals may be worth investigating further, because they may be related to the presence or absence of quality-specific proteins. Note that, within each suggested molecular weight range, the proteins may have different MWs, according to the correlation of peaks (section 5.8). The most important variables are the X-variables with the largest influence on the model; they are identified from the regression coefficients and the correlation loadings, after recalculating the model with the significant X-variables option of Unscrambler. The correlation loadings are used first, to discover the structure in the data and to understand the relationship between the X-variables and quality. With the ellipse option enabled, the correlation loading plot shows two ellipses: variables close to the outer ellipse have 100% of their variance explained by the plotted components, and variables close to the inner ellipse 50% (figure 6.8).
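Correlation loadings are the correlations between each X-variable and each score vector. When the scores are mutually uncorrelated, the squared correlations on the plotted components add up to the share of a variable's variance explained by those components, which is why the inner ellipse (radius √0.5) marks 50% and the outer ellipse (radius 1) marks 100%. A minimal NumPy sketch follows; the function names are hypothetical and the scores can come from any PCA or PLS model.

```python
import numpy as np


def correlation_loadings(X, T):
    """Correlation of every X-variable (column of X) with every score vector.

    Returns an (n_variables x n_components) matrix with entries in [-1, 1].
    Assumes no constant (zero-variance) variables.
    """
    Xc = X - X.mean(axis=0)
    Tc = T - T.mean(axis=0)
    num = Xc.T @ Tc
    den = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Tc, axis=0))
    return num / den


def explained_fraction(corr_loadings, components=(0, 1)):
    """Share of each variable's variance explained by the chosen components.

    Valid for mutually uncorrelated scores (as PCA and PLS scores are): it is
    the squared distance from the origin in the correlation loading plot, so
    1.0 lies on the outer ellipse and 0.5 on the inner one.
    """
    return np.sum(corr_loadings[:, list(components)] ** 2, axis=1)
```

A variable plotted at distance √0.5 ≈ 0.71 from the origin therefore has half of its variance described by the two plotted PCs.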

Figure 6.8 Correlation loading plot of the 25dpa model, showing the X-variables (mass spectra, in blue) and the Y-variables (varieties and quality, in red). The first and third PCs separate the qualities. The X-variables on the right-hand side are positively correlated with feeding quality and cover the ranges 33.4-34.3 and 40.9-41.8 kDa.

(Figure 6.8 annotations: 33.4-34.3 kDa, 40.9-41.8 kDa, 38.8-39.7 kDa, 26.3-28.2 kDa.)


Since the wheat qualities are well separated by the first PCs, the regression coefficients of the corresponding PCs are then plotted to identify the important variables that separate the samples along this ‘quality dimension’. The regression coefficients for a model with three components summarize the relationship between the X-variables and the Y-variables, showing how each variable is correlated with quality. For example, at 25dpa, the variables covering 33.4-34.3 kDa and 40.9-41.8 kDa are highly correlated with the samples of feeding quality (figure 6.9 shows the important variables of the 25dpa model).
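For a single 0/1 response such as a feeding indicator, the regression coefficients of an A-component PLS model can be computed from the NIPALS weights W, loadings P and y-loadings q as b = W (PᵀW)⁻¹ q. The sketch below does this in NumPy rather than The Unscrambler and ranks the variables by |b|; the function names are hypothetical. With y coded 1 for feeding, large positive coefficients point to variables associated with feeding quality, as in the upper part of figure 6.9.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients b of a PLS1 model with n_components components.

    Predictions are y_hat = y.mean() + (X - X.mean(axis=0)) @ b.
    NIPALS PLS1: for a single y no inner iteration is needed.
    """
    Xa = X - X.mean(axis=0)
    ya = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xa.T @ ya
        w = w / np.linalg.norm(w)        # X-weights
        t = Xa @ w                       # scores
        tt = t @ t
        p = Xa.T @ t / tt                # X-loadings
        qa = (ya @ t) / tt               # y-loading
        Xa = Xa - np.outer(t, p)         # deflate X
        ya = ya - qa * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)   # b = W (P'W)^-1 q


def top_variables(b, k):
    """Indices of the k variables with the largest absolute coefficients."""
    return np.argsort(-np.abs(b))[:k]
```

Using as many components as the rank of X reproduces ordinary least squares; fewer components, such as the three discussed here, give the shrunken PLS solution.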

Figure 6.9 A closer view of the important variables of the 25dpa model: regression coefficients for the first three PCs (blue, red and green for the 1st, 2nd and 3rd, respectively). The variables highly correlated with the separation of wheat quality (upper part for feeding, lower part for baking quality) are circled, and their approximate molecular weights are indicated.

(Figure 6.9 annotations: 33.4-34.3 kDa, 40.9-41.8 kDa, 38.8-39.7 kDa, 26.3-28.2 kDa, 19.5-19.6 kDa.)


For each dpa, the variables most significant for the separation of the samples in the model are identified (appendix, figures D3.4, E3.4, F3.4, H3.4 and I3.4) and compared (table 6.3).

Dpa | Important variables for feeding: MW ranges covered (kDa) | Important variables for baking: MW ranges covered (kDa)
15 | 33.4-34.5, 40.9-41.6, 44.2-44.4, 46.6 | 19.1-19.9, 22.1-24.2, 32-32.8, 39.1-40
20 | 16.7, 32.8-34.3, 59.7 | 34.9-35.3, 38.4-38.6, 39.1-39.7
25 | 33.4-34.3, 40.9-41.8 | 19.5-19.6, 26.3-28.2, 38.8-39.7
30 | 15.1, 21.2-21.4, 33.8-34.1, 40.6-40.9, 50.4-50.6 | 34.7-36.4, 77.9-83.4, 38.8-39.7
35 | 16.7-18.7, 33.4-34.3, 37.3-38.2, 41.4-41.9 | 34.5-34.9, 39.6-40.4
45 | 33.4-34, 40-41.6 | 38-40.2, 42.7-43.2

Table 6.3 Summary of the important variables for feeding and baking quality, based on the significant regression coefficients of the PLS-R models.

Some of these ranges appear at every stage of development but in only one quality, suggesting that they cover the molecular weights of quality-specific proteins. The feeding samples from 15, 25, 30, 35 and 45dpa have higher values of the variables covering 33.4-34.3 kDa, suggesting that feeding quality may have one or more specific proteins in this range. In previous studies, this range was found only in feeding quality (7).


6.1.3 Prediction of unknown samples by PLS-R and SIMCA

Introduction

The main purpose of this thesis is to develop a method for quality determination at different stages of grain development. The differences found by PCA (section 6.1.1), and the demonstration that the same part of the spectrum is already present at 15 dpa (section 6.1.2), make it possible to develop methods for quality identification. In this section, discriminative PLS-R and SIMCA classification are the two methods used to classify the unknown samples according to their quality. Discriminative PLS-R models are created for each stage of development, whereas SIMCA models are used only for 15 dpa and 45 dpa. The section is accordingly divided into two parts, on discriminative PLS-R and on SIMCA classification. In the appendix, the results are reported by dpa, in two separate parts for discriminative PLS-R and SIMCA classification, according to the following index:

• Appendix D3 PLS-R conditions and results of 15 dpa

• Appendix E3 PLS-R conditions and results of 20 dpa

• Appendix F3 PLS-R conditions and results of 25 dpa

• Appendix G3 PLS-R conditions and results of 30 dpa

• Appendix H3 PLS-R conditions and results of 35 dpa

• Appendix I3 PLS-R conditions and results of 45 dpa

• Appendix D4 SIMCA classification for 15 dpa

• Appendix I4 SIMCA classification for 45 dpa


Discriminative PLS-R

Discriminative PLS-R is used in this project to predict the unknown quality of the samples on the basis of a PLS-R model. A prediction model is created for the data set of each stage of grain development. In the first part of this section, the models are built from the samples of extraction ‘A’ and their fit is assessed by validation. In the second part, the unknown samples of extraction ‘B’ are predicted with the models obtained in the first part. The unknown samples must be pre-processed in the same way as the calibration set; otherwise the predictions are not valid (28).
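Pre-processing the unknown samples “in the same way” amounts to estimating every pre-processing parameter on the calibration set only and reusing it unchanged. In the sketch below, the steps themselves (scaling each spectrum to unit total intensity, then centring with the calibration mean) and the class name are hypothetical placeholders, not the pre-processing actually used in this thesis.

```python
import numpy as np


class CalibrationPreprocessor:
    """Stores every parameter estimated from the calibration set so that
    'unknown' spectra are transformed in exactly the same way.

    Illustrative steps only (hypothetical): each spectrum is scaled to unit
    total intensity, then centred with the calibration mean.
    """

    def fit(self, X_cal):
        Xn = self._normalise(X_cal)
        self.mean_ = Xn.mean(axis=0)      # learned from calibration data only
        return self

    def transform(self, X):
        # The calibration mean is reused; unknown samples are never re-centred
        # on their own mean.
        return self._normalise(X) - self.mean_

    @staticmethod
    def _normalise(X):
        X = np.asarray(X, dtype=float)
        return X / X.sum(axis=1, keepdims=True)
```

Because no parameter depends on the unknown samples, a single spectrum is transformed identically whether it is predicted alone or together with others.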

Calibration of the models

For each dpa, the models are calibrated to describe the relationship between X (the mass spectra) and Y (the quality). Outliers are excluded from the calculation of the models, which are then recalculated with the significant variables (figure 6.11 shows the score plot of the 45dpa model).

Figure 6.11 Score plot of the discriminative PLS-R model for the 45dpa samples. The first PC separates the samples by quality: the baking-quality samples (Miller and Pentium) lie on the right-hand side and the feeding-quality samples (993618 and Stakado) on the left-hand side.


The fitness of the models

A good model should predict new samples correctly. Before prediction, the models are therefore validated to estimate their prediction errors and thus measure how well they fit. Validating a model means testing how well it predicts samples that were not used to build it (28). There are different types of validation, and all of them require the Y-values of the validation samples to be known. The most reliable is test set validation: a new data set (the test set) is collected independently of the calibration set, which requires a large amount of data. The calibrated model predicts the Y-values (Y-pred) of the test set, and these are compared with the real reference values (Y-ref), which are withheld from the model (28). The result of the comparison is expressed as prediction errors, or residual variances, which quantify the accuracy of the predicted Y-values. In full cross-validation there is only one data set, part of which is used for calibration and part for validation: as many sub-models are calculated as there are samples, each leaving out one sample and using the others for calibration. The Y-value of the left-out sample is predicted, and after all sub-models have been calculated, the squared differences between predicted and measured Y-values are summed and averaged (28). The square root of this average is the RMSEP, the main result, which measures the uncertainty of future predictions. It can be interpreted as the average uncertainty to expect when predicting the Y-value of an unknown sample, and it is expressed in the same unit as the Y-variable (28). For example, the RMSEP of the 45dpa model is 0.109, so predicted values are expected to deviate from the reference values by about 0.109 on average (figure 6.12).
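Full cross-validation and the RMSEP can be sketched in NumPy as follows. The model is passed in as a function; ordinary least squares is used purely as a stand-in for the PLS-R models of the thesis, and all function names are hypothetical.

```python
import numpy as np


def rmsep(y_pred, y_ref):
    """Root mean square error of prediction, in the units of y."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))


def full_cross_validation(X, y, fit_predict):
    """Leave-one-out ('full') cross-validation.

    One sub-model per sample: each is calibrated without that sample and then
    predicts it. Returns the cross-validated predictions and their RMSEP.
    """
    n = len(y)
    y_cv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        y_cv[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    return y_cv, rmsep(y_cv, y)


def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in model (least squares with intercept), NOT the thesis' PLS-R."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef
```

Any function with the same `(X_train, y_train, X_test)` signature, for example one wrapping a PLS-R model with a fixed number of components, can be passed in place of the stand-in.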
The validation results are shown as a plot of measured against predicted Y-values for the samples of the validation set. Here the Y-variable represents quality, with the measured value 0 for feeding quality and 1 for baking quality (figure 6.12).


Figure 6.12 Predicted against measured values of the validation samples for the 45dpa model. The RMSEP (0.109) indicates the average prediction error. The abscissa shows the measured Y-values, where baking-quality samples (blue) have the value 1 and feeding-quality samples (red) the value 0; the ordinate shows the predicted Y-values. Pentium and Miller samples have predicted values close to 1 and Stakado and 993618 samples values close to 0, so all of them will be predicted correctly.

The Stakado and 993618 samples have low Y-pred values, which classifies them correctly as feeding quality, whereas Pentium and Miller have high Y-pred values and are therefore classified correctly as baking quality (figure 6.12). The fit of the model is also described by two other parameters: the slope and the correlation coefficient. The slope is that of the regression line in the plot of measured and predicted Y-values (figure 6.13); the closer the slope is to 1, the better the model fits. For the 45dpa model, the good fit is shown by a validation slope of 0.954 (figure 6.13). The correlation coefficient is computed as the covariance between the two variables divided by the product of their standard deviations. It expresses numerically the linear relationship between Y-pred and Y-ref and varies from -1 to +1 (28). For the 45dpa model, the correlation coefficient of the validation set is 0.975, i.e. the predicted Y-values of the validation samples closely follow their reference values.
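The two fit statistics can be computed as follows. This sketch assumes the slope is that of a least-squares line of predicted on measured values (the usual layout of a predicted-versus-measured plot); the function name is hypothetical.

```python
import numpy as np


def fit_statistics(y_ref, y_pred):
    """Slope of the predicted-vs-measured regression line and the correlation.

    slope:       least-squares slope of y_pred regressed on y_ref (1 = ideal)
    correlation: cov(y_ref, y_pred) / (sd(y_ref) * sd(y_pred)), in [-1, 1]
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    cov = np.cov(y_ref, y_pred)            # 2 x 2 covariance matrix
    slope = cov[0, 1] / cov[0, 0]
    correlation = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return float(slope), float(correlation)
```

The correlation here is the ordinary Pearson coefficient, so it agrees with `np.corrcoef(y_ref, y_pred)[0, 1]`.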


Figure 6.13 Measured and predicted Y-values for calibration (blue) and validation (red). The correlation is high (0.982 and 0.975, respectively), and the predicted values are very close to the measured values, indicating a good model.

On the basis of RMSEP, slope and correlation coefficient, the discriminative PLS-R models fit well at every dpa (table 6.4). In particular, the 15dpa model fits about as well as the 45dpa model: both have an RMSEP of about 0.1, a validation slope of about 0.95 and a validation correlation coefficient of about 0.97 (table 6.4).

Dpa | RMSEP | Appendix | Slope (calibration) | Slope (validation) | Correlation coefficient (calibration) | Correlation coefficient (validation)
15 | 0.104 | D3 | 0.965 | 0.944 | 0.982 | 0.973
20 | 0.072 | E3 | 0.978 | 0.978 | 0.988 | 0.983
25 | 0.096 | F3 | 0.976 | 0.964 | 0.987 | 0.961
30 | 0.139 | G3 | 0.933 | 0.925 | 0.966 | 0.958
35 | 0.119 | H3 | 0.951 | 0.944 | 0.975 | 0.966
45 | 0.109 | I3 | 0.965 | 0.954 | 0.982 | 0.975

Table 6.4 For each dpa, the fit of the model expressed as RMSEP, and the slope and correlation coefficient for calibration and validation.


Prediction of unknown samples

The PLS-R models are then used to predict to which category, baking or feeding, the “unknown” samples belong. Only the X-values of the new samples are given to the regression model, simulating “unknown” samples, in order to compute Y-pred values, i.e. the predicted quality. The separation uses a binary discriminant variable coded 0/1 (yes = 1, no = 0) as the Y-variable of the regression model, so that baking quality is 1 and feeding quality is 0 (28). The prediction of unknown samples is thus simulated as a real classification. The models built from the samples of extraction ‘A’ are applied to the ‘unknown’ samples of extraction ‘B’, whose quality is hidden from the model. The quality of the extraction B samples is nevertheless known to the analyst and is used to count how many samples the discriminative PLS-R predicts correctly. It should be stressed that the model uses only the X-variables of extraction B, i.e. the mass spectra, to calculate Y-pred. The result is shown as a plot of Y-pred values for all samples against Y-ref values (figure 6.14). Since Y-ref, the quality, takes the value 0 or 1, samples with Y-pred values above the midpoint (0.5) are classified into one group and the others into the other group. For example, in the plot for the prediction of the 45dpa samples, Y-pred refers to the baking variable: samples with values > 0.5 are classified into the baking group, and samples with values < 0.5 into the feeding group (28). The line at 0.5 (called the “reference line”) marks the separation between the two groups (figure 6.14).
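The 0.5 decision rule and the percentage of correctly predicted samples reported in table 6.5 can be written in a few lines. This sketch assumes Y-pred refers to the baking variable, as in the 45dpa example; a value of exactly 0.5 is assigned to feeding here, a choice the text does not specify, and the function names are hypothetical.

```python
import numpy as np


def classify_baking(y_pred, threshold=0.5):
    """Y-pred refers to the baking variable (baking = 1, feeding = 0)."""
    return np.where(np.asarray(y_pred) > threshold, "baking", "feeding")


def percent_correct(predicted, true):
    """Correctly predicted samples divided by all predicted samples, in %."""
    predicted, true = np.asarray(predicted), np.asarray(true)
    return 100.0 * np.mean(predicted == true)
```

For instance, Y-pred values of 0.92 and 0.15 are assigned to the baking and feeding groups, respectively.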

Figure 6.14 Extraction ‘B’ of the 45dpa grains used as unknown samples. All Miller and Pentium samples have high predicted Y-values (right side) and are thus correctly classified into the baking group; all Stakado and 993618 samples have low values and are classified into the feeding group.


All the samples from extraction “B” are correctly classified by the model created from extraction “A”: the feeding-quality samples have values lower than 0.4 and the baking-quality samples values higher than 0.8. Using the models created for the six stages of development, discriminative PLS-R correctly predicted 100% of the “unknown” samples from extraction “B” (table 6.5).

Dpa | Number of samples in the calibration set | Number of “unknown” samples | Correctly predicted samples | Appendix
15 | 38 | 50 | 100% | D3
20 | 35 | 24 | 100% | E3
25 | 87 | 91 | 100% | F3
30 | 75 | 70 | 100% | G3
35 | 94 | 83 | 100% | H3
45 | 91 | 67 | 100% | I3

Table 6.5 For each dpa, the number of mass spectra in the calibration set and in the test set of “unknown” samples. The percentage of correctly predicted samples is the number of correctly predicted samples divided by the total number of samples predicted. The last column refers to the corresponding results in the appendix.

The result shows that MALDI-TOF MS combined with multivariate data analysis was able to predict the wheat quality of samples at 15dpa, i.e. one month before harvest, at a stage of development where the grain has not yet finished accumulating the gluten proteins. Two further observations concern the 45dpa and 15dpa models. The prediction model based on the 45dpa samples was also applied to data from samples at other dpa, pre-processed in the same way, to predict the quality at an earlier stage of development (appendix I3, “prediction of 15dpa”). The prediction model works well on samples at 30, 35 and 45 dpa, but the samples at 15, 20 and 25 dpa are not correctly predicted. In general, the ability of a 45dpa model to predict unknown samples decreases when it is applied to “younger” samples, so the prediction model for each stage of development should be built from a calibration set of the same stage. This trial may also suggest that the protein composition and/or content changes during the development of the grain, and that the model cannot cope with a considerably different X-matrix. Finally, two further models were compared: one using data without weighting of any variable, and one using data in which every variable is standardized with the weight A/(Sdev+B), where A = 1 and B = 0. The result shows no clear improvement of the model using standardized data over the model using non-standardized data (appendix, figures I3.19 and I3.20). The prediction model based on the 15dpa samples classified the samples correctly, but with a large deviation. A large deviation means that the samples used for prediction are not similar to the samples used to build the calibration model (28), so the classification of a new set of samples could be wrong. However, the spectra of the two qualities differ substantially, so improved pre-processing could make it possible to predict the samples with more certainty.
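The weighting A/(Sdev+B) compared above can be sketched as follows. The standard deviations are estimated on the calibration set; the use of the sample standard deviation (ddof=1) and the function names are assumptions for illustration.

```python
import numpy as np


def weights_a_over_sd_plus_b(X_cal, A=1.0, B=0.0):
    """Per-variable weights A / (SD + B), estimated on the calibration set.

    With A = 1 and B = 0 this is ordinary standardisation (division by the
    standard deviation); B > 0 limits the up-weighting of near-constant
    variables, since no weight can exceed A / B.
    """
    sd = np.std(X_cal, axis=0, ddof=1)     # sample SD (assumed)
    return A / (sd + B)


def apply_weights(X, X_cal, weights):
    """Centre with the calibration mean, then multiply each variable by its weight."""
    return (X - X_cal.mean(axis=0)) * weights
```

With A = 1 and B = 0, every weighted calibration variable has unit standard deviation, so small but consistent peaks count as much as intense ones; the comparison in the text found no clear gain from this.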

Figure 6.15 The result of prediction of 15dpa samples from extraction ‘B’. All the samples are correctly classified but the Y-pred has a high standard deviation.


SIMCA Classification

Classification predicts the class membership of samples when the response is a categorical variable, here wheat quality (28). SIMCA is a classification method in which samples are classified by modelling the similarities between samples of the same class. First, each class is described by a PCA model (class model) of its samples in the calibration set (28). Second, the “unknown” samples (classification set) are compared with the class models and assigned to classes according to their similarity to the calibration samples (28). In particular, when the classes are known a priori, the class models can be built easily and new samples can be predicted (28). The purpose of the classification is thus to predict which class an unknown sample belongs to (28). SIMCA classification is applied to the samples at 45dpa and 15dpa, the two extremes of the sample collection. A positive result would suggest that SIMCA could also be used for the stages between 15 and 45 dpa.
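The two SIMCA steps (one PCA model per class, then comparison of new samples with each class model) can be sketched as follows. This is a simplified NumPy sketch: the distance is the residual standard deviation of a sample from the class PCA model, and the class limit is an empirical quantile of the calibration distances, swapped in for the F-test used by SIMCA implementations such as The Unscrambler. All names are hypothetical.

```python
import numpy as np


class PCAClassModel:
    """One SIMCA class model: a PCA model of the calibration spectra of one class."""

    def __init__(self, n_components):
        self.k = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.P_ = Vt[: self.k].T                     # PCA loadings, shape (p, k)
        return self

    def distance(self, X):
        """Residual standard deviation of each sample from the class model."""
        E = X - self.mean_
        E = E - (E @ self.P_) @ self.P_.T            # part not described by the PCs
        return np.sqrt(np.sum(E ** 2, axis=1) / (X.shape[1] - self.k))


def empirical_limit(model, X_cal, significance=0.05):
    """Simplified class limit: the (1 - significance) quantile of the
    calibration distances (real SIMCA uses an F-test instead)."""
    return float(np.quantile(model.distance(X_cal), 1.0 - significance))


def simca_classify(models, limits, X):
    """For each sample, list every class whose limit it falls within
    (one class, several classes, or none = rejected)."""
    inside = {name: m.distance(X) <= limits[name] for name, m in models.items()}
    return [[name for name in models if inside[name][i]] for i in range(len(X))]
```

Unlike discriminative PLS-R, a sample may here end up in one class, in several classes, or in none, which is what the Coomans' plots below visualise.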


45 DPA

First, SIMCA is used to classify the four wheat varieties; later, it is applied to classify the samples into the two quality groups, baking and feeding. For the first purpose, a PCA model is created for each variety class, using a training data set of samples from extraction “A”. Full cross-validation is used for each PCA model, and the data are not weighted. Table 6.6 shows the details of the calibration set.

Variety | Extraction | Names of the samples
Stakado | First (A) | ST45(01-09,11,13-22,24,28)
993618 | First (A) | 9945(01-02,04-08,10,13,14,16-25)
Miller | First (A) | MI45(01-04,07-20,22-25,28-29)
Pentium | First (A) | PE45(01-11,13-17,20-30)

Table 6.6 The calibration set for the classification of varieties: variety (first column), extraction (second column) and sample names (third column).

The classification is applied to unknown samples from extraction “B”. These samples are processed and, if they are similar to the calibration samples of one or more classes, they are classified into those classes; otherwise they are rejected. The result is shown in figure 6.16 as a Coomans’ plot, where the vertical and horizontal lines represent the classification limits of the feeding and baking models, respectively. Depending on their distances to the two models, the unknown samples fall into one of four rectangular regions (figure 6.16): a) inside the feeding class but outside the baking class; b) inside the baking class but outside the feeding class; c) inside both classes; d) outside both classes.
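The four regions of the Coomans’ plot follow from two comparisons of a sample’s distance against each class limit. A minimal sketch with a hypothetical function name, treating a distance equal to the limit as inside the class (an assumption):

```python
def coomans_region(dist_feeding, dist_baking, limit_feeding, limit_baking):
    """Region of a sample in a Coomans' plot (letters as in figure 6.16).

    a: inside the feeding class only     b: inside the baking class only
    c: inside both classes               d: outside both classes
    """
    in_feeding = dist_feeding <= limit_feeding
    in_baking = dist_baking <= limit_baking
    if in_feeding and in_baking:
        return "c"
    if in_feeding:
        return "a"
    if in_baking:
        return "b"
    return "d"
```

Only samples in regions a and b receive a single, unambiguous class; samples in c and d are the ones the significance level and, if needed, discriminative PLS-R have to resolve.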


Figure 6.16 Coomans’ plot. The two training sets, Pentium and 993618, are shown in red and blue, respectively, and the classified samples in green. The vertical and horizontal lines represent the statistical classification limits of the feeding and baking models, respectively. The significance level is indicated at the bottom left. Samples inside the rectangular region “a” are classified into the feeding class, those in region “b” into the baking class, those in “c” into both classes, and those in region “d” lie outside both the baking and the feeding class.

The classification limit can be changed with the “significance level”, in a range between 0.1% and 25%. At 5%, there is a 5% risk that a sample falls outside its class even though it really belongs to it; 95% of the objects that truly belong to a class will be classified into one or more classes (28). When some samples are classified into more than one class, the significance level can be increased, which narrows the class limits so that more samples are rejected: a significance level of 25% means that only ‘certain’ samples are classified into the class, while more ‘doubtful’ samples are left outside (28). On the other hand, increasing the significance level may leave some samples outside all classes; these may then be predicted using a discriminative PLS-R model. The significance level was therefore chosen separately for each classification. The first classification (figure 6.16) is applied to samples of the 993618 and Pentium varieties from extraction B, and the two corresponding class models are selected to classify the unknown samples. The value usually used to assess the significance of observed effects is 5% (28), but it is increased to 10% here because many samples are classified into more than one class at 5%. In this way, 100% of the 993618 samples and 95.8% of the Pentium samples are correctly classified (appendix, table I4.2).
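The effect of the significance level can be illustrated with a simplified, hypothetical class limit: the (1 - significance) quantile of the calibration samples’ distances to their own class model, used here in place of the F-test that SIMCA software normally applies. Raising the significance level lowers the limit, so at least as many samples are rejected.

```python
import numpy as np


def class_limit(calibration_distances, significance):
    """Hypothetical, simplified limit: the (1 - significance) quantile of the
    distances of the calibration samples to their own class model."""
    return float(np.quantile(calibration_distances, 1.0 - significance))


def rejection_rate(calibration_distances, new_distances, significance):
    """Fraction of samples that fall outside the class at a given significance level."""
    limit = class_limit(calibration_distances, significance)
    return float(np.mean(np.asarray(new_distances) > limit))
```

Applied to the calibration samples themselves, a significance level of 5% rejects about 5% of them, mirroring the interpretation given in the text.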


The good separation results from the large model distance between the classes (figure 6.17): when the model distance is larger than 3, the classes are well separated and can be distinguished from each other (28).
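For reference, one common formulation of the SIMCA model distance between two classes q and r is given below. It is an assumption here: the exact definition used by The Unscrambler (reference 28 in the text) may differ, for example by subtracting 1.

```latex
D_{qr} = \sqrt{\frac{s_{q(r)}^{2} + s_{r(q)}^{2}}{s_{q(q)}^{2} + s_{r(r)}^{2}}}
```

Here s_{q(r)} denotes the pooled residual standard deviation of the samples of class r fitted to the PCA model of class q. A model compared with itself gives D_qq = 1, so values well above 1, such as the threshold of 3 used in the text, mean that each class fits its own model much better than the other class model.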

Figure 6.17 The model distance between the 993618 (left) and Pentium (right) classes is about 13, well above 3, which is accepted as a good separation between the two classes (28).

The class models have been improved on the basis of the “discrimination power” and the “modelling power”. Variables with a discrimination power higher than 3 (28) (figure 6.18) can be considered important for differentiating between the two classes, while the variables of a class with a modelling power close to 1 (figure 6.19) are the most relevant for describing that class, because most of their variance is described by the class model (28).
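Both quantities can be sketched in NumPy, assuming the usual SIMCA definitions without degree-of-freedom corrections: modelling power MP_j = 1 - s_j(residual)/s_j(total) within one class, and discrimination power DP_j = sqrt((s²_j,q(r) + s²_j,r(q)) / (s²_j,q(q) + s²_j,r(r))), where s_j,q(r) is the residual standard deviation of variable j for class-r samples fitted to the class-q model. The function names are hypothetical.

```python
import numpy as np


def pca_residuals(X_model, X, k):
    """Residuals of X after projection on a k-component PCA model of X_model."""
    mean = X_model.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_model - mean, full_matrices=False)
    P = Vt[:k].T
    E = X - mean
    return E - (E @ P) @ P.T


def _mean_square(E):
    """Per-variable mean squared residual."""
    return np.mean(E ** 2, axis=0)


def modelling_power(X_class, k):
    """MP_j = 1 - s_j(residual) / s_j(total); close to 1 = well described variable."""
    s_res = np.sqrt(_mean_square(pca_residuals(X_class, X_class, k)))
    s_tot = np.std(X_class, axis=0)
    return 1.0 - s_res / s_tot


def discrimination_power(X_q, X_r, k):
    """DP_j: residuals of each class fitted to the other class model, relative
    to the residuals within each class's own model (large = discriminating)."""
    cross = _mean_square(pca_residuals(X_q, X_r, k)) + _mean_square(pca_residuals(X_r, X_q, k))
    own = _mean_square(pca_residuals(X_q, X_q, k)) + _mean_square(pca_residuals(X_r, X_r, k))
    return np.sqrt(cross / own)
```

A variable whose level differs between the classes but is modelled well within each class receives a large DP, which is the kind of variable encircled in figure 6.18.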


Figure 6.18 Discrimination power between the two quality models for the classification of the 45dpa samples. Variables with high values (larger than 3) are selected to recalculate the two models. The variables encircled in red have a high ability to discriminate between the two classes (especially the variable at 38.8 kDa).

(Figure 6.18 annotations: threshold at 3; encircled variables at 38.8-39.5 kDa, 33.7-34 kDa, 19.5-19.8 kDa, 48.7 kDa and 55 kDa.)


Figure 6.19 Modelling power plot showing the influence of the variables on the model (here for the 993618 class). The closer the value is to 1, the better the variable’s variance is described by the model. The variables encircled in red (between 31 and 36 kDa) are selected to recalculate the class.

The class models are then recalculated on the basis of discrimination power and modelling power, leaving out the variables with low values. The recalculated class models are separated by a larger distance (appendix, figure I4.3). SIMCA is also used to classify the samples into the two qualities (baking and feeding). For each quality, a PCA model is created from extraction “A”, and all samples from extraction “B” are used for classification. The baking and feeding classes are distinct (figure 6.20, blue and red, respectively), all of the new samples that are classified (green in figure 6.20) are classified correctly, and only seven samples remain unclassified.


Figure 6.20 The baking and feeding classes at 45dpa are well separated into two distinct groups (blue and red, respectively). The significance level changes the class limits of the PCA models: increasing it from 1% to 5% makes the lower-left square smaller and the upper-right square bigger, so that more samples are rejected as non-classified. Seven samples are in rectangle “d” and are not classified.

The SIMCA model of the 45dpa samples correctly classified the samples from extraction “B” into classes of both variety and quality.

[Figure 6.20 regions: a), b) baking and feeding classes; c) samples classified in both classes; d) samples not classified]
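How a Coomans' plot is read can be sketched as follows. Each class is a PCA model; the residual standard deviation Si of a new sample is compared with a critical limit derived from the pooled residual SD S0 of the calibration samples and an F-distribution at significance level alpha. The degrees of freedom below are one common convention and are an assumption, as are the class and function names. The sketch reproduces the behaviour described in the caption of figure 6.20: a 5% level gives a smaller critical distance than a 1% level, so more samples end up non-classified.

```python
import numpy as np
from scipy.stats import f as f_dist

class PCAClassModel:
    """Minimal SIMCA class model: a PCA model of one class (sketch only)."""

    def __init__(self, X: np.ndarray, n_components: int):
        self.n, self.p = X.shape
        self.A = n_components
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.P = Vt[:n_components].T
        E = Xc - Xc @ self.P @ self.P.T
        # Pooled residual SD of the calibration samples (S0).
        self.s0 = np.sqrt((E ** 2).sum() / ((self.n - self.A - 1) * (self.p - self.A)))

    def distance(self, X_new: np.ndarray) -> np.ndarray:
        """Residual SD (Si) of each new sample after projection on this model."""
        Xc = np.atleast_2d(X_new) - self.mean
        E = Xc - Xc @ self.P @ self.P.T
        return np.sqrt((E ** 2).sum(axis=1) / (self.p - self.A))

    def critical_distance(self, alpha: float) -> float:
        """Largest Si accepted by the F-test Si^2 / S0^2 <= F(1 - alpha)."""
        dfn = self.p - self.A
        dfd = (self.n - self.A - 1) * (self.p - self.A)
        return float(self.s0 * np.sqrt(f_dist.ppf(1.0 - alpha, dfn, dfd)))

def coomans_classes(model_a, model_b, X_new, alpha=0.05):
    """Label each sample 'A', 'B', 'both' or 'none', as read off a Coomans' plot."""
    in_a = model_a.distance(X_new) <= model_a.critical_distance(alpha)
    in_b = model_b.distance(X_new) <= model_b.critical_distance(alpha)
    return np.where(in_a & in_b, "both",
                    np.where(in_a, "A", np.where(in_b, "B", "none")))
```

In this scheme, samples from extraction “B” would be passed as `X_new` to two models fitted on the baking and feeding samples of extraction “A”.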


15 DPA

Two SIMCA class models based on wheat quality are created in order to classify unknown samples at 15dpa. The samples used to build the model come from extraction “A”, and the important variables found by PLS-R are used as X-variables. The SIMCA model is then improved by recalculation on the basis of “discrimination power” and “modelling power”, creating two classes that better represent the variance of the mass spectra. The samples not used in the calibration set are then processed and classified: only 4 samples are not classified, and one sample (9915B01) is classified as both feeding and baking quality (figure 6.21).

Figure 6.21 Coomans' plot for quality classification of 48 samples. The quality classes are based on PCA models of samples from extraction “A” (baking and feeding class in blue and red, respectively). The model correctly classifies 43 samples from extraction “B” (green) and leaves out five samples (4 in quadrant d and 1 in quadrant c). These five samples were then processed by discriminative PLS-R, which predicted their quality correctly.

[Figure 6.21 regions: a), b) baking and feeding classes; c) the sample classified in both classes; d) the 4 samples not classified]
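The discrimination power used in the recalculation compares how well each variable is fitted by the other class's model with how well it is fitted by its own. A minimal sketch, assuming the simplified (uncorrected) form DP_k = sqrt((s^2_k,A→B + s^2_k,B→A) / (s^2_k,A→A + s^2_k,B→B)), where s^2_k,A→B is the mean squared residual of variable k when class-A samples are projected on the class-B model. Values well above 1 flag discriminating variables; the function names are hypothetical.

```python
import numpy as np

def _class_model(X, n_components):
    """Mean and loadings of a PCA model for one class."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def _residual_var(model, X):
    """Mean squared residual per variable when the rows of X are projected on a class model."""
    mean, P = model
    Xc = X - mean
    E = Xc - Xc @ P @ P.T
    return (E ** 2).mean(axis=0)

def discrimination_power(X_a, X_b, n_components):
    """Discrimination power of every variable between class A and class B."""
    model_a = _class_model(X_a, n_components)
    model_b = _class_model(X_b, n_components)
    cross = _residual_var(model_b, X_a) + _residual_var(model_a, X_b)  # wrong-class fits
    own = _residual_var(model_a, X_a) + _residual_var(model_b, X_b)    # own-class fits
    return np.sqrt(cross / own)
```

Variables with both low discrimination power and low modelling power would be the candidates for removal before the class models are recalculated.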


Figure 6.22 The samples left unclassified by SIMCA were processed by discriminative PLS-R, which correctly predicted the quality of all five.

Classifying unknown samples by quality with SIMCA is more difficult at 15dpa than at 45dpa. Nevertheless, SIMCA was able to classify the 15dpa samples on the basis of their similarity to the class models, and discriminative PLS-R correctly predicted the quality of the few samples that SIMCA left unclassified.
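The two-stage decision used at 15dpa (SIMCA first, discriminative PLS-R for samples falling in quadrant c or d) can be written as a small rule. The sketch assumes that the discriminative PLS-R was calibrated with Y = 1 for baking and Y = 0 for feeding quality and that predictions are split at 0.5; this coding, the cut-off and the function name are illustrative assumptions, not taken from the thesis.

```python
from typing import List, Sequence

def combine_simca_pls(simca_labels: Sequence[str], pls_pred: Sequence[float],
                      cutoff: float = 0.5) -> List[str]:
    """Keep unambiguous SIMCA labels; resolve 'both'/'none' with the PLS-R prediction.

    Assumes Y = 1 (baking) and Y = 0 (feeding) in the discriminative PLS-R model.
    """
    if len(simca_labels) != len(pls_pred):
        raise ValueError("one PLS-R prediction is needed per sample")
    final = []
    for label, y_hat in zip(simca_labels, pls_pred):
        if label in ("baking", "feeding"):
            final.append(label)                       # SIMCA assigned exactly one class
        else:                                         # 'both' or 'none': fall back on PLS-R
            final.append("baking" if y_hat >= cutoff else "feeding")
    return final
```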


6.2 Gluten proteins development

The glutenins and gliadins of the wheat samples have been separated according to quality: section 6.1 showed that the samples can be successfully separated on the basis of their quality, with a model for the determination of quality created for each stage of development. Although the samples separated by quality at every stage analysed, the 15dpa spectra contained only a few peaks (i.e. proteins) compared with 45dpa (section 6.1.1). The grain starts filling with storage proteins at 10 dpa and continues to synthesise and accumulate them until ripening (45dpa). During grain filling, accumulation is not a constant process: the maximum rate of gliadin synthesis occurs approximately 6-8 days before the maximum rate of glutenin synthesis (with a duration extended by a similar period) (25), and the process is controlled by environmental conditions (e.g. temperature) and by the available nutrients, which can alter the rate of synthesis (25). The different rates of accumulation of gliadins and glutenins therefore shift the total protein composition during storage. Not only does the ratio (i.e. the quantity) of gliadins and glutenins synthesised change, but the average molecular weight of polymeric glutenin also increases throughout grain filling (25), either through polymerisation of smaller glutenins (25) or through the synthesis of different storage proteins at the later stage. The storage protein composition hence changes during grain filling. Chemometric studies based on MALDI-TOF mass spectra of the gliadin and glutenin fraction are used to investigate the development of the grain. In this section PLS-R is applied to samples from 15 dpa to 45 dpa in order to understand the changes in the gliadin and glutenin extracts after one month of grain filling.
The section is divided into two subsections: in section 6.2.1 the varieties are analysed separately using samples at 15 dpa and 45 dpa, and in section 6.2.2 the analyses focus on the development of feeding and baking quality from 15dpa to 45dpa. The purpose of this work is to investigate the differences between the early and the ripe stage of development (i.e. from 15 dpa to 45 dpa), using PLS-R on MALDI-TOF MS spectra of gliadins and glutenins. Appendix L presents the results of the studies on the development of each variety (L1) and each quality (L2).


6.2.1 Study on development of varieties

The aim of this study is to find out whether the gliadin and glutenin composition of a variety changes during grain filling from 15dpa to 45dpa, and which variables are correlated with each stage. To see how the spectra change after one month of grain filling, PLS-R is applied to samples of one variety collected at 15 dpa and at 45 dpa, i.e. the varieties are analysed separately, one at a time. The samples at 15 dpa represent an early stage of grain development and the samples at 45dpa the mature stage. The X-variables cover the MW range 27-73 kDa; the rest of the spectra cannot be used because after pre-processing the spectra show a different slope there (appendix L1.1.1). First the PCs that separate the two stages of development are identified, and then the variables positively correlated with the 15 dpa and 45 dpa stages are determined. These variables are compared in order to find the molecular weights at which proteins specific to each stage of development might be located. All the results of this section are given in appendix ‘L1’.


Results from PLS-R on each variety at 15 and 45 dpa

PLS-R is performed using the molecular weights as X-variables and the dpa as Y-variables, whose values are the dpa of each sample. The Y-variables are weighted by their standard deviation, while the X-variables are not weighted. PLS-R is applied to the mass spectra of gliadins and glutenins extracted at 15 dpa and 45 dpa. For each variety, the samples are separated according to their age (figure 6.23 and appendix, figures L1.2.1, L1.3.1, L1.4.1).
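A minimal sketch of this PLS-R set-up in Python with NumPy, assuming a PLS1 NIPALS algorithm: the spectra are cropped to 27-73 kDa, X is mean-centred but not weighted, and Y (dpa) is centred and divided by its standard deviation. The function names are hypothetical, and the software used for the thesis may differ in details such as cross-validation.

```python
import numpy as np

def crop_mass_range(mass_kda, X, lo=27.0, hi=73.0):
    """Keep only the variables whose mass lies in [lo, hi] kDa."""
    mask = (mass_kda >= lo) & (mass_kda <= hi)
    return mass_kda[mask], X[:, mask]

def pls1_nipals(X, y, n_components):
    """PLS1 (one Y-variable) by NIPALS.

    X is mean-centred only; y is centred and divided by its standard deviation.
    Returns coefficients b and intercept b0 in original y units
    (y_hat = X @ b + b0) and the X-scores T.
    """
    x_mean = X.mean(axis=0)
    y_mean, y_sd = y.mean(), y.std(ddof=1)
    Xk = X - x_mean                       # X: centred, not weighted
    yk = (y - y_mean) / y_sd              # y: centred and weighted by 1/SD
    n, p = X.shape
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)            # X-weights
        t = Xk @ w                        # X-scores
        tt = t @ t
        P[:, a] = Xk.T @ t / tt           # X-loadings
        q[a] = yk @ t / tt                # Y-loading
        Xk = Xk - np.outer(t, P[:, a])    # deflate X
        yk = yk - q[a] * t                # deflate y
        W[:, a], T[:, a] = w, t
    b_scaled = W @ np.linalg.solve(P.T @ W, q)
    b = b_scaled * y_sd                   # back-transform to dpa units
    b0 = y_mean - x_mean @ b
    return b, b0, T
```

With a single Y-variable, dividing Y by its standard deviation only rescales the Y-loadings: the X-weights, the scores and the back-transformed predictions are unchanged. The weighting matters more when several Y-variables are modelled together.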

Figure 6.23 The score plot of PLS-R (PC1 vs. PC2) of 993618 samples at 15 dpa (blue) and 45 dpa (green). The two stages of development are separated by PC1, which explains most of the variance in the mass spectra (84%).

Most of the variation in the data is explained by the first PCs, which separate the two stages (the first two PCs explain 88% for the 993618 variety, figure 6.23), and the same result is obtained for each variety (appendix, figures L1.2.1, L1.3.1, L1.4.1). Judging from the explained variance of the PCs that separate the samples, the mass spectra of each variety differ greatly between 15 dpa and 45 dpa, suggesting that the protein composition has changed. Moreover, the correlation coefficient and the slope of predicted versus measured Y-values for the calibration and validation models indicate a good fit (appendix, L1.2.4, L1.3.4, L1.4.4). On the basis of the regression coefficients, the variables that most influence the separation between the two stages are marked and compared. Three groups of variables are correlated with the 45dpa stage: they cover the molecular weights 30-42 kDa, 49-52.7 kDa and 60-72 kDa (appendix, figures L1.2.2, L1.3.2, L1.4.2). For the variety 993618, these most influential variables are given by PC1 and are shown in the correlation loading plot of figure 6.24.
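The fit statistics quoted for these models (slope and correlation coefficient of predicted versus measured Y, and the prediction error RMSEP reported in the conclusion) can be computed as below. This is a sketch assuming that the slope comes from a least-squares line of predicted on measured values and that RMSEP is the plain root mean squared difference, without bias correction; the function name is hypothetical.

```python
import numpy as np

def prediction_stats(measured, predicted):
    """Slope, offset and correlation of predicted vs. measured Y, plus RMSEP."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, offset = np.polyfit(measured, predicted, 1)   # predicted = slope * measured + offset
    r = np.corrcoef(measured, predicted)[0, 1]
    rmsep = np.sqrt(np.mean((predicted - measured) ** 2))
    return {"slope": slope, "offset": offset, "correlation": r, "rmsep": rmsep}
```

A perfect model gives slope 1, offset 0, correlation 1 and RMSEP 0; applied to the validation samples, these are the quantities behind statements such as a slope of 0.94 and a correlation of 0.97.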

Figure 6.24 PLS-R loading plot of the X- and Y-variables. The first PC separates the samples on the basis of their dpa (red arrows) and explains 86% of the variance. The three groups of important variables that influence the separation the most are circled in red.

[Figure 6.24 circled ranges: 30-42 kDa, 49-52.8 kDa, 60-72 kDa]
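A correlation loading, as plotted in figure 6.24, is the correlation between an X-variable and the score vector of a component. The sketch below computes it with NumPy; the squared value is the fraction of that variable's variance explained by the component. X and T (the score matrix of the PLS-R model) are assumed inputs, and the function name is hypothetical.

```python
import numpy as np

def correlation_loadings(X, T):
    """Pearson correlation between every X-variable (rows) and every score vector (columns)."""
    Xc = X - X.mean(axis=0)
    Tc = T - T.mean(axis=0)
    num = Xc.T @ Tc
    den = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Tc, axis=0))
    # Constant variables have no correlation with anything: report 0 instead of NaN.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```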


Figure 6.25 The PC1 regression coefficients of the PLS-R model for the 993618 variety, with the important variables. The ranges circled in red are positively correlated with 45dpa.

The same ranges are found for each variety, suggesting that at 45dpa the grain might contain proteins in these ranges. For the variety 993618, these ranges were selected and analysed separately, and each range alone is able to distinguish the two stages. Conversely, the rest of the spectrum is unable to discriminate the two stages, possibly because the same proteins are present at both stages (when both have peaks) or because no proteins are present (when neither has peaks) (table 6.7).

Molecular ranges containing information for 15dpa-45dpa stage separation: 30-42 kDa (45dpa); 49-52.7 kDa (45dpa); 60-72 kDa (45dpa).

Molecular ranges not containing information for 15dpa-45dpa stage separation: 42.3-47.3 kDa; 77.3-90 kDa.

Table 6.7 The informative and non-informative ranges of the spectra for stage separation.

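Grouping the important variables into molecular-weight ranges such as those of table 6.7 amounts to finding contiguous runs of the regression-coefficient vector above a threshold. A minimal sketch, assuming a sorted mass axis in kDa and a hypothetical threshold; in figure 6.25 the ranges are circled on the coefficient plot rather than computed.

```python
def important_ranges(mass_kda, coef, threshold):
    """Return (start, end) masses of contiguous runs where coef > threshold."""
    ranges = []
    start = None
    for i, value in enumerate(coef):
        if value > threshold and start is None:
            start = i                                    # a run begins
        elif value <= threshold and start is not None:
            ranges.append((float(mass_kda[start]), float(mass_kda[i - 1])))
            start = None                                 # the run ended at i - 1
    if start is not None:                                # a run reaches the last variable
        ranges.append((float(mass_kda[start]), float(mass_kda[len(coef) - 1])))
    return ranges
```

Each returned range could then be modelled on its own, as was done for the 993618 variety, to check whether it alone separates the two stages.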


Figure 6.26 The mass spectra of some 993618 samples (indicated at the bottom) at 15dpa (blue) and 45 dpa (red). The important variables used by the PLS-R model to discriminate the two stages cover MW ranges related to mass peaks; these are circled in red.

Moreover, these ranges correspond to the presence of more peaks at 45 dpa (figure 6.26; appendix, figures L1.1.6, L1.2.3, L1.3.3, L1.4.3), which may be correlated with the presence of more proteins.

[Figure 6.26 labels: circled ranges 30-42 kDa, 49-52.7 kDa and 60-72 kDa; spectra of 993618 at 45dpa and at 15dpa]
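The comparison of peak numbers (roughly three visible peaks at 15dpa against around fifteen at 45dpa, as stated in the conclusion) could be automated with SciPy's `find_peaks`. The prominence cut-off is an assumption that would have to be tuned, the function names are hypothetical, and real MALDI-TOF spectra would first need baseline correction and normalisation.

```python
import numpy as np
from scipy.signal import find_peaks

def peak_masses(mass_kda, intensity, min_prominence):
    """Masses (kDa) of the local maxima whose prominence is at least min_prominence."""
    peaks, _ = find_peaks(np.asarray(intensity, dtype=float), prominence=min_prominence)
    return np.asarray(mass_kda)[peaks]

def count_peaks(mass_kda, intensity, min_prominence):
    """Number of prominent peaks in one spectrum."""
    return len(peak_masses(mass_kda, intensity, min_prominence))
```

Counting peaks only inside the important ranges (30-42, 49-52.7 and 60-72 kDa) would show whether the extra peaks at 45 dpa fall where the PLS-R model places its weight.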


6.2.2 Study on development of quality

The aim of this part of the study is to find out how the gliadin and glutenin composition changes during six stages of development (15, 20, 25, 30, 35 and 45 dpa) in samples of the same quality. The analyses are therefore the same as in section 6.2.1, but the samples are now grouped by quality and the data set includes all six stages. The feeding and baking qualities are investigated and compared using PLS-R. The X-variables cover the range 29-73 kDa because the spectra have a different slope at their beginning and end (appendix L1.1.1). The study first searches for the PCs responsible for separating the samples according to the sequential increase in dpa, and then tries to find which variables distinguish the stages of grain filling. All the results are given in appendix ‘L2’.


Results from PLS-R on each quality at 15, 20, 25, 30, 35, 45 dpa

The gliadins and glutenins extracted at different stages of development are processed by PLS-R. Although a clear separation along the sequential increase of dpa proved difficult, both qualities showed a definite difference between the samples at 15, 20 and 25 dpa (here called group ‘A’) and the samples at 30, 35 and 45 dpa (here called group ‘B’) (figure 6.27).

Figure 6.27 The loading plot of the Y-variables (dpa). The samples at 15, 20 and 25 dpa (group ‘A’) are correlated with each other and differ from the samples at 30, 35 and 45 dpa (group ‘B’). Group ‘A’ is separated from group ‘B’ by PC1.

The samples belonging to group ‘A’ differ greatly from those in group ‘B’, suggesting that the protein composition changes radically between 25 and 30 dpa (baking quality in figure 6.28; feeding quality in appendix L2.1.1).



Figure 6.28 The score plot (PC1 vs. PC2) of baking quality samples at all stages of development (15, 20, 25, 30, 35, 45 dpa). The first three stages (15, 20, 25 dpa; group ‘A’) are separated from the last three (30, 35, 45 dpa; group ‘B’) by PC1 of PLS-R.

Judging from the regression coefficients (appendix, L2.1.4, L2.2.4), the important variables fall into the same ranges as the important variables identified in the analysis of the single varieties.

[Figure 6.28 legend: A, baking quality 15-20-25 dpa; B, baking quality 30-35-45 dpa]


Figure 6.29 The mass spectra of some baking quality samples (indicated at the bottom) at each dpa. The important variables from the regression model are related to the presence of peaks at 30, 35 and 45 dpa (red lines) compared with 15, 20 and 25 dpa (blue lines).

The separation of the samples into two distinct groups may suggest that the protein composition changes greatly between 25 and 30 dpa. This result is not entirely new: the discriminative PLS-R based on 45 samples was unable to predict the samples correctly up to 25 dpa, but its predictive ability increased considerably at 30 dpa and then gradually up to 45 dpa, where all the unknown samples were correctly predicted (section 6.1.1 and appendix, figures I3.16, I3.17). Thus the protein content may change considerably between 25 and 30 dpa. On the basis of the molecular weights of gluten proteins separated by 2D gels, table 6.8 reports the types of protein that may be synthesised between 15 dpa and 45 dpa.

[Figure 6.29 labels: circled ranges 30-42 kDa, 49-55 kDa and 60-72 kDa; baking quality 30-35-45 dpa and 15-20-25 dpa]


Molecular weight covered by the range of important variables: possible types of gluten proteins

30-42 kDa (45dpa): α-, β- and γ-gliadins; LMWGS B and C types; LMWGS D types

49-52.7 kDa (45dpa): ω-gliadins

60-72 kDa (45dpa): HMWGS; ω-gliadins; LMWGS D type

Table 6.8 Important variables found and the possible groups of proteins correlated with them, according to the molecular range covered (24).
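Table 6.8 can also be kept as a small lookup structure, for example to annotate peaks or important variables with candidate protein groups. This sketch encodes only the three ranges listed in the table (24); the names are hypothetical, and a hit means only that a mass is compatible with those protein groups, not that they are identified.

```python
# Table 6.8 as (low kDa, high kDa, candidate gluten protein types).
TABLE_6_8 = [
    (30.0, 42.0, ("α-, β- and γ-gliadins", "LMWGS B and C types", "LMWGS D types")),
    (49.0, 52.7, ("ω-gliadins",)),
    (60.0, 72.0, ("HMWGS", "ω-gliadins", "LMWGS D type")),
]

def candidate_proteins(mass_kda: float) -> list:
    """Protein types from table 6.8 whose range contains the given mass."""
    hits = []
    for lo, hi, types in TABLE_6_8:
        if lo <= mass_kda <= hi:
            hits.extend(types)
    return hits
```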

Generally, the total protein composition changes during grain filling from a content relatively rich in enzymes for metabolic functions (15dpa) to a content richer in storage proteins for germination, which are gradually synthesised and accumulated until 45dpa. Here the studies focus only on changes in the gluten protein composition; the analyses have shown that each variety and quality has more peaks at 45dpa, suggesting that the grain may have expressed more gliadins or glutenins at a later stage of development. Although the gluten composition changes during development, it is still unknown which types of protein are synthesised before others, or whether a common precursor exists that evolves into different kinds of protein. It would, for example, be interesting to know whether α-gliadins are synthesised before γ-gliadins or vice versa. Furthermore, MALDI-TOF MS is not a quantitative analysis, so it would also be interesting to study with other techniques how the level of the gluten fraction increases during grain filling. Although mass spectrometry with multivariate data analysis cannot answer these questions, it has shown that the gluten protein content changes. The analysis has the advantage of being fast, but it must be followed by further analyses to identify which proteins characterise each stage. Further analysis based on 2DE and HPLC is suggested to separate the gliadins and glutenins into fractions from 15 to 45 dpa, followed by digestion of these fractions with trypsin or chymotrypsin in order to identify the proteins. Another possible identification technique would be to obtain partial amino acid sequences by faster MS/MS and match them against a database.


7. CONCLUSION

Multivariate data analysis combined with mass spectrometry was used to determine wheat quality in samples collected at different stages of grain development, between 15dpa and 45dpa. The method was also used to study the changes in gluten composition during the development of the grain. The gliadin and glutenin fraction was extracted from four wheat varieties, some suitable for breadmaking (baking quality) and some not (feeding quality). The samples were collected at different stages of grain development, over a period of one month before harvest. The gluten proteins were separated by MALDI-TOF MS and the spectra were subjected to multivariate data analysis. MALDI-TOF MS proved a fast separation technique: many samples could be prepared at the same time and analysed within a few minutes. The initial exploratory analysis by PCA revealed that the baking quality was already distinguishable from the feeding quality at 15dpa; PLS-R showed that the hidden structure of the data at 15dpa was similar to that at 45dpa, since some molecular weight ranges were used by both the 15dpa and the 45dpa PLS-R models to separate the samples according to quality. This result suggests that the proteins correlated with wheat quality had already accumulated in the grain at 15dpa, i.e. 30 days before harvest. On the basis of the different gluten composition of the two qualities, discriminative PLS-R correctly predicted the quality of samples at 15dpa and 45dpa that were not included in the model. The PLS-R model for prediction at 15dpa has a validation slope of 0.94, a correlation coefficient of 0.97 and a prediction error (RMSEP) of 0.10, comparable to the model at 45 dpa. The samples at 15dpa and 45dpa were also correctly classified by quality using SIMCA.
The samples were classified into two distinct groups, corresponding to the baking and feeding groups. At 15dpa, the four samples that SIMCA did not classify in either the baking or the feeding group, and the one sample classified in both groups, were processed by PLS-R, which correctly predicted their quality. Based on the important variables, the variables that contribute most to separating the feeding samples in the quality distinction cover a molecular weight range of 33.4-34 kDa, while the variables positively correlated with baking quality cover molecular weights of 39.1-40 kDa. Previous work shows that proteins with molecular weights of 33.4-34 kDa are found in varieties not suitable for breadmaking. The purpose of these studies was not to relate the proteins to breadmaking properties; however, an interesting continuation of the project could be to use MS/MS with database searching to investigate the proteins in the suggested molecular ranges more deeply, in order to know whether they can be used as references to identify wheat quality. It would then be interesting to understand the roles of these proteins in breadmaking, in order to know how they inhibit or promote breadmaking quality. PLS-R was applied to investigate the development of the grain: the results show that the gluten protein composition changed between 15dpa and 45dpa through a gradual increase in the number of peaks. At 15dpa there are approximately three visible peaks, while at 45dpa there are around fifteen. The higher number of peaks at 45 dpa allowed the multivariate data analysis to separate the 45dpa stage clearly from the 15dpa stage, suggesting that the gluten protein composition had changed. On the basis of the important variables, the ranges 30-42 kDa, 49-55 kDa and 60-72 kDa are related to peaks that appear at 45 dpa. These results suggest that the grain may express more gliadins or glutenins at a later stage of development. In conclusion, using MALDI-TOF MS with multivariate data analysis, SIMCA and PLS-R combined predicted the quality of 100% of the samples correctly, even 30 days before harvest: the samples not classified by SIMCA were correctly predicted by discriminative PLS-R. The quality can already be distinguished at 15dpa, but it is clearer and much better explained by the mass spectra at 45 dpa. For a real application, new PLS-R or SIMCA models would have to be built including as many frequently used varieties as possible, and then tested with random samples. However, the purpose of these studies was to prove that wheat quality can be predicted at 15dpa; the conclusion is that there is a basis for developing a method for real application that could predict the quality of a crop before harvest. Moreover, the protein composition was shown to change during the development of the grain, with the largest change occurring between 25 and 30 dpa.
The technique could be implemented as a fast test for investigating protein changes between two stages of the development of an organism. It has the merit of being fast compared with traditional methods such as 2DE gels, which are rather time-consuming. The purpose of the project was not to compare the two, but to use mass spectrometry as a faster alternative to 2DE gels for determining wheat quality.


8. REFERENCES

1. Pomeranz Y, Williams PC. Wheat hardness: Its genetic, structural, and biochemical background, measurement, and significance. 1990; 471.

2. Shewry PR, Tatham AS, Halford NG. Seed Proteins, Shewry PR, Casey (eds). Kluwer Academic Publishers: Dordrecht, 1999.

3. Morris CF, Rose SP. Wheat. 1996; 1: 5.

4. Graveland A, Henderson MH. Structure and functionality of gluten proteins. 1987; 238.

5. Web source: Philippe Rekacewicz. Worldwatch Institute, Washington DC, United States, November 1996. Available from the Web site: http://www.fas.usda.gov

6. New Cronos database, Eurostat, and Irish national data, Central Statistics Office 2003. Available from the Web site: http://www.cso.ie

7. Gottlieb DM, Schultz J, Petersen M, Nešic L, Jacobsen S, Søndergaard I. Determination of wheat quality by mass spectrometry and multivariate data analysis. Rapid Communications in Mass Spectrometry 2002; 16: 2034.

8. Veraverbeke WS, Delcour JA. Wheat protein composition and properties of wheat glutenin in relation to breadmaking functionality. Critical Reviews in Food Science and Nutrition 2002; 42: 179.

9. Shewry PR. Cereal grain proteins. 1996; 1st: 227.

10. “WHEAT: THE BIG PICTURE”. All the pictures of wheat development are available from the Web site http://www.wheatbp.net. According to the copyright statement, download and use are freely permitted for the purposes of this thesis.

11. Eliasson A-C, Larsson K. In Cereals in breadmaking: A molecular colloidal approach, Marcel Dekker: 1993; 376.

12. Shewry PR, Tatham AS, Halford NG. Seed Proteins, Shewry PR, Casey (eds). Kluwer Academic Publishers: Dordrecht, 1999.


13. Lásztity R. In The chemistry of cereal proteins, CRC Press: Boca Raton 1996; 328.

14. Mikhaylenko GG, Czuchajowska Z, Baik B-K, Kidwell KK. Environmental influences on flour composition, dough rheology, and baking quality of spring wheat. Cereal Chemistry 2000; 77: 507.

15. Belderok B, Mesdag J, Donner DA. In Bread-making quality of wheat: A century of breeding in Europe, Donner DA (ed). Kluwer Academic Publishers: Dordrecht 2000; 416.

16. Kent NL. Chemical components. 1994; 4: 53.

17. Pogna NE, Tusa P, Boggini G. Genetic and biochemical aspects of dough quality in wheat. Adv. Food Sci. 1996; 18: 145.

18. Shewry PR, Halford NG. Cereal seed storage proteins: structures, properties and role in grain utilization. Journal of Experimental Botany 2002; 53: 947.

19. Thomsen AD, Christensen I. In Hvedesorter - Dansk dyrket 1992-1994, Bogtrykkergården: Struer 1995; 171.

20. Payne PI, Nightingale MA, Krattiger AF, Holt LM. The relationship between HMW glutenin subunit composition and the bread-making quality of British-grown wheat varieties. Journal of the Science of Food and Agriculture 1987; 40: 51.

21. Kasarda DD. Glutenin structure in relation to wheat quality. 1989; 1: 277.

22. Tatham AS, Shewry PR, Belton PS. Structural studies of cereal prolamins, including wheat gluten. 1990; 1.

23. Shewry PR Plant storage proteins. Biological Reviews 1995; 70: 375.

24. Shewry PR, Miles MJ, Tatham AS. The prolamin storage proteins of wheat and related cereals. Prog. Biophys. molec. Biol. 1994; 61: 37.

25. Panozzo JF, Eagles HA, Wootton M. Changes in protein composition during grain development in wheat. Australian Journal of Agricultural Research 2001; 52: 485.

26. Skylas DJ, Mackintosh JA, Cordwell SJ, Basseal DJ, Harry J, Blumenthal C, Copeland L, Wrigley CW, Rathmell W. Proteome approach to the characterisation of protein composition in the developing and mature wheat-grain endosperm. Journal of Cereal Science 2000; 32: 169.


27. Martens H, Martens M. In Multivariate analysis of quality: An introduction, John Wiley & Sons Ltd.: Chichester 2001; 445.

28. Esbensen KH. In Multivariate Data Analysis - in practice, Camo: Oslo, Norway 2002; 598.

29. Siuzdak G. In Mass Spectrometry for Biotechnology, Academic Press Inc.: San Diego, California 1996; 161.

30. Beavis RC, Chait BT. Matrix-assisted laser desorption ionization mass-spectrometry of proteins. 1996; 519.

31. Cohen SL, Chait BT. Influence of matrix solution conditions on the MALDI-MS analysis of peptides and proteins. Analytical Chemistry 1996; 68: 31.