Context
'Chemometrics is the chemical discipline that uses mathematics and statistics to design or select optimal experimental procedures, to provide maximum relevant chemical information by analysing chemical data, and to obtain knowledge about chemical systems' (D.L. Massart et al., 1997).
Objectives
One focus in this area is the development of new chemometric methods and algorithms for spectroscopic and chromatographic data. Chemometrics is used to obtain quantitative (regression or multivariate calibration) and qualitative (discrimination or classification) information from the data.
· Regression/Multivariate calibration
In regression (also called multivariate calibration), one tries to determine the functional relationship between measured values (signal intensities at certain frequencies or retention times) and analytical values (concentrations). For spectroscopic data, three of the most important methods are Multiple Linear Regression (MLR), Partial Least Squares (PLS) and Artificial Neural Networks (ANN). Each of these methods consists of many steps, including the selection of the model, the estimation of the model parameters (as well as their errors), and their validation. The main difficulties do not lie in the modelling step itself, but either in the preliminary steps or in the interpolation. One of the aims of this project is to study and propose advanced solutions for some of the steps in the different multivariate calibration methods:
- The pre-processing of the data.
- The investigation of the homogeneity of the data in order to guarantee the quality of the model. Principal Component Analysis (PCA) is often used to visualise multivariate data, especially to see how samples are distributed over the calibrated range. PCA creates new orthogonal variables (scores or latent variables) that are linear combinations of the original measured variables.
- The data set very often contains subgroups of similar objects within the given population. In those cases the data are said to be clustered. Detection of clustering is an important problem in multivariate calibration.
- If a sample is irrelevant, grossly erroneous or abnormal in some other way compared with the rest of the data set, it can be considered an outlier. Techniques such as the Mahalanobis distance, X residuals, potential functions and robust techniques can be used for outlier detection.
- For interpretation of the results, it is necessary to know how good they are. This is expressed as an uncertainty: the smaller the uncertainty associated with a prediction, the better the prediction.
· Discrimination/Classification
In classification, one tries to find classification rules that define optimal boundaries between all given classes by maximising the difference between them. Methods for discrimination include, for instance, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and the K Nearest Neighbours method (kNN). Principal Component Regression (PCR), PLS and ANN can also be used for classification purposes.
One of the more recent and powerful techniques for both regression and discrimination is Support Vector Machines (SVM), currently a very active research area within machine learning. SVM is a learning approach that uses the concept of kernel substitution to make the task of learning more tractable by exploiting an implicit mapping into a high-dimensional space. It is an innovative technique that has been successfully applied to numerous tasks within, for instance, data mining, computer vision, bioinformatics and multivariate calibration. SVM is a good approach to data modelling because it combines the control of generalisation with the minimisation of errors.
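The idea of kernel substitution mentioned above can be illustrated with a minimal sketch. It is not taken from the CRA-W work: the degree-2 polynomial kernel, the function names (poly_kernel, explicit_map, dot) and the two sample points are assumptions chosen only for illustration. The sketch shows that evaluating the kernel in the original 2-D space gives the same value as an inner product in a 3-D feature space, without that space ever being built explicitly.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2.
    (Illustrative choice; other kernels are equally possible.)"""
    return dot(x, z) ** 2

def explicit_map(x):
    """Explicit feature map for 2-D inputs that corresponds to poly_kernel:
    phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

# Two hypothetical 2-D samples (e.g. two measured variables per sample).
x = (1.0, 2.0)
z = (3.0, 4.0)

# Implicit route: kernel evaluated directly in the original space.
k_implicit = poly_kernel(x, z)

# Explicit route: map both samples to 3-D, then take the inner product.
k_explicit = dot(explicit_map(x), explicit_map(z))

print(k_implicit, k_explicit)
```

Both routes give the same value up to floating-point rounding. This is why a kernel-based learner only needs kernel evaluations between pairs of samples, even when the corresponding feature space is very high-dimensional.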
Results obtained
At CRA-W, a recent example of discrimination using SVM is the development, in feed science, of a new system to detect the presence of meat and bone meal in compound feed, in order to combat fraud and accidental contamination in the context of the mad cow (BSE) crisis. The software packages used are Matlab, Unscrambler, Statistica, ISIS and Winisi.
CRA-W coordinator
DARDENNE Pierre (Inspecteur général scientifique)
Département Qualité des productions agricoles
Chaussée de Namur, 24
B-5030 Gembloux
Telephone direct: 62 03 54
Telephone department: +32 (0) 81 / 62.03.50
Fax department: +32 (0) 81 / 62.03.88
E-mail: dardenne@cra.wallonie.be
Funding
- CRA-W - Walloon Agricultural Research Centre