29 January 2013

CHEMOSPEC-U15

CRA-W project, coordinator

Context

'Chemometrics is the chemical discipline that uses mathematics and statistics to design or select optimal experimental procedures, to provide maximum relevant chemical information by analysing chemical data, and to obtain knowledge about chemical systems' (D.L. Massart et al., 1997).

Objectives

One focus in this area is the development of new chemometric methods and algorithms for spectroscopic and chromatographic data. Chemometrics is used to obtain quantitative (regression or multivariate calibration) and qualitative (discrimination or classification) information from the data.

· Regression/Multivariate calibration

In the case of regression (also called multivariate calibration) one tries to determine the functional relationship between measured values (signal intensities at certain frequencies or retention times) and analytical values (concentration). For spectroscopic data, three of the most important methods are Multiple Linear Regression (MLR), Partial Least Squares (PLS) and Artificial Neural Networks (ANN). Each of these methods consists of many steps, including the selection of the model, the estimation of the model parameters (as well as their errors), and the validation of the model. The main difficulties are not in the modelling step itself, but reside either in the preliminary steps or in the interpolation.

One of the aims of this project is to study and propose advanced solutions for some of the steps in the different multivariate calibration methods:

- The pre-processing of the data.
- The investigation of the homogeneity of the data in order to guarantee the quality of the model. Principal Component Analysis (PCA) is often used to visualise multivariate data, especially to see how samples are distributed over the calibrated range. PCA creates new orthogonal variables (scores or latent variables) that are linear combinations of the original measured variables.
- The detection of clustering. The data set very often contains subgroups of similar objects inside the given population. In those cases we say that the data are clustered. Detection of clustering is an important problem in multivariate calibration.
- The detection of outliers. If a sample is irrelevant, grossly erroneous or abnormal in some other way compared with the rest of the data set, it can be considered an outlier. Techniques such as the Mahalanobis distance, X residuals, potential functions and robust techniques can be used for outlier detection.
- The estimation of uncertainty. To interpret the results, it is necessary to know how good they are. This is expressed as an uncertainty: the smaller the uncertainty associated with a prediction, the better the prediction.

· Discrimination/Classification

In the case of classification one tries to find classification rules, which define optimal boundaries between all given classes by maximising the difference between them. Methods for discrimination include, for instance, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and the k-Nearest Neighbours method (kNN). Principal Component Regression (PCR), PLS and ANN can also be used for classification purposes.

One of the more recent and powerful techniques for both regression and discrimination is the Support Vector Machine (SVM), currently a very active research area within machine learning. SVM is a learning approach that uses kernel substitution to make the learning task more tractable by exploiting an implicit mapping into a high-dimensional space. It is an innovative technique that has been successfully applied to numerous tasks within, for instance, data mining, computer vision, bioinformatics and multivariate calibration. SVM is a good approach to data modelling because it combines the control of generalisation with the minimisation of errors.
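As an illustration of the outlier-detection step mentioned above, the following is a minimal sketch of how a Mahalanobis distance can be used to screen a new sample against a calibration set. It is not the project's implementation: the function names (`fit_reference`, `mahalanobis`, `is_outlier`), the example data and the cut-off of 3.0 are assumptions made for illustration, and the data are restricted to two variables (for example two PCA scores) so that the covariance matrix can be inverted by hand.

```python
from statistics import mean


def fit_reference(points):
    """Estimate the centre and sample covariance of 2-D calibration data.

    Returns ((mean_x, mean_y), (s_xx, s_xy, s_yy)).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three calibration samples")
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, centre, cov):
    """Mahalanobis distance of a 2-D point from the calibration centre."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - centre[0]
    dy = point[1] - centre[1]
    # d^2 = v^T S^-1 v, with the 2x2 inverse written out explicitly.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return max(d2, 0.0) ** 0.5  # guard against tiny negative rounding errors


def is_outlier(point, centre, cov, threshold=3.0):
    """Flag a sample whose distance exceeds the (assumed) cut-off."""
    return mahalanobis(point, centre, cov) > threshold


# Hypothetical calibration data: two strongly correlated variables.
calibration = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
centre, cov = fit_reference(calibration)

# A sample that follows the correlation, and one that breaks it.
along_trend = (6.0, 12.0)
off_trend = (3.0, 9.0)
print(is_outlier(along_trend, centre, cov))
print(is_outlier(off_trend, centre, cov))
```

The second sample is closer to the centre in Euclidean terms than the first, yet it is the one flagged, because the Mahalanobis distance takes the correlation between the variables into account. Estimating the centre and covariance on the calibration set only, and then scoring new samples, keeps a gross outlier from inflating the covariance it is judged against; the robust techniques mentioned above address the same problem when outliers may already be present in the calibration set.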

Results obtained

At CRA-W, a recent example of discrimination using SVM is the development, in feed science, of a new system to detect the presence of meat and bone meal in compound feed, in order to combat fraud and accidental contamination in the context of the mad cow crisis. The software packages used are Matlab, Unscrambler, Statistica, ISIS and Winisi.

CRA-W coordinator

DARDENNE Pierre (Inspecteur général scientifique)
Département Qualité des productions agricoles
Chaussée de Namur, 24
B-5030 Gembloux
Telephone (direct): 62 03 54
Telephone (department): +32 (0) 81 / 62.03.50
Fax (department): +32 (0) 81 / 62.03.88
E-mail: dardenne@cra.wallonie.be

Funding

  • CRA-W - Walloon Agricultural Research Centre

Publications

  • Fernández Pierna, J.A., Duval, H., Valderrama, P., Rutledge, D., Baeten, V. & Dardenne, P. (2011). A case study of extrapolation in NIR modelling - A chemometric challenge at Chimiométrie 2009. Chemometrics and Intelligent Laboratory Systems, 106(2), 205-209.
  • Dardenne, P. & Fernández Pierna, J.A. (2006). A NIR data set is the object of a chemometric contest at "Chimiométrie 2004". Chemometrics and Intelligent Laboratory Systems, 80(2), 236-242.
  • Fernández Pierna, J.A. & Dardenne, P. (2007). Chemometric contest at "Chimiométrie 2005": a discrimination study. Chemometrics and Intelligent Laboratory Systems, 86(2), 219-223.
  • Fernández Pierna, J.A., Chauchard, F., Preys, S., Roger, J., Galtier, O., Baeten, V. & Dardenne, P. (2011). How to build a robust model against perturbation factors with only a few reference values: A chemometric challenge at Chimiométrie 2007. Chemometrics and Intelligent Laboratory Systems, 106(2), 152-159.
  • Fernández Pierna, J.A., Grelet, C., Dehareng, F., Dardenne, P. & Baeten, V. (2012). Merging of spectral datasets from different MIR instruments used in the routine analysis of milk. Proceedings in: ICAR 2012, Cork, 28 May 2012, 55-72.
  • Fernández Pierna, J.A., Vermeulen, P., Amand, O., Tossens, A., Dardenne, P. & Baeten, V. (2012). NIR hyperspectral imaging spectroscopy and chemometrics for the detection of undesirable substances in food and feed. Chemometrics and Intelligent Laboratory Systems, 117, 233-239.
  • Overgaard, S., Fernández Pierna, J.A., Baeten, V., Dardenne, P. & Isaksson, T. (2012). Prediction error improvements using variable selection on small calibration sets - a comparison of some recent methods. Journal of Near Infrared Spectroscopy, 20(3), 329-337.
  • Fernández Pierna, J.A. & Dardenne, P. (2008). Soil parameter quantification by NIRS as a Chemometric challenge at "Chimiométrie 2006". Chemometrics and Intelligent Laboratory Systems, 91(1), 94-98.
  • Fernández Pierna, J.A., Duponchel, L., Ruckebusch, C., Bertrand, D., Baeten, V. & Dardenne, P. (2012). Trappist beer identification by vibrational spectroscopy: A chemometric challenge posed at the Chimiométrie 2010 congress. Chemometrics and Intelligent Laboratory Systems, 113, 2-9.