Sunday, 9 June 2013

The ideal training set

"All the chemometric calibration methods rely on correlation between some derived spectral measurements and reference measurements for the samples in what, for obvious reasons, is called a training set. It is crucial to the future robustness of the calibration that this training set is representative of the unknowns for which predictions are to be made. ..... The chemometric method has to learn, from the training set, how to make predictions that are robust to variations in the spectra caused by physical properties of the sample and by constituents other than the one of interest. If these sources of variability are not present in the training samples, the resulting calibration will not be robust against them. For example, if all the training samples in an exercise to calibrate for wheat protein are scanned at 14% moisture content, because they have all equilibrated in a laboratory whilst waiting to be scanned, then there is every chance that the calibration will not work well for samples scanned at other moisture contents. There are two solutions to this: equilibrate all the unknowns in the same way, or ensure that the moisture variations in the training set are representative of those that will occur in the unknowns.
..... The ideal training set is a random sample taken from all the unknowns that the calibration will ever be used on. In most cases this is an unachievable ideal, but we should at least try to represent the main sources of variability in the samples. Key ones are particle size, when powders or ground samples are being measured, and moisture content. For example, if the samples come from an industrial process, it is important to include samples from several batches."

(NIR Celebration Special Issue of NIR news)
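
As a rough illustration of the wheat-protein example in the quote, the sketch below checks how many expected unknowns fall inside the range spanned by the training set for each source of variability. It is not a method from the quoted article: the function name `coverage_report`, the variable names, and all the numbers are made up for illustration, and a simple min/max range check is only a crude stand-in for "representative".

```python
from typing import Dict, Sequence


def coverage_report(
    training: Dict[str, Sequence[float]],
    unknowns: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """For each variable, return the fraction of unknowns whose value lies
    within the min..max range seen in the training set."""
    report = {}
    for name, unknown_values in unknowns.items():
        train_values = training[name]
        lo, hi = min(train_values), max(train_values)
        inside = sum(1 for v in unknown_values if lo <= v <= hi)
        report[name] = inside / len(unknown_values)
    return report


# Hypothetical data: training samples all equilibrated near 14% moisture,
# while the unknowns arrive at a wider range of moisture contents.
training = {
    "moisture_pct": [14.0, 14.1, 13.9, 14.0],
    "particle_size_um": [180, 220, 260, 300],
}
unknowns = {
    "moisture_pct": [11.5, 13.95, 14.05, 16.2],
    "particle_size_um": [200, 240, 280, 290],
}

report = coverage_report(training, unknowns)
for name, fraction in report.items():
    print(f"{name}: {fraction:.0%} of unknowns inside the training range")
```

With these invented numbers, particle size is fully covered but only half of the unknowns fall inside the training moisture range, which is the situation the quote warns about. The two remedies it names would then apply: equilibrate the unknowns in the same way, or add training samples at other moisture contents.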
