Quantitative Analysis of Alkali-Soluble Lignin in Steam-Exploded Biomass by Coupling UV-Visible Spectroscopy with Machine Learning
Keywords:
Alkali-soluble lignin, UV-Visible spectroscopy, Machine learning, Extra Trees regression, Steam explosion, Feature selection
Abstract
Lignin is one of the most abundant biopolymers in lignocellulosic biomass, yet quantifying it efficiently remains a significant challenge for biorefineries because traditional wet-chemical methods are slow and limited in scope. This study aimed to develop a rapid and accurate approach for quantifying lignin concentration in alkali extracts of steam-exploded woody biomass by integrating UV-visible spectroscopy with machine learning, as a practical complement to the conventional Klason lignin assay (modified ASTM D1106-56). UV-visible spectra were collected and screened for outliers with the Isolation Forest algorithm, then subjected to several preprocessing techniques and to feature selection with the SelectKBest algorithm to optimize the inputs for four regression models: Extra Trees, Random Forest, XGBoost, and Support Vector Regression (SVR). Baseline correction combined with Standard Normal Variate (SNV) transformation was the best-performing preprocessing method, and selecting the top 150 characteristic wavelengths effectively maximized information retention. Among the models evaluated, the Extra Trees (ET) regressor showed the best generalization capability and stability, achieving a test coefficient of determination (R²) of 0.803 and a Mean Absolute Percentage Error (MAPE) of 4.0%. It significantly outperformed SVR and XGBoost, which suffered from overfitting and underfitting, respectively.
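As an illustration of the preprocessing and evaluation terms used in the abstract, the following is a minimal Python sketch of SNV scaling and of the two reported metrics, R² and MAPE. It is not the authors' implementation. The function names, the example absorbance values, and the baseline step (subtracting the minimum absorbance) are assumptions made for illustration only; the abstract does not state which baseline-correction method was used.

```python
from statistics import mean, stdev


def baseline_correct(spectrum):
    """Simple offset baseline: subtract the minimum absorbance.

    Assumption for illustration; the abstract does not specify the method.
    """
    floor = min(spectrum)
    return [a - floor for a in spectrum]


def snv(spectrum):
    """Standard Normal Variate: centre one spectrum on its own mean
    and scale it by its own sample standard deviation."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1)
    return [(a - mu) / sd for a in spectrum]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * mean(errors)


# Hypothetical absorbance values for a single spectrum (not measured data).
spectrum = [0.12, 0.45, 0.80, 0.45, 0.12]
processed = snv(baseline_correct(spectrum))
```

After SNV, each spectrum has zero mean and unit standard deviation, which removes multiplicative scatter and offset differences between samples before feature selection. The remaining stages named in the abstract have direct counterparts in scikit-learn (`IsolationForest`, `SelectKBest`, `ExtraTreesRegressor`), although the abstract does not say which library or settings the authors used.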