0

Matlab

I have some PLSR regression in Matlab that I need to translate to Python. The Matlab code is as follows:

% PLSR with 15 latent variables
[~,~,~,~,~,MSEcv] = plsregress(X,y,15,"cv",5,"MCReps",100);
rRMSE = 100*sqrt(MSEcv(2,:))/max(y)-min(y));
% Find optimal number of latent variables
minl = 1; maxl = 15
nlv = find(rMSE(minl+1:maxl+1)==min(rRMSE(minl+1:maxl+1)))-1+minl;
nlv = nlv(1)
% Fit with optimal number of latent variables
[XL,yl,XS,YS,beta,PCTVAR;MSE;stats] = plsregress(X,y,nlv)

(PLS regression with a maximum of 15 latent variables, cross-validation with 1/5 of the values and 100 Monte-Carlo repetitions.)


Python

I tried to implement the code in Python as follows:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_validate
from sklearn.model_selection import ShuffleSplit

# Maximum number of latent variables
mlv = 15
# Cross-validation
cv = 5
# Monte-Carlo repetitions
mcr = 100

# 1...mlv to fit models with various number of latent variables
try_latent_vars = np.arange(1, mlv)

##----------------------------------------------------------------------------|
# Define funtion to fit PLSmodel
def optimise_pls_cv(X_vals, y_vals, n_comp, crossval, mcreps):
    '''Fit PLS regression model using cross-validation.'''
    # Define PLS object
    pls = PLSRegression(n_components = n_comp)

    # Cross-validation fit
    cv_split = ShuffleSplit(n_splits = mcreps, test_size = 1/crossval,
                            random_state = 0)
    cvs = cross_validate(pls, X_vals, y_vals, cv = cv_split,
                         scoring = ["r2", "neg_mean_squared_error"])
    mean_r2_error = np.mean(cvs["test_r2"])
    test_mse = -np.mean(cvs["test_neg_mean_squared_error"])
    return pls, mean_r2_error, test_mse

##----------------------------------------------------------------------------|
# Fit PLS model

# Empty lists to store R^2 and mean squared error values
r2s = []
mses = []

for n_comp in try_latent_vars:
    model, r2, mse = optimise_pls_cv(X_vals = X.T,
                                     y_vals = y,
                                     n_comp = n_comp,
                                     crossval = cv,
                                     mcreps = mcr)
    r2s.append(r2)
    mses.append(mse)

index_max_r2s = np.argmax(r2s)
lv = try_latent_vars[index_max_r2s]

##----------------------------------------------------------------------------|
## Fit model with optimized number of components
model, r2, mse = optimise_pls_cv(X_vals = X.T,
                                 y_vals = y,
                                 n_comp = lv,
                                 crossval = cv,
                                 mcreps = mcr)
metrics = {"R2" : r2,
           "MSE" : mse}
metrics_str = "R2: %0.4f, MSE: %0.4f" % (r2, mse)

In both code snippets, X and y are a set of spectral information and some variable to predict from the spectra, respectively.


The problem

Unfortunately, I get vastly different results with those versions (R^2 values for the Matlab version are all 0.5 or higher while the Python version ends up with R^2 values close to zero or negative). How is that and how can I solve this issue?


Example data

Here is some example data as .txt files in Dropbox. Read as

import pandas as pd
X = pd.read_table("/path/to/saved/files/X.txt", header = None)
y = pd.read_table("/path/to/saved/files/y.txt", header = None)

1 Answer 1

0

I found that sklearn uses the NIPALS algorithm while Matlab's plsregress uses the SIMPLS algorithm to estimate PLS regression parameters. For some of my data, this does not make a huge difference, whereas I found the NIPALS version to produce output of highly variable quality for slight variation of the used sample spectra (e.g., outlier removal or continuum removal may have a huge impact for some variables).

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.