3,642 questions
0
votes
0
answers
27
views
Class labelling influences the gini importance of features [closed]
Problem: In sklearn's Random Forest classifier, the class labelling influences the Gini importance of features.
I would expect that the labelling of classes should not influence the importance values and ...
0
votes
0
answers
88
views
Evaluating pre-trained random forest in Fortran
I have a trained random forest regressor from scikit-learn:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
I then want to make use of (but not train ...
0
votes
0
answers
55
views
iterative_train_test_split does not return same split
I am trying to build a multilabel classification model using a random forest classifier. However, for some reason iterative_train_test_split does not return the same split even though it should use np....
0
votes
0
answers
35
views
How to deploy .keras and .joblib models in FRDM MCXN947 (NXP) Microcontroller?
I have created 3 models (an RF, a CNN-LSTM and an MLP) to be deployed in the FRDM MCXN947 microcontroller.
They have sizes of 2.18 GB, 7.75 MB and 0.186 MB respectively. I have saved the RF model as ....
6
votes
2
answers
103
views
Reproduce a particular tree from the random forest using DecisionTreeRegressor
I am trying to replicate a specific decision tree trained by a RandomForestRegressor class, using DecisionTreeRegressor.
However, I cannot get the exact results, even when using the exact same ...
0
votes
2
answers
119
views
Random Forest Classifier model returns all zeroes
I'm trying to train a RandomForestClassifier Model. However, when I train it, it gives me all zeroes. And, I really can't seem to understand why. The dataset is HUGE (close to like 75,0000 rows), so, ...
0
votes
0
answers
16
views
KNIME: Random Forest Learner
I need to set Age as the Target column but it is not allowing me to select the attribute. How does KNIME decide what features to offer as the target column?
I ...
2
votes
0
answers
62
views
Prediction by trained model of sjwhitworth/golearn
I recently tried a random forest model with golearn.
I want to predict using a saved model (.gob) and only the explanatory variables.
As far as I could find, the only way was to prepare a template base.FixedDataGrid ...
0
votes
0
answers
17
views
Getting extremely low importance scores in ensemble.RandomForestClassifier
I am training a RandomForestClassifier from sklearn.ensemble with the following code:
adata = ad.read_h5ad(f'{data_dir}{ct}_clean_log1p_normalized.h5ad')
adata = adata[:, adata.var....
1
vote
1
answer
72
views
RandomForest Classifier Takes Forever [closed]
I am working on a data science project and trying to find the optimal parameters for it.
This is what I want to test, but it takes forever, and I have not seen any output after 1 hour.
...
-1
votes
1
answer
124
views
Do I need to scale the RF model while creating a voting ensemble model? [closed]
I'm building a classification model for sleep disorders using Voting Ensemble and I have three base models: Logistic Regression, Random Forest and SVM.
Now I want to combine these models using a ...
0
votes
0
answers
117
views
SHAP value in TreeExplainer: Additivity check failed in TreeExplainer
I am trying to obtain the SHAP values of a Random Forest model for binary classification, trained in Python. I am using the following code:
final_model = RandomForestClassifier(random_state=42, **...
0
votes
0
answers
48
views
I used GEE to do random forest classification, but the result is only one color. Why?
Friends, I want to run a random forest model using GEE. I selected a study area and generated some sample points randomly within it, assigning attributes "0", "1", "2", ...
1
vote
1
answer
99
views
Why does RandomForestClassifier in scikit-learn predict even on all-NaN input?
I am training a random forest classifier in Python with sklearn; see the code below:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=42)
rf.fit(X = df.drop("...
3
votes
1
answer
79
views
How do I use a random forest to predict gaps in a dataset? [closed]
I have a dataset that I used to make a random forest (it is split into testing and training data). I have already made the random forest and generated predictions (code below), but I don't know how to ...
0
votes
0
answers
20
views
Get analytical equation of RF regressor model [duplicate]
I have the following dataset:
X1 X2 X3 y
0 0.548814 0.715189 0.602763 0.264556
1 0.544883 0.423655 0.645894 0.774234
2 0.437587 0.891773 0.963663 0.456150
3 ...
0
votes
1
answer
66
views
Terra predict function failing to predict my random forest model
I have a CSV with my data and I put it through an RF to predict sediment type based on bathymetry data from 24 sample points (and get sediment distribution as an image output), the RF is working ...
0
votes
0
answers
28
views
Issue with Input Array on Random Forest Model with Both Numerical and Categorical Features
I am getting a ValueError regarding the input arrays and their dimensions. I am trying to create a Random Forest Regression Model for price prediction using both numerical features and categorical ...
0
votes
1
answer
382
views
Length of features is not equal to the length of SHAP Values
I'm running a random forest model and, to get some feature importances, I'm trying to run a SHAP analysis. The problem is that every time I try to plot the SHAP values, I keep getting this error:
...
1
vote
1
answer
112
views
Error Building Random Forest in R: randomForest Function Fails
I'm currently engaged in a machine-learning project where I need to utilize the random forest algorithm. I've installed the randomForest package in R, but I'm facing significant issues when ...
1
vote
1
answer
55
views
Plotting one Decision Tree of a Random Forest in sklearn
I have come across something strange when plotting a decision tree in sklearn.
I just wanted to compare a Random Forest model consisting of one estimator using bootstrapping and one without ...
0
votes
1
answer
95
views
R: Error in x[[jj]][iseq] <- vjj : replacement has length zero (Library SpatialML::rgf)
I am trying to run a geographically weighted random forest classification using the function SpatialML::rgf(). However, I am encountering the following error:
'Error in x[[jj]][iseq] <- vjj : ...
0
votes
0
answers
41
views
Draw a decision tree while hiding the values of the "value" row
I want to simplify the decision tree output and hide the values in the "value" field. Below is the code I am using:
fig, ax = plt.subplots(figsize=(10, 10))
...
-1
votes
1
answer
95
views
Different Results (With Seed) For sklearn Random Forest
I am using sklearn to run a random forest. I am setting the seed for the random forest, as well as splitting the data for cross validation. When I re-run the code consecutive times, it gives me the ...
0
votes
0
answers
15
views
How to reproduce the results of an ML model in Spark? [duplicate]
I am creating a machine learning model (random forest) in Spark (Pyspark) with cross-validation and grid search. I have two dataframes: one for training and one for testing, both stored in Parquet.
...
0
votes
1
answer
49
views
GridSearchCV custom profit function results in an error: missing 1 required positional argument: 'y'
I am trying to optimize my model with GridSearchCV, using a custom profit function. However, when I run my code, I end up with the following error message: TypeError: profit_scorer() missing 1 ...
0
votes
0
answers
37
views
Whitebox Workflow Random Forest Regression Fit Hyperparameter Tuning
I've been using Whitebox Workflow Random Forest Regression fit, a QGIS plugin, for my undergraduate thesis. The plugin creates a model from the input data such as Raster files, number of trees, ...
1
vote
0
answers
91
views
partykit: Error when using varimp on cforest for data set including NA values
I want to estimate the relative importance of variables in explaining a response variable ("dep_var", a numeric variable based on a 4-point Likert scale). I am mostly interested in the ...
2
votes
1
answer
113
views
Modification of Random Forest to always evaluate some feature(s) at every split
I am trying to change the functionality of a random forest classifier. While usually features are selected at random for each split, I want one specific feature to be evaluated at each split. I know ...
0
votes
2
answers
55
views
Feature Importance with ColumnTransform and OneHotEncoder in RandomForestClassifier
Apologies for bothering you, but I haven't been able to find a definitive answer after searching the site.
I'm building a RandomForestClassifier on some clinical data where the target variable (...
2
votes
0
answers
221
views
SHAP Additivity Check Fails with Astronomical SHAP Values for RandomForestClassifier
So I trained my model and here I can share some relevant parts regarding the issue:
import pandas as pd
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble ...
1
vote
1
answer
401
views
SHAP values for random survival forest
I want to plot the SHAP values for my RSF model; here is the code and error:
xvars <- c("RIDRETH1", "RXDLIPID", "DRXTKCAL", "DRXTPROT", "DRXTCARB", ...
2
votes
1
answer
148
views
Is set.seed() needed when building a single decision tree in R?
I am learning how to build a single decision tree and random forests in R. I understand that set.seed() is needed before building a random forest to ensure reproducibility of the results, e.g. if ...
1
vote
0
answers
82
views
How to use Python to replicate Random Forest Regression prediction using decision paths?
I'm trying to test whether I've understood the way RandomForestRegressor produces a forecast after the model is fitted. I used the California housing example to train a simple model and predict the first ...
0
votes
0
answers
30
views
Dataframe of raster images is taking too long
I am creating a DataFrame from the raster arrays of large raster images to train a Random Forest model, but the process of creating the DataFrame is taking too long due to the large size of the images....
0
votes
1
answer
111
views
How to deal with overlapping data in machine learning
I am creating a machine learning model that determines whether a user is a bot or not. I used seaborn to plot a pairplot and realised most of the data is overlapping. Below is the code I wrote for ...
3
votes
0
answers
89
views
Problem with textTrainRandomForest() function
I'm trying to use the text R package to train a ML model with the textTrainRandomForest() function, but I'm encountering an error:
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 10) to ...
0
votes
1
answer
49
views
Calculating AUC for a random forest model
I can't find the syntax for calculating AUC for this random forest model. See the code below; please advise.
## 1
library(caret)
library(dplyr)
library(pROC)
library(readxl)
library(car)
set.seed(1234)...
-2
votes
1
answer
118
views
Is it possible to identify which indicators influence the credit risk for each client company in credit risk analysis?
I am working on credit risk analysis. I want to predict the risk of each company developing a debt with a fictional company. I obtained the feature importance from the model, but I want to know if it ...
1
vote
0
answers
48
views
Why is the split statistic in the ranger package for R greater than 1?
In the ranger package for R, the node impurity is measured with the Gini index for classification trees. I would expect the Gini index to lie between 0 and 1, as
$$ Gini = 1-\sum_{i=1}^C p_i^2$$,
...
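The formula quoted in this excerpt can be checked numerically. The following is a minimal sketch of the Gini index as written above, assuming class counts at a single node; `gini_impurity` is a hypothetical helper name, and the sketch says nothing about how ranger computes or scales its split statistic.

```python
def gini_impurity(class_counts):
    """Gini index 1 - sum(p_i^2) for the class counts observed at one node."""
    total = sum(class_counts)
    if total == 0:
        # An empty node has no impurity by convention in this sketch.
        return 0.0
    # p_i is the fraction of samples in class i.
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# A pure node has impurity 0; a balanced node with C classes reaches 1 - 1/C.
pure = gini_impurity([10, 0])
balanced_two = gini_impurity([5, 5])
balanced_four = gini_impurity([1, 1, 1, 1])
```

For C classes the value computed this way is at most 1 - 1/C, so it always stays below 1.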
-1
votes
1
answer
35
views
X has 8 features, but RandomForestRegressor is expecting 2924 features as input
I'm building a restaurant recommender for my city using a Kaggle dataset and RandomForestRegressor.
I built the model, and now want the model to recommend a good restaurant when it is given 4 ...
2
votes
2
answers
131
views
ClassifierChain with Random Forest: Why is np.nan not supported even though Base Estimator handles it?
I'm working on a multilabel classification problem using the ClassifierChain approach with RandomForestClassifier as the base estimator. I've encountered an issue where my input matrix X contains np....
-1
votes
2
answers
207
views
How to use Machine Learning to find the pattern customer profile? [closed]
I have a dataset with personal characteristics of customers who purchase from a fictional company. Initially, I don't have any target variable, only their characteristics. My goal is to find a pattern,...
-1
votes
1
answer
78
views
Random Forest Test split
I have trained a random forest model using the June dataset to predict the status_value of an employee, using a test_size of 0.3. I am including code snippets, as the code itself works well without any ...
-1
votes
1
answer
148
views
Final Predictions accuracy of my ML Binary Classification Model is horrible [closed]
I am competing in a Kaggle competition (https://www.kaggle.com/competitions/playground-series-s4e8) where we have to predict whether a mushroom is poisonous or not based on the data provided.
The issue ...
-2
votes
1
answer
55
views
Hybridized collaborative filtering and sentence similarity-based system for doctor recommendation based on user input of symptoms and location
I'm trying to solve a problem of recommending a doctor based on a user's symptoms and location using a hybridized collaborative filtering and sentence similarity-based recommender system that follow ...
0
votes
0
answers
95
views
How to visualize random forest plot using graphviz, in characters outside of UTF-8 (Chinese)
I am doing a random forest model on PC orders data, which is mostly in Chinese. I have done the model and accuracy checks. However, I can't seem to generate the image due to a UnicodeEncodeError, ...
0
votes
0
answers
190
views
How to Encode Non-Ordinal Categorical Variables for RandomForest without Using Label Encoding?
I need to predict different types of exploitation using a RandomForestClassifier. My dataset contains several categorical variables such as gender, citizenship, and CountryOfExploitation. These ...
0
votes
1
answer
59
views
How to optimise hyperparameters for RandomForestClassifier in Python for large datasets?
I'm working on a problem where I thought RandomForestClassifier from scikit-learn would be a better solution for a large dataset. Only after trying it for this, I found it to be not ...
0
votes
1
answer
95
views
x@presence error for Species Distribution modeling [closed]
Does anybody know how to solve this problem?
I'm trying to build a species distribution model using bioclimatic variables
sdm package by Naimi, 2016
R version 4.4.1
ERROR:
model <- sdm(Species ~ ., ...