How do I solve "ValueError: Found array with 0 sample(s) (shape=(0, 5)) while a minimum of 1 is required by LinearRegression."?

Question

I have built a drought prediction model. Its inputs are a future year range (start year and end year), a month, and a city. Its output is a bar chart whose Y axis shows the drought index and whose X axis shows the year range with months. However, I am stuck on an error and don't know how to resolve it. Please suggest a solution.

This is the code of my model:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error
from datetime import datetime
from statsmodels.tsa.arima.model import ARIMA

# Load and preprocess data
df = pd.read_csv('Sri lanka drought variables(monthwise).csv')

# Normalize the data
df['Norm_Temperature'] = (df['Average_temperature_month'] - df['Average_temperature_month'].min()) / (df['Average_temperature_month'].max() - df['Average_temperature_month'].min())
df['Norm_Precipitation'] = (df['Average_precipitation_month'] - df['Average_precipitation_month'].min()) / (df['Average_precipitation_month'].max() - df['Average_precipitation_month'].min())
df['Norm_Evapotranspiration'] = (df['Average_evapotranspiration_month'] - df['Average_evapotranspiration_month'].min()) / (df['Average_evapotranspiration_month'].max() - df['Average_evapotranspiration_month'].min())

# Calculate CDI
df['CDI'] = (df['Norm_Precipitation'] + (1 - df['Norm_Temperature']) + df['Norm_Evapotranspiration']) / 3

# Convert Year and Month to Date
df['Date'] = pd.to_datetime(df[['Year', 'Month']].assign(day=1))

# Group data by year, month, and city
grouped_data = df.groupby(['Year', 'Month', 'City']).mean().reset_index()

# Select features and target
X = grouped_data[['Year', 'Month', 'Average_temperature_month', 'Average_precipitation_month', 'Average_evapotranspiration_month']]
y = grouped_data['CDI']

df.head()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize models
linear_reg = LinearRegression()
random_forest = RandomForestRegressor(random_state=42)
svm = SVR()

# Train the models
linear_reg.fit(X_train, y_train)
random_forest.fit(X_train, y_train)
svm.fit(X_train, y_train)

# Evaluate accuracy
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    score = model.score(X_test, y_test)
    return score, mae, rmse

models = {
    'Linear Regression': linear_reg,
    'Random Forest': random_forest,
    'SVR': svm
}

results = {name: evaluate_model(model, X_test, y_test) for name, model in models.items()}
for name, (score, mae, rmse) in results.items():
    print(f"Model: {name}")
    print(f"  Accuracy Score: {score}")
    print(f"  Mean Absolute Error: {mae}")
    print(f"  Root Mean Squared Error: {rmse}")

best_model_name = max(results, key=lambda k: results[k][0])
best_model = models[best_model_name]
print(f"Best Model: {best_model_name}")

# Forecast future values using ARIMA
def forecast_arima(data, column, order, steps):
    model = ARIMA(data[column], order=order)
    model_fit = model.fit()
    forecast = model_fit.forecast(steps=steps)
    return forecast

end_year = grouped_data['Year'].max()
forecast_steps = len(pd.date_range(start=f'{end_year+1}-01-01', end=f'{end_year+1}-12-31', freq='M'))

temperature_forecast = forecast_arima(df, 'Average_temperature_month', order=(5, 1, 0), steps=forecast_steps)
precipitation_forecast = forecast_arima(df, 'Average_precipitation_month', order=(5, 1, 0), steps=forecast_steps)
evapotranspiration_forecast = forecast_arima(df, 'Average_evapotranspiration_month', order=(5, 1, 0), steps=forecast_steps)

future_dates = pd.date_range(start=f'{end_year+1}-01-01', end=f'{end_year+1}-12-31', freq='M')
future_data = pd.DataFrame({
    'Date': future_dates,
    'Temperature': temperature_forecast,
    'Precipitation': precipitation_forecast,
    'Evapotranspiration': evapotranspiration_forecast
})

# Calculate future CDI
future_data['Norm_Temperature'] = (future_data['Temperature'] - df['Average_temperature_month'].min()) / (df['Average_temperature_month'].max() - df['Average_temperature_month'].min())
future_data['Norm_Precipitation'] = (future_data['Precipitation'] - df['Average_precipitation_month'].min()) / (df['Average_precipitation_month'].max() - df['Average_precipitation_month'].min())
future_data['Norm_Evapotranspiration'] = (future_data['Evapotranspiration'] - df['Average_evapotranspiration_month'].min()) / (df['Average_evapotranspiration_month'].max() - df['Average_evapotranspiration_month'].min())

future_data['CDI'] = (future_data['Norm_Precipitation'] + (1 - future_data['Norm_Temperature']) + future_data['Norm_Evapotranspiration']) / 3

# Function to predict drought index for a specific city
def predict_drought(model, city, start_year, end_year):
    city_data = grouped_data[(grouped_data['City'] == city) & (grouped_data['Year'].between(start_year, end_year))]
    X_city = city_data[['Year', 'Month', 'Average_temperature_month', 'Average_precipitation_month', 'Average_evapotranspiration_month']]
    y_city = city_data['CDI']
    predictions = model.predict(X_city)
    city_data['Predicted_CDI'] = predictions
    return city_data

# Example prediction for 'Anuradhapura' from 2030 to 2040
city = 'Anuradhapura'
future_predictions = predict_drought(best_model, city, 2030, 2040)

The error is below:

ValueError                                
Cell In[87], line 3
      1 # Example prediction for 'Anuradhapura' from 2030 to 2040
      2 city = 'Anuradhapura'
----> 3 future_predictions = predict_drought(best_model, city, 2030, 2040)
      5 if future_predictions is not None:
      6     plt.figure(figsize=(15, 7))

Cell In[86], line 6, in predict_drought(model, city, start_year, end_year)
      4 X_city = city_data[['Year', 'Month', 'Average_temperature_month', 'Average_precipitation_month', 'Average_evapotranspiration_month']]
      5 y_city = city_data['CDI']
----> 6 predictions = model.predict(X_city)
      7 city_data['Predicted_CDI'] = predictions
      8 return city_data

File /lib/python3.11/site-packages/sklearn/linear_model/_base.py:386, in LinearModel.predict(self, X)
    372 def predict(self, X):
    373     """
    374     Predict using the linear model.
    375 
   (...)
    384         Returns predicted values.
    385     """
--> 386     return self._decision_function(X)

File /lib/python3.11/site-packages/sklearn/linear_model/_base.py:369, in LinearModel._decision_function(self, X)
    366 def _decision_function(self, X):
    367     check_is_fitted(self)
--> 369     X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], reset=False)
    370     return safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_

File /lib/python3.11/site-packages/sklearn/base.py:605, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
    603         out = X, y
    604 elif not no_val_X and no_val_y:
--> 605     out = check_array(X, input_name="X", **check_params)
    606 elif no_val_X and not no_val_y:
    607     out = _check_y(y, **check_params)

File /lib/python3.11/site-packages/sklearn/utils/validation.py:967, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    965     n_samples = _num_samples(array)
    966     if n_samples < ensure_min_samples:
--> 967         raise ValueError(
    968             "Found array with %d sample(s) (shape=%s) while a"
    969             " minimum of %d is required%s."
    970             % (n_samples, array.shape, ensure_min_samples, context)
    971         )
    973 if ensure_min_features > 0 and array.ndim == 2:
    974     n_features = array.shape[1]

ValueError: Found array with 0 sample(s) (shape=(0, 5)) while a minimum of 1 is required by LinearRegression.

Have you tried debugging the app?

– ticktalk, Commented Jun 15, 2024 at 11:59

Snegha · Accepted Answer · answered 2024-06-16 09:01 (edited by desertnaut, 2024-06-16 22:08:53Z)

0

In predict_drought, check whether the following line returns an empty DataFrame. The error message (0 samples, shape=(0, 5)) indicates that it does: no rows in grouped_data match both the city and the year range, so model.predict receives an empty array. Check whether your dataset actually contains rows that satisfy the condition, for example whether it has any data for the years 2030 to 2040. Adding print statements inside the function also helps with debugging.

 city_data = grouped_data[(grouped_data['City'] == city) & (grouped_data['Year'].between(start_year, end_year))]
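
The check described above could be sketched as follows. This is only an illustration under stated assumptions: select_city_rows is a hypothetical helper, not part of the question's code, and the small demo frame is made-up data standing in for grouped_data. It applies the same filter, then raises a descriptive error when nothing matches instead of passing an empty array to model.predict.

```python
import pandas as pd


def select_city_rows(grouped_data, city, start_year, end_year):
    """Return rows for `city` within [start_year, end_year], or raise with diagnostics.

    Hypothetical helper: it mirrors the filter used in predict_drought.
    """
    city_mask = grouped_data["City"] == city
    year_mask = grouped_data["Year"].between(start_year, end_year)
    # .copy() so that adding a column later does not write into a view.
    city_data = grouped_data[city_mask & year_mask].copy()

    if city_data.empty:
        # Report what the data actually contains so the cause is obvious.
        raise ValueError(
            f"No rows for city={city!r} in {start_year}-{end_year}. "
            f"Rows matching city: {int(city_mask.sum())}; "
            f"rows matching years: {int(year_mask.sum())}; "
            f"years available: {grouped_data['Year'].min()}-{grouped_data['Year'].max()}."
        )
    return city_data


# Made-up frame standing in for grouped_data (values are illustrative only).
demo = pd.DataFrame({
    "Year": [2020, 2021, 2022],
    "Month": [1, 1, 1],
    "City": ["Anuradhapura", "Anuradhapura", "Colombo"],
})

# A range that exists in the data returns rows as usual.
historical = select_city_rows(demo, "Anuradhapura", 2020, 2021)

# A range outside the data raises a readable error instead of reaching predict().
try:
    select_city_rows(demo, "Anuradhapura", 2030, 2040)
except ValueError as exc:
    message = str(exc)
    print(message)
```

If the message shows that the city matches but the years do not, the data probably holds only historical observations. In that case the model needs feature rows for 2030 to 2040 built from forecasts (as the ARIMA section of the question starts to do) rather than rows filtered out of grouped_data.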

answered Jun 16, 2024 at 9:01 by Snegha
edited Jun 16, 2024 at 22:08 by desertnaut


1 Comment

Maxime Charrière Over a year ago

Hello, welcome to Stack Overflow, and thanks for your contribution. For now you don't have the reputation to write a comment, so it's fine, but once you do, it will be better to post a comment than an answer if you don't have a final solution.
