1 year ago
#337761
ShrunkenDown
Cannot fit a Model after Performing Stratified K-Fold Split
I am new to using K-fold splits to create train and test data, and I am practicing with the dataset below.
Context:
- The Dataset is the Kaggle UrbanSound8k set available at https://www.kaggle.com/datasets/chrisfilo/urbansound8k
- The problem occurs in a Python Jupyter Notebook using scikit-learn and NumPy.
I performed a 10-fold split on the dataset, as recommended. However, I cannot fit y_train to the model, because it is a 2D array, and every dimensionality-reduction approach I have tried appends the second dimension onto the first rather than removing it entirely.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
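For reference, the error reproduces with stand-in data of the same shapes (random features and one-hot labels here are placeholders, not my real pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data mirroring my shapes: 128 features, 10 one-hot classes.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 128))
y_train = np.eye(10)[rng.integers(0, 10, size=50)]  # shape (50, 10)

model = LogisticRegression(solver='newton-cg')
err_msg = ""
try:
    model.fit(X_train, y_train)  # raises because y_train is 2D
except ValueError as e:
    err_msg = str(e)
print(err_msg)
```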
When printing the shape of the y train array:
print(y_train.shape)
I get:
> (7859, 10)
When trying to fit this to a model:
model = LogisticRegression(solver='newton-cg')
model.fit(X_train, y_train)
I receive:
> ValueError: y should be a 1d array, got an array of shape (7859, 10) instead
After searching online for dimensionality reduction, the suggested techniques appear to merge the two dimensions into one rather than remove the second dimension outright.
For example, if I try to flatten the array and re-fit the model:
y_train = y_train.flatten()
model.fit(X_train, y_train)
I now receive this error, presumably because flattening concatenated all ten columns into 7859 × 10 = 78590 values:
> ValueError: Found input variables with inconsistent numbers of samples: [7859, 78590]
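A quick check with a zero array of the same shape (a stand-in, not my real labels) confirms that flatten concatenates the rows instead of dropping the second dimension:

```python
import numpy as np

# Stand-in array with the same shape as my y_train.
y = np.zeros((7859, 10))
flat = y.flatten()
print(flat.shape)  # (78590,) -- rows concatenated, not reduced
```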
In summary
I want my train and test split:
X_train.shape, X_test.shape, y_train.shape, y_test.shape
which currently looks like this after the k-fold train-test split:
((7859, 128), (873, 128), (7859, 10), (873, 10))
To instead look like this:
((7859, 128), (873, 128), (7859,), (873,))
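My guess is that the second dimension is a one-hot class encoding, in which case I assume something like np.argmax(y, axis=1) would collapse it to one class index per sample, e.g.:

```python
import numpy as np

# Tiny one-hot example (assumed encoding; a stand-in for my real labels).
y_onehot = np.eye(10)[[2, 0, 7]]      # shape (3, 10), one row per sample
labels = np.argmax(y_onehot, axis=1)  # class index per row -> shape (3,)
print(labels.shape, labels)           # (3,) [2 0 7]
```

but I am not sure this is the right way to handle it.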
Thank you for any suggestions.
python
numpy
scikit-learn
train-test-split
k-fold