
Cannot fit a Model after Performing Stratified K-Fold Split

I am new to the concept of using K-fold splits to create train and test data, and I am practicing on the dataset below.

Context:

A 10-fold split has been performed on the dataset as recommended. However, I cannot fit y_train to the model: it is a 2D array, and the dimensionality-reduction approaches I have tried just append the second dimension onto the first rather than removing it entirely.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
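
Continuing from the imports above, the split was produced roughly like this. This is only a minimal sketch with placeholder data: the arrays X and labels, the one-hot encoded y, and the use of a single fold are assumptions, since that part of the code is not shown here.

# Placeholder data (assumption): X has 128 features and y is one-hot encoded
# over 10 classes, which is what produces the (n_samples, 10) shapes below.
rng = np.random.default_rng(0)
X = rng.normal(size=(8732, 128))
labels = rng.integers(0, 10, size=8732)   # integer class labels 0..9
y = np.eye(10)[labels]                    # one-hot encoding -> shape (8732, 10)

# StratifiedKFold stratifies on a 1D target, so the integer labels are passed
# to split(); the one-hot array is then indexed with the fold indices.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
train_idx, test_idx = next(skf.split(X, labels))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, y_train.shape)       # (n_train, 128) and (n_train, 10), as in the question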

When printing the shape of the y_train array:

print(y_train.shape)

I get:

> (7859, 10)

When trying to fit this to a model:

model = LogisticRegression(solver='newton-cg')
model.fit(X_train, y_train)

I receive:

> ValueError: y should be a 1d array, got an array of shape (7859, 10) instead

The dimensionality-reduction approaches I found when searching online appear to merge the two dimensions into one long array rather than remove the second dimension outright.

For example, if I try to flatten the array and re-fit the model:

y_train = y_train.flatten()
model.fit(X_train, y_train)

I now receive this error:

> ValueError: Found input variables with inconsistent numbers of samples: [7859, 78590]
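
That sample count is exactly 7859 × 10: flatten() keeps every element and lays the ten columns of each row end to end, so the (7859, 10) array becomes one long vector of 78590 values rather than one label per row. For example:

print(y_train.size)             # 78590 == 7859 * 10
print(y_train.flatten().shape)  # (78590,)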

In summary:

I want the shapes of my train and test split:

X_train.shape, X_test.shape, y_train.shape, y_test.shape

which currently come out like this after the stratified 10-fold split:

((7859, 128), (873, 128), (7859, 10), (873, 10))

to instead look like this:

((7859, 128), (873, 128), (7859,), (873,))
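
In other words, assuming the ten columns of y are a one-hot encoding of the class labels (an assumption, since the encoding step is not shown above), the goal is a reduction along the second axis, something like:

# Assumption: each row of y_train / y_test is a one-hot vector over 10 classes,
# so the column index of the 1 is the original class label.
y_train = np.argmax(y_train, axis=1)   # (7859, 10) -> (7859,)
y_test = np.argmax(y_test, axis=1)     # (873, 10)  -> (873,)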

Thank you for any suggestions.

Tags: python, numpy, scikit-learn, train-test-split, k-fold
