# Questions tagged [scikit-learn]

4570 questions

1 vote, 2 answers, 4.6k views

### DLL Load Failed: The specified module could not be found [Python]

Not sure what the issue is...
...but many of the classifiers will not work on my machine now.
I just installed version 0.14.1 of scikit-learn. Could this be a path thing?
Traceback (most recent call last):
File "hashtag.py", line 19, in <module>
from sklearn.linear_model import SGDClassifier
File "C:\Anaconda...

1 vote, 1 answer, 2.5k views

### CV function in RidgeCV

I am working with the Ridge regression function in scikit-learn.
There is a cross-validation function, RidgeCV. The basic (example) settings are:
RidgeCV(alphas=[0.1, 1.0, 10.0], cv=None, fit_intercept=True, scoring=None,
normalize=False, store_cv_values=True)
Let's say I wanted to do a 10-fold CV. A...
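A minimal sketch of how 10-fold cross-validation might be requested, assuming a scikit-learn version where `cv` accepts an integer. The synthetic data is made up for illustration, and `store_cv_values` is left at its default because, in versions that have it, it only works with `cv=None`.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.randn(100)

# cv=10 requests 10-fold cross-validation over the candidate alphas.
model = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=10)
model.fit(X, y)

print(model.alpha_)  # the alpha selected by cross-validation
```

With `cv=None` the estimator uses an efficient leave-one-out scheme instead, so the two settings can select different alphas.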

1 vote, 1 answer, 1.9k views

### Multiclass linear SVM in Python that returns probabilities

How can I implement a linear SVM for multi-class classification that returns the probability matrix for the test samples?
Train samples: mxn
Train labels: mxc
Test labels: mxc, where each column holds the probability of one class.
The function in sklearn which does "one-vs-the-rest" LinearSVC doesn't return probablit...
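One possible approach, sketched under the assumption that calibrated scores are acceptable as probabilities: wrap `LinearSVC` in `CalibratedClassifierCV`, which adds a `predict_proba` method. The iris data and the `max_iter` value are illustrative choices, not part of the question.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# LinearSVC has no predict_proba; CalibratedClassifierCV fits a
# calibration model on top of its decision_function scores.
clf = CalibratedClassifierCV(LinearSVC(max_iter=10000), cv=5)
clf.fit(X, y)

proba = clf.predict_proba(X[:5])  # one row per sample, one column per class
print(np.round(proba, 3))
```

Each row of `proba` sums to 1, and the column order follows `clf.classes_`.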

1 vote, 1 answer, 2k views

### scikit-learn joblib: Permission error importing, run in Serial mode

The following permission error occurs when I try importing joblib from script or python -c 'import joblib':
/usr/local/lib/python2.7/dist-packages/joblib//joblib_multiprocessing_helpers.py:29: UserWarning: [Errno 13] Permission denied. joblib will operate in serial mode
warnings.warn('%s. joblib w...

1 vote, 1 answer, 1.4k views

### What is the recommended way to distribute a scikit learn classifier in spark?

I have built a classifier using scikit-learn and now I would like to use Spark to run predict_proba on a large dataset. I currently pickle the classifier once using:
import pickle
pickle.dump(clf, open('classifier.pickle', 'wb'))
and then in my Spark code I broadcast this pickle using sc.broadcast...

1 vote, 1 answer, 8.2k views

### Plot SVM with Matplotlib?

I have some interesting user data. It gives some information on the timeliness of certain tasks the users were asked to perform. I am trying to find out, if late - which tells me if users are on time (0), a little late (1), or quite late (2) - is predictable/explainable. I generate late from a colum...

0 votes, 1 answer, 25 views

### Why is r2_score quite different between train_test_split and pipeline cross_val_score?

I wonder why r2_score differs so much between train_test_split and a pipeline evaluated with cross_val_score. I suspect it's because the model can see the unknown words through CountVectorizer() in the pipeline. But based on the concept of Pipeline, CountVectorizer() should only work on training set split by cross_va...

1 vote, 1 answer, 858 views

### Facing ValueError: Target is multiclass but average='binary'

I'm a newbie to Python as well as machine learning. For my project, I'm trying to use the Naive Bayes algorithm on my dataset.
I can compute the accuracy, but when I try to compute precision and recall, it throws the following error:
"choose another average setting." %...

1 vote, 3 answers, 47 views

### In Sklearn, is there a clean way to transform a list of dicts?

I have a list of dicts that I want to scale. To use sklearn scalers, I need to turn the dicts into lists. Then, I will turn the lists back into dicts. This is what I'm doing:
keys = sorted(X[0].keys())
scaler = RobustScaler()
transformed = scaler.fit_transform([[x[k] for k in keys] for x in X])
X =...
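A small sketch of the round trip described above, assuming every dict shares the same keys. The sample values and the name `X_scaled` are hypothetical.

```python
from sklearn.preprocessing import RobustScaler

# Hypothetical input: every dict is assumed to share the same keys.
X = [{"a": 1.0, "b": 10.0}, {"a": 2.0, "b": 20.0}, {"a": 3.0, "b": 30.0}]

keys = sorted(X[0].keys())
scaler = RobustScaler()
transformed = scaler.fit_transform([[x[k] for k in keys] for x in X])

# Zip each scaled row back onto the key order used to build it.
X_scaled = [dict(zip(keys, row)) for row in transformed]
print(X_scaled)
```

`DictVectorizer` is another option for the dict-to-array step, though its `inverse_transform` omits features whose value is zero, which matters after centering.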

0 votes, 0 answers, 5 views

### Get topic names for each document

I am trying to do topic modelling for the documents using the example in this link https://www.w3cschool.cn/doc_scikit_learn/scikit_learn-auto_examples-applications-topics_extraction_with_nmf_lda.html
My question:
How can I know which documents correspond to which topic?
So far this is what I have don...
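One way this could be approached, sketched with `LatentDirichletAllocation` on a made-up corpus: the fitted model's document-topic matrix has one row per document, so the dominant topic is the column with the largest weight. The corpus and the choice of two topics are assumptions for illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny made-up corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold shares and bonds",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)

# fit_transform returns one row per document and one column per topic.
doc_topic = lda.fit_transform(counts)

# The dominant topic of each document is the column with the largest weight.
dominant_topic = doc_topic.argmax(axis=1)
print(dominant_topic)
```

The same `transform` / `argmax` pattern applies to `NMF`, which the linked example also uses.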

6 votes, 1 answer, 152 views

### Training hyperparameters for multidimensional Gaussian process regression

Here is a simple working implementation in which I use Gaussian process regression (GPR) in Python's scikit-learn with 2-dimensional inputs (i.e., a grid over x1 and x2) and 1-dimensional outputs (y).
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import G...

-1 votes, 0 answers, 16 views

### Python and C# Interprocess

My project is using machine learning to classify an image. I'm sending a string (containing the path of the image) from C# to Python and I'm expecting the label of the image.
I'm using the Anaconda 3 distribution, but the problem is that some modules are not found. The error is: "DLL load failed: T...

1 vote, 3 answers, 110 views

### scikit-learn & statsmodels - which R-squared is correct?

I'd like to choose the best algorithm for future use. I found some solutions, but I didn't understand which R-squared value is correct.
For this, I split my data into test and training sets, and I printed two different R-squared values below.
import statsmodels.api as sm
from sklearn.linear_mo...

1 vote, 2 answers, 695 views

### Do I need to scale test data and Dependent variable in the train data?

I am new to the concept of feature scaling in machine learning. I read that scaling is useful when one feature's range is much larger than that of the other features. But if I choose to scale the training data:
Can I scale just the one feature that has a large range?
If I scale the entire X of...
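A sketch of the commonly used pattern, assuming `StandardScaler` and synthetic data: the scaler learns its statistics from the training features only and the same fitted scaler is then applied to the test features. The target `y` is left unscaled here, which is a choice of this sketch rather than a rule.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Synthetic features; the second one has a much larger range than the others.
X = rng.rand(50, 3) * [1.0, 1000.0, 10.0]
y = rng.rand(50)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics learned from training data only
X_test_scaled = scaler.transform(X_test)        # the same statistics reused on the test data

print(X_train_scaled.mean(axis=0), X_test_scaled.mean(axis=0))
```

Calling `fit_transform` on the test set instead would let test-set statistics leak into preprocessing.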

0 votes, 0 answers, 6 views

### Optimizing customized loss function with some parameters in sklearn

I have a dataset with features x and labels y. I also have some prior knowledge about the functional form of the function, which has some unknown parameters. I want to infer the parameters (based on the OLS solution). I googled a lot but didn't find a solution in the sklearn library.
Actually, I found another solut...

1 vote, 2 answers, 3.5k views

### ValueError: Expected 2D array, got 1D array instead:

While practicing a simple linear regression model I got this error.
I think there is something wrong with my data set.
Here is my data set:
Here is the independent variable X:
Here is the dependent variable Y:
Here is X_train:
Here is Y_train:
This is the error body:
ValueError: Expected 2D array, got 1D array ins...
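A minimal sketch of the usual cause and fix, assuming a single-feature regression: scikit-learn estimators expect `X` with shape `(n_samples, n_features)`, so a flat array needs `reshape(-1, 1)`. The data values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical single-feature data, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,): 1D, which fit() rejects
y = np.array([2.0, 4.0, 6.0, 8.0])

X_2d = X.reshape(-1, 1)              # shape (4, 1): one column, one row per sample

model = LinearRegression().fit(X_2d, y)
prediction = model.predict(np.array([[5.0]]))  # input to predict must be 2D as well
print(prediction)
```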

1 vote, 1 answer, 290 views

### Expected 2-D array, got 1-D array instead

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np
a = ([1,2,3],[4,5,6])
stan = StandardScaler()
mima = MinMaxScaler()
stan.fit_transform(a)
mima.fit_transform(a)
Results after running stan and mima:
array([[-1., -1., -1.],
[ 1., 1., 1.]])
array([[0., 0., 0.],
[1., 1., 1.]])
However,...

1 vote, 1 answer, 40 views

### PCA computed by GPflow and Sklearn doesn't match

I am performing PCA using Sklearn and GPflow. I noticed that the outputs returned by the two libraries don't match.
Please see the sample code snippet below:
import numpy as np
from gpflow.models import PCA_reduce
from sklearn.decomposition import PCA
X = np.random.random((100, 10))
fo...

1 vote, 2 answers, 46 views

### Input array for fitting method

This code returns the expected results. But there are 2 pandas methods involved. Can I use only 1 method or remove pandas from fit_transform?
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
data = [-1, 2, 1, 18]
scaler.fit_transform(pd.DataFrame(pd.Series(data)))
array([[0....
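A pandas-free variant could look like the sketch below, assuming NumPy is acceptable: `reshape(-1, 1)` turns the flat list into the single-column 2D array that `fit_transform` expects. The names `column` and `scaled` are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = [-1, 2, 1, 18]

# reshape(-1, 1) turns the flat list into a single-column 2D array.
column = np.array(data, dtype=float).reshape(-1, 1)
scaled = MinMaxScaler().fit_transform(column)
print(scaled)
```

A nested list such as `[[v] for v in data]` would also be accepted, since the scaler converts its input to an array internally.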

4 votes, 1 answer, 35 views

### Pandas.get_dummies return to two columns(_Y and _N) instead of one

I am trying to use sklearn to train a decision tree on my dataset.
When I tried slicing the data into (outcome: Y, and predicting variables: X), it turned out that the outcome (my label) is True/False:
#data slicing
X = df.values[:,3:27] #X are the sets of predicting variable, dropping...
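For the two-column behaviour named in the title, a sketch of one possible remedy is `drop_first=True`, which keeps a single indicator column for a binary variable. The column name `outcome` and its Y/N values are hypothetical.

```python
import pandas as pd

# Hypothetical Y/N column, for illustration only.
df = pd.DataFrame({"outcome": ["Y", "N", "Y", "N"]})

both = pd.get_dummies(df["outcome"], prefix="outcome")
single = pd.get_dummies(df["outcome"], prefix="outcome", drop_first=True)

print(list(both.columns))    # two indicator columns, one per category
print(list(single.columns))  # the first category's column is dropped
```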

0 votes, 0 answers, 13 views

### Invalid version of numpy or scipy or scikit-learn

On an Ubuntu 16.04 image in a Docker container, I try to install:
FROM ubuntu:16.04
MAINTAINER Amazon AI
RUN apt-get -y update && apt-get install -y --no-install-recommends \
wget \
python3.5 \
nginx \
libgcc-5-dev \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Here we get all python packag...

1 vote, 0 answers, 19 views

### How to deal with FutureWarning: Int64Index.flags is deprecated and will be removed in a future version?

I'm doing a grid search, passing custom cross-validation folds as a list of indices, and I get this warning. Why am I getting this warning message, and how can I avoid it?
My indexes are of type 'int64'.
Note: I don't want to simply suppress the warning. I need a solution so that without a...

1 vote, 0 answers, 5 views

### LightGBM - sklearn API vs training and data structure API and lgb.cv vs GridSearchCV/RandomizedSearchCV

What are the differences between the sklearn API (LGBMModel, LGBMClassifier, etc.) and the default API (lgb.Dataset, lgb.cv, lgb.train) of LightGBM? Which one should I prefer?
Is it better to use lgb.cv or sklearn's GridSearchCV/RandomizedSearchCV when using LightGBM?

1 vote, 1 answer, 951 views

### How to store predicted classes matching the pre-vectorized X in Python Scikit-learn?

I would like to use name to predict gender. And not just name but name features like extracting the "last name" as a feature derived from a name. My code's flow is as such, get data into df > specify lr classifier and dv dictVectorizer > use functions to create features > perform dictVectorization >...

1 vote, 1 answer, 703 views

### SVM Classification: Confidence Interval

Is it possible to get a Z-score from sklearn's svm implementation?
So, if it classifies inputs X as [0,1,0,1,1,1,0,0,0], could you get it to output: [0.5,0.78,0.95,0.11,0.34,...], where these are the estimated confidences the learner has in its predictions?
If I implemented it myself, would I be abl...
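A hedged sketch of two per-sample confidence signals that `SVC` exposes: the raw `decision_function` score and, when `probability=True` is set, calibrated probability estimates. Neither is a Z-score in the statistical sense, and the synthetic data is an assumption of this example.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="linear", probability=True, random_state=0)
clf.fit(X, y)

# Signed score relative to the separating hyperplane; a larger magnitude
# means the sample lies further from the decision boundary.
scores = clf.decision_function(X[:5])

# Probability estimates obtained through an internal calibration step.
proba = clf.predict_proba(X[:5])
print(scores)
print(proba)
```

The probability estimates come from a separate calibration fit, so they can occasionally disagree with the labels returned by `predict`.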

1 vote, 2 answers, 2.4k views

### Exporting python sklearn models to production (java/c++)

I trained a computer vision classifier consisting of 2 components: a kernel PCA transformation of the data and an SVM binary classification model.
These models are trained in Python using sklearn, but I'd like to use them for an actual computer vision task in C++ and later possibly Java. What's the b...

1 vote, 2 answers, 2.4k views

### TypeError: can't pickle function objects (can't pickle sklearn estimator)

It only happens when using jieba.
My code:
from sklearn.feature_extraction.text import TfidfVectorizer
import jieba
data = ["十二届全国政协副秘书长黄小祥被免职撤委员资格－人事任免－时政频道－中工网", "银联持卡人境外可获紧急现金支援-财经网", "国...

1 vote, 1 answer, 1.3k views

### TfidfVectorizer - L2 normalized vector

I want to ensure that the TfidfVectorizer object is returning an L2-normalized vector. I am running a binary classification problem with documents of varied length.
I am trying to extract the normalized vectors of each corpus, so I assumed I could just sum up each row of the TfidfVectorizer matrix....
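A quick check that could confirm the normalization, sketched on made-up documents: for an L2-normalized row, the sum of squared entries is 1, whereas the plain row sum generally is not.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up documents of different lengths, for illustration only.
docs = [
    "short document",
    "a much longer document with many more words in it",
    "another example text",
]

X = TfidfVectorizer(norm="l2").fit_transform(docs)

# Sum of squared entries per row; X is sparse, so multiply() is element-wise.
squared_norms = np.asarray(X.multiply(X).sum(axis=1)).ravel()
print(squared_norms)
```

A document with no in-vocabulary terms would produce an all-zero row, which is the one case where the squared norm is 0 rather than 1.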

1 vote, 2 answers, 2.6k views

### How to Calculate F1 measure in multi-label classification?

I am working on a sentence category detection problem, where each sentence can belong to multiple categories. For example:
"It has great sushi and even better service."
True Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]]
Pred Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]]
Corr...
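A sketch of how `f1_score` might be applied to multi-label indicator matrices. The three-category labels below are hypothetical, and which `average` setting is appropriate depends on the evaluation goal.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical indicator matrices: rows are sentences, columns are categories.
y_true = np.array([[0, 1, 1], [1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0, 1, 0], [1, 0, 0], [0, 1, 1]])

micro = f1_score(y_true, y_pred, average="micro")      # pools all label decisions
macro = f1_score(y_true, y_pred, average="macro")      # unweighted mean over categories
samples = f1_score(y_true, y_pred, average="samples")  # mean of per-sentence F1
print(micro, macro, samples)
```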

0 votes, 0 answers, 5 views

### How Naive Bayes works

I have read that naive Bayes is a classification algorithm that makes predictions based on the data you give it, but in this example I just can't see how the output [3,4] came about.
Following the example:
#assigning predictor and target variables
x= np.array([[-3,7],[1,5], [1,2...

1 vote, 1 answer, 475 views

### How can I define a custom kernel function for sklearn.svm.SVC?

I am trying to make a stock prediction system in Python using scikit-learn. Here is my code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import svm,preprocessing
from s...

1 vote, 2 answers, 679 views

### Subsample size in scikit-learn RandomForestClassifier

How is it possible to control the size of the subsample used for the training of each tree in the forest?
According to the documentation of scikit-learn:
A random forest is a meta estimator that fits a number of decision
tree classifiers on various sub-samples of the dataset and use
averaging to imp...
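In scikit-learn 0.22 and later, the `max_samples` parameter is one way to control this. The sketch below assumes such a version and uses synthetic data; the 0.3 fraction is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# With bootstrap=True, a float max_samples draws that fraction of the
# training rows (with replacement) for each tree; here 0.3 * 500 = 150 rows.
forest = RandomForestClassifier(
    n_estimators=50, bootstrap=True, max_samples=0.3, random_state=0
)
forest.fit(X, y)
print(forest.score(X, y))
```

An integer `max_samples` sets the absolute number of rows per tree instead of a fraction.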

1 vote, 1 answer, 2.2k views

### Custom transformer for Scikit Learn Pipeline

I'm using the scikit-learn Pipeline object because I have a sequence of tasks to perform (upsampling, feature selection, classification). My upsampling method is a custom one, which means I have to implement a custom transformer for the pipeline.
A transformer must have a transform and fit method. Of...

1 vote, 1 answer, 888 views

### How term frequency is calculated in TfidfVectorizer?

I searched a lot to understand this but was not able to. I understand that by default TfidfVectorizer will apply l2 normalization on term frequency. This article explains the equation for it. I am using TfidfVectorizer on my text written in the Gujarati language. Following is details of output about...

1 vote, 1 answer, 5.9k views

### Python 3: NameError: name 'sklearn' is not defined

I am trying to run an Elastic Net regression but get the following error: NameError: name 'sklearn' is not defined... any help is greatly appreciated!
# ElasticNet Regression
from sklearn import linear_model
import statsmodels.api as sm
ElasticNet = sklearn.linear_model.ElasticNet() # create a las...
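The likely cause, sketched below: `from sklearn import linear_model` binds the name `linear_model` but not `sklearn`, so the class has to be reached through `linear_model`. The variable name `elastic_net` is chosen here to avoid shadowing the class name.

```python
from sklearn import linear_model

# The import above binds the name `linear_model`, not `sklearn`,
# so the class is reached through `linear_model` directly.
elastic_net = linear_model.ElasticNet()
print(type(elastic_net).__name__)
```

Alternatively, `import sklearn.linear_model` makes the fully qualified `sklearn.linear_model.ElasticNet()` spelling work.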

1 vote, 1 answer, 1.8k views

### Tensorflow DNNClassifier and scikit-learn GridSearchCV issues

I have spent a few hours trying to perform a hyperparameter optimization over a TensorFlow DNN model using GridSearchCV. The latest version of my code is the following:
import random
from tensorflow.contrib.learn.python import learn
from sklearn import datasets
from sklearn.model_selection...

1 vote, 1 answer, 3.4k views

### How to load and convert .mat file into numpy 2D array?

I have data in a .mat file (observations and features) and I want to load it into a NumPy 2D array. I don't want to convert it to CSV first and then load the CSV into NumPy.
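A sketch using `scipy.io.loadmat`, assuming the file is not in the HDF5-based v7.3 format, which `loadmat` does not read. The file is written first so the example is self-contained, and the variable name `"X"` is hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a small .mat file first so the sketch is self-contained; the
# variable name "X" is hypothetical and depends on how the file was saved.
path = os.path.join(tempfile.mkdtemp(), "data.mat")
savemat(path, {"X": np.arange(12.0).reshape(4, 3)})

contents = loadmat(path)   # dict mapping MATLAB variable names to arrays
X = contents["X"]          # 2D NumPy array: observations x features
print(X.shape)
```

Printing `contents.keys()` shows which variable names a given file actually contains.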

1 vote, 1 answer, 289 views

### Gradient Boosting with a OLS Base Learner

I've been playing with the boosting functions in Sklearn and I've noticed a key difference between sklearn.ensemble.GradientBoostingRegressor and sklearn.ensemble.AdaBoostRegressor. While the latter allows the user to specify the base learner, the former does not. Specifically, sklearn.ensemble.Gr...

1 vote, 1 answer, 520 views

### How to get KMeans's inertia_ value after using Pipeline

I want to combine StandardScaler() and KMeans() using Pipeline and also check the KMeans inertia_, because I want to find out which number of clusters is best.
The code is as follows:
ks = range(3, 5)
inertias = []
inertias_temp = 9999.0
for k in ks:
    scaler = StandardScaler()
    kmeans = KMeans(n_clu...
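A sketch of one way to read `inertia_` from a fitted pipeline: the steps remain accessible by name through `named_steps`. The step names, synthetic data, and `n_init=10` are choices made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(100, 4)  # synthetic data, for illustration only

inertias = []
for k in range(3, 5):
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("kmeans", KMeans(n_clusters=k, n_init=10, random_state=0)),
    ])
    pipeline.fit(X)
    # Fitted steps stay accessible by name, so inertia_ can be read afterwards.
    inertias.append(pipeline.named_steps["kmeans"].inertia_)

print(inertias)
```

Note that inertia usually keeps decreasing as `k` grows, so it is typically inspected for an elbow rather than minimized outright.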

1 vote, 1 answer, 2.8k views

### LSTM with CRF in Keras

I don't really understand how to combine sklearn_crfsuite and Keras.
Do I have to build a classic LSTM and, instead of the last Activation, use sklearn_crfsuite?
Does anyone have an example?
Thanks,