Questions tagged [scikit-learn]

1 vote · 2 answers · 4.6k views

DLL Load Failed: The specified module could not be found [Python]

Not sure what the issue is, but many of the classifiers no longer work on my machine. I just installed version 14.1 of scikit-learn. Could this be a path issue? Traceback (most recent call last): File "hashtag.py", line 19, in <module> from sklearn.linear_model import SGDClassifier File "C:\Anaconda...
Student
1 vote · 1 answer · 2.5k views

CV function in RidgeCV

I am working with the Ridge regression function in scikit-learn. There is a cross-validation estimator, RidgeCV. The basic (example) settings are: RidgeCV(alphas=[0.1, 1.0, 10.0], cv=None, fit_intercept=True, scoring=None, normalize=False, store_cv_values=True) Let's say I wanted to do a 10-fold CV. A...
mpg
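RidgeCV also accepts an integer for cv: cv=10 requests 10-fold cross-validation (for a continuous target this means unshuffled KFold), whereas cv=None uses the efficient leave-one-out scheme, and store_cv_values is only supported together with cv=None. The plain-Python sketch below is an illustration, not scikit-learn's code: the helper kfold_indices is hypothetical and only mimics how KFold partitions sample indices, with the first n_samples % n_splits folds receiving one extra sample.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled k-fold CV (hypothetical helper)."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


folds = list(kfold_indices(23, 10))
print([len(test) for _, test in folds])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

Each sample lands in exactly one test fold, and the model is refitted once per fold for every candidate alpha.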
1 vote · 1 answer · 1.9k views

Multiclass linear SVM in Python that returns probabilities

How can I implement a multi-class linear SVM that returns a probability matrix for the test samples? Train samples: m x n, train labels: m x c, test labels: m x c, where each column holds the probability of one class. The function in sklearn that does "one-vs-the-rest", LinearSVC, doesn't return probabilit...
Abhishek Bhatia
1 vote · 1 answer · 2k views

scikit-learn joblib: Permission error importing, run in Serial mode

The following permission error occurs when I try importing joblib from script or python -c 'import joblib': /usr/local/lib/python2.7/dist-packages/joblib//joblib_multiprocessing_helpers.py:29: UserWarning: [Errno 13] Permission denied. joblib will operate in serial mode warnings.warn('%s. joblib w...
GJacobs
1 vote · 1 answer · 1.4k views

What is the recommended way to distribute a scikit-learn classifier in Spark?

I have built a classifier using scikit-learn and now I would like to use Spark to run predict_proba on a large dataset. I currently pickle the classifier once using: import pickle pickle.dump(clf, open('classifier.pickle', 'wb')) and then in my Spark code I broadcast this pickle using sc.broadcast...
eleanora
1 vote · 1 answer · 8.2k views

Plot SVM with Matplotlib?

I have some interesting user data. It gives some information on the timeliness of certain tasks the users were asked to perform. I am trying to find out whether late, which tells me if users are on time (0), a little late (1), or quite late (2), is predictable/explainable. I generate late from a colum...
Rachel
0 votes · 1 answer · 25 views

Why is r2_score so different between train_test_split and a pipeline with cross_val_score?

I wonder why r2_score is so different between train_test_split and a pipeline evaluated with cross_val_score. I suspect it's because the model can see the unknown words through CountVectorizer() in the pipeline. But based on the concept of Pipeline, CountVectorizer() should only work on the training set split by cross_va...
biran wu
1 vote · 1 answer · 858 views

Facing ValueError: Target is multiclass but average='binary'

I'm a newbie to Python as well as machine learning. As per my requirement, I'm trying to use the Naive Bayes algorithm on my dataset. I'm able to find the accuracy, but when I try to find the precision and recall as well, it throws the following error: "choose another average setting." %...
Intrigue777
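precision_score and recall_score default to average='binary', which only applies to two-class targets; with a multiclass target scikit-learn asks for another setting such as 'macro', 'micro', 'weighted' or None (per-class scores). As a plain-Python sketch of what average='macro' computes, using hypothetical labels rather than the asker's data: per-class precision and recall, then their unweighted mean. The helper names are hypothetical.

```python
def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)}, using 0.0 when a ratio is undefined."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of the per-class precision and recall values."""
    n = len(scores)
    return (sum(p for p, _ in scores.values()) / n,
            sum(r for _, r in scores.values()) / n)


y_true = [0, 1, 2, 0, 1, 2]  # hypothetical three-class labels
y_pred = [0, 2, 1, 0, 0, 1]
print(macro_average(per_class_precision_recall(y_true, y_pred)))  # roughly (0.222, 0.333)
```

With scikit-learn installed, precision_score(y_true, y_pred, average='macro') should report the same macro precision for these labels.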
1 vote · 3 answers · 47 views

In Sklearn, is there a clean way to transform a list of dicts?

I have a list of dicts that I want to scale. To use sklearn scalers, I need to turn the dicts into lists. Then, I will turn the lists back into dicts. This is what I'm doing: keys = sorted(X[0].keys()) scaler = RobustScaler() transformed = scaler.fit_transform([[x[k] for k in keys] for x in X]) X =...
Leo Jiang
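scikit-learn's DictVectorizer is the built-in way to turn a list of dicts into a feature matrix. For the round trip described in the question, the key point is to fix one column order and reuse it on the way back. The plain-Python sketch below uses a hypothetical helper, scale_dicts, that does exactly that; it swaps in z-score standardization with the population standard deviation instead of RobustScaler, so its numbers differ from RobustScaler's median/IQR scaling.

```python
import statistics


def scale_dicts(records):
    """Standardize every key of a list of flat numeric dicts (hypothetical helper)."""
    keys = sorted(records[0])  # one fixed column order for the whole round trip
    stats = {}
    for k in keys:
        column = [r[k] for r in records]
        stats[k] = (statistics.fmean(column), statistics.pstdev(column))
    return [
        # constant columns map to 0.0 instead of dividing by zero
        {k: (r[k] - stats[k][0]) / stats[k][1] if stats[k][1] else 0.0 for k in keys}
        for r in records
    ]


X = [{"a": 1.0, "b": 10.0}, {"a": 3.0, "b": 30.0}]  # hypothetical records
print(scale_dicts(X))  # [{'a': -1.0, 'b': -1.0}, {'a': 1.0, 'b': 1.0}]
```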
0 votes · 0 answers · 5 views

Get topic names for each document

I am trying to do topic modelling on documents using the example at this link: https://www.w3cschool.cn/doc_scikit_learn/scikit_learn-auto_examples-applications-topics_extraction_with_nmf_lda.html My question: how can I know which documents correspond to which topic? So far this is what I have don...
Usman Rafiq
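In scikit-learn, calling transform on a fitted LatentDirichletAllocation or NMF model returns a document-topic matrix with one row per document and one column per topic. Taking the largest entry in each row is one simple way to give each document a dominant topic. The sketch below uses a hypothetical matrix and hypothetical topic names in place of real model output; with NumPy, doc_topic.argmax(axis=1) does the same per-row lookup.

```python
# Hypothetical document-topic weights, standing in for lda.transform(X) or nmf.transform(X).
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.30, 0.60, 0.10],
]
topic_names = ["sports", "politics", "science"]  # hypothetical labels, e.g. built from each topic's top words


def dominant_topic(row):
    """Return the column index of the largest weight in one document's row."""
    return max(range(len(row)), key=row.__getitem__)


assignments = [topic_names[dominant_topic(row)] for row in doc_topic]
print(assignments)  # ['sports', 'science', 'politics']
```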
6 votes · 1 answer · 152 views

Training hyperparameters for multidimensional Gaussian process regression

Here is a simple working implementation of code where I use Gaussian process regression (GPR) in Python's scikit-learn with 2-dimensional inputs (i.e., a grid over x1 and x2) and 1-dimensional outputs (y). import numpy as np from matplotlib import pyplot as plt from sklearn.gaussian_process import G...
Mathews24
-1 votes · 0 answers · 16 views

Python and C# Interprocess

My project is using machine learning to classify an image. I'm sending a string (containing the path of the image) from C# to Python and I'm expecting the label of the image. I'm using the Anaconda 3 distribution, but the problem is that some modules are not found. The error is: "DLL load failed: T...
Dascalu Cosmin
1 vote · 3 answers · 110 views

scikit-learn & statsmodels - which R-squared is correct?

I'd like to choose the best algorithm for future use. I found some solutions, but I didn't understand which R-squared value is correct. For this, I split my data into test and training sets, and I printed two different R-squared values below. import statsmodels.api as sm from sklearn.linear_mo...
Mert Yanık
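Two common reasons the numbers differ, offered as possibilities rather than a diagnosis of this question's code: statsmodels' rsquared is reported on the data the model was fitted on, while r2_score is usually applied to a held-out test set; and when the statsmodels OLS design matrix has no constant column (no sm.add_constant), statsmodels reports an uncentered R-squared. The plain-Python sketch below, with hypothetical numbers, shows how far the centered and uncentered formulas can drift for the same predictions.

```python
def r2_centered(y, y_hat):
    """1 - SS_res / SS_tot, with SS_tot measured around the mean (the r2_score definition)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def r2_uncentered(y, y_hat):
    """1 - SS_res / sum(y**2): the variant statsmodels reports when the model has no constant."""
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1 - ss_res / sum(a ** 2 for a in y)


y = [10.0, 12.0, 14.0, 16.0]      # hypothetical targets
y_hat = [11.0, 11.0, 15.0, 15.0]  # hypothetical predictions
print(round(r2_centered(y, y_hat), 3))    # 0.8
print(round(r2_uncentered(y, y_hat), 3))  # 0.994
```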
1 vote · 2 answers · 695 views

Do I need to scale the test data and the dependent variable in the training data?

I am new to the concept of feature scaling in machine learning. I read that scaling is useful when one feature's range is very large compared to the other features. But if I choose to scale the training data, then: Can I just scale the one feature that has a high range? If I scale the entire X of...
learncode
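The usual practice is to learn the scaling statistics from the training data only and then apply those same statistics to the test data, which is what calling fit (or fit_transform) on the training set and transform on the test set does with StandardScaler. The class below is a hypothetical plain-Python stand-in for that fit/transform split, using the population standard deviation as StandardScaler does; it is a sketch, not scikit-learn's implementation.

```python
import statistics


class ColumnStandardizer:
    """Hypothetical stand-in for StandardScaler's fit/transform split."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mean_ = [statistics.fmean(c) for c in columns]
        # Population std (ddof=0); constant columns get scale 1 to avoid division by zero.
        self.scale_ = [statistics.pstdev(c) or 1.0 for c in columns]
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(row, self.mean_, self.scale_)] for row in rows]


X_train = [[1.0, 100.0], [3.0, 300.0]]  # hypothetical data
X_test = [[2.0, 500.0]]

scaler = ColumnStandardizer().fit(X_train)  # statistics come from the training data only
print(scaler.transform(X_train))  # [[-1.0, -1.0], [1.0, 1.0]]
print(scaler.transform(X_test))   # [[0.0, 3.0]]
```

The test row is scaled with the training mean and standard deviation, so values outside the training range (like 500.0 here) simply land further from zero.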
0 votes · 0 answers · 6 views

Optimizing customized loss function with some parameters in sklearn

I have a dataset with features x and labels y. I also have some prior knowledge about the functional form of the function, with some unknown parameters. I want to infer the parameters (based on an OLS solution). I googled a lot but didn't find a solution in the sklearn library. Actually, I found another solut...
Delsilon
1 vote · 2 answers · 3.5k views

ValueError: Expected 2D array, got 1D array instead:

While practicing a simple linear regression model I got this error; I think there is something wrong with my data set. Here are my data set, the independent variable X, the dependent variable Y, X_train and Y_train: [screenshots not included] This is the error body: ValueError: Expected 2D array, got 1D array ins...
danyialKhan
1 vote · 1 answer · 290 views

Expected 2-D array, got 1-D array instead

from sklearn.preprocessing import MinMaxScaler, StandardScaler import numpy as np a = ([1,2,3],[4,5,6]) stan = StandardScaler() mima = MinMaxScaler() stan.fit_transform(a) mima.fit_transform(a) Results after running stan and mima: array([[-1., -1., -1.], [ 1., 1., 1.]]) array([[0., 0., 0.], [1., 1., 1.]]) However,...
vivek
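scikit-learn transformers treat each row as a sample and each column as a feature and compute their statistics per column, which is why a = ([1,2,3],[4,5,6]) becomes [[-1,-1,-1],[1,1,1]] under StandardScaler and [[0,0,0],[1,1,1]] under MinMaxScaler. A 1-D input does not say whether it is one feature or one sample, so scikit-learn asks for an explicit reshape: array.reshape(-1, 1) for a single feature or array.reshape(1, -1) for a single sample. The plain-Python sketch below (not scikit-learn's code; helper names are hypothetical) reproduces the column-wise results shown in the question.

```python
import statistics


def standardize_columns(rows):
    """Z-score each column with its own mean and population std (assumes no constant columns)."""
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def min_max_columns(rows):
    """Rescale each column to [0, 1] using its own min and max (assumes no constant columns)."""
    bounds = [(min(c), max(c)) for c in zip(*rows)]
    return [[(v - lo) / (hi - lo) for v, (lo, hi) in zip(row, bounds)] for row in rows]


a = [[1, 2, 3], [4, 5, 6]]  # 2 samples, 3 features
print(standardize_columns(a))  # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
print(min_max_columns(a))      # [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```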
1 vote · 1 answer · 40 views

PCA computed by GPflow and Sklearn doesn't match

I am performing PCA analysis by using Sklearn and GPflow. I noticed that the output returned by both the libraries doesn't match. Please see below the sample code snippet- import numpy as np from gpflow.models import PCA_reduce from sklearn.decomposition import PCA X = np.random.random((100, 10)) fo...
Ravi Joshi
1 vote · 2 answers · 46 views

Input array for fitting method

This code returns the expected results. But there are 2 pandas methods involved. Can I use only 1 method or remove pandas from fit_transform? from sklearn.preprocessing import MinMaxScaler scaler = MinMaxScaler() data = [-1, 2,1, 18] scaler.fit_transform(pd.DataFrame(pd.Series(data))) array([[0....
shantanuo
4 votes · 1 answer · 35 views

pandas.get_dummies returns two columns (_Y and _N) instead of one

I am trying to use sklearn to train a decision tree on my dataset. When I was slicing the data into (outcome: Y, and predicting variables: X), it turned out that the outcome (my label) is in True/False: #data slicing X = df.values[:,3:27] #X are the sets of predicting variable, dropping...
WY G
0 votes · 0 answers · 13 views

Invalid version of numpy, scipy or scikit-learn

On an Ubuntu 16.04 image in a Docker container, I try to install: FROM ubuntu:16.04 MAINTAINER Amazon AI RUN apt-get -y update && apt-get install -y --no-install-recommends \ wget \ python3.5 \ nginx \ libgcc-5-dev \ ca-certificates \ && rm -rf /var/lib/apt/lists/* # Here we get all python packag...
Nasri
1 vote · 0 answers · 19 views

How to deal with FutureWarning: Int64Index.flags is deprecated and will be removed in a future version?

I'm doing a grid search, passing custom cross-validation folds as a list of indices, and getting this warning. Why am I getting this warning message, and how can I avoid it? My indexes are of type 'int64'. Note: I don't want to simply suppress the warning. I need a solution so that without a...
Franco Piccolo
1 vote · 0 answers · 5 views

LightGBM - sklearn API vs training and data structure API and lgb.cv vs GridSearchCV/RandomizedSearchCV

What are the differences between the sklearn API (LGBMModel, LGBMClassifier etc.) and the default API (lgb.Dataset, lgb.cv, lgb.train) of LightGBM? Which one should I prefer? Is it better to use lgb.cv or sklearn's GridSearchCV/RandomizedSearchCV when using LightGBM?
Sift
1 vote · 1 answer · 951 views

How to store predicted classes matching the pre-vectorized X in Python Scikit-learn?

I would like to use names to predict gender, and not just the name itself but also name features, such as the "last name" extracted from a name. My code's flow is as follows: get data into df > specify lr classifier and dv dictVectorizer > use functions to create features > perform dictVectorization >...
KubiK888
1 vote · 1 answer · 703 views

SVM Classification: Confidence Interval

Is it possible to get a Z-score from sklearn's svm implementation? So, if it classifies inputs X as [0,1,0,1,1,1,0,0,0], could you get it to output: [0.5,0.78,0.95,0.11,0.34,...], where these are the estimated confidences the learner has in its predictions? If I implemented it myself, would I be abl...
bordeo
1 vote · 2 answers · 2.4k views

Exporting python sklearn models to production (java/c++)

I trained a computer vision classifier consisting of 2 components: a kernel PCA transformation of the data and an SVM binary classification model. These models are trained in Python using sklearn, but I'd like to use them for an actual computer vision task in C++ and later possibly Java. What's the b...
Sander
1 vote · 2 answers · 2.4k views

TypeError: can't pickle function objects (can't pickle sklearn estimator)

It only happened when using jieba. My code: from sklearn.feature_extraction.text import TfidfVectorizer import jieba data = ["十二届全国政协副秘书长黄小祥被免职撤委员资格-人事任免-时政频道-中工网", "银联持卡人境外可获紧急现金支援-财经网", "国...
Mithril
1 vote · 1 answer · 1.3k views

TfidfVectorizer - L2 normalized vector

I want to ensure that the TfidfVectorizer object is returning an L2-normalized vector. I am running a binary classification problem with documents of varied length. I am trying to extract the normalized vectors of each corpus, so I assumed I could just sum up each row of the TfidfVectorizer matrix....
OAK
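With the default norm='l2', each row of the TfidfVectorizer output has unit Euclidean length, so it is the sum of the squared entries in a row that equals 1, not the plain row sum. On the sparse matrix returned by fit_transform, squaring the entries (for example with the matrix's .power(2) method) and summing each row should therefore give values close to 1 for every document that contains at least one vocabulary term. The plain-Python sketch below shows the difference on a hypothetical two-element vector; the helper name is hypothetical.

```python
import math


def l2_normalize(row):
    """Scale a vector to unit Euclidean length, as norm='l2' does to each TfidfVectorizer row."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm else list(row)


row = l2_normalize([3.0, 4.0])  # hypothetical un-normalized tf-idf weights
print(row)                      # [0.6, 0.8]
print(sum(row))                 # about 1.4: the plain sum is not 1
print(sum(v * v for v in row))  # about 1.0: the squared entries sum to 1
```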
1 vote · 2 answers · 2.6k views

How to calculate the F1 measure in multi-label classification?

I am working on a sentence category detection problem, where each sentence can belong to multiple categories. For example: "It has great sushi and even better service." True Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]] Pred Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]] Corr...
Noman Dilawar
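For multi-label indicator targets like the True/Pred rows in the question, scikit-learn's f1_score accepts average='micro', 'macro', 'weighted' or 'samples'. As a plain-Python sketch with hypothetical helpers and hypothetical three-label data (not the question's output): micro-averaging pools true positives, false positives and false negatives over all labels before computing F1, while macro-averaging computes F1 per label and takes the unweighted mean.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined (a choice made for this sketch)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary indicator rows (hypothetical helper)."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))
    pooled = [sum(c[i] for c in counts) for i in range(3)]  # total TP, FP, FN over all labels
    micro = f1_from_counts(*pooled)
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    return micro, macro


# Hypothetical data: 3 sentences, 3 categories.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
micro, macro = multilabel_f1(y_true, y_pred)
print(round(micro, 4), round(macro, 4))  # 0.6667 0.5556
```

Micro-averaging is dominated by frequent labels, while macro-averaging gives every label equal weight, which is why the two can diverge noticeably.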
0 votes · 0 answers · 5 views

How does Naive Bayes work?

I have already read that Naive Bayes is a classification algorithm that can make predictions based on the data you give it, but in this example I just can't see how the output [3,4] came about. Here is the example: #assigning predictor and target variables x= np.array([[-3,7],[1,5], [1,2...
Mizlul
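If the example uses scikit-learn's GaussianNB (the truncated excerpt does not show which variant), prediction works roughly like this: for each class it stores a prior and a per-feature mean and variance, scores a new point by the log prior plus the sum of per-feature Gaussian log-densities (the "naive" independence assumption), and predicts the class with the highest score. The plain-Python sketch below uses hypothetical data and helper names, and a small fixed variance floor instead of GaussianNB's var_smoothing rule, so it illustrates the idea rather than reproducing scikit-learn exactly.

```python
import math
import statistics


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Store (log prior, per-feature means, per-feature variances) for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        model[label] = (
            math.log(len(rows) / len(X)),
            [statistics.fmean(c) for c in columns],
            [statistics.pvariance(c) + var_floor for c in columns],
        )
    return model


def predict(model, x):
    """Return the class with the largest log prior + sum of Gaussian log-densities."""
    def log_joint(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=log_joint)


# Hypothetical training data: two well-separated classes.
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.5, 0.5]), predict(model, [5.5, 5.5]))  # a b
```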
1 vote · 1 answer · 475 views

How can I define a custom kernel function for sklearn.svm.SVC?

I am trying to make a stock prediction system in Python using scikit-learn. Here is my code: import numpy as np import pandas as pd from sklearn.preprocessing import StandardScaler from sklearn.metrics import accuracy_score import matplotlib.pyplot as plt from sklearn import svm,preprocessing from s...
tannishk
1 vote · 2 answers · 679 views

Subsample size in scikit-learn RandomForestClassifier

How is it possible to control the size of the subsample used for the training of each tree in the forest? According to the documentation of scikit-learn: A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and use averaging to imp...
user6903745
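By default (bootstrap=True) each tree in RandomForestClassifier is trained on a bootstrap sample drawn with replacement that has as many rows as the training set, so on average only about 63% of the distinct training rows appear in any one tree's sample. scikit-learn 0.22 and later add a max_samples parameter to shrink that sample, given either as a count or as a fraction of the training set. The plain-Python sketch below only mimics the sampling step; the helper name bootstrap_indices is hypothetical.

```python
import random


def bootstrap_indices(n_samples, max_samples=None, seed=0):
    """Draw training-row indices with replacement for one tree (hypothetical helper)."""
    rng = random.Random(seed)
    if max_samples is None:
        size = n_samples                               # default: same size as the data
    elif isinstance(max_samples, float):
        size = max(round(n_samples * max_samples), 1)  # fraction of the training set
    else:
        size = max_samples                             # absolute number of rows
    return [rng.randrange(n_samples) for _ in range(size)]


full = bootstrap_indices(1000)
print(len(full), len(set(full)) / 1000)   # 1000 rows, about 0.63 of them distinct
print(len(bootstrap_indices(1000, 0.3)))  # 300
```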
1 vote · 1 answer · 2.2k views

Custom transformer for Scikit Learn Pipeline

I'm using the scikit-learn Pipeline object because I have a sequence of tasks to perform (upsampling, feature selection, classification). My upsampling method is a custom one, which means I have to implement a custom transformer for the pipeline. A transformer must have a transform and a fit method. Of...
machinery
1 vote · 1 answer · 888 views

How is term frequency calculated in TfidfVectorizer?

I searched a lot to understand this, but I am not able to. I understand that by default TfidfVectorizer applies L2 normalization to the term frequency. This article explains the equation for it. I am using TfidfVectorizer on text written in the Gujarati language. Following are the details of the output about...
Himadri
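According to the scikit-learn documentation, with the defaults (use_idf=True, smooth_idf=True, sublinear_tf=False, norm='l2') the term frequency in TfidfVectorizer is simply the raw count of the term in the document, not divided by document length. Each count is multiplied by idf(t) = ln((1 + n) / (1 + df(t))) + 1, and only then is the whole row L2-normalized. The plain-Python sketch below follows those formulas on a hypothetical two-document corpus (the helper name is hypothetical). It splits on whitespace for simplicity, whereas TfidfVectorizer's default token_pattern only keeps tokens of two or more word characters, so real tokenization can differ.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Raw-count tf times smoothed idf, then L2-normalize each row (assumes no empty documents)."""
    tokenized = [doc.split() for doc in docs]  # whitespace tokens: a simplification
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = {t: sum(t in toks for toks in tokenized) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # tf = raw count, no division by document length
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row])
    return vocab, rows


vocab, rows = tfidf_rows(["apple banana banana", "apple cherry"])
print(vocab)  # ['apple', 'banana', 'cherry']
print([round(v, 4) for v in rows[0]])
```

Because "banana" appears twice in the first document and only in that document, its weight is its count (2) times a larger idf than "apple", which appears in both documents and gets idf 1.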
1 vote · 1 answer · 5.9k views

Python 3: NameError: name 'sklearn' is not defined

I am trying to run an Elastic Net regression but get the following error: NameError: name 'sklearn' is not defined... any help is greatly appreciated! # ElasticNet Regression from sklearn import linear_model import statsmodels.api as sm ElasticNet = sklearn.linear_model.ElasticNet() # create a las...
PineNuts0
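The NameError comes from Python's import rules rather than from scikit-learn: from sklearn import linear_model binds only the name linear_model, so the name sklearn is never defined in that module. Either call linear_model.ElasticNet(), or write import sklearn.linear_model and keep the fully qualified sklearn.linear_model.ElasticNet(). The same behaviour can be shown with the standard library alone, using statistics in place of sklearn:

```python
from statistics import mean  # binds the name "mean" only; "statistics" itself is not bound here

try:
    statistics.mean([1, 2, 3])  # fails the same way as sklearn.linear_model.ElasticNet()
    qualified_name_worked = True
except NameError as exc:
    qualified_name_worked = False
    print(exc)  # name 'statistics' is not defined

print(mean([1, 2, 3]))  # the imported name itself works: 2
```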
1 vote · 1 answer · 1.8k views

Tensorflow DNNClassifier and scikit-learn GridSearchCV issues

For a few hours now I have been trying to perform hyperparameter optimization over a TensorFlow DNN model using GridSearchCV. The latest version of my code is the following: import random from tensorflow.contrib.learn.python import learn from sklearn import datasets from sklearn.model_selection...
Nicola Miotto
1 vote · 1 answer · 3.4k views

How to load and convert .mat file into numpy 2D array?

I have data in a .mat file (observations and features) and I want to load it into a numpy 2D array. I don't want to convert it to CSV first and then load the CSV into numpy.
1 vote · 1 answer · 289 views

Gradient Boosting with a OLS Base Learner

I've been playing with the boosting functions in sklearn and I've noticed a key difference between sklearn.ensemble.GradientBoostingRegressor and sklearn.ensemble.AdaBoostRegressor. While the latter allows the user to specify the base learner, the former does not. Specifically, sklearn.ensemble.Gr...
Jacob H
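scikit-learn's GradientBoostingRegressor fits regression trees at every boosting stage (its init parameter only changes the initial prediction). A useful observation when considering an OLS base learner: with squared-error loss each stage fits the current residuals, and a sum of linear models is itself linear, so boosting an OLS learner just walks toward the ordinary least-squares fit. The plain-Python sketch below illustrates that with hypothetical helpers and data (one feature, shrinkage via learning_rate); it is not how scikit-learn implements boosting.

```python
def fit_line(x, r):
    """Ordinary least squares for r ~ a + b*x with a single feature."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    return mr - b * mx, b


def boost_linear(x, y, n_stages=100, learning_rate=0.1):
    """Least-squares boosting with a linear base learner; returns the summed (intercept, slope)."""
    a_total, b_total = sum(y) / len(y), 0.0  # stage 0: predict the mean
    for _ in range(n_stages):
        # For squared loss, the negative gradient is just the residual.
        residuals = [yi - (a_total + b_total * xi) for xi, yi in zip(x, y)]
        a, b = fit_line(x, residuals)
        a_total += learning_rate * a
        b_total += learning_rate * b
    return a_total, b_total


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.1, 4.9, 7.2, 8.8]  # hypothetical data
print(boost_linear(x, y))      # very close to the one-shot OLS fit below
print(fit_line(x, y))
```

Each stage closes a fixed fraction (learning_rate) of the remaining gap to the OLS coefficients, so the boosted model adds nothing beyond a single linear regression.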
1 vote · 1 answer · 520 views

How to get KMeans's inertia_ value after using Pipeline

I want to combine StandardScaler() and KMeans() using Pipeline and also check KMeans's inertia_, because I want to check which number of clusters is best. The code is as follows: ks = range(3, 5) inertias = [] inertias_temp = 9999.0 for k in ks: scaler = StandardScaler() kmeans = KMeans(n_clu...
Mei-Chih Chang
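Inside a Pipeline the fitted KMeans object is reachable through named_steps. With make_pipeline(StandardScaler(), KMeans(n_clusters=k)) the step names default to the lowercased class names, so pipeline.named_steps['kmeans'].inertia_ should give the value after fitting; with an explicitly built Pipeline, use whatever name was given to the KMeans step. Inertia is the sum of squared distances from each sample to its closest cluster center, measured in the space KMeans sees, i.e. after StandardScaler. The plain-Python sketch below computes it for hypothetical points and centers.

```python
def inertia(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers)
        for point in points
    )


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]  # hypothetical (already scaled) data
centers = [[0.0, 1.0], [10.0, 1.0]]                          # hypothetical fitted centers
print(inertia(points, centers))  # 4.0
```

Inertia always shrinks as k grows, so comparing raw values across k (as in the question's loop) is usually paired with an "elbow" judgment rather than picking the minimum.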
1 vote · 1 answer · 2.8k views

LSTM with CRF in Keras

I don't really understand how to combine sklearn_crfsuite and Keras. Do I have to make a classic LSTM and, instead of the last activation, use sklearn_crfsuite? Does anyone have an example? Thanks,
Williamben
