# Questions tagged [classification]

2637 questions

1

votes

1

answer

525

Views

### Trading in precision for better recall in Keras classification neural net

There's always a tradeoff between precision and recall. I'm dealing with a multi-class problem, where for some classes I have perfect precision but really low recall.
Since for my problem false positives are less of an issue than missing true positives, I want to reduce precision in favor of increasin...
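One common way to trade precision for recall is to lower the decision threshold for the class whose recall is too low. The following is only a minimal sketch in plain Python, treating one class as "positive" (one-vs-rest); the labels, scores, and the `precision_recall` helper are hypothetical and not taken from the question.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall for one class when predicting positive at score >= threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for p, t in zip(preds, y_true) if p and t)
    fp = sum(1 for p, t in zip(preds, y_true) if p and not t)
    fn = sum(1 for p, t in zip(preds, y_true) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels (1 = the class of interest) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.3, 0.35, 0.2, 0.1, 0.05]

strict = precision_recall(y_true, scores, 0.5)  # high precision, low recall
loose = precision_recall(y_true, scores, 0.3)   # lower precision, higher recall
print(strict, loose)
```

With these made-up numbers, moving the threshold from 0.5 to 0.3 recovers the missed positives at the cost of one false positive.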

1

votes

2

answer

269

Views

### Correct implementation of weighted K-Nearest Neighbors

From what I understood, the classical KNN algorithm works like this (for discrete data):
Let x be the point you want to classify
Let dist(a,b) be the Euclidean distance between points a and b
Iterate through the training set points pᵢ, taking the distances dist(pᵢ,x)
Classify x as the most frequ...
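The steps described above can be sketched as follows. This is a minimal illustration, not the asker's code: the `knn_classify` helper and the sample points are hypothetical, and inverse-distance weighting is assumed as the weighting scheme (one common choice among several).

```python
import math
from collections import Counter, defaultdict


def knn_classify(train, x, k=3, weighted=False):
    """Classify x from the k nearest (point, label) pairs in train."""
    neighbours = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    if not weighted:
        # Classical KNN: majority vote among the k nearest neighbours.
        return Counter(label for _, label in neighbours).most_common(1)[0][0]
    # Weighted KNN (assumed scheme): each neighbour votes with weight 1/distance.
    votes = defaultdict(float)
    for point, label in neighbours:
        votes[label] += 1.0 / (math.dist(point, x) + 1e-9)
    return max(votes, key=votes.get)


# Hypothetical training data: one very close "b" point and two farther "a" points.
train = [((1.0, 1.0), "b"), ((0.0, 0.0), "a"), ((0.0, 2.0), "a"), ((5.0, 5.0), "b")]
x = (1.1, 1.0)

plain_vote = knn_classify(train, x, k=3)
weighted_vote = knn_classify(train, x, k=3, weighted=True)
print(plain_vote, weighted_vote)
```

In this example the plain majority vote picks "a" (two of the three neighbours), while the weighted vote picks "b" because the single "b" neighbour is much closer.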

-2

votes

0

answer

19

Views

### How many neurons are in the first input layer: model.add(Conv2D(64, kernel_size=(3, 3), input_shape=(200,200,3)))

1) How many neurons are in the input layer? I'm giving an input image size of 200*200.
2) I guess the number of neurons in the input layer should be the number of features (pixels) of the input image (in this case 200*200).
3) What if there are more neurons in the input layer than the feature...
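For a convolutional layer it is often more useful to look at the output shape and the number of trainable weights than at a "neuron count". The sketch below works these out for the layer in the title, assuming the Keras defaults of `padding='valid'`, stride 1, and a bias term; the `conv2d_summary` helper is hypothetical.

```python
def conv2d_summary(input_shape, filters, kernel_size, stride=1):
    """Output shape and parameter count of a 2D convolution with 'valid' padding."""
    h, w, channels = input_shape
    kh, kw = kernel_size
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    # Each filter has kh * kw * channels weights plus one bias.
    params = kh * kw * channels * filters + filters
    return (out_h, out_w, filters), params


shape, params = conv2d_summary((200, 200, 3), filters=64, kernel_size=(3, 3))
print(shape, params)
```

Under these assumptions the input holds 200 * 200 * 3 values, the output is 198 x 198 x 64, and the layer has 1792 parameters.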

0

votes

0

answer

4

Views

### Classification: Target with more than 2 classes

I am doing a classification exercise and facing a target with more than 2 categorical classes. I have encoded those classes using the LabelEncoder.
The only problem is, I believe I might have to use one-hot encoding afterwards, as I no longer have only 0 and 1 but 0, 1, 2, 3.
The reality is, I just do...
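To make the difference between the two encodings concrete, here is a small plain-Python sketch. The `label_encode` and `one_hot` helpers and the sample labels are hypothetical stand-ins for illustration, not the scikit-learn classes themselves. As a general note, scikit-learn classifiers accept integer-encoded multi-class targets directly, whereas one-hot targets are typically needed for things like a softmax output trained with categorical cross-entropy.

```python
def label_encode(labels):
    """Map each distinct class to an integer 0..n_classes-1 (sorted order)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes


def one_hot(encoded, n_classes):
    """Turn integer class ids into one-hot rows."""
    return [[1 if i == e else 0 for i in range(n_classes)] for e in encoded]


# Hypothetical target column with four classes.
y = ["cat", "dog", "bird", "fish", "dog"]
encoded, classes = label_encode(y)
onehot = one_hot(encoded, len(classes))
print(encoded)
print(onehot)
```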

1

votes

3

answer

50

Views

### How can I solve a classification problem with a dependent variable with more than two values

I have a simple NLP problem, where I have some written reviews that have a simple binary positive or negative judgement. In this case I am able to train and test as independent variables the columns of X that contain the 'bags of words', namely the single words in a sparse matrix.
from sklearn.feat...

1

votes

1

answer

126

Views

### Machine learning algorithm score changes without any change in data or step

I am new to machine learning and am getting started with the Titanic problem on Kaggle. I have written a simple algorithm to predict the result on test data.
My question/confusion is: every time I execute the algorithm with the same dataset and the same steps, the score value changes (last statement in th...
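A frequent cause of this kind of run-to-run variation is unseeded randomness, for example in the train/test split or in the model itself. The sketch below illustrates the idea with a hypothetical `train_test_split_indices` helper built on the standard library; it is not the asker's code, and whether this is the actual cause in their script is an assumption.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=None):
    """Shuffle indices 0..n-1 and split them; a fixed seed makes the split repeatable."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(n * test_fraction)
    return idx[n_test:], idx[:n_test]


# Same seed: identical split on every run, so the score is comparable across runs.
train_a, test_a = train_test_split_indices(100, seed=42)
train_b, test_b = train_test_split_indices(100, seed=42)
print(test_a == test_b)
```

In scikit-learn the analogous control is the `random_state` parameter accepted by `train_test_split` and by many estimators.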

0

votes

1

answer

13

Views

### Text Classification Approach

I have data with 2 important columns, Product Name and Product Category. I wanted to classify a search term into a category. The approach (in Python using Sklearn & DaskML) to create a classifier was:
Clean Product Name column for stopwords, numbers, etc.
Create 90% 10% train-test split
Convert text...

1

votes

2

answer

46

Views

### How to perform SMOTE with cross validation in sklearn in python

I have a highly imbalanced dataset and would like to perform SMOTE to balance the dataset and perform cross validation to measure the accuracy. However, most of the existing tutorials make use of only a single training and testing iteration to perform SMOTE.
Therefore, I would like to know the correct...
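The key point when combining resampling with cross validation is that resampling should be applied to the training part of each fold only, so the held-out part keeps the original class distribution. The sketch below shows that structure in plain Python. Note the substitution: it uses simple random oversampling as a stand-in for SMOTE, and the helper names and toy data are hypothetical.

```python
import random


def stratified_folds(rows, k, rng):
    """Split (features, label) rows into k folds, keeping class ratios per fold."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    folds = [[] for _ in range(k)]
    for group in by_class.values():
        rng.shuffle(group)
        for j, row in enumerate(group):
            folds[j % k].append(row)
    return folds


def oversample(rows, rng):
    """Random oversampling (stand-in for SMOTE): duplicate minority rows until balanced."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def resampled_cv(rows, k=5, seed=0):
    """Yield (train, test) pairs where only the training part is resampled."""
    rng = random.Random(seed)
    folds = stratified_folds(rows, k, rng)
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield oversample(train, rng), folds[i]


# Hypothetical imbalanced data: 90 rows of class 0 and 10 rows of class 1.
rows = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(90, 100)]
splits = list(resampled_cv(rows, k=5))
```

With imbalanced-learn, the usual way to get the same behaviour is to put the sampler and the classifier into that library's own `Pipeline` and pass the pipeline to the cross-validation routine.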

0

votes

0

answer

4

Views

### Training The Binary Image classifier (cats/dogs) with one more class

I am using the following code:
https://github.com/llSourcell/how_to_make_an_image_classifier/blob/master/demo.ipynb
Where he does a simple implementation of cats vs dogs. I have used my own imageset which consists of just 240 training images and 50 test samples and I have tried this same model. The...

1

votes

0

answer

406

Views

### Feature selection for large dataset in python

I have a document-term matrix of dimension 3144469 x 268496 for which I need to do feature selection. I tried doing it with the feature selection module of scikit-learn using this code:
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=40)
documenttermmatrix_train= fs.fit_transform(documentter...

1

votes

0

answer

43

Views

### Neural Network Classification with several input of different type

I've started to study neural networks and now I'm learning to use them to classify objects.
But I have one doubt:
How should I represent the input array if the inputs have different types (e.g. number and string)?
For example if I have to classify an apartment (array with the same type (all int)),...

1

votes

1

answer

106

Views

### When neural network loss is descending but the accuracy is not increased?

I implemented a batch-based back-propagation algorithm for a neural network with one hidden layer and a sigmoid activation function. The output layer is a one-hot sigmoid layer. The net input of the first layer is z1. After applying the sigmoid it becomes a1. Similarly, we have z2 and a2 for the second layer.
The back-pr...

1

votes

1

answer

713

Views

### Multiclass classification of text in R

I have built a random forest for multiclass text classification. The model returned an accuracy of 75%. There are 6 labels; however, out of the 6 classes, only 3 are classified and the rest are not. I would really appreciate it if anyone could let me know what went wrong.
Below are the steps I f...

1

votes

1

answer

122

Views

### Analyze Data Set on WEKA

I'm new to WEKA and would like to ask if anyone can help me understand whether I'm using WEKA correctly.
1) I have a data set of 11377 records classified as follows:
11111 records have class YES
266 records have class NO
(For some reason, I can use only the J48 algorithm for classification)
When I sele...

1

votes

0

answer

72

Views

### How TF-IDF handles missing values?

I am working on a classification problem in which I have to classify product category based on the information of the product like title, description and other attributes.
It is working for different categories but getting biased in closed categories like mobile and mobile accessories.
Let's say I h...

1

votes

0

answer

76

Views

### The implementation of lazy multi-label classifiers in Mulan

I want to use k-nearest neighbors for multi-label classification. There are some kNN-based classifiers that are implemented in the Mulan library, or are written in C or Matlab, such as MLKNN.
When I use the same classifier on a numeric dataset I get identical results,
but for a nominal dataset such as sla...

1

votes

0

answer

514

Views

### Is passing sklearn tfidf matrix to train MultinomialNB model proper?

I'm doing some text classification tasks. What I have observed is that when fed a tfidf matrix (from sklearn's TfidfVectorizer), the Logistic Regression model always outperforms the MultinomialNB model. Below is my code for training both:
X = df_new['text_content']
y = df_new['label']
X_train, X_test, y_train,...

1

votes

1

answer

818

Views

### How do you do ROI-Pooling on Areas smaller than the target size?

I am currently trying to get the Faster R-CNN network from here to work in windows with tensorflow. For that, I wanted to re-implement the ROI-Pooling layer, since it is not working in windows (at least not for me. If you got any tips on porting to windows with tensorflow, I would highly appreciate...

1

votes

0

answer

552

Views

### Text Categorization by using mlr package in R

I need to train a model which would perform multilabel multiclass categorization on text data.
Currently, I'm using the mlr package in R. Unfortunately, I couldn't proceed further because of an error I got before training a model.
More specifically I'm stuck in this place:
classify.task = makeMultilabe...

1

votes

2

answer

1.1k

Views

### Keras Dense Net Overfitting

I am attempting to use keras to build an activity classifier from accelerometer signals. However, I am experiencing extreme overfitting of the data even with the most simplistic of models.
The input data is of shape (10,3) and contains roughly .1 second of data from the accelerometer in 3 dimension...

1

votes

0

answer

104

Views

### Python classification technique naive bayes

I am doing research on classification techniques. I found code online for Naive Bayes classification in Python, which I have shared below, but I am getting errors in it. Please help in solving the errors. The software I am using is Anaconda with Python 3.6.
The code is as follows:
impo...

1

votes

0

answer

74

Views

### Suggestions for block sizes in dask for my matrix (doing onehot and classification)

I am new to using dask, although I have experience in parallel computing and other libraries. I was wondering if someone had good suggestions about which block sizes I should use.
I have done the following workflow previously in memory using scikit-learn with a smaller matrix. I would like to...

1

votes

0

answer

247

Views

### tensorflow text classification using softmax

I'm new to both tensorflow and machine learning and I'm playing with the Enron dataset to classify the top 10 senders. I found some nice examples on Kaggle that use scikit-learn and work, but when I try the same with tensorflow the accuracy is embarrassingly bad.
Below is what I'm doing
Load t...

1

votes

1

answer

432

Views

### How do I properly use the weight_column when working with tf.estimator.DNNClassifier in Tensorflow (or how do I make a biased cost function)?

I am using https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier
Let's say I have a Classification problem. Attempting to classify 2 things. Class1 is Happy Face, Class2 is Not Happy Face. In this particular scenario, when looking at 1,000+ samples every day, I just want to grab the...

1

votes

0

answer

26

Views

### How to select features when you have image(pixels) with extra information(categories)?

Suppose you need to train your classifier on a dataset that has images as well as more descriptor features available (along with the labels of-course).
For example, if you have to classify cats vs dogs, and you are provided with the image, weight and age of each animal. If I just had the image, I could...

1

votes

0

answer

132

Views

### Sklearn MultinomialNB gives 1 probability for some class for few examples?

I used MultinomialNB from sklearn for some text data.
The data contains 12 classes,
and it is a classification task.
After applying MultinomialNB with CountVectorizer, I checked the predicted class probabilities for a few examples. For some reason, one class shows a probability of 1.0.
[[ 3.91049692e-23 , 2.50074669e-...
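Probabilities this close to 0 and 1 are typical of naive Bayes: the per-class score is a sum of many per-word log-likelihoods, so the classes can end up tens of log-units apart, and normalizing such scores saturates. The sketch below shows that effect with hypothetical log scores (the numbers are made up, not the asker's data) and a hypothetical `normalize_log_scores` helper.

```python
import math


def normalize_log_scores(log_scores):
    """Convert per-class log joint scores into posterior probabilities (stable softmax)."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical log scores for three classes after summing many word log-likelihoods.
probs = normalize_log_scores([-120.0, -171.6, -180.3])
print(probs)
```

A gap of about 50 in log space is enough for the top class to be reported as essentially 1.0 while the others fall to around 1e-23 or below.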

1

votes

0

answer

41

Views

### How to choose training data from a satellite imagery for supervised classification?

I am performing supervised classification of Sentinel 2 imagery using a Random Forest Classifier. I wish to select the training data from the image. Could anyone please tell me the method to efficiently perform this?

1

votes

0

answer

400

Views

### Using weighted_cross_entropy_with_logits for multilabel sparse classification

I have a multilabel classification problem where each tuple in the training data set is labeled with one or more class and the number of classes in the data set is large ~500, resulting in sparse target vectors as
[1, 0, 0, ..., 1, 0, 0, ...].
I am using Keras with Tensorflow backend to build the cl...

1

votes

0

answer

39

Views

### How to deploy scikit-learn classifier model into ANN written in another language

I have a scikit-learn classifier:
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=(13, 13, 13), learning_rate='constant',
learning_rate_init=0.001, max_iter=500, momentum=0.9,
nesterovs_momentum=True,...

1

votes

1

answer

211

Views

### Using fitctree to train a more sensitive model with an imbalanced training set

I'm trying to build a decision tree in MATLAB for binary classification. I have 4 features for each instance. There are around 25,000 instances in the positive class and 350,000 instances in the negative class.
I've tried building classifiers both within the classification learner app and using fit...

1

votes

0

answer

237

Views

### Do scikit-learn classifiers automatically one-hot encode?

I'm confused by the behavior of the fit methods for the scikit-learn classifiers. I'm preprocessing my array that identifies the classes such that they are one-hot encoded, e.g, the shape is (n_samples, n_classes).
However, when I try to use algorithms like SVC or logistic regression, I get the foll...

1

votes

0

answer

82

Views

### Python: Pipeline use some result of first classifier to second classifier (sklearn)

I want to use GaussianNB to classify into category A / B, then use MultinomialNB to classify only type A into sub-categories a1 / a2 / a3
My question is how can I insert another first classifier into pipeline and use
only A result to be input of second classifier?
what I have now:
pipeline1 = Pipeline...

1

votes

0

answer

125

Views

### R - Document-context matrix from dtm-tf and word embeddings

I have a term-frequency, document-term matrix (dtm-tf) in which each row is a document, each column is a term, and each number in the matrix represents the number of occurences of the term in the document. I also have a term-context matrix (a matrix of word vectors/embeddings) where each row is a te...

1

votes

0

answer

109

Views

### Document representation with pre-trained Word Vectors for Author Classification/Regression (GP)

I am trying to replicate (https://arxiv.org/abs/1704.05513) to do a Big 5 author classification on Facebook data (posts and Big 5 profiles are given).
After removing the stop words, I embed each word in the file with their pre-trained GloVe word vectors. However, computing the average or coordinate-...

1

votes

0

answer

115

Views

### Extracting information from sentence : NER or other ways?

What I'm trying to do now is extract customers' names from a firm's disclosure text.
What I have done so far is as follows:
Classify every sentence in the disclosure data by whether or not it contains information about the firm's customers, using machine learning (1 if it contains customer data, 0 if not)
So...

1

votes

0

answer

443

Views

### Custom loss function Keras backend

I'm stuck implementing a custom loss function that should measure the recall of the classified data.
For a more detailed problem description, see:
Classification: skewed data within a class
I have implemented it with numpy arrays, but how would one translate this to Keras-backend? Does an...

1

votes

0

answer

31

Views

### How can I use libsvm for multiclass pixel-based classification in matlab?

I'm working with libsvm and I must perform a multiclass pixel-based classification. I want to classify an image which contains four classes. For training, I have extracted SURF dense features for each class and put them in Data_Train.xlsx, in which the first column is the class and the rest is the SUR...

1

votes

0

answer

335

Views

### tf-slim resnet pretrained model can't get correct results

I am using the pretrained resnet50 model provided by tensorflow slim. When I use this model for inference, I can't get correct results. Can anyone help me solve this problem?
The following is the code I used for inference.
The image preprocess method following this issue ResNet pre-processing: VGG...

1

votes

1

answer

382

Views

### C# Accord.net. text classification

I have an unknown number of columns in my TFIDF vector.
My classification code is:
double[][] inputs = table.ToJagged("ColumnName1", "columnName2");
int[] outputs = table.Columns[2].ToArray();
var teacher = new NaiveBayesLearning();
var nb = teacher.Learn(inputs, outputs);
I don't know how to pass unknown...

1

votes

0

answer

257

Views

### Pipeline : add another feature to text classification in Python (FeatureUnion)

I am attempting to implement a text classification solution using scikit learn.
I have been able to get results for simple classification of text. Now I want to add another feature (non-text) into the prediction process - to improve accuracy.
My data-set is as follows :
label : the target value i.e,...