Questions tagged [ml]

1

votes
1

answer
165

Views

Does ML.NET CategoricalOneHotVectorizer encode testing data as well?

I'm not sure how ML.NET CategoricalOneHotVectorizer works, from their sample code, var pipeline = new LearningPipeline { // ... extra code ... new CategoricalOneHotVectorizer('VendorId', 'RateCode', 'PaymentType'), // ... extra code ... new FastTreeRegressor() }; looks to me that once we call model...
HuyNA
1

votes
1

answer
1.8k

Views

How to get best params after tuning by pyspark.ml.tuning.TrainValidationSplit?

I'm trying to tune the hyper-parameters of a Spark (PySpark) ALS model by TrainValidationSplit. It works well, but I want to know which combination of hyper-parameters is the best. How do I get the best params after evaluation? from pyspark.ml.recommendation import ALS from pyspark.ml.tuning import Train...
takaomag
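One way to approach this kind of question is to pair each candidate parameter map with its validation metric and pick the best pair. The sketch below does this in plain Python. `best_param_map` is a hypothetical helper, not part of PySpark, and the parameter maps and metric values are made-up stand-ins. It assumes the fitted tuning model exposes the parameter maps and the metrics in the same order (for example via `getEstimatorParamMaps()` and `validationMetrics`, which should be checked against the PySpark version in use).

```python
from typing import Any, Dict, Sequence, Tuple


def best_param_map(
    param_maps: Sequence[Dict[str, Any]],
    metrics: Sequence[float],
    larger_is_better: bool = True,
) -> Tuple[Dict[str, Any], float]:
    """Return the (param_map, metric) pair with the best metric.

    Hypothetical helper: assumes param_maps[i] produced metrics[i].
    """
    if not metrics or len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must be non-empty and equal in length")
    pairs = list(zip(param_maps, metrics))
    pick = max if larger_is_better else min
    return pick(pairs, key=lambda pair: pair[1])


# Illustrative stand-ins for what a fitted tuning model might report.
param_maps = [
    {"rank": 5, "regParam": 0.1},
    {"rank": 10, "regParam": 0.1},
    {"rank": 10, "regParam": 0.01},
]
rmse = [1.02, 0.94, 0.97]  # made-up values; lower RMSE is better

best, score = best_param_map(param_maps, rmse, larger_is_better=False)
print(best, score)
```

For an error metric such as RMSE, pass `larger_is_better=False`; for metrics such as AUC or F1, keep the default.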
1

votes
2

answer
920

Views

pyspark: getting the best model's parameters after a gridsearch is blank {}

Could someone help me extract the best performing model's parameters from my grid search? It's a blank dictionary for some reason. from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit, CrossValidator from pyspark.ml.evaluation import BinaryClassificationEvaluator train, test = df.ra...
user798719
1

votes
2

answer
47

Views

google ml-engine cloud storage as a file

I am working in Python with Google Cloud ML-Engine. The documentation I have found indicates that data storage should be done with Buckets and Blobs https://cloud.google.com/ml-engine/docs/tensorflow/working-with-cloud-storage However, much of my code, and the libraries it calls, works with files....
user1902291
1

votes
1

answer
262

Views

ValueError: List argument 'values' to 'ConcatV2' Op with length 0 shorter than minimum length 2 3Dball

Executing '3Dball' produces errors in Unity ml-agent. When I execute PPO.ipynb, there is no error up to 'Load the environment'. Executing 'Train the Agents' raises: ValueError: List argument 'values' to 'ConcatV2' Op with length 0 shorter than minimum length 2. This is the code I exe...
deepsigner
1

votes
1

answer
98

Views

Train Tensorflow on Google Cloud ML

I have a model that I am trying to train on my local machine, but it needs more RAM than I have on my computer. Because of this, I wish to train this model on Google Cloud ML. This model that I am trying to train uses Reinforcement Learning and takes some actions and receives rewards from an environ...
Thiago Medeiros
1

votes
1

answer
388

Views

ERROR: Couldn't match files for checkpoint gs://obj-detection/train/model.ckpt

I ran my detection model on Google Cloud ML and got this error while running the evaluation script. I found this link that mentions the issue, but it seems the issue is still not solved. Does anyone know how to fix this? Any help would be appreciated. Thanks. ERROR 2018-02-04 12:53:10 -06...
Tung Le
1

votes
1

answer
275

Views

Issues importing tf Transform on cloud ml

Whenever I try to import tensorflow-transform on an mlengine job I get the following issue: Traceback (most recent call last): File '/usr/lib/python2.7/runpy.py', line 162, in _run_module_as_main 'main', fname, loader, pkg_name) File '/usr/lib/python2.7/runpy.py', line 72, in _run_code exec code in...
Max Deng
1

votes
1

answer
333

Views

PySpark ML Pipeline.load results throws java.lang.UnsupportedOperationException: empty collection

I have a PySpark fitted pipeline that I am saving to disk for later use. Here is my pipeline code : model = Pipeline(stages=[segment_indexer, model_name_indexer, make_name_indexer, engine_type_indexer, segment_encoder, model_name_incoder, make_name_incoder, engine_type_incoder, x_assembler, estim...
Anand Hemmige
1

votes
0

answer
184

Views

Deserialization of a Spark model causes “Spark exception : Failed to get broadcast_40_piece0 of broadcast_40”

I am building a Text classifier pipeline in spark ml with the following stages: Ngram, Vectorizer, IDF, Logistic regression. Since we're not using a distributed file system I've to save the generated model on the file system by deserializing the model as a java/scala object. (Spark's default model p...
Durga Swaroop
1

votes
0

answer
33

Views

Execute spark jobs in Azure ML studio

I am trying to run some Spark scripts using Execute Python Script in Azure ML Studio, and I am getting an error saying it is unable to import the Spark libraries. Basically, I am trying to create web services using ML Studio for the models that are developed. Is it possible or feasible to run spark jobs using ML st...
Ravi Kiran
1

votes
1

answer
371

Views

Spark - How to use QuantileDiscretizer with RandomForestClassifier

Is it possible to use QuantileDiscretizer, keeping NaN values, with a RandomForestClassifier? I have been getting an error like this: 18/03/23 17:38:15 ERROR Executor: Exception in task 3.0 in stage 133.0 (TID 381) java.lang.IllegalArgumentException: DecisionTree given invalid data: Feature 1 is cat...
boechat107
1

votes
0

answer
99

Views

CPN ML: how do i get a product containing a certain element, from a list

I'm trying to make a function with arguments a and bs, that will check if list bs contains a product whose first element is a. If list bs contains a product whose first element is a, then that product is returned. The function is giving a nondescriptive error relating to a file 'evalloop.sml'. fun mat...
skullkrusher
1

votes
1

answer
46

Views

issue in Decision Tree Classifier

I am trying to run a Decision Tree classifier; the label has a double schema, with values from -20 to +20. import org.apache.spark.ml.classification.DecisionTreeClassifier import org.apache.spark.ml.classification.DecisionTreeClassificationModel import org.apache.spark.ml.evaluation.BinaryClassifi...
Parv bali
1

votes
0

answer
363

Views

Couldn't resolve host 'metadata' while accessing ML model from local

I have issues accessing the ML model on google cloud. While running the command gcloud ml-engine local predict --model-dir=$MODEL_BINARIES --json-instances=examples.json I see the following issue. 2018-04-06 12:20:36.276951: I tensorflow/core/platform/cloud/retrying_utils.cc:77] The operation failed...
vra44
1

votes
0

answer
174

Views

Run keras job with OpenAI Gym Atari on gcloud ML Engine

I'm having trouble running a keras+TF+GymAI Atari job on cloud ML engine. In particular, when using the Atari environment, do you need to pickle the data and put it in the storage bucket or can you run efficiently when you install Gym without actually storing the atari environments in the bucket? I'...
P Stout
1

votes
1

answer
77

Views

how to put string as input and run on ml engine

I want to build a Chinese sentiment analysis model on Google ML Engine. My input is a sentence string, and I also need to do some string processing, such as replacing newlines, splitting the string into chars, and padding the char sequence to a fixed length. This is my sample code to try out the idea: import tensorflow as tf input_...
Henry Chen
1

votes
0

answer
100

Views

Google cloud-ml stuck without logs after several iterations

I am training a TF ML job on cloud-ml, and it seems the job is stuck after a few iterations (900 iterations). Surprisingly, when I run the code locally it works fine, and also hyper tuning on GCP continues training but runs slower than my local laptop which has a 1060GTX GPU. I am also using the run...
Shahin R. Namin
1

votes
0

answer
124

Views

Is there a way to extract metadata about a SPARK-ML Model/CrossValidator that gives input and output?

I'm building a REST service for scoring against ML pipelines created in Spark ML. For this I'd need to know the input data format (attribute names and types) and output data format for predictions. Let's say I have the following formula = RFormula( formula='approve ~ age + balance + jobIndx + marital...
Ralf Mueller
1

votes
2

answer
562

Views

Error creating model version using “gcloud ml-engine versions create”

When I create a version of a machine learning model (whether it is my own model or the ML Engine census example) using the command: gcloud ml-engine versions create v1 \ --model $MODEL_NAME \ --origin $MODEL_BINARIES \ --runtime-version 1.4 I get an error saying: ERROR: (gcloud.ml-engine.versions.cr...
prsr
1

votes
1

answer
352

Views

Online (incremental) logistic regression in Spark [duplicate]

This question already has an answer here: Whether we can update existing model in spark-ml/spark-mllib? 2 answers In Spark MLlib (RDD-based API) there is the StreamingLogisticRegressionWithSGD for incremental training of a Logistic Regression model. However, this class has been deprecated and offer...
S Leon
1

votes
0

answer
207

Views

Pyspark : The evaluation in the cross validation based on a user-defined metric

I am a newbie in Spark. I installed PySpark 2.3.0 on Windows. I am working on a dataset that contains 3 classes: 'Positive', 'Negative', 'Neutral'. I want to apply cross validation using LinearSVC, but for evaluation I want to use the average F1-score for the 2 classes 'positive' and 'negative' onl...
Sarsoura
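The metric described here, an F1-score averaged over only some of the classes, can be worked through on a small example. The sketch below is plain Python rather than a Spark evaluator; `f1_for_label` and `mean_f1` are hypothetical helper names, and the label lists are made-up illustration data. It assumes an unweighted mean of the per-class F1 scores.

```python
from typing import Hashable, Sequence


def f1_for_label(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over the chosen labels only."""
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


# Made-up labels for illustration; 'Neutral' is deliberately left out of the average.
y_true = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
y_pred = ["Positive", "Negative", "Negative", "Negative", "Neutral", "Positive"]
print(mean_f1(y_true, y_pred, ["Positive", "Negative"]))
```

Neutral rows still count as false positives or false negatives for the other two classes; only the Neutral class's own F1 is excluded from the mean.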
1

votes
1

answer
123

Views

Getting argument missing error In ParamgridBuilder on Pyspark

I am currently implementing a gradient-boosting classification model in PySpark, based on a Kaggle dataset. My current final columns after fitting pipeline is I am now trying parameter tuning with ParamGridBuilder. Here is my parameter grid code: param_grid=ParamGridBuilder.addGrid(gradboost.maxDepth,[2,3,4...
Kalyan
1

votes
1

answer
396

Views

“Import error:No module named Cython.Build” while training on Google Cloud ML Engine

I am trying to train a model on Google Cloud ML Engine with this command. I installed tensorflow with Anaconda. But while training the model, this error appears: -Import error:No module named Cython.Build Command 'python setup.py egg_info' failed with error code 1 in /tmp/pip-install-0eA9cj/pycocotool...
LadyLyanna
1

votes
0

answer
42

Views

Cloud ML Engine error while training my model (key='dos', hash_bucket_size=100, dtype=tf.string)

I don't know what this error means! Can you please help me with an explanation? I'm trying to build my model and when I add a string column this error occurs, but when I add a numeric column and launch the training job, this runs successfully. Given: {}'.format(column)) ValueError: Items of feature_...
Sofia Amel
1

votes
0

answer
127

Views

Pyspark Logistic Regression has zero coefficients after fitting

Good afternoon. I am solving a multi-label classification problem with the help of LogisticRegression in pyspark. However, after I fit a model to the data, all elements of the CoefficientMatrix of the model are zeroes. I noticed that if I decrease the number of samples in the training set to some lev...
Ilya Golubev
1

votes
0

answer
24

Views

Spark ML convert Map of counts to feature

I have a Scala Map of seenCounts in specific places, e.g.: Map(beach -> 31, cafe -> 140, prison -> 2) How should I convert this type of data to features for machine learning? Currently I construct a List[String] of items and use CountVectorizer to convert it to a feature, however I am losing informati...
Aivaras
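One possible encoding for a map of counts is a fixed-order numeric vector, one slot per known place, which keeps the counts themselves. The sketch below shows the idea in plain Python rather than Scala or Spark; `counts_to_vector` is a hypothetical helper, and the vocabulary (including the extra place 'park') is an assumption made for illustration.

```python
from typing import Dict, List, Sequence


def counts_to_vector(counts: Dict[str, int], vocabulary: Sequence[str]) -> List[float]:
    """Map place -> count onto a dense vector ordered by `vocabulary`.

    Places missing from `counts` get 0.0; places not in `vocabulary` are ignored.
    """
    return [float(counts.get(place, 0)) for place in vocabulary]


# The vocabulary must be fixed once and reused for every row.
vocabulary = ["beach", "cafe", "prison", "park"]  # 'park' is a hypothetical extra place
seen_counts = {"beach": 31, "cafe": 140, "prison": 2}

vector = counts_to_vector(seen_counts, vocabulary)
print(vector)
```

A vector in this form could then be handed to whatever vector type the ML library expects.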
1

votes
0

answer
249

Views

How to efficiently calculate WoE in PySpark

I want to calculate Weight of Evidence on a feature column depending on the binary target column. Is there a way to do that efficiently in Spark? As of now, Spark still doesn't have any inbuilt API for calculating WoE. I've built it using a few Spark-SQL queries, as follows (here item is one co...
Aakash Basu
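For reference, the quantity itself can be written down independently of Spark. The sketch below is a plain-Python illustration, not the asker's Spark-SQL approach. It assumes the convention WoE = ln(share of non-events / share of events) per category (some sources use the inverse ratio, which flips the sign), and it adds an optional additive smoothing term, which is an assumption made here to avoid taking the log of zero. `weight_of_evidence` is a hypothetical name and the data is made up.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def weight_of_evidence(
    categories: Sequence[Hashable],
    targets: Sequence[int],
    smoothing: float = 0.5,
) -> Dict[Hashable, float]:
    """Per-category WoE = ln(share of non-events / share of events).

    `targets` holds 1 for an event and 0 for a non-event.
    With smoothing=0 a category lacking events or non-events will fail.
    """
    events: Counter = Counter()
    non_events: Counter = Counter()
    for category, target in zip(categories, targets):
        if target == 1:
            events[category] += 1
        else:
            non_events[category] += 1

    all_categories = set(events) | set(non_events)
    k = len(all_categories)
    total_events = sum(events.values()) + smoothing * k
    total_non_events = sum(non_events.values()) + smoothing * k

    woe = {}
    for category in all_categories:
        event_share = (events[category] + smoothing) / total_events
        non_event_share = (non_events[category] + smoothing) / total_non_events
        woe[category] = math.log(non_event_share / event_share)
    return woe


# Made-up data: category 'a' has 1 event and 2 non-events, 'b' the reverse.
categories = ["a", "a", "a", "b", "b", "b"]
targets = [1, 0, 0, 1, 1, 0]
print(weight_of_evidence(categories, targets, smoothing=0.0))
```

In a distributed setting the same per-category counts would typically come from a single grouped aggregation, with the log ratio computed afterwards.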
1

votes
0

answer
108

Views

Google cloud ML engine Batch Predict on GPU

If I train with standard_p_100 GPUs and am running some batch predict jobs with the trained models, is there a way for me to specify or request that the batch predictions be performed on GPUs? For comparison, the batch prediction for an epoch of training data seems to be taking 8-10x than it would...
reese0106
1

votes
2

answer
253

Views

Reduce size of Tensorflow SavedModel for Google ML Engine deployment

I have developed and trained a CNN Keras model and now I want to deploy this model to Google Machine Learning Engine, so I can execute predictions using their API. I have converted to SavedModel format and the export/saved_model.pb has 14MB and the /export/variables/ directory has around 380MB. Goog...
1

votes
3

answer
445

Views

ml.net sentiment analysis warning about format errors & bad values

I've been having a problem with my ml.net console app. This is my first time using ml.net in Visual Studio so I was following this tutorial from microsoft.com, which is a sentiment analysis using binary classification. I'm trying to process some test data in the form of tsv files to get a positive o...
1

votes
1

answer
215

Views

Apply multiple SparkML pipelines to a single DataFrame

I trained several ML pipelines with SparkML and persisted them in HDFS. Now, I want to apply the pipelines to the same dataframe. I implemented a generic scoring class which reads in the pipelines along with the data, applies each of the pipelines to the dataframe and appends the model's predictions...
zero1
1

votes
1

answer
654

Views

Pyspark ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:50532)

Hello, I was working with PySpark, implementing a sentiment analysis project using the ML package. The first time the code worked fine, but suddenly it started showing the error mentioned above. Can someone help, please? Here is the full error description: ERROR:py4j.java_gateway:An error occurred while tryi...
jowwel93
1

votes
1

answer
126

Views

Tensorflow checkpoints are not correctly saved when using gcloud compute unit instead of local

When I train locally using google cloud buckets as data source and destination with: gcloud ml-engine local train --module-name trainer.task_v2s --package-path trainer/ I get normal results and checkpoints are getting saved properly in 20 steps since my dataset is 400 examples and I use 20 as batchsi...
user2368505
1

votes
1

answer
125

Views

Is cross-validation faster without using pipelines in spark-ml?

Suppose I have many steps in my feature engineering: I would have many transformers in my pipeline. I am wondering how Spark handles these transformers during the cross-validation of the pipeline: are they executed for each fold? Would it be faster to apply the transformers before cross-validati...
Arius
1

votes
1

answer
132

Views

ML Console application throwing native dll not found error

I created a .NET Core application using VS 2017, published it, and copied the published folder (containing the runtimes folder and my application dll) to Windows Server 2016, where I had installed the .NET Core framework. Every time I run the application from the command line I get the below error...
Devesh
1

votes
0

answer
182

Views

Using google ml engine prediction for a scikit-learn model which needs additional modules

I have my pipeline defined in a separate file model.py class TextSelector(BaseEstimator, TransformerMixin): def __init__(self, field): self.field = field def fit(self, X, y=None): return self def transform(self, X): return X[self.field] class NumberSelector(BaseEstimator, TransformerMixin): def __in...
Harrison
1

votes
2

answer
189

Views

(400) Bad Request when trying to send an image to my custom AutoML model via the REST API

I'm trying to implement my custom AutoML model in C# by sending images via the REST API, but I keep getting different errors. The one I currently have is: The remote server returned an error: (400) Bad Request. I have taken an image and converted into a string of bytes called byteString and have cre...
user3636407
1

votes
1

answer
193

Views

Authenticate Javascript Program for Gcloud AutoML Vision API

I am currently working on a project whereby I have used gcloud automl to train an image classifier. I have got it working fine and it is able to handle my requests using access-tokens. However, my issue lies in that access-tokens only last for an hour. I would like to be able to create a method in m...
Anish Khanna
1

votes
0

answer
43

Views

input and output format for Swagger YAML

I followed the below tutorial to create my machine learning app in Google Cloud: https://github.com/GoogleCloudPlatform/ml-on-gcp/tree/master/sklearn/gae_serve#steps I need to construct a 'modelserve.yaml' at first to define my input and output such as this file: https://github.com/GoogleCloudPlatfo...
Soheil Novinfard
