Questions tagged [h2o]

1

votes
1

answer
130

Views

Specifying a url to a .whl file in a conda env .yml file

I want a specific older version of a package (h2o) to be installed when I load a conda env .yml file. However, the older versions of this package only seem to work if I install them using pip directly from the URL hosting the .whl file. For example, if I want to install version 3.18.0.8 I need to...
Dan
1

votes
1

answer
22

Views

which h2o components/functionalities are free

I would like to test drive H2O run from R. I can install it locally using install.packages no problem. There are several options to scale H2O up. For example, H2O4GPU and H2O Sparkling Water. For security reasons, we would like to use these options on premise. If we pay for the hardware, would H2O s...
cs0815
1

votes
1

answer
608

Views

How to interpret the output of H2O .predict method for random forest classification?

When I use the predict method on my trained model I get an output that is 1 row and 206 columns. It seems to have 206 values ranging from 0 to 1. This sort of makes sense, as the model's output is a categorical variable with 0 and 1 as possible values. But I don't get the 206 values, as I...
mark_maker
1

votes
1

answer
547

Views

LIME (with h2o) explanation error

I'm new to R and ML but have a focused question that I am trying to answer. I'm using my own data but following Matt Dancho's example here to predict attrition: http://www.business-science.io/business/2017/09/18/hr_employee_attrition.html I have removed zero variance and scaled variables as per his...
Stacy S.
1

votes
0

answer
71

Views

How to get H2O ModelMetricsBinomial using the REST API with Retrofit and h2o-bindings

I've got several (binomial) DRF models and I'd like to get the ModelMetricsBinomialV3 object to extract the thresholds_and_metric_scores variable. I've implemented a solution without Retrofit and bindings, but I want to use h2o-bindings to be able to send and receive POJOs since my current solutio...
crenbaerry93
1

votes
2

answer
1.1k

Views

Python h2o frame to np array reshape

I'm a newbie in Python. I have an h2o frame with 1000 rows and 25 columns. I would like to convert this table to a numpy array and reshape it to (5,5). I used this code: mynarray=np.array([np.array(nrows).astype(np.float32).reshape(5,5) for nrows in myh2oframe]) The error I received is cannot copy seq...
Angel Lordan
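One way to read the reshape asked about above is "turn each 25-value row into its own 5x5 block". A minimal sketch of that idea follows. It assumes the frame's contents have already been pulled into ordinary Python lists (for an H2OFrame, `as_data_frame()` is the usual route into pandas); the `rows` variable here is a made-up stand-in, not the asker's data.

```python
import numpy as np

# Hypothetical stand-in for the frame's contents: 1000 rows x 25 numeric columns.
rows = [[float(r * 25 + c) for c in range(25)] for r in range(1000)]

# Convert the whole table in one step, then reshape it.
table = np.asarray(rows, dtype=np.float32)   # shape (1000, 25)

# -1 lets numpy infer the number of blocks; each row becomes one 5x5 block.
blocks = table.reshape(-1, 5, 5)             # shape (1000, 5, 5)
```

Converting the table once and reshaping the result avoids building one small array per row, which is where the original list comprehension appears to go wrong.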
1

votes
1

answer
226

Views

Prediction From Loaded Model | DistributedException

There may be an obvious solution to this, as we're new to the H2O platform, though we've been unable to find any conclusive information. We're saving our (H2O-XGBoost) models via Python: h2o.save_model(model=model, path=/path/to/our/models, force=True) Then subsequently loading our models (loaded af...
CoolUser
1

votes
0

answer
64

Views

Add a column value in a row based on every values of this same row

My question might be a dumb one, but I was wondering: I want to do structured streaming, and I want to both aggregate and score the data with a Sparkling Water model. So I have this: val data_processed = data_raw .withWatermark('timestamp', '10 minutes') .groupBy(window(col('timestamp'),'1 mi...
tricky
1

votes
0

answer
265

Views

H2O python: gbm predictions uncertainty

I am new to h2o in Python. I'm fitting a GBM with cross-validation on my training set and then getting predictions on a holdout set. My outcome is continuous, and for every prediction I would like to have a measure of uncertainty. I'm not interested in prediction intervals, I simply look for an uncerta...
a_geo
1

votes
1

answer
317

Views

h2o GBM: leaf predictions

I'm performing a grid search for GBM in h2o for a continuous outcome with continuous predictors. I'm using cross-validation for training and then predicting on a test set. I'm using the function .predict_leaf_node_assignment: best_gbm.predict_leaf_node_assignment(test_frame_h2o) (where best_gbm is the b...
a_geo
1

votes
0

answer
75

Views

Is there a way to introduce a cost for one class in h2o.glm in R?

I am using h2o.glm to carry out a prediction. The model requirement is as many true positives as possible, at the cost of many false positives. In terms of accuracy (or AUC) the result of h2o.glm is pretty good; however, due to the requirements, I need to play with the cost matrix. C5.0 has a way to...
may
1

votes
0

answer
139

Views

Accessing an HDFS filesystem configured for high availability from H2O

I'm trying to read data out of our Hadoop HDFS filesystem using the h2o.import_file Python function. I've set the HADOOP_CONF_DIR environment variable like so: import os os.environ['HADOOP_CONF_DIR'] = '/etc/hadoop/conf' When I try to read a file using the hdfs:///path/to/my/file.txt syntax, H2O gives...
Michael Allman
1

votes
0

answer
154

Views

Perform data transformation on training data inside cross validation

I would like to do cross-validation with 5 folds. In each fold, I have a training and a validation set. However, due to a data issue, I need to transform my data. First, I transform the training data, train the model, apply the transformation rule to the validation data, and then test the model. I need to redo...
Hua
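The per-fold procedure described above (fit the transformation on the training part, then reuse the same rule on the validation part) can be sketched in plain Python. This is only an illustration of the idea, not H2O's cross-validation API: the standardization rule, the `feature` data, and the helper names `k_fold_indices`, `fit_scaler`, and `apply_scaler` are all assumptions made for the example.

```python
import statistics

def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_scaler(values):
    """Learn the transformation rule (mean, standard deviation) from training data only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against a constant column
    return mean, std

def apply_scaler(values, rule):
    """Apply a previously fitted rule without refitting it."""
    mean, std = rule
    return [(v - mean) / std for v in values]

# Hypothetical single-feature data set with 10 rows.
feature = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

fold_results = []
for valid_idx in k_fold_indices(len(feature), k=5):
    held_out = set(valid_idx)
    train = [v for i, v in enumerate(feature) if i not in held_out]
    valid = [feature[i] for i in valid_idx]

    rule = fit_scaler(train)                  # fitted on the training part only
    train_scaled = apply_scaler(train, rule)
    valid_scaled = apply_scaler(valid, rule)  # same rule reused on validation data

    # A model would be trained on train_scaled and evaluated on valid_scaled here.
    fold_results.append((rule, train_scaled, valid_scaled))
```

Refitting the rule inside every fold keeps information from the validation rows out of the transformation, which is the point of doing it per fold rather than once up front.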
1

votes
0

answer
38

Views

H2O.ai Steam launching cluster

I'm trying to start an H2O cluster in the Steam web interface. The cluster is starting (I can open it after it has started), but Steam thinks it failed and doesn't add it to the list of active clusters. It seems it is getting a Java timeout exception after disowning the cluster. So maybe I need to increase...
Markus Wilhelm
1

votes
1

answer
49

Views

Ignoring h2o factor in GLM

When you one-hot encode categorical variables, you usually drop one of the variables before modeling. That way, you don't have a redundant feature that is linearly dependent on the others. Is there a way to specify a level of the categorical variable that should not be used in fitting? From the do...
Sepehr
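To make the idea of "dropping one level" concrete, here is a small plain-Python sketch of one-hot encoding that omits a chosen reference level. It only illustrates the concept from the question; it is not how h2o's GLM handles factors internally, and the function name `one_hot_drop_level` and the sample data are hypothetical.

```python
def one_hot_drop_level(values, reference_level):
    """One-hot encode `values`, omitting the indicator column for `reference_level`."""
    levels = sorted(set(values))
    if reference_level not in levels:
        raise ValueError(f"unknown level: {reference_level!r}")
    kept = [lvl for lvl in levels if lvl != reference_level]
    # A row of all zeros stands for the reference level.
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

# Hypothetical categorical column with three levels.
colours = ["red", "green", "blue", "green"]
columns, encoded = one_hot_drop_level(colours, reference_level="green")
```

With "green" as the reference, only the "blue" and "red" indicators remain, so the encoded columns are no longer linearly dependent on an intercept.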
1

votes
0

answer
216

Views

Saving an h2o pipeline model built using sklearn

I have an sklearn pipeline with h2o preprocessors and h2o estimators. Please see below. pipeline = Pipeline([('standardize', h2o.transforms.preprocessing.H2OScaler()), ('pca', h2o.transforms.decomposition.H2OPCA(k=2)), ('drf', h2o.estimators.random_forest.H2ORandomForestEstimator(ntrees=200))]) pipe...
Anup
1

votes
2

answer
949

Views

“Could not establish link to the H2O cloud http://127.0.0.1:54321 after 20 retries” when importing h2o

I installed h2o for Python 2 using the code below in the Azure Notebook IDE: !pip install h2o Then I imported it using: import h2o However, I get the following error: H2OConnectionError: Could not establish link to the H2O cloud http://127.0.0.1:54321 after 20 retries [07:03.57] H2OServerError: HTTP 503...
Adarsha
1

votes
2

answer
144

Views

(H2O.ai) Does column name or order matter when an estimator predicts on data set?

Do h2o estimators need the input data set to have the same column names that they were trained on (regardless of whether some columns were ignored), or is it the order that matters (in which case, can the ignored columns be replaced with other data)? E.g. when predicting on a data set with an h2o mod...
lampShadesDrifter
1

votes
0

answer
94

Views

How to run H2O from Zeppelin and working on data node

We use H2O over Zeppelin (with Sparkling Water). Zeppelin runs on the edge machine, which has low resources. While running H2O from Zeppelin I can see that the Zeppelin interpreter process is doing all the work (I suppose it's the algorithm processing) and takes a lot of resources from the edge node while...
sagi.l
1

votes
0

answer
59

Views

fold_column when using h2o.grid in R

Using the fold_column parameter leads to an error when using h2o.grid for any algorithm / method. Does anybody know why that is the case? h2o.grid works when I just use nfolds, and the estimation without the grid function also works when fold_column is used, but not when I combine h2o.grid and f...
jovogt
1

votes
0

answer
47

Views

h2o autoencoder errors trend positively on test data

I am using H2O with R to train an autoencoder using h2o.deeplearning. My training data used to fit the model is 10000x1000, so there is a real possibility of overfitting (because I have only 10 data cases per variable). The purpose of training the autoencoder is to detect outliers in the test data. W...
Peter
1

votes
0

answer
52

Views

How to output individual tree results for a GBM model, using the POJO file created in H2O.ai?

I just have a POJO model file and the genModel.jar delivered to me. I need to figure out a way to output the individual tree results for that model. Please advise which wrapper and methods to use, if this is supported for the POJO model.
vj99899
1

votes
0

answer
42

Views

How do I update my h2o version on AWS when working with Flow?

I installed h2o using the AMI on the marketplace. It installed 3.14, and I am trying to update to the latest stable version of h2o.ai so my co-workers can use Flow. How can I best do this? I have tried uninstalling using pip install http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/4/...
Chris Hawkins
1

votes
0

answer
59

Views

predictBinomial method slow from Java code

I am new to H2O. I am calling the predictBinomial method from a Java application and I am getting the correct results back, but it takes a long time to respond. Here is my scenario: I am exposing a web service method where I receive the name of the class (modelName) and use a ClassLoader to load it...
Sergio Vargas
1

votes
1

answer
185

Views

Scoring history with noise in h2o deep learning

I use h2o Deep Learning with Python. My problem is a time series forecasting problem, as I want to predict the evolution of the number of sunspots. Here are all the sunspot values since 1749: http://www.sidc.be/silso/DATA/SN_ms_tot_V2.0.txt. I want to use a sliding window of 43 months, hence my...
T. RB
1

votes
1

answer
246

Views

How to export an h2o model as MOJO from sparkling water in scala, to be loaded by EasyPredictModelWrapper

My goal is to export an h2o model trained on Spark with Scala (using sparkling-water), such that I can import it in an application without Spark. Thus: using Scala (the documentation only shows examples for R and Python); export a model which is built using sparkling-water (h2o with Spark); import a m...
gerben
1

votes
0

answer
94

Views

Why does h2o give different predictions on a Spark cluster than in Spark local mode?

H2O in Spark cluster mode is giving different predictions from Spark local mode. H2O in Spark local mode gives better results than on the Spark cluster. Why is this happening? Can you help me? Tell me whether this is H2O behaviour. Two data sets are being used: one for training the model and another for scoring. trainingData...
poojanavin
1

votes
1

answer
58

Views

Beta constraints in H2OGeneralizedLinearEstimator

I'm looking for a way to set beta prior to the model run in H2OGeneralizedLinearEstimator, that is, a beta which can be used as a starting point for the model. It is called beta constraints as per the documentation below. Could someone help me with this? http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-s...
1

votes
2

answer
148

Views

How to deploy distributed h2o flow cluster with docker?

I'm able to deploy an h2o cluster with EC2 instances by putting the private IPs in the flatfile. Doing the same with Docker works, but I can't figure out what to enter into the flatfile so the containers can form the cluster. The private IP the container is running on does not work.
Nick Anderson
1

votes
0

answer
44

Views

Set up H2O steam on local machine

I'm trying to set up H2O Steam on a Linux VM. What I did so far: set up a VM with Ubuntu; downloaded and deployed the file as described on the official page: http://docs.h2o.ai/steam/latest-stable/Installation.html; started Jetty with the following command: java -jar var/master/assets/jetty-runner.jar var/master...
JVoelker
1

votes
0

answer
173

Views

h2o.xgboost is throwing null pointer exception

I am trying to run h2o.xgboost() in R and was able to use it successfully in version 3.14.0.3. But I recently updated to version 3.18.0.8 and I am getting the error below. I tried a lot of things but was not able to find the reason. Any help will be appreciated. Error: DistributedException from localhost/1...
Rushabh
1

votes
0

answer
62

Views

2 Questions about autoencoder in h2o

Can anyone tell me which kind of autoencoder (sparse, denoising, etc.) h2o implements by design, or does this depend only on the options used? Second question: What's the difference between H2ODeepLearningEstimator() with autoencoder enabled and H2OAutoEncoderEstimator? Thanks in advance.
s0nic
1

votes
0

answer
303

Views

R - Building Autoencoder model in Caret

I want to build an autoencoder model with the Caret package with the following features: 1) Build an unsupervised neural network model using deep learning autoencoders 2) Using the autoencoder model in (1) as a pre-training input for a supervised model. Online examples on using autoencoder in caret...
user1783739
1

votes
1

answer
62

Views

New version of h2o in R still produces additional row when calling as.h2o on column names with special characters

I am still having the problem outlined by another user in this question: "as.h2o produces additional row when column names contain special characters". Currently, h2o is on version 3.18.0.11, and it looks like this issue has only been resolved up to 3.18.0.08. I have tried to downgrade the installation...
TuringTester69
1

votes
0

answer
132

Views

Subsetting H2O DataFrame in R

I want to apply many distinct filters to an h2o dataframe to create unique subsets of data. I also want to be conscious of the memory management process that h2o uses, because I will be applying this to gigabytes of data. As far as I can tell from similar questions, there aren't many definitive ans...
kputschko
1

votes
0

answer
55

Views

pysparkling H2OConf interfering with my application log

Here is my code: from pysparkling import H2OConf #commenting this line makes it work import logging logging.basicConfig(filename='my_log.log',level=logging.INFO) logging.info('test') I cannot get the log file to be created unless I comment out the first line of the code. If I do that, then everything...
Tiberiu
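One possible explanation for the symptom above is standard Python behaviour: `logging.basicConfig` does nothing when the root logger already has a handler. The sketch below reproduces that with the standard library only. Whether the pysparkling import actually attaches a root handler is an assumption here, simulated with a `NullHandler`; the temporary log path is likewise made up for the example.

```python
import logging
import os
import tempfile

root = logging.getLogger()

# Simulate a library that attaches a handler to the root logger at import time.
# (Assumption: the excerpt does not confirm that the imported module does this.)
root.addHandler(logging.NullHandler())

log_path = os.path.join(tempfile.mkdtemp(), "my_log.log")

# With a handler already present, basicConfig silently does nothing,
# so no log file is created at this point.
logging.basicConfig(filename=log_path, level=logging.INFO)
created_by_basic_config = os.path.exists(log_path)

# Workaround: attach the file handler explicitly instead of relying on basicConfig.
file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
root.addHandler(file_handler)
root.setLevel(logging.INFO)

logging.info("test")
file_handler.flush()

with open(log_path) as fh:
    log_contents = fh.read()
```

On Python 3.8 and later, passing `force=True` to `logging.basicConfig` is another option: it removes any existing root handlers before applying the new configuration.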
1

votes
1

answer
241

Views

How to run a prediction on GPU?

I am using h2o4gpu and the parameters which I have set are h2o4gpu.solvers.xgboost.RandomForestClassifier model. XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=1.0, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=8, min_child_weight=1, missing=nan, n_es...
Anshul Gupta
1

votes
0

answer
80

Views

H2O .savemodel on network path not working

We have an h2o cluster on a Linux machine (run from the command line), and we are connecting to it from our local machine (Windows), which is on the same network. When we try to call saveModel we get errors. ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://10.0.0.4:54321/99/Mo...
Sunil Ajagekar
1

votes
0

answer
54

Views

Variance across row in h2o

I am trying to calculate the variance of multiple columns across each row. Hence the result would have dimension no_of_rows*1. I tried the following: import pandas as pd test = pd.DataFrame({'p1':[0.8,0.7,0.3],'p10':[0.4,0.6,0.3],'p11':[0.9,0.6,0.4],'p12':[0.44,9.8,0.4],'p13':[0.8,0.4,0.5],'p14':[0....
blehblehbleh
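For reference, "variance across each row" can be written out in plain Python as below. This is a sketch of the calculation itself rather than of any H2OFrame method, and it uses only the columns that appear in full in the excerpt (the rest of the data is cut off there).

```python
import statistics

# Columns taken from the excerpt; the remaining columns are truncated in the original.
columns = {
    "p1": [0.8, 0.7, 0.3],
    "p10": [0.4, 0.6, 0.3],
    "p11": [0.9, 0.6, 0.4],
    "p12": [0.44, 9.8, 0.4],
    "p13": [0.8, 0.4, 0.5],
}

# Transpose the column-oriented data into rows, then take the sample variance
# (n - 1 in the denominator) of each row.
rows = list(zip(*columns.values()))
row_variance = [statistics.variance(row) for row in rows]
```

The result has one value per row, matching the no_of_rows*1 shape described in the question. In pandas the same quantity is `test.var(axis=1)`, which also uses the n - 1 denominator by default.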
1

votes
0

answer
117

Views

H2o in R Connection reset by peer

I am running h2o in R using h2o.glm(). For some reason I keep getting this error: CURL ERROR: Recv failure: Connection reset by peer I varied the size of my cluster and dataset, and yet the connection still seems to break. Does anybody know how to solve this? I am running out of ideas! Thanks in advance, J
jjm
