Questions tagged [data-science-experience]

0

votes
0

answer
3

Views

After installing Docker 2.0.0.3 and IBM DSX, getting error on Windows 10

Getting an issue while installing IBM DSX on Windows 10. I get an error after installation.
Venkatesh
1

votes
2

answer
496

Views

ValueError: Invalid endpoint: s3-api.xxxx.objectstorage.service.networklayer.com

I'm trying to access a csv file in my Watson Data Platform catalog. I used the code generation functionality from my DSX notebook: Insert to code > Insert StreamingBody object. The generated code was: import os import types import pandas as pd import boto3 def __iter__(self): return 0 # @hidden_ce...
Chris Snow
1

votes
0

answer
21

Views

DSX desktop install NOT working (on x86 laptop)

I have tried multiple times and the DSX desktop install does not work. I am trying to install on a Win7 laptop. I have selected Docker, Jupyter with Spark (around 6.6GB), but it always ends up installing Docker and then hangs (as in the progress bar does not proceed further and is stuck at 25% for a LONG...
Deepak C Shetty
1

votes
1

answer
152

Views

Referring to parent attribute in pandas

This is my json { 'fInstructions': [ { 'id': 155, 'type':'finstruction', 'ref': '/spm/finstruction/155', 'iLineItem':[ { 'id': 156, 'type':'ilineitem', 'ref': '/spm/ilineitem/156', 'creationDate': '2018-03-09', 'dueDate':'2018-02-01', 'effectiveDate':'2018-03-09', 'frequency':'01', 'coveredPeriodFro...
More Than Five
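One way to carry a parent attribute down to nested rows is `pandas.json_normalize` with `record_path` and `meta`. This is a minimal sketch, assuming pandas 1.0 or later and using only the fields visible in the excerpt; the `fInstruction.` prefix is a hypothetical choice made here to keep the parent `id` and `ref` from clashing with the child columns of the same name.

```python
import pandas as pd

# Trimmed copy of the JSON from the question; only fields visible in the excerpt.
data = {
    "fInstructions": [
        {
            "id": 155,
            "type": "finstruction",
            "ref": "/spm/finstruction/155",
            "iLineItem": [
                {
                    "id": 156,
                    "type": "ilineitem",
                    "ref": "/spm/ilineitem/156",
                    "creationDate": "2018-03-09",
                    "dueDate": "2018-02-01",
                }
            ],
        }
    ]
}

# One row per iLineItem, with the parent's id and ref attached to each row.
# meta_prefix (a name chosen for this sketch) avoids a column-name conflict.
line_items = pd.json_normalize(
    data["fInstructions"],
    record_path="iLineItem",
    meta=["id", "ref"],
    meta_prefix="fInstruction.",
)

print(line_items[["id", "fInstruction.id", "fInstruction.ref"]])
```

Without a prefix, pandas raises a `ValueError` about conflicting metadata names, because both levels have an `id` field.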
1

votes
0

answer
184

Views

Adding Brunel to DSX(Toree)

I am using IBM's Data Science Experience (DSX) and want to show some visualizations with Scala. According to datascience - visualization I have 3 options, Pixiedust, Brunel and Lightning. As far as I saw Lightning is a WIP and requires a server(?) to run the visualizations, Pixiedust's support of Sc...
Anton.P
1

votes
1

answer
127

Views

How can I apply Normalized Mean Absolute Error to measure my model accuracy in a movie ratings recommendation system?

Hello, it would be very helpful if someone could help me out with NMAE (Normalized Mean Absolute Error) to find the accuracy of the model: NMAE = ∑(|predicted rating – real rating|) / (n × (max rate – min rate)). I have given an example of how my model is giving the data set output: I have been using R progra...
Debjyoti Das
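The NMAE formula quoted in the question can be sketched as a small function. The asker works in R; this version is in Python purely for illustration, and the function name, argument names, and sample ratings are assumptions, not taken from the original post.

```python
def nmae(predicted, actual, min_rating, max_rating):
    """Normalized mean absolute error: sum(|p - a|) / (n * (max - min))."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    if max_rating <= min_rating:
        raise ValueError("max_rating must be greater than min_rating")
    total_abs_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_abs_error / (len(predicted) * (max_rating - min_rating))


# Hypothetical ratings on a 1-to-5 scale.
example = nmae([4.0, 3.0, 5.0], [5.0, 3.0, 3.0], min_rating=1, max_rating=5)
print(example)
```

Dividing by the rating range puts the error on a 0-to-1 scale, so results from datasets with different rating scales can be compared.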
1

votes
0

answer
93

Views

Unable to connect to Cloud Object Storage instance from IBM Watson Studio

I am trying to connect to COS from IBM Watson Studio but I get an error... When I hit enter I get the following error: Unable to find products data_catalog or data_science_experience in the entitlements response for account id: 51373fa1b8bf36fd9d78574d19af0d11.
deltascience
1

votes
0

answer
35

Views

IBM Watson Studio Installation Problem Stuck at “Pre-install script timeout, trying again”

I am installing Watson Studio 1.2.2 on a 5-node environment. However, the installation is stuck at 'Testing the connection to [email protected] 1 '192.168.123.130' ' 'Pre-install script timeout, trying again'. For reference, I am installing Watson Studio on the 5-node environment (3 Control/Sto...
W KC
1

votes
0

answer
33

Views

Predicting churn of customer using Machine Learning with lag

I have monthly time series data for 5000 customers, which looks like: This is my first time dealing with time series data. Can someone explain some strategies for predicting churn probability in advance (3 months, 6 months)? I am confused because for every customer churning probability 3 or 6...
bazinga
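One common framing for this kind of question is to turn each customer-month into a training row whose label looks a fixed number of months ahead. The sketch below is only an illustration: the data layout in the question is not visible in the excerpt, so the column names (`customer_id`, `month`, `active`) and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical monthly history: customer 1 stops being active in month 4.
history = pd.DataFrame(
    {
        "customer_id": [1] * 6 + [2] * 6,
        "month": list(range(1, 7)) * 2,
        "active": [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    }
).sort_values(["customer_id", "month"])

per_customer = history.groupby("customer_id")["active"]

# Lag feature: was the customer active one month earlier?
history["active_lag_1"] = per_customer.shift(1)

# Look-ahead value: is the customer active three months later?
history["active_in_3_months"] = per_customer.shift(-3)

# Keep only rows whose 3-month future is known, then derive the label.
labelled = history.dropna(subset=["active_in_3_months"]).copy()
labelled["churn_in_3_months"] = (labelled["active_in_3_months"] == 0).astype(int)

print(labelled[["customer_id", "month", "churn_in_3_months"]])
```

A 6-month horizon would use `shift(-6)` in the same way; rows near the end of each customer's history are dropped because their outcome has not been observed yet.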
0

votes
0

answer
6

Views

Is there a way to calculate the correlation coefficient between binary variables a and b?

So there are two variables: a -- whether a person is older than 40 (binary, 0 or 1); b -- whether they have a luxury car (binary, 0 or 1). Now they have the data sum values. Total sample size -- 500 Total number of people above 40 are -- 60 Total number of peopl...
VPapz
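For two binary variables, the Pearson correlation reduces to the phi coefficient, which can be computed from the four cells of a 2x2 table. This is a minimal sketch; the sample size of 500 and the 60 people above 40 come from the excerpt, but the split across luxury-car ownership is invented here because the rest of the question is cut off.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient from 2x2 counts.

    n11: a=1 and b=1, n10: a=1 and b=0, n01: a=0 and b=1, n00: a=0 and b=0.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denominator = math.sqrt(row1 * row0 * col1 * col0)
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / denominator


# 500 people, 60 of them above 40 (from the excerpt); the other counts are hypothetical.
phi = phi_coefficient(n11=30, n10=30, n01=70, n00=370)
print(round(phi, 3))
```

Note that the marginal totals alone are not enough: the joint count (people who are both above 40 and own a luxury car) is needed to fill the table.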
1

votes
1

answer
67

Views

Is it possible for a spark job on bluemix to see a list of the other processes on the operating system?

A common approach for connecting to third party systems from spark is to provide the credentials for the systems as arguments to the spark script. However, this raises some questions about security. E.g. See this question Bluemix spark-submit -- How to secure credentials needed by my Scala jar Is...
Chris Snow
1

votes
2

answer
78

Views

Converting sensor tag data in DSX

I'm working on converting the existing recipe for Data Science Experience (DSX) to use data from a connected Sensor Tag device. However the mobile applications for that device send the data as strings rather than numerics - this is causing the DSX recipe that calculates a Z score to choke. The data...
Skilganon
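A common fix for string-typed sensor readings is to cast them to numbers before computing the Z score. The sketch below uses pandas for brevity; the actual DSX recipe and its column names are not shown in the excerpt, so the `temperature` column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical readings that arrived as strings, including one bad value.
readings = pd.DataFrame({"temperature": ["21.5", "22.0", "21.8", "n/a", "35.2"]})

# Convert to floats; values that cannot be parsed become NaN instead of raising.
readings["temperature"] = pd.to_numeric(readings["temperature"], errors="coerce")

# Z score against the column's own mean and sample standard deviation.
mean = readings["temperature"].mean()
std = readings["temperature"].std()
readings["zscore"] = (readings["temperature"] - mean) / std

print(readings)
```

With `errors="coerce"`, malformed values surface as NaN rows that can be inspected or dropped, rather than stopping the calculation.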
1

votes
1

answer
36

Views

How can I migrate a DSX Notebook to Spark 2.0?

It's currently tied to Spark 1.6, but I want to use SparkSession, among other new features in Spark. How can I do the migration without copying every cell to a new notebook?
J. Bloom
1

votes
1

answer
316

Views

sc is not created automatically in notebook

A notebook I created yesterday in DSX has stopped working, with errors saying it can't find the sc object: 'NameError: global name 'sc' is not defined'. I restarted the kernel but can't get it created. I have no other kernel running. I created a new notebook - Spark 2.0 with Python 2 and literally nothing in it...
amadain
1

votes
1

answer
129

Views

Q: Not able to install DTW algorithm in DSX's R notebook

I am trying to install the DTW package for R in DSX's R notebook. Running the install command install.packages('dtw') gives the following warning: 'installation of package ‘dtw’ had non-zero exit status'.
Saurabh Gupta
1

votes
1

answer
248

Views

How do I implement the TensorFrames Spark package on Data Science Experience?

I've been able to import the package: import pixiedust pixiedust.installPackage('databricks:tensorframes:0') But when I try a simple example: import tensorflow as tf import tensorframes as tfs from pyspark.sql import Row data = [Row(x=[float(x), float(2 * x)], key=str(x % 2), z = float(x+1)) for x i...
Ross Lewis
1

votes
2

answer
815

Views

HDF5 dataset from MATLAB to Pandas DataFrame in Python

I have .mat files with HDF5 data and I want to load it into Python (Pandas DataFrame). I can load the file: f2 = h5py.File('file.mat') f2['data'] which is an HDF5 dataset: If I read it with Pandas: g = pd.read_hdf('file.mat','data') I get the following error: cannot create a storer if the object is...
Ross Lewis
1

votes
2

answer
391

Views

Import a zip file to Python Notebook in IBM Data Science Experience(IBM DSX)

I have a zip file, train.zip (1.1GB), which I wanted to import into a Python Notebook, unzip and then set out to work on it. I imported it as a StringIO object using the option Insert StringIO Object. from io import StringIO import requests import json import pandas as pd # @hidden_cell # This func...
Abhishek Anand
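A zip archive is binary, so a text-oriented `StringIO` wrapper is usually the wrong container for it; `io.BytesIO` together with the standard `zipfile` module is the more typical route. This is a sketch under assumptions: how the bytes are fetched from storage is not shown, the member name `train.csv` is hypothetical, and a tiny archive is built in memory only so the example runs on its own.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_bytes, member_name):
    """Open a zip held in memory and load one CSV member into a DataFrame."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as member:
            return pd.read_csv(member)


# Stand-in for the downloaded archive: a small zip created in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("train.csv", "id,label\n1,0\n2,1\n")

frame = read_csv_from_zip(buffer.getvalue(), "train.csv")
print(frame)
```

For an archive as large as the 1.1GB file in the question, holding everything in memory may not be practical; writing the bytes to a local file and opening that with `zipfile.ZipFile` is an alternative.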
1

votes
1

answer
61

Views

Unable to run yum command on IBM DSX notebook

I am unable to run yum command in DSX environment. I need yum command access to install some packages. Here's the error I am seeing when I type in '!yum install sox' command in DSX notebook: Could not find platform independent libraries Could not find platform dependent libraries Consider setting...
sudhir koka
1

votes
1

answer
54

Views

Unable to start Scala 2.11 with Spark 2.0 in IBM DSX notebook

When attempting to start any notebook on IBM DSX with the Scala 2.11/Spark 2.0 kernel, I get the following error: Dead kernel The kernel has died, and the automatic restart has failed. It is possible the kernel cannot be restarted. If you are not able to restart the kernel, you will still be able to...
Pål
1

votes
1

answer
100

Views

How can I handle a lot of timestamped data in ArangoDB?

I am new to handling a lot of data. Every 100 ms I currently write 4 JSON blocks to a collection in my ArangoDB. The content of the JSON is something like this: { 'maintenence': { 'holder_1': 1, 'holder_2': 0, 'holder_3': 0, 'holder_4': 0, 'holder_5': 0, 'holder_6': 0 }, 'error': 274, 'pos': { 'left'...
mok liee
1

votes
2

answer
192

Views

Cannot install the CRAN packages “viridis”, “Hmisc” on IBM DSX R Environment Notebooks

I am trying to install the CRAN Hmisc package in an R Environment Notebook on IBM DSX. But it repeatedly fails with the following Error: install.packages('Hmisc') also installing the dependencies ‘checkmate’, ‘rstudioapi’, ‘Formula’, ‘latticeExtra’, ‘acepack’, ‘gridExtra’, ...
Sumit Goyal
1

votes
1

answer
57

Views

DSX notification if scheduled notebook does not run?

I'm trying to troubleshoot an hourly scheduled notebook as per this question: How to troubleshoot a DSX scheduled notebook? When listing the kernel logs I noticed at 3am the notebook was not scheduled: kernel-pyspark-20170104_230002.log kernel-pyspark-20170105_010001.log kernel-pyspark-20170105_020...
Chris Snow
1

votes
3

answer
648

Views

No FileSystem for scheme: cos

I'm trying to connect to IBM Cloud Object Storage from IBM Data Science Experience: access_key = 'XXX' secret_key = 'XXX' bucket = 'mybucket' host = 'lon.ibmselect.objstor.com' service = 'mycos' sqlCxt = SQLContext(sc) hconf = sc._jsc.hadoopConfiguration() hconf.set('fs.cos.myCos.access.key', acces...
Chris Snow
1

votes
1

answer
583

Views

matplotlib - ImportError: No module named _tkinter

I have a simple notebook with the following code: %matplotlib inline However, when running it I get the following error: ImportError: No module named _tkinter I have another notebook in the same project, and that one is able to run the statement without issue. The data science experience is a manage...
Chris Snow
1

votes
2

answer
69

Views

unable to save changes in jupyter notebook on DSX

Occasionally, I'm unable to save changes to my notebook in DSX. I believe this is because my session has timed out. How can I prevent my changes from being lost?
Chris Snow
1

votes
2

answer
79

Views

Spark-cloudant package 1.6.4 loaded by %AddJar does not get used by notebook

I'm trying to use the latest spark-cloudant package with a notebook: %AddJar -f https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/cloudant-spark-v1.6.4-167.jar Which outputs: Starting download from https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/clouda...
Chris Snow
1

votes
1

answer
44

Views

how to remove yourself from a DSX project?

It is possible for users to add you to a DSX project, however, after a while you can end up belonging to a lot of other projects. How can you remove yourself from another project? Is it possible to do this, or do you need to contact the project owner and ask them to remove you? See also: https://da...
Chris Snow
1

votes
1

answer
148

Views

How to add spark packages to Spark R notebook on DSX?

The spark documentation shows how a spark package can be added: sparkR.session(sparkPackages = 'com.databricks:spark-avro_2.11:3.0.0') I believe this can only be used when initialising the session. How can we add spark packages for SparkR using a notebook on DSX?
Chris Snow
1

votes
1

answer
86

Views

How to handle input file with non standard delimitation in dsx ml pipeline?

I'm trying to work with a data set that has no header and has :: for field delimiters: ! wget --quiet http://files.grouplens.org/datasets/movielens/ml-1m.zip ! unzip ml-1m.zip ! mv ml-1m/ratings.dat . ! head ratings.dat The output: 1::1193::5::978300760 1::661::3::978302109 1::914::3::978301968 I h...
Chris Snow
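As a point of comparison outside the ML pipeline, pandas can parse the `::` delimiter directly when the Python parsing engine is used. This is only a sketch: it reads the three sample lines from the question out of a string rather than from `ratings.dat`, and the column names are an assumption based on the field order described in the MovieLens README.

```python
import io

import pandas as pd

# The three sample rows shown in the question.
sample = "1::1193::5::978300760\n1::661::3::978302109\n1::914::3::978301968\n"

ratings = pd.read_csv(
    io.StringIO(sample),
    sep="::",            # multi-character separator
    engine="python",     # the C engine does not support multi-character separators
    header=None,         # the file has no header row
    names=["user_id", "movie_id", "rating", "timestamp"],  # assumed names
)

print(ratings)
```

Passing a path such as `ratings.dat` in place of the `StringIO` object would read the full file the same way.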
1

votes
1

answer
373

Views

How do I access files on Bluemix Object Storage from Python on Data Science Experience?

I'd like to copy a file to local memory on DSX so that I can create a pandas dataframe using read_csv. I don't want to use the given 'insert to code' option because that assumes column headers and it isn't as pretty to code. Here is my code: import swiftclient IBM_Objectstorage_Connection = swiftc...
Ross Lewis
1

votes
1

answer
355

Views

AssertionError: Multiple .dist-info directories on Data Science Experience

In a Python 3.5 notebook, backed by an Apache Spark service, I had installed BigDL 0.2 using pip. When removing that installation and trying to install version 0.3 of BigDL, I get this error: (linebreaks added for readability) AssertionError: Multiple .dist-info directories: /gpfs/fs01/user/scbc-4db...
Roland Weber
1

votes
1

answer
22

Views

Pandas Getting each business day of a year by using date range function

I am trying to get all business days of a year by using the pandas date_range function. But I am missing some necessary parameters to get my desired result. pd.date_range('2015-01-01', '2015-12-31', freq='D') DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05', '2015-01-06'...
Lucky
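The call in the question uses `freq='D'`, which yields every calendar day. A minimal sketch of the business-day variant uses the `'B'` frequency alias instead, which keeps Monday through Friday only.

```python
import pandas as pd

# 'B' is the pandas alias for business-day frequency (Monday to Friday).
business_days = pd.date_range("2015-01-01", "2015-12-31", freq="B")

print(len(business_days))
print(business_days[:5])
```

The `'B'` alias does not know about public holidays, so 2015-01-01 is still included. If holidays need to be excluded, `pandas.bdate_range` with `freq='C'` and a `holidays` list is one option.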
1

votes
2

answer
50

Views

How to find average after sorting month column in python

I have a challenge in front of me in Python.

| Growth_rate | Month |
| ------------ |-------|
| 0 | 1 |
| -2 | 1 |
| 1.2 | 1 |
| 0.3 | 2 |
| -0.1 | 2 |
| 7 | 2 |
| 9 | 3 |
| 4.1 | 3...
Parth Dhir
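Averaging a value per month is a natural fit for a pandas `groupby`. The sketch below uses only the rows visible in the excerpt (the table is cut off after the second row for month 3), so the month 3 figure in particular is based on incomplete data.

```python
import pandas as pd

# Rows visible in the question's table; the original continues past this point.
growth = pd.DataFrame(
    {
        "Growth_rate": [0, -2, 1.2, 0.3, -0.1, 7, 9, 4.1],
        "Month": [1, 1, 1, 2, 2, 2, 3, 3],
    }
)

# Sort by month, then take the mean growth rate within each month.
monthly_average = growth.sort_values("Month").groupby("Month")["Growth_rate"].mean()

print(monthly_average)
```

The explicit sort is optional here, since `groupby` returns groups in sorted key order by default.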
2

votes
2

answer
186

Views

How do I load data from a StreamingBody object using Insert to Code to pandas in Watson Studio?

The Insert to Code feature enables you to access data stored in Cloud Object Storage when working in Jupyter notebooks in Watson Studio. Some file types (e.g. txt files) will have just StreamingBody and Credentials as insert to code options: How can I use the StreamingBody object to access my data?
Joe Plumb
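One simple approach is to read the whole `StreamingBody` into memory and hand the bytes to pandas through `io.BytesIO`. This is a sketch, not the code that Insert to Code generates: the helper name is hypothetical, and a small stand-in class replaces the real object so the example runs without Cloud Object Storage credentials. The only method relied on is `read()`, which botocore's `StreamingBody` provides.

```python
import io

import pandas as pd


def streaming_body_to_dataframe(streaming_body, **read_csv_kwargs):
    """Read all bytes from a StreamingBody-like object and parse them with pandas."""
    raw_bytes = streaming_body.read()
    return pd.read_csv(io.BytesIO(raw_bytes), **read_csv_kwargs)


class FakeStreamingBody:
    """Stand-in for botocore's StreamingBody; exposes only read()."""

    def __init__(self, payload):
        self._payload = payload

    def read(self):
        return self._payload


# Hypothetical tab-separated text file contents.
body = FakeStreamingBody(b"name\tscore\nada\t1\nlin\t2\n")
frame = streaming_body_to_dataframe(body, sep="\t")
print(frame)
```

In a notebook, the object returned by the generated code would be passed in place of `FakeStreamingBody`. Reading everything at once is fine for small files; very large objects may need chunked reads instead.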
3

votes
2

answer
1.4k

Views

how to use the “display” function in a scala 2.11 with Spark 2.0 notebook in dsx

In dsx is there a way to use 'display' in a scala 2.11 with Spark 2.0 notebook (I know it can be done in a python notebook with pixiedust). Eg: display(spark.sql('SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC')) But I want to do the s...
Vik M
1

votes
2

answer
123

Views

Brunel 2.3 TypeError after installation via pip

I am using the Brunel visualisation package for Python 2 on IBM Data Science Experience. After I installed the latest version of brunel via !pip install brunel==2.3, I get the following error upon the first usage: TypeError: Package org.brunel.util.D3Integration.getDatasetNames is not Callable What...
Sven Hafeneger
2

votes
3

answer
424

Views

Netezza Drivers not available in Spark (Python Notebook) in DataScienceExperience

I have project code in a Python notebook, and it ran fine when Spark was hosted in Bluemix. We are running the following code to connect to Netezza (on premises) which worked fine in Bluemix. VT = sqlContext.read.format('jdbc').options(url='jdbc:netezza://169.54.xxx.x:xxxx/BACC_PRD_ISCNZ_GAPNZ'...
Sagar KSK
1

votes
2

answer
456

Views

How to connect to Cloudant/CouchDB using SparkSQL in Data Science Experience?

Formerly, CouchDB was supported via the Cloudant connector: https://github.com/cloudant-labs/spark-cloudant But this project states that it is no longer active and that it moved to Apache Bahir: http://bahir.apache.org/docs/spark/2.1.1/spark-sql-cloudant/ So I've installed the JAR in a Scala notebook...
Romeo Kienzler
2

votes
1

answer
1.1k

Views

Write CSV to IBM Bluemix Object Storage from DSX Python 2.7 notebook

I am trying to write a pandas dataframe as CSV to Bluemix Object Storage from a DSX Python notebook. I first save the dataframe to a 'local' CSV file. I then have a routine that attempts to write the file to Object Storage. I get a 413 response - object too large. The file is only about 3MB. He...
Ted Morris
