Questions tagged [data-science-experience]
60 questions
0 votes · 0 answers · 3 views
After installing Docker 2.0.0.3 and IBM DSX, getting an error on Windows 10
I'm getting an issue while installing IBM DSX on Windows 10; some errors appear after installation.
1 vote · 2 answers · 496 views
ValueError: Invalid endpoint: s3-api.xxxx.objectstorage.service.networklayer.com
I'm trying to access a csv file in my Watson Data Platform catalog. I used the code generation functionality from my DSX notebook: Insert to code > Insert StreamingBody object.
The generated code was:
import os
import types
import pandas as pd
import boto3
def __iter__(self): return 0
# @hidden_ce...
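A minimal sketch of the boto3 call, assuming the cause is that endpoint_url must be a full URL (the bare hostname from the error message triggers "Invalid endpoint"); all credentials and bucket/key names below are placeholders:
import boto3
import pandas as pd
from io import BytesIO
# Assumption: prefixing the endpoint with https:// avoids the "Invalid endpoint" ValueError.
cos = boto3.client('s3',
                   aws_access_key_id='XXX',        # placeholder credentials
                   aws_secret_access_key='XXX',
                   endpoint_url='https://s3-api.xxxx.objectstorage.service.networklayer.com')
body = cos.get_object(Bucket='mybucket', Key='myfile.csv')['Body']   # placeholder names
df = pd.read_csv(BytesIO(body.read()))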
1 vote · 0 answers · 21 views
DSX desktop install NOT working (on x86 laptop)
I have tried multiple times, and the DSX desktop install does not work.
I am trying to install on a Win7 laptop.
I have selected Docker, Jupyter with Spark (around 6.6 GB), but it always ends up installing Docker and then hangs (as in the progress bar does not proceed further and is stuck at 25% for a LONG...
1 vote · 1 answer · 152 views
Referring to parent attribute in pandas
This is my JSON:
{
  "fInstructions": [
    {
      "id": 155,
      "type": "finstruction",
      "ref": "/spm/finstruction/155",
      "iLineItem": [
        {
          "id": 156,
          "type": "ilineitem",
          "ref": "/spm/ilineitem/156",
          "creationDate": "2018-03-09",
          "dueDate": "2018-02-01",
          "effectiveDate": "2018-03-09",
          "frequency": "01",
          "coveredPeriodFro...
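A hedged sketch, assuming the goal is to flatten the iLineItem records while keeping a column from the parent fInstructions entry; pd.json_normalize's meta argument is one way to refer to a parent attribute (older pandas versions expose it as pandas.io.json.json_normalize):
import pandas as pd
# 'doc' stands for the JSON above, already parsed into a Python dict.
line_items = pd.json_normalize(
    doc,
    record_path=['fInstructions', 'iLineItem'],   # rows come from the nested line items
    meta=[['fInstructions', 'id']])               # ...but each row keeps the parent instruction id
print(line_items.head())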
1 vote · 0 answers · 184 views
Adding Brunel to DSX(Toree)
I am using IBM's Data Science Experience (DSX) and want to show some visualizations with Scala.
According to datascience - visualization, I have 3 options: PixieDust, Brunel and Lightning.
As far as I saw, Lightning is a WIP and requires a server(?) to run the visualizations, and PixieDust's support of Sc...
1 vote · 1 answer · 127 views
How can I apply Normalized Mean Absolute Error for my model accuracy in a movie ratings recommendation system?
Hello, it would be very helpful if someone could help me out with NMAE (Normalized Mean Absolute Error) to find the accuracy of the model:
NMAE = Σ|predicted rating − real rating| / (n × (max rate − min rate))
I have given an example of how my model outputs the data set:
I have been using R progra...
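The asker is working in R, but as a quick illustration of the formula above, a minimal Python sketch with made-up ratings on a 1-5 scale:
# NMAE = mean absolute error divided by the rating range.
def nmae(predicted, real, max_rate=5, min_rate=1):
    mae = sum(abs(p - r) for p, r in zip(predicted, real)) / len(real)
    return mae / (max_rate - min_rate)
print(nmae([4, 3, 5], [5, 3, 4]))   # |4-5| + |3-3| + |5-4| = 2; (2/3) / 4 ≈ 0.17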
1 vote · 0 answers · 93 views
Unable to connect to Cloud Object Storage instance from IBM Watson Studio
I am trying to connect to COS from IBM Watson Studio but I get an error...
When I hit enter I get the following error:
Unable to find products data_catalog or data_science_experience in the
entitlements response for account id: 51373fa1b8bf36fd9d78574d19af0d11.
1 vote · 0 answers · 35 views
IBM Watson Studio Installation Problem Stuck at “Pre-install script timeout, trying again”
I am installing Watson Studio 1.2.2 on a 5-node environment.
However, the installation is stuck at
'Testing the connection to [email protected] 1 '192.168.123.130' '
'Pre-install script timeout, trying again'
For your reference:
I installed Watson Studio on the 5-node environment (3 Control/Sto...
1 vote · 0 answers · 33 views
Predicting churn of customer using Machine Learning with lag
I have monthly time-series data for 5000 customers, which looks like this:
This is my first time dealing with time series data. Can someone explain some strategies for predicting churn probability (3 months, 6 months) in advance?
I am confused because, for every customer, the churning probability 3 or 6...
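One common strategy is to build lagged features per customer plus a label that flags churn within the next k months. A hedged pandas sketch with made-up columns (customer_id, month, usage, churned), not the asker's actual data:
import pandas as pd
df = pd.DataFrame({'customer_id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'month':       [1, 2, 3, 4, 1, 2, 3, 4],
                   'usage':       [10, 8, 5, 0, 20, 22, 21, 19],
                   'churned':     [0, 0, 0, 1, 0, 0, 0, 0]})
df = df.sort_values(['customer_id', 'month'])
# Feature: last month's usage for the same customer.
df['usage_lag1'] = df.groupby('customer_id')['usage'].shift(1)
# Label: does the customer churn within the next 3 months?
df['churn_next_3m'] = (df.groupby('customer_id')['churned']
                         .transform(lambda s: (s.shift(-1).fillna(0)
                                               + s.shift(-2).fillna(0)
                                               + s.shift(-3).fillna(0)).clip(upper=1)))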
0 votes · 0 answers · 6 views
Is there a way to calculate the correlation coefficient between the binary variables a and b?
So there are two variables:
a -- whether the person is older than 40 (binary 0 or 1)
b -- whether they have a luxury car (binary 0 or 1)
Now they have the summed data values:
Total sample size -- 500
Total number of people above 40 -- 60
Total number of peopl...
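For what it's worth, the Pearson correlation of two 0/1 variables is the phi coefficient, which can be computed straight from the 2x2 counts. A sketch with illustrative counts (only the total of 500 and the 60 people over 40 come from the question):
import math
# Hypothetical 2x2 counts: n11 = a=1 and b=1, n10 = a=1 and b=0, etc.
n11, n10, n01, n00 = 40, 20, 60, 380          # rows for a=1 sum to 60, total is 500
n1_, n0_ = n11 + n10, n01 + n00               # totals for a = 1 and a = 0
n_1, n_0 = n11 + n01, n10 + n00               # totals for b = 1 and b = 0
phi = (n11 * n00 - n10 * n01) / math.sqrt(n1_ * n0_ * n_1 * n_0)
print(phi)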
1 vote · 1 answer · 67 views
Is it possible for a Spark job on Bluemix to see a list of the other processes on the operating system?
A common approach for connecting to third party systems from spark is to provide the credentials for the systems as arguments to the spark script. However, this raises some questions about security. E.g. See this question Bluemix spark-submit -- How to secure credentials needed by my Scala jar
Is...
1 vote · 2 answers · 78 views
Converting sensor tag data in DSX
I'm working on converting the existing recipe for Data Science Experience (DSX) to use data from a connected Sensor Tag device. However the mobile applications for that device send the data as strings rather than numerics - this is causing the DSX recipe that calculates a Z score to choke. The data...
1 vote · 1 answer · 36 views
How can I migrate a DSX Notebook to Spark 2.0?
It's currently tied to Spark 1.6, but I want to use SparkSession, among other new features in Spark. How can I do the migration without copying every cell to a new notebook?
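Once the notebook is attached to a Spark 2.0 kernel, the new entry point is available; a minimal sketch, with a hypothetical file name for illustration:
from pyspark.sql import SparkSession
# In a Spark 2.x kernel the 'spark' session is usually pre-created; getOrCreate()
# returns it (or builds one) without breaking the existing sc.
spark = SparkSession.builder.getOrCreate()
df = spark.read.json('some_file.json')    # hypothetical file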
1 vote · 1 answer · 316 views
sc is not created automatically in notebook
A notebook I created yesterday in DSX has stopped working, with errors about not finding the sc object:
'NameError: global name 'sc' is not defined'
I restarted the kernel but can't get sc created. I have no other kernels running.
I created a new notebook - Spark 2.0 with Python 2 and literally nothing in it...
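A hedged workaround sketch: if the pre-configured sc is missing, it can usually be (re)created in a Python/Spark 2.0 kernel like this:
from pyspark import SparkContext
from pyspark.sql import SparkSession
sc = SparkContext.getOrCreate()              # reuse the existing context if there is one
spark = SparkSession.builder.getOrCreate()
print(sc.version)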
1 vote · 1 answer · 129 views
Not able to install the DTW package in DSX's R notebook
I am trying to install the dtw package for R in DSX's R notebook.
When running the install command:
install.packages('dtw')
it gives the following error:
"installation of package 'dtw' had non-zero exit status" warning.
1 vote · 1 answer · 248 views
How do I implement the TensorFrames Spark package on Data Science Experience?
I've been able to import the package:
import pixiedust
pixiedust.installPackage('databricks:tensorframes:0')
But when I try a simple example:
import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row
data = [Row(x=[float(x), float(2 * x)],
key=str(x % 2),
z = float(x+1)) for x i...
1 vote · 2 answers · 815 views
HDF5 dataset from MATLAB to Pandas DataFrame in Python
I have .mat files with HDF5 data and I want to load them into Python (pandas DataFrame). I can load the file:
f2 = h5py.File('file.mat')
f2['data']
which is an HDF5 dataset:
If I read it with Pandas:
g = pd.read_hdf('file.mat','data')
I get the following error:
cannot create a storer if the object is...
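pd.read_hdf only understands HDF5 files written by pandas/PyTables, so a MATLAB v7.3 file has to be read with h5py and wrapped manually. A sketch, assuming 'data' is a plain 2-D numeric dataset:
import h5py
import pandas as pd
with h5py.File('file.mat', 'r') as f2:
    arr = f2['data'][()]         # read the whole dataset into a NumPy array
df = pd.DataFrame(arr.T)         # MATLAB stores arrays column-major, so they often need transposing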
1 vote · 2 answers · 391 views
Import a zip file to Python Notebook in IBM Data Science Experience(IBM DSX)
I have a zip file, train.zip (1.1 GB), which I wanted to import into a Python notebook, unzip, and then set out to work on it. I imported it as a StringIO object using the option Insert StringIO Object.
from io import StringIO
import requests
import json
import pandas as pd
# @hidden_cell
# This func...
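A hedged sketch of one alternative: a binary zip cannot round-trip through StringIO, so fetch it as bytes and open it with zipfile via BytesIO. The URL is a placeholder, not the asker's source:
import io
import zipfile
import requests
resp = requests.get('https://example.com/train.zip')    # hypothetical download location
archive = zipfile.ZipFile(io.BytesIO(resp.content))
print(archive.namelist())
archive.extractall('train_data')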
1 vote · 1 answer · 61 views
Unable to run yum command on IBM DSX notebook
I am unable to run the yum command in the DSX environment. I need yum command access to install some packages.
Here's the error I am seeing when I type the '!yum install sox' command in a DSX notebook:
Could not find platform independent libraries
Could not find platform dependent libraries
Consider setting...
1 vote · 1 answer · 54 views
Unable to start Scala 2.11 with Spark 2.0 in IBM DSX notebook
When attempting to start any notebook on IBM DSX with the Scala 2.11/Spark 2.0 kernel, I get the following error:
Dead kernel The kernel has died, and the automatic restart has failed.
It is possible the kernel cannot be restarted. If you are not able to
restart the kernel, you will still be able to...
1 vote · 1 answer · 100 views
How can I handle a lot of data with timestamps in ArangoDB?
I am new to handling a lot of data.
Every 100 ms I currently write 4 JSON blocks to a collection in my ArangoDB.
The content of the JSON is something like this:
{
  "maintenence": {
    "holder_1": 1,
    "holder_2": 0,
    "holder_3": 0,
    "holder_4": 0,
    "holder_5": 0,
    "holder_6": 0
  },
  "error": 274,
  "pos": {
    "left"...
1 vote · 2 answers · 192 views
Cannot install the CRAN packages "viridis" and "Hmisc" in IBM DSX R Environment notebooks
I am trying to install the CRAN Hmisc package in an R Environment notebook on IBM DSX, but it repeatedly fails with the following error:
install.packages('Hmisc')
also installing the dependencies ‘checkmate’, ‘rstudioapi’, ‘Formula’, ‘latticeExtra’, ‘acepack’, ‘gridExtra’, ...
1 vote · 1 answer · 57 views
DSX notification if scheduled notebook does not run?
I'm trying to troubleshoot an hourly scheduled notebook as per this question:
How to troubleshoot a DSX scheduled notebook?
When listing the kernel logs I noticed at 3am the notebook was not scheduled:
kernel-pyspark-20170104_230002.log
kernel-pyspark-20170105_010001.log
kernel-pyspark-20170105_020...
1 vote · 3 answers · 648 views
No FileSystem for scheme: cos
I'm trying to connect to IBM Cloud Object Storage from IBM Data Science Experience:
access_key = 'XXX'
secret_key = 'XXX'
bucket = 'mybucket'
host = 'lon.ibmselect.objstor.com'
service = 'mycos'
sqlCxt = SQLContext(sc)
hconf = sc._jsc.hadoopConfiguration()
hconf.set('fs.cos.myCos.access.key', acces...
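A hedged sketch of the Stocator settings that usually have to accompany the keys above before a cos:// URL resolves; the service name 'myCos' matches the configuration in the question, and the exact property names should be checked against the Stocator documentation:
hconf = sc._jsc.hadoopConfiguration()
hconf.set('fs.stocator.scheme.list', 'cos')
hconf.set('fs.cos.impl', 'com.ibm.stocator.fs.ObjectStoreFileSystem')
hconf.set('fs.stocator.cos.impl', 'com.ibm.stocator.fs.cos.COSAPIClient')
hconf.set('fs.stocator.cos.scheme', 'cos')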
1 vote · 1 answer · 583 views
matplotlib - ImportError: No module named _tkinter
I have a simple notebook with the following code:
%matplotlib inline
However, when running it I get the following error:
ImportError: No module named _tkinter
I have another notebook in the same project, and that one is able to run the statement without issue.
The data science experience is a manage...
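One common workaround (a sketch, not necessarily the root cause here) is to force a non-interactive backend before pyplot is imported, so matplotlib never looks for Tk:
import matplotlib
matplotlib.use('Agg')            # select a backend that does not require _tkinter
import matplotlib.pyplot as plt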
1 vote · 2 answers · 69 views
Unable to save changes in Jupyter notebook on DSX
Occasionally, I'm unable to save changes to my notebook in DSX. I believe this is because my session has timed out.
How can I prevent my changes from being lost?
1 vote · 2 answers · 79 views
Spark-cloudant package 1.6.4 loaded by %AddJar does not get used by notebook
I'm trying to use the latest spark-cloudant package with a notebook:
%AddJar -f https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/cloudant-spark-v1.6.4-167.jar
Which outputs:
Starting download from https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/clouda...
1 vote · 1 answer · 44 views
How to remove yourself from a DSX project?
It is possible for users to add you to a DSX project, however, after a while you can end up belonging to a lot of other projects.
How can you remove yourself from another project? Is it possible to do this, or do you need to contact the project owner and ask them to remove you?
See also: https://da...
1 vote · 1 answer · 148 views
How to add spark packages to Spark R notebook on DSX?
The spark documentation shows how a spark package can be added:
sparkR.session(sparkPackages = 'com.databricks:spark-avro_2.11:3.0.0')
I believe this can only be used when initialising the session.
How can we add spark packages for SparkR using a notebook on DSX?
1 vote · 1 answer · 86 views
How to handle an input file with non-standard delimiters in the DSX ML pipeline?
I'm trying to work with a data set that has no header and uses :: as the field delimiter:
! wget --quiet http://files.grouplens.org/datasets/movielens/ml-1m.zip
! unzip ml-1m.zip
! mv ml-1m/ratings.dat .
! head ratings.dat
The output:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
I h...
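For the parsing step at least, pandas can split on the two-character '::' delimiter when the Python engine is used; a sketch with column names taken from the MovieLens 1M README:
import pandas as pd
ratings = pd.read_csv('ratings.dat', sep='::', engine='python',
                      names=['user_id', 'movie_id', 'rating', 'timestamp'])
print(ratings.head())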
1 vote · 1 answer · 373 views
How do I access files on Bluemix Object Storage from Python on Data Science Experience?
I'd like to copy a file to local memory on DSX so that I can create a pandas dataframe using read_csv. I don't want to use the given 'insert to code' option because that assumes column headers and it isn't as pretty to code. Here is my code:
import swiftclient
IBM_Objectstorage_Connection = swiftc...
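A hedged sketch of the swiftclient route the question is heading towards; every credential value below is a placeholder taken from the Object Storage service credentials, and the container/file names are made up:
import io
import swiftclient
import pandas as pd
conn = swiftclient.Connection(key='XXX',                  # password from the service credentials
                              authurl='https://identity.open.softlayer.com/v3',
                              auth_version='3',
                              os_options={'project_id': 'XXX',
                                          'user_id': 'XXX',
                                          'region_name': 'dallas'})
_, contents = conn.get_object('mycontainer', 'myfile.csv')  # returns (headers, bytes)
df = pd.read_csv(io.BytesIO(contents))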
1 vote · 1 answer · 355 views
AssertionError: Multiple .dist-info directories on Data Science Experience
In a Python 3.5 notebook, backed by an Apache Spark service, I had installed BigDL 0.2 using pip. When removing that installation and trying to install version 0.3 of BigDL, I get this error: (linebreaks added for readability)
AssertionError: Multiple .dist-info directories:
/gpfs/fs01/user/scbc-4db...
1 vote · 1 answer · 22 views
Pandas: getting each business day of a year by using the date_range function
I am trying to get all business days of a year by using the pandas date_range function, but I am missing some necessary parameters to get my desired result.
pd.date_range('2015-01-01', '2015-12-31', freq='D')
DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
'2015-01-05', '2015-01-06'...
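The missing piece is the frequency: freq='B' restricts date_range to weekdays (it does not know about public holidays; a CustomBusinessDay with a holiday calendar would be needed for that). A short sketch:
import pandas as pd
bdays = pd.date_range('2015-01-01', '2015-12-31', freq='B')   # Monday to Friday only
print(len(bdays))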
1 vote · 2 answers · 50 views
How to find average after sorting month column in python
I have a challenge in front of me in Python.
| Growth_rate | Month |
| ------------ |-------|
| 0 | 1 |
| -2 | 1 |
| 1.2 | 1 |
| 0.3 | 2 |
| -0.1 | 2 |
| 7 | 2 |
| 9 | 3 |
| 4.1 | 3...
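A minimal sketch of the table above as a DataFrame, sorted by month with the average growth rate computed per month:
import pandas as pd
df = pd.DataFrame({'Growth_rate': [0, -2, 1.2, 0.3, -0.1, 7, 9, 4.1],
                   'Month':       [1,  1, 1,   2,   2,    2, 3, 3]})
monthly_avg = df.sort_values('Month').groupby('Month')['Growth_rate'].mean()
print(monthly_avg)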
2 votes · 2 answers · 186 views
How do I load data from a StreamingBody object using Insert to Code to pandas in Watson Studio?
The Insert to Code feature enables you to access data stored in Cloud Object Storage when working in Jupyter notebooks in Watson Studio. Some file types (e.g. txt files) will have just StreamingBody and Credentials as insert to code options:
How can I use the StreamingBody object to access my data?
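A hedged sketch: a botocore StreamingBody can be read into bytes and then either decoded as text or wrapped in BytesIO for pandas. 'streaming_body' stands for the variable produced by Insert to Code:
import io
import pandas as pd
raw_bytes = streaming_body.read()          # StreamingBody -> bytes
text = raw_bytes.decode('utf-8')           # use this if you just need the text
df = pd.read_csv(io.BytesIO(raw_bytes))    # or hand it to pandas as a file-like object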
3 votes · 2 answers · 1.4k views
How to use the "display" function in a Scala 2.11 with Spark 2.0 notebook in DSX
In DSX, is there a way to use 'display' in a Scala 2.11 with Spark 2.0 notebook? (I know it can be done in a Python notebook with PixieDust.) E.g.:
display(spark.sql('SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table
WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC'))
But I want to do the s...
1 vote · 2 answers · 123 views
Brunel 2.3 TypeError after installation via pip
I am using the Brunel visualisation package for Python 2 on IBM Data Science Experience.
After I installed the latest version of brunel via !pip install brunel==2.3, I get the following error upon the first usage:
TypeError: Package org.brunel.util.D3Integration.getDatasetNames is not Callable
What...
2 votes · 3 answers · 424 views
Netezza Drivers not available in Spark (Python Notebook) in DataScienceExperience
I have project code in a Python notebook and it all ran fine when Spark was hosted in Bluemix.
We are running the following code to connect to Netezza (on premises) which worked fine in Bluemix.
VT = sqlContext.read.format('jdbc').options(url='jdbc:netezza://169.54.xxx.x:xxxx/BACC_PRD_ISCNZ_GAPNZ'...
1 vote · 2 answers · 456 views
How to connect to Cloudant/CouchDB using SparkSQL in Data Science Experience?
Formerly, CouchDB was supported via the Cloudant connector:
https://github.com/cloudant-labs/spark-cloudant
But this project states that it is no longer active and that it moved to Apache Bahir:
http://bahir.apache.org/docs/spark/2.1.1/spark-sql-cloudant/
So I've installed the JAR in a Scala notebook...
2 votes · 1 answer · 1.1k views
Write CSV to IBM Bluemix Object Storage from a DSX Python 2.7 notebook
I am trying to write a pandas dataframe as CSV to Bluemix Object Storage from a DSX Python notebook. I first save the dataframe to a 'local' CSV file. I then have a routine that attempts to write the file to Object Storage. I get a 413 response - object too large. The file is only about 3MB. He...
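A hedged sketch of writing the CSV as an in-memory string with python-swiftclient, assuming 'conn' is an already-authenticated swiftclient.Connection; the container and object names are placeholders, and this alone does not explain the 413 response:
csv_data = df.to_csv(index=False)                 # keep the CSV in memory instead of a local file
conn.put_object('mycontainer', 'results.csv',
                contents=csv_data,
                content_type='text/csv')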