Questions tagged [data-science]

0

votes
0

answer
3

Views

Error on Windows 10 after installing Docker 2.0.0.3 and IBM DSX

I am having trouble installing IBM DSX on Windows 10: an error appears after installation.
Venkatesh
1

votes
1

answer
40

Views

Pandas: group by and aggregate data in other columns

I have data with four columns: Id, CreationDate, Score and ViewCount. CreationDate has the following format, for example: 2011-11-30 19:41:14.960. I need to group by the year of CreationDate, count the rows, also sum Score and ViewCount, and add the results as additional columns. I want to use wi...
morris
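A minimal sketch of one way to do the grouping described above, assuming pandas 0.25 or later for named aggregation. The sample rows and the output column names `Year` and `Count` are made up for illustration; only the four input column names and the date format come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names and date format are from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "CreationDate": [
        "2011-11-30 19:41:14.960",
        "2011-12-01 08:00:00.000",
        "2012-01-15 10:30:00.500",
        "2012-06-20 23:59:59.999",
    ],
    "Score": [5, 3, 10, 2],
    "ViewCount": [100, 50, 400, 25],
})

# Parse the strings so the year can be extracted with the .dt accessor.
df["CreationDate"] = pd.to_datetime(df["CreationDate"])

# Group by year, then count rows and sum Score and ViewCount in one pass.
summary = (
    df.groupby(df["CreationDate"].dt.year.rename("Year"))
      .agg(Count=("Id", "count"), Score=("Score", "sum"), ViewCount=("ViewCount", "sum"))
      .reset_index()
)
print(summary)
```

The result has one row per year, with the count and the two sums as ordinary columns.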
1

votes
1

answer
47

Views

How to calculate the steepness of a trend in python

I am using the regression slope as follows to calculate the steepness (slope) of the trend. Scenario 1: For example, consider I am using sales figures (x-axis: 1, 4, 6, 8, 10, 15) for 6 days (y-axis). from sklearn.linear_model import LinearRegression regressor = LinearRegression() X = [[1], [4], [6]...
Emi
1

votes
2

answer
77

Views

Resample pandas dataframe and interpolate missing values for timeseries data

I need to resample timeseries data and interpolate missing values in 15 min intervals over the course of an hour. Each ID should have four rows of data per hour. In: ID Time Value 1 1/1/2019 12:17 3 1 1/1/2019 12:44 2 2 1/1/2019 12:02 5 2 1/1/2019 12:28 7 Out:...
primo7
1

votes
1

answer
61

Views

Count of duplicates of a list in Pandas Dataframe by group

I have a Dataframe that currently looks like this: image source label bookshelf A [flora, jar, plant] bookshelf B [indoor, shelf, wall] bookshelf C [furniture, shelf, shelving] cactus A...
alexcu
1

votes
3

answer
45

Views

Concepts to measure text “relevancy” to a subject?

I do side work writing/improving a research project web application for some political scientists. This application collects articles pertaining to the U.S. Supreme Court and runs analysis on them, and after nearly a year and a half, we have a database of around 10,000 articles (and growing) to work w...
ecole96
0

votes
0

answer
6

Views

How to handle exceptions in PySpark when the data is improperly ordered?

I am creating a small RDD from some unordered data that does not have the same number of columns in each row, so I am treating each row as a tuple indexed up to the maximum line index. The problem is that when I access tuple[4], tuple[9] and so on, some rows do not have an index 9, so in...
jodu
0

votes
1

answer
14

Views

Cannot split a large .txt file into train, test and validation parts for Deep Text Corrector

I have a single large .txt file and I want to split it into train, test and validation sets. Below are the lines of code where I want to use those files. I have no idea how to do it. python correct_text.py --train_path /movie_dialog_train.txt \ --val_path /movie_dialog_val.txt...
SRajput
-1

votes
0

answer
5

Views

What type of machine learning or AI Model can I use for Factor Ranking

What type of machine learning or AI model can I use for factor ranking? I have some factors and am trying to rank them by how well they predict in my model. What kind of machine learning, AI or deep learning model works for this?
tplshams
1

votes
2

answer
496

Views

ValueError: Invalid endpoint: s3-api.xxxx.objectstorage.service.networklayer.com

I'm trying to access a csv file in my Watson Data Platform catalog. I used the code generation functionality from my DSX notebook: Insert to code > Insert StreamingBody object. The generated code was: import os import types import pandas as pd import boto3 def __iter__(self): return 0 # @hidden_ce...
Chris Snow
1

votes
2

answer
498

Views

Embedding in Keras

Which algorithm is used for embedding in Keras built-in function? Word2vec? Glove? Other? https://keras.io/layers/embeddings/
oren_isp
1

votes
2

answer
42

Views

Error "too many values to unpack (expected 2)" while trying to iterate over two columns in a DataFrame

for L,M in laundry1['latitude'],laundry1['longitude']: print('latitude:-') print(L) print('longitude:-') print(M) I am trying to iterate over two columns of a data frame, assigning their values to L and M and printing them, but it raises the error 'too many values to unpack (expected 2)'. view...
Adarsh singh
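For what it's worth, `for L, M in a, b` loops over the two-element tuple `(a, b)` and tries to unpack each whole column into `L` and `M`, which is where the unpacking error comes from. A minimal sketch using `zip` to pair the values row by row, with made-up coordinates standing in for the real `laundry1` data:

```python
import pandas as pd

# Hypothetical stand-in for the laundry1 frame from the question.
laundry1 = pd.DataFrame({
    "latitude": [12.97, 28.61],
    "longitude": [77.59, 77.21],
})

pairs = []
# zip pairs up the two columns element by element, one (lat, lon) per row.
for lat, lon in zip(laundry1["latitude"], laundry1["longitude"]):
    pairs.append((lat, lon))
    print("latitude:", lat, "longitude:", lon)
```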
1

votes
3

answer
39

Views

Calculate the average of the rows for each group

I need to calculate the mean of a certain column in a DataFrame, so that the mean for each row is calculated excluding the previous values of the row for which it's calculated in a certain group. Let's assume we have this dataframe, and this is the expected output. Is there any way to, like, iterate each row b...
Hani Ihlayyle
1

votes
1

answer
34

Views

I want to create a new column in my data frame that is the crime rate of each specific row

I have a crime data set, and I have already calculated the crimes committed in each location. Now I want to create a new column that is the crime rate for that specific row. I have already calculated the crime rate; now I want to match each crime rate to the correct row with the same latitude value. Here I...
David Arriaga
0

votes
0

answer
8

Views

Filling cell data with mean for each unique name

I have been using R for the past couple of days and I have a question that I am a little stumped on. I have a dataframe with bidder names and bids where some of the bids are empty. I am having trouble implementing a dynamic way to take the average bid for each unique bidder and apply that to the empty ce...
NacDan
1

votes
1

answer
50

Views

Comparing columns of a dataset with python

I have a huge dataset (2653, 17). I have noticed two columns to be somewhat related but not exact as I have inferred from the value_counts method. What I mean is most of the corresponding entry of I is M, or of C is NaN. Is there any way to confirm this or calculate how many entries are related this...
deadcode
1

votes
0

answer
361

Views

Accessing the columns of pivot table in Python Pandas

I'm using a Python pandas pivot table. How can I access the columns of the pivot in a new data frame? KM_pivot_first = pd.pivot_table(read_sql_KM, values=['IMPRESSIONS','ENGAGEMENTS'],index='PLACEMENT_ID',aggfunc=np.sum) KM_data_summary = KM_pivot_first[['PLACEMENT_ID', 'IMPRESSIONS', 'ENGAGEMENTS']] error:...
dharmendra mishra
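The error text is cut off above, so this is only a guess at the cause: `pivot_table` moves the `index` column (`PLACEMENT_ID`) into the row index, so it is no longer selectable as a column. A minimal sketch, with made-up rows in place of the real SQL result, that turns the index back into a column with `reset_index()` before selecting:

```python
import pandas as pd

# Hypothetical stand-in for the read_sql_KM query result.
read_sql_KM = pd.DataFrame({
    "PLACEMENT_ID": [1, 1, 2],
    "IMPRESSIONS": [10, 20, 5],
    "ENGAGEMENTS": [1, 2, 3],
})

KM_pivot_first = pd.pivot_table(
    read_sql_KM,
    values=["IMPRESSIONS", "ENGAGEMENTS"],
    index="PLACEMENT_ID",
    aggfunc="sum",
)

# PLACEMENT_ID is the index here; reset_index() makes it a regular column again.
KM_data_summary = KM_pivot_first.reset_index()[
    ["PLACEMENT_ID", "IMPRESSIONS", "ENGAGEMENTS"]
]
print(KM_data_summary)
```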
1

votes
1

answer
30

Views

Changing Class in R for Column Name

I have found many helpful pages on how to change a class in R but all have seemed to not work for my task. Below is the code I'm using with output: > mydata = read.table('Books_R_Data.csv', header=TRUE,stringsAsFactors=TRUE,sep=',') > hist(mydata) Error in hist.default(mydata) : 'x' must be numeric...
LivinLife
1

votes
1

answer
404

Views

How to plot a subset of forecast in R?

I have a simple R script to create a forecast based on a file. Data has been recorded since 2014 but I am having trouble trying to accomplish below two goals: Plot only a subset of the forecast information (starting on 11/2017 onwards). Include month and year in a specific format (i.e. Jun 17). Here...
nsoria
1

votes
2

answer
361

Views

How to make polynomial features using sparse matrix in Scikit-learn

I am using Scikit-learn for converting my train data to polynomials features and then fit it to a linear model. model = Pipeline([('poly', PolynomialFeatures(degree=3)), ('linear', LinearRegression(fit_intercept=False))]) model.fit(X, y) But it throws an error TypeError: A sparse matrix was passed,...
Niyamat Ullah
1

votes
1

answer
62

Views

How to handle missing values in Python3?

A = ds.iloc[:,0:4].values B = ds.iloc[:,-1].values imp = Imputer(missing_values='NaN', strategy='mean', axis=0) imp = impsqft.fit(A[:,3]) A[:,3] = imp.transform(A[:,3]) I want to replace null values in the 4th column with the mean of that column, but it gives me the error below: array=[ 1. 2. nan 4. 1....
chetan sharma
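The error message is truncated above, so the cause is an assumption: scikit-learn imputers expect 2-D input, and `A[:,3]` is 1-D. A minimal sketch with `SimpleImputer` from `sklearn.impute`, which replaced the older `Imputer` class in recent scikit-learn releases. The array values are made up, and the column is sliced as `A[:, 3:4]` to keep it 2-D:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix; the 4th column has one missing value.
A = np.array([
    [1.0, 2.0, 3.0, 1.0],
    [4.0, 5.0, 6.0, np.nan],
    [7.0, 8.0, 9.0, 5.0],
])

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# A[:, 3:4] keeps the column two-dimensional, shape (n_rows, 1).
A[:, 3:4] = imp.fit_transform(A[:, 3:4])
print(A[:, 3])
```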
1

votes
0

answer
450

Views

Sentiment analysis on Twitter data fails with: Error in get_oauth_sig() : OAuth has not been registered for this session

> oauth_endpoint(authorize = 'https://api.twitter.com/oauth', access= 'https://api.twitter.com/oauth/access_tocken' ) download.file(url='http://curl.haxx.se/ca/cacert.pem', destfile='cacert.pem') trying URL 'http://curl.haxx.se/ca/cacert.pem' Content type 'application/x-pem-file' le...
santhosh kumar
1

votes
0

answer
32

Views

Extracting tabular data from a PDF file that has text, images and tabular data

The PDF file has text as well as tabular data. Is there any way to tell whether the current page of the PDF contains tables or not? I am able to extract data from the PDF page but can't confirm whether it is tabular data or verbose (paragraph) text.
Lazarus
0

votes
0

answer
16

Views

How can I write a loop in Python to get the difference between the first and last date for one id

opptyId field oldValue newValue updateTime 0 Stage Qualify 2014-05-27T18:50:14 0 Forecast Best Case 2014-05-27T18:50:14 0 created 2014-05-27T18:50:14 0 Amount 795.53 2014-06-17T18:54:00 0 Stage Qualify Closed - Won 2014-07-09T20:11:05 0 Forecast...
bella
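A loop may not be needed here. A minimal sketch using a pandas `groupby`, keeping only the `opptyId` and `updateTime` columns: the three timestamps for id 0 are from the excerpt above, while the rows for id 1 are invented so there is a second group.

```python
import pandas as pd

# Timestamps for opptyId 0 are from the question; opptyId 1 rows are hypothetical.
df = pd.DataFrame({
    "opptyId": [0, 0, 0, 1, 1],
    "updateTime": [
        "2014-05-27T18:50:14",
        "2014-06-17T18:54:00",
        "2014-07-09T20:11:05",
        "2014-01-01T00:00:00",
        "2014-01-03T12:00:00",
    ],
})
df["updateTime"] = pd.to_datetime(df["updateTime"])

# Latest minus earliest timestamp per id gives a Timedelta for each group.
grouped = df.groupby("opptyId")["updateTime"]
span = grouped.max() - grouped.min()
print(span)
```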
1

votes
2

answer
448

Views

Visualizing clusters using TSNE

I have a dataset which I need to cluster and display in such a way that elements in the same cluster appear closer together. The dataset is based on a research study, and has around 16 rows (entries) and about 50 features. I do agree that it's not an ideal dataset to begin with, but unfortuna...
Shreya Pandit
1

votes
1

answer
61

Views

Why am I getting different answers when both are the same?

When I try to fetch latitude and longitude using the geocode function in the ggmap library, I get different results in the two cases. When I check the class of the 'dd' variable, it is a list in both cases, so why am I not getting the same output in the second one as in the first? for(i in 1:3)...
Awesh
1

votes
1

answer
400

Views

Finding the best LCA model in poLCA R package

I am running an LCA analysis with the poLCA R package, but the analysis has not finished after three days (it has not found the best model yet) and occasionally it gives the following error: 'ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND'. So I cancelled the process at 35 latent classes. I am analyzin...
fritzz
1

votes
0

answer
119

Views

Add extra layers to pre-trained model at input in tensorflow

I have facet model (ckpt and meta files) which takes input of size (batch_size,160,160,3) and gives output of size (batch_size,128). My input is a k-dimensional vector and I have a pre-processing function(consists of some convolution and pooling layers) which takes my input and gives (batch_size,16...
N_Divyasri
1

votes
0

answer
159

Views

What is the bottleneck in pandas.read_csv: CPU vs storage

What is actually the bottleneck when reading a csv file with pandas.read_csv()? Is it CPU or storage read speed? How much of a speed increase can be obtained by using an SSD instead of an HDD? To be more specific, let's consider the following configuration (the current cheapest server from Hetzner): Int...
AlexanderLedovsky
1

votes
0

answer
179

Views

Decision Tree Categorical and Continuous Variable

I'm new to data science and currently trying to learn and understand decision tree algorithm. I have a question about how the algorithm works when we have some continuous variables in a classification problem and categorical variables in regression problems. Usually algo works on the basis of gini i...
Aditya Narayan Gupta
1

votes
1

answer
87

Views

Code for value_counts() in columns pertaining to a specific value in one column

I'm new to data science and trying to do some data wrangling with Python 2.7 in an IPython notebook. A tutorial I was following for my first project asked me to replace all NaN inputs with 0 or 1. But I'd like to consider another approach where I can first look at the count for the rows with non-numeric...
uharsha33
1

votes
2

answer
174

Views

Code for imputing values in a specific column using the particular rows Index number or unique ID?

I'd like to input a certain value in a particular column. My data looks something like: LoanID Married ApplicantIncome CoapplicantIncome Credit_History LP00135 NaN 33460 16000 1.0 LP00234 Yes 55000 70000 1.0 LP00432 No...
uharsha33
1

votes
1

answer
27

Views

How to copy data from a column to another based on a condition in R?

I have the data frame shown below. Funct.Area Environment ServiceType Ticket.Nature SLA.Result..4P. IRIS.Priority Func_Environment 2 FUN DCF FUN SR OK Medium FUN-DCF 3 AME - FIN DCF FUN SR Defect...
nsoria
1

votes
0

answer
21

Views

DSX desktop install NOT working (on x86 laptop)

I have tried multiple times and the DSX desktop install does not work. I am trying to install on a Win7 laptop. I have selected Docker, Jupyter with Spark (around 6.6GB), but it always ends up installing Docker and then hangs (as in the progress bar does not proceed further and is stuck at 25% for a LONG...
Deepak C Shetty
1

votes
1

answer
427

Views

sklearn partial_fit() not showing accurate results as fit()

I am training on 3 lists of data: L1, L2, L3. First I train on all of them with SGDClassifier fit() and later instance by instance with partial_fit(). I test the data with L4, L5. [The data in the lists is image data, and the L4, L5 images are the same as L2.] The predictions with fit() are correct and it is what I a...
user1
1

votes
0

answer
83

Views

getting similar predictions on data while predicting using tensorflow

I am a beginner in machine learning and I am working on a simple project to predict the electricity consumption of a household using data available here. The data consists of the global minute averaged active power of every minute for 4 years. The head of the data looked something like this. Date...
Prateek Surana
1

votes
1

answer
152

Views

Referring to parent attribute in pandas

This is my json { 'fInstructions': [ { 'id': 155, 'type':'finstruction', 'ref': '/spm/finstruction/155', 'iLineItem':[ { 'id': 156, 'type':'ilineitem', 'ref': '/spm/ilineitem/156', 'creationDate': '2018-03-09', 'dueDate':'2018-02-01', 'effectiveDate':'2018-03-09', 'frequency':'01', 'coveredPeriodFro...
More Than Five
1

votes
2

answer
64

Views

How to group by in Pandas with multiple columns

Consider a Pandas DataFrame as below Fruit Rate Quantity ------------------------- Apple 2 4 Apple 3 3 Apple 5 9 Mango 4 5 Mango 6 12 Banana 2 2 banana 1 2 Here is the total quantity of fruits. Mango: 5+12=17 Apple: 4+3+9= 16 Banana: 2+2=4 Wha...
RAVI SHANKAR
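The question is cut off above, so this sketch only reproduces the per-fruit totals that it lists. One assumption: "Banana" and "banana" are meant to be the same fruit (the stated total of 4 suggests so), so the names are normalised with `str.capitalize()` before grouping.

```python
import pandas as pd

# Data as given in the question, including the lowercase "banana" row.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Mango", "Mango", "Banana", "banana"],
    "Rate": [2, 3, 5, 4, 6, 2, 1],
    "Quantity": [4, 3, 9, 5, 12, 2, 2],
})

# Normalise the casing so both banana rows land in one group, then sum Quantity.
totals = df.groupby(df["Fruit"].str.capitalize())["Quantity"].sum()
print(totals)
```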
1

votes
0

answer
47

Views

Run all regressors against the data in scikit

I am working on creating a framework where I can call all regressors available in scikit-learn. Relating to this I have two questions- How to get list of all regressors programmatically? Objective is to run regressors against the dataset and acquire the metrics such as RMSE, R-Sq, Adjusted R-Sq, etc...
rishi91991
1

votes
0

answer
29

Views

How to understand the equation used for proving the transitivity property for correlation?

I am trying to understand the transitivity proof for correlation. That is, if X is highly correlated to Y and Y is highly correlated to Z, is it necessary that X and Z are also highly correlated? I found the equation which is used for proving this statement and that is: Corr(X,Y) = Corr(Y,Z)*Cor...
neha
