Questions tagged [data-analysis]

1

votes
1

answer
40

Views

Pandas, groupby and counting data in others columns

I have data with four columns: Id, CreationDate, Score and ViewCount. CreationDate has the following format, for example: 2011-11-30 19:41:14.960. I need to group by the year of CreationDate, count the rows, sum Score and ViewCount, and add the results as additional columns. I want to use wi...
morris
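One way the grouping described in this question could be approached is sketched below. It is only an illustration: the sample rows and the output column names (Year, Count, TotalScore, TotalViews) are assumptions, and only the four input column names and the date format come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "CreationDate": [
        "2011-11-30 19:41:14.960",
        "2011-12-01 08:00:00.000",
        "2012-01-15 12:30:45.123",
        "2012-06-20 23:59:59.999",
    ],
    "Score": [5, 3, 10, 2],
    "ViewCount": [100, 50, 400, 25],
})

# Parse the timestamp strings so the year can be extracted with .dt.year.
df["CreationDate"] = pd.to_datetime(df["CreationDate"])

# Named aggregation: one output column per (input column, function) pair.
# The output column names are illustrative, not taken from the question.
per_year = (
    df.groupby(df["CreationDate"].dt.year.rename("Year"))
      .agg(
          Count=("Id", "count"),
          TotalScore=("Score", "sum"),
          TotalViews=("ViewCount", "sum"),
      )
      .reset_index()
)
print(per_year)
```

Grouping on the derived year Series avoids adding a helper column to the original frame; reset_index turns the year back into an ordinary column.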
1

votes
1

answer
302

Views

R neural Networks

I am playing around with the Adult dataset (https://archive.ics.uci.edu/ml/datasets/adult) in R. I am trying to use the neuralnet package to train a neural network with backpropagation. I have cleaned the data. Now I am trying to run this part: n
Ioannis K
1

votes
0

answer
41

Views

How to rescale signal intensity in an image in relation to spatial position?

Hi I have a 1D radial profile of a sample across a pipe (fig_1). One data point (along the orange straight line) is acquired at each 'band' from the image. The resolution (x,y,z) of each data point is 100um x 100um x 1000um. (fig_1) However in order to produce a quantitative image, each data point i...
J. Doe
1

votes
1

answer
31

Views

Split a single pandas column (list of dicts) and append the dict keys as new columns

Input : df = pd.DataFrame({'a':[1,2], 'b':[[{'x1':1,'x2':3},{'x1':4,'x2':1}], [{'x1':5},{'x1':3,'x2':6}]], 'c':[5,6]}) If I apply the operation print(df['b'].apply(pd.Series)) Output is: 0 1 0 {'x1': 1, 'x2': 3} {'x1': 4, 'x2': 1} 1 {'x1': 5} {'x1': 3, 'x2': 6} Expe...
Rakesh Bhagam
-1

votes
0

answer
4

Views

How to group data on the basis of year in R?

I am working with the London Crime data set. It has a Borough, Major Category, Minor Category, Dates, and the Count. Below is my data structure. 'data.frame': 139392 obs. of 5 variables: $ Borough : Factor w/ 32 levels 'Barking and Dagenham',..: 1 1 1 1 1 1 1 1 1 1 ... $ Major.Category: Fac...
Ahsan Hasan
1

votes
1

answer
126

Views

Issue using Tweepy to pull data from Twitter Stream: Data Analysis

from tweepy import OAuthHandler from tweepy import StreamListener class listener(StreamListener): def on_data(self, data): print(data) return(True) def on_error(self, status): print (status) auth = OAuthHandler(ckey, csecret) auth.set_access_token(atoken, asecret) twitterStream = Stream(auth, listen...
Yepram Yeransian
1

votes
0

answer
18

Views

Making decision depending on complicated factors without starting data

I'm currently building an automatic decision-making system that depends on many different factors. The problem is that I don't have any data to analyze or train on. My system has some factors, such as whether it is a holiday, whether it is in maintenance status, currently connected users (CCU),...
Le Duong Tuan Anh
1

votes
0

answer
43

Views

CSV spreadsheet analysis

I'm trying to complete the assignment (Quiz 21) described below for the following course: https://classroom.udacity.com/courses/ud170/lessons/5430778793/concepts/53961386480923 The first code fragment is the one I wrote, which outputs the wrong lengths for the lists. The second code fragment is the...
Stefan Lavelle
1

votes
0

answer
84

Views

Adding hover tool to datashader interactive image

I want to perform datashading on a plot created in Bokeh. I came across this Python notebook, but I want to know whether I can add a hover tool to the resulting image after datashading. If so, how can I add tools like hovertool and taptool to the Interactive Image created by datashader?
Avinash Magar
1

votes
1

answer
47

Views

Feature selection by machine learning

The aim of my current study is to explore machine learning methods to select outcomes highly associated with treatment, which will be considered an approach for dealing with multiple testing. My question is: what kinds of machine learning feature selection methods can I use to find the strong a...
Wang Wang
1

votes
0

answer
43

Views

How to make Jupyter Notebooks shareable with your colleagues

At my organisation we currently use a SQL query tool on top of Redshift. This provides us with the ability to save our SQL queries and creates a place where anyone can search for a query by name and look at it and its results. We can also give query links to each other. The problem is that since it is SQL and comp...
ila
1

votes
1

answer
112

Views

Pandas: fix typos in keys within a dataframe

So, I have a large data frame with customer names. I used the phone number and email combined to create a unique ID key for each customer. But, sometimes there will be a typo in the email so it will create two keys for the same customer. Like so: Key | Order # 555261andymiller...
1

votes
0

answer
64

Views

How to refresh a shape data file in Spotfire

I am a beginner in working with geospatial data. What I have done so far: I created a map chart visualization in Spotfire. I created a shapefile using QGIS. I added the shapefile to Spotfire using Add Data Table -> File. I added a feature layer to the map chart and used/applied the shapefile data....
Jay
1

votes
0

answer
55

Views

Fixed it. What is the option_description used for in the build_dict function in the dataMeta package in R?

I have a dataset with some 100,000 tweets and their sentiment scores attached. The original dataset has just two columns, one for the tweets and one for their sentiment scores. I am trying to build a data dictionary for it using the dataMeta package. Here is the code that I have written so far: #Dat...
1

votes
0

answer
35

Views

Getting an error while writing dataframe into csv

I am trying to write dataframe into csv file using !cat but I'm getting some errors. Code: data.to_csv(r'C:\Users\Downloads\pydata\pydata-book-2nd-edition\examples\out.csv') !cat C:\Users\Pruthvish\Downloads\pydata\pydata-book-2nd-edition\examples\out.csv ,something,a,b,c,d,message 0,one,1,2,3.0,4 1...
Pruthvish
1

votes
1

answer
348

Views

Trying to find the most efficient way to convert SQL Query to Pandas DataFrame that has large number of records

I am trying to query MS-SQL database view and convert the result to Pandas DataFrame. Below are the two different ways I tried and in both cases it is taking ~439.98 seconds (~7 minutes) in order to query and convert to DataFrame that has 415076 records (This time is for converting it to the DataFra...
Y A Prasad
1

votes
0

answer
88

Views

'x' must be atomic for 'sort.list', using dbFD(). FD package

I am trying to run dbFD(traits, as.matrix(abun)) but I receive this error: Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? My data looks similar to this, but larger: t1 t2 t3 ... sp1 sp2 sp3 sp4.... sp1 0.2 10...
1

votes
1

answer
118

Views

Use VBA to suppress Analysis Toolpak Histogram function message

Question overview: I am using the Excel VBA histogram function from the 'Analysis Toolpak' to generate approximately 25 histograms automatically. When a histogram graph is generated, it is placed on top of cells that have values in them, effectively hiding them (which is OK with me). Therefore the following messa...
Adam Pak
1

votes
0

answer
46

Views

Window function for unique rows in SQL Server

I have a table like the one below. The main idea is to get the amount of each channel for each orderID. If a channel repeats for an Id, the amount should be taken only once and the rest should be null. The result should look like the one below. I want to apply the same logic to country and source as well. If I do the piv...
user123
1

votes
0

answer
65

Views

Music21 and D3.js for music feature extraction and visualization?

I am looking for suggestions on what tools could be used for the following scenarios about music feature extraction and visualization (on my Mac): identify and group notes in a score (from different voices/instruments) that sound concurrently (even if they are attacked in different time offsets, tho...
Ilias Kyriazis
1

votes
1

answer
48

Views

How to use Python to group by and scale values?

I would like to rescale column 'w'. I have averaged 'w': aveData_set = Data_Set.groupby(['buildingid', pd.Grouper(key='reporttime',freq='15T')])['w'].mean().reset_index() aveData_set result: Then I would like to rescale column 'w' over each 24-hour period. ScaleData_set = aveData_set.groupby(['buildingid', pd.Group...
Linminxiang
1

votes
0

answer
53

Views

MovieLens popularity recommender code with R

I'm studying R and doing a project about a movie recommendation algorithm. I used the MovieLens 100k data with the recommenderlab library, and followed these tutorials. https://mitxpro.mit.edu/asset-v1%[email protected][email protected]_CS1_Movies.pdf https://cran.r-project.org/web/packages...
MS.K
1

votes
2

answer
55

Views

Turning textual answers into dichotomous variables

I've done research using Google Forms and now I need to prepare that data for further analysis. The point is I don't really know how to go about that. I have variables (questionnaire questions), and each question has four answers. In my data those answers are just strings, so let's say:...
Piotr Homa
1

votes
0

answer
47

Views

Combining different time series in R

Let's assume that I am the owner of a burger shop. I log every time that a customer buys something from my shop, so I have records of all burgers and milk-shakes sold in the previous month. For me, it is easier and cheaper to make 20 milk-shakes at once than to make 1 at a time. So here is my go...
Gabriel Bessa
1

votes
0

answer
41

Views

How to plot in python using Legend as a checkbox?

I have been trying to plot a graph from a dataframe with 3 columns. The first is the 'Hour', the second is the 'amount' in rupees, and the third consists of 'machine codes'. I need to analyze the amount of transactions a machine does on an hourly basis. There are 67 unique machine codes in total. Kindly chec...
Zain Afzal
0

votes
1

answer
56

Views

join two tables without losing relevant values

I have two tables representing a database for customer products and its competitors' products: tmp_match - from_product_id and to_product_id representing matches between customer product and competitor product respectively. tmp_price_history - shows the price of each product per date. I am trying to...
Max Segal
1

votes
0

answer
174

Views

Leaflet / Mapbox marine traffic density Map

I am currently making a marine traffic tool using Leaflet and Mapbox. For that, I have a huge amount of AIS data that I converted to a GeoJSON file. The GeoJSON file is a list of 'LineString' features defining each ship's trajectory, like this: { 'features': [ { 'geometry': { 'coordinates': [ [-4.013451666...
Miionu
1

votes
1

answer
65

Views

How do I group data into naturally occurring “Bins”

What approach should I use to sort the following into naturally occurring 'bins'. double[] x = { 18, 18, 18, 18, 19, 20, 20, 20, 21, 22, 22, 23, 24, 26, 27, 32, 33, 49, 52, 56,900,1200, 1200, 1500, 2000, 2000,2200,2200 }; I've looked at various code for 'outliers', 'quintiles' and not sure about...
TLDR
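A simple gap-based heuristic for this kind of data is sketched below in Python (the question's own snippet is C#). This is not a named algorithm such as Jenks natural breaks or k-means; the function name split_on_gaps and the ratio threshold of 1.4 are assumptions chosen for illustration, and a different threshold gives different bins.

```python
def split_on_gaps(values, ratio=1.4):
    """Group sorted positive values, starting a new bin whenever a value
    exceeds the previous one by more than the given ratio (hypothetical heuristic)."""
    ordered = sorted(values)
    if not ordered:
        return []
    bins = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur > prev * ratio:
            bins.append([cur])      # large relative jump: open a new bin
        else:
            bins[-1].append(cur)    # small jump: stay in the current bin
    return bins

# The values listed in the question.
x = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22, 22, 23, 24, 26, 27, 32, 33,
     49, 52, 56, 900, 1200, 1200, 1500, 2000, 2000, 2200, 2200]

bins = split_on_gaps(x)
for b in bins:
    print(b[0], "to", b[-1], "count:", len(b))
```

With this threshold the values fall into three groups (18 to 33, 49 to 56, 900 to 2200). Using a relative rather than absolute gap keeps the heuristic usable across the very different magnitudes in the list.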
1

votes
0

answer
54

Views

Creating Video Watch Time Retention Plot using Plotly and Python

I have a table like this: videoId userId viewedMintues totalMinutes 1007975 275308 10 26 1009304 304392 6 6 1009343 463588 3 23 100941 462406 1 26 100941 463199 12 26 100941...
Debadri Dutta
1

votes
2

answer
65

Views

python, matrix column extraction and sum

Say I have a matrix A = [a_1,a_2,...,a_n]. Each column a_i belongs to a class. All classes are from 1 to K. All n column's labels are stored in one n-dim vector b. Now for each class i, I need to sum all vectors in class i together and put the result vector as the i-th column of a new matrix. So the...
Michael Sun
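The per-class column sum described here can be written as a single matrix product with a one-hot indicator matrix. The sketch below assumes NumPy, labels running from 1 to K as in the question, and a small made-up A and b for illustration.

```python
import numpy as np

# Hypothetical example: 2-dimensional columns, n = 5 columns, K = 3 classes.
A = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [10.0, 20.0, 30.0, 40.0, 50.0]])
b = np.array([1, 3, 1, 2, 3])   # class label (1..K) of each column of A
K = 3

# One-hot indicator matrix M of shape (n, K):
# M[j, i] = 1 if column j of A belongs to class i + 1, else 0.
M = np.eye(K)[b - 1]

# Column i of S is the sum of the columns of A whose label is i + 1.
S = A @ M
print(S)
```

For very large n and K, a sparse indicator matrix or np.add.at would avoid building the dense (n, K) array, but the dense product is the shortest correct form.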
1

votes
1

answer
138

Views

How to Change Node's Color Based on Node's Level in CART Plot (rpart.plot) [R]

I want to change each node's color based on the node's level in a CART plot (rpart.plot) in R. The required plot is like this. I have got this far, but have not yet done the following: 1. Move the values of the target variable (Setosa, Versicolor, and Virginica) to the left-side of char...
Aswin Candra
1

votes
1

answer
141

Views

Detect significant trend changes

I would like to detect the dates where a trend curve significantly changes using R. The red dots are the points in time where I see a significant change; these should be detected. Small fluctuations should be ignored. I have tried the breakpoints function, which finds the dates indicated by the dot...
Markus Palme
1

votes
1

answer
31

Views

Iterating through pandas column

I have a dataframe with following columns: User_id PQ TGGS PAG Games_played 118399 8.536585 7.079646 10.204082 7.711443 212651 75.000000 73.684211 75.000000 46.534653 210314 60.000000 9.523810 33.333333 14.414414 columns are actually game codes. I want...
asnique
1

votes
1

answer
27

Views

Concatenating CSV Files using Pandas is causing Duplication

I was writing a Python method on Google Colab to go into a folder of 84 .csv files, concatenate them, and output a new .csv. Here is the method: def concatenate(indirectory = '/content/gdrive/My Drive/Folder/Folder', outfile = '/content/gdrive/My Drive/--.csv'): os.chdir(indirectory) fileList = gl...
MLBeginner
1

votes
0

answer
41

Views

Strategy to conduct regression modelling within R

I am not sure if this is the right place to ask, but let me try anyway. I would like to conduct a structured analysis of a data set using linear regressions. I thought it would be a good idea to start off with creating a data frame / table where each row comprises one of the models that I would lik...
Max M
1

votes
1

answer
29

Views

Analyse tables with unknown structure and fault tolerance

I want to analyse tables with similar data that are structured differently and whose headers may also differ slightly. When collecting all the data from the tables and summing them up, I face several problems. Step 1: I look for the header keywords. Searching for if 'cars==cars' is not possible,...
thohemp
1

votes
0

answer
21

Views

Is it Necessary to De-Mean my Data before Applying PCA, or does pca(X) do that Automatically?

I am aware that a first step in performing PCA for dimensionality reduction is de-meaning the data. I have performed PCA after de-meaning manually with X=X-mean(X) and compared with plainly applying [COEFF,score,latent,~,explained]=pca(X) on my data. By inspecting the eigenvalues and the percentage...
John Doe
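As far as the MATLAB documentation describes it, pca centers the data by default (the 'Centered' option defaults to true), which would explain identical results with and without manual de-meaning. The NumPy sketch below is not MATLAB's implementation; it only illustrates, on made-up data, that SVD-based PCA on centered data matches the covariance eigenvalues while skipping the centering step does not.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of 3 variables with a large non-zero mean.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2]) + 50.0

def pca_variances(data, center):
    """Return squared singular values divided by (n - 1), in decreasing order."""
    if center:
        data = data - data.mean(axis=0)
    s = np.linalg.svd(data, compute_uv=False)
    return s**2 / (data.shape[0] - 1)

centered = pca_variances(X, center=True)
uncentered = pca_variances(X, center=False)

# Eigenvalues of the sample covariance matrix, sorted in decreasing order.
# np.cov subtracts the column means itself.
cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

print("centered PCA variances:  ", centered)
print("covariance eigenvalues:  ", cov_eigs)
print("uncentered 'variances':  ", uncentered)
```

Without centering, the first component mostly captures the mean offset instead of the spread of the data, so its value is far larger than any true variance.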
1

votes
1

answer
55

Views

Expand data set row in R [duplicate]

This question already has an answer here: Repeat rows of a data.frame 10 answers I've got a table like this: | Activation Month | Disabled Month | Month.Fee | Custr | 21/4/2018 | N/A | 10 | A | 21/3/2018 | 21/6/2018 | 20 | B I want to transfor...
Ilproff_77
1

votes
1

answer
55

Views

dataExplorer::create_report failed to compile

I am trying to produce a pdf report of a dataframe named 'mydata' using the DataExplorer package. Nevertheless I get the following Error: Failed to compile D:/Documents/R/R projects/ENDO/report.tex. I have tried to see if any error occurs with tinytex using: options(tinytex.verbose = TRUE) devtools:...
Woody
