Questions tagged [data-manipulation]

1

votes
1

answer
36

Views

Forecasting one step ahead

I have one data.frame with three columns: Year, Nominal_Revenue and COEFFICIENT. I want to forecast with this data, as in the example below: library(dplyr) TEST
silent_hunter
1

votes
2

answer
34

Views

various transformations with lapply() - R

I have this df: df df Created Updated Resolved 14 2019-02-18T08:59:57.067-0300 2019-03-12T16:20:48.210-0300 2019-03-12T16:20:48.203-0300 28 2019-01-28T11:55:34.723-0300 2019-03-08T15:37:32.071-0300 2019-03-08T15:37:32.065-0300 29 2019-01-28T11:52:39.744-0300...
Chris
0

votes
1

answer
19

Views

How can I visualise time series with coordinates?

I am trying to get this data plotted. I am not sure which package to use. The data looks like this in Excel: Time [4710.19 4710.21 4710.23 4710.24 4710.26 4710.28 4710.29] X [176.5 176.5 176.5 177 179 180.5 182.5 185.5] Y [222 227.5 237 247.5 263 278 296 314] I would like to ge...
Sarune Savickaite
744

votes
21

answer
462.8k

Views

How can I access and process nested objects and arrays?

I have a nested data structure containing objects and arrays. How can I extract the information, i.e. access a specific or multiple values (or keys)? For example: var data = { code: 42, items: [{ id: 1, name: 'foo' }, { id: 2, name: 'bar' }] }; How could I access the name of the second item in items...
Felix Kling
1

votes
1

answer
35

Views

producing a full adjacency matrix from partial information

I have a matrix that contains all the info necessary to construct 5x5 adjacency matrices. Each row represents one matrix: [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 1 1 1 1 1 1 1 0 1 0 [2,] 0 0 0 1 1 1 1 0 1 0 ... I want to c...
JMQ
1

votes
2

answer
47

Views

pandas - transform several rows in columns following the status of the rows

What is the best way to transform the following dataframe while also adding the sum of the 'status'? Before: plan type hour status total A cont 0 ok 10 A cont 0 notok 3 A cont 0 other 1 A vend 1 ok 7 A vend 1 notok 2 A vend 1 other 0 B...
Thabra
1

votes
3

answer
48

Views

Creating Column Based on Date Order, Rotating Dataset R

I have a dataset that looks like the following: ID= c('A','A','A','A','B','B','C','C','C') Date= as.Date(c('2017-09-24', '2017-09-26', '2017-09-23', '2017-09-30','2017-09-12', '2017-09-15', '2017-09-01', '2017-09-30', '2017-09-25')) Data= c(10,5,15,20,8,9,5,6,2) df= data.frame(ID, Date, Data) d ID...
Martin Jones
1

votes
1

answer
37

Views

How to dynamically generate new data.frames for specified conditions

I have a large data.frame that I would like to subset by a variable ID. The data.frame is 100,000 rows long. There are 100 ID values. Is there any straightforward way of writing a function that will create unique data.frame subsets for all ID values? I know how to do it one by one. For example:...
datanalyst
1

votes
3

answer
460

Views

How to manipulate data in a text file

I am trying to make a program that takes a large data file of integers and creates a new csv in a different format, where it takes the x,y,z of 30 lines and merges them into one line. The large dataset is formatted in (timestamp, x,y,z) Ex: 0.000, 5, 6, 8, 1.000, -6, 7, 9, 2.000, -15, 25, 23, or: ti...
aval
1

votes
1

answer
58

Views

Grouping columns with the same name in R

I'm trying to format my data in a 'readable' way where I have multiple columns with the same name. I tried using the melt() function, but I failed to solve the problem, which seems to be related to the fact that there are different values on the variables. A small example of the data: obs m ti...
Holyzin
1

votes
0

answer
76

Views

Converting Json response into a unique dataframe

Using tm1 rest api, I could get the data from a post request. Date: 2018-04-11 14:43 Status: 201 Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8 Size: 2.86 MB Then I converted this response into a json format using: fromJSON(content(data, type = 'text')) T...
Cesar
1

votes
0

answer
21

Views

How to extract only the first occurrence of the pattern matching from the string, in R?

How can I extract only the first occurrence of a pattern match from a string in R when the given pattern has multiple matches in the string? I have been using Strapply to extract the required part of the string, but the problem is that this function returns all the multiple matches instead of...
1

votes
1

answer
135

Views

Data Manipulation either in MATLAB or Python

I want to manipulate the data in a text file in either MATLAB or Python. My data file contains 3000 rows, but I have posted just 4 rows here as an example. The data file has R, L, G, C data for different frequencies (here 3 frequencies in 3 rows). Now I want to manipulate the data to another f...
aguntuk
1

votes
1

answer
58

Views

How to count length of NA values by group/factor in R?

I am tasked with manipulating data obtained from 1258 unique surveys. In terms of dimensions: 28 million individual observations (including NA) and 8 columns (variables). Object name: dat. The column/variable I am particularly interested in is education (edu). I want to get the length of NA and Non-N...
Shivy b
1

votes
0

answer
49

Views

Add column with sample size for each model to results dataframe

I am running regression models in the mtcars dataset and exporting results to a dataframe library(tidyverse) library(broom) outcomes % rowwise() %>% summarise(frm = paste0(Var1, '~factor(', Var2, ')+', Var3)) %>% group_by(model_id = row_number(), frm, samplesize=nrow(mtcars)) %>% do(tidy(lm(.$frm, d...
aelhak
1

votes
1

answer
56

Views

R: Convert values into pipe-delimited format

I'm trying to create a RedCap data dictionary from an SPSS output. SPSS lists the allowed values, or factors, for each variable like this: SEX 0 Male 1 Female LANGUAGE 1 English 2 Spanish 3 Other 6 Unknown How can I convert the above to this format for RedCap: Variable Values SEX...
Evan
1

votes
1

answer
47

Views

How do I join these tables in a way that still allows ggplot2 plotting with colors?

I have a data manipulation/table joining question that I have been hitting my head against for a couple of days. I am trying to create plots using ggplot2 that color the data by factors. The simple way to do this is by using: ggplot(data, aes(X,Y)) + geom_point(aes(color = Factor_A)) This means I n...
M. L.
1

votes
0

answer
35

Views

MEAN: Doing database queries in Angular Frontend?

I was wondering if it is somehow possible to do MongoDB No-SQL Queries (or even SQL-queries for sql database) in the Angular frontend, like formulating the query and then sending the query string to the database and getting a subset of values back (e.g. in JSON-format). Because the problem that I cu...
MMM
1

votes
0

answer
37

Views

Data Manipulation of the same data

I have a list of data in a json file. I want to display all the months, but I want each matching year to show just once as one big year. Look at the image. The image shows the same data but it's been split into year and month. Rather than it repeating the word 2017 over and over again, I want it to just say...
L.C
1

votes
1

answer
75

Views

Formatting the data which increases monotonically in Python

I have formatted the data as needed. Now my final data or dataframe is not monotonically increasing, whereas the input data is increasing monotonically according to the 1st column field (freq). Here is the link for Data_input_truncated.txt. My Python code is below: import pandas as...
aguntuk
1

votes
0

answer
32

Views

R - MCA (multiple correspondence analysis) with badly formatted data

I'm new to the R programming language but I know the theory behind MCA. My problem is that I am asked to read a specific file that contains qualitative data represented by a numeric value, like an enumeration would be in C# for example. Also, I think I need to get the data as a contingency table in order...
Axel Samyn
1

votes
1

answer
32

Views

Combine two or more columns into a new column

My program asks the user for a path to an Excel file, reads the file into a data frame and writes it to a new Excel file using openxlsx. Before I write the file, I want to combine two columns into a new one, and delete the two original columns. NULL values and blank cells should be ignored The file...
1

votes
2

answer
29

Views

proportion data frame for each factor level based on another column

I would like to summarize a data frame by month where each column is the proportion of each factor level based on the Records column in the data frame below. I have been attempting to use dplyr but haven't quite figured it out. library(dplyr) set.seed(100) df=data.frame(Month=rep(c('1/1/2017','2/1/2...
alleyway
1

votes
1

answer
35

Views

Select Years with the Greatest Number of Repeatedly Sampled Sites in R

I have many sites that were sampled over many 'Season-Year' combinations (time column). I want to select Season-Year combinations that have 10 or more of the same sites. Data is at the bottom of this post. Any thoughts for making this work? Code I have tried that didn't work: subset1 % group_by(Szn...
jabby corbs
1

votes
1

answer
66

Views

Calculate how many reports are running at a certain time

I am trying to calculate how many reports are running at a certain time. The data is like: ReportID StartTime Duration 1 2018-11-02 13:00:00 240 seconds 2 2018-11-02 14:00:00 300 seconds 3 2018-11-02 14:01:15 300 seconds 4 2018-11-02 14:00:00 5000 seconds The ideal output will be: Ti...
ProgrammerOliv
1

votes
2

answer
100

Views

Assign date to all lines below until the next date

df index col1 ------------------------ 0 2017-01-01 1 a 2 b 3 c 4 2017-01-02 5 d 6 e 7 f 8 2017-01-03 9 g 10 h 11 i expected df index col1 col2 ---------------------------...
Chipmunkafy
1

votes
1

answer
25

Views

How can I add a column with mutate() to each of the multiple data sets I read?

I am a beginner in R and am currently learning how to do data wrangling on multiple data sets. Right now I read 55 CSV data sets with 300 rows using the following code: Rawdata
Meijuan Zeng
1

votes
1

answer
80

Views

How to convert my data to counting process format with start stop times for interval truncation in R?

I would like to model a recurrent event with subjects that move in and out of risk over the course of the observation period of the study. I have data on the out-of-risk periods (start and end dates) where the subject cannot experience the event. I would appreciate any help on how to convert my dat...
shitshimugi
1

votes
1

answer
94

Views

Alternative to summary() for a dataframe with +100 labeled columns - R [closed]

I have a df with +100 labeled columns and approx. 500 rows. I'm trying to get an overview of the data, but it seems to be impossible given the huge number of columns, and doing summary() results in an enormous and confusing summary. I have been looking at some GitHub/Kaggle projects and they var...
Chris
1

votes
1

answer
33

Views

Forward filling based on the value of other column

Update: I have a large pandas dataframe with admitTime, dischargeTime, pat_name, pat_rec and it has around 5 million records. I am trying to forward fill the columns dischargeTime, pat_name, based on the dischargeTime datetime value for rest of the columns and break after that. df: admitTime...
ALB2345
1

votes
0

answer
45

Views

Converting dates to date ranges based on specified criteria using R

The objective of this code is to create a start date and end date for various periods given specified criteria. Current data format: df = seq(as.Date('2018/12/1'), as.Date('2018/12/31'), 'day') dg = c(rep(0,each=7),rep(1, each=20), rep(0,each=4)) data = data.frame(dates=df,status=dg) Desired ou...
Richard.R
1

votes
1

answer
31

Views

Create subgroups within a factor based on the sequencing of another column

I am attempting to create subgroups within a factor based on a particular column. Here is an example dataset named 'test' similar to the one I am working with. structure(list(old.id = c('A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C' ), id.number...
alleyway
1

votes
0

answer
22

Views

Error in synchrony() function from Codyn package. How to remove replicate species?

I am trying to solve the error below. The code and small sample of the dataset are posted below. Error in check_multispp(df, species.var, replicate.var) : One or more replicates consists of only a single species; please remove these replicates prior to calculations library(codyn) synchrony
jabby corbs
1

votes
0

answer
47

Views

How to tidy data that is one long string of numbers

I'm still very new to data science and R, but my work requires me to work through some very large, very messy data sets with little to no structure to them. I'm currently working with a US Freight text file (Approximately 48,000 characters, 13,341 rows, and 1 column) containing information such as F...
TheIllusiveNick
1

votes
1

answer
21

Views

jq how to choose from unique keys based on values

I've been learning more about JSON lately and stumbled upon the 'jq' command-line JSON processor. I am trying to combine multiple JSON files regarding clones from our GitHub repository. Some of these dates overlap, and since they were accessed at different points of the day, have slightly different...
oneillrunner
