Questions tagged [dplyr]

1

votes
1

answer
231

Views

Using predict function for new data along with tidyverse

I want to use the predict function on new data with the tidyverse, as in the following example. However, I could not figure out how to use it with new data for wt = 4.0 and 4.2. Any hints, please? library(tidyverse) mtcars %>% dplyr::mutate(cyl1 = factor(cyl)) %>% tidyr::nest(-cyl) %>% dplyr::mutate...
MYaseen208
1

votes
2

answer
67

Views

Why isn't my barplot rearranging properly when faceting with ggplot?

So I have made this barplot with this code, bars organised in descending order, great! na.omit(insect_tally_native_ranges)%>% group_by(native_ranges)%>% dplyr::summarise(freq=sum(n))%>% ggplot(aes(x=reorder(native_ranges,freq),y=freq))+ geom_col(color='#CD4F39',fill='#CD4F39',alpha=0.8)+ coord_flip(...
delcast
1

votes
1

answer
36

Views

Forecasting one step ahead

I have a data.frame with three columns: Year, Nominal_Revenue and COEFFICIENT. I want to forecast with this data as in the example below. library(dplyr) TEST
silent_hunter
1

votes
1

answer
78

Views

R group_by %>% full_join losing NA records

Consider these two data frames: t1% filter(Id==2) %>% full_join(t1,by=c('Time','Cat')) t2 %>% group_by(Id) %>% filter(Id==1) %>% full_join(t1,by=c('Time','Cat')) This will give me 5, where the missing entry (NA values) of Id==2 and Time==2 is gone: t2 %>% group_by(Id) %>% full_join(t1,by=c('Time',...
user3173412
1

votes
2

answer
41

Views

standard eval with `dplyr::count()` [duplicate]

This question already has an answer here: dplyr: How to use group_by inside a function? 3 answers How can I pass a character vector to dplyr::count()? library(magrittr) variables % dplyr::count_(variables) This works well, but dplyr v0.8 throws the warning: count_() is deprecated. Please use coun...
wibeasley
1

votes
1

answer
55

Views

Function to pass parameter to perform group_by in R [duplicate]

This question already has an answer here: Pass arguments to dplyr functions 6 answers I am trying to write a function and pass in 2 parameters. I am getting an error, and the function is not able to recognize the 2nd parameter. library(dplyr) Test2
JK1185
1

votes
5

answer
61

Views

Subtract rows varying one column but keeping others fixed

I have an experiment where I need to subtract values of two different treatments from the Control (baseline), but these subtractions must correspond to other columns, named block and year sampled. Dummy data frame: df
Lucas
1

votes
1

answer
90

Views

Merge 3 columns based on unique values?

I am trying to merge 3 columns into a single one. The column values are separated by ';', and the new column needs to unzip all 3 column values and keep only the unique values. I know how to merge the columns, but I am struggling with unzipping the row values in the 3 columns and finding unique...
johnsmith0
1

votes
1

answer
15

Views

Guidance on joining 2 dataframes, such that each row of df2 becomes an entire column of df1, iterated over all rows of df2

I need to 'multiply' two df's together to create all possible solutions, to use in a Tableau scenario. The scenario is as follows: I have a df1 of cars and their associated MPGs, and a df2 of zipcodes, and their associated distance from a fixed point (calculating carbon footprint). Once I get the...
Emery
1

votes
1

answer
25

Views

How can I use gather function to manipulate my data frame? [duplicate]

This question already has an answer here: Collapse / concatenate / aggregate a column to a single comma separated string within each group 3 answers I have a data frame as follows: df
yas.f
1

votes
2

answer
22

Views

Grouping then selecting the bottom row from one column

I have a dataset where I need to group by one column, select the last row from that group in another column, and take the mean of the third column. A sample is like this: df % group_by(id) %>% summarise(mean(v)) The result shows as follows: id `mean(v)` 1 a 2 2 b 2.67 3 c...
principe
1

votes
1

answer
41

Views

Check whether value exists in specific group of rows in a dataframe

I have this dataframe (df): structure(list(from = c('(192) 242-2345', NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, '(832) 345-3168', NA, NA), to = c('(900) 301-3451', NA, NA, NA, NA, NA, NA, NA, NA...
JasonBaik
1

votes
2

answer
40

Views

Spread variable across multiple columns in dplyr

Let's say I have the following dataset: df
Parseltongue
1

votes
1

answer
31

Views

count the frequency of all pairwise combinations by group

I want to count the frequency of all pairwise combinations of item by group. have % # https://stackoverflow.com/a/38335011/841405 full_join(have, by='group') %>% group_by(item.x, item.y) %>% summarise(length(unique(group))) %>% filter(item.x!=item.y) %>% mutate(item = paste(item.x, item.y, sep='...
Eric Green
1

votes
2

answer
151

Views

Get level names using glue and dplyr in a loop

I am trying to get level names from a table using dplyr and glue in a loop (I use a loop because I have a large number of variables for which to get grouped tables and individual tables). I show an example below: library(dplyr) library(glue) var=c( 'vs', 'am') for(i in var) { bd=mtcars%>% group_by(carb) %>% cou...
Rodrigo
1

votes
2

answer
24

Views

Pmax of columns ending with a given string

I would like to conditionally mutate a new column representing the pmax() of columns ending with '_n' for a given row. I know I can do this by explicitly specifying the column names, but I would prefer to have this be the result of a call to ends_with() or similar. I have tried mutate_at() and plai...
Raoul Duke
1

votes
1

answer
17

Views

How to Create Values based on Start-Stop Info in Separate Column

I have a very messy dataset created by a research device. This data shows a physiological measure ('Physio') for every few milliseconds ('Time'). The output lists several user messages, such as when a trial starts ('START_TRIAL n'), when a trial ends ('STOP_TRIAL'), and other random things that ma...
alexd
1

votes
2

answer
14

Views

Trying to combine dates and times

I am trying to combine dates and times. These come from a file that, when imported, looks like this: library(tidyverse) library(lubridate) bookings
wl1234
1

votes
1

answer
26

Views

Copy values from one table to another, only where second table has specific values

I thought this would be straightforward, but it's been a while since I've looked at R. I have two tables, and I want to make a third table with values from the first based on values from the second. (I want the numbers from table 1, anytime the corresponding row/column from table 2 has a '1') I was...
Steve
0

votes
0

answer
13

Views

R - Summarize course enrollment over relative term sequence

The Applied Problem I want to abstract out code that summarizes course taking patterns and success rates of a cohort of students for n courses and n terms. Example With the following cohort of students, how many go to course 'B' after taking Course 'A', and how many of those students succeeded: dat...
MillionC
0

votes
0

answer
6

Views

R: Creating new dataframe based on multiple conditions on existing dataframe

I need to create a new dataframe using multiple conditions on an existing dataframe. I tried using dplyr functions, summarise in particular, for multiple conditions but failed, as the dataset size decreases once the conditions are applied. To explain, below is a simple sample of what I am trying to...
0

votes
0

answer
7

Views

`dplyr::if_else()` compared to base R `ifelse()` - why rbindlist error?

This code block below utilizing dplyr::if_else() works without issue and produces the flextable shown. library(tidyverse) library(flextable) # utilizing dplyr if_else df1 % mutate(col3 = if_else(apply(.[, 1:2], 1, sum) > 10 & .[, 2] > 5, 'True', 'False')) df1 %>% flextable() %>% theme_zebra() I fi...
Jason Hunter
0

votes
0

answer
6

Views

Two dataframes, same traits but different ID of individuals. Is there an R function to generate the conversion between old and new ID?

I'm working on a large dataset with a set of variables. I've downloaded a more recent dataset with the same variables but different coding for the sample IDs. I need to write a function that tells which old ID corresponds to the new one. Old_data old_ID var1 var2 var3 var4 var5 var6 1 A 2...
0

votes
0

answer
6

Views

Calculations with dplyr based on specific factors and dates and summaries of values

I have a data frame of counts of different classifications of ship on specific dates at certain distances off shore (DOS), e.g. 0-12nm and 0-100nm - I would like to subtract the ships within the 0-12nm DOS from 0-100nm, so that I can calculate how many e.g. 'passenger' ships were only in 12-100nm on...
Lmm
5

votes
4

answer
58

Views

Evaluate different logical conditions from string for each row

I have a data.frame like this: value condition 1 0.46 value > 0.5 2 0.96 value == 0.79 3 0.45 value 0.01 7 0.90 value >= 0.6 8 0.25 value < 0.91 9 0.04 value > 0.2 structure(list(value = c(0.46, 0.96, 0.45, 0.68, 0.57, 0.1, 0.9, 0.25, 0.04), condition = c('value > 0.5', 'value == 0...
Humpelstielzchen
6

votes
3

answer
104

Views

Creating one variable from a list of variables in R?

I have a sequence of variables in a dataframe (over 100) and I would like to create an indicator variable for if particular text patterns are present in any of the variables. Below is an example with three variables. One solution I've found is using tidyr::unite() followed by dplyr::mutate(), but I'...
patward5656
1

votes
3

answer
2k

Views

Mutating dummy variables in dplyr

I want to create 7 dummy variables, one for each day, using dplyr. So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in 2 steps: 1) create a df of dummies, 2) append it to the original df. #Sample dataframe mydf <- data.frame(x=rep(letters[1:9]), day=c('Mon','Tues...
Lefkios Paikousis
1

votes
2

answer
1.3k

Views

How to use bind_rows() and ignore column names [duplicate]

This question already has an answer here: Simplest way to get rbind to ignore column names 2 answers This question probably has been answered before, but I can't seem to find the answer. How do you use bind_rows() to just union the two tables and ignore the column names? The documentation on bind_r...
jmich738
1

votes
3

answer
399

Views

Remove duplicates in one column based on another column

I'm looking for a nicer way to do this in R. I do have one possibility, but it seems like there should be a smarter/more readable way. I want to delete duplicates in one or more columns only if a condition is met in another column (or columns). In my simplified example I want to delete duplicates in colu...
user2738526
1

votes
2

answer
240

Views

Proportions by group with srvyr package

Hi, I have a data frame with a weight column like the example: df % dplyr::mutate(smartphone = case_when( q_d1 == 2 ~ 'No Internet', q_d2_1 > 0 ~ 'smartphone' , q_d2_1 == 0 ~ 'No smartphone' , TRUE ~ NA_character_)) %>% group_by(smartphone) %>% summarize(proportion = srvyr::survey_mean(), total =...
DanielG
1

votes
1

answer
40

Views

Select unique values in dataframe based on sorted value

Has anyone selected unique values from a dataframe based on a second value's highest value? Example: name value cheese 15 pepperoni 12 cheese 9 tomato 4 cheese 3 tomato 2 The best I've come up with - which I am SURE there's a better way - is to sort df by value descending, extract df$name, run uniqu...
Christopher Penn
1

votes
2

answer
33

Views

apply count() to every factor variable in a dataframe

I can use purrr::map() to get the mean of every column in a dataframe. Can I use any of the map functions in combination with count() to get counts for each categorical variable in a dataframe? library(dplyr) library(purrr) mtcars %>% map(mean) mtcars %>% mutate(am = factor(am, labels = c('auto', '...
Joe
1

votes
1

answer
47

Views

Modify column value based on another column value

This is giving me trouble. I am using dplyr and I want to change the value of each Week (W1 to W3) based on the value of CP: if < CP then 0 CP W1 W2 W3 W4 1 50 0 60 0 0 4 10...
3nomis
1

votes
1

answer
71

Views

dplyr: divide all values in group by group's first value

My df looks something like this: ID Obs Value 1 1 26 1 2 13 1 3 52 2 1 1,5 2 2 30 Using dplyr, I want to add the additional column Col, which is the result of a division of all values in the column value by the group's first value in that column. ID O...
TIm Haus
1

votes
1

answer
35

Views

dplyr::starts_with and ends_with not subsetting based on arguments

I want to select a number of variables based on their names to transform them. The variable names all start with inq and end with 7, 8, 10, 13:15. This is not working for me... Apologies if this is obvious, but I cannot get it to work. Am I using the wrong functions, putting my functions and argumen...
Atanas Janackovski
1

votes
2

answer
52

Views

GGPLOT - Show connectivity of annual enrolment across grades and years

I have student enrolment data from 1990-2017: nominal_roll1 % gather(Year, Attendance, `1991-92`:`2017-18`) %>% mutate(Year_ = as.numeric(str_trunc(.$Year, side = 'right', width = 4, ellipsis = '')), Grade = factor(Grade, levels = c('K4','K5','Gr. 1','Gr. 2','Gr. 3','Gr. 4','Gr. 5','Gr. 6','Gr. 7',...
Corey Pembleton
1

votes
4

answer
45

Views

Filter by lowest and highest years by group using dplyr

I feel like the answer here is obvious, but I can't nail it down. I have this dataframe: df % group_by(SIC) %>% filter(!is.na(value)) %>% filter(year %in% c(min(year), max(year))) # A tibble: 35 x 3 # Groups: SIC [18] SIC year value 1 12 2011 0.081 2 11 2011 0.218 3 7 2011 0.212 4...
elliot
1

votes
2

answer
42

Views

Filter only rows that are duplicated using dplyr

I have been trying for a while now, with no success, to solve a problem close to the one presented in this issue. It consists of filtering for items that are duplicated in a group, but also considering the original one used for comparison with dplyr (I prefer dplyr over base or data.table). The s...
Just Burfi
1

votes
1

answer
51

Views

Merge/Join two datasets on minimum distance between two columns

I am trying to merge two datasets of yields, and I need to merge them on the minimum difference of maturities, since I would like to calculate the spread between commercial loans and treasury bills of the matching maturity. The join works, but I am looking for a better way, perhaps with fuzzy_join?...
hannes101
1

votes
1

answer
41

Views

Error in charToDate(x) when performing aggregation by year in R

I have a dataset: mydat=structure(list(time = structure(c(6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L), .Label = c('01.01.2008', '01.02.2008', '01.03.2008', '01.04.2008', '01.05.2008', '01.09.2007', '01.10.2007', '01.11.2007', '01.12.2007'), class = 'factor'), account_a = structure(c(6L, 4L, 3L, 2L, 9L, 8L,...
cbool
