# Questions tagged [dplyr]

7143 questions

1 vote, 1 answer, 231 views

### Using predict function for new data along with tidyverse

I want to use the predict function on new data along with tidyverse, as in the following example. However, I could not figure out how to use it with new data for wt = 4.0 and 4.2. Any hints, please?
library(tidyverse)
mtcars %>%
dplyr::mutate(cyl1 = factor(cyl)) %>%
tidyr::nest(-cyl) %>%
dplyr::mutate...

1 vote, 2 answers, 67 views

### Why isn't my barplot rearranging properly when faceting with ggplot?

So I have made this barplot with this code, bars organised in descending order, great!
na.omit(insect_tally_native_ranges)%>%
group_by(native_ranges)%>%
dplyr::summarise(freq=sum(n))%>%
ggplot(aes(x=reorder(native_ranges,freq),y=freq))+
geom_col(color='#CD4F39',fill='#CD4F39',alpha=0.8)+
coord_flip(...

1 vote, 1 answer, 36 views

### Forecasting one step ahead

I have one data.frame with three columns: Year, Nominal_Revenue and COEFFICIENT. I want to forecast with this data, like the example below:
library(dplyr)
TEST

1 vote, 1 answer, 78 views

### R group_by %>% full_join losing NA records

Consider these two data frames:
t1% filter(Id==2) %>% full_join(t1,by=c('Time','Cat'))
t2 %>% group_by(Id) %>% filter(Id==1) %>% full_join(t1,by=c('Time','Cat'))
This will give me 5, where the missing entry (NA values) of Id==2 and Time==2 is gone:
t2 %>% group_by(Id) %>% full_join(t1,by=c('Time',...

1 vote, 2 answers, 41 views

### standard eval with `dplyr::count()` [duplicate]

This question already has an answer here:
dplyr: How to use group_by inside a function?
3 answers
How can I pass a character vector to dplyr::count()?
library(magrittr)
variables %
dplyr::count_(variables)
This works well, but dplyr v0.8 throws the warning:
count_() is deprecated.
Please use coun...
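
One possible way to pass column names held in a character vector is to convert them to symbols and splice them into `count()`. This is only a sketch: the contents of `variables` and the use of `mtcars` are assumptions, since the original definition is cut off in the excerpt.

```r
library(dplyr)

# Assumed column names; the question's own vector is not visible in the excerpt.
variables <- c("cyl", "vs")

# rlang::syms() turns the strings into symbols, and !!! splices them into
# count() as separate grouping columns.
counts <- mtcars %>%
  count(!!!rlang::syms(variables))

print(counts)
```

The result has one row per observed combination of the named columns plus a count column `n`.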

1 vote, 1 answer, 55 views

### Function to pass parameter to perform group_by in R [duplicate]

This question already has an answer here:
Pass arguments to dplyr functions
6 answers
I am trying to write a function and pass in 2 parameters. I am getting an error, and the function is not able to recognize the 2nd parameter.
library(dplyr)
Test2

1 vote, 5 answers, 61 views

### Subtract rows varying one column but keeping others fixed

I have an experiment where I need to subtract values of two different treatments from the Control (baseline), but these subtractions must correspond to other columns, named block and year sampled.
Dummy data frame:
df

1 vote, 1 answer, 90 views

### Merge 3 columns based on unique values?

I am trying to merge 3 columns into a single one. The column values are separated by ';', and the new column needs to unzip all 3 column values and keep only the unique values. I know how to merge the columns, but I am struggling to unzip the row values in the 3 columns and find the unique...

1 vote, 1 answer, 15 views

### Guidance on joining 2 dataframes, such that each row of df2 becomes an entire column of df1, iterated over all rows of df2

I need to 'multiply' two df's together to create all possible solutions, to use in a Tableau scenario.
The scenario is as follows:
I have a df1 of cars and their associated MPGs, and a df2 of zipcodes, and their associated distance from a fixed point (calculating carbon footprint). Once I get the...

1 vote, 1 answer, 25 views

### How can I use gather function to manipulate my data frame? [duplicate]

This question already has an answer here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
3 answers
I have a data frame as follows:
df

1 vote, 2 answers, 22 views

### Grouping then selecting the bottom row from one column

I have a dataset where I need to group by one column, select the last row of each group in another column, and take the mean of a third column.
A sample is like this:
df %
group_by(id) %>%
summarise(mean(v))
The result shows as follows:
id `mean(v)`
1 a 2
2 b 2.67
3 c...

1 vote, 1 answer, 41 views

### Check whether value exists in specific group of rows in a dataframe

I have this dataframe (df):
structure(list(from = c('(192) 242-2345', NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, '(832) 345-3168',
NA, NA), to = c('(900) 301-3451', NA, NA, NA, NA, NA, NA, NA,
NA...

1 vote, 2 answers, 40 views

### Spread variable across multiple columns in dplyr

Let's say I have the following dataset:
df

1 vote, 1 answer, 31 views

### count the frequency of all pairwise combinations by group

I want to count the frequency of all pairwise combinations of item by group.
have %
# https://stackoverflow.com/a/38335011/841405
full_join(have, by='group') %>%
group_by(item.x, item.y) %>%
summarise(length(unique(group))) %>%
filter(item.x!=item.y) %>%
mutate(item = paste(item.x, item.y, sep='...

1 vote, 2 answers, 151 views

### Get level names using glue and dplyr in a loop

I am trying to get level names from a table using dplyr and glue in a loop (I use a loop because I have a large number of variables for which I need grouped tables and individual tables). I show an example below:
library(dplyr)
library(glue)
var=c( 'vs', 'am')
for(i in var) {
bd=mtcars%>%
group_by(carb) %>%
cou...

1 vote, 2 answers, 24 views

### Pmax of columns ending with a given string

I would like to conditionally mutate a new column representing the pmax() of columns ending with '_n' for a given row. I know I can do this by explicitly specifying the column names, but I would prefer to have this be the result of a call to ends_with() or similar.
I have tried mutate_at() and plai...
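
A minimal sketch of the row-wise maximum over columns selected by suffix, without naming them explicitly. The data frame and its column names are hypothetical, and the "conditional" part of the question is not covered because it is cut off in the excerpt.

```r
library(dplyr)

# Hypothetical data: two columns end in "_n", one does not.
df <- data.frame(
  id    = 1:3,
  a_n   = c(1, 5, 2),
  b_n   = c(4, 3, 9),
  other = c(10, 10, 10)
)

# select() picks the "_n" columns; do.call() passes them to pmax() as
# separate arguments, giving the element-wise (row-wise) maximum.
df <- df %>%
  mutate(max_n = do.call(pmax, select(., ends_with("_n"))))

print(df)
```

Here `.` refers to the whole data frame supplied by the magrittr pipe, so this form would not work unchanged with the native `|>` pipe.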

1 vote, 1 answer, 17 views

### How to Create Values based on Start-Stop Info in Separate Column

I have a very messy dataset created by a research device. This data shows a physiological measure ('Physio') for every few milliseconds ('Time'). The output lists several user messages, such as when a trial starts ('START_TRIAL n'), when a trial ends ('STOP_TRIAL'), and other random things that ma...

1 vote, 2 answers, 14 views

### Trying to combine dates and times

I am trying to combine dates and times. These come from a file that, when imported, looks like this:
library(tidyverse)
library(lubridate)
bookings

1 vote, 1 answer, 26 views

### Copy values from one table to another, only where second table has specific values

I thought this would be straightforward, but it's been a while since I've looked at R.
I have two tables, and I want to make a third table with values from the first based on values from the second. (I want the numbers from table 1, anytime the corresponding row/column from table 2 has a '1')
I was...

0 votes, 0 answers, 13 views

### R - Summarize course enrollment over relative term sequence

The Applied Problem
I want to abstract out code that summarizes course taking patterns and success rates of a cohort of students for n courses and n terms.
Example
With the following cohort of students, how many go to course 'B' after taking Course 'A', and how many of those students succeeded:
dat...

0 votes, 0 answers, 6 views

### R: Creating new dataframe based on multiple conditions on existing dataframe

I need to create a new dataframe using multiple conditions on an existing dataframe.
I tried using dplyr functions, summarise in particular, for multiple conditions, but failed because the dataset size decreases once the conditions are applied.
For explanation, below is a simple sample of what I am trying to...

0 votes, 0 answers, 7 views

### `dplyr::if_else()` compared to base R `ifelse()` - why rbindlist error?

This code block below utilizing dplyr::if_else() works without issue and produces the flextable shown.
library(tidyverse)
library(flextable)
# utilizing dplyr if_else
df1 %
mutate(col3 = if_else(apply(.[, 1:2], 1, sum) > 10 & .[, 2] > 5,
'True',
'False'))
df1 %>% flextable() %>% theme_zebra()
I fi...

0 votes, 0 answers, 6 views

### Two dataframes, same traits but different ID of individuals. Is there an R function to generate the conversion between old and new ID?

I'm working on a large dataset with a set of variables. I've downloaded a more recent dataset with the same variables but different coding for the sample IDs. I need to write a function that tells which old ID corresponds to the new one.
Old_data
old_ID var1 var2 var3 var4 var5 var6
1 A 2...

0 votes, 0 answers, 6 views

### Calculations with dplyr based on specific factors and dates and summaries of values

I have a data frame of counts of different classifications of ship on specific dates at certain distances off shore (DOS), e.g. 0-12nm and 0-100nm - I would like to subtract the ships within the 0-12nm DOS from 0-100nm, so that I can calculate how many e.g. 'passenger' ships were only in 12-100nm on...

5 votes, 4 answers, 58 views

### Evaluate different logical conditions from string for each row

I have a data.frame like this:
value condition
1 0.46 value > 0.5
2 0.96 value == 0.79
3 0.45 value 0.01
7 0.90 value >= 0.6
8 0.25 value < 0.91
9 0.04 value > 0.2
structure(list(value = c(0.46, 0.96, 0.45, 0.68, 0.57, 0.1, 0.9,
0.25, 0.04), condition = c('value > 0.5', 'value == 0...

6 votes, 3 answers, 104 views

### Creating one variable from a list of variables in R?

I have a sequence of variables in a dataframe (over 100) and I would like to create an indicator variable for if particular text patterns are present in any of the variables. Below is an example with three variables. One solution I've found is using tidyr::unite() followed by dplyr::mutate(), but I'...

1 vote, 3 answers, 2k views

### Mutating dummy variables in dplyr

I want to create 7 dummy variables, one for each day, using dplyr.
So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in 2 steps: 1) create a df of dummies, 2) append it to the original df.
#Sample dataframe
mydf <- data.frame(x=rep(letters[1:9]),
day=c('Mon','Tues...

1 vote, 2 answers, 1.3k views

### How to use bind_rows() and ignore column names [duplicate]

This question already has an answer here:
Simplest way to get rbind to ignore column names
2 answers
This question probably has been answered before, but I can't seem to find the answer. How do you use bind_rows() to just union the two tables and ignore the column names.
The documentation on bind_r...

1 vote, 3 answers, 399 views

### Remove duplicates in one column based on another column

I'm looking for a nicer way to do this in R. I do have one possibility, but it seems like there should be a smarter, more readable way.
I want to delete duplicates in one/more column only if a condition is met in another column (or columns).
In my simplified example I want to delete duplicates in colu...

1 vote, 2 answers, 240 views

### Proportions by group with srvyr package

Hi, I have a data frame with a weight column like the example:
df %
dplyr::mutate(smartphone = case_when(
q_d1 == 2 ~ 'No Internet',
q_d2_1 > 0 ~ 'smartphone' ,
q_d2_1 == 0 ~ 'No smartphone' ,
TRUE ~ NA_character_)) %>%
group_by(smartphone) %>%
summarize(proportion = srvyr::survey_mean(),
total =...

1 vote, 1 answer, 40 views

### Select unique values in dataframe based on sorted value

Has anyone selected unique values from a dataframe based on the highest value of a second column?
Example:
name value
cheese 15
pepperoni 12
cheese 9
tomato 4
cheese 3
tomato 2
The best I've come up with (though I am SURE there's a better way) is to sort df by value descending, extract df$name, run uniqu...
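
A minimal sketch of one common approach, assuming the goal is to keep one row per `name` with its highest `value`. The data frame is rebuilt from the example rows above; the variable names `df` and `top_per_name` are chosen here for illustration.

```r
library(dplyr)

# Data taken from the example table in the question.
df <- data.frame(
  name  = c("cheese", "pepperoni", "cheese", "tomato", "cheese", "tomato"),
  value = c(15, 12, 9, 4, 3, 2),
  stringsAsFactors = FALSE
)

# Sort so the highest value of each name comes first, then keep the first
# row seen for each name, retaining all other columns.
top_per_name <- df %>%
  arrange(desc(value)) %>%
  distinct(name, .keep_all = TRUE)

print(top_per_name)
```

This keeps the sort-then-deduplicate idea from the question but stays inside one pipeline and preserves the `value` column.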

1 vote, 2 answers, 33 views

### apply count() to every factor variable in a dataframe

I can use purrr::map() to get the mean of every column in a dataframe. Can I use any of the map functions in combination with count() to get counts for each categorical variable in a dataframe?
library(dplyr)
library(purrr)
mtcars %>% map(mean)
mtcars %>% mutate(am = factor(am, labels = c('auto', '...

1 vote, 1 answer, 47 views

### Modify column value based on another column value

This is giving me trouble. I am using dplyr and I want to change the value of each week (W1 to W3) based on the value of CP: if it is less than CP, then set it to 0.
CP W1 W2 W3 W4
1 50 0 60 0 0
4 10...
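
A sketch of one way to zero out week values that fall below `CP`, assuming dplyr 1.0.0 or later for `across()`. The data below is hypothetical (only the first row matches the visible excerpt), and the column range `W1:W3` follows the wording of the question.

```r
library(dplyr)

# Hypothetical data; the second row is invented for illustration.
df <- data.frame(
  CP = c(50, 10),
  W1 = c(0, 20),
  W2 = c(60, 5),
  W3 = c(0, 15),
  W4 = c(0, 0)
)

# For each of W1..W3, replace values below the row's CP with 0 and keep
# the others unchanged. W4 is left as-is.
res <- df %>%
  mutate(across(W1:W3, ~ if_else(.x < CP, 0, .x)))

print(res)
```

`if_else()` requires both branches to have the same type, so the columns are assumed to be numeric (double) here.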

1 vote, 1 answer, 71 views

### dplyr: divide all values in group by group's first value

My df looks something like this:
ID Obs Value
1 1 26
1 2 13
1 3 52
2 1 1,5
2 2 30
Using dplyr, I want to add the additional column Col, which is the result of dividing all values in the column Value by the group's first value in that column.
ID O...
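
A minimal sketch of the grouped division described above. The data is modelled on the excerpt, with the assumption that `1,5` is a decimal comma meaning 1.5; the name `result` is chosen here for illustration.

```r
library(dplyr)

# Data modelled on the excerpt; "1,5" is assumed to mean 1.5.
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  Obs   = c(1, 2, 3, 1, 2),
  Value = c(26, 13, 52, 1.5, 30)
)

# Within each ID, divide every Value by the first Value of that group.
result <- df %>%
  group_by(ID) %>%
  mutate(Col = Value / first(Value)) %>%
  ungroup()

print(result)
```

`first()` depends on row order, so if the rows are not already sorted by `Obs`, an `arrange(ID, Obs)` step before `group_by()` may be needed.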

1 vote, 1 answer, 35 views

### dplyr::starts_with and ends_with not subsetting based on arguments

I want to select a number of variables based on their names to transform them. The variable names all start with inq and end with 7, 8, 10, 13:15. This is not working for me... Apologies if this is obvious, but I cannot get it to work. Am I using the wrong functions, putting my functions and argumen...

1 vote, 2 answers, 52 views

### GGPLOT - Show connectivity of annual enrolment across grades and years

I have student enrolment data from 1990-2017:
nominal_roll1 %
gather(Year, Attendance, `1991-92`:`2017-18`) %>%
mutate(Year_ = as.numeric(str_trunc(.$Year, side = 'right', width = 4, ellipsis = '')),
Grade = factor(Grade, levels = c('K4','K5','Gr. 1','Gr. 2','Gr. 3','Gr. 4','Gr. 5','Gr. 6','Gr. 7',...

1 vote, 4 answers, 45 views

### Filter by lowest and highest years by group using dplyr

I feel like the answer here is obvious, but I can't nail it down. I have this dataframe:
df %
group_by(SIC) %>%
filter(!is.na(value)) %>%
filter(year %in% c(min(year), max(year)))
# A tibble: 35 x 3
# Groups: SIC [18]
SIC year value
1 12 2011 0.081
2 11 2011 0.218
3 7 2011 0.212
4...

1 vote, 2 answers, 42 views

### Filter only rows that are duplicated using dplyr

I have been trying for a while now, with no success, to solve a problem close to the one presented in this issue. It consists of filtering for items that are duplicated in a group, while also keeping the original one used for comparison, with dplyr (I prefer dplyr over base or data.table).
The s...

1 vote, 1 answer, 51 views

### Merge/Join two datasets on minimum distance between two columns

I am trying to merge two datasets of yields, and I need to merge them on the minimum difference of maturities, since I would like to calculate the spread between commercial loans and treasury bills of matching maturity.
The join works, but I am looking for a better way, perhaps with fuzzy_join?...

1 vote, 1 answer, 41 views

### Error in charToDate(x) : when performing aggregation by year in R

I have a dataset
mydat=structure(list(time = structure(c(6L, 7L, 8L, 9L, 1L, 2L, 3L,
4L, 5L), .Label = c('01.01.2008', '01.02.2008', '01.03.2008',
'01.04.2008', '01.05.2008', '01.09.2007', '01.10.2007', '01.11.2007',
'01.12.2007'), class = 'factor'), account_a = structure(c(6L,
4L, 3L, 2L, 9L, 8L,...