# Questions tagged [tidyverse]

1069 questions

1

votes

1

answer

231

Views

### Using predict function for new data along with tidyverse

I want to use the predict function for new data along with tidyverse, as in the following example. However, I could not figure out how to use it with new data for wt = 4.0 and 4.2. Any hints, please?
library(tidyverse)
mtcars %>%
dplyr::mutate(cyl1 = factor(cyl)) %>%
tidyr::nest(-cyl) %>%
dplyr::mutate...

1

votes

1

answer

41

Views

### R Find the Distance between Two US Zipcode columns

I was wondering what the most efficient method of calculating the distance in miles between two US zipcode columns would be using R.
I have heard of the geosphere package for computing the difference between zipcodes but do not fully understand it and was wondering if there were alternative methods...

1

votes

1

answer

25

Views

### How can I use gather function to manipulate my data frame? [duplicate]

This question already has an answer here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
3 answers
I have a data frame as follows:
df

1

votes

2

answer

24

Views

### How to find opening and closing balances

Could someone please help me find opening_bal and closing_bal.
I have all the transaction aggregates that happened in the month (new/transfers/exits etc.) and I also have the closing balance for the last month. Using this data I need to work back.
library(tidyverse)
library(lubridate)
# this is th...

1

votes

2

answer

14

Views

### Trying to combine dates and times

I am trying to combine dates and times. These come from a file that, when imported, looks like this:
library(tidyverse)
library(lubridate)
bookings

0

votes

1

answer

24

Views

### Visualise differences between factor levels using ggplot

I have a plot in my mind that I would like to create, but I don't know how to successfully achieve this goal.
I have 2 dataframes, one containing the mean value for each factor level, and the other, pairwise differences between these levels.
contrasts

0

votes

3

answer

24

Views

### How do you find if a value is found in specific columns?

ID Pred1 Pred2 Pred3 Obs1 Obs2 Obs3 FP
1 Boston Tokyo London Boston London Other 0
2 Tokyo London Paris Seattle Paris Other 0
3 London Berlin Paris Paris Berlin London 0
4 Seattle Berlin London Tokyo Paris Boston 1
This is my dataset. What I am tryi...

6

votes

3

answer

104

Views

### Creating one variable from a list of variables in R?

I have a sequence of variables in a dataframe (over 100) and I would like to create an indicator variable for whether particular text patterns are present in any of the variables. Below is an example with three variables. One solution I've found is using tidyr::unite() followed by dplyr::mutate(), but I'...

1

votes

1

answer

387

Views

### Equivalent of Stata tab command in R

I'm trying to find out what the Stata command tab x y if z>1 would be in R.
Other than d %>% filter (z>1).
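
One possible way to get the cross-tabulation itself, shown as a minimal sketch rather than a definitive answer. The data frame `d` and its values are made up for illustration; only the column names `x`, `y`, and `z` come from the question.

```r
library(dplyr)

# Hypothetical data frame standing in for `d` from the question.
d <- data.frame(
  x = c("a", "a", "b", "b", "a"),
  y = c("u", "v", "u", "u", "u"),
  z = c(0, 2, 3, 4, 5)
)

# Long-format counts: one row per observed (x, y) combination with z > 1.
counts <- d %>%
  filter(z > 1) %>%
  count(x, y)

# Base R two-way table, closer in layout to Stata's `tab x y`.
xtab <- with(subset(d, z > 1), table(x, y))
```

`count()` returns a tidy data frame that only lists combinations that occur, while `table()` gives a wide layout that also shows zero cells.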

1

votes

2

answer

67

Views

### `dplyr::case_when` don't give me correct results

case_when doesn't produce the expected results:
My list:
library(tidyverse)
1:6%>%
str_c('var',.)%>%
map(~assign(.,runif(30,20,100),envir=globalenv()))
tibble

1

votes

1

answer

47

Views

### Modify column value based on another column value

This is causing me trouble. I am using dplyr and I want to change the value of each week (W1 to W3) based on the value of CP: if < CP then 0
CP W1 W2 W3 W4
1 50 0 60 0 0
4 10...

1

votes

1

answer

35

Views

### dplyr::starts_with and ends_with not subsetting based on arguments

I want to select a number of variables based on their names to transform them. The variable names all start with inq and end with 7, 8, 10, 13:15. This is not working for me... Apologies if this is obvious, but I cannot get it to work. Am I using the wrong functions, putting my functions and argumen...

1

votes

2

answer

30

Views

### How do I prevent interpolation between values where there are more than X number of missing rows of data?

I would like to interpolate missing data, but skip scenarios where there are more than X number (e.g., 3) missing rows of data. I have code below, but the final step does not work.
I previously posted a question and got a great answer (How do I prevent interpolation between values where there are mo...

1

votes

1

answer

33

Views

### How to unnest a list containing data frames

I'm trying to expand a nested column that contains a list of data frames. They are either NULL or 1 row by n columns, so the goal is to just add n columns to the tibble. (NULL list items would preferably expand to NAs).
I've tried several solutions including those from this answer.
The goal for th...
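
A minimal sketch of one way this kind of expansion can be done, assuming tidyr 1.0.0 or later, where `unnest()` has a `keep_empty` argument. The tibble `df` and its column names (`id`, `info`, `a`, `b`) are hypothetical, not the asker's data.

```r
library(tibble)
library(tidyr)

# Hypothetical tibble: a list-column whose entries are NULL or a one-row data frame.
df <- tibble(
  id = 1:3,
  info = list(
    tibble(a = 1, b = "x"),
    NULL,
    tibble(a = 3, b = "z")
  )
)

# keep_empty = TRUE keeps rows whose list element is NULL and fills them with NA.
# Without it, the row with id 2 would be dropped from the result.
flat <- unnest(df, info, keep_empty = TRUE)
```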

1

votes

1

answer

28

Views

### Is there a limit for columns created within one `mutate`call?

I'm currently restructuring an application, which provides data for a certain subject. At the moment I'm designing the structure of the new scripts for the shiny app and it works well. Before I go on and finalize things, I wanted to ask if anybody encountered problems when creating new columns with...

1

votes

0

answer

129

Views

### R: consecutive occurrence of a number using tidyverse

Some sample data first:
set.seed(123)
mat.1980

1

votes

1

answer

488

Views

### How to Use Forcats::Fct_Collapse in a Function Across Different Dataframes with Different Factor Levels

library(tidyverse)
library(forcats)
I have two simple dataframes (code at bottom) and I want to create a new recoded variable by collapsing the 'Animal' column. I usually do this with forcats::fct_collapse. However, I want to make a function to apply fct_collapse to many different dataframes that ha...

1

votes

1

answer

27

Views

### Assigning a list of lists as a nested column

I want to use purrr to generate some data based on some parameters.
Shown below is a script that will generate a beta density on 0 to 1 parameterized by a and b (the columns of the dataframe params).
library(tidyverse)
a = c(2,4,6)
b = c(10,12,14)
params = expand.grid(a = a, b = b)
gen_den = functi...

1

votes

1

answer

61

Views

### Filter Start Date with Greather Than or Equal To and End Date that Contains Months as Strings [closed]

library(tidyverse)
library(lubridate)
I'm new to working with dates in the tidyverse and I'm attempting to filter by Start_Date that is greater than or equal to 08-MAY-2017, and an End_Date that contains the months of AUG or JUL.
I attempted this with the code below. I first used lubridate::mdy...

1

votes

1

answer

83

Views

### Spreading keys/values over multiple data frames stored in a list using a for loop

I have a bunch of data frames stored in a single list. My goal is to format each data frame in the list such that values in a specific column turn into column names. Since I would like every data frame in the list to be transformed, I tried to apply the spread function in tidyverse over all elements...

1

votes

0

answer

67

Views

### fuzzy matching in DNA seqs

For the purposes of the reprex I've generated a tibble called random_DNA_tbl that is a random selection of 10 DNA sequences (of 100 bases). I've got a separate tibble called subseq_tbl, with 3 shorter sequences that match 100% to 3 of the sequences in random_DNA_tbl, but I'd also like to use fuzzy m...

1

votes

1

answer

85

Views

### Creating a factor: error using the cut() function

I am receiving this Error in mutate_impl(.data, dots) : Evaluation error: lengths of 'breaks' and 'labels' differ. error when attempting to create a new variable that indicates if the Air Quality Index is greater than 50 for over 100 days. Basically, I want to create a 'yes' or 'no' label.
I wa...
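
The error message points at how `cut()` pairs labels with intervals: n break points define n - 1 intervals, so `labels` needs n - 1 entries. A minimal sketch with made-up values; `days_over_50` is a hypothetical vector, not the asker's data.

```r
# Hypothetical counts of days with AQI above 50.
days_over_50 <- c(20, 150, 100, 101)

# Two intervals need three break points and exactly two labels.
# Intervals are right-closed by default: (-Inf, 100] and (100, Inf].
over_100_days <- cut(
  days_over_50,
  breaks = c(-Inf, 100, Inf),
  labels = c("no", "yes")
)
```

For a plain two-level split, `ifelse(days_over_50 > 100, "yes", "no")` would be another option that avoids breaks and labels altogether.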

1

votes

1

answer

79

Views

### Error: could not find function “lang_unnamespace”

I am getting the error here in this Travis build, and I cannot reproduce it locally. Yes, I realize that I do not have a minimal reproducible example, but I do know that it happens within tidyselect::vars_select(). Has anyone else encountered this before? I cannot find any mention of lang_unnamespac...

1

votes

0

answer

63

Views

### what is the correct way to reference variables when using tidyverse with other functions?

Say I would like to use reporttools with tidyverse.
I first make sure the packages are loaded,
#install.packages(c('tidyverse', 'reporttools')) #Use this to install it, do this only once
library(reporttools); library(tidyverse)
Second I test it with a basic reporttools tableNominal, i.e.,
data(CO2)...

1

votes

1

answer

1.3k

Views

### pmap _df: Error in bind_rows_(x, .id) : Argument 1 must have names

I thought the map_df family can fully replace plyr::ldply, as the release note in purrr package claimed a long time ago. However, I'm quite frustrated to realize that I cannot find a simple and elegant solution in this case.
params %
pmap_dfr(rnorm, n = 5)
An error message will be returned:
Error...

1

votes

1

answer

102

Views

### Using purrr to convert list of vectors to list of matrices

EDITED: Based on suggestion by user @useR I have the following reprex
for my required question (see end of post).
# This is the source list i.e. list of vectors
all_list [[1]]
#> [1] 1 10 19 28 37
#>
#> [[2]]
#> [1] 4 13 22 31 40
#>
#> [[3]]
#> [1] 7 16 25 34 43
#>
#> [[4]]
#> [1] 2 11 20 29...

1

votes

1

answer

250

Views

### Converting time-series results to dates

I use fpp2 for forecasting. My workflow involves importing data, converting to a time series, then forecasting.
One pain-point is that after forecasting I am left with data that is an extension of my current data, but no longer retains the same date column.
For example, if I am working with weeks...

1

votes

2

answer

67

Views

### gather multiple columns with nested, repeated measures

I have a dataset of people (pid) of different types (type2=c('dad', 'mom', 'kid'); and for ease, type=c('a', 'b', 'c')) nested in households (hid) with repeated measurements (time).
Some variables like v1_ are asked to everyone, but the values are spread across three columns. For instance, v1_a cont...

1

votes

1

answer

76

Views

### Grouped tibbletime and using collapse_index, getting weird results

I have a file (appx 9K records) that I want to aggregate based on the group first, and then on dates that are within seven days of each other. However, I'm not understanding why the results look the way they do. I realize there are other ways I could achieve the same results with this particular exa...

1

votes

1

answer

515

Views

### dplyr mutating multiple columns by prefix and suffix

I have a problem that I can replicate using the iris dataset, where there are many groups of variables (same prefix in the name) with two different suffixes. I want to take a ratio for all these groups but can't find a tidyverse solution. I would have thought mutate_at() might have been able to help.
In the i...

1

votes

2

answer

44

Views

### R efficiency iterating through dataframes

I am working with a large data set, let's call it data, and want to create a new column, let's call it data$results, based off of some column data$input. The results are based off of some conditional if/then logic, so my original approach was something like:
for (rows in data) {
data$results
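
For if/then logic like this, a vectorized approach is usually much faster than looping over rows. A minimal sketch, assuming dplyr is acceptable; the column names follow the question, but the data and the rule itself are invented for illustration.

```r
library(dplyr)

# Hypothetical data; the actual conditions in the question are not shown.
data <- data.frame(input = c(-2, 0, 3, NA))

# One vectorized call handles every row; conditions are checked in order
# and the first one that is TRUE decides the result.
data <- data %>%
  mutate(results = case_when(
    is.na(input) ~ "missing",
    input < 0    ~ "negative",
    input == 0   ~ "zero",
    TRUE         ~ "positive"
  ))
```

With only two outcomes, base R `ifelse()` or `dplyr::if_else()` would do the same job in a single expression.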

1

votes

0

answer

62

Views

### How to make tibble saved with write_tsv readable by read_tsv

I have quite a large tibble() (data.frame) which I save with write_tsv() and would like to read with read_tsv(). I am using all default options.
However, read_tsv() emits a bunch of warnings (See example below). What strategy could I use to make it work?
(also tried write_csv() -> read_csv() but same...

1

votes

2

answer

1.6k

Views

### Package installation in R fail on MacOS

I tried to install two packages in R Studio: tidyverse and quantmod. However both give me errors and I can't understand why (googling doesn't help to understand the problem).
For tidyverse I get:
> install.packages('tidyverse')
also installing the dependency ‘xml2’
There are binary versions ava...

1

votes

1

answer

27

Views

### facet_grid with multiple line colours

I have the following data frame resulting from simulations of ODEs with different parameter sets, e.g.
df %
gather(p, pval, -t, -x, -xval) %>%
distinct()
df.1$pval

1

votes

2

answer

136

Views

### write_csv Scientific notation depending on trailing “000”?

Writing a csv with the write_csv() function from package readr seems to treat numbers differently depending on trailing zeros.
4001705344 is saved as is, but
4100738000 is saved as 4100738e3 in the csv.
This causes problems when I reopen the csv (e.g. in Excel).
For a reproducible example s.
library...

1

votes

1

answer

106

Views

### How to undo dplyr mutate silently round the division operation [duplicate]

This question already has an answer here:
Why does as_tibble() round floats to the nearest integer?
1 answer
I have the following data frame:
library(tidyverse)
dat # A tibble: 3 x 3
#> sid nof_reads mapped_reads
#>
#> 1 MK1 19786677 19785168
#> 2 MK2 29531664 295...

1

votes

1

answer

157

Views

### Format numeric data table rowwise in R using tidyverse for kable output

I have a table of values that I want to save as a kable() table. Each row of the table is a variable and each column is a value of that variable (e.g., a mean, minimum, maximum, etc.). You can apply the format() function to columns of a data frame but applying it across rows seems very awkward. I fi...

1

votes

1

answer

21

Views

### How do I add common rows together?

This is an example of the data frame I am using
UserID

1

votes

0

answer

171

Views

### R's purrr package shortcut dot meaning: .f, .p

My question is, I do not quite understand the meaning of the . when writing a function such as implementing a self-designed every() function of purrr predicate functions:
every2 1})
#> [1] FALSE
every2(1:3, function(x) {x > 0})
#> [1] TRUE
What does the . mean when you type .f?
I have tried it by m...
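
As far as naming goes, `.x`, `.f`, and `.p` are ordinary argument names; the leading dot is a convention that makes clashes with arguments forwarded through `...` less likely. The sketch below is a hypothetical re-implementation written for illustration, not purrr's own code and not the asker's original `every2`.

```r
# `.x` and `.p` are plain argument names; the dot has no special meaning to R.
every2 <- function(.x, .p, ...) {
  for (el in .x) {
    # Stop at the first element for which the predicate is not TRUE.
    if (!isTRUE(.p(el, ...))) return(FALSE)
  }
  TRUE
}

res_gt1 <- every2(1:3, function(x) x > 1)
res_gt0 <- every2(1:3, function(x) x > 0)

# A caller can forward an argument named `x` through `...`
# without it being matched to the data argument `.x`.
res_dots <- every2(1:3, function(el, x) el > x, x = 0)
```

If the data argument were called `x` instead of `.x`, the last call would bind `x = 0` to it and the function would no longer receive `1:3` as its data.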

1

votes

1

answer

241

Views

### ggplot2 (version 3) incompatibility with ggmap for geom_density_2d

ggplot2 version 3 seems to have an incompatibility with ggmap when using the geom_density2d() function to add a layer. The following code returns an error (though it worked with ggplot2 version 2):
# Create a data frame
df