-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy the inflammation dataset #5
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the "WIP" tag, I won't comment on the completeness. Seems good so far. Maybe I'm not understanding, but what exactly are you unsure about doing? I think covering wide to long etc conversion is a very powerful tool, so I agree with covering these.
_episodes_rmd/06-tidy-data.Rmd
Outdated
teaching: 30 | ||
exercises: 10 | ||
questions: | ||
- "Which different forms can one dataset have?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"What are possible forms that a dataset can have?"
_episodes_rmd/06-tidy-data.Rmd
Outdated
exercises: 10 | ||
questions: | ||
- "Which different forms can one dataset have?" | ||
- "Which advantages and disadvantages do these forms have?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which -> what
_episodes_rmd/06-tidy-data.Rmd
Outdated
- "Which features make a dataset more or less reusable?" | ||
- "How can we add datasets to R packages?" | ||
objectives: | ||
- "Use `tidyr::gather()` to convert wide data to its long form." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This objective doesn't relate to item 4 of questions. Maybe add to it saying "and adding to package"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More objectives will be added. I have a task list in my staging area ;-)
_episodes_rmd/06-tidy-data.Rmd
Outdated
objectives: | ||
- "Use `tidyr::gather()` to convert wide data to its long form." | ||
keypoints: | ||
- "Excel & co. incentivise the wide data format, which may spread variables across columns." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Spreadsheets incentivise ..."
_episodes_rmd/06-tidy-data.Rmd
Outdated
|
||
```{r inflammation-wide} | ||
dat <- read.csv(file = "data/inflammation-01.csv", header = FALSE) | ||
colnames(dat); rownames(dat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this ;
been introduced? I tend to prefer one line per command for such short lines of code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I thought it was necessary to get both outputs into the knit-ed version. But it turns out it isn't, so we'll go with your preference :-)
_episodes_rmd/06-tidy-data.Rmd
Outdated
|
||
```{r inflammation-label} | ||
patient_ID <- paste("patient", sep = "_", seq(60)) | ||
dat_labelled <- cbind(dat, patient_ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we use dplyr/tidyr, why not also use bind_cols
instead of cbind
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know about that, thanks for the hint! In this case, though: Error in cbind_all(x) : Argument 2 must have names
. So, I think we can either stick to base::cbind
or upgrade the example with names()
or something like that? What do you think?
Please feel free to add it. I tried names(patient_ID) <- "patient_ID"
and then dplyr::bind_cols(dat, patient_ID)
but still get the error.
@@ -0,0 +1,73 @@ | |||
--- | |||
title: "Tidying & packaging datasets" | |||
teaching: 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You think it will take 30 min to cover this? Seems a bit much for so little code. Maybe also include spread
? And using the gather(..., -variable_name)
minus ability to exclude a column from the gathering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP ;-) I didn't adjust the number after copying from a template.
But yes, spread
and more about gather
is a good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't include more about this after all, because we can re-use so many other resources if people would like to learn about this. For the context of this inflammation dataset, I mostly want to get the column naming and tidying across.
Also, I would include git commit
ing the steps in between or in the end, so I think 30min is realistic after all.
- improve wording - fix typos - avoid ;
sub would require as.numeric() as well.
ee78358
to
4bf2d5e
Compare
Thanks for your comments above, @lwjohnst86 :-)
About which of the 3 options could be the most effective learning experience. |
c9beccb
to
357ace5
Compare
Also move it to where the tidy data episode (see #5) will need it to display the same code as learners need for packaging the dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
|
||
Just as a good function documentation makes your code more accessible and reusable | ||
for others and your future self, datasets benefit from a documentation as well. | ||
Please read through [r-pkgs.had.co.nz/data.html#documenting-data](http://r-pkgs.had.co.nz/data.html#documenting-data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They'll do this on their own I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, 5min break for us.
> > ~~~ | ||
> > #' Inflammation In Patients During Study... | ||
> > #' | ||
> > #' @source Pre-registration: \url{http://wwww.alltrials.net/study...}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use markdown syntax in the Roxygen documentation? Use can use it by running usethis::use_roxygen_md()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in responding! GitHub didn't notify me about your comments :-(
> > #' \item{patient_ID}{A factor prepresenting the different patients} | ||
> > #' \item{day}{Number of days after start of the study} | ||
> > #' \item{inflammatory_response}{Measured daily as described in the methods section of ...} | ||
> > #' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a \describe
alternative in markdown, so this would stay if you did use markdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of this, and the mention of "LaTeX-like syntax" in an earlier episode, I'd keep it like this first, but I added a note below this: 5b9deca.
_episodes_rmd/06-tidy-data.Rmd
Outdated
> | ||
> > ## Solutions | ||
> > | ||
> > 1. Run `roxygen2::roxygenise()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or devtools::document()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's not mentioned anywere else in the lesson anymore, I'd refer learners to ??roxygenise
in case of problems. There, they'll find a note about when to use devtools::document()
.
7a0bdc1
to
08f6409
Compare
08f6409
to
188caaf
Compare
I'm merging this as it is now, because we have to freeze the material at some point I'm going to do it now, and leave #6 to a second iteration of this workshop. |
See #3. I feel between a rock and a hard place here, about which way to go for the 12. > afternoon "Community-standard data formats, tidy data, data packages":
R-eco…dplyr or r-social…tidyr are mature & highlight the important aspects :-)
The PANGAEA remix is close to our hearts, but touches only a few aspects. Community-standard variable names are already aligned, so only a few lines of
gather()
&use_data()
could be appended there.The inflammation dataset can easily be tidied on the technical level, but how does one find community-standard variable names?
patient_ID
&inflammation_score
seem logical, but are they in any ontology?Ideas for this episode
reshape2::melt
totidyr::gather()
.rda
include all 12 files, but then how to label patients consecutively (1-60, 61-120, etc.)?