Tidy the inflammation dataset #5

katrinleinweber · 2018-06-20T09:24:08Z

See #3. I feel caught between a rock and a hard place here about which way to go for the 12. > afternoon "Community-standard data formats, tidy data, data packages":

1. R-eco…dplyr or r-social…tidyr are mature & highlight the important aspects :-)
2. The PANGAEA remix is close to our hearts, but touches only a few aspects. Community-standard variable names are already aligned, so only a few lines of `gather()` & `use_data()` would need to be appended there.
3. The inflammation dataset can easily be tidied on the technical level, but how does one find community-standard variable names? `patient_ID` & `inflammation_score` seem logical, but are they in any ontology?

Ideas for this episode

- convert `reshape2::melt` to `tidyr::gather()`
- include raw data as files & result data as `.rda`
- ~~include all 12 files, but then how to label patients consecutively (1-60, 61-120, etc.)?~~

lwjohnst86

Given the "WIP" tag, I won't comment on the completeness. Seems good so far. Maybe I'm not understanding, but what exactly are you unsure about doing? I think wide-to-long conversion (and the like) is a very powerful tool, so I agree with covering it.

lwjohnst86 · 2018-06-26T08:38:38Z

_episodes_rmd/06-tidy-data.Rmd

+teaching: 30
+exercises: 10
+questions:
+- "Which different forms can one dataset have?"


"What are possible forms that a dataset can have?"

lwjohnst86 · 2018-06-26T08:39:00Z

_episodes_rmd/06-tidy-data.Rmd

+exercises: 10
+questions:
+- "Which different forms can one dataset have?"
+- "Which advantages and disadvantages do these forms have?"


which -> what

lwjohnst86 · 2018-06-26T08:40:41Z

_episodes_rmd/06-tidy-data.Rmd

+- "Which features make a dataset more or less reusable?"
+- "How can we add datasets to R packages?"
+objectives:
+- "Use `tidyr::gather()` to convert wide data to its long form."


This objective doesn't relate to item 4 of the questions. Maybe add to it, saying "and adding to package"?

More objectives will be added. I have a task list in my staging area ;-)

lwjohnst86 · 2018-06-26T08:41:05Z

_episodes_rmd/06-tidy-data.Rmd

+objectives:
+- "Use `tidyr::gather()` to convert wide data to its long form."
+keypoints:
+- "Excel & co. incentivise the wide data format, which may spread variables across columns."


"Spreadsheets incentivise ..."

lwjohnst86 · 2018-06-26T08:42:11Z

_episodes_rmd/06-tidy-data.Rmd

+
+```{r inflammation-wide}
+dat <- read.csv(file = "data/inflammation-01.csv", header = FALSE)
+colnames(dat); rownames(dat)


Has this `;` been introduced? I tend to prefer one line per command for such short lines of code.

No. I thought it was necessary to get both outputs into the knitted version. But it turns out it isn't, so we'll go with your preference :-)

lwjohnst86 · 2018-06-26T08:44:06Z

_episodes_rmd/06-tidy-data.Rmd

+
+```{r inflammation-label}
+patient_ID <- paste("patient", sep = "_", seq(60))
+dat_labelled <- cbind(dat, patient_ID)


Since we use dplyr/tidyr, why not also use `bind_cols` instead of `cbind`?

I didn't know about that, thanks for the hint! In this case, though: `Error in cbind_all(x) : Argument 2 must have names`. So, I think we can either stick to `base::cbind` or upgrade the example with `names()` or something like that? What do you think?

Please feel free to add it. I tried `names(patient_ID) <- "patient_ID"` and then `dplyr::bind_cols(dat, patient_ID)` but still get the error.
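One possible way around the error, as a minimal sketch rather than what the episode ended up using: `names(patient_ID) <- "patient_ID"` names the first element of the vector, not the column it would become, which is probably why the attempt above did not help. Wrapping the vector in a one-column data frame gives `bind_cols()` a named column to attach. The small `dat` below is a made-up stand-in for the inflammation file, and `seq_len(nrow(dat))` is used so the number of patients is not hard-coded.

```r
library(dplyr)

# Made-up stand-in for the inflammation data: 3 patients (rows) x 4 days (columns).
dat <- as.data.frame(matrix(c(0, 1, 2, 3,
                              0, 2, 1, 4,
                              0, 1, 3, 2),
                            nrow = 3, byrow = TRUE))

# One label per row, without hard-coding the number of patients.
patient_ID <- paste("patient", seq_len(nrow(dat)), sep = "_")

# Wrap the vector in a one-column data frame so the column name travels with it.
dat_labelled <- bind_cols(
  dat,
  data.frame(patient_ID = patient_ID, stringsAsFactors = FALSE)
)
```

Sticking with `base::cbind(dat, patient_ID)` remains a perfectly reasonable choice; the data-frame wrapper only matters if the example should stay within dplyr.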

lwjohnst86 · 2018-06-26T08:47:28Z

_episodes_rmd/06-tidy-data.Rmd

@@ -0,0 +1,73 @@
+---
+title: "Tidying & packaging datasets"
+teaching: 30


Do you think it will take 30 min to cover this? Seems a bit much for so little code. Maybe also include `spread`? And the `gather(..., -variable_name)` minus syntax to exclude a column from the gathering.

WIP ;-) I didn't adjust the number after copying from a template.

But yes, `spread` and more about `gather` is a good idea!

I didn't include more about this after all, because we can re-use so many other resources if people would like to learn about this. For the context of this inflammation dataset, I mostly want to get the column naming and tidying across.

Also, I would include git committing the steps in between or at the end, so I think 30 min is realistic after all.
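For reference, the two suggestions in this thread can be sketched as follows. This is an illustration under assumptions, not code from the episode: the tiny `wide` data frame and the column names `day` and `inflammation` are invented for the example.

```r
library(tidyr)

# Invented wide-format example: one row per patient, one column per day.
wide <- data.frame(
  patient_ID = c("patient_1", "patient_2"),
  V1 = c(0, 0),
  V2 = c(1, 2),
  V3 = c(3, 1),
  stringsAsFactors = FALSE
)

# Gather all columns except patient_ID into key/value pairs.
# The minus sign excludes patient_ID from the gathering.
long <- gather(wide, key = "day", value = "inflammation", -patient_ID)

# spread() is the inverse: it turns the long form back into the wide form.
wide_again <- spread(long, key = day, value = inflammation)
```

Here `long` has one row per patient and day, and `wide_again` has the same columns as `wide`.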

katrinleinweber · 2018-06-27T13:19:58Z

Thanks for your comments above, @lwjohnst86 :-)

> what exactly are you unsure about doing?

About which of the 3 options could be the most effective learning experience. 3. is nearly complete, but that doesn't mean we have to use it.

lwjohnst86

Looking good!

lwjohnst86 · 2018-07-02T09:58:38Z

_episodes_rmd/06-tidy-data.Rmd

+
+Just as a good function documentation makes your code more accessible and reusable
+for others and your future self, datasets benefit from a documentation as well.
+Please read through [r-pkgs.had.co.nz/data.html#documenting-data](http://r-pkgs.had.co.nz/data.html#documenting-data)


They'll do this on their own I assume?

Yes, a 5 min break for us.

lwjohnst86 · 2018-07-02T10:00:06Z

_episodes_rmd/06-tidy-data.Rmd

+> > ~~~
+> > #' Inflammation In Patients During Study...
+> > #'
+> > #' @source Pre-registration: \url{http://wwww.alltrials.net/study...}.


Any reason not to use markdown syntax in the Roxygen documentation? You can enable it by running `usethis::use_roxygen_md()`.

Sorry for the delay in responding! GitHub didn't notify me about your comments :-(

lwjohnst86 · 2018-07-02T10:03:13Z

_episodes_rmd/06-tidy-data.Rmd

+> > #'  \item{patient_ID}{A factor prepresenting the different patients}
+> > #'  \item{day}{Number of days after start of the study}
+> > #'  \item{inflammatory_response}{Measured daily as described in the methods section of ...}
+> > #' }


I don't think there is a \describe alternative in markdown, so this would stay if you did use markdown

Because of this, and the mention of "LaTeX-like syntax" in an earlier episode, I'd keep it like this first, but I added a note below this: 5b9deca.
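To make the point of this thread concrete, here is a rough sketch of how a dataset documentation block might look once markdown is enabled. The variable descriptions follow the diff above; the title, the dataset name `inflammation`, and the URL are placeholders invented for the illustration. The `\describe{}` block stays in Rd syntax, while the link in `@source` uses markdown.

```r
#' Inflammation scores per patient and day (placeholder title)
#'
#' @format A data frame with three variables:
#' \describe{
#'   \item{patient_ID}{A factor representing the different patients}
#'   \item{day}{Number of days after start of the study}
#'   \item{inflammatory_response}{Measured daily}
#' }
#' @source See the [pre-registration](https://example.org/) (placeholder URL).
"inflammation"
```

Without markdown support enabled, the link would instead be written with `\url{}` as in the reviewed diff.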

lwjohnst86 · 2018-07-02T10:03:49Z

_episodes_rmd/06-tidy-data.Rmd

+>
+> > ## Solutions
+> > 
+> > 1. Run `roxygen2::roxygenise()`.


Or `devtools::document()`.

Because it's not mentioned anywhere else in the lesson anymore, I'd refer learners to `??roxygenise` in case of problems. There, they'll find a note about when to use `devtools::document()`.

katrinleinweber · 2018-07-04T14:31:46Z

I'm merging this as it is now, because we have to freeze the material at some point. I'm going to do it now and leave #6 to a second iteration of this workshop.

Tidy the inflammation dataset

08d3870

katrinleinweber added the enhancement New feature or request label Jun 20, 2018

katrinleinweber self-assigned this Jun 20, 2018

katrinleinweber added 3 commits June 20, 2018 11:43

Fix typo

cbb102a

Update from reshape2 to tidyr

0cbe564

Separate chr conversion from gather()

9a850ac

katrinleinweber requested a review from lwjohnst86 June 20, 2018 13:09

lwjohnst86 reviewed Jun 26, 2018

katrinleinweber added 5 commits June 27, 2018 09:40

Address review comments

968ce36

- improve wording - fix typos - avoid ;

Focus on elegant variable to value conversion

3e1f132

sub would require as.numeric() as well.

Follow use_this' convention for raw data

1baebbe

Abbreviate preview

66cc65e

Rephrase approval & mention AllTrials

4bf2d5e

katrinleinweber force-pushed the 3-tidy-data branch from ee78358 to 4bf2d5e Compare June 27, 2018 12:09

katrinleinweber added 2 commits June 27, 2018 14:16

Explain final data packaging step

cb77acf

Refer to external resource for dataset docu

7818074

katrinleinweber added 5 commits June 27, 2018 21:57

Trim whitespace

bcf0a6e

Rephrase task for .R file

3033c99

Draft last steps in package docu & build

82291c0

Rename parameter & files to match common terms

6f5d2a1

http://www.ontobee.org/search?ontology=&keywords=inflammat*&submit=Search+terms

Enrich dataset's metadata

97d9297

katrinleinweber force-pushed the gh-pages branch from c9beccb to 357ace5 Compare June 28, 2018 14:27

Katrin Leinweber added 5 commits June 28, 2018 20:30

Don't hard-code patient_ID length

2d3b249

Fix dataset docu formatting

2e3a8f4

Explain importance of good variable names

ce45d7d

Expand episode summary

13f6733

Minor rephrasing & formatting

fecc260

katrinleinweber changed the title ~~WIP: Tidy the inflammation dataset~~ Tidy the inflammation dataset Jun 29, 2018

Refer to further gather() & spread() resources

f009e0e

katrinleinweber mentioned this pull request Jun 29, 2018

Mirror, mirror on the wall, which episode order is the FAIRest of them all? #6

Open

katrinleinweber pushed a commit that referenced this pull request Jul 1, 2018

Use only 1 inflammation dataset

adc05ed

Also move it to where the tidy data episode (see #5) will need it to display the same code as learners need for packaging the dataset.

katrinleinweber mentioned this pull request Jul 1, 2018

Auto-testing the data during tidying #7

Open

lwjohnst86 reviewed Jul 2, 2018

katrinleinweber added a commit to TIBHannover/2018-07-09-FAIR-Data-and-Software that referenced this pull request Jul 2, 2018

Align R setup with TIBHannover/FAIR-R#5 & previous

261fb9d

katrinleinweber mentioned this pull request Jul 4, 2018

Add positive example / joker episode for FAIR dataset remixes & some software best-practices TIBHannover/2018-07-09-FAIR-Data-and-Software#1

Merged

4 tasks

katrinleinweber force-pushed the 3-tidy-data branch from 7a0bdc1 to 08f6409 Compare July 4, 2018 14:18

Katrin Leinweber added 2 commits July 4, 2018 16:21

Mention usethis::use_roxygen_md()

5b9deca

Fix typos

188caaf

katrinleinweber force-pushed the 3-tidy-data branch from 08f6409 to 188caaf Compare July 4, 2018 14:22

katrinleinweber merged commit be6b96b into gh-pages Jul 4, 2018

katrinleinweber deleted the 3-tidy-data branch July 4, 2018 14:31

katrinleinweber added a commit that referenced this pull request Jul 4, 2018

Regenerate / make site after #5

8d73000
