Invalid multibyte string when printing certain combination of characters with fs::path() #384

Open
giocomai opened this issue May 31, 2022 · 2 comments
Labels
bug an unexpected problem or unintended behavior

giocomai commented May 31, 2022

This is an issue that emerges when a certain, admittedly unusual, combination of valid characters is printed on the console via fs::path(). The shortest combination I found is the following (with the relevant error):

fs::path("śl")

Error in base::nchar(x, type, allowNA, keepNA) :
invalid multibyte string, element 1

The issue does not seem to be related to the ś itself, as the following works nicely:

fs::path("ś")

Obvious solutions such as the following all return the same error:

fs::path(stringi::stri_enc_toutf8("śl"))
fs::path(stringi::stri_enc_tonative("śl"))
fs::path(fs::path_sanitize("śl"))

Error in base::nchar(x, type, allowNA, keepNA) :
invalid multibyte string, element 1

In the reprex output, everything seemingly works fine. But if I run the same code in the console, I get the above error.

I include a reprex for reference, as well as a screenshot, since the issue is not fully visible via reprex.

In the real world, this issue emerges when I print fs::path() as a way to show progress in a script that processes data related to a number of cities, including the Polish city of Przemyśl.

Of course there are ways around it, but since it broke my scripts, I still decided to report this. Tested with both the current development version (see reprex below) and the version currently on CRAN (1.5.2).

library("fs")

# see e.g. https://en.wikipedia.org/wiki/Przemy%C5%9Bl
x <- "Przemyśl"

# works
print(x)
#> [1] "Przemyśl"

# throws error
fs::path(x)
#> Przemyśl

# throws error
fs::path(stringi::stri_enc_toutf8(x))
#> Przemyśl

# works
fs::path_sanitize(filename = x) 
#> [1] "Przemyśl"

# throws error
fs::path(fs::path_sanitize(filename = x))
#> Przemyśl

# works
y <- invisible(fs::path(x))

# works
stringr::str_c(y)
#> [1] "Przemyśl"

# works
filename <- fs::path(tempdir(), "Przemyśl.txt")

# throws error
fs::path(filename)
#> /tmp/RtmpzO4Dt3/Przemyśl.txt

# works
writeLines("test", fs::path(filename))

# works
fs::path("ś")
#> ś

# works
fs::path("Przemyś")
#> Przemyś

# throws error
fs::path("śl")
#> śl


devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       Fedora Linux 35 (Workstation Edition)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_IE.UTF-8
#>  ctype    en_IE.UTF-8
#>  tz       Europe/Vienna
#>  date     2022-05-31
#>  pandoc   2.14.0.3 @ /usr/libexec/rstudio/bin/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  brio          1.1.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  cachem        1.0.6      2021-08-19 [1] CRAN (R 4.1.1)
#>  callr         3.7.0      2021-04-20 [1] CRAN (R 4.1.2)
#>  cli           3.3.0      2022-04-25 [1] CRAN (R 4.1.3)
#>  crayon        1.5.1      2022-03-26 [1] CRAN (R 4.1.2)
#>  desc          1.4.1      2022-03-06 [1] CRAN (R 4.1.2)
#>  devtools      2.4.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2      2021-04-29 [2] CRAN (R 4.1.0)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi         1.0.3      2022-03-24 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0      2021-01-25 [2] CRAN (R 4.1.0)
#>  fs          * 1.5.2.9000 2022-05-31 [1] Github (r-lib/fs@e7d98c4)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  highr         0.9        2021-04-16 [3] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.39       2022-04-26 [1] CRAN (R 4.1.3)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.1.3)
#>  memoise       2.0.1      2021-11-26 [1] CRAN (R 4.1.2)
#>  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgbuild      1.3.1      2021-12-20 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.1)
#>  pkgload       1.2.4      2021-11-30 [1] CRAN (R 4.1.2)
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.1)
#>  processx      3.5.2      2021-04-30 [2] CRAN (R 4.1.0)
#>  ps            1.6.0      2021-02-28 [2] CRAN (R 4.1.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.2)
#>  R.cache       0.15.0     2021-04-30 [3] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [3] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [3] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.2)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
#>  remotes       2.4.2      2021-11-30 [1] CRAN (R 4.1.2)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.1)
#>  rlang         1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.14       2022-04-25 [1] CRAN (R 4.1.3)
#>  rprojroot     2.0.3      2022-04-02 [1] CRAN (R 4.1.3)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.2)
#>  styler        1.7.0      2022-03-13 [1] CRAN (R 4.1.2)
#>  testthat      3.1.4      2022-04-26 [1] CRAN (R 4.1.3)
#>  tibble        3.1.7      2022-05-03 [1] CRAN (R 4.1.3)
#>  usethis       2.1.6      2022-05-25 [1] CRAN (R 4.1.3)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.1)
#>  vctrs         0.4.1      2022-04-13 [1] CRAN (R 4.1.3)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.31       2022-05-10 [1] CRAN (R 4.1.3)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#> 
#>  [1] /home/g/R/x86_64-redhat-linux-gnu-library/4.1
#>  [2] /usr/lib64/R/library
#>  [3] /usr/share/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

[Screenshot: reprex_01]
[Screenshot: reprex_02]

Created on 2022-05-31 by the reprex package (v2.0.1)

@gaborcsardi gaborcsardi added the bug an unexpected problem or unintended behavior label Jun 1, 2022
gaborcsardi (Member) commented

There is a chance that R 4.2.0 fixes this, if you need a workaround now.

giocomai commented Jun 1, 2022

Thanks! Yes, ultimately even wrapping it in stringr::str_c() fixes it, so there are plenty of workarounds!
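
For anyone landing here, a minimal sketch of that kind of workaround. It assumes the error comes only from printing the fs_path object (the reprex suggests this, since assigning the path and writing to it both work). The idea is to convert the path to a plain character vector before printing. Using as.character() here is my assumption as a base R alternative to stringr::str_c(); it has not been confirmed on an affected setup in this thread.

```r
library(fs)

x <- "Przemyśl"

# Build the path as usual; constructing and assigning it was reported to work.
p <- fs::path(x)

# Drop the fs_path class before printing, so that the default character
# print method is used instead of the fs_path one (assumed to be the
# source of the nchar() error).
plain <- as.character(p)
print(plain)

# For progress output in a script, message() takes the plain string as well.
message("Processing: ", plain)
```

The path object itself stays unchanged, so it can still be passed to other fs functions; only the value that gets printed is converted.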
