Simplify response parsing #572

llrs · 2021-04-22T22:26:08Z

Simplify object parsing towards #558

Steps:

Internal test for objects
Simplify and understand code
Provide helpers

tweets_to_tbl_ is doing the internal tests and transforming the data.frame from X columns to Y columns for all these functions:

function/endpoint	From	To
get_retweets	24	73
get_timeline_user
lookup_tweets
lookup_users
get_favorites_user
get_timeline
lists_status
get_my_timeline
search_users
search_tweets
post_message
search_fullarchive
search_30day
tweet_shot

*Numbers from the test cases.

A list of the endpoints that supply each kind of object would probably be very useful.

llrs · 2021-04-30T22:44:56Z

Updating the table:

function/endpoint	From	To
get_retweets	24	73
get_timeline_user	31	73
lookup_tweets	26	73
lookup_users	44	20
get_favorites_user	30	73 + 1
get_timeline	31	73
lists_status	31	73
get_my_timeline	32	73
search_users	44	20
search_tweets	31	73

In some cases there are fewer columns when the content is parsed than when it is not (lookup_users, search_users)! This is due to the users_with_tweets function, but apparently the API returns lots of data from deprecated arguments while missing some data from supported values.
I'll start by improving the user object.

The 73 + 1 is the user column added during parsing to record which user has liked which tweet.

llrs · 2021-05-05T07:57:20Z

Documenting progress and the problems currently blocking it:

Tweets, and all responses, are now parsed based on the objects they contain.
The latest changes trigger a warning: "Setting row names on a tibble is deprecated.", although row names are not used anywhere to parse tweets. It only happens on do.call("rbind", ) from tweets_with_users and users_with_tweets.
For instance, out <- get_favorites(users, n = 20) triggers the error z[seq_len(nro), ] <- y : replacement has length zero in xpdrows.data.frame. This is surprising, and I couldn't work out why or how it is triggered. I think it is related to having data.frames inside data.frames inside data.frames.
The new data.frame has far fewer columns, with different names to stay consistent with the API, so functions (and the corresponding tests) that depend on certain columns fail.
Due to the above problems, it hasn't been tested whether the tweets object has the same columns regardless of the endpoint called (is this desired?).
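A check for the last point could look roughly like the sketch below. It is only an illustration: `same_columns()` is a hypothetical helper (not part of rtweet), and the toy data frames stand in for the parsed output of different endpoints.

```r
# Hypothetical helper (not part of rtweet): TRUE when every data frame
# has the same column names in the same order as the first one.
same_columns <- function(...) {
  cols <- lapply(list(...), names)
  all(vapply(cols[-1], identical, logical(1), cols[[1]]))
}

# Toy stand-ins for the parsed output of different endpoints.
favorites <- data.frame(id_str = "1", text = "a")
timeline  <- data.frame(id_str = "2", text = "b")
reordered <- data.frame(text = "c", id_str = "3")

same_columns(favorites, timeline)
#> [1] TRUE
same_columns(favorites, reordered)
#> [1] FALSE
```

Because `identical()` compares the name vectors element by element, the helper also catches columns that are present but in a different order.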

llrs · 2021-06-27T15:52:15Z

Many changes, but I still need to add a test for #574 (and I didn't check #575, but it should be fixed too).

Added some tests for different types of tweets so that rtweet returns the same information
The columns are in the same order
Localization and other objects are now correctly identified/used
Some reorganization of network_data

Maybe some columns can be further nested or hidden in an attribute.
There isn't any helper to distinguish replies, comments, quotes and plain statuses.
Moving away from tibbles, as nested tibbles cannot be rbinded without a warning and would mean adding a new direct dependency (see tidyverse/tibble#898).
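To illustrate the kind of helper that is missing, here is a minimal sketch. Everything in it is an assumption: `tweet_type()` is hypothetical, and the column names follow the Twitter API v1.1 tweet fields `in_reply_to_status_id_str` and `is_quote_status`, which may not match what the parsed rtweet data frame exposes.

```r
# Toy data: column names mirror Twitter API v1.1 tweet fields; whether the
# parsed rtweet output exposes them under these names is an assumption.
tweets <- data.frame(
  id_str = c("1", "2", "3"),
  in_reply_to_status_id_str = c(NA, "1", NA),
  is_quote_status = c(FALSE, FALSE, TRUE)
)

# Hypothetical helper, not part of rtweet: label each row as a plain
# status, a quote, or a reply. A tweet that is both a quote and a reply
# is labelled "reply" because that assignment runs last.
tweet_type <- function(x) {
  type <- rep("status", nrow(x))
  type[x$is_quote_status %in% TRUE] <- "quote"
  type[!is.na(x$in_reply_to_status_id_str)] <- "reply"
  type
}

tweet_type(tweets)
#> [1] "status" "reply"  "quote"
```

The precedence between "quote" and "reply" is an arbitrary choice in this sketch; a real helper might return both flags separately.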

Closes #575

llrs · 2021-06-29T07:56:47Z

@hadley Maybe you can take a look at this PR. Probably the best place to start is the tweet function and the new functions to parse Twitter objects.

hadley

Looks like some great improvements overall! Questions/comments below.

R/entities_objects.R

hadley · 2021-07-04T21:09:49Z

R/entities_objects.R

+# <https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/entities#hashtags>
+hashtags <- function(x) {
+  if (NROW(x) == 0) {
+    return(data.frame(text = NA, indices = I(list(NA)), 


Is it correct that a zero row input yields a one row output?

llrs · Jul 10, 2021

Yes. The reason is to not omit anything, which makes later parsing and linking between hashtags and tweets easier. There is also a problem when a user hasn't tweeted: if information about the tweets of that user and of other users is collected together, the tweet_data extraction breaks. See #574 for such a report.

hadley · Jul 11, 2021

Right, I think you always want to return a data frame here, but are you sure it should have one row and not zero rows?

llrs · Jul 12, 2021

Yes, I'm sure. What is your concern? Filling the object with mostly empty data?

hadley · Jul 12, 2021

Just generally reasoning about this function. I don't understand how all the pieces fit together but this feels like a code smell.

llrs · Jul 12, 2021

I ran some tests and it seems that rbind correctly handles a mix of NAs and data.frames unless there are nested data.frames. I'll wait a bit before merging the PR, but I can't think of any workaround that wouldn't complicate the workflow further.

Tests

``` r
l <- list(a = NA, b = data.frame(a = "b", b = "a"), c = data.frame(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#>        a    b
#> a   <NA> <NA>
#> b      b    a
#> c.1    C    B
#> c.2    D    C
l <- list(a = data.frame(a = "b", b = "a"), b = NA, c = data.frame(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#>        a    b
#> a      b    a
#> b   <NA> <NA>
#> c.1    C    B
#> c.2    D    C
l <- list(a = data.frame(a = "b", b = "a"), b = NA, c = data.frame(a = "C", b = "B"))
do.call(rbind, l)
#>      a    b
#> a    b    a
#> b <NA> <NA>
#> c    C    B
l <- list(a = data.frame(a = "b", b = "a"), b = data.frame(a = c("C", "D"), b = c("B", "C")), c = NA)
do.call(rbind, l)
#>        a    b
#> a      b    a
#> b.1    C    B
#> b.2    D    C
#> c   <NA> <NA>

library("tibble")
l <- list(a = NA, b = tibble(a = "b", b = "a"), c = tibble(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#> # A tibble: 4 x 2
#>   a     b
#> * <chr> <chr>
#> 1 <NA>  <NA>
#> 2 b     a
#> 3 C     B
#> 4 D     C
l <- list(a = tibble(a = "b", b = "a"), b = NA, c = tibble(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#> # A tibble: 4 x 2
#>   a     b
#> * <chr> <chr>
#> 1 b     a
#> 2 <NA>  <NA>
#> 3 C     B
#> 4 D     C
l <- list(a = tibble(a = "b", b = "a"), b = NA, c = tibble(a = "C", b = "B"))
do.call(rbind, l)
#> # A tibble: 3 x 2
#>   a     b
#> * <chr> <chr>
#> 1 b     a
#> 2 <NA>  <NA>
#> 3 C     B
l <- list(a = tibble(a = "b", b = "a"), b = tibble(a = c("C", "D"), b = c("B", "C")), c = NA)
do.call(rbind, l)
#> # A tibble: 4 x 2
#>   a     b
#> * <chr> <chr>
#> 1 b     a
#> 2 C     B
#> 3 D     C
#> 4 <NA>  <NA>

l <- list(a = NA, b = data.frame(a = I(data.frame(c = 1)), b = "a"), c = data.frame(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = data.frame(a = I(data.frame(c = 1)), b = "a"), b = NA, c = data.frame(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique values when setting 'row.names': '1', '2'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = data.frame(a = I(data.frame(c = 1)), b = "a"), b = NA, c = data.frame(a = I(data.frame(c = c(2))), b = "B"))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = data.frame(a = I(data.frame(c = 1)), b = "a"), b = data.frame(a = I(data.frame(c = c(2, 3))), b = c("B", "C")), c = NA)
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed

l <- list(a = NA, b = tibble(a = I(data.frame(c = 1)), b = "a"), c = tibble(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = tibble(a = I(data.frame(c = 1)), b = "a"), b = NA, c = tibble(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique values when setting 'row.names': '1', '2'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = tibble(a = I(data.frame(c = 1)), b = "a"), b = NA, c = tibble(a = I(data.frame(c = c(2))), b = "B"))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = tibble(a = I(data.frame(c = 1)), b = "a"), b = tibble(a = I(data.frame(c = c(2, 3))), b = c("B", "C")), c = NA)
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
```

Created on 2021-07-13 by the reprex package (v2.0.0)

hadley · Jul 12, 2021

I'm probably missing something, but I don't see you creating any zero-row data frames in your tests. But I don't really understand how all of this data flows together, and there's likely to be something upstream that means my suggestion wouldn't work.
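For reference, the zero-row case discussed in this thread can be sketched in base R as follows. This is an illustration with toy data, not code from the PR: each list element stands for the hashtags of one tweet, and the second tweet has none.

```r
# One list element per tweet; the second tweet has no hashtags.
with_placeholder <- list(
  data.frame(text = "rstats"),
  data.frame(text = NA_character_),       # one-row NA placeholder
  data.frame(text = c("rtweet", "api"))
)
with_zero_rows <- list(
  data.frame(text = "rstats"),
  data.frame(text = character()),         # zero-row data frame
  data.frame(text = c("rtweet", "api"))
)

# rbind() keeps the placeholder row but drops the zero-row element.
nrow(do.call(rbind, with_placeholder))
#> [1] 4
nrow(do.call(rbind, with_zero_rows))
#> [1] 3

# The per-tweet counts are still recoverable from the list itself.
vapply(with_zero_rows, nrow, integer(1))
#> [1] 1 0 2
```

With zero-row inputs, the combined data frame no longer has a row for the tweet without hashtags, so any linking back to tweets has to come from the list structure rather than from the bound rows. Which behaviour is preferable depends on how the result is consumed downstream.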

R/graph-network.R

hadley · 2021-07-04T21:15:30Z

R/utils.R

@@ -100,7 +104,7 @@ maybe_n <- function(x) {
 }

 is_testing <- function() {
-  identical(Sys.getenv("TESTTHAT"), "true")  
+  requireNamespace("testthat", quietly = TRUE) && identical(Sys.getenv("TESTTHAT"), "true")


That check shouldn't be needed since only testthat should be setting the envvar.

llrs · Jul 10, 2021

It is needed, because later on some functions inside the package call testthat functions (skip_*). I swapped the order, though, so that if the variable is not set the package won't be loaded.

hadley · Jul 11, 2021

I don't think it's a good idea to change is_testing() because it's a standard helper you'd generally expect to be copied directly from testthat. Where is skip() being used?

llrs · Jul 12, 2021

There are two cases where this happens: in http.R, lines 309-312:

handle_rate_limit <- function(x, api, retryonratelimit = NULL, verbose = TRUE) {
  if (is_testing()) {
    testthat::skip("Rate limit exceeded")
  }
  ....

and in auth.R, lines 273-275:

no_token <- function() {
  if (is_testing()) {
    testthat::skip("Auth not available")
  ...

I could move the check outside the function, but I thought it was more accurate to leave it inside.
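One way to reconcile both positions might be to leave is_testing() in its standard form and put the namespace check next to the skip() call. The sketch below assumes a hypothetical wrapper, `skip_if_testing()`, which does not exist in rtweet or testthat.

```r
# Standard helper, unchanged (the version on the removed line of the diff).
is_testing <- function() {
  identical(Sys.getenv("TESTTHAT"), "true")
}

# Hypothetical wrapper: only touches testthat when the envvar says we are
# inside a test run and the package is actually installed.
skip_if_testing <- function(message) {
  if (is_testing() && requireNamespace("testthat", quietly = TRUE)) {
    testthat::skip(message)
  }
  invisible(FALSE)
}
```

Because `&&` short-circuits, requireNamespace() is never called when TESTTHAT is unset, so testthat is not loaded outside of test runs. Callers such as handle_rate_limit() could then call `skip_if_testing("Rate limit exceeded")` in place of the explicit `if` block.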

tests/testthat/test-entities_objects.R

tests/testthat/test-extractors.R

llrs · 2021-07-06T15:12:23Z

Many thanks for all the comments. Will try to reply and address them soon (1 or 2 weeks probably).

mkearney · 2021-10-20T16:08:20Z

I'm moving this comment to #558 because it's more generally about the changes and less about implementation.

Use new helper function to simplify double calls to has_name_

80f98b6

llrs changed the title ~~Use new helper function to simplify double calls to has_name_~~ Simplify response parsing Apr 22, 2021

This was referenced Apr 24, 2021

lookup_users error with output for one specific twitter handle #574

Closed

get_timeline doesn't gather whole texts #575

Closed

llrs added 2 commits April 25, 2021 13:38

Setup some function for objects and their tests

d5ea248

shortening for easier read

b92854a

llrs mentioned this pull request Apr 28, 2021

Get Twitter poll data? #576

Closed

llrs added 4 commits April 29, 2021 22:41

Parse polls entity object

ef03c59

Allow to pass some other optional arguments

6863bee

Add undocumented parameter to get the most data of each query

1645492

Fix some errors

db8622d

llrs mentioned this pull request Apr 29, 2021

Octotree tree on draft pull requests. ovity/octotree#1073

Closed

llrs added 8 commits May 2, 2021 16:38

Subtitute users_to_tbl_ by object centered user function

804a53e

Include alt text by default

4a6a713

Changing name to match the API

6d5f866

fix indentation

eb15f0b

Check input just in case

9e73770

Parsing the object

0db826f

Simplifying the process

d68f911

Adapting test: code now keeps API name

c3346e5

llrs added 6 commits May 6, 2021 20:56

Fixes warnings and all major issues with the objects

c6db708

Look for other columns

4dec1ac

Fix some tests

c032cc0

Specific checks on different type of tweets

4c6de77

Fix code parsing objects

0fddcad

Work in progress to rework network_data

d8b8ae8

llrs mentioned this pull request May 27, 2021

Pass token to lookup_users on post_message #583

Merged

Merged upstream/master into tweets2tbl

6fe4879

llrs added 12 commits June 27, 2021 14:02

Better handling of indices

372a402

Handle all different values a tweet can have

260ea97

Move the function

334fba8

No usage of tibbles

afb926a

Improved testing on different type of tweets

744cf1e

Still evaluating if this function is worth keeping

0ad32c8

Testing network_data with new structure

c02590d

testing only for data.frames

5036097

Remove unused functions, improve documentation

7d03f3d

Not load devtools

f637c23

Update documentation

f0ebcd4

fix internals and renaming variables

81f650a

llrs marked this pull request as ready for review June 27, 2021 15:40

llrs added 2 commits June 28, 2021 22:43

Closes #574

56bc0ff

Add test to timeline

2d1806f

Closes #575

Remove some no longer needed functions

c92bd1b

hadley reviewed Jul 4, 2021

Fix, follow Hadley review comments

90c0916

llrs mentioned this pull request Jul 12, 2021

Adding blocking and unblocking #593

Merged

llrs added 2 commits July 12, 2021 22:36

Move symbols function closer to hasthags and explain it

8476987

Move data inside test_that calls

d9dd060

Maybe I forgot to commit this??

This was referenced Jul 13, 2021

get_friends() and get_followers() return objects with different column names. #594

Closed

lookup_users() parsing fails on users with no tweets #597

Closed

llrs merged commit 3b99d79 into master Jul 19, 2021

llrs deleted the tweets2tbl branch July 19, 2021 19:10

simonheb mentioned this pull request Oct 4, 2021

Reduce tweet fields returned by default #558

Closed
