The search_tweets function does not retrieve the number of tweets set in "n =" argument. #442

kshtwork · 2020-09-02T10:58:25Z

I have an issue with the search_tweets (or search_tweets2) function of the rtweet package, though I am not sure whether the problem lies in rtweet itself.

This is my code; I want to retrieve 50K tweets matching certain keywords:

pl_hrb <- search_tweets2(
  c(" herbal OR Herbal OR yog* OR Yog*  "),
  n = 50000, retryonratelimit = TRUE, lang = "en"
)

Normally, it would download the first 18K tweets, wait (because of the retryonratelimit = TRUE argument), download another 18K tweets, wait again, and then download the rest. Recently, however, the function's behavior has become unpredictable, with the process stopping abruptly partway through. No matter what I do, it only rarely completes the whole cycle. Most often it does not even complete the first round; it aborts and writes only about 9K tweets to the variable pl_hrb.
Here are the rtweet version and the outputs (four random attempts):

rtweet version:
> packageVersion("rtweet") 
[1] ‘0.7.0’

Crash 1:


Downloading [>----------------------------------------]   2%
>

Crash 2:

Downloading [======>----------------------------------]   24%
>

Almost done:

Downloading [=========================================] 100%
retry on rate limit...
waiting about 5 minutes...
Downloading [=========================================] 100%
retry on rate limit...
waiting about 12 minutes...
Downloading [===================>---------------------]  50%
>

Success:

Downloading [=========================================] 100%
retry on rate limit...
waiting about 10 minutes...
Downloading [=========================================] 100%
retry on rate limit...
waiting about 15 minutes...
Downloading [=========================================]  100%
>

Session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

After each failed retrieval attempt I get a different number of rows (tweets, observations) in my output data frame: sometimes only 1.5K, sometimes about 40K, and often something in between.
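
As a rough sanity check on these numbers, here is a small sketch of the arithmetic. It assumes, as described earlier in this post, that one rate-limit window yields about 18K tweets, and it further assumes that windows are about 15 minutes apart. `expected_windows` and `shortfall` are hypothetical helper names, not rtweet functions.

```r
# Assumptions (not taken from rtweet itself): about 18,000 tweets per
# rate-limit window and about 15 minutes between windows.
tweets_per_window <- 18000
window_minutes <- 15

# Hypothetical helper: number of rate-limit windows needed for n tweets.
expected_windows <- function(n, per_window = tweets_per_window) {
  ceiling(n / per_window)
}

# Hypothetical helper: how many tweets are missing from a result.
shortfall <- function(n_requested, n_returned) {
  max(n_requested - n_returned, 0)
}

n_requested <- 50000
windows <- expected_windows(n_requested)        # windows needed in total
min_wait <- (windows - 1) * window_minutes      # minutes spent waiting, at minimum
missing_after_9k <- shortfall(n_requested, 9000)

message("Windows needed: ", windows, "; minimum wait: ", min_wait, " minutes")
message("Missing tweets if only 9K are returned: ", missing_after_9k)
```

A shortfall does not by itself prove a crash: the search can also return fewer tweets than requested simply because fewer matching tweets exist in the searchable period, so comparing nrow(pl_hrb) against n is only a first check.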

The create_token function seems to work without problems: the appropriate environment is created and access to the Twitter API is granted.

I asked the same question on Stack Overflow last week but have had no response so far, not even a comment. This is very unusual for that forum, where people often get answers within minutes. I also doubt that I am the first to run into this problem.
I would like to raise the issue with this community and ask for help or a solution.

Thank you in advance.

nkandrian · 2020-09-02T11:22:51Z

It seems like I have a similar problem: in my case, I am trying to pull user data with the following code:

# create a log to document how many retweets were pulled
pull.log = 0

# create a log to document number of requests sent
requests.log = 0

# pull data for every user id (users is a tibble with one row of user_ids)
accounts = users$user_id %>%
  map_df(function(x){
    rl<- rtweet::rate_limit(query = "lookup_users")
    # request user info as long as there are remaining requests
    if(rl$remaining > 0){
      message("Requests remaining: ", rl$remaining)
      output = lookup_users(x)
      pull.log <<- pull.log+nrow(output)
      requests.log <<- requests.log + 1
    } else {
      # adjust sleep time how long (in s) for rate limit to reset
      message("Completed ", requests.log, " requests. Time: ", Sys.time(), ". Sleeping for 15 minutes...")
      Sys.sleep(60*15)
      message("Resuming at ", Sys.time())
      output = lookup_users(x)
      pull.log <<- 0
    }
    return(output)
  }
)

The rate limit for lookup_users is 900, but the function usually stops after a seemingly random number of requests, well before that. Example:

...
Requests remaining: 727
Requests remaining: 726
Requests remaining: 725
Requests remaining: 724
Requests remaining: 723
Requests remaining: 722
Warning: Rate limit exceeded - 88
Error in if (rl$remaining > 0) { : argument is of length zero

During the 15-minute interval after that, I cannot call the rate_limit function:

> rtweet::rate_limit()
Warning: Rate limit exceeded - 88
data frame with 0 columns and 0 rows

I am wondering if this has to do with the ongoing rebuild of the Twitter API: https://developer.twitter.com/en/products/twitter-api/early-access

My rtweet version and R version are identical to those in the original post.

Thanks in advance!

Arf9999 · 2020-09-02T17:22:16Z

It seems likely that the rate limit being exceeded in this case is the one on the call that checks the rate limit itself, i.e. rate_limit().
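
A minimal sketch of one way around this, assuming that diagnosis is right. In the log above the loop fails after 178 checks (900 minus 722), which would fit a separate, lower limit on the rate-limit endpoint itself. Rather than calling rate_limit() once per user, the ids can be sent in batches (lookup_users() accepts a vector of ids, and as far as I know the underlying endpoint takes up to 100 per request), and the result of rate_limit() can be checked for emptiness before it is used. `chunk_ids` and `has_remaining` are hypothetical helper names, not part of rtweet.

```r
# Hypothetical helper: split a vector of user ids into batches of `size`.
chunk_ids <- function(ids, size = 100) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Hypothetical helper: TRUE only if `rl` is a non-empty data frame whose
# `remaining` column is above zero. An empty data frame (what rate_limit()
# returned once its own limit was hit) gives FALSE instead of an error.
has_remaining <- function(rl) {
  is.data.frame(rl) && nrow(rl) > 0 && "remaining" %in% names(rl) &&
    isTRUE(rl$remaining[1] > 0)
}

# Stand-in ids for illustration only.
batches <- chunk_ids(as.character(1:250))
message("Number of batches: ", length(batches))

# Intended use with rtweet (not run here, needs a token and network access):
# for (batch in batches) {
#   rl <- rtweet::rate_limit(query = "lookup_users")
#   if (!has_remaining(rl)) Sys.sleep(60 * 15)   # wait out the window
#   output <- rtweet::lookup_users(batch)
# }
```

With batching, 250 users need 3 rate-limit checks instead of 250, so the check itself is far less likely to be the call that gets throttled.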

JNavelski · 2020-09-11T18:13:02Z

I also have this problem.

jeffcsauer · 2020-09-18T19:40:17Z

Bumping, example code:

rt <- search_tweets(
  "lang:en", 
  geocode = "38.987202,-76.945999,1mi", # do not include spaces
  retryonratelimit = TRUE,
  n = 10000
)

It stops anywhere from 10% to 25% of the way through; I have yet to get a successful pull.

hadley · 2021-02-27T13:27:35Z

Now tracking in #510

llrs mentioned this issue Feb 15, 2021

Update roadmap #471

Closed

llrs added bug location labels Feb 17, 2021

hadley closed this as completed Feb 27, 2021
