The search_tweets function does not retrieve the number of tweets set in "n =" argument. #442

Closed
kshtwork opened this issue Sep 2, 2020 · 5 comments


kshtwork commented Sep 2, 2020

I have an issue with the search_tweets (or search_tweets2) function of the rtweet package. I'm not sure whether the problem is specific to rtweet, though.

This is my code; I want to retrieve 50K tweets matching certain keywords:

pl_hrb <- search_tweets2(
  c(" herbal OR Herbal OR yog* OR Yog*  "),
  n = 50000, retryonratelimit = TRUE, lang = "en"
)

Normally, it would download the first 18K tweets, wait (because of the retryonratelimit = TRUE argument), download another 18K tweets, wait again, and then download the rest. Recently, however, the function's behavior has become unpredictable, with the process stopping abruptly partway through. No matter what I do, it only rarely completes the whole cycle. Most often it does not even complete the first round: it drops the whole process and writes only about 9K tweets to the variable pl_hrb.
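As a rough illustration of the expected cycle described above, the following R sketch works out how many rate-limit windows a request of a given size should need, and flags a result that came back short. The 18,000 figure is taken from the "18K" in this report; `windows_needed` and `is_short_pull` are hypothetical helper names, not part of rtweet.

```r
# Assumed per-window cap, taken from the "18K" figure reported above.
tweets_per_window <- 18000

# Number of rate-limit windows needed to collect n tweets.
windows_needed <- function(n, per_window = tweets_per_window) {
  ceiling(n / per_window)
}

# Hypothetical check: did a search return fewer rows than requested?
is_short_pull <- function(result, n) {
  nrow(result) < n
}

windows_needed(50000)  # 3 windows, so two waits in between
```

A short pull does not by itself prove a crash, since fewer matching tweets may simply have been available, but comparing `nrow(pl_hrb)` with `n` after each attempt makes the dropped runs easy to spot.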
Here are the rtweet version and the outputs (four random attempts):

### rtweet version
> packageVersion("rtweet") 
[1] ‘0.7.0’

Crash 1:


Downloading [>----------------------------------------]   2%
> 

Crash 2:

Downloading [======>----------------------------------]   24%
> 

Almost done:

Downloading [=========================================] 100%
retry on rate limit...
waiting about 5 minutes...
Downloading [=========================================] 100%
retry on rate limit...
waiting about 12 minutes...
Downloading [===================>---------------------]  50%
>

Success:

Downloading [=========================================] 100%
retry on rate limit...
waiting about 10 minutes...
Downloading [=========================================] 100%
retry on rate limit...
waiting about 15 minutes...
Downloading [=========================================]  100%
>

Session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

After every failed retrieval attempt I get a different number of rows (tweets, observations) in my output data frame: sometimes only 1.5K, sometimes about 40K, and often something in between.

The create_token function seems to work seamlessly, the appropriate environment is created and the access to the Twitter API is granted.

I asked the same question on the SOF forum last week but have had no response so far, not even a comment. This is very unusual for that forum, where people get answers to their questions in minutes. Also, I don't believe I am the first to encounter this error.
I'd like to raise the issue with this community and ask for help or a solution.

Thank you in advance.


nkandrian commented Sep 2, 2020

It seems like I have a similar problem: in my case, I am trying to pull user data with the following code:

# create a log to document how many retweets were pulled
pull.log = 0

# create a log to document number of requests sent
requests.log = 0

# pull data for every user id (users is a tibble with one row of user_ids)
accounts = users$user_id %>%
  map_df(function(x){
    rl<- rtweet::rate_limit(query = "lookup_users")
    # request user info as long as there are remaining requests
    if(rl$remaining > 0){
      message("Requests remaining: ", rl$remaining)
      output = lookup_users(x)
      pull.log <<- pull.log+nrow(output)
      requests.log <<- requests.log + 1
    } else {
      # adjust sleep time how long (in s) for rate limit to reset
      message("Completed ", requests.log, " requests. Time: ", Sys.time(), ". Sleeping for 15 minutes...")
      Sys.sleep(60*15)
      message("Resuming at ", Sys.time())
      output = lookup_users(x)
      pull.log <<- 0
    }
    return(output)
  }
)

The rate limit for lookup_users is 900, but the function usually stops after a seemingly random number of requests before reaching it. Example:

...
Requests remaining: 727
Requests remaining: 726
Requests remaining: 725
Requests remaining: 724
Requests remaining: 723
Requests remaining: 722
Warning: Rate limit exceeded - 88
Error in if (rl$remaining > 0) { : argument is of length zero

In the 15-minute interval after that, I cannot call the rate_limit function:

> rtweet::rate_limit()
Warning: Rate limit exceeded - 88
data frame with 0 columns and 0 rows

I am wondering if this has to do with the ongoing rebuild of the Twitter API: https://developer.twitter.com/en/products/twitter-api/early-access

My rtweet version and R version are identical to those reported above.

Thanks in advance!


Arf9999 commented Sep 2, 2020

It seems likely that the rate limit being exceeded in this case is the one on the rate_limit call itself, i.e. the call that checks the rate limit.
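If that diagnosis is right, one way around it is to stop calling rate_limit() on every iteration and to guard against the empty result it returns once its own limit is hit. The sketch below is a minimal illustration in plain R under that assumption; `remaining_or_na`, `fetch_with_budget`, `fetch_one`, `budget`, and `pause` are hypothetical names, and the 900 default is simply the figure quoted in this thread.

```r
# Hypothetical helper: read the remaining-request count from a rate_limit()
# result, returning NA when the result is empty (as in the
# "data frame with 0 columns and 0 rows" output above).
remaining_or_na <- function(rl) {
  if (is.null(rl) || !is.data.frame(rl) || nrow(rl) == 0 ||
      is.null(rl$remaining)) {
    return(NA_integer_)
  }
  as.integer(rl$remaining[1])
}

# Hypothetical loop: count requests locally instead of querying the rate
# limit before every call, and pause only when the local budget is spent.
fetch_with_budget <- function(ids, fetch_one, budget = 900,
                              pause = function() Sys.sleep(15 * 60)) {
  used <- 0
  results <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    if (used >= budget) {
      pause()    # wait for the window to reset
      used <- 0
    }
    results[[i]] <- fetch_one(ids[[i]])
    used <- used + 1
  }
  results
}
```

In the code quoted above, `fetch_one` would be something like `function(x) lookup_users(x)`, and `is.na(remaining_or_na(rl))` could replace the bare `rl$remaining > 0` test that fails with "argument is of length zero".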

@JNavelski

I also have this problem.

@jeffcsauer

Bumping, example code:

rt <- search_tweets(
  "lang:en", 
  geocode = "38.987202,-76.945999,1mi", # do not include spaces
  retryonratelimit = TRUE,
  n = 10000
)

It stops anywhere from 10% to 25% of the way through; I have yet to get a successful pull.

Member

hadley commented Feb 27, 2021

Now tracking in #510

@hadley hadley closed this as completed Feb 27, 2021