What is size and limit? What decides how many results I get? #79
Comments
Coincidence! I asked the same thing about Pushshift here: https://www.reddit.com/r/pushshift/comments/ih66b8/difference_between_size_and_limit_and_are_they/
Hi, I have investigated the source code and found a possible issue. PushshiftAPI.py lines 197 to 218
This tries to perform as many requests as needed to retrieve all of the desired data. The meaning of "limit" in PSAW differs from the limit of the Pushshift API: PSAW makes multiple fetches to get close to the desired limit, whereas the Pushshift API just takes it as a suggestion for the current request. PSAW therefore calculates how many batches of "max_results_per_request" size are needed, and passes that size to the Pushshift API as "limit".

The issue is that PSAW never checks whether "max_results_per_request" entries are actually returned by the API. The current default is 1000, which was the maximum the API used to return. Now the maximum is 100. This means the API returns only 100 entries per request while PSAW assumes 1000 were returned.

My suggestion: implement a check on whether the API returns the expected number of entries, and if not, increase the "limit" variable by the missing amount. I have some code for that already, will post it tomorrow.

Also: Why is there
Without "limit", PSAW will ignore "max_results_per_request" and just return whatever the API defaults to. I hope my analysis helps :)
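To make the suggested check concrete, here is a minimal sketch of the idea. It is not PSAW's actual code and not the patch mentioned above. The names `fetch_up_to`, `fetch_page` and `make_fake_api` are hypothetical, and the fake API only imitates a server that silently caps each response at 100 entries. The point is that the loop counts what each request really returned instead of assuming the requested page size was honoured.

```python
from typing import Callable, List, Optional


def fetch_up_to(
    limit: int,
    fetch_page: Callable[[int, Optional[int]], List[dict]],
    requested_page_size: int = 1000,
) -> List[dict]:
    """Collect up to `limit` items, counting what each request actually returns."""
    results: List[dict] = []
    before: Optional[int] = None  # cursor: only ask for items older than this
    while len(results) < limit:
        remaining = limit - len(results)
        # Ask for a full page, but never assume the server delivers that many.
        page = fetch_page(min(requested_page_size, remaining), before)
        if not page:
            break  # nothing left on the server side
        results.extend(page[:remaining])
        # Assumption: timestamps are unique; real data may need tie handling.
        before = page[-1]["created_utc"]
    return results


def make_fake_api(total_items: int, server_cap: int = 100):
    """Hypothetical stand-in for the remote API: newest first, capped per request."""
    items = [{"id": i, "created_utc": i} for i in range(total_items, 0, -1)]

    def fetch_page(size: int, before: Optional[int]) -> List[dict]:
        eligible = [it for it in items if before is None or it["created_utc"] < before]
        return eligible[: min(size, server_cap)]

    return fetch_page


if __name__ == "__main__":
    api = make_fake_api(total_items=2500)
    print(len(fetch_up_to(1000, api)))
```

With this structure the per-request cap no longer matters: if the server lowers it from 1000 to 100, the loop simply makes more requests until the limit is reached or the data runs out.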
If I do a query such as
gen = api.search_submissions(score=">100", limit=1000)
then I get 100 results. How do I get as many as I specify?