1 year ago
#212307
cookie1986
Avoiding HTTP "too many requests" error when using SPARQLWrapper and Wikidata
I have a list of approximately 6k Wikidata instance IDs (of the form Q#####) that I want to look up the human-readable labels for. I am not very familiar with SPARQL, but by following some guidelines I have managed to put together a query that works for a single ID.
```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Q##### is a placeholder for one Wikidata ID, e.g. Q42
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT *
WHERE {
  wd:Q##### rdfs:label ?label .
  FILTER (langMatches( lang(?label), "EN" ) )
}
LIMIT 1
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
output = sparql.query().convert()
```
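To run this for many IDs, the query text needs the ID substituted in. Because `#` starts a comment in SPARQL, the `Q#####` placeholder cannot be sent literally, and the prefix and local name must be written together as `wd:Q42`, with no space. The sketch below is a minimal illustration assuming IDs are plain strings such as `"Q42"`. The names `label_query`, `extract_label` and `fetch_label`, the contact string in the User-Agent, and the exact-match `LANG(?label) = "en"` filter are all hypothetical choices, not taken from the post. It also passes a descriptive User-Agent through SPARQLWrapper's `agent` argument, since the quoted limits are counted per user agent + IP and Wikimedia asks clients to identify themselves; whether that alone avoids the 429 here is an assumption.

```python
import re

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical User-Agent: replace the tool name and contact with your own.
USER_AGENT = "wikidata-label-lookup/0.1 (contact: you@example.org)"

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def label_query(qid: str) -> str:
    """Build the single-ID label query for one Wikidata ID such as 'Q42'."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    # No space between the 'wd:' prefix and the ID; {{ }} are literal braces.
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?label
WHERE {{
  wd:{qid} rdfs:label ?label .
  FILTER (LANG(?label) = "en")
}}
LIMIT 1
"""


def extract_label(result: dict):
    """Return the first ?label value from a SPARQL JSON result, or None."""
    bindings = result.get("results", {}).get("bindings", [])
    return bindings[0]["label"]["value"] if bindings else None


def fetch_label(qid: str):
    """Run the query for one ID (needs network access and SPARQLWrapper)."""
    # Imported here so the helpers above can be used without SPARQLWrapper.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(WDQS_ENDPOINT, agent=USER_AGENT)
    sparql.setQuery(label_query(qid))
    sparql.setReturnFormat(JSON)
    return extract_label(sparql.query().convert())
```

Note that `ids_DF['instance_id'].apply(fetch_label)` would still send one request per ID, so this alone does not lower the request rate.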
I had hoped that iterating over a list of IDs would be as simple as putting the IDs in a dataframe and using the apply function...
```python
ids_DF['label'] = ids_DF['instance_id'].apply(my_query_function)
```
... However, when I do that, it fails with `HTTPError: Too Many Requests` (HTTP status 429). The query limits section of the Wikidata Query Service documentation says the following:
> **Query limits**
>
> There is a hard query deadline configured which is set to 60 seconds. There are also following limits:
>
> - One client (user agent + IP) is allowed 60 seconds of processing time each 60 seconds
> - One client is allowed 30 error queries per minute
I'm unsure how to go about resolving this. Am I running 6k error queries (I'm not sure what an error query even is)? If so, I presumably need to send them in batches to stay under the limit of 30 error queries per minute.
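One way to batch, rather than sending 6k separate requests, is to put many IDs into a single query with a SPARQL `VALUES` clause. A minimal sketch, assuming 200 IDs per query (an arbitrary choice, not a documented limit), which comes to 30 requests for 6k IDs, and exact `en` labels. The helper names `chunked`, `batch_label_query`, `parse_batch_labels` and `fetch_labels`, and the User-Agent contact string, are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def batch_label_query(qids: Sequence[str]) -> str:
    """Build one query returning the English label of every ID in `qids`."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?item ?label
WHERE {{
  VALUES ?item {{ {values} }}
  ?item rdfs:label ?label .
  FILTER (LANG(?label) = "en")
}}
"""


def parse_batch_labels(result: dict) -> Dict[str, str]:
    """Map 'Q42'-style IDs to labels from a SPARQL JSON result."""
    labels = {}
    for row in result["results"]["bindings"]:
        # ?item comes back as a full URI, e.g. http://www.wikidata.org/entity/Q42
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        labels[qid] = row["label"]["value"]
    return labels


def fetch_labels(qids: Sequence[str], batch_size: int = 200) -> Dict[str, str]:
    """Look up labels one batch at a time (needs network and SPARQLWrapper)."""
    # Imported here so the pure helpers above work without SPARQLWrapper.
    from SPARQLWrapper import SPARQLWrapper, JSON

    # Hypothetical User-Agent: replace with your own tool name and contact.
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                           agent="wikidata-label-lookup/0.1 (contact: you@example.org)")
    sparql.setReturnFormat(JSON)
    labels: Dict[str, str] = {}
    for batch in chunked(qids, batch_size):
        sparql.setQuery(batch_label_query(batch))
        labels.update(parse_batch_labels(sparql.query().convert()))
    return labels
```

With that, `labels = fetch_labels(ids_DF['instance_id'].tolist())` followed by `ids_DF['label'] = ids_DF['instance_id'].map(labels)` would fill the column, leaving NaN for IDs that have no English label.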
My first attempt to resolve this was to add a 2-second delay between queries (see the `time.sleep(2)` call below). Each instance ID was taking approximately 1 second to return a value, so my thinking was that the delay would stretch each lookup to about 3 seconds, which should comfortably keep me within the limit. However, I still get the same error, and longer sleep periods give the same result.
```python
import time

from SPARQLWrapper import SPARQLWrapper, JSON

# Q##### is a placeholder for one Wikidata ID, e.g. Q42
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT *
WHERE {
  wd:Q##### rdfs:label ?label .
  FILTER (langMatches( lang(?label), "EN" ) )
}
LIMIT 1
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery(query)
time.sleep(2)  # pause before each request is sent
sparql.setReturnFormat(JSON)
output = sparql.query().convert()
```
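A fixed sleep cannot adapt to how the server is actually throttling. A common alternative, sketched here as an assumption rather than documented Wikidata behaviour, is to catch the 429, wait for the number of seconds in the response's `Retry-After` header if one is present, and otherwise back off exponentially before retrying. SPARQLWrapper re-raises status codes it does not map to its own exceptions, such as 429, as the original `urllib.error.HTTPError`, which carries `.code` and `.headers`. The function name `call_with_retry` and its default values are hypothetical.

```python
import time
import urllib.error
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def _retry_after_seconds(err: urllib.error.HTTPError) -> Optional[float]:
    """Parse a numeric Retry-After header; HTTP-date values are ignored here."""
    headers = err.headers
    value = headers.get("Retry-After") if headers is not None else None
    try:
        return float(value) if value is not None else None
    except ValueError:
        return None


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on HTTP 429 wait (Retry-After or exponential backoff) and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise  # not a rate-limit error, or no attempts left
            delay = _retry_after_seconds(err)
            if delay is None:
                delay = base_delay * 2 ** attempt  # 2, 4, 8, ... seconds
            sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Usage would be `output = call_with_retry(lambda: sparql.query().convert())`. The `sleep` parameter exists only so the waiting can be replaced in a test; it defaults to `time.sleep`.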
A similar question on this topic was asked here but I've not been able to follow the advice given.
Tags: http, sparql, wikidata, http-status-code-429, sparqlwrapper