2 years ago

#212307

cookie1986

Avoiding HTTP "too many requests" error when using SPARQLWrapper and Wikidata

I have a list of approximately 6k Wikidata instance IDs (beginning Q#####) that I want to look up the human-readable labels for. I am not too familiar with SPARQL, but by following some guidelines I have managed to find a query that works for a single ID.

from SPARQLWrapper import SPARQLWrapper, JSON

query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wd: <http://www.wikidata.org/entity/>
    SELECT *
    WHERE {
            wd:Q##### rdfs:label ?label .
            FILTER (langMatches( lang(?label), "EN" ) )
          }
    LIMIT 1
    """

sparql = SPARQLWrapper("http://query.wikidata.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
output = sparql.query().convert()

I had hoped that iterating over a list of IDs would be as simple as putting the IDs in a dataframe and using the apply function...

ids_DF['label'] = ids_DF['instance_id'].apply(my_query_function)

... However, when I do that it fails with an "HTTPError: Too Many Requests" error. The query limits section of the documentation says the following:

Query limits

There is a hard query deadline configured which is set to 60 seconds. There are also following limits:

One client (user agent + IP) is allowed 60 seconds of processing time each 60 seconds

One client is allowed 30 error queries per minute

I'm unsure how to go about resolving this. Am I running 6k error queries (I'm unsure what an error query even is)? If so, I presumably need to run them in batches to stay under the limit of 30 per minute.
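One way to run the lookups in batches, offered as a minimal sketch rather than a confirmed fix: a SPARQL `VALUES` clause lets a single query ask for many IDs at once, so roughly 6k IDs become a few dozen requests. The helper names (`chunked`, `build_label_query`, `labels_from_bindings`, `fetch_labels`), the batch size of 200, the one-second pause and the User-Agent string are all assumptions for illustration. The `agent` argument is set because the limits above are counted per user agent + IP, and Wikimedia asks clients to send a descriptive User-Agent.

```python
import re
import time

ENDPOINT = "https://query.wikidata.org/sparql"


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_label_query(ids):
    """Build one query that asks for the English labels of every ID in `ids`."""
    for qid in ids:
        # Reject anything that is not a plain item ID before pasting it into the query.
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in ids)
    return f"""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wd: <http://www.wikidata.org/entity/>
    SELECT ?item ?label
    WHERE {{
        VALUES ?item {{ {values} }}
        ?item rdfs:label ?label .
        FILTER (langMatches(lang(?label), "EN"))
    }}
    """


def labels_from_bindings(bindings):
    """Map each item ID to the first label returned for it."""
    labels = {}
    for row in bindings:
        # The item comes back as a full URI; keep only the trailing Q-number.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        labels.setdefault(qid, row["label"]["value"])
    return labels


def fetch_labels(ids, batch_size=200, pause=1.0,
                 user_agent="label-lookup-sketch/0.1 (you@example.org)"):
    """Look up labels batch by batch (hypothetical helper, values are assumptions)."""
    # Imported here so the pure helpers above work without the library installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(ENDPOINT, agent=user_agent)
    sparql.setReturnFormat(JSON)
    labels = {}
    for batch in chunked(ids, batch_size):
        sparql.setQuery(build_label_query(batch))
        result = sparql.query().convert()
        found = labels_from_bindings(result["results"]["bindings"])
        for qid, label in found.items():
            labels.setdefault(qid, label)
        time.sleep(pause)  # small gap between batches
    return labels
```

With this in place, something like `ids_DF['label'] = ids_DF['instance_id'].map(fetch_labels(ids_DF['instance_id'].tolist()))` would fill the column from the returned dictionary. Because `langMatches` also accepts regional variants such as "en-gb", an item can return several labels; the sketch keeps whichever comes back first, much like `LIMIT 1` in the single-ID query.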

My first attempt to resolve this was to put a delay of 2 seconds after each query (see the third from last line below). Each instance ID was taking approximately 1 second to return a value, so my thinking was that the delay would increase the time per query to 3 seconds, which should comfortably keep me within the limit. However, that still returns the same error. I've tried extending the sleep period as well, with the same result.

import time

from SPARQLWrapper import SPARQLWrapper, JSON

query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wd: <http://www.wikidata.org/entity/>
    SELECT *
    WHERE {
            wd:Q##### rdfs:label ?label .
            FILTER (langMatches( lang(?label), "EN" ) )
          }
    LIMIT 1
    """

sparql = SPARQLWrapper("http://query.wikidata.org/sparql")
sparql.setQuery(query)
time.sleep(2)  # pause for 2 seconds before sending the query
sparql.setReturnFormat(JSON)
output = sparql.query().convert()
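A fixed sleep could also be replaced by a retry that waits only when the server actually answers with status 429. The following is a hedged sketch, not a verified solution to the error above: `call_with_retry` is a hypothetical helper, and it assumes the 429 response reaches the caller as a `urllib.error.HTTPError` (which matches the "HTTPError" in the message) and may carry a `Retry-After` header giving the wait in seconds. When that header is missing or not a number, it falls back to exponential backoff.

```python
import time
import urllib.error


def call_with_retry(fn, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on HTTP 429, wait and try again, up to max_attempts calls."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            # Give up on other errors, or once the last attempt has failed.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            retry_after = None
            if err.headers is not None:
                retry_after = err.headers.get("Retry-After")
            try:
                # Prefer the wait the server asked for, if it is a number of seconds.
                delay = float(retry_after)
            except (TypeError, ValueError):
                # Otherwise back off: base_delay, then 2x, 4x, ...
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

Under those assumptions, the last line of the code above would become `output = call_with_retry(lambda: sparql.query().convert())`. The `sleep` parameter exists only so the waiting can be swapped out when trying the helper without real delays.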

A similar question on this topic was asked here but I've not been able to follow the advice given.

http

sparql

wikidata

http-status-code-429

sparqlwrapper

0 Answers
