1 year ago
#298566
uomo_di_pietro
Strange spatial gaps in queries with DBpedia using Python SPARQL-Wrapper
I'm trying to query all geolocated Wikipedia articles about places in the United Kingdom. I'm using the SPARQLWrapper library for Python to retrieve the coordinates, article link, hierarchy and other metadata. The query looks like this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?id ?label ?link ?lat ?long ?cat_lab ?cat_lab2 ?nchar
WHERE {
?uri a dbo:Place .
?uri rdfs:label ?label . FILTER(lang(?label) = 'en') .
?uri dbo:wikiPageID ?id .
?uri rdf:type ?cat . FILTER (?cat LIKE <http://dbpedia.org/ontology/%>).
?cat rdfs:subClassOf ?cat2 . FILTER (?cat2 LIKE <http://dbpedia.org/ontology/%> AND
! ?cat2 LIKE <http://dbpedia.org/ontology/Place> AND
! ?cat2 LIKE <http://dbpedia.org/ontology/Location>) .
?cat rdfs:label ?cat_lab . FILTER(lang(?cat_lab) = 'en')
?cat2 rdfs:label ?cat_lab2 . FILTER(lang(?cat_lab2) = 'en')
?uri geo:lat ?lat .
?uri geo:long ?long .
?uri dbo:wikiPageLength ?nchar .
?uri prov:wasDerivedFrom ?link .
FILTER(?long >= -1.1 AND ?long <= 1.8 AND ?lat >= 51.1 AND ?lat <= 54.27)
}
LIMIT 10000
OFFSET 0
I retrieve the data by increasing the offset of my query in steps of 10,000 (because of the limit of 10,000 records per query) and then append the results to a single data frame. This works fine, although I get a lot of duplicate records, but that's another issue.
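The paging loop described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact code behind the question: `fetch_all_pages` and `run_on_dbpedia` are hypothetical helper names, the endpoint URL `https://dbpedia.org/sparql` is assumed, and `query_body` stands for the query above without its `LIMIT` and `OFFSET` lines. The sketch also adds an `ORDER BY ?id` clause that is not in the original query, because SPARQL only makes `LIMIT`/`OFFSET` pages predictable when the order is fixed; whether that has any bearing on the duplicates or the gaps is not established here.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def fetch_all_pages(query_body: str,
                    run_query: Callable[[str], List[Row]],
                    page_size: int = 10000,
                    order_by: str = "?id") -> Iterator[Row]:
    """Yield result rows page by page until a page comes back short.

    query_body is the SELECT ... WHERE { ... } part without LIMIT/OFFSET.
    ORDER BY is an addition of this sketch (not in the original query).
    """
    offset = 0
    while True:
        paged = (f"{query_body}\n"
                 f"ORDER BY {order_by}\n"
                 f"LIMIT {page_size}\n"
                 f"OFFSET {offset}")
        rows = run_query(paged)
        yield from rows
        # A page shorter than page_size means there is nothing left to fetch.
        if len(rows) < page_size:
            break
        offset += page_size


def run_on_dbpedia(query: str) -> List[Row]:
    """Send one query to the (assumed) public DBpedia endpoint; needs network."""
    # Third-party package, imported here so the paging helper has no dependency.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # assumed endpoint
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    # Flatten {"var": {"type": ..., "value": ...}} to {"var": value}.
    return [{name: cell["value"] for name, cell in b.items()} for b in bindings]
```

With these helpers, `rows = list(fetch_all_pages(query_body, run_on_dbpedia))` would collect every page into one list of dictionaries, which can then be passed to a data frame constructor.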
However, when I plot the data on a map, the records appear to be incomplete: two very distinct stripes devoid of any records run across the whole study area. This is unlikely to be the real spatial distribution of the data, so I suspect it has to do with the way the database is queried.
[Image: study area with the two stripes of missing data; each dot is a geolocated Wikipedia article]
If I shrink the queried bounding box, the stripes persist but appear in a different place; sometimes there is only one stripe. As I'm quite inexperienced with SPARQL, I'm out of ideas as to how these strange results can occur. Maybe one of you can give me a hint as to why the data might look like this.
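To pin down where the stripes sit numerically, rather than only by eye on the map, the coordinates can be binned and the empty bins listed. The following is a minimal sketch using only the standard library; `empty_bands` is a hypothetical helper name, and the bin width is an arbitrary choice that has to be tuned to the density of the data.

```python
from typing import Iterable, List, Tuple


def empty_bands(values: Iterable[float],
                lo: float,
                hi: float,
                step: float) -> List[Tuple[float, float]]:
    """Return merged intervals between lo and hi that contain no value.

    The range [lo, hi] is cut into bins of width `step`; consecutive
    empty bins are merged into one (start, end) band.
    """
    n_bins = max(1, int(round((hi - lo) / step)))
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / step), n_bins - 1)
            counts[idx] += 1

    bands: List[Tuple[float, float]] = []
    start = None  # index of the first bin of the current empty run
    for i, count in enumerate(counts):
        if count == 0 and start is None:
            start = i
        elif count != 0 and start is not None:
            bands.append((lo + start * step, lo + i * step))
            start = None
    if start is not None:
        # The empty run extends to the upper bound.
        bands.append((lo + start * step, hi))
    return bands
```

For the bounding box in the query, a call such as `empty_bands(latitudes, 51.1, 54.27, 0.05)` (or the same with longitudes and `-1.1, 1.8`) would list the coordinate ranges without any record, which can then be compared across different bounding boxes or page boundaries.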
Cheers!
python
gis
sparql
dbpedia
sparqlwrapper
0 Answers