1 year ago

#315867

Idkwhatnomeis

Parsing Wikipedia raises KeyError: 'query' (Python)

I'm trying to get the Wikipedia backlinks of more than 200 pages. To do this, I:

  • look up each page's URL on Italian Wikipedia and, if that fails, on English Wikipedia
  • put the URLs in a list
  • iterate over this list to get the languages each page is available in on Wikipedia (with bs4)
  • append these languages to a list
  • iterate over both the languages and the URLs to get page titles and backlinks, which I put in a dictionary whose keys are the languages and whose values are the number of backlinks available in that language

But I get KeyError: 'query', and I don't know why.

import urllib.request

import wikipediaapi
from bs4 import BeautifulSoup

# df is the DataFrame shown under "Example input to reproduce" below
opere = df.label  # works
listaurl = []
for x in opere:
    try:
        # look for the page on Italian Wikipedia first
        wiki_wiki = wikipediaapi.Wikipedia('it')
        p = wiki_wiki.page(x).fullurl
        listaurl.append(p)
        print(p)
    except Exception:
        # fall back to English Wikipedia
        wiki_wiki = wikipediaapi.Wikipedia('en')
        p = wiki_wiki.page(x).fullurl
        listaurl.append(p)
        print(p)


# collect the language codes of the interlanguage links on each page
lista = []
for url in listaurl:
    soup = BeautifulSoup(urllib.request.urlopen(url), 'html.parser')
    links = [(el.get('lang'), el.get('href')) for el in soup.select('li.interlanguage-link > a')]

    for language, link in links:
        lista.append(language)

# for every language, count the backlinks of every page
dik = {}
for lang in lista:
    wikis = wikipediaapi.Wikipedia(lang)
    for apage in listaurl:
        # the title is taken from the Italian/English URL as-is
        wikipage = apage.split('/wiki/')[1]
        page_py = wikis.page(wikipage)
        print(page_py)
        titles = page_py.title
        print(titles)
        back = page_py.backlinks  # this line raises KeyError: 'query'
        dik[lang] = len(back)
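
Separately from the error, the final loop assigns `dik[lang]` once per page, so each language ends up holding only the count for the last page processed, and `lista` contains the same language code many times. A minimal sketch of a per-page, per-language structure follows. It assumes a hypothetical `count_backlinks(lang, title)` callable standing in for the wikipediaapi lookup, and the numbers it produces here are made up purely to show the shape of the result:

```python
from collections import defaultdict


def collect_counts(titles, languages, count_backlinks):
    """Return {title: {lang: count}} for every title/language pair.

    count_backlinks is a hypothetical callable (lang, title) -> int or None;
    None means "no usable result for this pair" and is skipped.
    """
    counts = defaultdict(dict)
    for title in titles:
        # set() drops repeated language codes; sorted() keeps the order stable
        for lang in sorted(set(languages)):
            n = count_backlinks(lang, title)
            if n is not None:
                counts[title][lang] = n
    return dict(counts)


# Stand-in lookup returning made-up numbers, only to illustrate the output shape.
def fake_count(lang, title):
    return len(lang) + len(title)


result = collect_counts(["Baudolino"], ["it", "en", "it"], fake_count)
print(result)
```

Keying first by title and then by language keeps one count per pair instead of overwriting earlier pages.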

Example input to reproduce (the df):

item,label,authorlabel,authorlabel2,numWikipediaLanguages
http://www.wikidata.org/entity/Q172850,Il nome della rosa,,Umberto Eco,53
http://www.wikidata.org/entity/Q437791,Il pendolo di Foucault,,Umberto Eco,30
http://www.wikidata.org/entity/Q791487,Baudolino,,Umberto Eco,26

Error traceback:

Traceback (most recent call last):
  File "C:....myfile.py", line 43, in <module>
    back = page_py.backlinks
  File "C:\....\wikipediaapi\__init__.py", line 1112, in backlinks
    self._fetch('backlinks')
  File "C:....\wikipediaapi\__init__.py", line 1148, in _fetch
    getattr(self.wiki, call)(self)
  File "C:....wikipediaapi\__init__.py", line 468, in backlinks
    self._common_attributes(raw['query'], page)
KeyError: 'query'
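
The last frame shows wikipediaapi indexing `raw['query']` on the JSON returned by the MediaWiki API. When the API rejects a request, it answers with a top-level `error` object instead of `query`, which would produce this KeyError. One possible trigger (an assumption, not confirmed for this data) is the title itself: it is cut out of `fullurl`, so it may still be percent-encoded, and the Italian or English title is sent unchanged to every other language edition. A minimal sketch of decoding the title and inspecting a response before indexing it is shown below; `title_from_url` and `query_or_error` are hypothetical helpers, and both the example URL and the error-shaped dictionary are illustrative rather than captured output:

```python
from urllib.parse import unquote


def title_from_url(url):
    """Extract the page title from a /wiki/ URL and undo percent-encoding."""
    return unquote(url.split('/wiki/', 1)[1])


def query_or_error(raw):
    """Return (query, None) if 'query' is present, else (None, error details)."""
    if 'query' in raw:
        return raw['query'], None
    # fall back to the whole response if there is no 'error' key either
    return None, raw.get('error', raw)


# Illustrative URL with a percent-encoded apostrophe
print(title_from_url('https://it.wikipedia.org/wiki/L%27isola_del_giorno_prima'))

# Illustrative error-shaped response, not captured from a real call
sample = {'error': {'code': 'invalidtitle', 'info': 'Bad title'}}
query, error = query_or_error(sample)
print(error['code'])
```

Wrapping `page_py.backlinks` in `try`/`except KeyError` and printing `lang` and `wikipage` when it fails would show which language and title combination the API refuses.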

python

python-3.x

web-scraping

wikipedia

wikipedia-api

0 Answers
