1 year ago
#358506
Alex Yu
Python: getting URLs from a page
This code parses a page and extracts its URLs to generate a sitemap. Along with the URLs, it also picks up fragments of JS code. How can I filter the results to exclude the JS?
if (resp.status == 200 and
        'text/html' in resp.headers.get('content-type', '')):
    data = (await resp.read()).decode('utf-8', 'replace')
    # The regex matches any "href=" in the raw text, including inside <script> blocks
    urls = re.findall(r'(?i)href=["\']?([^\s"\'<>]+)', data)
    asyncio.Task(self.addurls([(u, url) for u in urls]))
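One possible way to keep the JS out, sketched here as an assumption rather than a tested fix for this crawler: replace the regex with the standard library's `html.parser`, which treats the contents of `<script>` as plain text, so an `href=` inside JS code is never reported as a tag attribute. The names `HrefCollector` and `extract_hrefs` are hypothetical, and limiting the result to `<a>` tags is my assumption about what a sitemap needs.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> start tags only (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; script bodies never reach here.
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                self.hrefs.append(value)


def extract_hrefs(data):
    """Return the href values of all <a> tags found in an HTML string."""
    parser = HrefCollector()
    parser.feed(data)
    parser.close()
    return parser.hrefs
```

In the snippet above, the `re.findall(...)` line would then become `urls = extract_hrefs(data)`; the rest can stay as it is.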
python
html
parsing
sitemap
0 Answers