1 year ago
#387275
Lorenzo Romani
Fetching thousands of URLs with Newspaper3k and multiprocessing slows down after a few hundred calls
I have code that is meant to:
a) call an API to get Google SERP results;
b) open each retrieved URL with the newspaper3k Python 3 library, which extracts the text of the news article;
c) save the text of the article to a .txt file.
The implementation of the multiprocessing part is as follows:
from multiprocessing.pool import ThreadPool

def createFile(newspaper_article):
    """Open each article, parse it, and save it to a file on disk."""

def main():
    # sourcesList is the list of article URLs, defined elsewhere
    p = ThreadPool(10)
    p.map(createFile, sourcesList)
    p.close()
    p.join()

if __name__ == '__main__':
    main()
I have also tried with Pool instead of ThreadPool.
The problem is that after fetching and saving a few hundred articles, it slows down dramatically. A link may occasionally take a while to load, but I'd expect the other workers to keep going in the meantime. What am I doing wrong?
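For reference, a minimal sketch of the pipeline described above, with per-URL timing added so that a slowdown can be traced to specific links. It is not the original code: `fetch_text`, `create_file`, `run`, the `articles` output directory, and the hashed file names are all assumptions made for illustration. The only newspaper3k calls used are `Article(url)`, `download()`, `parse()`, and the `text` attribute.

```python
import hashlib
import time
from functools import partial
from multiprocessing.pool import ThreadPool
from pathlib import Path


def fetch_text(url):
    """Download and parse one article with newspaper3k and return its text."""
    from newspaper import Article  # third-party: pip install newspaper3k

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def create_file(url, out_dir="articles", fetch=fetch_text):
    """Fetch one URL, save its text to disk, and report how long it took."""
    start = time.perf_counter()
    try:
        text = fetch(url)
        # Assumption: name each file after a hash of its URL so names are
        # unique and safe on any filesystem.
        name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".txt"
        path = Path(out_dir) / name
        path.write_text(text, encoding="utf-8")
        error = None
    except Exception as exc:  # one bad link should not abort the whole run
        path, error = None, repr(exc)
    return url, path, error, time.perf_counter() - start


def run(urls, out_dir="articles", workers=10, fetch=fetch_text):
    """Process all URLs in a thread pool and return one result tuple per URL."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    worker = partial(create_file, out_dir=out_dir, fetch=fetch)
    results = []
    with ThreadPool(workers) as pool:
        # imap_unordered yields each result as soon as it is ready, so a slow
        # link does not delay reporting of the URLs that already finished.
        for url, path, error, seconds in pool.imap_unordered(worker, urls):
            status = "FAILED " + error if error else str(path)
            print(f"{seconds:6.2f}s  {status}  {url}")
            results.append((url, path, error, seconds))
    return results
```

Calling `run(sourcesList)` prints one line per URL with the elapsed time, which should show whether the slowdown comes from a few slow links or affects every call.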
python-3.x
multiprocessing
newspaper3k
0 Answers