LucasA · asked 1 year ago · question #257903
How to load a SPARK NLP pretrained pipeline through HDFS
I've already installed sparknlp and its assembly JARs, but when I try to use one of the models I still get an error: TypeError: 'JavaPackage' object is not callable.
I cannot download the model and load it from local disk because it's considered too big (>100MB) for my project, so it was suggested that I load the pretrained model from HDFS instead. Is there a way to do that?
My code:
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline('analyze_sentimentdl_glove_imdb', lang='en')
annotations = pipeline.fullAnnotate("Hello from John Snow Labs!")[0]
What would be the equivalent for loading with HDFS?
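For context, this is the kind of loading I'm imagining, based on the fact that pyspark.ml's PipelineModel.load accepts any Hadoop-compatible URI. A minimal sketch, assuming the pretrained pipeline directory has already been unpacked and copied to HDFS (the /models base path and the hdfs_model_path helper are my own, hypothetical):

```python
# Sketch: load a saved Spark NLP pipeline directly from HDFS instead of
# going through the pretrained downloader. Assumes the model directory
# was first copied to HDFS, e.g.:
#   hdfs dfs -put analyze_sentimentdl_glove_imdb_en /models/

def hdfs_model_path(base, name):
    """Build the HDFS URI for a saved pipeline directory."""
    return f"{base.rstrip('/')}/{name}"

MODEL_PATH = hdfs_model_path("hdfs:///models", "analyze_sentimentdl_glove_imdb_en")

if __name__ == "__main__":
    import sparknlp
    from pyspark.ml import PipelineModel

    spark = sparknlp.start()

    # PipelineModel.load reads from any filesystem Spark's Hadoop config
    # knows about, so an hdfs:// URI should behave like a local path.
    pipeline = PipelineModel.load(MODEL_PATH)

    df = spark.createDataFrame([("Hello from John Snow Labs!",)], ["text"])
    pipeline.transform(df).show(truncate=False)
```

I'm not sure whether this is the intended route, or whether the 'JavaPackage' error would still occur (it suggests the Spark NLP JAR isn't on the JVM classpath regardless of where the model lives).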
EDIT: Full traceback:
Using the NLU code:
Traceback (most recent call last):
File "/usr/local/juicer/juicer/spark/spark_minion.py", line 490, in _perform_execute
raise ex from None
File "/usr/local/juicer/juicer/spark/spark_minion.py", line 486, in _perform_execute
self._emit_event(room=job_id, namespace='/stand'))
File "/tmp/juicer_app_10_10_60.py", line 230, in main
task_futures['a6d45e1d-4322-443e-b7e9-ed78b504a8b0'].result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/tmp/juicer_app_10_10_60.py", line 228, in <lambda>
lambda: sentimentanalysis_1(spark_session, cached_state, cached_emit_event))
File "/tmp/juicer_app_10_10_60.py", line 150, in sentimentanalysis_1
result_df = nlu.load('emotion').predict("I am so happy")
File "/usr/local/lib/python3.7/dist-packages/nlu/__init__.py", line 153, in load
f"Something went wrong during creating the Spark NLP model for your request = {request} Did you use a NLU Spell?")
Exception: Something went wrong during creating the Spark NLP model for your request = emotion Did you use a NLU Spell?
Using my original code (Spark NLP):
Traceback (most recent call last):
File "/usr/local/juicer/juicer/spark/spark_minion.py", line 490, in _perform_execute
raise ex from None
File "/usr/local/juicer/juicer/spark/spark_minion.py", line 486, in _perform_execute
self._emit_event(room=job_id, namespace='/stand'))
File "/tmp/juicer_app_10_10_51.py", line 225, in main
task_futures['a6d45e1d-4322-443e-b7e9-ed78b504a8b0'].result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/tmp/juicer_app_10_10_51.py", line 223, in <lambda>
lambda: sentimentanalysis_1(spark_session, cached_state, cached_emit_event))
File "/tmp/juicer_app_10_10_51.py", line 145, in sentimentanalysis_1
pipeline = PretrainedPipeline('analyze_sentimentdl_glove_imdb', lang = 'en')
File "/usr/local/lib/python3.7/dist-packages/sparknlp/pretrained.py", line 141, in __init__
self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
File "/usr/local/lib/python3.7/dist-packages/sparknlp/pretrained.py", line 72, in downloadPipeline
file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
File "/usr/local/lib/python3.7/dist-packages/sparknlp/internal.py", line 232, in __init__
"com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
File "/usr/local/lib/python3.7/dist-packages/sparknlp/internal.py", line 165, in __init__
self._java_obj = self.new_java_obj(java_obj, *args)
File "/usr/local/lib/python3.7/dist-packages/sparknlp/internal.py", line 175, in new_java_obj
return self._new_java_obj(java_class, *args)
File "/usr/local/spark/python/pyspark/ml/wrapper.py", line 67, in _new_java_obj
return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
Tags: python · apache-spark · nlp · hdfs · johnsnowlabs-spark-nlp