1 year ago

#257903


LucasA

How to load a SPARK NLP pretrained pipeline through HDFS

I've already installed Spark NLP and its assembly jars, but when I try to use one of the models I still get an error: `TypeError: 'JavaPackage' object is not callable`.

I cannot download the model and load it from local disk because it's considered too big (>100MB) for my project, so it was suggested that I use HDFS to load the pretrained model instead. Is there a way to do that?

My code:

    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    spark = sparknlp.start()
    pipeline = PretrainedPipeline('analyze_sentimentdl_glove_imdb', lang='en')
    annotations = pipeline.fullAnnotate("Hello from John Snow Labs ! ")[0]

What would be the equivalent for loading with HDFS?

EDIT: Full traceback:

Using NLU code:

Traceback (most recent call last):

  File "/usr/local/juicer/juicer/spark/spark_minion.py", line 490, in _perform_execute
    raise ex from None

  File "/usr/local/juicer/juicer/spark/spark_minion.py", line 486, in _perform_execute
    self._emit_event(room=job_id, namespace='/stand'))

  File "/tmp/juicer_app_10_10_60.py", line 230, in main
    task_futures['a6d45e1d-4322-443e-b7e9-ed78b504a8b0'].result()

  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()

  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception

  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/tmp/juicer_app_10_10_60.py", line 228, in <lambda>
    lambda: sentimentanalysis_1(spark_session, cached_state, cached_emit_event))

  File "/tmp/juicer_app_10_10_60.py", line 150, in sentimentanalysis_1
    result_df = nlu.load('emotion').predict("I am so happy")

  File "/usr/local/lib/python3.7/dist-packages/nlu/__init__.py", line 153, in load
    f"Something went wrong during creating the Spark NLP model for your request =  {request} Did you use a NLU Spell?")

Exception: Something went wrong during creating the Spark NLP model for your request =  emotion Did you use a NLU Spell?

Using my original code (Spark NLP):

Traceback (most recent call last):

  File "/usr/local/juicer/juicer/spark/spark_minion.py", line 490, in _perform_execute
    raise ex from None

  File "/usr/local/juicer/juicer/spark/spark_minion.py", line 486, in _perform_execute
    self._emit_event(room=job_id, namespace='/stand'))

  File "/tmp/juicer_app_10_10_51.py", line 225, in main
    task_futures['a6d45e1d-4322-443e-b7e9-ed78b504a8b0'].result()

  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()

  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception

  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/tmp/juicer_app_10_10_51.py", line 223, in <lambda>
    lambda: sentimentanalysis_1(spark_session, cached_state, cached_emit_event))

  File "/tmp/juicer_app_10_10_51.py", line 145, in sentimentanalysis_1
    pipeline = PretrainedPipeline('analyze_sentimentdl_glove_imdb', lang = 'en')

  File "/usr/local/lib/python3.7/dist-packages/sparknlp/pretrained.py", line 141, in __init__
    self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)

  File "/usr/local/lib/python3.7/dist-packages/sparknlp/pretrained.py", line 72, in downloadPipeline
    file_size = _internal._GetResourceSize(name, language, remote_loc).apply()

  File "/usr/local/lib/python3.7/dist-packages/sparknlp/internal.py", line 232, in __init__
    "com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)

  File "/usr/local/lib/python3.7/dist-packages/sparknlp/internal.py", line 165, in __init__
    self._java_obj = self.new_java_obj(java_obj, *args)

  File "/usr/local/lib/python3.7/dist-packages/sparknlp/internal.py", line 175, in new_java_obj
    return self._new_java_obj(java_class, *args)

  File "/usr/local/spark/python/pyspark/ml/wrapper.py", line 67, in _new_java_obj
    return java_obj(*java_args)

TypeError: 'JavaPackage' object is not callable

Tags: python, apache-spark, nlp, hdfs, johnsnowlabs-spark-nlp

0 Answers
