1 year ago

#184571

Mark

Unable to split text into chunks for BERT when the input is longer than 512 tokens

I'm working with the Hugging Face question-answering pipeline. My input text has a length of 3535, but BERT only accepts inputs up to 512 tokens, so I'm trying to split the text into chunks and work on those.

If the input is longer than 512 tokens, BERT won't accept it, and we have to pass the extra argument truncation=True, which discards part of the text. That is a drawback, so instead of truncating I'm splitting the text into chunks and joining each chunk back into a string.
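As a side note on the truncation concern: the `question-answering` pipeline documents two call-time parameters, `max_seq_len` and `doc_stride`, under which a long context is split internally into overlapping windows instead of being cut off. The following is only a minimal sketch under that assumption. The helper name `answer_long_context` is hypothetical, and `qa` is assumed to be an already-built pipeline (or any callable with the same keyword interface).

```python
def answer_long_context(qa, question, context, max_seq_len=384, doc_stride=128):
    """Ask one question over one long context string.

    `qa` is assumed to be a transformers question-answering pipeline.
    The pipeline is documented to split the context into windows of at most
    `max_seq_len` tokens (question included) that overlap by `doc_stride`
    tokens, so the whole text is seen rather than truncated.
    """
    if not isinstance(context, str):
        # A list of chunks is not a valid context for a single question.
        raise TypeError("context must be a single string, not a list of chunks")
    return qa(
        question=question,
        context=context,
        max_seq_len=max_seq_len,
        doc_stride=doc_stride,
    )
```

With a real pipeline this would be called as `answer_long_context(qa, question, sentence)`, passing the full text rather than the list of chunks. The defaults shown (384 and 128) mirror the values the pipeline documents; whether they suit a given model is something to check.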

Below is my code:

from transformers import pipeline

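def load_qa_model():
    # `model` and `tokenizer` are assumed to be defined elsewhere (not shown in this post).
    # The local variable is named differently so it does not shadow the global `model`.
    qa_pipeline = pipeline(task='question-answering', model=model, tokenizer=tokenizer)
    return qa_pipeline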

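def generate_chunks(inp_str):
    # Maximum number of whitespace-separated words per chunk (words, not BERT tokens)
    max_chunk = 500
    # Mark sentence boundaries so the text can be split into sentences
    inp_str = inp_str.replace('.', '.<eos>')
    inp_str = inp_str.replace('?', '?<eos>')
    inp_str = inp_str.replace('!', '!<eos>')

    sentences = inp_str.split('<eos>')
    current_chunk = 0
    chunks = []
    for sentence in sentences:
        if len(chunks) == current_chunk + 1:
            # Add the sentence to the current chunk if it still fits
            if len(chunks[current_chunk]) + len(sentence.split(' ')) <= max_chunk:
                chunks[current_chunk].extend(sentence.split(' '))
            else:
                # Otherwise start a new chunk with this sentence
                current_chunk += 1
                chunks.append(sentence.split(' '))
        else:
            # First sentence: create the first chunk
            chunks.append(sentence.split(' '))

    # Join each chunk's words back into a single string
    for chunk_id in range(len(chunks)):
        chunks[chunk_id] = ' '.join(chunks[chunk_id])
    return chunks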

sentence = ""  # Placeholder: use any text longer than 512 tokens (the article is linked below)
vect = generate_chunks(sentence)  # list of chunk strings

qa = load_qa_model()
question = "Who released this article?"
answers = qa(question=question, context=vect)
print(answers['answer'])
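A note on the chunk size used above: `max_chunk = 500` counts whitespace-separated words, while BERT's 512 limit applies to tokens, and the question plus special tokens share that budget. A chunk of 500 words can therefore still be too long. The sketch below packs sentences by a token count instead. The function name `chunk_by_token_budget`, the `count_tokens` parameter, and the default budget of 384 are assumptions made for illustration, not part of the original code.

```python
def chunk_by_token_budget(sentences, count_tokens, max_tokens=384):
    """Greedily pack sentences into chunks that stay within max_tokens.

    count_tokens is any callable mapping a string to its token count, for
    example `lambda s: len(tokenizer.tokenize(s))` with a Hugging Face
    tokenizer. A single sentence longer than max_tokens still becomes its
    own (oversized) chunk; this sketch does not split inside sentences.
    """
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # ignore empty pieces left over from splitting
        n = count_tokens(sentence)
        # Flush the current chunk if adding this sentence would exceed the budget
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Stand-in counter for demonstration only: whitespace words, not WordPiece tokens.
def word_count(text):
    return len(text.split())


demo = chunk_by_token_budget(["One two three.", "Four five.", "Six."], word_count, max_tokens=4)
print(demo)  # ['One two three.', 'Four five. Six.']
```

The budget is kept below 512 on purpose to leave headroom for the question and special tokens; the right value depends on the model and the question length.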

The article I'm using as the context is linked below:

https://drive.google.com/file/d/1m8rYuOaFSW7bxqm_nYo_8Ryi9RUCY3Tq/view?usp=sharing

Running the code produces the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_11012\2006680085.py in <module>
      1 qa = load_qa_model()
      2 question = "Who released this article?"
----> 3 answers = qa(question=question, context=vect)
      4 print(answers['answer'])

c:\users\nithi\miniconda3\lib\site-packages\transformers\pipelines\question_answering.py in __call__(self, *args, **kwargs)
    248 
    249         # Convert inputs to features
--> 250         examples = self._args_parser(*args, **kwargs)
    251         if len(examples) == 1:
    252             return super().__call__(examples[0], **kwargs)

c:\users\nithi\miniconda3\lib\site-packages\transformers\pipelines\question_answering.py in __call__(self, *args, **kwargs)
     80                 inputs = [{"question": kwargs["question"], "context": kwargs["context"]}]
     81             else:
---> 82                 raise ValueError("Arguments can't be understood")
     83         else:
     84             raise ValueError(f"Unknown arguments {kwargs}")

ValueError: Arguments can't be understood

How can I overcome this issue?
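One possible direction, judging from the traceback: the argument parser only reaches the `raise` when the types of `question` and `context` don't match a combination it accepts, and here `question` is a single string while `context` (`vect`) is a list of strings. A minimal sketch that queries each chunk separately and keeps the highest-scoring answer follows. `best_answer_over_chunks` is a hypothetical helper, and it assumes each pipeline result is a dict containing `answer` and `score` keys.

```python
def best_answer_over_chunks(qa, question, chunks):
    """Run the QA callable once per chunk and return the highest-scoring result.

    `qa` is assumed to behave like a transformers question-answering pipeline:
    called with a string question and a string context, it returns a dict
    with at least 'answer' and 'score'. Returns None if no chunk is usable.
    """
    best = None
    for index, chunk in enumerate(chunks):
        if not chunk.strip():
            continue  # skip empty chunks
        result = dict(qa(question=question, context=chunk))
        result["chunk_index"] = index  # remember which chunk the answer came from
        if best is None or result["score"] > best["score"]:
            best = result
    return best
```

In the code above this would replace the failing call, for example `answers = best_answer_over_chunks(qa, question, vect)` followed by `print(answers['answer'])`. Scores are computed independently for each chunk, so comparing them across chunks is a heuristic rather than a guarantee that the best overall answer is selected.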

deep-learning

huggingface-transformers

bert-language-model

transformer-model

nlp-question-answering

0 Answers
