1 year ago
#184571
Mark
Unable to pass chunks to the question-answering pipeline when the input is longer than BERT's 512-token limit
I'm working with the Hugging Face question-answering pipeline. My input text has a length of 3535, but BERT only accepts inputs up to 512 tokens, so I'm trying to split the text into chunks and work on those.
When the input is longer than 512, BERT won't accept it unless the extra argument truncation=True is passed, and truncation discards part of the text, which is a drawback. That's why I'm splitting the text into sentences and regrouping them into chunks.
Below is the code
from transformers import pipeline

def load_qa_model():
    model = pipeline(task='question-answering', model=model, tokenizer=tokenizer)
    return model

def generate_chunks(inp_str):
    max_chunk = 500
    inp_str = inp_str.replace('.', '.<eos>')
    inp_str = inp_str.replace('?', '?<eos>')
    inp_str = inp_str.replace('!', '!<eos>')
    sentences = inp_str.split('<eos>')
    current_chunk = 0
    chunks = []
    for sentence in sentences:
        if len(chunks) == current_chunk + 1:
            if len(chunks[current_chunk]) + len(sentence.split(' ')) <= max_chunk:
                chunks[current_chunk].extend(sentence.split(' '))
            else:
                current_chunk += 1
                chunks.append(sentence.split(' '))
        else:
            chunks.append(sentence.split(' '))
    for chunk_id in range(len(chunks)):
        chunks[chunk_id] = ' '.join(chunks[chunk_id])
    return chunks

sentence = "" # Consider random sentence where the length is greater than 512
vect = generate_chunks(sentence)
qa = load_qa_model()
question = "Who released this article?"
answers = qa(question=question, context=vect)
print(answers['answer'])
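A note on the chunk size used in generate_chunks above: max_chunk = 500 counts whitespace-separated words, whereas BERT's 512 limit is counted in WordPiece tokens and also has to cover the question and the special tokens. A 500-word chunk can therefore still be too long. The sketch below shows one way to pack sentences against a token budget instead. The names pack_sentences, count_tokens and whitespace_count are hypothetical helpers introduced for illustration, not part of transformers; with a real tokenizer the counter could be something like `lambda s: len(tokenizer.tokenize(s))`.

```python
from typing import Callable, List


def pack_sentences(
    sentences: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> List[str]:
    """Greedily pack sentences into chunks that stay within max_tokens.

    Hypothetical helper: count_tokens is supplied by the caller, so the
    budget is measured in whatever unit that function returns.
    A single sentence longer than max_tokens still becomes its own
    (oversized) chunk; this sketch does not split inside a sentence.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty pieces left over from splitting
        n = count_tokens(sentence)
        # Start a new chunk when this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def whitespace_count(text: str) -> int:
    """Placeholder counter; a real tokenizer usually yields more tokens."""
    return len(text.split())
```

The budget passed as max_tokens would need to leave room for the question and special tokens. The pipeline documentation also describes max_seq_len and doc_stride parameters for splitting a long context into overlapping windows, which may make manual chunking unnecessary; that is worth checking against the installed version.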
The article I am using as the context is linked below:
https://drive.google.com/file/d/1m8rYuOaFSW7bxqm_nYo_8Ryi9RUCY3Tq/view?usp=sharing
The output is below
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_11012\2006680085.py in <module>
      1 qa = load_qa_model()
      2 question = "Who released this article?"
----> 3 answers = qa(question=question, context=vect)
      4 print(answers['answer'])

c:\users\nithi\miniconda3\lib\site-packages\transformers\pipelines\question_answering.py in __call__(self, *args, **kwargs)
    248
    249         # Convert inputs to features
--> 250         examples = self._args_parser(*args, **kwargs)
    251         if len(examples) == 1:
    252             return super().__call__(examples[0], **kwargs)

c:\users\nithi\miniconda3\lib\site-packages\transformers\pipelines\question_answering.py in __call__(self, *args, **kwargs)
     80                 inputs = [{"question": kwargs["question"], "context": kwargs["context"]}]
     81             else:
---> 82                 raise ValueError("Arguments can't be understood")
     83         else:
     84             raise ValueError(f"Unknown arguments {kwargs}")

ValueError: Arguments can't be understood
How can I overcome this issue?
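One likely cause, offered as a sketch rather than a confirmed answer: the traceback ends in the branch that handles question and context keyword arguments, and the call passes question as a single string while context (vect) is a list of chunks. A way to avoid that mismatch is to call the pipeline once per chunk and keep the answer with the highest score. The helper answer_over_chunks and the stand-in fake_qa below are hypothetical names introduced for illustration; the sketch assumes the pipeline returns a dict containing 'answer' and 'score' for a single question and context.

```python
from typing import Callable, Dict, List, Optional


def answer_over_chunks(
    qa: Callable[..., Dict],
    question: str,
    chunks: List[str],
) -> Optional[Dict]:
    """Ask the same question of each chunk and keep the best-scoring answer.

    Hypothetical helper: qa is any callable that accepts question= and
    context= as strings and returns a dict with 'answer' and 'score'.
    Returns None when every chunk is empty.
    """
    best: Optional[Dict] = None
    for index, chunk in enumerate(chunks):
        if not chunk.strip():
            continue  # an empty context cannot contain an answer
        result = qa(question=question, context=chunk)
        if best is None or result["score"] > best["score"]:
            # Record which chunk the answer came from.
            best = {**result, "chunk_index": index}
    return best


def fake_qa(question: str, context: str) -> Dict:
    """Stand-in for the real pipeline so the sketch runs without a model.

    It returns the first word as the answer and scores longer contexts
    higher; a real model's scores are unrelated to this.
    """
    return {"answer": context.split()[0], "score": len(context) / 100.0}
```

With the code from the question, the call would look like `best = answer_over_chunks(qa, question, vect)` followed by `print(best['answer'])`. Scores from different chunks are only roughly comparable, so picking the maximum is a heuristic, not a guarantee of the correct answer.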
deep-learning
huggingface-transformers
bert-language-model
transformer-model
nlp-question-answering
0 Answers