1 year ago
#362454
Hossain
Finding CDRs in a sequence by their definition
I am struggling to find CDRs from their definitions. Each definition consists of the pattern that comes before the CDR (the prefix) and the pattern that comes after it (the suffix); the CDR lies between them. A few point mutations are allowed in the prefix and suffix patterns. To make this clear, I will explain it with an example.
Let's say we have sequence = 'ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEFGHNOPQRSTUCDEFGHVWXYZ'.
CDR1 definition: from the 3rd position after the prefix up to the start of the suffix.
Prefix pattern = 'ABCDEFGHI', with at most 3 point mutations, but 'D' at the 4th position must be conserved, i.e. allowed prefixes include ABCDEFHHI, BBCDEFGHI, etc.
Suffix pattern = 'GHIJMCDEF', with at most 3 point mutations, but 'D' at the 7th position must be conserved, i.e. allowed suffixes include GJJJMCDEF, GHAAMCDCF, etc.
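One way to test a single prefix or suffix without enumerating its mutants is to slide a window of the pattern's length along the sequence and count substitutions (the Hamming distance), rejecting any window where the conserved residue changed. This is a minimal sketch under stated assumptions: only substitutions are allowed (no insertions or deletions), and the conserved residue is given by its 0-based index in the pattern (`D` in `ABCDEFGHI` is index 3, `D` in `GHIJMCDEF` is index 6). The names `hamming` and `find_motif` are hypothetical helpers, not from any library.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq: str, pattern: str, max_mismatches: int, conserved: int) -> list:
    """Return every 0-based start index where `pattern` occurs in `seq` with at
    most `max_mismatches` substitutions and the residue at 0-based index
    `conserved` of the pattern unchanged. Insertions/deletions are not handled."""
    m = len(pattern)
    hits = []
    # Slide a window of the pattern's length along the sequence
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        if window[conserved] != pattern[conserved]:
            continue  # the conserved residue is mutated, so reject this window
        if hamming(window, pattern) <= max_mismatches:
            hits.append(start)
    return hits


sequence = 'ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEFGHNOPQRSTUCDEFGHVWXYZ'
print(find_motif(sequence, 'ABCDEFGHI', 3, conserved=3))  # [0, 26, 40]
print(find_motif(sequence, 'GHIJMCDEF', 3, conserved=6))  # [23]
```

Note that with 3 substitutions allowed the CDR1 prefix also matches `JMCDEFGHN` (index 26) and `TUCDEFGHV` (index 40), so a rule for choosing among several hits is still needed.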
CDR2 definition: from the prefix up to 3 positions before the suffix.
Prefix pattern = 'DEFGHNOP', with at most 3 mutations, but 'G' at the 4th position must be conserved.
Suffix pattern = 'WXYZ', with at most 3 mutations, but 'X' at the 2nd position must be conserved.
So, by this definition, the highlighted part of the following sequence is CDR1: ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEF
and CDR2 is: ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEFGHNOPQRSTUCDEFGHVWXYZ
I tried a lot but could not come up with a solution.
One of my attempts is below:
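The two definitions could be turned into extraction code roughly as follows. This is a sketch, not a definitive implementation, and it rests on assumptions that should be checked against the real CDR definitions: only substitutions are allowed; among several qualifying windows the one with the fewest substitutions wins (earliest on a tie), because with 3 substitutions `DEFGHNOP` also matches `DEFGHIJK` at index 3 of the example; the suffix must start after the prefix ends; "3rd position after the prefix" means the 3rd residue following it; and CDR2 runs from the first residue of the prefix to the residue 3 positions before the suffix. `best_hit`, `cdr1` and `cdr2` are hypothetical names.

```python
def best_hit(seq, pattern, max_mismatches, conserved, start_at=0):
    """Return the start index of the window with the fewest substitutions
    (earliest one on a tie), or None if no window qualifies."""
    m = len(pattern)
    best = None  # (mismatches, start)
    for start in range(start_at, len(seq) - m + 1):
        window = seq[start:start + m]
        if window[conserved] != pattern[conserved]:
            continue  # conserved residue mutated
        mm = sum(a != b for a, b in zip(window, pattern))
        if mm <= max_mismatches and (best is None or mm < best[0]):
            best = (mm, start)
    return None if best is None else best[1]


def cdr1(seq, max_mismatches=3):
    p = best_hit(seq, 'ABCDEFGHI', max_mismatches, conserved=3)
    if p is None:
        return None
    prefix_end = p + len('ABCDEFGHI')
    # The suffix must start after the prefix ends
    s = best_hit(seq, 'GHIJMCDEF', max_mismatches, conserved=6, start_at=prefix_end)
    if s is None:
        return None
    # Assumption: "3rd position after the prefix" = 3rd residue following it
    return seq[prefix_end + 2:s]


def cdr2(seq, max_mismatches=3):
    p = best_hit(seq, 'DEFGHNOP', max_mismatches, conserved=3)
    if p is None:
        return None
    s = best_hit(seq, 'WXYZ', max_mismatches, conserved=1, start_at=p + len('DEFGHNOP'))
    if s is None:
        return None
    # Assumption: CDR2 starts at the first residue of the prefix and its last
    # residue is the one 3 positions before the suffix starts
    return seq[p:s - 2]


sequence = 'ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEFGHNOPQRSTUCDEFGHVWXYZ'
print(cdr1(sequence))  # LCDEFEFGHIJM
print(cdr2(sequence))  # DEFGHNOPQRSTUCDEFG
```

For many sequences, the same two functions can simply be called on each record, for example on `str(record.seq)` for every record yielded by Biopython's `SeqIO.parse(path, "fasta")`. The cost grows linearly with the total sequence length, and no mutant list has to be built.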
import random

# Define variables: two alphabets to swap letters between
A = ["a","b","c","d","e","f","g","h","i","j","k","l","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
B = ["a","b","c","d","e","f","g","h","i","j","k","l","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
C = []

# Iterate however many times, i.e. 10
for _ in range(10):
    # Copy A so each iteration starts from the original letters
    # (`results = A` would only alias A and mutate it in place)
    results = A.copy()
    swaps = random.randint(1, 3)
    swapA = random.sample(A, swaps)
    swapB = random.sample(B, swaps)
    # Map each chosen letter to its replacement, so we don't need an extra loop
    mapping = dict(zip(swapA, swapB))
    # Replace the chosen letters; each position is swapped at most once
    for i, letter in enumerate(results):
        if letter in mapping:
            results[i] = mapping[letter]
    # Add to results list
    C.append(results)

# View results!
print(C)
My idea is to generate all possible mutated variants of the CDR prefixes and suffixes, store them in a list, and then match each generated pattern against the sequences one by one, but that is very time-consuming. I have 1 million sequences from which I need to extract CDR1 and CDR2.
I know this is not the right way to do it, but having no coding background I could not come up with a better solution.
Is there an efficient way to find them?
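A quick count shows why enumerating variants is slow. For a motif of length L with c conserved positions, at most k substitutions and an alphabet of a letters, the number of variants is the sum over j = 0..k of C(L - c, j) * (a - 1)^j. The sketch below assumes the 20 standard amino acids (the example itself uses A to Z, so both are shown); `n_variants` is a hypothetical helper name.

```python
from math import comb


def n_variants(length, max_subs, alphabet_size, conserved=1):
    """Count the strings within `max_subs` substitutions of a motif of
    `length` letters when `conserved` positions may not change."""
    free = length - conserved
    return sum(comb(free, k) * (alphabet_size - 1) ** k for k in range(max_subs + 1))


# One 9-letter motif such as 'ABCDEFGHI' with one conserved residue:
print(n_variants(9, 3, alphabet_size=20))  # 394365 with the 20 amino acids
print(n_variants(9, 3, alphabet_size=26))  # 892701 with A-Z as in the example

# Scanning the 53-letter example sequence directly instead compares
# at most (53 - 9 + 1) windows x 9 letters:
print((53 - 9 + 1) * 9)  # 405
```

Matching hundreds of thousands of variants one by one against each of 1 million sequences multiplies that count by 1 million, whereas checking each window of the sequence against the original pattern needs only a few hundred letter comparisons per sequence here.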
python
bioinformatics
biopython
fuzzy-search
fuzzy-logic