Finding CDR sequences by their definition

1 year ago

#362454

Hossain

I am struggling to find CDRs by their definition. A CDR is defined by the sequence patterns immediately before and after it (known as the prefix and suffix, respectively), and the CDR lies between them. In addition, a few point mutations are allowed in the prefix and suffix patterns. To make this clear, I will explain it with an example.

Let's say we have the sequence = 'ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEFGHNOPQRSTUCDEFGHVWXYZ'.

CDR1 definition: from the 3rd position after the prefix to the start of the suffix

Prefix pattern = 'ABCDEFGHI' with at most 3 point mutations, but the 'D' at the '3rd' position must be kept, i.e. allowed prefixes include ABCDEFHHI, BBCDEFGHI, etc.

Suffix pattern = 'GHIJMCDEF' with at most 3 point mutations, but the 'D' at the '7th' position must be kept, i.e. allowed suffixes include GJJJMCDEF, GHAAMCDCF, etc.

CDR2 definition: from the prefix to 3 positions before the suffix

Prefix pattern = 'DEFGHNOP' with at most 3 point mutations, but the 'G' at the '4th' position must be kept

Suffix pattern = 'WXYZ' with at most 3 point mutations, but the 'X' at the '2nd' position must be kept
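The rule above (at most 3 substitutions, with one letter that must not change) can be checked directly by comparing a candidate window against the pattern position by position. The following is only a minimal sketch: the function names are hypothetical, and the conserved letter is passed as a 0-based index (3 for 'D' in 'ABCDEFGHI', 6 for 'D' in 'GHIJMCDEF'). That indexing is an assumption, because the position numbers in the post can be read as either 0-based or 1-based.

```python
def hamming_mismatches(window: str, pattern: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(window) != len(pattern):
        raise ValueError("window and pattern must have the same length")
    return sum(a != b for a, b in zip(window, pattern))


def is_allowed_variant(window: str, pattern: str, anchor_index: int,
                       max_mismatches: int = 3) -> bool:
    """True if `window` is `pattern` with at most `max_mismatches`
    substitutions and the letter at `anchor_index` (0-based, assumed) intact."""
    if len(window) != len(pattern):
        return False
    if window[anchor_index] != pattern[anchor_index]:
        return False
    return hamming_mismatches(window, pattern) <= max_mismatches


# Examples taken from the post
print(is_allowed_variant("ABCDEFHHI", "ABCDEFGHI", anchor_index=3))  # True
print(is_allowed_variant("GHAAMCDCF", "GHIJMCDEF", anchor_index=6))  # True
# Conserved 'D' replaced, so this one is rejected
print(is_allowed_variant("ABCXEFGHI", "ABCDEFGHI", anchor_index=3))  # False
```

This only handles substitutions (point mutations), not insertions or deletions, which matches the definition as stated.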

So, by the definition, the highlighted sequence is CDR1 in the sequence: ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEF

and CDR2 will be: ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEFGHNOPQRSTUCDEFGHVWXYZ
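One possible way to pull out both regions is to slide each pattern along the sequence, keep the closest window that satisfies the mutation limit, and slice between the prefix and the suffix. This is a sketch under several assumptions, not a definitive implementation: `best_match` and `extract_region` are hypothetical names, anchors are 0-based indices, and the offsets (`skip_after_prefix=2` for "from the 3rd position after the prefix", `trim_before_suffix=3` for "3 positions before the suffix") are one reading of the definitions. The sketch keeps the closest match rather than the first one, because 'DEFGHIJK' near the start of the example sequence is also within 3 mutations of the CDR2 prefix.

```python
def best_match(seq, pattern, anchor_index, max_mismatches=3, start=0):
    """Return (position, mismatches) of the window in seq[start:] closest to
    `pattern`, or None if no window meets the constraints.
    Ties are resolved in favour of the leftmost window."""
    best = None
    for i in range(start, len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        # The conserved letter must be present at the anchor position
        if window[anchor_index] != pattern[anchor_index]:
            continue
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (i, mismatches)
    return best


def extract_region(seq, prefix, prefix_anchor, suffix, suffix_anchor,
                   skip_after_prefix=0, trim_before_suffix=0, max_mismatches=3):
    """Return the text between a fuzzy prefix and a fuzzy suffix, or None."""
    p = best_match(seq, prefix, prefix_anchor, max_mismatches)
    if p is None:
        return None
    prefix_end = p[0] + len(prefix)
    # Only look for the suffix downstream of the prefix
    s = best_match(seq, suffix, suffix_anchor, max_mismatches, start=prefix_end)
    if s is None:
        return None
    return seq[prefix_end + skip_after_prefix:s[0] - trim_before_suffix]


seq = "ABCDEFGHIJKLCDEFEFGHIJMGHIJMCDEFGHNOPQRSTUCDEFGHVWXYZ"
cdr1 = extract_region(seq, "ABCDEFGHI", 3, "GHIJMCDEF", 6, skip_after_prefix=2)
cdr2 = extract_region(seq, "DEFGHNOP", 3, "WXYZ", 1, trim_before_suffix=3)
print(cdr1)  # LCDEFEFGHIJM
print(cdr2)  # QRSTUCDEF
```

If the intended boundaries differ by a position or two, only the two offset arguments need to change.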

I tried a lot but could not come up with any solution.

One of the things I tried is below:

import random

# Define variables
A = ["a","b","c","d","e","f","g","h","i","j","k","l","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
B = list(A)  # same letters, used as the pool of replacement letters
C = []

# Iterate however many times, e.g. 10
for _ in range(10):
    # Copy A so each iteration mutates a fresh list instead of A itself
    results = list(A)
    swaps = random.randint(1, 3)
    swapA = random.sample(A, swaps)
    swapB = random.sample(B, swaps)

    # Group the swaps together, so we don't need to use an extra loop
    zipped = list(zip(swapA, swapB))

    # Replace letters with their swap partners
    for j in range(len(results)):
        # Check if this is the right place to swap
        for old, new in zipped:
            if results[j] == old:
                # Substitute the letter, then stop so it is not swapped twice
                results[j] = new
                break

    # Add to results list
    C.append(results)

# View results!
print(C)

What I am thinking of is generating every possible mutated variant of the CDR prefix and suffix, storing them in a list, and then matching the generated sequences one by one, but that is very time-consuming. I have 1 million sequences from which to extract CDR1 and CDR2.
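To get a feel for why enumerating every variant is expensive, the number of allowed variants can be counted without generating them. The sketch below assumes substitutions only, one conserved position per pattern, and an alphabet of 26 letters as in the toy example (20 would be the figure for amino acids); `count_variants` is a hypothetical helper name.

```python
from math import comb


def count_variants(pattern_length, alphabet_size, max_mismatches=3,
                   fixed_positions=1):
    """Number of strings within `max_mismatches` substitutions of a pattern
    when `fixed_positions` letters are not allowed to change."""
    free = pattern_length - fixed_positions
    # Choose which k free positions mutate, then pick a different letter for each
    return sum(comb(free, k) * (alphabet_size - 1) ** k
               for k in range(max_mismatches + 1))


print(count_variants(9, 26))  # 892701 variants of a 9-letter pattern
print(count_variants(9, 20))  # 394365 with a 20-letter alphabet

# By contrast, sliding a 9-letter pattern over the 53-letter example
# sequence only needs one comparison per window:
print(53 - 9 + 1)  # 45 windows
```

Comparing each window against the pattern directly therefore avoids building and searching a list of several hundred thousand variants per pattern.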

I know this is not the right way to do it, but having no coding background, I could not come up with a solution.

Is there an easy way to do this?

python

bioinformatics

biopython

fuzzy-search

fuzzy-logic

0 Answers
