Wednesday 24 May 2017

Getting CDR3 nucleotide sequence out of Decombinator

I was recently asked whether CDR3translator (the CDR3 extraction component of the Decombinator suite of T-cell receptor analysis scripts) has an option to output the CDR3 nucleotide sequence rather than the amino acid sequence.

There currently is not, and I don't think there's much call to add it as a built-in feature, but I knocked together a bodge for them. It's an easy one-line change, so I'm posting it here in case anyone else has a use for it:


This just exploits the way that the script currently finds CDR3s: it reconstructs the whole V/J nucleotide sequence from the five-part Decombinator index (storing it in the 'nt' variable), translates this into its amino acid sequence ('aa'), and then looks for the conserved motifs at the appropriate positions ('start_cdr3' and 'end_cdr3').

Each amino acid is encoded by three nucleotides, so if you multiply each of these positions by 3 you get the correct positions to extract the CDR3 nucleotide sequence, running from the conserved V-gene cysteine to the J-gene phenylalanine. If you want the GXG portion of the FGXG motif as well, you'll need to add '+12' before the final closing square bracket.
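
To illustrate the index arithmetic, here is a minimal, self-contained sketch. It is not the CDR3translator code: the sequence is made up, the `translate` helper and the way the motif positions are located are hypothetical stand-ins, and `start_aa` / `end_aa` are half-open amino acid indices of my own choosing. The real script defines 'start_cdr3' and 'end_cdr3' in its own way, so the exact offsets needed there (such as the '+12' above) depend on those definitions.

```python
import re

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(seq):
    """Translate an in-frame nucleotide string (hypothetical helper)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Toy in-frame rearrangement (made up for illustration, not real data).
nt = "".join(
    "CTG TAC TTC TGT GCC AGC AGC CTG GGC CAG GAG "
    "ACC CAG TAC TTC GGG CCA GGC ACC CGG CTC".split()
)
aa = translate(nt)

# Toy motif search: find FGXG, then the last cysteine before it.
# (Assumption: the real script uses gene-specific positions instead.)
f_index = re.search(r"FG.G", aa).start()
start_aa = aa.rfind("C", 0, f_index)  # index of the conserved cysteine
end_aa = f_index + 1                  # half-open: one past the phenylalanine

cdr3_aa = aa[start_aa:end_aa]

# Amino acid index i covers nucleotides 3*i to 3*i + 2,
# so multiplying both slice bounds by 3 gives the matching nucleotides.
cdr3_nt = nt[start_aa * 3:end_aa * 3]

# Extending by three more residues (GXG) means nine more nucleotides here.
cdr3_nt_fgxg = nt[start_aa * 3:(end_aa + 3) * 3]

print(cdr3_aa)
print(cdr3_nt)
print(cdr3_nt_fgxg)
```

With half-open amino acid indices the nucleotide slice is simply both bounds times three. If the end index instead points at a residue that the slice should include, a further offset is needed, so it's worth translating the extracted nucleotides back and checking that they match the amino acid CDR3.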