Thursday, 24 October 2013

Biopython SeqIO tips

The Biopython libraries are of tremendous use in analysing biological data. I use them in the vast majority of my bespoke fastq analysis, as I'm sure many do.
However, there are a couple of tasks I regularly find myself wanting to do that I could not easily find solutions for. In case anyone finds themselves in similar need, here are the solutions I found; maybe they'll save you a bit of time.
Output quality scores
This is the main one; it's always annoyed me that there doesn't seem to be an easy way to output the quality string in SeqIO. Sure, I know that letter_annotations gives me the scores as numbers, but sometimes I want the actual ASCII characters (for example, if I want to take a subsection of a fastq record, of both the nucleotide and quality strings).
Here's how I get around it: turn the whole record into a string, split that on its newlines, and just take the fourth line.
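A minimal sketch of that trick, assuming record is a SeqRecord that carries quality scores (for instance one read in with SeqIO.parse). The helper name quality_string is just for illustration:

```python
def quality_string(record):
    """Return the ASCII quality line of a fastq record.

    `record` is assumed to be a Bio.SeqRecord.SeqRecord with
    "phred_quality" letter annotations (hypothetical helper, not part
    of Biopython itself).
    """
    # record.format("fastq") renders the record as four-line fastq text:
    # header, sequence, "+", qualities. The fourth line (index 3) is
    # the quality string.
    return record.format("fastq").split("\n")[3]

# Example use when looping over a file (the file name is a placeholder):
#   from Bio import SeqIO
#   for record in SeqIO.parse("reads.fq", "fastq"):
#       print(quality_string(record))
```

Slicing the result the same way as the sequence, e.g. quality_string(record)[10:20] alongside str(record.seq)[10:20], gives matching subsections. Note that the "fastq" output format writes Sanger-style (offset 33) characters.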
Generate new fastq records in situ
Sometimes it's necessary to generate complete fastq records on the fly, as opposed to reading them in from an existing file.
I've found a couple of ways of doing this. The first comes buried in amongst some unrelated Biopython features, and looks like this:
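 # Bio.Alphabet was removed in Biopython 1.78; on newer versions, drop this import and the generic_dna argument below
 from Bio.Alphabet import generic_dna  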
 from Bio.Seq import Seq  
 from Bio.SeqRecord import SeqRecord  
 new_record = SeqRecord(Seq("AAAAAA", generic_dna), id="New Record", description="")  
 new_record.letter_annotations["phred_quality"] = [40,40,40,40,40,40]   
The second way came from an answer on Biostars, and works like this:
 from Bio import SeqIO  
 from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
 fastq_string = "@%s\n%s\n+\n%s\n" % ("New Record", "AAAAAA", "IIIIII")  
 new_record = SeqIO.read(StringIO(fastq_string), "fastq-illumina")  
They both work, with a couple of differences.
According to a quick test, the latter is faster, and obviously requires loading fewer modules.
Which to use might also depend on what format you have your qualities in; if you only have integers, then the first might be more tempting, whereas if you're making your new fastq from pieces of other existing records then the second is probably the way to go.
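If you have integers but want to go the string route (or vice versa), converting between the two is just an ASCII offset. Here's a minimal sketch in plain Python, with hypothetical helper names. The offsets are the standard ones: 33 for Sanger-style fastq (Biopython's "fastq" / "fastq-sanger") and 64 for Biopython's "fastq-illumina":

```python
def phred_to_ascii(scores, offset=33):
    """Encode a list of integer Phred scores as a fastq quality string."""
    return "".join(chr(score + offset) for score in scores)


def ascii_to_phred(quality, offset=33):
    """Decode a fastq quality string into a list of integer Phred scores."""
    return [ord(char) - offset for char in quality]


# Six bases of Phred 40, Sanger-encoded
sanger_quals = phred_to_ascii([40, 40, 40, 40, 40, 40])

# The same characters decoded with the two different offsets
as_sanger = ascii_to_phred(sanger_quals, offset=33)
as_illumina = ascii_to_phred(sanger_quals, offset=64)
```

Worth keeping in mind: the same "IIIIII" string decodes to 40s at offset 33 but to 9s at offset 64, so the format name passed to SeqIO.read determines which scores the record ends up with.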

Sunday, 20 October 2013

Installing python dependencies for Decombinator

I've written previously about Decombinator, the TCR repertoire analysis program developed in our laboratory.

Having just formatted my computer and reinstalled all my packages, I've been reminded of what a faff this can be.

This time around I did things a lot more efficiently than I had before. Seeing as how my colleague responsible for updating the current readme is busy writing up his thesis, here's my quick guide to installing all the python modules required by Decombinator in Linux - at least in Ubuntu.

 sudo apt-get install python-numpy python-biopython python-matplotlib python-levenshtein   
 sudo apt-get install python-pip  
 sudo pip install acora  

Most of the modules are available in the repositories, which has the added bonus of filling in all their required dependencies. Acora (the module that enacts the Aho-Corasick finite-state machine) however is not, but is easily installed from the command line using pip.
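A quick way to check that everything landed, sketched in Python 3. It only tests whether each module can be found, not that it works, and the import names listed (Bio for Biopython, Levenshtein for python-levenshtein) are my assumption of what the packages above provide:

```python
import importlib.util

# Import names assumed to correspond to the packages installed above
REQUIRED = ["numpy", "Bio", "matplotlib", "Levenshtein", "acora"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Anything listed as missing can then be installed with apt-get or pip as above.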

Easy. Time to get decombining.

August 2016 update:
Decombinator has gone through a number of improvements since I wrote this post. The current version (and installation details) can be found on the Innate2Adaptive lab's GitHub repo.

Thursday, 17 October 2013

Good pipetting rules of thumb

Accurate micro-pipetting is probably the single most important physical skill that a molecular or cell biologist need learn. It is the foundation upon which the vast majority of our experiments are built.

Pipettes are to us as brushes are to painters. They are the tools by which we achieve our aims; any mistake in pipetting at any stage stands to affect the outcome of our experiment.

Understandably, there are many resources available to help inform best pipetting practice, which I recommend budding biologists make use of. This is a nice one, and there are plenty of (cheesy) videos around too. People should also know a bit (if not a lot!) about how they work, how they can go wrong, and the different ways in which they can be used – there’s an unwieldy if thorough guide covering these here.

However, all this advice tends to focus on what’s best for the experiment – which is important – but overlooks another concern: your body.

Experiments can often involve highly repetitive movements, for extended periods of time. This isn’t that great on your mechanisms, particularly those of your thumb and wrist (as well I know!), and can cause repetitive strain injuries. There are a number of ways you can cut down on these risks, but here’s a quick rundown of the ones I think are worth bearing in mind:

         Don’t rush. Be fast, by all means – if you’re able – but hurrying usually means you do things the wrong way (both in terms of your data’s and your own safety)

         Set up your bench. My bench might look a mess when I’m working, but it’s organised so that I can access everything I need to without stretching or moving lots of stuff around. Which leads onto…

         Minimise. The injuries you’re at risk of are cumulative in nature, so you want to use the minimum force and number of repetitions you can get away with. Maximise use of master mixes, reverse-pipette when needed, use multi-channel/step pipettes where possible, and for the love of all that is good, don’t bang your pipette tips on as hard as you can! (This is a bit of a bugbear of mine; the harder you push a tip on, the more force is required to get it off! It’s also damaging to the pipette in the long run. Gentle but firm pressure with a slight twist should be more than enough to form a strong enough seal for aspiration.)

         Mix it up. If you’re able, swap hands every now and then to split the load. Admittedly, most people’s weaker-hand is probably less reliable than their stronger-hand, but accuracy’s sometimes less important – for instance, a lot of your larger-volume wash steps can probably afford to take the hit

         Take breaks. The most effective, but least practical option (which I am guilty of ignoring most of the time!). Try to break up your experiments, either with pauses or activities that don’t put so much pressure on your wrists and hands (so you can still be working, fear not PIs!).

The best part is that most of these tips help improve both the accuracy and speed of your work too.

Remember, you only get one pair of hands (if you’re lucky), so don’t put them at risk for your work, no matter how important you think it is at the time. Speaking from experience, it can also impact directly on your research, as my current bout of tenosynovitis (damn you qPCR!) has put me out of commission lab-wise for a fortnight*. It’s in everyone’s best interest for you to take the time and effort to pipette safely, and that includes your data.

OK now I’m anthropomorphising results. Time to end blog post.

* I know it seems rich me writing this article after admitting I have the injuries I claim knowledge of preventing, but it's a bit of a 'do as I say and not as I do' situation. I happen to have always had dodgy wrists, no doubt from a childhood spent on computer games, an adolescence spent typing, and an adulthood spent pipetting. I think the addition of thumb-swipey smartphone ownership was the last straw.