Sunday, 11 February 2018

High-throughput immunopeptidomics

In my PhD I focused on studying the complexity of the immune system at the level of the T cell receptor. Recently I’ve been getting into what happens on the other side of the conversation as well: in addition to looking at TCR repertoires, I’m increasingly playing with MHC-bound peptide repertoires too.

Immunopeptidomics is a super interesting field with a great deal of promise, but it has a much higher barrier to entry for research groups relative to something like AIRR-seq. Nearly every lab can do PCR, and access to deep-sequencing machines or cores becomes ever cheaper and more commonplace. However, not every lab has expertise with fiddly pull-downs, and only a tiny fraction can do highly sensitive mass spec. This is why efforts to make immunopeptidome data generation and sharing easier should be suitably welcomed.

One of the groups whose work commendably contributes to both of these efforts is that of Michal Bassani-Sternberg. For sharing, she consistently makes all of her data available (and is seemingly a senior founder and major contributor to the recent SysteMHC Atlas Project), while for generation her papers give clear and thorough technical notes, which aid in reproducibility.

From the generation perspective, this paper (which came out at the end of last year in Mol. Cell Proteomics) describes a protocol which – through sensible experimental design – should make it easier to produce immunopeptidomic data, even from more limited samples.

The idea is basically to increase throughput by hugely reducing the number of handling steps and the time the protocol requires. Samples are mushed up, lysed, spun, and then run through a series of stacked plates. The first (if required) catches irrelevant endogenous antibodies in the lysates; the next catches MHC class I (MHC-I)–peptide complexes via bead-cross-linked antibodies; the next similarly catches pMHC-II; and the final plate catches everything else (giving you lovely sample-matched gDNA and proteomes to play with, should you choose). Each plate of pMHC can then be treated with acid to elute the peptides from their grooves, before purification and mass spec. It’s a nice, neat solution, which supposedly can all be done with readily available commercial goodies (although how much all these bits and bobs cost I have no idea).

Crucially, it means that you get everything you might want (peptides from MHC-I/-II, plus the rest of the lysates) in separate fractions, from a single input sample, in a protocol that spans hours rather than days. Having it all done in one pass helps boost recovery from limited samples, which is always nice for, say, clinical material. Although I should say, ‘limited’ is a relative term. For people used to dealing with nice, conveniently amplifiable nucleic acids, tens to thousands of cells may be limiting. Here, they managed to go down as low as 10 million. (Which is not to knock it, as this is still much, much better than the hundreds of millions to billions of cells which these experiments can sometimes require. I just don’t want everyone to go away thinking about repurposing their collection of banked Super Rare But Sadly Impractically Tiny tissue samples here.)

So on technical merit alone, it’s already a pretty interesting paper. However, there’s also a nice angle where they test out their new protocol on an ovarian carcinoma cell line with or without IFNg treatment, which tacks on a nice bit of biology to the paper too.

You see the things you might expect – like a shift from peptides seemingly produced by degradation via the standard proteasome towards those produced by the immunoproteasome – and some you might not. A nice little observation which follows on perfectly from this is that you also see an alteration in the abundance of peptides presented by different HLA alleles: for instance, the increased chymotryptic-like degradation of the immunoproteasome favours the loading of HLA-B*07:02 molecules, by making more peptides with the appropriate motif.

My favourite observation, however, relates to the fact that there’s a consistent quantitative and qualitative shift in peptidomes between IFNg-treated cells and mock. This raises an interesting possibility, about what should be doable in the near future as we iron out the remaining wrinkles in the methodologies: not only should we learn what proteins are being expressed, based on which proteins the peptides are derived from, but we should also be able to infer something about what cytokines those cells have been exposed to, based on how those peptides have been processed and presented.

Thursday, 8 February 2018

Bulk downloading proteome files from UniProt using Python

It's that time again, where the following has happened:
  1. I want to do some niche bioinformatics related thing
  2. I cobble together a quick script to do said thing
  3. I throw that script up on the internet on the off-chance it will save someone else the time of doing step 2
It's a little shift of target and scale from a similar previous post (in which I used Python to extract specific DNA sequences from UCSC). This time I've been downloading a large number of proteome files from UniProt.

It's all explained in the docstring, but the basic idea is that you go on UniProt, search for the proteomes you want, and use their export tool to download tsv files containing the unique accession numbers which identify the data you're after. Then you simply run the script in the same directory; it takes those accessions, turns them into URLs, downloads the FASTA data at each address, and outputs it to new FASTA files on your computer, with each file named after the tsv it came from.
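The script itself lives elsewhere, but the gist can be sketched in a few lines. This is a hypothetical reconstruction rather than the actual code: it assumes each exported tsv has the proteome accession (e.g. UP000005640) in its first column, and uses UniProt's REST stream endpoint to fetch the FASTA (the exact URL scheme may differ from what the original script used):

```python
import glob
import urllib.request

# Assumed URL template: UniProt's REST "stream" endpoint, queried by proteome ID.
URL = "https://rest.uniprot.org/uniprotkb/stream?query=proteome:{}&format=fasta"

def accessions(tsv_path):
    """Yield the accession from the first column of each data row of a tsv."""
    with open(tsv_path) as fh:
        next(fh)  # skip the column-name header row
        for line in fh:
            acc = line.split("\t")[0].strip()
            if acc:
                yield acc

if __name__ == "__main__":
    # One output FASTA per input tsv, named after the tsv itself
    for tsv in glob.glob("*.tsv"):
        out_name = tsv.rsplit(".", 1)[0] + ".fasta"
        with open(out_name, "w") as out:
            for acc in accessions(tsv):
                with urllib.request.urlopen(URL.format(acc)) as resp:
                    out.write(resp.read().decode())
        print("wrote", out_name)
```

Nothing fancy: because each tsv becomes its own FASTA, the grouping of proteomes is entirely decided by how you split your UniProt searches.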

The best thing about this is that you can download multiple different lists of accessions and have them output to separate files. Say you have a range of pathogens you're interested in, each with multiple proteomes banked; this way you end up with one FASTA file for each, containing as many of their proteomes as you felt like including in your search.

Thursday, 10 August 2017

Diagnosing thermocycler issues with a cheap thermocouple and logger

Occasionally I have a peek at /r/labrats, which is a subreddit for people who work in science labs. One of the regular themes is the bemoaning of the unpredictable nature of research shared by all scientists everywhere, and such was the thread that inspired this post.

One user was complaining about some dodgy amplification results, cursing the capricious nature of the PCR gods, and among the suggestions for possible contributing factors were cold spots in the thermocycler. The original poster was a bit doubtful about this, so I thought I would link them to a blog post I wrote on this very topic a year or two earlier. However this was slightly hampered by the fact that, it turns out, I hadn’t actually ever written the post – I just meant to, and then presumably forgot. So, in reference to that thread (note that while that initial reply has since been deleted, the subsequent thread with my replies remains here), and with future similar occasions in mind, here’s some of what I intended to say. It’s two years late, and I’m in a different institution now without access to all the files I had then, but hopefully it could still be useful.

Essentially, I was in a somewhat similar position to that Reddit poster, in that I had been getting some dodgy and inexplicable PCR results, as had others in my lab. After a series of control experiments, I began to suspect that it was potentially the cycler itself that was to blame – the four-block G-STORM I was using was getting pretty old (the cream plastic was even starting to go that grungy ‘old keyboard’ shade of yellow), so I resolved to try to measure whether it was operating as expected.

To do this I ordered a cheap K type thermocouple probe and a basic logger (which, combining my memory with my Google Fu skills, I think was the EasyLog EL-USB-TC-LCD Data Logger from Lascar Electronics).

What I did then was poke a small hole in the top of a 0.2 ml PCR tube (the same kind I use for all my PCRs), add 50 µl of water, poke the thermocouple in through the side so that the probe tip sat below the water level, and then hold it in place/seal the hole with a tiny strip of Parafilm. I don’t have a photo, but I found this beautiful reconstruction that I drew for a lab presentation I gave around the time:


Then I simply ran a bunch of different cycling steps – some PCRs, some incubations, some mock runs – and plotted what temperatures were recorded throughout these programs, for different wells in the same block, across the four blocks in the machine. Unfortunately, I don’t seem to have copies of most of these plots, but the few I do have make the case well enough. Here are the first tests, showing that opposite corners of the blocks failed to hit the proclaimed temperature in simple two-step heating experiments:

And here is a far more damning example, of two wells in the same block of the same PCR machine (which incidentally had recently been ‘repaired’) in a short mock PCR cycle (denaturing at 95°, annealing at 55° and then extending at 72°): one of the wells sampled failed to hit either the denaturation or the annealing temperature in the allotted time! No wonder these amplifications were failing.
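Checking a logger trace against the programmed setpoints is easy to script. Here’s a sketch in Python (not the analysis I actually ran, which is long lost): it assumes the logger’s CSV export carries the temperature reading in its third column, and simply asks whether each programmed temperature was ever reached within some tolerance:

```python
import csv

SETPOINTS = [95.0, 55.0, 72.0]   # denature, anneal, extend (degrees C)
TOLERANCE = 1.0                  # how close to the setpoint counts as "reached"

def read_temps(csv_path, temp_col=2):
    """Pull the temperature column out of a logger CSV export (assumed layout:
    index, timestamp, temperature), skipping the header and malformed rows."""
    temps = []
    with open(csv_path) as fh:
        reader = csv.reader(fh)
        next(reader)  # skip header row
        for row in reader:
            try:
                temps.append(float(row[temp_col]))
            except (ValueError, IndexError):
                continue
    return temps

def check_setpoints(temps, setpoints=SETPOINTS, tol=TOLERANCE):
    """Return, for each setpoint, whether any recorded reading came within
    tolerance of it at any time during the run."""
    return {sp: any(abs(t - sp) <= tol for t in temps) for sp in setpoints}
```

So a well whose trace peaked at 93.2° during denaturation would be flagged as never reaching 95° – exactly the kind of failure described above, made visible in a couple of lines.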


Happily, in this situation, it was a problem easily solved: we bought a new PCR machine, and I’d caught the problem before any important results could be affected. However, it really does make you wonder: just how much can you trust your thermocycler? Or your heat blocks (especially above 60°, where the alcohol thermometers found in labs cease to be useful)? You know, those crucial bits of kit upon which whole sections of your lab’s output possibly rely utterly, and yet for which you probably have no readout or data other than the traces that the inbuilt software displays on the screen (which, I should say, were always completely normal on all of the dodgy blocks I tested).

The total price of the kit required to do the tests described above was about £80, or $100, with 95% of that being the logger. If your lab does a lot of PCR, this is considerably less than you probably regularly spend on polymerase, which seems like a pretty small price to pay to be confident your reactions are proceeding as planned.