Anyone who does
any T cell receptor analysis will know IMGT (the international
ImMunoGeneTics information system), the repository for all things TCR
and Ig. You either use it, or you're one of those annoying people who
make me drag up all the tables of outdated nomenclatures.
Much like any
resource, IMGT has its good points (simple and highly useful features
like GENE-DB and LocusView in particular) and its bad (the less said
about LIGM-DB the better).
However, again
like any resource, it's only as good as the data stored in it. The
data in it, as far as I can tell, is pretty damn good (and I use it a
lot). I guess that's why
they got to be in charge of all the data in the first place.
As such, when I
recently found an error in a sequence*, I made sure to let them know:
I certainly get a lot of mileage out of their data, it's only fair
that I pay them back (and pay it forward to others) by ensuring the
data that is there is good.
It's always a
little nerve-racking, being a PhD student emailing senior doctors
and professors to let them know of a mistake you've discovered, but
as hoped the information was very warmly received, and I'm told that the
error will be corrected.
Science has to be
self-correcting to stop errors lingering and spreading; firing a
quick email off to correct an annotation might not seem like much,
but if it stops one person going through the same short time of
confusion that you went through unravelling the mistake then you've
done a net service to the world.
* For the people that found their way here suffering from this particular error, here's what I found. I was looking at
the TCR leader regions (the mono-spliced section of the transcript
between the start of translation and the beginning of the V region
which encodes the localisation signal peptide), when I noticed that
one gene never seemed to produce functional transcripts. It turned
out that while some of the entries for the human alpha gene
TRAV29/DV5 were correct, if you downloaded the L-PART2 region alone
the sequence produced actually contains a section of the start of
the V gene. So, instead of reading 'GGGTAAAC', it reads
'GGGTAAACAGTCAACAGAAGAATGAT'. I just checked and it still
gives the old sequence, but I assume there's a lag time for databases
to update.
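If you want to check whether the copy you downloaded has the same problem, a rough sketch like the one below will do it. It only compares the two strings quoted above; the function name and variable names are my own inventions for illustration, not anything IMGT provides, and it assumes you have already pulled the L-PART2 sequence out of whatever file you downloaded.

```python
# The two sequences quoted in the footnote above.
EXPECTED_L_PART2 = "GGGTAAAC"
DOWNLOADED_L_PART2 = "GGGTAAACAGTCAACAGAAGAATGAT"


def extra_bases(expected, observed):
    """Return the bases in `observed` that run on past `expected`.

    Returns an empty string if the two match exactly, and None if
    `observed` does not start with `expected` at all (in which case
    something else is going on and this check does not apply).
    """
    expected = expected.upper()
    observed = observed.upper()
    if observed.startswith(expected):
        return observed[len(expected):]
    return None


overhang = extra_bases(EXPECTED_L_PART2, DOWNLOADED_L_PART2)
if overhang:
    print(f"L-PART2 runs {len(overhang)} bases into the V gene: {overhang}")
elif overhang == "":
    print("L-PART2 matches the expected sequence")
else:
    print("L-PART2 does not start with the expected sequence")
```

An empty result means your copy is the corrected one; anything else is the stretch of V gene that has been tacked onto the end of the leader.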