Showing posts with label rna-seq. Show all posts

Thursday, 10 November 2016

Installing Trinity on Mac OS X via homebrew -- update

A couple of months ago I wrote a short post about how I managed to install the RNA assembler Trinity on Mac OS X (El Capitan), on the off-chance it would be useful to someone else.

This morning I received an email from my friend Mazlina, who I worked with in London, saying she had been trying to do just that and had coincidentally stumbled on my post*. However, it hadn't worked out quite as easily for her as it had for me.

It turned out to be a Java version problem: although Java 1.8 was installed, brew --config reported only 1.6, which is too old to install Trinity.
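If you suspect a similar mismatch, it can help to check which Java your shell actually resolves before comparing it with what brew reports. The following is only a minimal sketch: the java_major helper is a hypothetical name of my own, and it simply normalises both the old "1.x" numbering and the newer "9"-style numbering to a single major version number.

```shell
#!/bin/sh
# Hypothetical helper (not part of brew or Java): print the major version
# for a Java version string, e.g. "1.8.0_101" -> 8, "9-ea" -> 9.
java_major() {
    version=$1
    case "$version" in
        1.*) version=${version#1.} ;;   # legacy scheme: drop the leading "1."
    esac
    # Keep only the leading run of digits ("8.0_101" -> "8").
    printf '%s\n' "$version" | sed 's/^\([0-9][0-9]*\).*/\1/'
}

# Report whichever java is first on the PATH, if there is one.
if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; the version is the first quoted string.
    raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    echo "java on PATH: ${raw:-unknown} (major: $(java_major "${raw:-0}"))"
else
    echo "no java found on PATH"
fi
```

On a Mac, running /usr/libexec/java_home -V should also list every JDK the system knows about, which makes it easier to spot an old 1.6 install lurking alongside a newer one.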

Here's how she solved it, quoting from her email:

"First, I did
brew doctor
and just cleared up whatever it told me to [...]

$ brew doctor
Your system is ready to brew.

So by running
brew cask search java
it lists down the available java versions (the one you have, 8, is just java), and I went with 9-beta because that was the only one I could download at that time. And when I ran 

$ brew cask install java9-beta
$ brew install trinity
worked like a charm.

Not sure if that's exciting enough to go on the blog, but it solves it anyway."

(For reference, as I told her, given the average excitement level of the blog this should fit right in!)

* I'm never really sure whether these posts are read or not (as the stats from Blogger are always inflated by scanning bots), apart from when people let me know they've seen them. So if you ever bump into me at a conference or something and have found one of them useful or interesting, please let me know; I love to hear about it!

Thursday, 29 September 2016

Installing Trinity on Mac OS X

One of the inevitable joys of bioinformatic life is the installation of a variety of esoteric software on a variety of systems. As I've just moved to a new position in a new institution, I get to go through this rigmarole again.

This time around I have an extra layer of faffery, as I am now for the first time using a Mac (having been on Ubuntu for the last ten years, and Windows in the distant recollections from before that). While the machine is gorgeous and responsive, I am still in the interminable murky phase where I don't know the intricacies and easy ways of doing things yet (and am still battling muscle memory for keyboard shortcuts!), which means that I'm back down the learning curve a little.

Anyway, as I've just discovered an incredibly easy way to install a very useful tool, I thought I'd share it.

I was trying to install the excellent RNA assembler Trinity on my iMac running OS X (El Capitan), following the instructions on its website. However, despite trying several different (and newer) compilers, I kept running into this error, which presumably means my attempts to switch to an alternative compiler weren't taking effect:

clang: error: unsupported option '-fopenmp'

Happily it turns out that Trinity is supported by the fantastic third party package manager homebrew, which I had coincidentally just installed anyway (you don't bundle wget in, what the heck Apple?).

Homebrew is easily installed following the details on their website, and then installing Trinity was as simple as this:

brew cask install java
brew install homebrew/science/trinity


Not only was this dead simple, but it automatically installed a number of other programs (as dependencies of Trinity) that were on my list to install anyway (e.g. trimmomatic and bamtools). It also installs everything directly to /usr/local/bin/, so there's no mucking about with your PATH required. Lovely.
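To confirm that everything really did land on the PATH, a quick check along these lines might be useful. It is just a sketch: check_tools is a hypothetical helper name, and the executable names passed to it (Trinity, trimmomatic, bamtools) are my assumptions about what the brew formulae install, so adjust them to match your system.

```shell
#!/bin/sh
# Hypothetical helper: print where each named tool resolves on the PATH,
# or flag it as missing. Returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool" 2>/dev/null); then
            printf '%-12s %s\n' "$tool" "$location"
        else
            printf '%-12s %s\n' "$tool" "MISSING"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names here are assumptions; edit to match your install.
check_tools Trinity trimmomatic bamtools || echo "some tools were not found on the PATH"
```

If the install went as described above, each tool should resolve to somewhere under /usr/local/bin/.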

NB: Whilst looking around for hints as to how to solve this problem, I did find this thread on SeqAnswers which suggests that you might need to take a little extra care when running Trinity on Mac systems as opposed to Linux. Something to bear in mind.

Monday, 7 December 2015

The key to finding TCR sequences in RNA-seq data

I had previously written a short blog post touching on how I'd tried to mine some PBMC RNA-seq data (from the ENCODE project) for rearranged T-cell receptor genes, to try and open up this huge resource for TCR repertoire analysis. However, I hadn't gotten very far, on account of finding very few TCR sequences per file.

That sets the background for an extremely pleasant surprise this morning, when I found that Scott Brown, Lisa Raeburn and Robert Holt from Vancouver (the last of whom is notable for producing one of the very earliest high-throughput sequencing TCR repertoire papers) had published a very nice paper doing just that!

This is a lovely example of different groups seeing the same problem and coming up with different takes. I saw an extremely low rate of return when TCR-mining in RNA-seq data from heterogeneous cell types, and gave up on it as a search for needles in a haystack. The Holt group saw the same problem, and simply searched more haystacks!

This paper tidily exemplifies the re-purposing of biological datasets to allow us to ask new biological questions (something that I consider a practical and moral necessity, given the complexity of such data and the time and costs involved in their generation).

Moreover, they do some really nice tricks: estimating TCR transcript proportions in other data sets based on constant region usage, investigating TCR diversity relative to CD3 expression, testing on simulated RNA-seq data sets as a control, looking for public or known-specificity receptors, and inferring possible alpha-beta pairs by checking all of each sample's possible combinations for their presence in at least one other sample (somewhat akin to Harlan Robins' pairSEQ approach).
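To make the pairing idea a bit more concrete, here is a toy sketch of how I understand the co-occurrence logic. To be clear, this is not the paper's implementation: the infer_pairs function name, the three-column input format (sample, chain, sequence, tab-separated) and the placeholder labels like A1 and B1 are all assumptions of mine for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "sample<TAB>chain<TAB>sequence" lines on stdin and
# print every alpha/beta combination that co-occurs in two or more samples,
# followed by the number of samples it was seen in.
infer_pairs() {
    sort -u | awk -F '\t' '
        # Collect the alpha and beta chains found in each sample.
        $2 == "alpha" { alphas[$1] = alphas[$1] " " $3 }
        $2 == "beta"  { betas[$1]  = betas[$1]  " " $3 }
        END {
            # Count, per candidate pair, how many samples contain both chains.
            for (s in alphas) {
                if (!(s in betas)) continue
                na = split(alphas[s], a, " ")
                nb = split(betas[s], b, " ")
                for (i = 1; i <= na; i++)
                    for (j = 1; j <= nb; j++)
                        seen[a[i] "\t" b[j]]++
            }
            # Keep only pairs that turn up in at least one other sample.
            for (pair in seen)
                if (seen[pair] >= 2) print pair "\t" seen[pair]
        }
    ' | sort
}

# Toy input with made-up labels: only A1 and B1 co-occur in two samples.
printf '%s\t%s\t%s\n' \
    S1 alpha A1 \
    S1 beta  B1 \
    S1 beta  B2 \
    S2 alpha A1 \
    S2 beta  B1 \
    S3 alpha A2 \
    S3 beta  B2 | infer_pairs
# prints: A1<TAB>B1<TAB>2
```

The sort -u at the start stops a chain that is listed twice in one sample from being counted twice. With real data the number of candidate combinations per sample grows as alphas times betas, so this brute-force version is only meant to illustrate the principle.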

All in all, a very nice paper indeed, and I hope we see more of this kind of data re-purposing in the field at large. Such approaches could certainly be adapted for immunoglobulin genes. I also wonder if, given whole-genome sequencing data from mixed blood cell populations, we might even be able to do a similar analysis on rearranged variable antigen receptors from gDNA.