Specifically, while troubleshooting a MiSeq run's poor yield, I wanted to see whether significantly more reads derived from one of the flow cell surfaces (top or bottom) than from the other. I checked because the FWHM (full width at half maximum of the clusters, a measure of focus during imaging) was noticeably higher for one surface.
|I mean, I have no idea if ~3 is that much worse than ~2.8-2.9, but there's no harm in checking, right?|
The surface is encoded in each read's identifier line, as the first digit of the tile number, so with a quick bit of basic bash we can find out exactly how many reads derived from each surface.
# Get all index reads (as the shortest) in one file
zcat *I1*z > I1.fq
# Extract the identifier lines with sed
# and grep for those with a '1' at the right position
# This indicated they derived from the top surface
sed '2~4d;3~4d;4~4d' I1.fq | grep '^.............................1' -c
# Do the same for '2', i.e. the bottom surface
sed '2~4d;3~4d;4~4d' I1.fq | grep '^.............................2' -c
And there you have it. Simple, quick and effective.
(As it turned out, I had almost equal numbers of reads from both surfaces, so the focus difference wasn't to blame in my case, but this might be useful in other situations!)
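One caveat for those other situations: the fixed run of dots in the grep only works when the instrument name, run number and flow cell ID have the same lengths as mine. A rough sketch of a more portable version is below. It assumes standard 4-line FASTQ records and Illumina-style identifiers of the form @instrument:run:flowcell:lane:tile:x:y, with the first digit of the tile being the surface. The function name count_surfaces is just something I made up for illustration.

```shell
# Hypothetical helper: count reads per flow cell surface from FASTQ on stdin.
# Assumes identifiers look like  @instrument:run:flowcell:lane:tile:x:y ...
# and that the first digit of the tile number is the surface
# (1 = top, 2 = bottom), with exactly 4 lines per read.
count_surfaces() {
  awk -F: '
    NR % 4 == 1 {                     # identifier lines only (1st of every 4)
      surface = substr($5, 1, 1)      # first digit of the tile field
      n[surface]++
    }
    END {
      # Fixed output order; a surface with no reads prints as 0
      printf "top\t%d\nbottom\t%d\n", n["1"], n["2"]
    }'
}
```

Something like zcat *I1*z | count_surfaces should then print one tab-separated count per surface, without needing the intermediate I1.fq file.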