Specifically, while troubleshooting a MiSeq run's poor yield, I wanted to see whether significantly more reads derived from one of the flow cell surfaces (top or bottom) than from the other. I checked this because the FWHM (full width at half maximum, a measure of cluster size that reflects how well the imaging was focused) was noticeably higher for one of the surfaces.
I mean, I have no idea whether ~3 is that much worse than ~2.8-2.9, but there's no harm in checking, right?
So, with a quick bit of basic bash, we can find out exactly how many reads derive from each surface.
# Concatenate all index reads into one file
# (the index reads are the shortest, so this file is the smallest)
zcat *I1*z > I1.fq
# Keep only the identifier (header) line of each 4-line record with sed,
# then count those with a '1' at the tile-number position of the header,
# which indicates they derive from the top surface
sed '2~4d;3~4d;4~4d' I1.fq | grep -c '^.............................1'
# Do the same for '2', i.e. the bottom surface
sed '2~4d;3~4d;4~4d' I1.fq | grep -c '^.............................2'
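The fixed character offset in the grep only works because the instrument name, run number and flowcell ID in these headers always add up to the same length; with a different instrument or flowcell ID it would silently count the wrong character. A slightly more robust sketch, assuming standard CASAVA 1.8+ style headers (@instrument:run:flowcell:lane:tile:x:y ...), splits each header on colons and takes the first digit of the tile field (the fifth field), which on the MiSeq is 1 for the top surface and 2 for the bottom. The function name count_reads_by_surface is made up for this example.

```shell
#!/usr/bin/env bash
# count_reads_by_surface [FILE...]  (hypothetical helper name)
# Counts reads per flow cell surface from a FASTQ file, or from stdin if no
# file is given. Assumes CASAVA 1.8+ headers where field 5 (colon-separated)
# is the tile number, e.g. 1101 (top surface) or 2101 (bottom surface).
# Usage: zcat *I1*z | count_reads_by_surface
count_reads_by_surface() {
  awk -F: '
    NR % 4 == 1 {
      # First line of each 4-line record is the header; take the tile digit
      surface = substr($5, 1, 1)
      if (surface == "1") top++
      else if (surface == "2") bottom++
      else other++
    }
    END { printf "top\t%d\nbottom\t%d\nother\t%d\n", top + 0, bottom + 0, other + 0 }
  ' "$@"
}
```

Reading from a pipe like this also avoids writing the intermediate I1.fq file, and the "other" line makes it obvious if the headers don't follow the assumed format.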
And there you have it. Simple, quick and effective.
(As it turned out, I had almost equal numbers of reads from both surfaces, so the surface wasn't to blame in my case, but this might be useful in other situations!)
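To put a number on "significantly more reads" from one surface, the two counts can be compared against a 50:50 expectation. The sketch below uses a chi-square goodness-of-fit test with 1 degree of freedom; the helper name surface_balance and the 5% threshold are my own choices for illustration, not part of the method above.

```shell
#!/usr/bin/env bash
# surface_balance TOP BOTTOM  (hypothetical helper name)
# Compares top and bottom surface read counts against a 50:50 split
# using a chi-square goodness-of-fit test with 1 degree of freedom.
surface_balance() {
  awk -v top="$1" -v bottom="$2" 'BEGIN {
    n = top + bottom
    if (n == 0) { print "no reads"; exit 1 }
    expected = n / 2
    chisq = (top - expected)^2 / expected + (bottom - expected)^2 / expected
    printf "top_fraction\t%.4f\n", top / n
    printf "chi_square\t%.2f\n", chisq
    # 3.841 is the 5% critical value of the chi-square distribution with 1 df
    print "imbalanced_at_p0.05\t" (chisq > 3.841 ? "yes" : "no")
  }'
}
```

For example, surface_balance 600 400 prints a top fraction of 0.6000 and a chi-square of 40.00. Bear in mind that with millions of reads even a tiny skew comes out as "significant" (a 50.1:49.9 split of 10 million reads also gives a chi-square of 40), so the fraction is the more useful number for judging whether a surface is really underperforming.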