View Single Post
Old 02-13-2015, 09:28 AM   #11
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Originally Posted by arash82 View Post
I kind of forgot to mention that I am using it on RNA-seq data from a HiSeq 2500. I currently don't have access to the mapped file, but I'll try it on them as soon as I can.
Just to clarify, CalcUniqueness does not make any use of mapping information, but it's possible to do a similar analysis with mapping information instead of kmers and there are probably programs that do so.

The thing is I am using the program (right now at least) just to determine if I am sequencing deep enough or if I can multiplex further. I don't need a perfect curve, just an estimate. Was thinking maybe to trim and then run, but shouldn't gain much from that...
If you want to get an advantage from trimming, you'd have to do fixed-length trimming on the left (like, removing the first 5 bases). Quality-trimming the right end won't affect the graphs (other than the "rand" column) unless the reads end up shorter than a kmer, and variable-length trimming on the left end would wreck them because the kmers would no longer start in the same place for previously identical reads. Quality-filtering and adapter-trimming might help, though: in=reads.fq out=clean.fq maq=15 ktrim=r k=25 mink=11 hdist=1 tpe tbo ref=truseq.fa.gz minlen=40

Here the "maq=15" will throw away reads with average quality below 15 (in other words, an expected error rate of over 1/30 or so), and reads trimmed shorter than 40bp after adapter removal will also be discarded. These may not be optimal settings for actual RNA-seq analysis (since requiring a high average quality can bias quantification), but it should clean up the data a bit to allow generation of more accurate saturation curves.
Brian Bushnell is offline   Reply With Quote