Originally Posted by Jane M
I really appreciate your suggestions, GenoMax. Thank you.

I used TopHat2 with default parameters:
As I recall, TopHat's default is to report up to 20 alignments for a multi-mapped read (the -g/--max-multihits option).
It seems that there are reads with the same sequence but different identifiers. What are they?
Those may be optical duplicates generated by pad-hopping (or PCR duplicates from library prep). You can try running Picard MarkDuplicates (including optical duplicate marking) and see whether they get flagged. I have tried this with a limited number of HiSeq 4000 samples but have not managed to get useful results for the optical part.
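If you want a quick look at these reads before running Picard, a minimal sketch like the one below groups FASTQ reads by identical sequence and then checks whether the copies sit on the same lane and tile within some pixel distance, which is roughly what optical/pad-hopping duplicates look like. It assumes Illumina 1.8+ read names (instrument:run:flowcell:lane:tile:x:y). The 2500-pixel threshold is an assumption, loosely based on the value Picard's documentation suggests for patterned flow cells (its OPTICAL_DUPLICATE_PIXEL_DISTANCE defaults to 100). Unlike MarkDuplicates, this works on raw sequence rather than alignment position, so treat it only as a sanity check.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Iterator


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (read_id, sequence) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header[1:].split()[0], seq


def parse_location(read_id: str) -> tuple[str, str, int, int]:
    """Assumes Illumina 1.8+ names: instrument:run:flowcell:lane:tile:x:y."""
    fields = read_id.split(":")
    return fields[3], fields[4], int(fields[5]), int(fields[6])


def find_duplicates(lines: Iterable[str], pixel_distance: int = 2500):
    """Group reads by identical sequence; flag same-tile pairs within pixel_distance."""
    by_seq = defaultdict(list)
    for read_id, seq in read_fastq(lines):
        by_seq[seq].append(read_id)

    results = []
    for seq, ids in by_seq.items():
        if len(ids) < 2:
            continue  # unique sequence, not a duplicate
        close_pairs = []
        # Pairwise comparison is quadratic per group; fine for a subsample.
        for a, b in combinations(ids, 2):
            lane_a, tile_a, xa, ya = parse_location(a)
            lane_b, tile_b, xb, yb = parse_location(b)
            same_tile = (lane_a, tile_a) == (lane_b, tile_b)
            if same_tile and abs(xa - xb) <= pixel_distance and abs(ya - yb) <= pixel_distance:
                close_pairs.append((a, b))
        results.append((seq, ids, close_pairs))
    return results
```

Any open file handle can be passed in (for gzipped files, `gzip.open(path, "rt")`). Groups with many same-tile close pairs point towards pad-hopping/optical duplicates; groups whose copies are spread across tiles look more like PCR duplicates.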

Is there a way to try to map the unmapped reads to rRNA?
You can use the sequence of the human rDNA repeat found here to map against.
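Aligning against the rDNA repeat with your usual aligner is the proper route. As a rough pre-check, though, a minimal sketch of a different technique, a k-mer containment screen, can estimate what fraction of the unmapped reads look like rRNA. The k-mer size of 25 and the 50% shared-k-mer cutoff are arbitrary assumptions, and the function names are hypothetical; this is not a substitute for real alignment.

```python
from typing import Iterable


def read_fasta(lines: Iterable[str]) -> str:
    """Concatenate sequence lines of a single-record FASTA, skipping '>' headers."""
    return "".join(line.strip().upper() for line in lines if not line.startswith(">"))


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Reverse complement; N stays N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_index(reference: str, k: int = 25) -> set[str]:
    """k-mers from both strands, since reads may come from either strand."""
    return kmers(reference, k) | kmers(revcomp(reference), k)


def looks_like_rrna(read: str, index: set[str], k: int = 25,
                    min_fraction: float = 0.5) -> bool:
    """True if at least min_fraction of the read's k-mers occur in the rDNA index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False  # read shorter than k
    shared = sum(1 for km in read_kmers if km in index)
    return shared / len(read_kmers) >= min_fraction
```

After extracting the unmapped reads to FASTQ, you would build the index once from the rDNA FASTA and count how many reads pass `looks_like_rrna`. A high fraction would suggest rRNA contamination explains much of the unmapped portion.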

When I trimmed the HiSeq 4000 reads down to 75 bp (for 3 samples only), the overall mapping rate increased by 1.1, 1.6 and 2.5%. Better, but not 10% better.
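For anyone repeating the trimming test, fixed-length trimming just cuts both the sequence and quality lines to the same length. A minimal sketch, assuming well-formed 4-line FASTQ records (the function name is hypothetical; a dedicated trimming tool would normally be used instead):

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], length: int = 75) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to at most `length` bases."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n") + "\n"
        yield seq[:length] + "\n"    # reads shorter than `length` are left unchanged
        yield plus + "\n"
        yield qual[:length] + "\n"   # quality string must stay the same length as seq
```

Writing the output is then `out.writelines(trim_fastq(infile, 75))` with both files opened in text mode.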
Perhaps you should allow multi-mappers to map at all locations and see if that raises the percentage. It would be mostly an academic exercise.
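Before re-running with a higher -g, one way to see whether the default cap of 20 matters is to tally how many mapped reads report each number of alignments. A minimal sketch over SAM text (e.g. from `samtools view` on the alignment BAM), assuming the aligner writes the standard NH:i tag; reads without NH are counted as single hits, which is also an assumption:

```python
from collections import Counter
from typing import Iterable


def multimap_histogram(sam_lines: Iterable[str]) -> Counter:
    """Count mapped reads by their NH:i value (number of reported alignments)."""
    seen = set()
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped read
        key = (qname, flag & 0xC0)  # keep mate 1 and mate 2 of a pair separate
        if key in seen:
            continue  # already counted this read via another alignment
        seen.add(key)
        nh = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NH:i:")), 1)
        hist[nh] += 1
    return hist
```

If `hist[20]` is a sizeable share of the total, the cap is discarding information; if it is tiny, raising it is unlikely to change the overall mapping rate much.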
