04-27-2016, 07:10 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,730

Quote:
Originally Posted by Jane M
I really appreciate your suggestions, GenoMax. Thank you.

I used TopHat2 with default parameters:
As I recall, TopHat's default is to report up to 20 alignments for a multi-mapped read (the -g/--max-multihits option).
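To see how many reads are actually running into that cap, one option is to tabulate the NH:i tag, which holds the number of reported alignments for a read. The sketch below is plain Python over SAM text (for example the output of `samtools view` on TopHat's accepted_hits.bam); it assumes every mapped record carries an NH:i tag, as TopHat2 writes, and the function name `nh_histogram` is hypothetical.

```python
from collections import Counter


def nh_histogram(sam_lines):
    """Return a Counter mapping NH value -> number of distinct reads.

    Assumption: each mapped record carries an NH:i:<n> tag giving the
    number of alignments reported for that read (TopHat2 writes this).
    """
    nh_by_read = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped record, no NH to count
        # Key on read name plus the first/second-in-pair bits so that
        # the two mates of a pair are counted as separate reads.
        key = (fields[0], flag & 0xC0)
        for tag in fields[11:]:  # optional tags start at column 12
            if tag.startswith("NH:i:"):
                nh_by_read[key] = int(tag[5:])
                break
    return Counter(nh_by_read.values())
```

A large count at NH == 20 would suggest that many reads are multi-mappers truncated at the default limit.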
Quote:
It seems that there are reads with the same sequence but different identifiers. What are they?
Those may be optical duplicates generated by pad-hopping (or PCR duplicates from library prep). You could run Picard MarkDuplicates (with optical duplicate marking enabled) and see whether they get flagged. I have tried this with a limited number of HiSeq 4000 samples but have not managed to get useful results for the optical part.
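For a quick look before running Picard, reads with identical sequences can be grouped and checked for physical proximity on the flowcell, using the tile and x/y coordinates in Casava 1.8+ style read IDs (instrument:run:flowcell:lane:tile:x:y). This is a simplified sketch, not Picard's algorithm: it groups by exact read sequence rather than alignment position, and uses a square window of plus or minus `max_pixel_distance` on each axis. The 2500 default is an assumption borrowed from the value commonly suggested for Picard's OPTICAL_DUPLICATE_PIXEL_DISTANCE on patterned flowcells; the function names are hypothetical.

```python
from collections import defaultdict


def parse_coords(read_id):
    """Return (flowcell, lane, tile, x, y) from a Casava 1.8+ style read ID."""
    name = read_id.lstrip("@").split()[0]  # drop the comment after the space
    parts = name.split(":")  # instrument:run:flowcell:lane:tile:x:y
    return parts[2], parts[3], parts[4], int(parts[5]), int(parts[6])


def optical_duplicate_pairs(records, max_pixel_distance=2500):
    """Return pairs of read IDs that share a sequence, sit on the same
    flowcell/lane/tile, and lie within max_pixel_distance on both axes.

    records: iterable of (read_id, sequence) tuples.
    """
    ids_by_seq = defaultdict(list)
    for read_id, seq in records:
        ids_by_seq[seq].append(read_id)

    pairs = []
    for ids in ids_by_seq.values():
        if len(ids) < 2:
            continue  # unique sequence, nothing to compare
        coords = [parse_coords(i) for i in ids]
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                ca, cb = coords[a], coords[b]
                if ca[:3] != cb[:3]:
                    continue  # different flowcell, lane or tile
                if (abs(ca[3] - cb[3]) <= max_pixel_distance
                        and abs(ca[4] - cb[4]) <= max_pixel_distance):
                    pairs.append((ids[a], ids[b]))
    return pairs
```

Identical sequences that land far apart or on different tiles are more consistent with PCR duplicates than with optical or pad-hopping duplicates.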

Quote:
Is there a way to try to map the unmapped reads to rRNA?
You can use the sequence of the human rDNA repeat found here to map against.
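To do that, the unmapped reads first need to go back to FASTQ. A minimal sketch, assuming SAM text input (for example `samtools view` on TopHat's unmapped.bam) and single-end reads; the function name is hypothetical. It keeps records with FLAG bit 0x4 (unmapped) and, if bit 0x10 is set, reverse-complements the sequence and reverses the qualities to restore the original read orientation.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_sam_to_fastq(sam_lines):
    """Yield one 4-line FASTQ record (as a string) per unmapped SAM record."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if not flag & 0x4:
            continue  # read is mapped, skip it
        seq, qual = fields[9], fields[10]
        if flag & 0x10:
            # Stored reverse-complemented; undo that for re-alignment.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{fields[0]}\n{seq}\n+\n{qual}\n"
```

The resulting FASTQ can then be aligned against the rDNA repeat sequence with any short-read aligner.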

Quote:
When trimming the HiSeq 4000 reads down to 75 bp (for 3 samples only), I got increases in overall mapping rate of 1.1, 1.6 and 2.5%. Better, but not 10% better.
Perhaps you should allow multi-mappers to be reported at all of their locations and see if that raises the percentage. It would only be an academic exercise, though.
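For reference, the trimming experiment in the quote can be reproduced with a simple hard trim that keeps the first 75 bases of every read. This is a minimal sketch, assuming plain hard trimming (not quality-based trimming) and well-formed 4-line FASTQ records; the function name is hypothetical.

```python
def trim_fastq(lines, length=75):
    """Yield FASTQ records truncated to the first `length` bases.

    Assumes well-formed 4-line records (header, sequence, '+', quality).
    """
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        # Slice sequence and quality identically so they stay in sync.
        yield f"{header}\n{seq[:length]}\n{plus}\n{qual[:length]}\n"
```

Reads already shorter than `length` pass through unchanged.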