Quote:
Originally Posted by Jane M
I really appreciate your suggestions GenoMax, thank you
I used TopHat2 with default parameters:
|
As I recall, TopHat's default is to report up to 20 locations per multi-mapped read (the -g/--max-multihits setting).
Quote:
It seems that there are reads with the same sequence but different identifiers. What are they?
|
Those may be optical duplicates generated by pad-hopping (or PCR duplicates from library prep). You can try running Picard MarkDuplicates (including the optical duplicate marking) and see if they get flagged. I have tried this with a limited number of HiSeq 4000 samples but have not managed to get useful results for the optical part.
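If you want a quick look before running Picard, something like the sketch below might help. It groups FASTQ reads by identical sequence and then checks whether reads in a group sit on the same lane and tile within some pixel distance of each other. It assumes the standard Illumina header layout (instrument:run:flowcell:lane:tile:x:y); the function names and the 2500 pixel threshold are my own choices for illustration, not anything Picard defines for you, and this is a rough box test rather than a reimplementation of MarkDuplicates.

```python
from collections import defaultdict
from itertools import combinations


def parse_fastq(lines):
    """Yield (read_id, sequence) from an iterable of FASTQ lines (4 per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it)
        next(it)  # '+' separator line
        next(it)  # quality line
        # Drop the leading '@' and anything after the first space.
        yield header.strip()[1:].split()[0], seq.strip()


def tile_xy(read_id):
    """Return ((lane, tile), (x, y)), assuming instrument:run:flowcell:lane:tile:x:y."""
    parts = read_id.split(":")
    return (parts[3], parts[4]), (int(parts[5]), int(parts[6]))


def group_by_sequence(records):
    """Map each sequence seen more than once to the list of read IDs carrying it."""
    groups = defaultdict(list)
    for read_id, seq in records:
        groups[seq].append(read_id)
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


def close_pairs(read_ids, max_dist=2500):
    """Return pairs on the same lane and tile whose x and y both differ by <= max_dist.

    max_dist=2500 is an assumed threshold for illustration only.
    """
    pairs = []
    for a, b in combinations(read_ids, 2):
        (tile_a, (xa, ya)), (tile_b, (xb, yb)) = tile_xy(a), tile_xy(b)
        if tile_a == tile_b and abs(xa - xb) <= max_dist and abs(ya - yb) <= max_dist:
            pairs.append((a, b))
    return pairs


# Made-up records: three reads share a sequence, two of them on the same tile.
example = """@M:1:FC:1:1101:1000:2000 1:N:0:ACGT
ACGTACGT
+
IIIIIIII
@M:1:FC:1:1101:1200:2100 1:N:0:ACGT
ACGTACGT
+
IIIIIIII
@M:1:FC:1:2205:9000:9000 1:N:0:ACGT
ACGTACGT
+
IIIIIIII
@M:1:FC:1:1101:5000:5000 1:N:0:ACGT
TTTTGGGG
+
IIIIIIII
""".splitlines()

dups = group_by_sequence(parse_fastq(example))
nearby = {seq: close_pairs(ids) for seq, ids in dups.items()}
```

Same-sequence reads that land close together on one tile are candidates for optical duplicates; those scattered across tiles are more likely PCR duplicates or genuinely abundant fragments (which is common in RNA-seq). Note this holds every sequence in memory, so try it on a subsample first.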
Quote:
Is there a way to try to map the unmapped reads to rRNA?
|
You can use the sequence of the human rDNA repeat found here to map against.
Quote:
When trimming the HiSeq 4000 reads down to 75 bp -for 3 samples only- I got an increase in overall mapping rate of 1.1, 1.6 and 2.5%. Better but not 10% better.
|
Perhaps you should allow multi-mappers to map at all locations. See if that ups the percentage. An academic exercise