I'm working with TruSeq data for the mouse genome. One of the groups has a low-complexity library, and the other two have high-complexity libraries. The reads are 50 bp single-end.
What would be the best parameters to set in order to handle the duplicates in the low complexity samples, and help normalize the alignments for downstream analysis?
Should I pay any attention to the parameters involving intron sizes and gaps? How should I set the multimapping and mismatch parameters? Also, should I reduce the value of seedSearchStartLmax? If the reads are split into smaller pieces, I expect that to increase the number of multimapped reads, which I don't necessarily want. Is there any benefit to doing that, and what would be a good value?
Also, would splicing or isoform detection be worth pursuing, given how short the reads are and that one group has a low-complexity library?
I'm using Gencode's M8 build with the primary assembly and primary annotations.
Any suggestions and help would be really appreciated! Thank you!