We've identified what appears to be a severe bug in adapter trimming in the newest version of MSR (2.0.25). The sample sheet (generated by Illumina Experiment Manager 1.3, by default) specifies an adapter sequence to trim from the FASTQs.
The specified sequence is only 20 bp from one side of the TruSeq adapter ("AGATCGGAAGAGCACACGTC"), but the trimming is so promiscuous that reads from several regions of the genome (that I've found so far) are getting trimmed at genomic sequence that merely resembles the adapter. I haven't had time to characterize the effect further, but I've shut off adapter trimming for now, since it looks like it was developed and tested against a less complex genome than human *cough*phiX*cough*.
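To make "promiscuous" concrete, here is a minimal sketch of mismatch-tolerant adapter matching. It is NOT MSR's actual algorithm (which isn't documented here); `trim_position`, `hamming`, and the `max_mismatches` parameter are hypothetical, and the sketch only checks full-length 20 bp windows, ignoring partial adapter overlaps at the 3' end. The point is simply that once the mismatch allowance creeps up to around 6 of 20 positions, a genomic 20-mer can trigger a cut.

```python
ADAPTER = "AGATCGGAAGAGCACACGTC"  # 20 bp sequence quoted from the sample sheet


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_position(read: str, adapter: str = ADAPTER, max_mismatches: int = 0) -> int:
    """Return the index where the read would be cut, or len(read) if no hit.

    Hypothetical matcher: scans every full-length window of the read and
    cuts at the first window within max_mismatches of the adapter.
    Partial adapter overlaps at the 3' end are ignored for brevity.
    """
    k = len(adapter)
    for i in range(len(read) - k + 1):
        if hamming(read[i:i + k], adapter) <= max_mismatches:
            return i
    return len(read)


# A made-up "genomic" 20-mer that differs from the adapter at 6 positions,
# preceded by 10 N placeholders standing in for upstream read sequence.
near_adapter = "TGAACGGCAGATCACGCGTA"
read = "N" * 10 + near_adapter

print(hamming(ADAPTER, near_adapter))               # 6 mismatches
print(trim_position(read, max_mismatches=6))        # cut at position 10
print(trim_position(read, max_mismatches=3))        # no cut: full length 30
```

With a strict allowance the read survives intact; with a loose one it is cut right where the adapter-like genomic sequence starts, which is the pattern described above.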
We first noticed the problem in CFTR exon 10; see the phenotype in the IGV screenshot below, where the same run was put through MSReporter with and without adapter trimming. I'll let you guess which is which. Note that only the R1s are trimmed...R2s from the same cluster aren't (because only one side of the TruSeq adapter is specified).
Another example in another gene:
...zoomed in to see the sequence:
Looking at the directionality of trimming and manually aligning the genomic location to the MSR trimming sequence, it seems immediately evident what's going on...and that it is indeed promiscuous adapter trimming...that somehow trims even when there are 5-6 mismatches over 20 bp.
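A back-of-envelope check on why 5-6 mismatches over 20 bp is far too permissive for a human genome: the sketch below computes the probability that a random 20-mer lies within a given Hamming distance of a fixed adapter, then scales it by genome size. This assumes independent, equally frequent bases and a rough 3.1e9-position genome per strand; the real genome is neither uniform nor random, so treat the output as order-of-magnitude only. `p_within` and `GENOME_POSITIONS` are illustrative names, not anything from MSR.

```python
from math import comb


def p_within(k_max: int, length: int = 20) -> float:
    """Probability that a uniformly random DNA word of `length` bases lies
    within Hamming distance k_max of a fixed sequence.

    Each position matches with probability 1/4 and mismatches with 3/4,
    so the count of words at distance k is comb(length, k) * 3**k.
    """
    return sum(comb(length, k) * 3**k for k in range(k_max + 1)) / 4**length


GENOME_POSITIONS = 3.1e9  # assumption: rough haploid human genome size, one strand

for k_max in range(7):
    p = p_within(k_max)
    print(f"<= {k_max} mismatches: p = {p:.2e}, "
          f"~{p * GENOME_POSITIONS:,.0f} expected sites per strand")
```

Under this crude model an exact 20 bp match is essentially never expected by chance, while allowing up to 6 mismatches gives a per-position probability of roughly 3e-5, i.e. tens of thousands of candidate sites per strand, which is consistent with finding trimmed reads in several genes.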
Anyway, just thought I'd share...make sure to turn that off. And trust no one with trimming.
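If you want to script the "turn it off" step across many runs, a minimal sketch follows. It assumes the IEM-style sample sheet layout in which adapter sequences appear as `Adapter,<sequence>` (and possibly `AdapterRead2,<sequence>`) lines in the `[Settings]` section, and that MSR skips trimming when no adapter is listed. Both are assumptions to verify against Illumina's sample sheet documentation for your MSR version; `strip_adapter_settings` is a hypothetical helper.

```python
def strip_adapter_settings(sample_sheet: str) -> str:
    """Remove adapter lines from the [Settings] section of a sample sheet.

    Assumption: adapter sequences are given as "Adapter,<seq>" (and possibly
    "AdapterRead2,<seq>") inside [Settings]; all other sections are untouched.
    """
    out = []
    in_settings = False
    for line in sample_sheet.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Section header such as [Header], [Settings] or [Data];
            # Excel exports may append commas, e.g. "[Settings],,,,".
            in_settings = stripped.lower().startswith("[settings]")
        elif in_settings and stripped.split(",", 1)[0].lower().startswith("adapter"):
            continue  # drop this adapter setting
        out.append(line)
    return "\n".join(out) + "\n"


example = (
    "[Header]\nIEMFileVersion,4\n"
    "[Settings]\nAdapter,AGATCGGAAGAGCACACGTC\n"
    "[Data]\nSample_ID,Sample_Name\nS1,Adapter_test\n"
)
print(strip_adapter_settings(example))
```

Only lines inside `[Settings]` are touched, so a sample name that happens to contain "Adapter" in `[Data]` is left alone.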