Hi all,
I am new to next-gen sequencing bioinformatics. I have been working on MiSeq data (targeted amplicon sequencing) for the past few weeks, using tools on Galaxy.
Initially I found that the percentage of reads aligned by BWA (to the hg19 reference) was quite low, about 50%. Just by eyeballing the data I noticed that the majority of the unmapped reads were 'contaminated' by the adapter sequence CTGTCTCTTATACACATCT (the library was Nextera). Intriguingly, the adapter sequence did not occur only at the 3' ends; some reads had it in the middle.
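To put numbers on the eyeballing, something like the following could tally where the adapter sits in each read. This is only a minimal sketch: it assumes an uncompressed four-line-per-record FASTQ, uses exact string matching (so partial adapters at the very 3' end or adapters with sequencing errors would be missed), and the function names are my own, not from any tool.

```python
from collections import Counter

ADAPTER = "CTGTCTCTTATACACATCT"  # Nextera adapter sequence quoted in the post


def classify_read(seq, adapter=ADAPTER):
    """Return 'none', 'end', or 'middle' depending on where the adapter sits."""
    pos = seq.find(adapter)
    if pos == -1:
        return "none"
    # Adapter runs up to the last base of the read: a plain 3' occurrence.
    if pos + len(adapter) == len(seq):
        return "end"
    # Otherwise there is extra sequence after the adapter.
    return "middle"


def tally_adapter_positions(seqs, adapter=ADAPTER):
    """Count reads per category ('none', 'end', 'middle')."""
    return Counter(classify_read(s, adapter) for s in seqs)


def read_fastq_sequences(path):
    """Yield sequence lines from a 4-line-per-record FASTQ file (assumed layout)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()
```

For example, `tally_adapter_positions(read_fastq_sequences("unmapped.fastq"))` would give the counts for a file of unmapped reads (the file name here is just a placeholder).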
So I decided to remove the adapter using a tool called Clip on Galaxy (this improved the percentage of mapped reads a lot!), and compared the variant-calling (GATK) results from adapter-trimmed reads versus untrimmed reads. I found that variant calling was actually worse with adapter-trimmed reads: mapping quality in particular was generally lower (e.g. a lot of MQ0 reads), and some true variants were missed because read depth was too low. Why would this happen, and was I doing something wrong? I have read in other threads that adapter sequence removal is actually not necessary for reference-based alignment. Is this true even when the percentage of aligned reads is low?
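For reference, here is roughly what I understand the trimming step to do, written as a small sketch. It is not the actual Galaxy Clip implementation; the function names and the `min_length` cutoff of 30 are my own assumptions. It cuts each read at the first exact adapter match, drops everything 3' of it, and counts how many reads end up shorter than the cutoff. I have not confirmed this is related to the MQ0 reads, but very short reads can align equally well to several places, so the number of short leftovers seems worth checking.

```python
ADAPTER = "CTGTCTCTTATACACATCT"  # Nextera adapter sequence quoted in the post


def trim_at_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read (and its quality string) at the first exact adapter match."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual  # no adapter found, leave the read untouched
    return seq[:pos], qual[:pos]


def trim_reads(records, adapter=ADAPTER, min_length=30):
    """Trim (name, seq, qual) records; drop reads shorter than min_length.

    min_length is a hypothetical threshold, not a recommended value.
    Returns the kept records and the number of dropped reads.
    """
    kept, dropped = [], 0
    for name, seq, qual in records:
        trimmed_seq, trimmed_qual = trim_at_adapter(seq, qual, adapter)
        if len(trimmed_seq) < min_length:
            dropped += 1
            continue
        kept.append((name, trimmed_seq, trimmed_qual))
    return kept, dropped
```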
Any advice is greatly appreciated. Thanks!