Hi,
I am in a bit of a pickle after trimming some adapters from a heavily contaminated PE library.
I think the problem is caused by the trimming (CutAdapt) removing some reads entirely from the R1 and R2 FASTQs, meaning that there are now some unpaired reads and the two read files are out of sync...
The first problem this led to was bwa sampe inferring massive insert sizes, but I can turn off that behaviour with the -A parameter.
This then leads to some odd validation warnings when using Picard to convert SAM to BAM (ignored using VALIDATION_STRINGENCY=LENIENT).
The same lines then error at each Picard step, and the 'leniency-avoiding' approach finally falls down at GATK IndelRealigner (using -S SILENT), where I get the following message and cannot progress:
Error caching SAM record (null), which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.
Without trimming these reads, they go through the pipeline nicely.
By trimming them, for those samples that don't throw up validation errors, I get a much better alignment, utilising more reads than untrimmed.
Can anyone suggest how I can get around this?
I want to keep the trimmed reads, and hopefully also use the reads that have lost their mates, but I don't know how to go about that... does anyone have any ideas?
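For what it's worth, the kind of thing I imagine would be needed is a script that re-pairs the two FASTQs by read ID, writing matched pairs back in sync and orphans to separate single-end files. Here is a rough sketch (untested on real data; it assumes read IDs match between the files once any /1 and /2 suffixes are stripped, and that the FASTQs fit in memory; all file names are placeholders):

```python
# Rough sketch: re-pair R1/R2 FASTQs after trimming has dropped reads
# from each file independently. Pairs present in both files go to
# *.paired outputs (in R1 order, back in sync); orphans go to *.single.

def read_fastq(path):
    """Return {read_id: four-line record} for a FASTQ file.
    Read IDs are taken from the header up to the first whitespace,
    with the leading '@' and any /1 or /2 suffix stripped."""
    records = {}
    with open(path) as fh:
        while True:
            header = fh.readline()
            if not header:
                break
            record = header + fh.readline() + fh.readline() + fh.readline()
            read_id = header.split()[0].lstrip("@").split("/")[0]
            records[read_id] = record
    return records

def repair(r1_path, r2_path):
    """Write re-synced pairs and orphaned singles to new files."""
    r1, r2 = read_fastq(r1_path), read_fastq(r2_path)
    with open(r1_path + ".paired", "w") as p1, \
         open(r2_path + ".paired", "w") as p2, \
         open(r1_path + ".single", "w") as s1, \
         open(r2_path + ".single", "w") as s2:
        for rid, rec in r1.items():          # dicts keep insertion order
            if rid in r2:
                p1.write(rec)
                p2.write(r2[rid])            # mate written in same order
            else:
                s1.write(rec)                # R1 read whose mate was trimmed away
        for rid, rec in r2.items():
            if rid not in r1:
                s2.write(rec)                # R2 orphans
```

The paired outputs could then go through bwa sampe as normal, and the singles mapped separately with bwa samse, which would let me keep the reads that lost their mates.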
Thanks