I have downloaded some exome data that appears to have been aligned using Novoalign.
I determined this from the header:
@PG ID:novoalign VN:V2.06.09 CL:novoalign -c 16 -o SAM -o SoftClip -r Random -f 1.fastq 2.fastq -d /home/vg37/genomes/hg18_nohaps.fa.novocraft
When looking at the data in IGV, I see that many reads have a large fraction of their sequence soft-clipped, resulting in CIGAR strings such as "7S36M58S". The image below shows a region of the genome where this is particularly bad. I noticed that the soft-clipped sequences often match the genome sequence, so I'm wondering why these regions were soft-clipped in the first place. Is this a known bug?
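To put a number on how much of a read is clipped, here is a minimal Python sketch. It is not part of Novoalign or IGV, and the function name `soft_clip_fraction` is just something I made up for illustration. It assumes the standard SAM CIGAR operations, where M, I, S, = and X consume read bases.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_fraction(cigar):
    """Return the fraction of read bases that are soft-clipped (S).

    Hypothetical helper for illustration only.
    """
    ops = CIGAR_OP.findall(cigar)
    # Operations that consume bases of the read: M, I, S, =, X.
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    clipped = sum(int(n) for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0

# For the example above: 7 + 58 = 65 clipped bases out of 7 + 36 + 58 = 101.
print(soft_clip_fraction("7S36M58S"))
```

For "7S36M58S" that works out to 65 of 101 bases clipped, so well over half of the read is not contributing to the alignment.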
Ryan