Hi All,
I'm using BWA to align some paired end illunmina data. The reads vary in length, but around 45-50bp. I fully expect that a large portion of reads will not pair nicely.
I run BWA aln on each end, and then sampe to pair the reads.
In the resulting sam file, I get many examples of reads where each end looks perfectly well aligned, but the PNEXT and RNEXT columns for one read will not match the RNAME and POS of the pair. e.g.:
The flags suggest that each read is mapped, and the tags that they map uniquely. I was hoping to use the the RNEXT and PNEXT in some downstream analysis, any ideas why they are so messed up? I'm also worried that the pairing is off somewhere and that the reads I think are mates are not.
The alignment was produced using bwa-0.7.5a against hg19 with following commands
I've checked that the reads are in the same order in the fastq files:
and that no names are duplicated:
I've pretty much run out of ideas. Is this a bug in BWA or am I miss interpreting what should be in these fields?
I include a sample of reads for reproduction purposes.
Any suggestions appreciated
Ian
---
I'm using BWA to align some paired end illunmina data. The reads vary in length, but around 45-50bp. I fully expect that a large portion of reads will not pair nicely.
I run BWA aln on each end, and then sampe to pair the reads.
In the resulting sam file, I get many examples of reads where each end looks perfectly well aligned, but the PNEXT and RNEXT columns for one read will not match the RNAME and POS of the pair. e.g.:
Code:
HISEQ:62:H0L2YADXX:2:2202:8851:45247 129 chr1 7856175 37 44M = 8863975 1007800 AGACACCCCCGACCCATTTTAGACATTCCTTAGCCCAGACTTGG GDHCFEEFFDC?BBB<CCDEFCCCCCCCDCCCCCCC?A<@CCCC XT:A:U NM:i:1 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:9A34 HISEQ:62:H0L2YADXX:2:2202:8851:45247 113 chr1 7856291 37 46M chr3 18931504 0 GGATGGTAGGGATGGTTATTGCTACAGGGTTATCATTGCGTCTAGA ?CACDBCDDBCA>CCC@DDCCACADBBDDDDCCCBB?8BCFEBBFD XT:A:U NM:i:1 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:39A6
The alignment was produced using bwa-0.7.5a against hg19 with following commands
Code:
bwa aln -l 25 -k 2 -n 5 -t 4 /ifs/mirror/genomes/bwa-0.7.5a/hg19 <( gunzip < test.fq.1.gz ) > test.1.sai 2>test.log bwa aln -l 25 -k 2 -n 5 -t 4 /ifs/mirror/genomes/bwa-0.7.5a/hg19 <( gunzip < test.fq.2.gz ) > test.2.sai 2>>test.log bwa sampe -A /ifs/mirror/genomes/bwa-0.7.5a/hg19 test.1.sai test.2.sai <( gunzip < test.fq.1.gz ) <( gunzip < test.fq.2.gz ) test.sam 2>>test.log
Code:
zcat test.fq.1.gz | awk '{if(NR%4==1) print $1}' > test.1.names zcat test.fq.2.gz | awk '{if(NR%4==1) print $1}' > test.2.names paste test.1.names test.2.names | awk '{if ($1 != $2) print $0}' | wc -l 0
Code:
wc -l test.1.names 594 test.1.names wc -l test.2.names 594 test.2.names cat test.1.names | sort | uniq | wc -l 594 cat test.2.names | sort | uniq | wc -l 594
I include a sample of reads for reproduction purposes.
Any suggestions appreciated
Ian
---
Comment