Hello everyone.
I'm getting strange Bowtie output: reads that don't match the genome are still included in my SAM output file. I only noticed this when I ran another process downstream and the number of records going in matched the number of input reads for Bowtie, even though not all of the inputs aligned to the genome.
All of the 373023 reads that failed to align to the genome remain in the output file. They are not given a chromosome assignment, only a "*", and their position fields are reported as zero. See the SAM output below.
When I discovered this, I used gawk to filter out every record whose chromosome assignment is "*". The number of records left matched the number of alignments Bowtie reported.
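For anyone following along, a filter of the kind described above might look roughly like this. This is a minimal sketch, not the exact command I ran: the function name `filter_aligned` is made up for illustration, and it assumes the SAM file is tab-delimited with the reference name in column 3. Plain awk is enough here; nothing gawk-specific is needed.

```shell
# Hypothetical helper (name is illustrative): keep only SAM records whose
# reference name (column 3) is not "*". Any header lines (starting with "@")
# are passed through unchanged. Reads from the given files, or from stdin.
filter_aligned() {
    awk -F'\t' '/^@/ || $3 != "*"' "$@"
}
```

Usage would be something like `filter_aligned wt.collapse.sam > wt.collapse.aligned.sam`, where the output file name is just an example. If samtools is installed, `samtools view -F 4` should give an equivalent result by dropping records with the unmapped flag set.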
Code:
brandon@brandon:~/brachy/data/wt$ bowtie -t -p 4 --sam-nohead --sam-nosq -f -S -k 1 -v 0 ~/brachy/genome/brachygen wt.collapse.fq wt.collapse.sam
Time loading forward index: 00:00:01
Time for 0-mismatch search: 00:00:10
# reads processed: 2009754
# reads with at least one reported alignment: 1636731 (81.44%)
# reads that failed to align: 373023 (18.56%)
Reported 1636731 alignments to 1 output stream(s)
Time searching: 00:00:13
Overall time: 00:00:13
Code:
813501-1 0 Bd1 17008793 255 24M * 0 0 TGGAAAAGATTCTGGATCCTGTGC IIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:24 NM:i:0
813504-1 4 * 0 0 * * 0 0 GAACGGAATTACAGAACAAAATA IIIIIIIIIIIIIIIIIIIIIII XM:i:0
813506-1 4 * 0 0 * * 0 0 GATAACCGTAGTAATTCTAGAGCTGATACGTGC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:0
813505-1 16 Bd1 30522658 255 24M * 0 0 GCCACTGATTCCACCTGTAACCAA IIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:24 NM:i:0
813503-1 16 Bd4 34965205 255 25M * 0 0 CCCACTAAGTGTAGTTAATTTTAGG IIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:25 NM:i:0
813507-1 0 Bd2 11654350 255 24M * 0 0 ACCCTGCCGATGGGACTCAGTGGA IIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:24 NM:i:0
813510-1 4 * 0 0 * * 0 0 CGCTATCAGATGAGCCTAGGTCGGATTA IIIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:0
813508-1 0 Bd3 3399363 255 24M * 0 0 GCGGCTGATTCTGAATAATACCAA IIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:24 NM:i:0
813509-1 16 Bd5 24802257 255 21M * 0 0 ACGCCTTTGCTCAGGTGCCAT IIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:21 NM:i:0
813512-1 0 Bd4 36841227 255 24M * 0 0 ATGGTCGAAATATTACATGACGCA IIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:24 NM:i:0
Does anyone understand this problem? If so, is there a way to exclude these results from the output file other than using gawk or another utility to filter them out? I've used Bowtie quite a bit and I've never experienced a problem like this.
Thanks.