Hello,
I've used the following set of parameters to perform paired-end alignment:
bowtie2 -x bowtie_db --very-sensitive -p # -q --fr -I # -X # --phred33 --no-mixed --no-discordant --no-contain -1 reads_a.fq -2 reads_b.fq -S output.sam
When I run this on the test FASTQ files, Bowtie 2 reports the following summary once alignment is complete:
10000 reads; of these:
  10000 (100.00%) were paired; of these:
    525 (5.25%) aligned concordantly 0 times
    6774 (67.74%) aligned concordantly exactly 1 time
    2701 (27.01%) aligned concordantly >1 times
94.75% overall alignment rate
However, if I count only the pairs in which neither mate carries the 'XS:i' tag (the score of a second-best alignment, which I take as the marker of multi-mapping), I get a lower number of pairs:
samtools view -F 4 test.bam | grep -v 'XS:i' | awk '{print $1}' | uniq -d | wc -l
6765
So my question is: how do I extract the 6774 pairs that Bowtie 2 reports as aligning concordantly exactly 1 time? Which other tag or flag should I be filtering on?
What confuses me even more is that my count of multi-mapped pairs also differs: I get 2512 instead of 2701. Together my two counts cover only 9277 of the 9475 aligned pairs, which suggests that pairs combining one unique mate and one multi-mapped mate end up in one category or the other.
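To check where the mixed pairs go, one option is to tabulate pairs by how many of their mates carry 'XS:i'. Below is a minimal Python sketch of that idea, not a reproduction of how Bowtie 2 computes its summary. It assumes uncompressed SAM text such as the output of `samtools view`, and it assumes 'XS:i' appears only on a mate for which more than one alignment was found. The names `tally_pairs_by_xs` and `sam_record` are my own, and the three records are made-up test data.

```python
from collections import Counter, defaultdict


def tally_pairs_by_xs(sam_lines):
    """Count read pairs by how many of their two mates carry an XS:i tag.

    Returns a Counter keyed by 0, 1 or 2 (number of mates with XS:i).
    Only read names with exactly two aligned primary records are counted.
    """
    xs_by_name = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        # Optional tags start at the 12th column of a SAM record.
        has_xs = any(tag.startswith("XS:i:") for tag in fields[11:])
        xs_by_name[fields[0]].append(has_xs)

    tally = Counter()
    for mates in xs_by_name.values():
        if len(mates) == 2:
            tally[sum(mates)] += 1
    return tally


def sam_record(name, flag, *tags):
    """Build a made-up SAM line for the demonstration below (hypothetical data)."""
    return "\t".join(
        [name, str(flag), "chr1", "100", "42", "4M", "=", "300", "204",
         "ACGT", "IIII", *tags]
    )


example = [
    "@HD\tVN:1.0",
    sam_record("pair_a", 99, "AS:i:0"),               # neither mate has XS:i
    sam_record("pair_a", 147, "AS:i:0"),
    sam_record("pair_b", 99, "AS:i:0", "XS:i:-5"),    # only one mate has XS:i
    sam_record("pair_b", 147, "AS:i:0"),
    sam_record("pair_c", 99, "AS:i:0", "XS:i:0"),     # both mates have XS:i
    sam_record("pair_c", 147, "AS:i:0", "XS:i:0"),
]
print(dict(sorted(tally_pairs_by_xs(example).items())))
```

On real data the same function could be fed `sys.stdin` from `samtools view test.bam`. Unlike my `uniq -d` pipeline it does not need the two mates to sit on adjacent lines, and the count under key 1 would show how many pairs mix a unique and a multi-mapped mate.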
Thanks.