Hi everyone,
I am trying to extract only concordantly aligned read pairs from my BAM file, using a command I found in posts on a similar subject:
https://www.biostars.org/p/95929/
https://www.biostars.org/p/119316/
https://broadinstitute.github.io/pic...ain-flags.html
However, the command I use:
samtools view -b -f 0x2 accepted_hits.bam -o accepted_hits_conc.bam
produces an accepted_hits_conc.bam file that is much smaller than expected.
Specifically, the original accepted_hits.bam file is 1.4 GB, while accepted_hits_conc.bam is only 141 MB.
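To make it concrete what -f 0x2 selects, here is a minimal Python sketch that decodes SAM FLAG values using the bit definitions from the SAM format specification. The helper names (decode_flag, passes_require_filter) are hypothetical. The filter only mimics the documented rule for -f (keep a record only when all the given bits are set); it does not read a BAM file. Note that the SAM spec leaves the meaning of 0x2 ("each segment properly aligned") up to the aligner, so whether TopHat2 sets it on every pair it counts as concordant is an assumption, not something this sketch shows.

```python
# Bit values of the SAM FLAG field, as listed in the SAM format specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def passes_require_filter(flag: int, required: int) -> bool:
    """Mimic 'samtools view -f REQUIRED': keep a record only if every required bit is set."""
    return (flag & required) == required


# A few typical paired-end flags and whether -f 0x2 would keep them.
for flag in (99, 147, 97, 353):
    print(flag, decode_flag(flag), passes_require_filter(flag, 0x2))
```

Flags 99 and 147 are the usual mate pair of a properly paired alignment, so both pass; 97 is paired but lacks the 0x2 bit, so it is dropped.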
Counting the records with samtools also gives:
samtools view -c accepted_hits.bam
34745648
samtools view -c accepted_hits_conc.bam
3188162
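A quick check of these counts, as a small Python sketch using only the numbers quoted in this post: it shows what fraction of BAM records carry the 0x2 bit, and that the total record count happens to equal the sum of the mapped left and right reads reported by TopHat2. Keep in mind that samtools view -c counts alignment records, not reads or pairs.

```python
# Numbers copied from the post.
total_records = 34745648        # samtools view -c accepted_hits.bam
proper_pair_records = 3188162   # samtools view -c accepted_hits_conc.bam
left_mapped = 17461735          # align_summary.txt, left reads "Mapped"
right_mapped = 17283913         # align_summary.txt, right reads "Mapped"

# Fraction of BAM records that carry the 0x2 (proper pair) bit.
kept_fraction = proper_pair_records / total_records
print(f"records kept by -f 0x2: {kept_fraction:.1%}")

# Does the total record count equal mapped left + mapped right reads?
print("total == left + right mapped:", total_records == left_mapped + right_mapped)
```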
What troubles me is that the align_summary.txt produced by TopHat2 reports:
Left reads:
Input : 18363154
Mapped : 17461735 (95.1% of input)
of these: 1424801 ( 8.2%) have multiple alignments (1497233 have >1)
Right reads:
Input : 18363154
Mapped : 17283913 (94.1% of input)
of these: 1424801 ( 8.2%) have multiple alignments (1485742 have >1)
94.6% overall read mapping rate.
Aligned pairs: 16585377
of these: 1424801 ( 8.6%) have multiple alignments
3788640 (22.8%) are discordant alignments
69.7% concordant pair alignment rate.
If there are 16585377 aligned pairs out of 18363154 input read pairs, with a 69.7% concordant pair alignment rate, why is the samtools output so small?
I also saw in other posts that this approach is preferable to using the --no-discordant option of TopHat2. Why is that?
Shouldn't I get exactly the concordant pairs as output if I specify the --no-discordant and --no-mixed options in TopHat2?
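The expectation in the question can be made explicit with a short Python sketch. It assumes that TopHat2's concordant pairs are the aligned pairs minus the discordant ones (this reproduces the reported 69.7% rate), and that both mates of each concordant pair would be written with the 0x2 bit set. Under that assumption, each concordant pair contributes at least two records to the filtered BAM, more if multiple alignments are reported.

```python
# Values copied from align_summary.txt in the post.
input_pairs = 18363154
aligned_pairs = 16585377
discordant_pairs = 3788640
observed_proper_pair_records = 3188162  # samtools view -c accepted_hits_conc.bam

# Assumption: concordant pairs = aligned pairs that are not discordant.
concordant_pairs = aligned_pairs - discordant_pairs
print("concordant pairs:", concordant_pairs)
print(f"concordant rate: {concordant_pairs / input_pairs:.1%}")

# Assumption: both mates of a concordant pair carry the 0x2 bit,
# so each pair adds at least two records (one per mate).
min_expected_records = 2 * concordant_pairs
print("minimum expected records with 0x2:", min_expected_records)
print("observed records with 0x2:", observed_proper_pair_records)
```

The gap between the minimum expected count and the observed one is what the question is about; the sketch only quantifies it and does not explain it.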