01-26-2016, 03:02 PM
kostask
Junior Member
 
Location: Greece

Join Date: Sep 2015
Posts: 8
extract_only_concordant_paired_reads_from_bam_file

Hi everyone,

I am trying to extract only concordantly aligned read pairs from my BAM file, using a command I found in posts on a similar subject:

https://www.biostars.org/p/95929/

https://www.biostars.org/p/119316/

https://broadinstitute.github.io/pic...ain-flags.html

However, the command I use:

samtools view -b -f 0x2 accepted_hits.bam -o accepted_hits_conc.bam

produces an accepted_hits_conc.bam file that is much smaller than expected.

Specifically, the original accepted_hits.bam is 1.4 GB, while accepted_hits_conc.bam is only 141 MB.
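For reference, `-f 0x2` keeps only records whose FLAG field has bit 0x2 set, which the SAM specification describes as "each segment properly aligned according to the aligner". samtools does not judge concordance itself; it relies on the bit that TopHat2 wrote. The minimal Python sketch below decodes a FLAG value and mimics the `-f` test. The bit labels are informal names for the values listed in the SAM specification, and the example FLAG values are illustrative, not taken from this BAM file.

```python
from typing import List

# FLAG bits as listed in the SAM specification; the names are informal labels.
SAM_FLAG_BITS = {
    0x1: "PAIRED",          # template has multiple segments
    0x2: "PROPER_PAIR",     # each segment properly aligned according to the aligner
    0x4: "UNMAPPED",        # this segment is unmapped
    0x8: "MATE_UNMAPPED",   # next segment in the template is unmapped
    0x10: "REVERSE",        # this segment is reverse complemented
    0x20: "MATE_REVERSE",   # next segment is reverse complemented
    0x40: "READ1",          # first segment in the template
    0x80: "READ2",          # last segment in the template
    0x100: "SECONDARY",     # secondary alignment
    0x200: "QC_FAIL",       # not passing quality controls
    0x400: "DUPLICATE",     # PCR or optical duplicate
    0x800: "SUPPLEMENTARY", # supplementary alignment
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def passes_required(flag: int, required: int) -> bool:
    """Mimic `samtools view -f REQUIRED`: keep the record only if
    every bit in REQUIRED is also set in FLAG."""
    return (flag & required) == required


if __name__ == "__main__":
    # Illustrative FLAG values, not taken from accepted_hits.bam.
    for flag in (99, 147, 97, 355):
        print(flag, decode_flag(flag), passes_required(flag, 0x2))
```

For example, 97 (paired, first in pair, mate reversed, but no 0x2 bit) would be dropped by `-f 0x2`, while 355 (a secondary alignment that does carry 0x2) would be kept; adding `-F 0x100` to the samtools command would also exclude such secondary records.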

Counting the records with samtools gives:

samtools view -c accepted_hits.bam
34745648

samtools view -c accepted_hits_conc.bam
3188162
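Note that `samtools view -c` counts alignment records, not reads or pairs: unless `-F 0x100` is given, secondary alignments of multi-mapped reads are counted too. A rough way to see how the records split up is to tally FLAG bits from SAM text. The sketch below is a minimal illustration that assumes SAM lines as input (for instance the output of `samtools view accepted_hits.bam`); the category names are hypothetical labels, and `samtools flagstat` gives a similar built-in summary.

```python
from collections import Counter
from typing import Iterable


def tally_records(sam_lines: Iterable[str]) -> Counter:
    """Count SAM alignment records by a few FLAG categories.

    Expects SAM text lines, e.g. the output of `samtools view accepted_hits.bam`.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        counts["records"] += 1
        secondary = bool(flag & 0x100)
        counts["secondary" if secondary else "not_secondary"] += 1
        if flag & 0x2:
            counts["proper_pair"] += 1  # what `samtools view -c -f 0x2` would count
            if not secondary:
                counts["proper_pair_not_secondary"] += 1  # like `-f 0x2 -F 0x100`
    return counts
```

Calling `tally_records(sys.stdin)` in a small script fed by `samtools view accepted_hits.bam` would show whether the gap comes from how many records carry the 0x2 bit at all, or from secondary records inflating the total.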

What troubles me is that the align_summary.txt written by TopHat2 reports:

Left reads:
Input : 18363154
Mapped : 17461735 (95.1% of input)
of these: 1424801 ( 8.2%) have multiple alignments (1497233 have >1)
Right reads:
Input : 18363154
Mapped : 17283913 (94.1% of input)
of these: 1424801 ( 8.2%) have multiple alignments (1485742 have >1)
94.6% overall read mapping rate.

Aligned pairs: 16585377
of these: 1424801 ( 8.6%) have multiple alignments
3788640 (22.8%) are discordant alignments
69.7% concordant pair alignment rate.
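The summary's percentages are consistent with each other if concordant pairs are taken to be the aligned pairs minus the discordant ones, divided by the input pairs. That is a reading of how TopHat2 derives the 69.7% figure (an assumption, not something stated in the TopHat2 documentation); the arithmetic can be checked as follows:

```python
# Numbers copied from the align_summary.txt shown above.
input_pairs = 18363154    # "Input" (same for left and right reads)
aligned_pairs = 16585377  # "Aligned pairs"
multi_pairs = 1424801     # "have multiple alignments"
discordant = 3788640      # "are discordant alignments"

# Assumption: concordant pairs = aligned pairs minus discordant pairs.
concordant = aligned_pairs - discordant

print(concordant)                                     # 12796737
print(round(100 * multi_pairs / aligned_pairs, 1))    # 8.6
print(round(100 * discordant / aligned_pairs, 1))     # 22.8
print(round(100 * concordant / input_pairs, 1))       # 69.7
```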

If there are 16585377 aligned pairs from an input of 18363154 read pairs, and a 69.7% concordant pair alignment rate, why is the samtools output so small?
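Putting the two sets of numbers side by side makes the gap concrete. Assuming every concordant pair contributes at least two records (one per mate) to accepted_hits.bam, and ignoring any extra records for multi-mapped pairs, the filtered file would be expected to hold at least about 25.6 million records. It holds about 3.2 million, roughly 12.5% of that lower bound and about 9.2% of all records, which is roughly in line with the 141 MB vs 1.4 GB size ratio:

```python
concordant_pairs = 16585377 - 3788640        # aligned pairs minus discordant (assumed)
expected_min_records = 2 * concordant_pairs  # one primary record per mate
observed_records = 3188162                   # samtools view -c accepted_hits_conc.bam
total_records = 34745648                     # samtools view -c accepted_hits.bam

print(expected_min_records)                                     # 25593474
print(round(100 * observed_records / expected_min_records, 1))  # 12.5
print(round(100 * observed_records / total_records, 1))         # 9.2
```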

I also saw in other posts that this approach is preferable to TopHat2's --no-discordant option. Why is that?

Shouldn't I get exactly the concordant read pairs as output if I specify the --no-discordant and --no-mixed options in TopHat2?