SEQanswers (
-   Bioinformatics (
-   -   extract_only_concordant_paired_reads_from_bam_file (

kostask 01-26-2016 04:02 PM

Hi everyone,

I am trying to extract only concordant paired reads from my bam file using a command that I found in posts with similar subject:

However the command that I use:

samtools view -b -f 0x2 accepted_hits.bam -o accepted_hits_conc.bam

results to the production of the accepted_hits_conc.bam file, which is much smaller than expected.

Specifically the original accepted_hits.bam file has size 1.4 Gb while the accepted_hits_conc.bam has size only 141 Mb.

Also by running samtools I get:

samtools view -c accepted_hits.bam

samtools view -c accepted_hits_conc.bam

Now what troubles me is that the align_summary.txt returned by tophat2 reports:

Left reads:
Input : 18363154
Mapped : 17461735 (95.1% of input)
of these: 1424801 ( 8.2%) have multiple alignments (1497233 have >1)
Right reads:
Input : 18363154
Mapped : 17283913 (94.1% of input)
of these: 1424801 ( 8.2%) have multiple alignments (1485742 have >1)
94.6% overall read mapping rate.

Aligned pairs: 16585377
of these: 1424801 ( 8.6%) have multiple alignments
3788640 (22.8%) are discordant alignments
69.7% concordant pair alignment rate.

If there are 16585377 aligned pairs from an input of 18363154 paired end reads and a 69.7% concordant pair alignment rate, why do I get so small output from samtools?

Also I saw in other posts that using this approach is preferable to using the --no-discordant option of tophat2. But why is that?

Shouldn't I get exactly the concordant paired reads as output, if I specify the --no-discordant and --no-mixed options in tophat2?

westerman 01-27-2016 08:22 AM

Answering your first question about samtools. One thing you could try is to see exactly what flags you have. Try

samtools view accepted_hits.bam | cut -f 2 | sort | uniq -c
That will, at least, tell you what you have in the file.

kostask 01-31-2016 10:55 AM

Thank you westerman

I saw exactly the flags and the corresponding reads for each one.

Wouid it be correct if I wanted to extract reads with certain flags (specifically 67, 131, 115 and 179) to do so with:

samtools view -b -f number_of_flag accepted_hits.bam -o flag_accepted_hits.bam

and then merge them together with:

samtools merge output_accepted_hits.bam flag1_accepted_hits.bam ...

It may be a silly question but with the samtools merge, the reads merge or they concatenate into a single bam file?

All times are GMT -8. The time now is 06:38 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.