  • extract_only_concordant_paired_reads_from_bam_file

    Hi everyone,

    I am trying to extract only concordant paired reads from my bam file using a command that I found in posts with similar subject:

    https://www.biostars.org/p/95929/

    https://www.biostars.org/p/119316/

    https://broadinstitute.github.io/pic...ain-flags.html

    However the command that I use:

    samtools view -b -f 0x2 accepted_hits.bam -o accepted_hits_conc.bam

    produces an accepted_hits_conc.bam file that is much smaller than expected.

    Specifically, the original accepted_hits.bam file is 1.4 GB, while accepted_hits_conc.bam is only 141 MB.

    Counting the records with samtools gives:

    samtools view -c accepted_hits.bam
    34745648

    samtools view -c accepted_hits_conc.bam
    3188162

    Now what troubles me is that the align_summary.txt returned by tophat2 reports:

    Left reads:
    Input : 18363154
    Mapped : 17461735 (95.1% of input)
    of these: 1424801 ( 8.2%) have multiple alignments (1497233 have >1)
    Right reads:
    Input : 18363154
    Mapped : 17283913 (94.1% of input)
    of these: 1424801 ( 8.2%) have multiple alignments (1485742 have >1)
    94.6% overall read mapping rate.

    Aligned pairs: 16585377
    of these: 1424801 ( 8.6%) have multiple alignments
    3788640 (22.8%) are discordant alignments
    69.7% concordant pair alignment rate.

    If there are 16585377 aligned pairs from an input of 18363154 read pairs and a 69.7% concordant pair alignment rate, why do I get such a small output from samtools?

    Also I saw in other posts that using this approach is preferable to using the --no-discordant option of tophat2. But why is that?

    Shouldn't I get exactly the concordant paired reads as output, if I specify the --no-discordant and --no-mixed options in tophat2?
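The size of the gap can be checked with a little arithmetic. The sketch below assumes the concordant rate is (aligned pairs minus discordant pairs) divided by input pairs; that assumption reproduces the 69.7% in align_summary.txt, but it is not taken from the TopHat documentation. The function name `concordant_summary` is made up for illustration.

```shell
# Hypothetical helper: derive concordant-pair figures from align_summary.txt numbers.
# $1 = input pairs, $2 = aligned pairs, $3 = discordant pairs
concordant_summary() {
    awk -v input="$1" -v aligned="$2" -v discordant="$3" 'BEGIN {
        concordant = aligned - discordant          # assumed definition
        printf "concordant_pairs=%d rate=%.1f%% expected_reads=%d\n",
               concordant, 100 * concordant / input, 2 * concordant
    }'
}

# Numbers copied from the align_summary.txt quoted above.
concordant_summary 18363154 16585377 3788640
# prints: concordant_pairs=12796737 rate=69.7% expected_reads=25593474
```

Under that assumption, roughly 25.6 million records (two per concordant pair, more if multi-mapped pairs are reported several times) would be expected to carry the proper-pair bit, against the 3188162 that `samtools view -c` reports for the filtered file.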

  • #2
    To answer your first question about samtools: one thing you could try is to see exactly which flags you have. Try
    Code:
    samtools view accepted_hits.bam | cut -f 2 | sort | uniq -c
    That will, at least, tell you what you have in the file.
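As a possible follow-up, the per-flag counts can be summed to see how many records carry the proper-pair bit (0x2), which is the bit that `-f 0x2` selects on. This is only a sketch: `count_proper_pairs` is a made-up helper name, and it assumes its input is in the "count flag" layout that `uniq -c` prints.

```shell
# Hypothetical helper: read "count flag" lines (uniq -c output) on stdin and
# report how many records have the proper-pair bit (0x2) set.
count_proper_pairs() {
    awk '{
        total += $1
        # int(flag / 2) % 2 isolates bit 0x2 without needing gawk-only and()
        if (int($2 / 2) % 2 == 1) proper += $1
    }
    END { printf "proper_pair=%d total=%d\n", proper, total }'
}

# Intended use (needs samtools and the BAM file from the thread):
# samtools view accepted_hits.bam | cut -f 2 | sort | uniq -c | count_proper_pairs
```

If the `proper_pair` figure is close to the count in accepted_hits_conc.bam, the filter is doing what it was asked and the question shifts to which flags the aligner actually wrote.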



    • #3
      Thank you westerman

      I can now see exactly which flags are present and how many reads carry each one.

      If I wanted to extract reads with certain flags (specifically 67, 131, 115 and 179), would it be correct to do so with:

      samtools view -b -f number_of_flag accepted_hits.bam -o flag_accepted_hits.bam

      and then merge them together with:

      samtools merge output_accepted_hits.bam flag1_accepted_hits.bam ...

      It may be a silly question, but does samtools merge actually merge the reads, or does it just concatenate them into a single bam file?
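One point worth checking before running the per-flag extraction: `samtools view -f INT` keeps every record in which all bits of INT are set, not only records whose flag equals INT. The sketch below applies that rule to the four flags mentioned above; `passes_f` is a made-up helper name, and the mask 2 is included for comparison with `-f 0x2`.

```shell
# Hypothetical helper: succeed if every bit of mask $2 is set in flag $1,
# which is the rule `samtools view -f` applies.
passes_f() {
    [ $(( $1 & $2 )) -eq "$2" ]
}

for flag in 67 131 115 179; do
    for mask in 67 131 115 179 2; do
        if passes_f "$flag" "$mask"; then
            echo "flag $flag passes -f $mask"
        fi
    done
done
```

The loop shows that flag 115 also passes `-f 67` and flag 179 also passes `-f 131`, so four separate extractions would put some reads in more than one file. `samtools merge` keeps every record from every input, so those reads would appear twice in the merged file. All four flags pass the mask 2, so a single `-f 0x2` filter already covers them.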
