Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to extract multi-mapped reads by samtools?

    Hi guys,
    I’m working with some RNA seq data and get some problems.

    I mapped the paired-end reads on tophat and got the result in sam/bam format. Now I’m using samtools to analyze the result.

    I want to extract the reads that map to more than one place in the genome, and this is my command line:
    Samtools view –h –f 0x100 in.bam > out.sam
    There are no output alignmens in the out.sam except the head, which means there are no multi-mapped reads

    However, I’ve run my own program in perl and find that there’re lots of reads whose IDs appear more than twice in the sam file, which means .these read mapped more than one place in the genome. (Two paired-end reads have the same ID, so most of the time, a certain ID appearing twice means it’s a unique mapping)

    I want to know whether my command line is right? I mean whether 0x100 is the right parameter to extract multi-mapped reads.

  • #2
    Originally posted by mavishou View Post
    Hi guys,

    I want to know whether my command line is right? I mean whether 0x100 is the right parameter to extract multi-mapped reads.
    It could be. Depends on the program that created the BAM file. In other words perhaps the program thought that *all* matches were primary and thus never set the 0x100 ("the alignment is not primary") flag.

    I suggest that, since you know the names of some of the reads that have multiple mappings, you look through the BAM file for those reads and see what flags are being set for those reads.

    Comment


    • #3
      I have the same issue to work on. I know that there are multi-mapped reads in my alignment

      grep -i XA:Z:chr15 S1_unsorted.sam | wc -l
      118242
      but samtools view -f100 -c gives me 0

      Checking the lines with XA:Tag

      118242

      grep -i XA:Z:chr15 S1_unsorted.sam | head -n 10
      61WG6AAXX_1:1:1205:11675 0 chr15 28901805 23 76M * 0 0 GCCTCCTGAGTGGCTGGGATTACGGTGTGTGCCACCACGCCTGGCTACTTTTGTCTTTTTAGTAGAGCTGG
      GGGTT eeeceadd^deeceedd^T\\cc`ccddLTc\cTccTc\ceeec\baYYYbb^KKacYc[R^TTaL]L]Y```\Lc XT:A:U NM:i:3 X0:i:1 X1:i:1 XM:i:3 XO:i:0 XG:i:0 MD:Z:XA:Z:chr15,-20643137,76M,4
      ;
      shows that actually the FLAG is not set. Question is, why not??!

      Comment


      • #4
        You need give the right FLAG number 0x100 or 256, Both should work.

        Originally posted by Marie_Noir View Post
        I have the same issue to work on. I know that there are multi-mapped reads in my alignment



        but samtools view -f100 -c gives me 0

        Checking the lines with XA:Tag

        118242



        shows that actually the FLAG is not set. Question is, why not??!

        Comment


        • #5
          Hello,

          How would I get all the other locations in the genome where a particular read maps? If I use the 256 flag, it seems to only give a secondary alignment. I would like to rank the reads such that I can find the top 10 reads (top 10 that map to the most places in the genome).

          Thanks

          Comment


          • #6
            Originally posted by ashmc View Post
            Hello,

            How would I get all the other locations in the genome where a particular read maps? If I use the 256 flag, it seems to only give a secondary alignment. I would like to rank the reads such that I can find the top 10 reads (top 10 that map to the most places in the genome).

            Thanks
            If you only need to check a single read or a few, you could use BLAST. Otherwise, you could try bwa-mem with -a option to get the secondary alignments. Also you could try GEM-mapper which in default outputs all secondary alignments given a maximum threshold of mismatch (e.g 4% of the read length).

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Strategies for Sequencing Challenging Samples
              by seqadmin


              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
              03-22-2024, 06:39 AM
            • seqadmin
              Techniques and Challenges in Conservation Genomics
              by seqadmin



              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

              Avian Conservation
              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
              03-08-2024, 10:41 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, Yesterday, 06:37 PM
            0 responses
            11 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, Yesterday, 06:07 PM
            0 responses
            10 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-22-2024, 10:03 AM
            0 responses
            51 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-21-2024, 07:32 AM
            0 responses
            68 views
            0 likes
            Last Post seqadmin  
            Working...
            X