Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting unique reads from a .ma or .bam file?

    Hi all,

    I wanted to know the most efficient way to extract uniquely mapped reads from BioScope's resequencing mapping pipeline output in either the .ma or .bam format (those are the files I have to work with)? By unique, I mean reads that mapped to one location alone. Thanks!

    J

  • #2
    .ma way:

    They keep all the coordinates for all the mapping positions available. You can use that
    to drop reads that map to multiple locations.
    Actually, they should have another file (.ma.unique.*) that does that for you already.

    .bam way:

    Look to see if they have any tag (NH for example) to indicate how many alignments where done for that read.

    You can post some records here we can take a look.
    -drd

    Comment


    • #3
      Originally posted by drio View Post
      .ma way:

      They keep all the coordinates for all the mapping positions available. You can use that
      to drop reads that map to multiple locations.
      Actually, they should have another file (.ma.unique.*) that does that for you already.

      .bam way:

      Look to see if they have any tag (NH for example) to indicate how many alignments where done for that read.

      You can post some records here we can take a look.

      I was looking for the .unique.ma, but I didn't find it or any options with BioScope after mapping my resequencing sample. I understand that Corona produced that automatically. From what I understand, unique reads in bioscope are a matter of score > 5.



      >2_29_53_F3,9_-110926286.224.2.15):q7
      T33333203331202321313113130023213300030321303003300
      >2_29_134_F3
      T23331313312033202331333300333110330110312303003300
      >2_29_155_F3
      T11322113002233330002013020033212300020300303003302
      >2_29_160_F3
      T11331323300033211333223330333013320210301303003300
      >2_29_200_F3,17_-90327533.241.5.0):q21,8_-43471518.229.3.0):q0
      T01030013331133333333303333033303333330333303303300
      >2_29_217_F3
      T13330023311233323333300302133123220020302303002300



      ******

      Is there a way to pick out a uniquely mapped read from a .bam file with SAM/Bam tools?

      Comment


      • #4
        Originally posted by JohnK View Post
        Is there a way to pick out a uniquely mapped read from a .bam file with SAM/Bam tools?
        No within the mandatory fields. That's why I was asking if you can post here
        some SAM entries from Bioscope so we can take a look.
        -drd

        Comment


        • #5
          http://sourceforge.net/apps/mediawik...from_SAM.2FBAM.

          Comment


          • #6
            He wants to get the unique mapped reads.
            But yes I agree with this (from the FAQ):

            We prefer to say an alignment is `reliable' rather than `unique' as `uniqueness' is not well defined in general cases. You can get reliable alignments by setting a threshold on mapping quality
            -drd

            Comment


            • #7
              Originally posted by drio View Post
              He wants to get the unique mapped reads.
              But yes I agree with this (from the FAQ):
              I had some trouble converting between bam back to sam. It appears to be incorrect. Here's my usage:

              samtools view in.bam > name.sam

              Probably not right usage here. Also, I'm re-formatting the matobam.ini file so it calcs 'unique' (or more reliable) reads based on the QV > 5. If anybody knows if this threshold is optimal, your help would be greatly appreciated? I'll then use bed tools to convert to bed and get the coverage off the unique. Hopefully, this will work and I can get a rough estimate of concordance between the unique reads from Corona versus the unique reads from BioScope.

              Comment


              • #8
                Your samtools cmd is correct. What error/problem are you getting?
                -drd

                Comment


                • #9
                  Originally posted by drio View Post
                  Your samtools cmd is correct. What error/problem are you getting?
                  17_1177_738 16 chr1 2999999 7 25H25M * 0 0 NNGAATTCTTTTCTATGATTTAGTT """IIIIIIIIIIIIIIIIIIIII! RG:Z:20100521202751742 NH:i:1 CS:Z:T30123003213322000220302023010232222031211331023323 CQ:Z:BB?@;A@??B@=4>B<6<<A<?=?A=96=?>289>;7/=?04<0B3+(=> CM:i:2
                  774_1259_2007 16 chr1 2999999 7 25H25M * 0 0 NNGAATTCTTTTCTATGATTTAGTT """IIIIIIIIIIIIIIIIIIIII! RG:Z:20100521202751742 NH:i:1 CS:Z:T30123003213322000220302023010232222031211331023123 CQ:Z:B:AABA7BBBA?@?B=;A>B;<B:A?7=.@?2<;A:=<<?;<?5>:15?@ CM:i:2

                  Comment


                  • #10
                    So what is the problem? What were you expecting as a output?
                    -drd

                    Comment


                    • #11
                      Originally posted by drio View Post
                      So what is the problem? What were you expecting as a output?
                      Nah- I realized it's correct now. It's just my first time really messing with and having to look at the SAM format meticulously. So this is what I'm expecting as output, but I just want to grab the unique reads. I'll keep reading over the manuals and what not and see if I can figure this out.

                      Comment


                      • #12
                        Originally posted by drio View Post
                        So what is the problem? What were you expecting as a output?
                        Just a follow up, if anyone cares to know. You can extract unique (aka more 'reliable' reads) via the matobam.output.parameter='unique' setting. Then with BedTools convert from bam to bed and so forth... I appreciate the help, all!

                        Comment


                        • #13
                          My bam file includes 2x35bp mate-pair and 2x101 pair-end, any efficient way to split it into MP.bam and PE.bam files?

                          Comment


                          • #14
                            Originally posted by bair View Post
                            My bam file includes 2x35bp mate-pair and 2x101 pair-end, any efficient way to split it into MP.bam and PE.bam files?
                            If you are confortable with C or java, use samtools or picard to iterate over your bam and dump the data to PE or MP bam based on the read length.
                            That's the most efficient way.

                            There are other ports of samtools for lisp and python (maybe more) if you
                            want a more higher level language.

                            Another option would be to stream the bam data to sam with samtools and pipe it to your split script. After that you can convert back to BAM. It would be slower but it'd work just fine.

                            Something like:

                            Code:
                            #!/bin/bash
                            #
                            samtools view ./i.bam  | ruby -a -ne 'puts $_ if $F[9].size == 35' > 35.sam
                            samtools view ./i.bam  | ruby -a -ne 'puts $_ if $F[9].size == 120' > 120.sam 
                            samtools view -bS ./35.sam > 35.bam
                            samtools view -bS ./120.sam > 120.bam
                            -drd

                            Comment


                            • #15
                              Thank you Drio,

                              Your suggestions and script code are very helpful, like to use the simple code.

                              Thanks again.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-27-2024, 06:37 PM
                              0 responses
                              13 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-27-2024, 06:07 PM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              53 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              69 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X