Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • JohnK
    Senior Member
    • Feb 2010
    • 106

    Extracting unique reads from a .ma or .bam file?

    Hi all,

    I wanted to know the most efficient way to extract uniquely mapped reads from BioScope's resequencing mapping pipeline output in either the .ma or .bam format (those are the files I have to work with)? By unique, I mean reads that mapped to one location alone. Thanks!

    J
  • drio
    Senior Member
    • Oct 2008
    • 323

    #2
    .ma way:

    They keep all the coordinates for all the mapping positions available. You can use that
    to drop reads that map to multiple locations.
    Actually, they should have another file (.ma.unique.*) that does that for you already.

    .bam way:

    Look to see if they have any tag (NH for example) to indicate how many alignments where done for that read.

    You can post some records here we can take a look.
    -drd

    Comment

    • JohnK
      Senior Member
      • Feb 2010
      • 106

      #3
      Originally posted by drio View Post
      .ma way:

      They keep all the coordinates for all the mapping positions available. You can use that
      to drop reads that map to multiple locations.
      Actually, they should have another file (.ma.unique.*) that does that for you already.

      .bam way:

      Look to see if they have any tag (NH for example) to indicate how many alignments where done for that read.

      You can post some records here we can take a look.

      I was looking for the .unique.ma, but I didn't find it or any options with BioScope after mapping my resequencing sample. I understand that Corona produced that automatically. From what I understand, unique reads in bioscope are a matter of score > 5.



      >2_29_53_F3,9_-110926286.224.2.15):q7
      T33333203331202321313113130023213300030321303003300
      >2_29_134_F3
      T23331313312033202331333300333110330110312303003300
      >2_29_155_F3
      T11322113002233330002013020033212300020300303003302
      >2_29_160_F3
      T11331323300033211333223330333013320210301303003300
      >2_29_200_F3,17_-90327533.241.5.0):q21,8_-43471518.229.3.0):q0
      T01030013331133333333303333033303333330333303303300
      >2_29_217_F3
      T13330023311233323333300302133123220020302303002300



      ******

      Is there a way to pick out a uniquely mapped read from a .bam file with SAM/Bam tools?

      Comment

      • drio
        Senior Member
        • Oct 2008
        • 323

        #4
        Originally posted by JohnK View Post
        Is there a way to pick out a uniquely mapped read from a .bam file with SAM/Bam tools?
        No within the mandatory fields. That's why I was asking if you can post here
        some SAM entries from Bioscope so we can take a look.
        -drd

        Comment

        • lh3
          Senior Member
          • Feb 2008
          • 686

          #5
          http://sourceforge.net/apps/mediawik...from_SAM.2FBAM.

          Comment

          • drio
            Senior Member
            • Oct 2008
            • 323

            #6
            He wants to get the unique mapped reads.
            But yes I agree with this (from the FAQ):

            We prefer to say an alignment is `reliable' rather than `unique' as `uniqueness' is not well defined in general cases. You can get reliable alignments by setting a threshold on mapping quality
            -drd

            Comment

            • JohnK
              Senior Member
              • Feb 2010
              • 106

              #7
              Originally posted by drio View Post
              He wants to get the unique mapped reads.
              But yes I agree with this (from the FAQ):
              I had some trouble converting between bam back to sam. It appears to be incorrect. Here's my usage:

              samtools view in.bam > name.sam

              Probably not right usage here. Also, I'm re-formatting the matobam.ini file so it calcs 'unique' (or more reliable) reads based on the QV > 5. If anybody knows if this threshold is optimal, your help would be greatly appreciated? I'll then use bed tools to convert to bed and get the coverage off the unique. Hopefully, this will work and I can get a rough estimate of concordance between the unique reads from Corona versus the unique reads from BioScope.

              Comment

              • drio
                Senior Member
                • Oct 2008
                • 323

                #8
                Your samtools cmd is correct. What error/problem are you getting?
                -drd

                Comment

                • JohnK
                  Senior Member
                  • Feb 2010
                  • 106

                  #9
                  Originally posted by drio View Post
                  Your samtools cmd is correct. What error/problem are you getting?
                  17_1177_738 16 chr1 2999999 7 25H25M * 0 0 NNGAATTCTTTTCTATGATTTAGTT """IIIIIIIIIIIIIIIIIIIII! RG:Z:20100521202751742 NH:i:1 CS:Z:T30123003213322000220302023010232222031211331023323 CQ:Z:BB?@;A@??B@=4>B<6<<A<?=?A=96=?>289>;7/=?04<0B3+(=> CM:i:2
                  774_1259_2007 16 chr1 2999999 7 25H25M * 0 0 NNGAATTCTTTTCTATGATTTAGTT """IIIIIIIIIIIIIIIIIIIII! RG:Z:20100521202751742 NH:i:1 CS:Z:T30123003213322000220302023010232222031211331023123 CQ:Z:B:AABA7BBBA?@?B=;A>B;<B:A?7=.@?2<;A:=<<?;<?5>:15?@ CM:i:2

                  Comment

                  • drio
                    Senior Member
                    • Oct 2008
                    • 323

                    #10
                    So what is the problem? What were you expecting as a output?
                    -drd

                    Comment

                    • JohnK
                      Senior Member
                      • Feb 2010
                      • 106

                      #11
                      Originally posted by drio View Post
                      So what is the problem? What were you expecting as a output?
                      Nah- I realized it's correct now. It's just my first time really messing with and having to look at the SAM format meticulously. So this is what I'm expecting as output, but I just want to grab the unique reads. I'll keep reading over the manuals and what not and see if I can figure this out.

                      Comment

                      • JohnK
                        Senior Member
                        • Feb 2010
                        • 106

                        #12
                        Originally posted by drio View Post
                        So what is the problem? What were you expecting as a output?
                        Just a follow up, if anyone cares to know. You can extract unique (aka more 'reliable' reads) via the matobam.output.parameter='unique' setting. Then with BedTools convert from bam to bed and so forth... I appreciate the help, all!

                        Comment

                        • bair
                          Member
                          • Jan 2010
                          • 65

                          #13
                          My bam file includes 2x35bp mate-pair and 2x101 pair-end, any efficient way to split it into MP.bam and PE.bam files?

                          Comment

                          • drio
                            Senior Member
                            • Oct 2008
                            • 323

                            #14
                            Originally posted by bair View Post
                            My bam file includes 2x35bp mate-pair and 2x101 pair-end, any efficient way to split it into MP.bam and PE.bam files?
                            If you are confortable with C or java, use samtools or picard to iterate over your bam and dump the data to PE or MP bam based on the read length.
                            That's the most efficient way.

                            There are other ports of samtools for lisp and python (maybe more) if you
                            want a more higher level language.

                            Another option would be to stream the bam data to sam with samtools and pipe it to your split script. After that you can convert back to BAM. It would be slower but it'd work just fine.

                            Something like:

                            Code:
                            #!/bin/bash
                            #
                            samtools view ./i.bam  | ruby -a -ne 'puts $_ if $F[9].size == 35' > 35.sam
                            samtools view ./i.bam  | ruby -a -ne 'puts $_ if $F[9].size == 120' > 120.sam 
                            samtools view -bS ./35.sam > 35.bam
                            samtools view -bS ./120.sam > 120.bam
                            -drd

                            Comment

                            • bair
                              Member
                              • Jan 2010
                              • 65

                              #15
                              Thank you Drio,

                              Your suggestions and script code are very helpful, like to use the simple code.

                              Thanks again.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                                Here are nine questions we think about, in roughly the order they matter, before...
                                06-18-2026, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-17-2026, 06:09 AM
                              0 responses
                              24 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              40 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              47 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              49 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...