Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Cbon
    Junior Member
    • Jun 2010
    • 2

    BWA - number of sequences aligned

    Hello everyone,

    I am quite a newbie in bioinformatics, and I have some difficulties using bwa software.

    I successfully aligned my dataset to my reference genome. Now, I want to see the impact of the settings on my results, but I don't know how to obtain the number of reads aligned in the .sam file.

    Is there a useful trick to obtain the number of reads aligned ?

    I have found no clue on the bwa manual, neither in the samtool package and I begin to feel quite helpless ...

    Thank you for your answers !
  • dawe
    Senior Member
    • Apr 2009
    • 258

    #2
    Originally posted by Cbon View Post
    Hello everyone,

    Is there a useful trick to obtain the number of reads aligned ?
    Try

    Code:
    samtools flagstat [I]file.bam[/I]
    Although I don't know if it works on sam files (in case you should convert in BAM, which I believe should be the default...).

    d

    Comment

    • racetrout
      Junior Member
      • Jun 2010
      • 2

      #3
      If you are using UNIX and GNU awk,
      Code:
      gawk '!/^[@#]/ && (!and($2,4){print}' bwa.sam| wc -l
      should give you the number of reads aligned - providing BWA is setting the flag for unaligned reads. If you have many reads and BWA writes out only the aligned reads,
      Code:
      wc -l bwa.sam
      is a very good simple approximate of aligned reads (it counts the header of course, but the ~5 lines of the header compared to the millions of reads are negligible).

      Comment

      • Cbon
        Junior Member
        • Jun 2010
        • 2

        #4
        Thank you both for your answers !

        My files are already converted in .bam, so I hope I can use the flagstat command. If it doesn't work, I will try the code.

        thank you again !

        Comment

        • saml
          Junior Member
          • Oct 2010
          • 4

          #5
          Originally posted by racetrout View Post
          If you are using UNIX and GNU awk,
          Code:
          gawk '!/^[@#]/ && (!and($2,4){print}' bwa.sam| wc -l
          should give you the number of reads aligned - providing BWA is setting the flag for unaligned reads. If you have many reads and BWA writes out only the aligned reads,
          Code:
          wc -l bwa.sam
          is a very good simple approximate of aligned reads (it counts the header of course, but the ~5 lines of the header compared to the millions of reads are negligible).
          Yes, but removing the lines is easy too. Skipping 2 header lines (think I had that), and starting on the 3rd line, can be done with the tail command and adding a '+' before the number of lines, to denote which line to start from:

          Code:
          tail -n +3 bwa.sam | gawk '!/^[@#]/ && (!and($2,4){print}' | wc -l
          ... or for the case with only aligned reads coming from bwa:
          Code:
          tail -n +3 bwa.sam | wc -l

          Comment

          • swbarnes2
            Senior Member
            • May 2008
            • 910

            #6
            bwa by default includes all reads, even unaligned ones. You can tell that they are unmapped, from the binary tag.

            So the flagstat command will distinguish between the mapped an unmapped reads.

            You can also use samtools view -bF0x4 all.bam > mapped .bam to get a bam which has all the unmapped reads filtered out.

            Comment

            • nilshomer
              Nils Homer
              • Nov 2008
              • 1283

              #7
              Use "samtools view -c -S <in.sam>" and add the "-f/-F" options as desired.

              Comment

              • solonas13
                Junior Member
                • May 2012
                • 1

                #8
                I personally find it---at the best case---unacceptable the fact that a short-read aligner does not have an extra line of code to report the total number of aligned reads... Especially when it outputs (on the standard error) things like:

                `[infer_isize] (25, 50, 75) percentile: (137, 179, 227)'

                Anyhow, isn't there a closing parenthesis missing from this one?

                Originally posted by saml View Post
                Code:
                tail -n +3 bwa.sam | gawk '!/^[@#]/ && (!and($2,4){print}' | wc -l
                Thanks
                Last edited by solonas13; 05-13-2012, 03:47 PM.

                Comment

                Latest Articles

                Collapse

                • SEQadmin2
                  Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                  by SEQadmin2


                  I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                  Here are nine questions we think about, in roughly the order they matter, before...
                  06-18-2026, 07:11 AM
                • SEQadmin2
                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                  by SEQadmin2


                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                  ...
                  06-02-2026, 10:05 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by SEQadmin2, 06-26-2026, 11:10 AM
                0 responses
                11 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-17-2026, 06:09 AM
                0 responses
                45 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-09-2026, 11:58 AM
                0 responses
                105 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-05-2026, 10:09 AM
                0 responses
                125 views
                0 reactions
                Last Post SEQadmin2  
                Working...