  • SAM flag field and removing unmapped reads from BFAST output

    Hi there,

    I'm using BFAST to align Solexa reads to a very small portion of a genome (~3kb), and have been considering the best way to remove unmapped reads from the output since these unnecessarily bulk up the output .sam file. I know that samtools can filter an incoming .sam file using the -F command. However, I've read some documentation on the SAM flag format and must admit I find it pretty confusing. Within the flag field I know there are fields for both "the mate is unmapped" and "the query sequence itself is unmapped", but for non-paired-end Solexa reads can either of these be used for removing unmapped reads? Furthermore, what would be the integer or string used in the -F command?

    Alternatively, there is the option in samtools view to filter by map quality (MAPQ). Would setting map quality filter to e.g. 1 remove all unmapped reads without affecting the filtered alignment from BFAST postprocess?

    Alternatively again, dbamfilter within the DNAA package has the capacity to remove unmapped reads, but if samtools can do the job I'd like to minimise the number of apps employed.

    What are thoughts on the best strategy?
    Aiden

  • #2
    Hi Aiden,

    I hope I'm not stating the obvious here, but are you familiar with Picard? It is a set of Java-based command-line tools for manipulating SAM files, and one of them may help you: ViewSam.jar. It basically prints a SAM or BAM file to the screen, but you can set an option to report all reads, just the aligned reads, or just the unaligned reads.
    Take a look: http://picard.sourceforge.net/

    Cheers,
    Wil


    • #3
      Originally posted by aiden
      Hi there,

      I'm using BFAST to align Solexa reads to a very small portion of a genome
      (~3kb), and have been considering the best way to remove unmapped reads from
      the output since these unnecessarily bulk up the output .sam file. I know that
      samtools can filter an incoming .sam file using the -F command. However, I've
      read some documentation on the SAM flag format and must admit I find it pretty
      confusing. Within the flag field I know there are fields for both "the mate is
      unmapped" and "the query sequence itself is unmapped", but for non-paired-end
      Solexa reads can either of these be used for removing unmapped reads?

      Look at the BAM spec 2.2.2 (Notes):

      Code:
      1. Flag 0x02, 0x08, 0x20, 0x40 and 0x80 are only meaningful when flag 0x01 is present.

      Assuming you are using fragment (single-end) data, the "mate is unmapped" bit does not apply, so you want to filter using the 0x0004 flag ("the query sequence itself is unmapped").
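If the flag field still looks opaque, a minimal Python sketch of how the bits decode may help. The bit values are the ones listed in the SAM spec; the names `FLAG_BITS`, `PAIRED_ONLY`, `describe_flag` and `is_unmapped` are made up for this illustration and are not part of samtools or BFAST.

```python
# Bit values follow the SAM spec; all names below are illustrative only.
FLAG_BITS = {
    0x1: "read is paired",
    0x2: "mapped in a proper pair",
    0x4: "query sequence itself is unmapped",
    0x8: "mate is unmapped",
    0x10: "query is on the reverse strand",
    0x20: "mate is on the reverse strand",
    0x40: "first read in the pair",
    0x80: "second read in the pair",
}

# These bits are only meaningful when 0x1 (paired) is also set.
PAIRED_ONLY = (0x2, 0x8, 0x20, 0x40, 0x80)


def describe_flag(flag):
    """Return the meanings of the set bits, ignoring paired-only bits on fragment reads."""
    paired = bool(flag & 0x1)
    meanings = []
    for bit, meaning in FLAG_BITS.items():
        if not flag & bit:
            continue
        if bit in PAIRED_ONLY and not paired:
            continue
        meanings.append(meaning)
    return meanings


def is_unmapped(flag):
    """True when the 0x4 bit is set, i.e. the read itself did not align."""
    return bool(flag & 0x4)


if __name__ == "__main__":
    for flag in (0, 4, 16, 20):
        print(flag, is_unmapped(flag), describe_flag(flag))
```

For single-end data only the 0x4 and 0x10 bits should matter in practice, which is why 0x4 is the one to filter on.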

      Furthermore, what would be the integer or string used in the -F command?

      Code:
      $ samtools view -F 4 ./foo.bam # display mapped reads only
      $ samtools view -f 4 ./foo.bam # display unmapped reads only
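To show what the two options test, here is a rough Python sketch of the same logic applied to plain SAM text. It is an illustration, not samtools itself: `filter_sam_lines` and `SAMPLE_SAM` are hypothetical names, the three records are invented, and unlike `samtools view` without `-h` this sketch passes header lines through.

```python
def filter_sam_lines(lines, exclude=0, require=0):
    """Yield header lines plus records passing -F (exclude) and -f (require) style tests."""
    for line in lines:
        if line.startswith("@"):
            # Header line: passed through unchanged in this sketch.
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if flag & exclude:
            continue  # like -F: drop the record if any excluded bit is set
        if (flag & require) != require:
            continue  # like -f: keep the record only if all required bits are set
        yield line


# Invented example records: r1 and r3 are mapped, r2 is unmapped (flag 4).
SAMPLE_SAM = [
    "@SQ\tSN:ref\tLN:3000",
    "r1\t0\tref\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tref\t200\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]

if __name__ == "__main__":
    mapped = list(filter_sam_lines(SAMPLE_SAM, exclude=0x4))    # analogous to -F 4
    unmapped = list(filter_sam_lines(SAMPLE_SAM, require=0x4))  # analogous to -f 4
    print(len(mapped), len(unmapped))
```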

      Alternatively, there is the option in samtools view to filter by map quality
      (MAPQ). Would setting map quality filter to e.g. 1 remove all unmapped reads
      without affecting the filtered alignment from BFAST postprocess?

      Go for samtools as suggested.
      You need to run the postprocessing step before you can filter your reads anyway.
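One reason to prefer the flag over a MAPQ cutoff: a MAPQ threshold skips every alignment whose MAPQ is below the cutoff, mapped or not. The sketch below assumes an aligner can report MAPQ 0 for a read that is nevertheless mapped (for example an ambiguous placement); whether BFAST postprocess output contains such reads depends on its settings, so treat the records here as invented examples rather than real BFAST output.

```python
# Invented (name, flag, mapq) records: r1 mapped with MAPQ 60, r2 unmapped,
# r3 mapped but with MAPQ 0 (assumed possible for ambiguous placements).
records = [("r1", 0, 60), ("r2", 4, 0), ("r3", 16, 0)]

# Flag-based filter: keep reads whose 0x4 (unmapped) bit is not set.
kept_by_flag = [name for name, flag, mapq in records if not flag & 0x4]

# MAPQ-based filter with a cutoff of 1: keep reads with MAPQ >= 1.
kept_by_mapq = [name for name, flag, mapq in records if mapq >= 1]

if __name__ == "__main__":
    print(kept_by_flag)  # r1 and r3 survive
    print(kept_by_mapq)  # only r1 survives; the mapped read r3 is lost
```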

      Alternatively again, dbamfilter within the DNAA package has the capacity to
      remove unmapped reads, but if samtools can do the job I'd like to minimise
      the number of apps employed.

      samtools can do the job, but give DNAA a try too.
      -drd


      • #4
        Thanks for the very helpful replies, much appreciated.
