  • Non-BS mismatch count in .SAM (esp. Bismark)?

    Hello,

    This question should be of general interest, and judging from documentation/publication I am not sure about the issue, so here it goes:

    In Bismark, where can I see the number of non-BS-mismatches in aligned reads (i.e. in the SAM file)?

    I'm especially asking whether the number ## in NM:i:## is reliable. As I understand, it reflects only the true non-BS-mismatches (no C->T and no G->A conversion), since it is always lower than the number of total methylated+unmethylated positions.

    So, I'd just like to be sure that BS-mismatches aren't in any way taken into account in the number of the NM string.

    I'm interested in Bismark's behavior but note that this question may also be interesting for other mappers like bsmap, which might report differently here.

    Thanks!

  • #2
    Hi Mixter,

    The NM field in the SAM output gives you the edit distance of the observed sequence to the genomic sequence (detailed in the XX field), so this includes all kinds of mismatches (including bisulfite ones).
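To illustrate the difference between an edit distance that includes bisulfite mismatches and a count restricted to non-BS mismatches, here is a minimal Python sketch. It is not Bismark's own code: the function name `count_mismatches` and the `conversion` parameter are hypothetical, and it assumes an ungapped alignment where the read and the genomic sequence have the same length.

```python
def count_mismatches(read, ref, conversion="CT"):
    """Return (total, non_bs) mismatch counts for an ungapped alignment.

    `conversion` is "CT" for reads where a genomic C may appear as T,
    or "GA" for reads where a genomic G may appear as A.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference must have the same length")
    # The one substitution that bisulfite conversion can explain,
    # written as (genomic base, read base).
    bs_pair = {"CT": ("C", "T"), "GA": ("G", "A")}[conversion]
    total = 0
    non_bs = 0
    for read_base, ref_base in zip(read.upper(), ref.upper()):
        if read_base == ref_base:
            continue
        total += 1
        if (ref_base, read_base) != bs_pair:
            non_bs += 1
    return total, non_bs


if __name__ == "__main__":
    # Two C->T positions plus one genuine A->G mismatch.
    print(count_mismatches("ATGTTCGG", "ACGTCCGA"))  # (3, 1)
```

In this toy case the total is 3 while only one mismatch is not explained by bisulfite conversion, which is the number the original question is after.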

    In the default version, there is no special field describing non-BS mismatches. However, I have had some applications myself in which I needed to know this number. Bismark is already prepared to output the number of non-BS mismatches as an additional column, but you need to manually change the last position in the Bismark print SAM command:

    Locate the SAM output command in Bismark (roughly line 5873):
    Code:
      
    # SAM format: QNAME, FLAG, RNAME, 1-based POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL, optional fields
      print OUT join("\t",($id,$flag,$chr,$start,$mapq,$cigar,$rnext,$pnext,$tlen,$actual_seq,$qual,$NM_tag,$XX_tag,$XM_tag,$XR_tag,$XG_tag)),"\n";
    and add $number_of_mismatches at the end, still inside the parentheses, like so:
    Code:
      print OUT join("\t",($id,$flag,$chr,$start,$mapq,$cigar,$rnext,$pnext,$tlen,$actual_seq,$qual,$NM_tag,$XX_tag,$XM_tag,$XR_tag,$XG_tag,$number_of_mismatches)),"\n";
    For paired-end data it is quite similar, but the variables are then called $number_of_mismatches_1 and $number_of_mismatches_2. Let me know if you require any help with this.

    Cheers,
    Felix

    • #3
      Hello,

      Many thanks for the help. I've tried making these modifications, but it seems I need a bit more help. The mismatch number was not displayed in the SAM output for either single-end or paired-end data (adding the numbers together with the optional tags XA and XB did display those tags, with no python warnings...).

      I've attached the modified binary and a diff (Bismark 0.7.6); it would be much appreciated if you could correct the output whenever you have time. This would probably be an attractive option for other users in a future Bismark version as well, especially for bioinformatics/statistics purposes.

      Thanks!

      • #4
        Hi Mixter,

        I have just included the code into the current version of Bismark, 0.7.7, and it does work exactly as expected. I'll attach a copy here. It is probably a good idea to have an output for this in a coming release. Let me know how you get on.

        • #5
          Hi Mixter,

          I have actually just implemented a new option '--non_bs_mm' into the current version (0.7.7) which lets you choose between the standard and the mm-extended SAM output. Let me know if you encounter any problems.


          Edit: It is worth noting that this option currently works only for Bowtie 1 alignments. Bowtie 2 uses a different concept than mismatches alone, and we are currently thinking about how best to incorporate the Bowtie 2 alignment score (AS).
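For anyone post-processing the extended output, here is a minimal Python sketch that reads NM and a non-BS mismatch tag from one SAM line. The tag name `XA` is an assumption taken from the tags mentioned earlier in this thread, not something confirmed for the released option, and the function names and the example line are made up for illustration.

```python
def parse_optional_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of a SAM line as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory; everything after is optional.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) != 3:
            continue  # skip bare columns that are not TAG:TYPE:VALUE
        tag, value_type, value = parts
        tags[tag] = int(value) if value_type == "i" else value
    return tags


def mismatch_summary(sam_line, non_bs_tag="XA"):
    """Return (NM, non_bs); either is None when the tag is absent.

    `non_bs_tag` is an assumed tag name; adjust it to your output.
    """
    tags = parse_optional_tags(sam_line)
    raw = tags.get(non_bs_tag)
    non_bs = int(raw) if raw is not None else None
    return tags.get("NM"), non_bs


if __name__ == "__main__":
    # Illustrative line only, not real Bismark output.
    line = "\t".join([
        "read1", "0", "chr1", "100", "255", "8M", "*", "0", "0",
        "ATGTTCGG", "IIIIIIII", "NM:i:3", "XR:Z:CT", "XG:Z:CT", "XA:Z:1",
    ])
    print(mismatch_summary(line))  # (3, 1)
```

Fields that are not in TAG:TYPE:VALUE form, such as a bare number appended by the manual edit described earlier, are skipped by this parser and would need to be read by position instead.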
          Last edited by fkrueger; 10-11-2012, 07:11 AM.


          • #6
            Right, I have now amended Bismark to also output a number of mismatches for Bowtie 2 alignments. For this to work, Bismark compares the alignment scores and CIGAR strings of the best alignments, and separates out the number of mismatches and potential insertions or deletions. This will also work correctly if non-default gap opening and extension penalties were specified.

            Unless there are any reports of problems with the option --non_bs_mm, it will be part of the next release.
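The idea of separating mismatches from indels using the alignment score and the CIGAR string can be sketched in Python as follows. This is only an illustration of the arithmetic, not Bismark's implementation. It assumes end-to-end alignments (AS is zero or negative), a fixed mismatch penalty of 6 (that is, base qualities do not change the penalty), no penalties from N positions, and gap penalties of open + length * extend with defaults of 5 and 3. The function name and parameters are hypothetical.

```python
import re

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mismatches_from_score(alignment_score, cigar, mismatch_penalty=6,
                          read_gap=(5, 3), ref_gap=(5, 3)):
    """Split an end-to-end alignment score into mismatches and indel events.

    Assumes every mismatch costs exactly `mismatch_penalty` and that a gap
    of length n costs open + n * extend. Returns (mismatches, indel_events).
    """
    remaining = -alignment_score  # total penalty paid by the alignment
    indel_events = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "D":    # bases missing from the read: treated as a read gap
            gap_open, gap_extend = read_gap
        elif op == "I":  # extra bases in the read: treated as a reference gap
            gap_open, gap_extend = ref_gap
        else:
            continue
        remaining -= gap_open + int(length) * gap_extend
        indel_events += 1
    if remaining < 0 or remaining % mismatch_penalty:
        raise ValueError("score is not explained by the assumed penalties")
    return remaining // mismatch_penalty, indel_events


if __name__ == "__main__":
    print(mismatches_from_score(-12, "50M"))       # (2, 0)
    print(mismatches_from_score(-20, "20M1I29M"))  # (2, 1)
```

Passing different `read_gap` and `ref_gap` tuples covers non-default gap penalties; a score that the assumed penalties cannot explain raises an error rather than returning a guess.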