  • Bismark and unique matches

    Hello,

    I have a mostly theoretical question that my group nevertheless needs to address. Bismark (and possibly other methylation mappers) has a hard-coded behavior of returning only the unique best match as a result (Bowtie2 option -k 1, I believe).

    My question would be whether this is just a recommended setting, or vitally necessary for Bismark-style methylation mapping to be correct? If so, why?

    Our motivation is that we are thinking about mapping some specific data to a set of sequence subclasses (instead of the whole genome), and we would like to get roughly the 5 best hits per sequence within those subclasses (in case more than one possible hit exists). We're wondering whether this kind of result would be OK to interpret from Bismark methylation mapping and, if not, why not.

    Many thanks

  • #2
    Hi Mixter,
    We chose to only report unique matches in Bismark simply because one can't tell the exact origin of a read with certainty if a read or read pair maps to several distinct locations in the genome with the same quality (say, perfect matches). In an extreme situation, imagine the following two sequences:

    (1) CGCGCGTTTCCCCC
    (2) TGTGTGTTTTTTTT


    Due to the 3-letter alignment mode both sequences would look exactly the same during the mapping step. In a multi-mapping scenario you would call region (1) to be 100% methylated, whereas region (2) would be called 0% methylated, but would you be confident about these results?
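To make the point above concrete, here is a minimal sketch of the in-silico C-to-T conversion applied to the two example sequences. The function name `c_to_t` is made up for illustration, and only the forward-strand conversion is shown; this is not Bismark's own code.

```python
def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


# The two example regions from the post above.
region_1 = "CGCGCGTTTCCCCC"
region_2 = "TGTGTGTTTTTTTT"

converted_1 = c_to_t(region_1)
converted_2 = c_to_t(region_2)

print(converted_1)                 # TGTGTGTTTTTTTT
print(converted_2)                 # TGTGTGTTTTTTTT
print(converted_1 == converted_2)  # True
```

Once converted, the aligner has no way to tell the two regions apart, so a read that matches one matches the other equally well.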

    We did not implement a multi-alignment step (yet) since I think it is very likely that many would use it as their standard mode (greedy as humans are, they would like to have as many methylation calls per $/£ spent as possible...). I would imagine that it is not entirely trivial to separate such potentially unreliable reads from uniquely mapping ones later on, and I can see lots of requests heading my way :P. While I agree that multi-mapping might be useful in certain specialised scenarios, I would not want to offer it as a standard option.

    Just for the record, the Bowtie 1 mode in Bismark uses -k 2 to determine unique best alignments. You are right that this mode is hard-coded in order to determine unique alignments; theoretically, if one circumvented the determination of unique alignments, one could feed several best alignments per read straight into the methylation calling routine. In Bowtie 2 mode, the alignment score (the AS:i field) is used to determine whether the next best hit (if there is any) is equally good or worse. Similarly, with some adaptation this might also be changed to report more than one alignment. This sounds fairly trivial, but as these things go it might actually be somewhat time-consuming...
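As a rough illustration of the score comparison described above, the sketch below reads the Bowtie 2 tags AS:i (score of the reported alignment) and XS:i (score of the best other alignment found, if any) from a single SAM record. The helper names and the example records are made up, and Bismark's real check is more involved than this and is not reproduced here.

```python
from typing import Optional


def get_int_tag(sam_line: str, tag: str) -> Optional[int]:
    """Return the value of an integer SAM tag such as AS:i, or None if absent."""
    prefix = f"{tag}:i:"
    # Optional fields start at column 12 of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None


def is_unique_best(sam_line: str) -> bool:
    """True if no other alignment scores as well as the reported one."""
    best = get_int_tag(sam_line, "AS")
    second = get_int_tag(sam_line, "XS")
    if best is None:
        return False  # no alignment score: treat as not uniquely aligned
    return second is None or second < best


# Made-up records for illustration only.
core = "\t0\tchr1\t100\t42\t14M\t*\t0\t0\tTGTGTGTTTTTTTT\tIIIIIIIIIIIIII\t"
unique_read = "r1" + core + "AS:i:0\tXS:i:-12"
tied_read = "r2" + core + "AS:i:0\tXS:i:0"

print(is_unique_best(unique_read))
print(is_unique_best(tied_read))
```

A read whose second-best score equals its best score is the ambiguous case that Bismark declines to report.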

    • #3
      Hello Felix,

      Thanks for the detailed explanation. So that means it's conceptually possible, but with the caveats you mentioned.

      Meanwhile, I'm practically interested in doing this, i.e. returning non-unique / ambiguously mapped sequences. I noticed that the latest Bismark (0.7.7) already supports the Bowtie2 option --most_valid_alignments. However, how do you return ambiguous alignments? --ambiguous just outputs a FASTA file. What would be the easiest way to get a SAM file with the possible multiple genomic coordinates of ambiguously mapped reads, if any?

      • #4
        Hi Mixter,

        The Bismark option --most_valid_alignments used to map to the Bowtie2 option -M, which is now deprecated (on the Bowtie2 side). I left the description in there for information only:

        --most_valid_alignments <int>
        This used to be the Bowtie 2 parameter -M. As of Bowtie 2 version 2.0.0 beta7 the option -M is
        deprecated. It will be removed in subsequent versions. What used to be called -M mode is still the
        default mode, but adjusting the -M setting is deprecated. Use the -D and -R options to adjust the
        effort expended to find valid alignments.

        This option only affected how much effort Bowtie2 spent on finding the best alignment; it does not mean that Bismark actually reports multiple ambiguous hits. The easiest way to get Bismark to do that would probably be to copy the entire subroutine "check_bowtie_results_single_end" (or the paired-end one, with or without Bowtie2) and edit it so that it skips all checks that determine whether a read is unique or not. This sounds pretty straightforward, but there might be some twists...
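The idea of skipping the uniqueness check can be sketched in a language-neutral way. The Python below is not a port of Bismark's Perl subroutine; `Hit`, `select_hits` and `allow_ambiguous` are hypothetical names used only to show the difference between the two behaviours.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    chrom: str
    pos: int
    score: int  # higher is better, as with Bowtie 2's AS:i


def select_hits(hits: List[Hit], allow_ambiguous: bool = False) -> List[Hit]:
    """Return the best-scoring hit(s) for one read.

    Default: return a hit only if exactly one has the top score.
    With allow_ambiguous=True: return every hit tied for the top score.
    """
    if not hits:
        return []
    best_score = max(h.score for h in hits)
    best = [h for h in hits if h.score == best_score]
    if len(best) == 1 or allow_ambiguous:
        return best
    return []  # tied best hits are discarded in unique-only mode


# Made-up candidate alignments for a single read.
hits = [Hit("chr1", 100, 0), Hit("chr2", 500, 0), Hit("chr3", 10, -6)]

print(select_hits(hits))                        # unique-only mode
print(select_hits(hits, allow_ambiguous=True))  # keeps both tied hits
```

Any methylation calls made from hits returned in the ambiguous mode carry the caveat discussed earlier in the thread: the true origin of the read is unknown.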
