
  • Mapping Bisulphite converted sequence

    What software are people using to map bisulphite converted sequence?

    We've been getting more of this kind of data recently and doing the mapping and QC has proved to have all sorts of odd quirks about it. Our mapping pipeline is just a set of scripts which sit on top of bowtie, but I wondered if anyone had done a more formal tool which took into account things such as:
    • Whether the sequence is expected to be fully converted or not
    • Eliminating preferential mapping of unconverted sequence
    • Working out overall conversion frequencies


    Does anyone have any good recommendations or are we all building our own?

  • #2
    Originally posted by simonandrews
    What software are people using to map bisulphite converted sequence?

    We've been getting more of this kind of data recently and doing the mapping and QC has proved to have all sorts of odd quirks about it. Our mapping pipeline is just a set of scripts which sit on top of bowtie, but I wondered if anyone had done a more formal tool which took into account things such as:
    • Whether the sequence is expected to be fully converted or not
    • Eliminating preferential mapping of unconverted sequence
    • Working out overall conversion frequencies


    Does anyone have any good recommendations or are we all building our own?
    BFAST can easily be used to align bisulphite treated sequence (see the reference manual). I don't know of a tool for summarizing the conversion frequencies (beyond personal perl scripts), but if you find one let me know.
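A minimal sketch of how overall conversion frequencies could be tallied, assuming the reads are already aligned to the forward strand without indels, so that read and reference can be compared base by base. The function name and the input format are hypothetical and are not taken from BFAST or any other tool mentioned in this thread.

```python
def conversion_frequency(pairs):
    """Fraction of reference cytosines that were read as T.

    pairs: iterable of (reference, read) strings of equal length, both on
    the forward strand with no gaps (an assumption of this sketch).
    """
    converted = 0    # reference C read as T
    unconverted = 0  # reference C read as C
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must be the same length")
        for r, q in zip(ref.upper(), read.upper()):
            if r != "C":
                continue
            if q == "T":
                converted += 1
            elif q == "C":
                unconverted += 1
            # any other base at a reference C is a mismatch and is ignored
    total = converted + unconverted
    return converted / total if total else float("nan")
```

Reverse-strand reads would need the complementary G-to-A comparison, which is left out here for brevity.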



    • #3
      Check out RMAPBS - it's RMAP modified for BS data. There was a recent Genome Research paper where it featured. Otherwise it's build-your-own, from what I can see.



      • #4
        Originally posted by MadraghRua
        Check out RMAPBS - it's RMAP modified for BS data. There was a recent Genome Research paper where it featured. Otherwise it's build-your-own, from what I can see.
        Thanks - RMAPBS was one I'd not seen before. However it seems to suffer the same problem as most mapping programs I've seen, which is that it will map unconverted sequence more efficiently than converted sequence. For some applications this won't matter, but for epigenomics work where you're looking at the ratio of converted to unconverted reads in a particular region then it's essential that both forms should map with equal efficiency, even if this means removing some unconverted reads which otherwise could have been mapped.

        Maybe the applications of bisulphite conversion are too varied to be handled cleanly in a single program?
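One way to make both forms map with equal efficiency is to convert every C to T in both the read and the reference before searching, so that the methylation state cannot influence where a read lands. The sketch below illustrates the idea for forward-strand reads only. The function names are hypothetical, and the exact-match search is a stand-in for a real aligner.

```python
def c_to_t(seq):
    """Fully bisulphite-convert a sequence in silico (forward strand only)."""
    return seq.upper().replace("C", "T")


def map_unbiased(read, reference):
    """Return every 0-based position where the converted read matches the
    converted reference exactly."""
    conv_read = c_to_t(read)
    conv_ref = c_to_t(reference)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


def unique_position(read, reference):
    """Return the single mapping position, or None if the read is unmapped
    or ambiguous once methylation information has been removed."""
    hits = map_unbiased(read, reference)
    return hits[0] if len(hits) == 1 else None
```

With the reference "ACGATGA", the unconverted read "CG" would match only position 1 in the raw sequence, but after conversion it also matches position 4, so unique_position returns None. That is the trade-off described above: some unconverted reads that could otherwise have been placed are discarded so that converted and unconverted forms are treated identically.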



        • #5
          Check out BSMAP:

          Background: Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation.

          Results: We developed an efficient bisulfite reads mapping algorithm BSMAP to address the above issues. BSMAP combines genome hashing and bitwise masking to achieve fast and accurate bisulfite mapping. Compared with existing bisulfite mapping approaches, BSMAP is faster, more sensitive and more flexible.

          Conclusion: BSMAP is the first general-purpose bisulfite mapping software. It is able to map high-throughput bisulfite reads at whole genome level with feasible memory and CPU usage. It is freely available under GPL v3 license at http://code.google.com/p/bsmap/ .



          • #6
            Originally posted by simonandrews
            Thanks - RMAPBS was one I'd not seen before. However it seems to suffer the same problem as most mapping programs I've seen, which is that it will map unconverted sequence more efficiently than converted sequence. For some applications this won't matter, but for epigenomics work where you're looking at the ratio of converted to unconverted reads in a particular region then it's essential that both forms should map with equal efficiency, even if this means removing some unconverted reads which otherwise could have been mapped.

            Maybe the applications of bisulphite conversion are too varied to be handled cleanly in a single program?
            When the methylation of interest is CpG methylation, RMAPBS *WILL NOT* bias mapping towards a particular methylation state. It exploits unconverted Cs at non-CpG positions to gain specificity in mapping without using those at CpG positions to gain specificity.



            • #7
              The BS mode of Novoalign, then Maq pileup, and then custom Perl scripts.



              • #8
                BSMAP may be a good fit for you, but it takes a huge amount of time to run.
                If you understand the mechanism of bisulfite alignment, many other aligners are also OK.



                • #9
                  adapter trimming would be an important pre-processing step for BS seq ... rrbsmap, bismark, bsseeker are some tools that work exclusively for bisulphite, but all face the challenge of missing out on a big proportion of alignments..
                  --
                  bioinfosm



                  • #10
                    Originally posted by bioinfosm
                    rrbsmap, bismark, bsseeker are some tools that work exclusively for bisulphite, but all face the challenge of missing out on a big proportion of alignments..
                    Why is that?



                    • #11
                      Originally posted by bioinfosm
                      adapter trimming would be an important pre-processing step for BS seq ... rrbsmap, bismark, bsseeker are some tools that work exclusively for bisulphite, but all face the challenge of missing out on a big proportion of alignments..
                      Really? Sure, the mapping efficiencies for bisulphite converted sequence are lower than for conventional sequencing, but nearly all of this is due to the loss of information in the conversion process meaning that the read can't be uniquely assigned to the original genome. In addition some aligners specifically choose to ignore unique alignments which couldn't have been found if the methylation state of the sequence was different to ensure that mapping is always fair and unbiased, but other than that I don't see that there's a problem affecting bisulphite aligners which is any worse than deficiencies in conventional aligners.

                      This isn't to say that there aren't still problems in bisulphite alignment. The issue of samples having a different genetic background to the reference genome leads to systematic methylation miscalls which are difficult to spot and lead to methylation change predictions which are actually genetic changes, but this is more a problem of calling than mapping.



                      • #12
                        Hi simon,

                        We have done some methylation analysis with human RRBS samples and compared three aligners (RMAPBS, BSMAP and Bismark). RMAPBS and Bismark came up with a reasonable methylation percentage (around 40%, but the library is CGI rich). However, BSMAP showed an extremely low level of methylation, 15-18%, on high-quality, QC-checked data (we did dynamic trimming and removed adaptor contamination as well). Do you have any comments on that? It looks like one can tune the methylation level by choosing a particular aligner. I would really appreciate it if you could reply with your views.

                        Regards,
                        Aniruddha.



                        • #13
                          I'm surprised to hear that you're seeing such variable results from different programs. Were the mapping efficiencies wildly different between runs? You'd need quite a difference in mapping distribution to generate that kind of discrepancy. We've shown on simulated datasets that with bismark we can reliably extract the true methylation level regardless of the level of methylation in the library. The only factors which really influence this are the things you mentioned (adapters or poor quality sequence).

                          When mapping BS-Seq data it's more important that what you map is accurate than getting really good coverage. If in doubt you should make your mapping parameters more stringent. Mapping and adapter errors tend to drag the predicted methylation level towards 50% so this is especially problematic for low methylation libraries.
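As a rough illustration of the pull towards 50%, assume that some fraction of the calls comes from mismapped or adapter-derived bases and that those calls look methylated half the time. Both the simple mixing model and the 50% figure for erroneous calls are assumptions of this sketch, not measurements.

```python
def observed_methylation(true_level, error_fraction, error_level=0.5):
    """Methylation level seen when a fraction of calls is erroneous.

    true_level:     real methylation level of the library (0 to 1)
    error_fraction: share of calls that come from mapping or adapter errors
    error_level:    methylation level those erroneous calls appear to have
                    (0.5 is an assumption of this sketch)
    """
    return (1.0 - error_fraction) * true_level + error_fraction * error_level
```

Under this model a library that is truly 5% methylated with 10% erroneous calls would read as about 9.5%, nearly double, whereas a library at 50% would be unaffected. That is why low methylation libraries suffer most.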

                          If you're seeing differences of 25% in your data then I suspect something more fundamental is going wrong in the way the programs are being run. The only thing which we've ever seen which makes this kind of difference is that some programs have an option to remove any reads containing more than 3 unconverted Cs, which can have a dramatic effect on the overall level, but normally this would only be applied in non-CpG context so this shouldn't be the problem in your case if your library is CpG rich.
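The kind of filter described above can be sketched as follows, assuming a forward-strand read aligned without gaps against a reference window that ideally extends one base beyond the read so the context of the last base is known. The function names and the default threshold argument are hypothetical and do not reproduce any particular program's option.

```python
def count_unconverted_non_cpg(ref, read):
    """Count read Cs that sit on a reference C outside CpG context."""
    ref, read = ref.upper(), read.upper()
    count = 0
    for i, (r, q) in enumerate(zip(ref, read)):
        if r != "C" or q != "C":
            continue
        if i + 1 >= len(ref):
            continue  # context unknown at the end of the reference window
        if ref[i + 1] != "G":
            count += 1
    return count


def keep_read(ref, read, max_unconverted=3):
    """Reject reads with more than max_unconverted non-CpG unconverted Cs."""
    return count_unconverted_non_cpg(ref, read) <= max_unconverted
```

Because only non-CpG positions are counted, a fully methylated CpG-rich read passes the filter, which is consistent with the point that this option should not explain a large discrepancy in a CpG-rich library.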



                          • #14
                            The new version of BSMAP (v2.2) has greatly improved the mapping speed
                            (28M 76bp PE reads mapped to the hg19 genome in about 7 hours using 8 threads; RAM usage: ~9GB).

                            It also includes RRBS mode.

                            Best,

                            Yuanxin

                            Originally posted by sciencewu
                            BSMAP may be a good fit for you, but it takes a huge amount of time to run.
                            If you understand the mechanism of bisulfite alignment, many other aligners are also OK.
                            Last edited by yxibcm; 10-05-2011, 08:40 AM.



                            • #15
                              Hi Aniruddha,

                              I'm the developer of BSMAP. Could you provide some details about the BSMAP command line and your input reads? I'm very interested in knowing why BSMAP reports such a low level of methylation.

                              BSMAP also supports an RRBS mode through the option "-D", which adds digestion-site specificity to the mapping, or you can run the separate program RRBSMAP. This mode is also much faster and more memory efficient.
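To illustrate why digestion-site specificity shrinks the search space, here is a minimal sketch that lists the fragments an RRBS digest would produce. It assumes MspI (recognition site CCGG, cut after the first C), the enzyme commonly used for RRBS, and a size-selection window whose defaults are illustrative only. The function is hypothetical and does not show how BSMAP or RRBSMAP use the sites internally.

```python
def mspi_fragments(reference, min_len=40, max_len=220,
                   site="CCGG", cut_offset=1):
    """Return (start, end) coordinates of size-selected digest fragments.

    The enzyme site, cut offset and size window are assumptions of this
    sketch; coordinates are 0-based and end-exclusive.
    """
    ref = reference.upper()
    cuts = []
    pos = ref.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)  # position of the cut within the site
        pos = ref.find(site, pos + 1)
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        if min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments
```

Only reads that begin at one of these fragment ends need to be considered, which is a far smaller set of candidate positions than the whole genome.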

                              Best,

                              Yuanxin

                              Originally posted by aniruddha.otago
                              Hi simon,

                              We have done some methylation analysis with human RRBS samples and compared three aligners (RMAPBS, BSMAP and Bismark). RMAPBS and Bismark came up with a reasonable methylation percentage (around 40%, but the library is CGI rich). However, BSMAP showed an extremely low level of methylation, 15-18%, on high-quality, QC-checked data (we did dynamic trimming and removed adaptor contamination as well). Do you have any comments on that? It looks like one can tune the methylation level by choosing a particular aligner. I would really appreciate it if you could reply with your views.

                              Regards,
                              Aniruddha.
                              Last edited by yxibcm; 10-05-2011, 08:43 AM.
