Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Thanks for the info.
    I'm afraid I have never really tried to get a consensus BS-converted sequence, so I would be curious to hear your successes with samtools. I think that might be the best option right now, but I'll let you know if I hear something better.

    Cheers,
    Oliver

    Comment


    • #17
      Hi natstreet,

      With regards to pair-end Bismark, the methodology I suggested to generate bigWig is the same as single-end (i.e. methylation-extract, perl script, BEDGraph to bigWig).

      I have attached a VERY preliminary version of the bismark_to_SAM_v2.pl. Theoretically, this should convert pair-end bismark to SAM. However, I have yet to test the resulting SAM/BAM file on Gbrowse, so I don't know if it works properly. Feel free to try it, and let me know if it crashes badly.

      Sorry if I couldn't be more help.

      Cheers,
      Oliver
      Attached Files

      Comment


      • #18
        I just tried to generate a BAM file from the SAM output I have from bismark2sam for single end mapped reads and there is a problem. I get the error

        Code:
        Parse error at line 9: sequence and quality are inconsistent
        Aborted
        My samtools command is

        Code:
        samtools import arab.fa.fai Bismark_mapping_results_col-0_lib1PE120_1.fq.sam col-0_lib1PE120_1.bam
        Do you know the cause of the error message?

        Comment


        • #19
          Hi natstreet,

          I apologize for the error. Can I have a look at the bismark mapping results you used as input and the SAM file that was generated? You don't need to send the entire file, just the first 2000 lines would be sufficient.

          Cheers,
          Oliver

          Comment


          • #20
            I've put some more files on the ftp site. The updated script you posted for paired end reads created sam files that samtools seems happy with - I just need to check the bam file in GBrowse to be certain everything worked ok (which I will do very soon).

            I also found a problem with the genome_methylation_bismark2bedGraph.pl script that is specific to processing the CHH files. All CpG and CHG output files I've tested so far have been fine. All CHH files produce an error for every line of this type

            Code:
            Methylation state of sequence (FC61KG1AAXX:5:1:1101:10101#0/1) in file (exampleCHH) on line 2000 is inconsistent (meth_state is -, meth_state2 = h)
            It makes the output file and this seems fine so I'm not sure if the error should concern me or not. I've also put the first 2000 lines from an example CHH file on the ftp site.

            Comment


            • #21
              I've loaded an example BAM file for paired end data into GBrowse and I have a question about miss-match information in relation to methylation in the SAM file.

              If you look at this example couple of base pairs in GBrowse there is first a C on the plus strand that I assume was methylated because it remains as a C in the aligned reads. The next base as a C for the - strand and it seems that this was not methylated because the aligned - strand reads are all A.

              I was thinking that the A for the - strand reads would show up as a miss match but it doesn't seem to (it should be shaded in red if it is a MM and I've checked that this works for other BAM files from standard genomic sequence data).

              Am I thinking about this in the wrong way and do you have a method to colour methylated bases?

              Comment


              • #22
                Hi natstreet,

                I apologize for the errors. The genome_methylation_bismark2bedGraph.pl problem was a typo on my part. The attached version should run on CHH now.
                As for the BAM/SAM error, this is due to an update of bismark that I haven't implemented in my single-end script. I have now updated the script (bismart_to_SAM_v2.pl, attached), and if you run it in the single-end mode, it should work (let me know if it doesn't).

                Unfortunately, I have very little experience with pair-ends SAM/BAM, so I'll need some time to look through it. In the meantime, see if the single-end pipeline works well.

                Thanks.

                Cheers,
                Oliver
                Attached Files

                Comment


                • #23
                  No apologies needed - you're helping me out massively!

                  I'll give the scripts a try over the weekend and let you know if they're all working.

                  As soon as I started looking at some example methylated or non-methylated (so bisulfite converted) sites in GBrowse I realised that the consensus calling isn't so simple because things are not in agreement (so to say) between the two strands. I haven't yet got my head round how to deal with that.

                  Comment


                  • #24
                    Are you using the single-end or paired-end option?
                    I think there could be a slight bug in the paired-end bismark_to_SAM option (not in the single-end), and I'm still trying to debug it. The single-end should work, and might be useful in giving you the pileup that you need.

                    Good luck.

                    Cheers,
                    Oliver

                    Comment


                    • #25
                      So I guess there is a way to represent the Bismark methylation alignment in SAM format. Are there some standard public datasets we could use to benchmark some methylation alignment programs? I think it would be a useful exercise.
                      Another thought I had about this was how to output the methylation context graphs in BigWig for genome browser display.

                      Comment


                      • #26
                        Hi Zee,

                        I have just re-posted this from an earlier discussion:

                        To follow up on the BEDGraph/BigWig idea, we have developed a workaround for this. Again, this is based on single-end analysis, so I haven't tested paired-ends

                        1) Use methylation-extractor on the Bismark output, with '--comprehensive' option ('--merge_non_CpG' is optional)
                        2) Run the following script (genome_methylation_bismark2bedGraph.pl). This script sorts the methylation extractor output, then parses the results to generate an "overall methylation level" as a BEDGraph file, with one sampled cytosine site per line.
                        3) Use the bedGraphToBigWig program (available online, I believe) to convert the BEDGraph to BigWig.

                        Here's the usage for the genome_methylation_bismark2bedGraph.pl

                        Usage: genome_methylation_count.pl (--cutoff [threshold] ) [Bismark methylation caller output] > [output]

                        --cutoff [threshold] - The minimum number of times a methylation state was
                        seen for that nucleotide before its methylation
                        percentage is reported.
                        Default is no threshold

                        The output file is a tab-delimited BedGraph file with the following information:

                        <Chromosome> <Start Position> <End Position> <Methylation Percentage>

                        Bismark methylation caller (v0.2.0 or later) should produce three output files
                        (CpG, CHG and CHH) when using the "--comprehensive" option
                        (Two files if using the "--merge_non_CpG" option).
                        To count both CpG and Non-CpG, combine the output files.

                        Bismark methylation caller (v0.1.5 or earlier) should produce two output files
                        (CpG and Non-CpG) when using the "--comprehensive" option.
                        To count both CpG and Non-CpG, combine the two output files.
                        The script (genome_methylation_bismark2bedGraph.pl) should be in an earlier thread, but I have attached the script again (hopefully the most "bug-free" version)

                        Let me know if you have any questions, issues or bugs (since this is a workaround)

                        Cheers,
                        Oliver
                        Attached Files

                        Comment


                        • #27
                          I like the idea of using public data to benchmark the various programs. You should contact fkrueger, who made this tool. Maybe he has done something similar

                          Comment


                          • #28
                            We wrote a methylation aligner about 2 years ago before Bismark was out and at that time it was only MAQ and BSMapper that could do methylation alignment. Another thing absent 2 years ago was SAM/BAM so now some standard is available the challenge is to make things easier on the end user who can plug-and-play their analysis work with the sorted alignments.
                            So, perhaps a specific SAM tag could be added to represent methylation-specific information in the record so the bedGraph, BigWigs,etc programs could be written to read these directly from the BAM file.
                            I would be happy to look at taking some test data and doing some methylation aligner benchmarking once again.

                            Comment


                            • #29
                              Originally posted by olivertam View Post
                              Are you using the single-end or paired-end option?
                              I think there could be a slight bug in the paired-end bismark_to_SAM option (not in the single-end), and I'm still trying to debug it. The single-end should work, and might be useful in giving you the pileup that you need.

                              Good luck.

                              Cheers,
                              Oliver
                              Hi Oliver

                              I was using the paired end script. Today I'll load in BAM files made using the updated single end script you posted and see how that looks.

                              Comment


                              • #30
                                Zee

                                I'd be happy to share our data as a test set for working on representing methylation in SAM and bedgraph/bigwig files as well as a dataset for alignment benchmarking. Let me know if that would be useful and I can make it available for download.

                                I'm also happy to run all of the different aligners and post the results. I'm not qualified for actually benchmarking them etc but I'm more than willing to spend the time providing what people need for this.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM
                                • seqadmin
                                  The Impact of AI in Genomic Medicine
                                  by seqadmin



                                  Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                  02-26-2024, 02:07 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 03-14-2024, 06:13 AM
                                0 responses
                                32 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-08-2024, 08:03 AM
                                0 responses
                                71 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-07-2024, 08:13 AM
                                0 responses
                                80 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-06-2024, 09:51 AM
                                0 responses
                                68 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X