Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Jon17
    Member
    • Jun 2016
    • 15

    Seems like there are a lot of pipelines out there that get big differences in methylation results. Is there a paper out that that shows a complete workflow and has control data and output results I can compare?

    I can get bismark to work but I'm doing epigenetic sequence capture and want to know my results are in line with industry standard methods.

    Comment

    • Jon17
      Member
      • Jun 2016
      • 15

      How do you handle control DNA?

      I have Bisulfite Sequencing data with lambda control (NC_001416) spike in.

      I'm looking at the alignment files (SAM) and I see there are reads aligned to the lambda (NC_001416) reference.

      So how do I separate methyl stats of these two samples? bismark_methylation_extractor seems to give me only one set of statistics and doesn't differentiate the control group from the sample group.

      I don't see any guidance in the manual or tutorials.

      This command is not ideal: --split_by_chromosome
      Last edited by Jon17; 09-30-2016, 07:18 AM.

      Comment

      • fkrueger
        Senior Member
        • Sep 2009
        • 627

        You could either align the to the lambda sequence by itself which should be really quick (probably several hundred million sequences/hour), or if you have the output already I would probably just filter for reads containing the Lambda sequence (NC_001416) and then run the bismark_methylation_extractor on that new SAM file. A command like this should get you the alignments:

        cat current_alignment_file.sam | grep NC_001416 > lambda_alignments.sam

        bismark_methylation_extracor --bedGraph --gzip lambda_alignments.sam

        If you are starting from BAM files the commands need to include Samtools but the idea is the same. I hope this helps. Cheers, Felix

        Comment

        • Jon17
          Member
          • Jun 2016
          • 15

          How does Bismark handle multiple read groups?

          I know you can set the read group flag in Bismark but that's only for 1 read group. What if you have multiple? Does Bismark align those read-groups separately against the reference? Do any re-calibrations?

          GATK seems to imply recalibration (BQSR) is necessary for SNP calling. What about methylation calls?

          I'm wondering if I should:

          1) Run Bismark on each read group separately
          2) Combine the multiple bam files
          3) Re-calibrate base scores (BQSR)
          4) Run bismark_methylation_extractor on the final re-calibrated bam file

          Comment

          • fkrueger
            Senior Member
            • Sep 2009
            • 627

            Hi Jon,

            Bismark allows you to specify a read group for each sample, so that if you were to combine several different samples afterwards you could still tell which read came from which sequencing run. Other than that I am afraid the methylation extractor doesn't do anything sophisticated such as the BQSR step, it will simply extract the methylation information from the BAM files as is.

            In contrast to tools like GATK which specifically ask you not to perform any quality trimming up-front but rather use soft clipping during the alignment step, Bismark is designed to to be used only with properly adapter and base call quality trimmed data (Trim Galore uses a Phred score cut-off of 20 by default). Because of this additional step we do not check the basecall quality again during the extraction process. I hope this helps, Best, Felix

            Comment

            • Jon17
              Member
              • Jun 2016
              • 15

              Thank you for the response. I'm currently using GATK and Picard for alignment metrics (depth of coverage, insert size, on target bases, FOLD_80_BASE_PENALTY, etc). Given I'm using trim_galore and bismark, would you consider that use of GATK & Picard ok? Or should I use a different tool? What do you recommend?

              If Bismark is "designed to to be used only with properly adapter and base call quality trimmed data", what are the pitfalls?

              Comment

              • fkrueger
                Senior Member
                • Sep 2009
                • 627

                Hi Jon,

                I would assume that GATK and Picard will probably run but I don't know exactly what they need to look at the metrics you mentioned, you might have to ask their developers to be honest. We personally use SeqMonk to get all of the stats we are interested in. It is a genome browser that does all sorts of other relevant quantitations as well. Cheers, Felix

                Comment

                • Jon17
                  Member
                  • Jun 2016
                  • 15

                  Is there a way to use your tools to get CpG coverage? For instance

                  55% of CpG sites are at 30x
                  45% of CpG sites are at 20x
                  25% of CpG sites are at 10x

                  As well as methylation rates given certain depths at CpG sites?

                  55% of CpG sites are methylated at 30x
                  45% of CpG sites are methylated at 20x
                  25% of CpG sites are methylated at 10x

                  Comment

                  • fkrueger
                    Senior Member
                    • Sep 2009
                    • 627

                    Not directly I am afraid, but one could probably write a quick script (or even use some clever awk/sed constructs) to extract this kind of information from either the coverage files or genome-wide methylation reports.

                    Comment

                    • chxu02
                      Member
                      • Jan 2015
                      • 18

                      Hi Felix,
                      When I tried to run the recently added script:
                      Code:
                      filter_non_conversion
                      I got the following error message:
                      Code:
                      syntax error at ~/bismark-0.17.0/filter_non_conversion line 280, near "last nless ("
                      Execution of ~/bismark-0.17.0/filter_non_conversion aborted due to compilation errors.

                      Comment

                      • fkrueger
                        Senior Member
                        • Sep 2009
                        • 627

                        Sorry for this, I have just fixed this typo. Please clone the latest development version here:

                        A tool to map bisulfite converted sequence reads and determine cytosine methylation states - FelixKrueger/Bismark


                        Cheers, Felix

                        Comment

                        • seqfast
                          Member
                          • Aug 2008
                          • 16

                          Directional/Non-directional amplicons

                          Hi (hopefully) Felix,

                          I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

                          This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

                          Final Alignment report
                          ======================
                          Sequences analysed in total: 57545
                          Number of alignments with a unique best hit from the different alignments: 36008
                          Mapping efficiency: 62.6%
                          Sequences with no alignments under any condition: 21427
                          Sequences did not map uniquely: 110
                          Sequences which were discarded because genomic sequence could not be extracted: 0

                          Number of sequences with unique best (first) alignment came from the bowtie output:
                          CT/CT: 1667 ((converted) top strand)
                          CT/GA: 13421 ((converted) bottom strand)
                          GA/CT: 1088 (complementary to (converted) top strand)
                          GA/GA: 19832 (complementary to (converted) bottom strand)

                          Final Cytosine Methylation Report
                          =================================
                          Total number of C's analysed: 448283

                          Total methylated C's in CpG context: 22378
                          Total methylated C's in CHG context: 6624
                          Total methylated C's in CHH context: 5983

                          Total unmethylated C's in CpG context: 9707
                          Total unmethylated C's in CHG context: 221431
                          Total unmethylated C's in CHH context: 182160

                          C methylated in CpG context: 69.7%
                          C methylated in CHG context: 2.9%
                          C methylated in CHH context: 3.2%

                          Comment

                          • fkrueger
                            Senior Member
                            • Sep 2009
                            • 627

                            Originally posted by seqfast View Post
                            Hi (hopefully) Felix,

                            I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

                            This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

                            Final Alignment report
                            ======================
                            Sequences analysed in total: 57545
                            Number of alignments with a unique best hit from the different alignments: 36008
                            Mapping efficiency: 62.6%
                            Sequences with no alignments under any condition: 21427
                            Sequences did not map uniquely: 110
                            Sequences which were discarded because genomic sequence could not be extracted: 0

                            Number of sequences with unique best (first) alignment came from the bowtie output:
                            CT/CT: 1667 ((converted) top strand)
                            CT/GA: 13421 ((converted) bottom strand)
                            GA/CT: 1088 (complementary to (converted) top strand)
                            GA/GA: 19832 (complementary to (converted) bottom strand)

                            Final Cytosine Methylation Report
                            =================================
                            Total number of C's analysed: 448283

                            Total methylated C's in CpG context: 22378
                            Total methylated C's in CHG context: 6624
                            Total methylated C's in CHH context: 5983

                            Total unmethylated C's in CpG context: 9707
                            Total unmethylated C's in CHG context: 221431
                            Total unmethylated C's in CHH context: 182160

                            C methylated in CpG context: 69.7%
                            C methylated in CHG context: 2.9%
                            C methylated in CHH context: 3.2%
                            Hi seqfast,

                            When you are generating and sequencing PCR amplified regions it is quite common that you see only the top strand (with both the OT and CTOT strands) or the bottom strand (with both OB and CTOB as in your case), depending on which strand you targeted when designing the primers.

                            To help getting your head around this it really helps to draw the sequence of an amplicon of interest out on a sheet of paper. You should see that the bisulfite treatment will change one of the two strands so much that the oligos will only amplify one of the two strands, and this PCR product is then usually sequenced from both sides, e.g. the CTOB and OB strands, but this may be different depending on how the library preparation was done.

                            So in short: PCR amplicons are not normally directional libraries but you only sequence both versions of either the top or the bottom strand, depending on how the primers were designed. I hope this helps, let me know if I can be of any further assistance. Cheers, Felix

                            Comment

                            • seqfast
                              Member
                              • Aug 2008
                              • 16

                              Thank you Felix,

                              I've been told to PCR primers are 'agnostic' to bisulfite conversion, but I believe there will still be some effects and a likelihood of some strand-favoring in the amplification.

                              Appreciate your quick reply, and thanks for a very valuable tool!

                              -sf

                              Comment

                              • Kuckunniwi
                                Junior Member
                                • Apr 2017
                                • 3

                                Hi,

                                I am trying to launch methylation_extractor and it does not like my files =( It says:

                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1654, <IN> line 27.
                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1657, <IN> line 27.
                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1660, <IN> line 27.
                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1663, <IN> line 27.
                                Use of uninitialized value $read_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
                                Use of uninitialized value $genome_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
                                Unexpected combination of read and genome conversion: '' / ''
                                The line 27 is the first read of sam file:

                                Code:
                                27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG    16      chr10   71747   255     51M     *       0       0       CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC     CG     3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG     NM:i:9  MD:Z:2AAA6CA3A2A2A24A3  XM:Z:Z..x........................x..z..h...h.......hhx..
                                The methylation extractor worked well for me before (even for the same files), what can cause the problem? (I've tried to play with cmd modes, did not help...)

                                The command line: ./bismark_methylation_extractor -s test.sam

                                Thanks!
                                Last edited by Kuckunniwi; 04-10-2017, 12:59 PM. Reason: added the command line

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Pathogen Surveillance with Advanced Genomic Tools
                                  by seqadmin




                                  The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
                                  03-24-2025, 11:48 AM
                                • seqadmin
                                  New Genomics Tools and Methods Shared at AGBT 2025
                                  by seqadmin


                                  This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                  The Headliner
                                  The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                  03-03-2025, 01:39 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 10:17 AM
                                0 responses
                                7 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-20-2025, 05:03 AM
                                0 responses
                                49 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-19-2025, 07:27 AM
                                0 responses
                                59 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-18-2025, 12:50 PM
                                0 responses
                                50 views
                                0 reactions
                                Last Post seqadmin  
                                Working...