Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • We have sequenced a genome using Illumina's True-seq bisulfite sequencing kit. After getting back the seq, we are analyzing methylation rate using Bismark. I Need help with the interpretation of the result and proper way of normalization.

    Before sequencing: Sample DNA was divided into 2 groups: 1. Bisulfite treatment was carried out and DNA was subsequently sequenced (group 1, methylated group) 2. DNA was sequenced without bisulfite treatment (group 2, control group)

    Both group was sequenced in paired-end fashion.

    I am using Bismark to analyze the seq and trying to get the methylation rate in this particular genome. After running Bismark on Methylated files I got this finale percentages:

    C methylated in CpG context: 0.6%

    C methylated in CHG context: 0.5%

    C methylated in CHH context: 0.7%

    Whereas after running Bismark on my Control files I got these percentages:

    C methylated in CpG context: 99.6%

    C methylated in CHG context: 99.3%

    C methylated in CHH context: 99.9%

    So, how would I interpret my data?

    a. Is 0.6 % (CpG) the actual methylation percentage in my genome?

    b. I have found in some literatures that if CpG, CHG, and CHH percentages are very close, that means that genome actually does not do methylation. Is it true?

    c. What was the purpose of using the control group (group 2)? Do I still need any spike-in control to normalize the data? If so, what that could be?

    Thank you very much for reading this long post!!

    Bests!!!

    Comment


    • We have sequenced a genome using Illumina's True-seq bisulfite sequencing kit. After getting back the seq, we are analyzing methylation rate using Bismark. I Need help with the interpretation of the result and proper way of normalization.

      Before sequencing: Sample DNA was divided into 2 groups: 1. Bisulfite treatment was carried out and DNA was subsequently sequenced (group 1, methylated group) 2. DNA was sequenced without bisulfite treatment (group 2, control group)

      Both group was sequenced in paired-end fashion.

      I am using Bismark to analyze the seq and trying to get the methylation rate in this particular genome. After running Bismark on Methylated files I got this finale percentages:

      C methylated in CpG context: 0.6%

      C methylated in CHG context: 0.5%

      C methylated in CHH context: 0.7%

      Whereas after running Bismark on my Control files I got these percentages:

      C methylated in CpG context: 99.6%

      C methylated in CHG context: 99.3%

      C methylated in CHH context: 99.9%

      So, how would I interpret my data?

      a. Is 0.6 % (CpG) the actual methylation percentage in my genome?

      b. I have found in some literatures that if CpG, CHG, and CHH percentages are very close, that means that genome actually does not do methylation. Is it true?

      c. What was the purpose of using the control group (group 2)? Do I still need any spike-in control to normalize the data? If so, what that could be?

      Thank you very much for reading this long post!!

      Bests!!!

      Comment


      • Hi, fkrueger,
        If I want to compare paired sample (one vs one). Can I do it with seqmonk in unix? Like using some unix command, and obtain a table with p-value per probe?

        Comment


        • Erm, what would you like to compare exactly? SeqMonk can certainly do a number of things, I suggest you follow the guidelines and practical of this methylation analysis course: http://www.bioinformatics.babraham.a...ing.html#bsseq.

          Cheers, Felix

          Comment


          • Hi, fkrueger,
            Is there any annotation file for the methylation data? Like after I do the alignment, I can get a probe like: chr, start position.... How can I know which gene is it from? Thank you.

            Comment


            • Originally posted by twotwo View Post
              Hi, fkrueger,
              Is there any annotation file for the methylation data? Like after I do the alignment, I can get a probe like: chr, start position.... How can I know which gene is it from? Thank you.
              We tend to use SeqMonk for this purpose (https://www.bioinformatics.babraham....jects/seqmonk/). It is a very fast and powerful genome viewer and quantitation tool. Some examples can be found in the documentation of this training course: https://www.bioinformatics.babraham....ing.html#bsseq.

              Comment


              • Thanks fkrueger!

                Comment


                • Hi Felix,

                  I'm a bit confused about the reads reported as "ambiguous" by Bismark, which made me doubt my understanding of how Bismark reports alignments (using Bowtie2).

                  So to my understanding, Bismark does not use -a mode nor -k N mode, it uses Bowtie2 default mode: "search for multiple alignments, report the best one" and how hard to looks for alignments is determined by the effort options (-D,-R). So reads that have multiple distinct alignments are reported once: their best alignment is reported or a randomly chosen alignment when equally-good choices are found.

                  On the other hand, ambiguous reads are defined as:

                  --ambiguous Write all reads which produce more than one valid alignment with the same number of lowest
                  mismatches or other reads that fail to align uniquely to a file in the output directory.
                  Written reads will appear as they did in the input, without any of the translation of quality
                  values that may have taken place within Bowtie or Bismark. Paired-end reads will be written to two
                  parallel files with _1 and _2 inserted in theit filenames, i.e. _ambiguous_reads_1.txt and
                  _ambiguous_reads_2.txt. These reads are not written to the file specified with --un.
                  I think that reads that produce more that one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?
                  Same with reads that fail to align uniquely, I think they should be reported once (by the way, to me, failing to align uniquely is equivalent to producing more than one valid alignment..)

                  Could you please correct my understanding of how Bismark reports alignments and anything I wrote that might be off..

                  Thanks a lot!!

                  Amira

                  Comment


                  • Hi Amira,

                    I think that reads that produce more that one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?

                    >> Just generally, if a read aligns with Bowtie2 it will get an alignment score (the AS:i field):

                    Code:
                    AS:i:<N>	Alignment score. Can be negative. Only present if SAM record is for an aligned read.
                    If a read has a second alignment, the alignment score of the second best alignment is reported as well:

                    Code:
                    XS:i:<N>	Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.

                    So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files. In Bismark we do not want to go down the route of reporting a random alignment if there are several different (equally good) alignments, because if we can’t be sure where a read came from we also don’t want to assign a methylation status to that region. If you are interested in seeing where in the genome a read aligned to you might want to consider using the option --ambig_bam (even if this this procedure won't give you any methylation information).

                    Does this help clearing things up?

                    Comment


                    • Hi Felix,

                      Thanks for the quick reply,

                      So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files
                      This statement clears things up for me. To make things clearer, here are the cases I was wondering about when using Bismark and how I see things now:

                      - If AS and XS are equally good ==> Read is considered ambiguous because Bismark does not report random alignments for the reason you stated.
                      - If AS is better than XS ==> Read is not considered ambiguous because, even though it produces multiple alignments, the reported one is the best and the rest are not equally good. Right?
                      - If XS is better than AS ==> not sure whether that's possible because I would expect the reported alignment to be the best (unless the read is part of a concordantly-aligned pair, says Bowtie2 manual). How do you deal with this case?

                      Good to know about "--ambig_bam" option, I've been working with an older version of Bismark so I didn't see it.. I've updated now.

                      Comment


                      • Hi Amira,

                        Your understanding is now spot-on! For the last point: For paired-end reads we always use the sum of the alignment scores (AS) and sum of the next best alignment scores (XS) of both reads to make decisions. So yes, the best reported hit is supposed to be the best hit for the entire read pair even though this might occasionally look different on a per-read basis.

                        All the best, Felix

                        Comment


                        • Thank you very much Felix!

                          Comment


                          • Hi,

                            I am having trouble getting the bismark2bedGraph software in the Bismark package to run. I have successfully aligned my RRBS reads to my reference genome and then used bismark_methylation_extractor to generate CpG methylation call files

                            As my reference genome is in the forms of scaffolds rather than chromosomes, I have used the --scaffolds command

                            However, when i attempt to run the program (v.0.17.0), I get the following error message output:

                            Using these input files: CpG_context_G08F07.txt

                            Summary of parameters for bismark2bedGraph conversion:
                            ======================================================
                            bedGraph output: G08F07_Astato.bedGraph.gz
                            output directory: >xxxx/epigenetics_project/Alignment/Bismark_min20/Methylation_reports/Comp/Bedgraph/<
                            remove whitespaces: no
                            CX context: no (CpG context only, default)
                            No-header selected: no
                            Sorting method: Unix sort-based (smaller memory footprint, but slower)
                            Sort buffer size: 2G
                            Coverage threshold: 1
                            =============================================================================
                            Methylation information will now be written into a bedGraph and coverage file
                            =============================================================================

                            Using the following files as Input:
                            Writing bedGraph to file: G08F07_Astato.bedGraph.gz
                            Also writing out a coverage file including counts methylated and unmethylated residues to file: G08F07_Astato.bismark.cov.gz

                            Changed directory to xxxx/epigenetics_project/Alignment/Bismark_min20/Methylation_reports/Comp/Bedgraph/
                            The genome of interest was specified to contain gazillions of chromosomes or scaffolds. Sorting everything in memory instead of writing out individual chromosome files ...
                            Sorting input file CpG_context_G08F07.txt by positions (using -S of 2G)
                            sort: open failed: CpG_context_G08F07.txt: No such file or directory
                            Died at /panfs/panasas01/bisc/ah15824/bin/Bismark-master/bismark2bedGraph line 486.


                            If I am interpreting the error message correctly it seems to be the sort command that is failing. I have read that this may be due to the format of the scaffold names. My CpG_context.txt file format is:

                            NB501345:32:HVVJLBGXY:1:11101:23754:1048_1:N:0:CACGAT + 000352F|quiver 66253 Z


                            I have tried removing the F|quiver part of the scaffold name but I still get the same error message

                            Any suggestions as to how to solve this problem would be gratefully received

                            Thanks!

                            Comment


                            • Hi there,

                              Could you please update to the latest version (0.19.0; https://github.com/FelixKrueger/Bismark/releases) to see if the problem is still there? If you are still seeing the same problem can you please send me an email with some part of your CpG data and a information on the genome used.

                              Cheers, Felix

                              Comment


                              • Hi Alan,

                                Thanks for making the files available. I have just tried to run your files with the following command line:

                                Code:
                                bismark2bedGraph --dir output_dir -o test --scaffold *G10G06*
                                And it finished just fine within a few minutes:

                                Created output directory output_dir/
                                Using these input files: CpG_OB_G10G06.sam.gz.txt CpG_OT_G10G06.sam.gz.txt

                                Summary of parameters for bismark2bedGraph conversion:
                                ======================================================
                                bedGraph output: test.gz
                                output directory: >/bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/output_dir/<
                                remove whitespaces: no
                                CX context: no (CpG context only, default)
                                No-header selected: no
                                Sorting method: Unix sort-based (smaller memory footprint, but slower)
                                Sort buffer size: 2G
                                Coverage threshold: 1
                                =============================================================================
                                Methylation information will now be written into a bedGraph and coverage file
                                =============================================================================

                                Using the following files as Input:
                                /bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/CpG_OB_G10G06.sam.gz.txt /bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/CpG_OT_G10G06.sam.gz.txt

                                Writing bedGraph to file: test.gz
                                Also writing out a coverage file including counts methylated and unmethylated residues to file: test.gz.bismark.cov.gz

                                Changed directory to /bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/output_dir/
                                The genome of interest was specified to contain gazillions of chromosomes or scaffolds. Merging all input files and sorting everything in memory instead of writing out individual chromosome files...
                                Writing all merged methylation calls to temp file test.gz.methylation_calls.merged

                                Finished writing methylation calls from CpG_OB_G10G06.sam.gz.txt to merged temp file
                                Finished writing methylation calls from CpG_OT_G10G06.sam.gz.txt to merged temp file
                                Sorting input file test.gz.methylation_calls.merged by positions (using -S of 2G)
                                Successfully deleted the temporary input file test.gz.methylation_calls.merged

                                Since the error message is actually:
                                sort: open failed: CpG_context_G08F07.txt: No such file or directory

                                Is there a possibility that you gave it the wrong file as input file by accident?

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM
                                • seqadmin
                                  The Impact of AI in Genomic Medicine
                                  by seqadmin



                                  Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                  02-26-2024, 02:07 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 03-14-2024, 06:13 AM
                                0 responses
                                32 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-08-2024, 08:03 AM
                                0 responses
                                71 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-07-2024, 08:13 AM
                                0 responses
                                80 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-06-2024, 09:51 AM
                                0 responses
                                68 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X