  • fkrueger
    Senior Member
    • Sep 2009
    • 627

    Originally posted by Kuckunniwi View Post
    Hi,
    27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG 16 chr10 71747 255 51M * 0 0 CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC CG 3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG NM:i:9 MD:Z:2AAA6CA3A2A2A24A3 XM:Z:Z..x........................x..z..h...h.......hhx..
    The alignment line is truncated, the methylation call is not quite long enough and the read does not contain information about the read or genome conversion which is required by the methylation extractor.

    Files can be truncated if the file was copied and the transfer corrupted the files, or potentially if the alignment process was terminated abruptly. This may happen either by pressing ctrl + C or something similar, or if the Bismark align process runs out of memory or gets killed by your OS.
    Normally it is sufficient to run the alignments again (making sure that you have enough system resources available). If you can see the final Bismark alignment report the file should be ready for the methylation extraction. Best, Felix
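A minimal sketch of how one might screen a SAM file for the problem described above before running the methylation extractor. It assumes tab-delimited SAM records and that Bismark stores the methylation call string in the XM:Z: tag and the read and genome conversion in the XR:Z: and XG:Z: tags; the function name `check_bismark_sam_line` is made up for this illustration and is not part of Bismark.

```python
def check_bismark_sam_line(line):
    """Return a list of problems found in one SAM record (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than the 11 mandatory SAM columns"]

    seq = fields[9]  # column 10 holds the read sequence

    # Optional tags follow the 11 mandatory columns, e.g. "XM:Z:..h.."
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]

    problems = []
    # Assumption: these three tags are what the extractor needs per record.
    for tag in ("XM", "XR", "XG"):
        if tag not in tags:
            problems.append(f"missing {tag} tag")

    # The call string should have one character per base of the read.
    if "XM" in tags and len(tags["XM"]) != len(seq):
        problems.append(
            f"XM call string has {len(tags['XM'])} characters "
            f"but the read has {len(seq)} bases"
        )
    return problems
```

Running this over every non-header line and printing the first record with a non-empty result should show whether single lines, rather than the end of the file, are affected.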

    Comment

    • Kuckunniwi
      Junior Member
      • Apr 2017
      • 3

      Originally posted by fkrueger View Post
      The alignment line is truncated, the methylation call is not quite long enough and the read does not contain information about the read or genome conversion which is required by the methylation extractor.
      Thanks for the answer! Actually, the file is not truncated (it has many more lines, whereas a truncated file would end exactly at this line). Most probably I was not able to copy the full line from the output of
      Code:
      less -S file.sam
      . Also, these files were processed once before. I will check whether I accidentally truncated the line while copying it and will edit this answer...

      UPD: yes, this .bam file does not contain any additional columns. I really wonder how the files were processed before (I have a Bismark report for them!) and how columns, rather than rows, can get truncated... ='(
      Last edited by Kuckunniwi; 04-11-2017, 08:24 AM.

      Comment

      • pig_raffles
        Member
        • Feb 2012
        • 23

        Hi,

        I have been using Bismark to align RRBS read data to a reference genome. Up until now I have been using the default alignment settings with Bowtie2. However, I want to find out whether these are optimal.

        One way of doing this would be to vary the -L and -N parameters. However, I find it quite confusing to work out what these parameters actually do. The default for -L is L,0,-0.2, but I am not sure what varying the two numeric values would actually achieve (despite having read the Bowtie2 page). For instance, what would a more relaxed alignment parameter allowing more mismatches look like?

        Finally, does anyone have any tips on choosing the best alignment results from repeated alignment runs using different parameters?

        Many thanks

        Comment

        • fkrueger
          Senior Member
          • Sep 2009
          • 627

          The score min function is linear (L). The first number is the offset, a constant added to the threshold, and the last number is the penalty allowed per base pair of the read. The alignment score itself is determined by the number of mismatches, Ns, and insertions or deletions in the read. As an example, for a 100bp long read:

          L,0,-0.2 would allow an alignment score of 0 + (100 x -0.2) = -20

          A mismatch counts as -6, so this setting would allow up to 3 mismatches (AS = -18). A read with 4 mismatches (-24) or more would be rejected.

          An insertion of 3 bp somewhere in the read would cost -5 for opening a gap, and -2 for each bp of insertion, so 3 * -2 = -6 here. Thus, a 3bp insertion would count as -11, so you could add another mismatch somewhere in the read, or another smallish InDel if you want to stay under the allowed limit of -20.

          If you were to relax score min to L,0,-0.6, this would allow an alignment score of 0 + (100 x -0.6) = -60

          This would now allow up to 10 (non-bisulfite) mismatches in the 100bp read, or a combination of several mismatches and indels.

          If your RRBS reads were trimmed well (e.g. with Trim Galore in --rrbs mode) then you should normally see that the mapping efficiency doesn't change a great deal for score min scores of -0.2 to -0.6, but then it might start to increase quite a bit because you allow reads to map to locations where the read most likely doesn't belong. If I had to choose between pretty much the same mapping efficiency for different mapping scenarios I would go for the more stringent one because this reduces the chances of introducing mis-alignments to the results.
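The arithmetic above can be reproduced with a small sketch. The penalties used here (6 per mismatch, 5 for opening a gap, 2 per gap base) are simply the values quoted in this post, not read from Bowtie 2 itself; the Bowtie 2 manual lists its own defaults for the gap options, so check them for your version. All function names are hypothetical.

```python
import math


def min_score(read_length, offset=0.0, slope=-0.2):
    """Evaluate the linear score min function L,<offset>,<slope> for one read."""
    return offset + slope * read_length


def alignment_score(mismatches=0, gap_lengths=(), mismatch_penalty=6,
                    gap_open=5, gap_extend=2):
    """Alignment score for a read; penalties are the values quoted in the post."""
    penalty = mismatches * mismatch_penalty
    for n in gap_lengths:
        penalty += gap_open + n * gap_extend
    return -penalty


def max_mismatches(read_length, offset=0.0, slope=-0.2, mismatch_penalty=6):
    """Largest number of mismatches that still meets the minimum score."""
    budget = -min_score(read_length, offset, slope)
    # Small epsilon guards against floating-point rounding just below an integer.
    return math.floor(budget / mismatch_penalty + 1e-9)


def is_accepted(score, read_length, offset=0.0, slope=-0.2):
    """A read is kept if its score is not below the minimum score."""
    return score >= min_score(read_length, offset, slope)


print(max_mismatches(100))              # L,0,-0.2
print(max_mismatches(100, slope=-0.6))  # L,0,-0.6
print(is_accepted(alignment_score(mismatches=1, gap_lengths=[3]), 100))
```

Swapping in other read lengths or slopes gives a quick feel for how much a given score min setting relaxes the alignment before running Bismark again.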

          Comment

              • pig_raffles
                Member
                • Feb 2012
                • 23

                Excellent, thank you for the very clear explanation

                Comment

                • twotwo
                  Member
                  • May 2012
                  • 40

                  Originally posted by Ahra View Post
                  Hello. I am a newbie using Bismark. I'm not a bioinformatician and I don't know much about NGS alignment, especially how to run alignment programs. But I was able to use Bismark thanks to the well-made Bismark user guide.

                  I ran Bismark and got the report file and the SAM file. To get the positions of the methylated Cs, I then ran bismark_methylation_extractor with the --bedGraph option to see the methylation %.

                  According to user guide book,

                  A typical command including the optional bedGraph --counts output could look like this:
                  bismark_methylation_extractor -s --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


                  My data is paired-end, so I edited that command a little, like this:

                  bismark_methylation_extractor -p --no_overlap --comprehensive --CX --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


                  But I couldn't get the additional information about <chromosome> <start position> <end position> <methylation percentage>.

                  My result looked like this:

                  Bismark methylation extractor version v0.7.11
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 -

                  How can I get the methylation percentage information and run the further bedGraph2Cytosines process?

                  I am looking forward to anyone's reply.
                  I have the same question. Did you solve your problem? If so, which command should I use to generate the methylation percentage?

                  Comment

                  • fkrueger
                    Senior Member
                    • Sep 2009
                    • 627

                    A typical command to extract methylation calls and get a coverage file is:

                    Code:
                    bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
                    Do you have any problems running this command, and if so can you post details?

                    Comment

                    • twotwo
                      Member
                      • May 2012
                      • 40

                      Originally posted by fkrueger View Post
                      A typical command to extract methylation calls and get a coverage file is:

                      Code:
                      bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
                      Do you have any problems running this command, and if so can you post details?
                      Hi, thanks fkrueger! I ran your code. The following is the head of the bedGraph file I got:

                      track type=bedGraph
                      chr11 110189 110190 100
                      chr11 110211 110212 100
                      chr11 113464 113465 100
                      chr11 113508 113509 100
                      chr11 113509 113510 100
                      chr11 113524 113525 100
                      chr11 113525 113526 100
                      chr11 123420 123421 100
                      chr11 123449 123450 100
                      So my question is: do all of the probes really have a methylation percentage of 100%? That seems a little weird...
                      By the way, I have another question: if I have paired samples, can I compare the methylation percentages with Bismark? Thanks!

                      Comment

                      • fkrueger
                        Senior Member
                        • Sep 2009
                        • 627

                        You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

                        To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

                        Best, Felix

                        Comment

                        • twotwo
                          Member
                          • May 2012
                          • 40

                          Originally posted by fkrueger View Post
                          You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

                          To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

                          Best, Felix
                          Hi, Felix,
                          Thanks for your quick answer. Here is the head of the coverage file. Does that mean that I should merge the data (get all the information for one SNP) and get the methylation percentage?


                          chr11 110190 110190 100 1 0
                          chr11 110212 110212 100 2 0
                          chr11 113465 113465 100 1 0
                          chr11 113509 113509 100 4 0
                          chr11 113510 113510 100 1 0
                          chr11 113525 113525 100 2 0
                          chr11 113526 113526 100 1 0
                          chr11 123421 123421 100 1 0
                          chr11 123450 123450 100 1 0
                          chr11 123849 123849 100 5 0

                          Comment

                          • fkrueger
                            Senior Member
                            • Sep 2009
                            • 627

                            I am not quite sure if I understand your question here to be honest.

                            Code:
                            chr11 113509 113509 100 4 0
                            This example line means that for position 113509 on chromosome 11 you had 4 calls in total that were methylated (in the entire dataset) and 0 calls that were unmethylated. This translates into a methylation percentage of 100% at this position (column 4). Also, the positions here are simply cytosines in the genome, not SNPs.

                            Just to remind you, this is the format:

                            Code:
                            The coverage output looks like this (tab-delimited, 1-based genomic coords):
                            ============================================================================================================================================
                            
                            <chromosome>  <start position>  <end position>  <methylation percentage>  <count methylated>  <count non-methylated>
                            I hope this helps.
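To make the column meanings concrete, here is a minimal sketch that parses lines in the six-column coverage format described above and recomputes the percentage from the two count columns. The function names and the `min_calls` threshold are hypothetical choices for illustration; Bismark itself does not prescribe a minimum depth here.

```python
def parse_coverage_line(line):
    """Split one coverage line into named, typed fields."""
    chrom, start, end, percent, meth, unmeth = line.split()
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "percent": float(percent),
        "meth": int(meth),
        "unmeth": int(unmeth),
    }


def methylation_percentage(meth, unmeth):
    """Percentage of calls at one position that were methylated."""
    total = meth + unmeth
    if total == 0:
        raise ValueError("no calls at this position")
    return 100.0 * meth / total


def filter_by_depth(lines, min_calls=4):
    """Keep only positions supported by at least min_calls calls (assumed cutoff)."""
    kept = []
    for line in lines:
        rec = parse_coverage_line(line)
        if rec["meth"] + rec["unmeth"] >= min_calls:
            kept.append(rec)
    return kept


# Lines taken from the coverage file posted in this thread.
example = [
    "chr11 110190 110190 100 1 0",
    "chr11 113509 113509 100 4 0",
    "chr11 123849 123849 100 5 0",
]
for rec in filter_by_depth(example):
    print(rec["chrom"], rec["start"], methylation_percentage(rec["meth"], rec["unmeth"]))
```

A position with a single call can only ever show 0% or 100%, which is why filtering on the number of calls before interpreting the percentages is usually worthwhile.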

                            Comment

                            • twotwo
                              Member
                              • May 2012
                              • 40

                              Thank you very much!

                              Comment

                              • Juulluu21
                                Junior Member
                                • Jun 2016
                                • 6

                                Data Analysis with Bismark

                                We have sequenced a genome using Illumina's TruSeq bisulfite sequencing kit. Now that we have the sequences back, we are analyzing the methylation rate using Bismark. I need help with interpreting the results and with the proper way of normalizing them.

                                Before sequencing, the sample DNA was divided into 2 groups: 1. Bisulfite treatment was carried out and the DNA was subsequently sequenced (group 1, methylated group). 2. The DNA was sequenced without bisulfite treatment (group 2, control group).

                                Both groups were sequenced in paired-end fashion.

                                I am using Bismark to analyze the sequences and am trying to get the methylation rate in this particular genome. After running Bismark on the methylated files I got these final percentages:

                                C methylated in CpG context: 0.6%

                                C methylated in CHG context: 0.5%

                                C methylated in CHH context: 0.7%

                                Whereas after running Bismark on my Control files I got these percentages:

                                C methylated in CpG context: 99.6%

                                C methylated in CHG context: 99.3%

                                C methylated in CHH context: 99.9%

                                So, how would I interpret my data?

                                a. Is 0.6 % (CpG) the actual methylation percentage in my genome?

                                b. I have read in the literature that if the CpG, CHG, and CHH percentages are very close, the genome does not actually carry out methylation. Is that true?

                                c. What was the purpose of using the control group (group 2)? Do I still need a spike-in control to normalize the data? If so, what could that be?

                                Thank you very much for reading this long post!!

                                Bests!!!

                                Comment
