  • Originally posted by Kuckunniwi View Post
    Hi,
    27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG 16 chr10 71747 255 51M * 0 0 CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC CG 3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG NM:i:9 MD:Z:2AAA6CA3A2A2A24A3 XM:Z:Z..x........................x..z..h...h.......hhx..
    The alignment line is truncated: the methylation call is not quite long enough, and the line does not contain the information about the read or genome conversion, which is required by the methylation extractor.

    Files can be truncated if the file was copied and the transfer corrupted the files, or potentially if the alignment process was terminated abruptly. This may happen either by pressing ctrl + C or something similar, or if the Bismark align process runs out of memory or gets killed by your OS.
    Normally it is sufficient to run the alignments again (making sure that you have enough system resources available). If you can see the final Bismark alignment report the file should be ready for the methylation extraction. Best, Felix
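A quick way to spot lines like the one quoted above is to check every alignment for the tags the methylation extractor relies on. The following is only a minimal sketch: it assumes Bismark's usual optional tags (XM:Z for the methylation call, XR:Z for the read conversion, XG:Z for the genome conversion), the function name is hypothetical, and the two test lines are made up for illustration.

```python
def check_bismark_sam_line(line):
    """Return a list of problems found in one Bismark SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than the 11 mandatory SAM columns"]

    seq = fields[9]
    # Optional SAM fields have the form TAG:TYPE:VALUE.
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]

    problems = []
    # Tags assumed to be written by Bismark for every alignment.
    for tag in ("XM", "XR", "XG"):
        if tag not in tags:
            problems.append(f"missing {tag} tag")
    # The methylation call string should have one character per read base.
    if "XM" in tags and len(tags["XM"]) != len(seq):
        problems.append(
            f"XM call has {len(tags['XM'])} characters but the read has {len(seq)} bases"
        )
    return problems


# Made-up example lines (not taken from a real file).
complete = "\t".join([
    "read1", "16", "chr10", "71747", "255", "8M", "*", "0", "0",
    "CTAAACAA", "FFFFFFFF", "NM:i:1", "MD:Z:2A5",
    "XM:Z:Z..x....", "XR:Z:CT", "XG:Z:GA",
])
truncated = "\t".join([
    "read2", "16", "chr10", "71747", "255", "8M", "*", "0", "0",
    "CTAAACAA", "FFFFFFFF", "NM:i:1", "MD:Z:2A5",
    "XM:Z:Z..x..",
])

print(check_bismark_sam_line(complete))
print(check_bismark_sam_line(truncated))
```

For a BAM file, the lines would first have to be converted to text (for example with samtools view) before being passed to a function like this.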

    Comment


    • Originally posted by fkrueger View Post
      The alignment line is truncated: the methylation call is not quite long enough, and the line does not contain the information about the read or genome conversion, which is required by the methylation extractor.
      Thanks for the answer! Actually, the file is not truncated: it has many more lines after this one, whereas a truncated file would end exactly at this line. Most probably I was not able to copy the full line from the output of
      Code:
      less -S file.sam
      Also, these files were processed once before. I will check whether I accidentally truncated the line while copying it and will edit this answer...

      UPD: yes, this .bam file does not contain any additional columns. I really wonder how these files were processed before (I have the Bismark report for them!) and how columns, rather than rows, could be truncated... ='(
      Last edited by Kuckunniwi; 04-11-2017, 08:24 AM.

      Comment


      • Hi,

        I have been using Bismark to align RRBS read data to a reference genome. Up until now I have been using the default alignment settings with Bowtie2. However, I want to find out whether these are optimal.

        One way of doing this would be to vary the -L and -N parameters; however, I find it quite confusing to understand what these parameters actually do. The default for --score_min is L,0,-0.2, but I am not sure what varying the two numeric values would actually achieve (despite having read the Bowtie2 page). For instance, what would a more relaxed alignment parameter allowing more mismatches look like?

        Finally, does anyone have any tips on choosing the best alignment results from repeated alignment runs using different parameters?

        Many thanks

        Comment


        • The --score_min function is linear (L). The first number is the offset, i.e. the constant term of the minimum allowed alignment score (the alignment score itself is determined by the number of mismatches, Ns, or insertions and deletions in the read), and the last number is the penalty that is allowed for each base pair in the read. As an example, for a 100bp long read:

          L,0,-0.2 would allow an alignment score of 0 + (100 x -0.2) = -20

          A mismatch counts as -6, so this setting would allow up to 3 mismatches (AS = -18). A read with 4 mismatches (-24) or more would be rejected.

          An insertion of 3 bp somewhere in the read would cost -5 for opening a gap, and -2 for each bp of insertion, so 3 * -2 = -6 here. Thus, a 3bp insertion would count as -11, so you could add another mismatch somewhere in the read, or another smallish InDel if you want to stay under the allowed limit of -20.

          If you were to relax score min to L,0,-0.6, this would allow an alignment score of 0 + (100 x -0.6) = -60

          This would now allow up to 10 (non-bisulfite) mismatches in the 100bp read, or a combination of several mismatches and indels.

          If your RRBS reads were trimmed well (e.g. with Trim Galore in --rrbs mode), you should normally see that the mapping efficiency doesn't change a great deal for score min values between L,0,-0.2 and L,0,-0.6. With even more relaxed settings it might start to increase quite a bit, because you then allow reads to map to locations where they most likely don't belong. If I had to choose between different mapping settings that give pretty much the same mapping efficiency, I would go for the more stringent one because this reduces the chance of introducing mis-alignments into the results.
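The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the mismatch penalty (6) and the gap penalties (5 to open, 2 per inserted base) are simply the values quoted in this post, and the real gap penalties are controlled by Bowtie 2's --rdg and --rfg options, so they should be checked against the defaults of the version in use.

```python
def min_allowed_score(read_length, offset=0.0, per_base=-0.2):
    """Linear score-min function L,<offset>,<per_base> for one read."""
    return offset + per_base * read_length


def alignment_score(mismatches=0, gap_lengths=(),
                    mismatch_penalty=6, gap_open=5, gap_extend=2):
    """Alignment score from mismatches and gaps.

    The penalties default to the values quoted in the post above
    (assumption; check --rdg/--rfg for your Bowtie 2 version).
    """
    score = -mismatch_penalty * mismatches
    for length in gap_lengths:
        score -= gap_open + gap_extend * length
    return score


def is_accepted(score, read_length, offset=0.0, per_base=-0.2):
    """An alignment is kept if its score is not below the minimum."""
    return score >= min_allowed_score(read_length, offset, per_base)


# 100 bp read with the default L,0,-0.2
print(alignment_score(mismatches=3))                    # -18
print(is_accepted(alignment_score(mismatches=3), 100))  # True
print(is_accepted(alignment_score(mismatches=4), 100))  # False

# One 3 bp insertion plus one mismatch: -11 + -6 = -17
combined = alignment_score(mismatches=1, gap_lengths=(3,))
print(combined, is_accepted(combined, 100))             # -17 True
```

Calling is_accepted with per_base=-0.6 reproduces the more relaxed setting, which makes it easy to see which combinations of mismatches and indels start to pass.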

          Comment


              • Excellent, thank you for the very clear explanation

                Comment


                • Originally posted by Ahra View Post
                  Hello. I am a newbie using Bismark. I'm not a bioinformatician and I don't know much about NGS alignment, especially how to run alignment programs. But I was able to use Bismark thanks to the well-written Bismark user guide.

                  I ran Bismark and got the report file and the SAM file. To get the positions of methylated cytosines, I ran bismark_methylation_extractor with the --bedGraph option to see the methylation percentages.

                  According to user guide book,

                  A typical command including the optional bedGraph --counts output could look like this:
                  bismark_methylation_extractor -s --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


                  My data is paired-end, so I edited the command slightly, like this:

                  bismark_methylation_extractor -p --no_overlap --comprehensive --CX --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


                  But I couldn't get the additional information: <chromosome> <start position> <end position> <methylation percentage>

                  My result looked like this:

                  Bismark methylation extractor version v0.7.11
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
                  HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 -

                  How can I get the methylation percentage information and run the subsequent bedGraph2Cytosines step?

                  I am looking forward to anyone's reply.
                  I have the same question. Did you solve your problem? If so, which command should I use to generate the methylation percentage?

                  Comment


                  • A typical command to extract methylation calls and get a coverage file is:

                    Code:
                    bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
                    Do you have any problems running this command, and if so can you post details?

                    Comment


                    • Originally posted by fkrueger View Post
                      A typical command to extract methylation calls and get a coverage file is:

                      Code:
                      bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
                      Do you have any problems running this command, and if so can you post details?
                      Hi, thanks fkrueger! I ran your command. The following is the head of the bedGraph file I got:

                      track type=bedGraph
                      chr11 110189 110190 100
                      chr11 110211 110212 100
                      chr11 113464 113465 100
                      chr11 113508 113509 100
                      chr11 113509 113510 100
                      chr11 113524 113525 100
                      chr11 113525 113526 100
                      chr11 123420 123421 100
                      chr11 123449 123450 100
                      So my question is: do all of the probes really have a methylation percentage of 100%? That seems a little weird...
                      By the way, I have another question: if I have paired samples, can I compare their methylation percentages with Bismark? Thanks!

                      Comment


                      • You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

                        To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

                        Best, Felix

                        Comment


                        • Originally posted by fkrueger View Post
                          You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

                          To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

                          Best, Felix
                          Hi, Felix,
                          Thanks for your quick answer. Here is the head of the coverage file. Does that mean that I should merge the data (get all the information for one SNP) and get the methylation percentage?


                          chr11 110190 110190 100 1 0
                          chr11 110212 110212 100 2 0
                          chr11 113465 113465 100 1 0
                          chr11 113509 113509 100 4 0
                          chr11 113510 113510 100 1 0
                          chr11 113525 113525 100 2 0
                          chr11 113526 113526 100 1 0
                          chr11 123421 123421 100 1 0
                          chr11 123450 123450 100 1 0
                          chr11 123849 123849 100 5 0

                          Comment


                          • I am not quite sure if I understand your question here to be honest.

                            Code:
                            chr11 113509 113509 100 4 0
                            This example line means that for position 113509 on chromosome 11 you had 4 methylation calls in total that were methylated (in the entire dataset), and 0 calls that were unmethylated. This translates into a methylation percentage of 100% at this position (column 4). Also, the positions here are simply cytosines in the genome, not SNPs.

                            Just to remind you, this is the format:

                            Code:
                            The coverage output looks like this (tab-delimited, 1-based genomic coords):
                            ============================================================================================================================================
                            
                            <chromosome>  <start position>  <end position>  <methylation percentage>  <count methylated>  <count non-methylated>
                            I hope this helps.
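To make the point about single calls concrete, here is a minimal sketch that reads coverage lines in the format described above, recomputes the percentage from the two count columns, and keeps only positions with enough calls. The function names and the min_depth threshold are hypothetical choices for illustration and are not part of Bismark; the example lines are taken from the coverage output posted earlier in this thread.

```python
def parse_coverage_line(line):
    """Parse one line of a Bismark coverage file (whitespace/tab-delimited)."""
    chrom, start, _end, _percent, meth, unmeth = line.split()
    meth, unmeth = int(meth), int(unmeth)
    depth = meth + unmeth
    return {
        "chrom": chrom,
        "pos": int(start),  # 1-based genomic coordinate
        "methylated": meth,
        "unmethylated": unmeth,
        "depth": depth,
        # Recompute the percentage from the counts rather than trusting column 4.
        "percent": 100.0 * meth / depth if depth else None,
    }


def filter_by_depth(lines, min_depth=4):
    """Keep positions supported by at least min_depth calls (arbitrary cutoff)."""
    records = (parse_coverage_line(line) for line in lines if line.strip())
    return [rec for rec in records if rec["depth"] >= min_depth]


coverage_lines = [
    "chr11\t110190\t110190\t100\t1\t0",
    "chr11\t110212\t110212\t100\t2\t0",
    "chr11\t113509\t113509\t100\t4\t0",
    "chr11\t123849\t123849\t100\t5\t0",
]

for rec in filter_by_depth(coverage_lines, min_depth=4):
    print(rec["chrom"], rec["pos"], rec["depth"], rec["percent"])
```

With a cutoff like this, positions that were seen only once or twice (and are therefore either 0% or 100% by construction) drop out before any comparison between samples.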

                            Comment


                            • Thank you very much!

                              Comment


                              • Data Analysis with Bismark

                                We have sequenced a genome using Illumina's TruSeq bisulfite sequencing kit. Now that we have the sequences back, we are analyzing the methylation rate using Bismark. I need help with interpreting the results and with the proper way to normalize them.

                                Before sequencing, the sample DNA was divided into 2 groups:
                                1. Group 1 (methylated group): bisulfite treatment was carried out and the DNA was subsequently sequenced.
                                2. Group 2 (control group): the DNA was sequenced without bisulfite treatment.

                                Both groups were sequenced in paired-end fashion.

                                I am using Bismark to analyze the sequences and trying to get the methylation rate in this particular genome. After running Bismark on the methylated (group 1) files I got these final percentages:

                                C methylated in CpG context: 0.6%

                                C methylated in CHG context: 0.5%

                                C methylated in CHH context: 0.7%

                                Whereas after running Bismark on my Control files I got these percentages:

                                C methylated in CpG context: 99.6%

                                C methylated in CHG context: 99.3%

                                C methylated in CHH context: 99.9%

                                So, how would I interpret my data?

                                a. Is 0.6% (CpG) the actual methylation percentage in my genome?

                                b. I have read in the literature that if the CpG, CHG, and CHH percentages are very close, the genome is not actually methylated. Is that true?

                                c. What was the purpose of using the control group (group 2)? Do I still need a spike-in control to normalize the data? If so, what could that be?

                                Thank you very much for reading this long post!!

                                Bests!!!

                                Comment
