  • Bismark release v0.9.0

    We have just released a new version of Bismark (v0.9.0) which most notably introduces a new HTML report to produce a visual summary of the alignment, deduplication and M-bias statistics of a BS-Seq experiment (requires Javascript). Here are examples for a standard paired-end BS-Seq or a single-end PBAT report. It also adds a new Unknown cytosine context for Bowtie 2 alignments for Cs that are very close to Ns or insertions in the reference sequence. Here are all the changes in more detail:

    Bismark: Implemented the new methylation call symbols 'U' and 'u' for methylated or unmethylated cytosines in Unknown sequence context, respectively. If the sequence context bases contain an N, e.g. CN or CHN, the context cannot be determined accurately (previously, these cases were assumed to be in CHH context). These situations may arise whenever the reference sequence contains Ns, or when insertions in the read occur close to a cytosine position (bases inserted into the read have no direct equivalent in the reference sequence and were assumed to be Ns for the methylation call). In practical terms, the 'U/u' methylation calls will only occur for Bowtie 2 alignments because Bowtie 1 supports neither gapped alignments nor read alignments that overlap Ns in the reference. The Bismark report will now also include the 'U/u' statistics, such as count and % methylation, but only when run in Bowtie 2 mode.
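The context rule described above can be sketched in a few lines of Python. This is only an illustration, not Bismark's actual code: the function name `methylation_call` and its arguments are hypothetical, and it assumes the caller passes exactly the two reference bases that follow the cytosine. The symbols Z/z (CpG), X/x (CHG) and H/h (CHH) are the usual Bismark call symbols; U/u is the new Unknown context.

```python
def methylation_call(downstream: str, methylated: bool) -> str:
    """Return a Bismark-style call symbol for one cytosine.

    `downstream` is assumed to hold exactly the two reference bases
    following the C (hypothetical interface, for illustration only).
    """
    if len(downstream) != 2:
        raise ValueError("expected exactly two downstream bases")
    first, second = downstream.upper()

    if first == "G":
        symbol = "z"   # CpG context
    elif first == "N":
        symbol = "u"   # CN: context cannot be determined
    elif second == "N":
        symbol = "u"   # CHN: context cannot be determined
    elif second == "G":
        symbol = "x"   # CHG context
    else:
        symbol = "h"   # CHH context

    # Upper case marks a methylated call, lower case an unmethylated one.
    return symbol.upper() if methylated else symbol
```

For example, `methylation_call("AN", True)` gives `'U'`, whereas before this release such a position would have been reported as CHH.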

    bismark2report: This new module generates a graphical, interactive HTML report of the Bismark alignment, deduplication, splitting and M-bias statistics for convenient visualisation of what is going on. Since the report can draw on several different Bismark modules that may or may not have been run, bismark2report requires the user to specify the relevant reports as input files. Many thanks to Phil Ewels for the conceptual design and his help with this report.

    Bismark: Fixed a bug affecting the generation of the alignment overview pie chart which occurred for PBAT libraries only.

    Methylation Extractor: Added handling of the newly introduced methylation call U/u for cytosines in Unknown sequence context (CN or CHN). These methylation calls are simply ignored in the extraction process to avoid causing confusion in downstream analysis.

    bismark2bedGraph: Added a check to see whether input files start with CpG_* or not. If they don't, please include the option '--CX' when running bismark2bedGraph as a stand-alone tool

    The new bismark2report module appears to be working well with the latest version of Bismark, but I didn’t have the time to investigate how backward-compatible it is. Also, the error handling and documentation might need some further updating. However, as it is the summer holiday period I won’t be able to fix any potential bugs immediately, but comments, criticism and suggestions are very welcome and I will deal with them upon my return.

    Bismark is available for download from https://www.bioinformatics.babraham....jects/bismark/.



    • The new HTML reports are very slick!



      • Hi Felix,

        Very good update, this software is really awesome.

        I have run the deduplication script on a SAM file created with the previous version of Bismark (0.8.3). It works fine and prints to the screen the number of alignments written out of the total analysed. However, the report file is completely empty (0 bytes).

        Any thoughts?

        This is what you can see on screen:
        Code:
        Now printing out alignments with the most representative methylation call(s)
        
        Total number of alignments analysed in BP2_FIS17_trim_galore_1.fq_bismark_bt2_pe.sam: 39944049
        Total number of representative alignments printed from B2FIS17_trim_galore_1.fq_bismark_bt2_pe.sam in total:        38271493 (95.81%)

        Regarding the options, would you recommend running the deduplication script with the option --representative for paired-end WGBS samples?



        Cheers

        PS: One last-minute question: does the --no_overlap option make any sense for directional experiments (only OT and OB)?
        Last edited by oria34; 08-20-2013, 09:11 AM.



        • The --no_overlap option is for paired-end libraries, regardless of whether they are directional or non-directional. Basically, you don't want to "double count" methylation calls from any overlapping sequence of read_1 and read_2 from a paired-end read. The --no_overlap option means that bismark_methylation_extractor will only use the methylation calls from read_1 of a read-pair in the overlapping region.
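As a rough sketch of the idea (not the extractor's actual implementation), the Python below drops every read_2 call that falls inside the reference span already covered by read_1. The function name, the inclusive `(start, end)` span and the `(position, call)` tuples are assumptions made for this illustration.

```python
def drop_overlapping_calls(read1_span, read2_calls):
    """Keep only the read_2 calls that lie outside read_1's aligned span.

    read1_span  -- (start, end) of read_1 on the reference, inclusive
    read2_calls -- list of (reference_position, call_symbol) for read_2
    Both structures are hypothetical and used for illustration only.
    """
    start, end = read1_span
    return [
        (pos, call)
        for pos, call in read2_calls
        if not (start <= pos <= end)  # inside the overlap: read_1 already counted it
    ]
```

With read_1 covering positions 100 to 150, a read_2 call at position 140 is discarded while one at 151 is kept, so each cytosine in the overlap contributes only once per read pair.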



          • Hi oria,

            Regarding the empty deduplication report, I would guess that this is probably a bug in --representative mode. Speaking of which, I would not recommend using --representative mode anyway because it will give you the most highly amplified PCR duplicate for each position and not a random one. I might drop this option entirely in future versions since it only seems to cause confusion. Can you just rerun the deduplication in default mode?
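For readers unfamiliar with position-based deduplication, here is a minimal Python sketch of the general idea. It assumes that duplicates are alignments sharing chromosome, start, end and strand, and that the first alignment encountered at each position is the one kept. Both of these are assumptions for illustration, as are the dictionary field names; this is not the code of the deduplication script itself.

```python
def deduplicate(alignments):
    """Keep the first alignment seen for each position key.

    Each alignment is a dict with the hypothetical fields
    'chrom', 'start', 'end' and 'strand'.
    """
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln["chrom"], aln["start"], aln["end"], aln["strand"])
        if key not in seen:
            seen.add(key)     # remember this position
            kept.append(aln)  # first occurrence is kept, later ones are dropped
    return kept
```

Because the kept alignment depends only on input order, it is effectively an arbitrary member of each duplicate group rather than the most frequent one.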



            • Originally posted by PeteH
              The --no_overlap option is for paired-end libraries, regardless of whether they are directional or non-directional. Basically, you don't want to "double count" methylation calls from any overlapping sequence of read_1 and read_2 from a paired-end read. The --no_overlap option means that bismark_methylation_extractor will only use the methylation calls from read_1 of a read-pair in the overlapping region.

              Well, to my understanding the reads of a pair always map to different strands (always?). Methylation calls from reads mapping to one strand will be informative only for that strand, and therefore the pair will never be informative (actually, Bismark doesn't take those reads into consideration for the methylation call, does it?).

              Please correct me if I am wrong. I am quite new to all this, so any new information is very welcome.



              Originally posted by fkrueger
              Hi oria,

              Regarding the empty deduplication report, I would guess that this is probably a bug in --representative mode. Speaking of which, I would not recommend using --representative mode anyway because it will give you the most highly amplified PCR duplicate for each position and not a random one. I might drop this option entirely in future versions since it only seems to cause confusion. Can you just rerun the deduplication in default mode?

              Yeah, you are right. Rerunning without the option generates the full report.

              Many thanks for the tip about the representative option. That was also my guess, but I wanted to try it anyway just to compare.

              Cheers



              • Originally posted by oria34
                Well, to my understanding the reads of a pair always map to different strands (always?). Methylation calls from reads mapping to one strand will be informative only for that strand, and therefore the pair will never be informative (actually, Bismark doesn't take those reads into consideration for the methylation call, does it?).

                Please correct me if I am wrong. I am quite new to all this, so any new information is very welcome.

                Sort of, read pairs are associated with a real (OT or OB) or theoretical (CTOT or CTOB) strand (remember that both reads arise from the same original template, which represents a single strand), which may or may not be the same as the orientation of the reads. So, if a read pair arises from the original top strand, even the read with a reverse orientation is giving information about the + strand, rather than the - strand.



                • Originally posted by dpryan
                  Sort of, read pairs are associated with a real (OT or OB) or theoretical (CTOT or CTOB) strand (remember that both reads arise from the same original template, which represents a single strand), which may or may not be the same as the orientation of the reads. So, if a read pair arises from the original top strand, even the read with a reverse orientation is giving information about the + strand, rather than the - strand.

                  Many thanks for your reply!

                  Is that also true if your experiment is directional, i.e. if you only have OT and OB mapped reads?



                  • Read 2 is always informative for the alignment strand of read 1, so yes, it will be relevant for any kind of library.
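To make this concrete, the strand a read pair reports on can be read off the XR (read conversion) and XG (genome conversion) tags of read 1. The small Python sketch below encodes that lookup; the function and table names are made up for illustration, and the mapping reflects the usual Bismark tag convention rather than anything stated earlier in this thread.

```python
# (XR of read 1, XG) -> strand the whole pair is informative for
STRAND_BY_TAGS = {
    ("CT", "CT"): "OT",    # original top strand
    ("CT", "GA"): "OB",    # original bottom strand
    ("GA", "CT"): "CTOT",  # complementary to original top
    ("GA", "GA"): "CTOB",  # complementary to original bottom
}


def pair_strand(read1_xr: str, xg: str) -> str:
    """Return the strand assignment of a pair from read 1's XR tag and the XG tag."""
    return STRAND_BY_TAGS[(read1_xr, xg)]
```

In a directional library only `OT` and `OB` should come out of this lookup, and read 2 contributes calls to whichever strand read 1 was assigned to.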



                    • bismark_methylation_extraction sort

                      Hi all,
                      I have a question about sorting the methXtractor.temp files.
                      I may not have enough background with this tool, but I wonder why the sort command in the bismark2bedGraph script
                      open my $ifh, "sort -S $sort_size -T $sort_dir -k3,3 -k4,4n $in |" or die "Input file could not be sorted. $!";
                      has the parameter -k3,3.

                      In each methXtractor.temp file, isn't the third column always the same in all the lines of the file (name of the chromosome)?

                      Specifying only the "-k4,4n" parameter seems to be faster:
                      for a methXtractor.temp file containing 254,882,387 lines, sorting with only "-k4,4n" takes about 100 minutes to complete, whereas the same file sorted with "-k3,3 -k4,4n" is still running after 240 minutes.

                      Am I missing something? What can happen if I delete the "-k3,3" parameter from the sort command line?

                      Thank you for your help

                      Gérald
                      My full command line:
                      bismark_methylation_extractor file.bam --paired-end --no_overlap --report --bedGraph --counts --cytosine_report --zero_based --CX_context --buffer_size 24G --genome_folder /save/bismarkGenome/gg4/ -o /work/gg/
                      Last edited by gerald2545; 08-23-2013, 03:26 AM.



                      • Sorting by chromosome in addition to the position might indeed be a relic of former versions of the script, back when files weren't split into individual chromosome files. I'll take a look at this once I am back, but for the moment you should be fine just deleting the -k3,3 from the sort command.
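A small Python sketch can illustrate why the chromosome key is redundant once every line of a file shares the same chromosome. It assumes tab-separated lines with the chromosome in column 3 and the position in column 4, as described in the question; the function names are hypothetical, and Python's stable `sorted` may order ties differently from GNU sort.

```python
def sort_by_position(lines):
    """Sort tab-separated lines numerically by column 4 (position) only."""
    return sorted(lines, key=lambda line: int(line.split("\t")[3]))


def sort_by_chrom_and_position(lines):
    """Sort by column 3 (chromosome) first, then numerically by column 4."""
    def key(line):
        fields = line.split("\t")
        return (fields[2], int(fields[3]))
    return sorted(lines, key=key)


# Toy input: every line is on the same chromosome, as in a per-chromosome temp file.
example = [
    "read3\t+\tchr1\t300\tZ",
    "read1\t-\tchr1\t25\tz",
    "read2\t+\tchr1\t120\tz",
]
```

For `example`, both functions return the lines in the order 25, 120, 300, so dropping the chromosome key changes the work done but not the result.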



                        • Thank you for your time, Felix, but don't forget: you are on holiday!

                          Gérald



                          • Just a follow-up: the same file took 1500 minutes to sort with the -k3,3 parameter.

                            gerald



                            • Seems like a point worth addressing... But now I shall focus on my holiday!



                              • Hi all,

                                Running the latest version of Bismark and looking at the read names (we discussed this a while ago here), I have found that the names of the pairs no longer end in /1 & /2. In my case both members of a pair are named "...../1" & "....../1".

                                I know it doesn't matter too much since Bismark does the methylation call properly, but I was wondering whether it could interfere with other downstream applications or genome viewers.



                                Left alignment
                                ----------------------
                                Read name = FCD1LHLACXX:8:2308:5026:30317#ACCAGACT/1
                                Location = groupXXI:767
                                Alignment start = 756 (+)
                                Cigar = 99M
                                Mapped = yes
                                Mapping quality = 255
                                ----------------------
                                Base = A
                                Base phred quality = 39
                                ----------------------
                                Pair start = groupXXI:900 (-)
                                Pair is mapped = yes
                                Insert size = 242
                                Pair orientation = F2R1
                                ----------------------
                                Second in pair
                                -------------------
                                XG = GA
                                NM = 16
                                XM = ...........x.............h..........x...xh......hh..xh..........x.h...x........h..x.....x.h........
                                XR = GA
                                XX = 11G13G10G3GG6GG2GG10G1G3G8G2G5G1G8
                                -------------------
                                Right alignment
                                ----------------------
                                Read name = FCD1LHLACXX:8:2308:5026:30317#ACCAGACT/1
                                Location = groupXXI:767
                                Alignment start = 900 (-)
                                Cigar = 98M
                                Mapped = yes
                                Mapping quality = 255
                                ----------------------
                                ----------------------
                                Pair start = groupXXI:756 (+)
                                Pair is mapped = yes
                                Insert size = -242
                                Pair orientation = F2R1
                                ----------------------
                                First in pair
                                -------------------
                                XG = GA
                                NM = 19
                                XM = ........x..hh..x.Z..xh....x.....xh.....h.........Z...xh......x............xh..xh.......x.....Z....
                                XR = CT
                                XX = 8G2GG2G4GG4G5GG5G13GG6G12GG1TGG7G10
                                -------------------
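One way to check for this kind of mismatch is to compare each read name's suffix with the mate information in the SAM FLAG field, where bit 0x40 marks the first read of a pair and bit 0x80 the second. The Python sketch below does that; the function names are hypothetical, and the flag values 83 and 163 in the usage note are assumed for illustration since the actual flags are not shown above.

```python
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template


def expected_suffix(flag: int) -> str:
    """Return the /1 or /2 suffix implied by the SAM FLAG, or '' if unpaired."""
    if flag & FIRST_IN_PAIR:
        return "/1"
    if flag & SECOND_IN_PAIR:
        return "/2"
    return ""


def suffix_matches_flag(read_name: str, flag: int) -> bool:
    """True if the read name ends with the suffix its FLAG implies."""
    suffix = expected_suffix(flag)
    return suffix == "" or read_name.endswith(suffix)
```

With the read name shown above, `suffix_matches_flag(name, 83)` is true for a first-in-pair flag, while `suffix_matches_flag(name, 163)` is false for a second-in-pair flag, which is the situation described in this post.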
