  • #91
    Hi Felix

    thank you for your software.
    I want to use Bismark to align paired-end bisulfite sequences from a mammalian genome. Which option can I use to run it in parallel threads? I checked -p, but it didn't work.

    best regards
    jamal



    • #92
      Hi Jamal,

      I have tried to implement a -p option, but unfortunately it is technically not so straightforward to get this to work. The easiest way to parallelize Bismark right now is to split the input file(s) into smaller subfiles and merge the output again afterwards. (Just as a side note, with appropriate alignment parameters 1 thread of Bismark should align roughly 10-20 M sequences per hour.)

      I am sorry that the parallelization is currently not easier, maybe I should work on implementing a similar strategy (splitting files internally and merging the output again) for a future release. If you have any questions regarding parameters just send me an email.
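The split step suggested above could look roughly like the following. This is only a minimal sketch, not part of Bismark: the function name `split_fastq` and the `.partNNNN` chunk naming are made up for illustration, and it assumes the usual 4-lines-per-read FASTQ layout.

```python
from itertools import islice
from pathlib import Path


def split_fastq(path, reads_per_chunk, out_dir):
    """Split a FASTQ file into chunks of at most `reads_per_chunk` reads.

    Assumes exactly 4 lines per read. Returns the chunk paths in order.
    (Hypothetical helper, not a Bismark script.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).name
    chunks = []
    with open(path) as handle:
        while True:
            # Read the next block of whole reads (4 lines each).
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk_path = out_dir / f"{stem}.part{len(chunks):04d}"
            chunk_path.write_text("".join(lines))
            chunks.append(chunk_path)
    return chunks
```

For paired-end data, calling it on both files with the same `reads_per_chunk` keeps chunk N of file 1 in step with chunk N of file 2, provided both files list the reads in the same order.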

      Best wishes,
      Felix



      • #93
        Why not -p option for Bismark

        I had the same query, and Felix, the Bismark developer, gave an explanation about this. The details are as follows:
        ++++++++++++++++++++++++++++++++++++++++++++++
        Let's say you give each of the bowtie threads 4 CPUs instead of 1. This could practically mean that
        CPU1 gets reads 1-5,
        CPU2 gets reads 6-10,
        CPU3 gets reads 11-15, and
        CPU4 gets reads 16-20.

        Now if for whatever reason read 16 aligns first (maybe a perfect match) it would get reported as the first alignment of the 'thread 1' Bowtie instance, which practically means that all other reads (1-15) would be scored as having no alignment. It is the randomness of Bowtie's reporting that does not allow specifying a higher -p option; I have tried it already.

        If you have spare capacity on your computer you can split up your input files though, and run multiple instances of Bismark at the same time and merge them again afterwards. Just keep in mind that running Bismark once will use 5 CPU threads and, for e.g. the human genome, 8 GB of RAM. Make sure that you do not run more instances of Bismark than you have RAM available for, as this might cause some threads of Bowtie to die due to insufficient memory, which will ultimately produce wrong alignment results.
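As a rough back-of-the-envelope check, the number of instances that fit on a machine can be estimated from the figures quoted above (5 CPU threads and 8 GB of RAM per run for the human genome). The function below is a hypothetical helper, not something Bismark provides, and it leaves no headroom for the operating system or other jobs.

```python
def max_bismark_instances(total_cpus, total_ram_gb,
                          cpus_per_run=5, ram_gb_per_run=8):
    """Estimate how many Bismark runs fit on one machine.

    Defaults are the per-run figures quoted in the post (human genome);
    other genomes will need a different RAM figure.
    """
    by_cpu = total_cpus // cpus_per_run
    by_ram = int(total_ram_gb // ram_gb_per_run)
    # The tighter of the two limits decides.
    return max(0, min(by_cpu, by_ram))


# Example: 16 cores and 32 GB RAM are limited by CPUs (16 // 5 = 3).
print(max_bismark_instances(total_cpus=16, total_ram_gb=32))  # 3
```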

        I hope I explained it better this time,



        • #94
          Just a quick note that Bismark v0.5.1 has just been released, which fixes a few minor issues:

          - The genome folder for the bismark_genome_preparation can now be specified either as absolute or relative path.
          - Fixed a bug where a newline character was missing after the quality values in the unmapped reads FastQ output.
          - Fixed a bug which prevented paired-end alignments in FastA format.
          - Input files for the methylation extractor can now also have a relative path.
          - The bismark2bedGraph script received an update to fix a 1-off error.

          All associated files can be found at http://www.bioinformatics.bbsrc.ac.uk/projects/bismark/ (Ctrl+Refresh might be needed to force a cache update).

          Felix



          • #95
            Question about calculating %methylated from bismark data...

            Hi Felix,

            I want to calculate the ratio of methylated CpGs to total CpGs for a given stretch of DNA.
            My Bismark data is non-directional, thus it is my understanding from the manual that
            Cs methylated on the bottom strand will appear as methylated Gs on the top strand.

            So if a given stretch of DNA has 10 CpGs on the top strand, and Bismark reports
            let's say 5 positions as methylated, is the ratio of methylated CpGs 5/10, or 5/20?

            This example has 10 CpGs on the top strand, which means it has 10 on the bottom
            strand as well, for a total of 20. Again, since Bismark is reporting data from
            both strands, should I include both strands in my 'total CpGs' calculation?

            Thank you!

            Larry Wilhelm



            • #96
              Hi Larry,

              Thanks for this confusing example. Just to quickly summarise for your example with 10 CpGs on the top strand:

              Cs on the original top strand (OT) and also Gs at the same position on the complementary strand (CTOT) will report the methylation status of the original top strand.

              Likewise, Cs on the original bottom strand (OB) and Gs on its complementary strand (CTOB) will report the methylation of the original bottom strand.

              It is not guaranteed that you will see reads from both OT or CTOT and OB or CTOB in the same sequencing experiment, so you can end up having methylation information for one strand only. Thus, I think you can't simply use the total number of Cs present on both strands and calculate the methylation ratio as, in your example, 5 out of 20 CpGs found methylated, thus 25% methylation. Rather, you will have to take the number of Cs found methylated - per strand - and divide this number by the total number of Cs you actually observed in this region. Then you can compare the methylation on both strands to see if you have the same level of methylation on both strands or whether they differ from each other, and average over both of them.

              In its internal calculation, Bismark just divides the number of methylated Cs by (methylated+unmethylated) Cs observed to get a percentage methylation.
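The calculation described above, methylated / (methylated + unmethylated) over the Cs actually observed, can be sketched as follows. This is not Bismark's own code; the function name is made up, and it assumes calls are given as the letters `Z` (methylated C in CpG context) and `z` (unmethylated C in CpG context).

```python
def percent_methylation(calls):
    """Percentage methylation over observed CpG calls.

    `calls` is any iterable of single-letter calls; only 'Z' (methylated)
    and 'z' (unmethylated) are counted. Returns None if nothing was observed.
    """
    calls = list(calls)
    methylated = sum(1 for c in calls if c == "Z")
    unmethylated = sum(1 for c in calls if c == "z")
    observed = methylated + unmethylated
    if observed == 0:
        return None  # no coverage, so no meaningful percentage
    return 100.0 * methylated / observed
```

Positions without any read coverage never enter the denominator, which is why the result is a percentage of observed Cs rather than of all Cs in the region.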

              I hope this doesn't add to the confusion

              Felix



              • #97
                Follow up: Question about calculating %methylated from Bismark data...

                Thanks Felix, very helpful.

                You have reduced confusion. I have added to it because I was mistaken, my library is directional.

                Here's some real data. I've listed the lines from the CpG_context output of the
                Methylation Extractor that correspond to a small region of interest. I've appended the output with
                two fields, the base on the top strand at that position, and my interpretation of the
                bismark output:

                read_id:2:82:3541:4552#0 - 1 1343509 z : C : un-methylated C on top strand
                read_id:2:82:3541:4552#0 + 1 1343515 Z : C : methylated C on top strand
                read_id:2:82:3541:4552#0 + 1 1343524 Z : C : methylated C on top strand
                read_id:2:82:3541:4552#0 + 1 1343526 Z : C : methylated C on top strand
                read_id:2:82:3541:4552#0 + 1 1343528 Z : C : methylated C on top strand
                read_id:2:82:3541:4552#0 - 1 1343570 z : C : un-methylated C on top strand
                read_id:2:82:3541:4552#0 - 1 1343561 z : C : un-methylated C on top strand
                read_id:2:90:16977:1371#0 - 1 1343619 z : G : un-methylated C on bottom strand
                read_id:2:90:16977:1371#0 + 1 1343562 Z : G : methylated C on bottom strand
                read_id:2:90:16977:1371#0 + 1 1343571 Z : G : methylated C on bottom strand

                CpGs on top strand = 11 (counted from the sequence itself)
                CpGs for which we have alignment data = 10

                mCs_top_strand = 4 (methylated CpGs on top strand)
                un-mCs_top_strand = 3 (un-methylated CpGs on top strand)

                mCs_bottom_strand = 2
                un-mCs_bottom_strand = 1

                %mCs_top_strand = 4/7 = 57%
                %mCs_bottom_strand = 2/3 = 67%

                Is this all a reasonable interpretation?

                We do not have anything near saturating coverage in these experiments, so just
                reporting the number of Cs methylated as a fraction of the total for which we
                have alignment data seems appropriate.

                Thanks again.

                Larry



                • #98
                  Hi Larry,

                  This looks reasonable to me for a strand-specific methylation rate. To get the total methylation state it would then just be (methylated Cs top strand + methylated Cs bottom strand) / (all Cs top strand + all Cs bottom strand) * 100 = % methylation.
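Plugging Larry's counts into this formula (top strand: 4 methylated, 3 unmethylated; bottom strand: 2 methylated, 1 unmethylated) gives a quick worked example. The helper name below is made up for illustration.

```python
def pooled_percent_methylation(strand_counts):
    """Total % methylation from per-strand (methylated, unmethylated) counts."""
    strand_counts = list(strand_counts)
    methylated = sum(m for m, _ in strand_counts)
    observed = sum(m + u for m, u in strand_counts)
    if observed == 0:
        return None  # nothing observed on either strand
    return 100.0 * methylated / observed


# Counts from the post above: top strand (4, 3), bottom strand (2, 1).
print(pooled_percent_methylation([(4, 3), (2, 1)]))  # 60.0
```

Note that this pooled value (6/10 = 60%) weights each strand by the number of Cs observed on it, so it is not the same as the plain mean of the two per-strand percentages (57% and 67%).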

                  Best,
                  Felix



                  • #99
                    Dear all,

                    I wanted to let you know that there is now a new version of the bismark2SAM conversion script (v4), which is available for download here (force a browser refresh if needed (Ctrl+refresh)).
                    Previous versions reported slightly wrong alignments for paired-end or non-directional libraries.

                    The new version should now report correct alignments for all kinds of sequencing libraries (SE and PE, directional and non-directional). Also, sorting of SAM files by chromosome and start position is now optional, as files can be sorted by Samtools sort later on.

                    I also advise users to no longer use the bismark2BAM script, as it contains the same bugs as the older bismark2SAM script(s), but rather to use the following workflow for data analysis via the SAM/BAM and/or pileup route:

                    - generate bismark alignments
                    - bismark2SAM_v4 to convert to SAM format
                    - samtools view to convert SAM to BAM format
                    - samtools sort to sort the BAM file
                    - use samtools pileup for pileup files
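The samtools part of this workflow could be scripted roughly as below. This is a sketch under several assumptions: it uses the option syntax of current samtools 1.x releases (which differs from the samtools versions available when this was posted), it substitutes `mpileup` for the `pileup` command, which later samtools releases no longer provide, and it leaves out the bismark2SAM_v4 step because its command line is not shown in this thread. The file naming is made up, and the commands are only built here, not run.

```python
import shlex


def samtools_commands(sam_path, reference_fasta):
    """Build (but do not run) the SAM -> BAM -> sorted BAM -> pileup commands.

    Assumes samtools 1.x option syntax; output names are illustrative only.
    """
    stem = sam_path[:-4] if sam_path.endswith(".sam") else sam_path
    bam = stem + ".bam"
    sorted_bam = stem + ".sorted.bam"
    return [
        # Convert SAM to BAM.
        ["samtools", "view", "-b", "-o", bam, sam_path],
        # Sort the BAM file by coordinate.
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Pileup against the reference (mpileup stands in for pileup).
        ["samtools", "mpileup", "-f", reference_fasta, sorted_bam],
    ]


for cmd in samtools_commands("sample.sam", "genome.fa"):
    print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the paths have been checked; the mpileup output goes to standard output unless redirected.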

                    Please let me know if there are any problems with the new conversion script,

                    Many thanks,
                    Felix



                    • Bismark version 0.5.2 was just released. It fixes a bug with the methylation extractor when the option --ignore was specified and adds some convenience-related features. Also, it has been brought to my attention that tab characters in the read ID field cause Bowtie to truncate the read ID (this does not happen with spaces), which causes internal checks to fail. Bismark will now produce a warning if there are tab characters in the read ID, but since the dataset this was tried on was >4 years old and I have since not seen any read IDs containing tabs, we'll leave it up to the user to remove tabs in the input files before running Bismark.

                      The changes include:

                      - Increased the 'chunkmbs' default value to 256 MB (up from 64 MB)
                      - Bismark will now accept input files in both comma and space separated format
                      - Fixed a bug in the methylation extractor which resulted in offset positions for reverse reads when the option '--ignore' was used (single-end only)
                      - Included a check (and warning) whether the read IDs in the input files contain tab characters, as this will cause Bowtie to truncate the reads and result in no alignments
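Since removing tabs is left to the user, a quick pre-check of the input file might look like this. It is a minimal sketch, not Bismark's own check: the function name is made up, and it assumes a plain (uncompressed) FASTQ file with 4 lines per read.

```python
def read_ids_with_tabs(fastq_path):
    """Return the read-ID lines of a FASTQ file that contain tab characters.

    Assumes 4 lines per read, so every 4th line (starting with the first)
    is a read-ID header.
    """
    offenders = []
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and "\t" in line:
                offenders.append(line.rstrip("\n"))
    return offenders
```

An empty list means no read IDs need fixing; otherwise the tabs could be replaced (for example with underscores) in the header lines before running Bismark.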


                      The Bismark download is available from the Bismark project page.



                      • We have just released Bismark version 0.5.3. This fixes some bugs in Bismark and the bismark2SAM and bismark2bedGraph conversion scripts.

                        The changes include:
                        - Increased the 'chunkmbs' default value to 512 MB (up from 256 MB)
                        - Corrected a mix-up of the strand names of the complementary strands in the alignment report for single-end alignments (see release notes)
                        - Fixed a bug in the genome_methylation_bismark2bedGraph script that was introduced during the 1-based (Bismark) to 0-based (bedGraph) coordinate adaptation in June 2011. Thanks to M.A. Bentley for his contributions to the new version.
                        - Improved the bismark2SAM script to more accurately describe the origin of a bisulfite strand in the bitwise FLAG field. Thanks to E. Vidal for his contributions to the new version.

                        All files are available from the Bismark project page.



                        • We have just released Bismark version 0.5.4. This version introduces some convenient new features and fixes some minor flaws in the ambiguous read and paired-end alignment report outputs.

                          The changes include:
                          - Bismark will now accept input files in either normal (uncompressed) or gzipped format
                          - Added the option -o/--output_dir <dir> to Bismark which lets you specify the folder for all Bismark output files instead of writing into the same folder as the input file(s). If the output directory does not exist already it will be created first
                          - The path to the genome folder can now be absolute or relative (e.g. ../genomes/mouse/)
                          - Changed the way unmapped or ambiguous reads are reported so that one output file (and/or ambiguous read file) is generated per input file. Their name will be derived from the input file name. For paired-end samples, the unmapped or ambiguous filenames can be discriminated by _1 and _2 in their file names
                          - Added the number of sequences analysed in total to the paired-end report file (was only printed on screen previously)
                          - Fixed a bug for the FastQ output for ambiguous reads where quality scores were not followed by a new line

                          I would also like to stress that Bismark still requires the 'old' version of Bowtie (which is now called Bowtie 1) and does not (yet) work with Bowtie 2 which was released only yesterday.

                          All files are available from the Bismark project page.



                          • Hi Guys

                            I was trying to work with Bismark for paired-end analysis; this is the command line I use:

                            ./bismark --path_to_bowtie /home/rini/bismark/bowtie-0.12.7 -q /home/rini/bismark/bowtie-0.12.7/genomes/ --fastq -1 /media/3TBpt1/bismark_intermediate_results/1725-SB-5_1_sequence.fastq -2 /media/3TBpt1/bismark_intermediate_results/1725-SB-5_2_sequence.fastq -o /home/rini/bismark/bismark_v0.5.4/


                            Now each time I get the same error:
                            Failed to write to /media/3TBpt1/bismark_intermediate_results/1725-SB-5_1_sequence.fastq_bismark_pe.txt: No such file or directory at ./bismark line 198, <$__ANONIO__> line 2.

                            But I am not trying to write to /media/3TBpt1/bismark_intermediate_results/1725-SB-5_1_sequence.fastq_bismark_pe.txt at all! I am trying to write to /home/rini/bismark/bismark_v0.5.4/....


                            Please help,

                            Rini



                            • Can you try launching Bismark from within the folder which contains your 2 sequence files, i.e.: /media/3TBpt1/bismark_intermediate_results/?

                              By the way, you might as well wait another hour or so because I am going to release a new version of Bismark very soon.



                              • Bismark v0.6.beta1 has just been released. This new version also supports gapped alignments and reports SAM output.

                                For questions or suggestions please see http://seqanswers.com/forums/showthr...9101#post59101.
