Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • problems with bismark2bedGraph and coverage2cytosine to get methylations extracted

    Hi everyone,

    I am analysing WGBS data with Bismark v0.14.5. I have trimmed, aligned the data with Bowtie and deduplicated with no issues. However, I am having problems to extract the methylations. My genome is in 47,100 scaffolds. The command I am using is:

    bismark_methylation_extractor -p --comprehensive --merge_non_CpG --samtools_path /opt/samtools-0.1.19 --genome_folder /path_to_genome/Bisulfite_Genome_BowtieOne --buffer_size 10G --report --bedGraph --cytosine_report --scaffolds --gzip --multicore 3 -o /path/file.fastq.gz_bismark_pe.deduplicated.sam

    I get proper CpG_context_file.fastq.gz_bismark_pe.deduplicated.txt.gz and Non_CpG_context_file.fastq.gz_bismark_pe.deduplicated.txt.gz files, see the first lines for an example:

    Bismark methylation extractor version v0.14.4
    HWI-ST539:249:C7BDRACXX:6:1101:1460:1903_1:N:0:GCCAAT + scaffold.s31344 999331 Z
    HWI-ST539:249:C7BDRACXX:6:1101:1460:1903_1:N:0:GCCAAT - scaffold.s31344 999181 z
    HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT - scaffold.s10570 115561 z
    HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115578 Z
    HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115590 Z
    HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115616 Z
    HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115624 Z
    HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115639 Z
    HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115632 Z

    However, it does not convert properly into bed files (I get and empty file) and the cytosine reports I get is like this, with no methylation at all (columns 3 and 4 are all 0’s):

    scaffold.s00001 45 + 0 0 CG CGC
    scaffold.s00001 46 - 0 0 CG CGT
    scaffold.s00001 49 + 0 0 CG CGG
    scaffold.s00001 50 - 0 0 CG CGT
    scaffold.s00001 1095 + 0 0 CG CGT
    scaffold.s00001 1096 - 0 0 CG CGG
    scaffold.s00001 1481 + 0 0 CG CGC
    scaffold.s00001 1482 - 0 0 CG CGG
    scaffold.s00001 1560 + 0 0 CG CGC
    scaffold.s00001 1561 - 0 0 CG CGG



    I have tried to do things step by step, but I get the same result. I have been working on this for more than a week now and I do not find were is the error. Could someone help me with this?

    Thanks a lot in advance!

    Begoña

    Comment


    • Hi Begona,

      It looks like the bismark2bedGraph step is somehow failing, can you check the error logs (or what appears on screen) to see what is going wrong exactly?

      The CpG report simply puts the coverage file into genomic context, so if the coverage file is empty then the CpG report will show 0 0 only as well.

      I'm happy to look at this in more detail, you can also send me email with the error logs. Best, Felix

      Comment


      • It appears that GZIP-compressed input files were streamed directly into the Unix sort command (e.g. when using the option --scaffolds/--gazillion), but sort cannot read compressed files and thus it would produce an empty output. I have opened an issue of Github for that (https://github.com/FelixKrueger/Bismark/issues/9) and fixed the way GZIP compressed files are streamed to sort and it seems to work fine on my end. The latest version can be cloned straight from Github.

        Comment


        • problems with bismark2bedGraph and coverage2cytosine to get methylations extracted

          Hi Felix and everyone,

          Thanks a lot for your help, Felix, the problem is solved.
          Otherwise, I have and additional issue related to the sorting of the CpG_context and Non_CpG_context files in my operating system. I am reporting it in case someone else can be affected and to ask for your opinion.

          I get an error when sorting these files (this is done within the script bismark2bedgraph). The line that does the sorting in the script is the following one:

          open $ifh, "sort -S $sort_size -T $sort_dir -k3,3V -k4,4n $in |" or die "Input file could not be sorted. $!\n";

          I have solved the issue sorting just with "3,3". Otherwise, I am still trying to confirm that this is not affecting the final results. If anyone can give a hint on this, I would greatly appreciate.

          Begoña

          Comment


          • Bismark Bug v0.14.4 --remove_spaces in bismark_methylation_extractor.

            I think there is some bugs in the option of --remove_spaces in bismark_methylation_extractor.


            The error is as the following:



            Changed directory to /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/
            Now replacing whitespaces in the sequence ID field of the Bismark methylation extractor output /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt prior to bedGraph conversion

            Couldn't write to file /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt.spaces_removed.txt: No such file or directory
            Finished BedGraph conversion ...

            Comment


            • Originally posted by Tlexander View Post
              I think there is some bugs in the option of --remove_spaces in bismark_methylation_extractor.


              The error is as the following:



              Changed directory to /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/
              Now replacing whitespaces in the sequence ID field of the Bismark methylation extractor output /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt prior to bedGraph conversion

              Couldn't write to file /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt.spaces_removed.txt: No such file or directory
              Finished BedGraph conversion ...
              Thanks for reporting this, could you please also post the exact command you used when you called the methylation extractor so I can reproduce it more easily?

              Comment


              • When I try to run bismark2bedGraph from a directory that doesn't directly contain the input file it fails to find the input file, even though I've specified the path to the input file. For example, if I'm in the directory that contains the input file and use this command it works properly (note that I've taken the output from the methylation extractor and split it by chromosome for batch processing but the same thing happens if I use the original unsplit file):

                bismark2bedGraph -o ./output/chr10.bg ./CpG_chr10.txt
                If I move up a directory and run this command it doesn't work:

                bismark2bedGraph -o ./directory/output/chr10.bg ./directory/CpG_chr10.txt
                The programs ends with:

                Using the following files as Input:
                CpG_chr10.txt

                Writing bedGraph to file: ./directory/output/chr10.bg.gz
                Also writing out a coverage file including counts methylated and unmethylated residues to file: ./directory/output/chr10.bg.gz.bismark.cov.gz

                Couldn't find file 'CpG_chr10.txt': No such file or directory

                Comment


                • Thanks for reporting this, I have filed an issue on the Bismark GitHub page and will address it as soon as I find some time. Cheers, Felix

                  Comment


                  • Mapping SE and PE reads

                    Hi,

                    I have PE reads, some 20% of which overlap. I usually overlap these reads before mapping, so that I have the following files:

                    Non-overlappling_reads_1.fastq
                    Non-overlappling_reads_2.fastq
                    Merged_overlappling_reads.fastq

                    I would submit those reads to bismark as:

                    bismark -1 Non-overlappling_reads_1.fastq -2 Non–overlappling_reads_2.fastq Merged_overlappling_reads.fastq

                    ….but then my resulting BAM-file contains both SE and PE reads, so do I use the –p or –s flag on bismark_methylation_extractor ?

                    Also, from the results it doesn't really look like the Merged_overlappling_reads.fastq actually gets read.

                    Or I could run bismark AND bismark_methylation_extractor twice,; once for SE and one for PE – but then at what point do I merge the results?
                    Last option – just not do the overlap, but feed in all PE as is, and use: bismark_methylation_extractor —include_overlap and hope that all will be well. But then I loose the SE reads.

                    So many options, but hopefully only one optimal solution!
                    Very grateful for your advice!

                    Cheers,
                    Magdalena

                    Comment


                    • Hi Magdalena,

                      Code:
                      bismark -1 Non-overlappling_reads_1.fastq -2 Non–overlappling_reads_2.fastq Merged_overlappling_reads.fastq
                      does not work in the way you think it will as it would really only do PE alignments of the overlapping reads. So if you wanted to split this up manually (but why?) then you can run the PE on non-overlapping reads first and the SE on the overlapping reads, and then run a PE and SE methylation extraction separately. If you wanted to you could then merge the data again for the bismark2bedGraph step, just feed in all the CpG* files from both PE and SE mapping.

                      I am not quite sure however if merging and making things complicated isn’t exactly doing exactly what the methylation extractor is doing anyway: mapping non-overlapping reads and getting the information from both reads, and only getting the information once from overlapping reads because of the --no_verlap option (isn’t this the same as your ‘merghing’ step?)

                      Comment


                      • Hi,

                        yep, when I benchmarked that code I get same results if I include the SE or not.

                        I'm a bit keen to also use SE reads, as I after adapter trimming get improved mapping results, but some 3% of reads are after adapter trimming unpaired. I guess I could chuck them away, but I'll have a go with your suggestion:
                        to merge them in bismark2bedGraph.

                        Every little helps ..... :-)

                        Many thanks for your rapid reply!

                        Comment


                        • Originally posted by biocomputer View Post
                          When I try to run bismark2bedGraph from a directory that doesn't directly contain the input file it fails to find the input file, even though I've specified the path to the input file. For example, if I'm in the directory that contains the input file and use this command it works properly (note that I've taken the output from the methylation extractor and split it by chromosome for batch processing but the same thing happens if I use the original unsplit file):



                          If I move up a directory and run this command it doesn't work:



                          The programs ends with:
                          I have just been looking into this and can't reproduce the error. I then went back to the Release Notes and found this one for version 0.14.0:

                          bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well

                          Did you by any chance run an older version than that?

                          Comment


                          • I'm using 0.14.4:

                            $ bismark2bedGraph --version


                            Bismark Methylation Extractor Module -
                            bismark2bedGraph

                            Bismark Extractor Version: v0.14.4
                            Copyright 2010-15 Felix Krueger, Babraham Bioinformatics

                            Comment


                            • Right, I believe it is now fixed, the latest version can be downloaded here: https://github.com/FelixKrueger/Bismark/issues/18

                              Comment


                              • Hi!Can anyone explain to me the differences between bisulfite-seq, WGBS, RRBS and COBRA library reads?thanks in advance!

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM
                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                18 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                22 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                16 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                47 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X