
  • Originally posted by lancelothk
    Hi fkrueger,

    I am currently trying to use bismark to analyse a huge BS-seq dataset in an HPC environment. I am thinking of splitting the big fastq into smaller pieces, running bismark on each of them on one node, then merging the BAM results back together. Do you have any suggestions for how I can merge the bismark reports? Do you have an existing script to do this?

    Thanks.
    Hi Lanceloth,

    As it stands there is no stand-alone script to merge the mapping reports, but the code should pretty much all be there, since it is used for --multicore runs anyway. The two subroutines
    Code:
    'merge_individual_mapping_reports' and
    'read_alignment_report'
    should contain everything. Let me know if you need help merging these into a stand-alone script.

    Just a short word of warning: when you merge paired-end BAM files with samtools merge, you need to make sure that the files are subsequently sorted by read name, otherwise the two reads of a pair are not guaranteed to follow each other line by line. Would the --multicore option maybe be a little more feasible?

    Best, Felix
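As a rough sketch of the splitting step discussed above (not a Bismark feature): a FASTQ file can be cut into chunks with the standard `split` utility, as long as each chunk holds a multiple of four lines so that no record is broken. The function name `split_fastq`, the chunk size and all file names are made up for illustration, and the sketch assumes plain four-line FASTQ records. For paired-end data both files would need the same chunk size so that mates stay in step.

```shell
#!/usr/bin/env bash

# split_fastq <input.fastq> <reads_per_chunk> <output_prefix>
# Writes <output_prefix>aaaa, <output_prefix>aaab, ... with whole records only.
split_fastq() {
    local input=$1 reads_per_chunk=$2 prefix=$3
    # A FASTQ record is assumed to span exactly 4 lines,
    # so the line count per chunk must be a multiple of 4.
    split -l "$(( reads_per_chunk * 4 ))" -a 4 "$input" "$prefix"
}

# Hypothetical usage on a cluster (file names are placeholders):
#   split_fastq sample.fastq 1000000 sample_chunk_
#   ...run bismark on each chunk, one per node...
# One possible way to combine the per-chunk BAM files afterwards and,
# for paired-end data, sort by read name as Felix warns (recent samtools):
#   samtools merge merged.bam chunk1.bam chunk2.bam
#   samtools sort -n -o merged.name_sorted.bam merged.bam
```

Whether this is worth the effort compared with a single --multicore run depends on the cluster setup; the sketch only covers the mechanics of splitting.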



    • Thank you so much Felix. I will take a look at the source code.



      • Hi fkrueger,

        I found two minor issues in bismark v0.14.3.
        deduplicate_bismark gives errors with the --representative option:
        Failed to close output filehandle: Bad file descriptor
        Failed to close report filehandle: Bad file descriptor

        I found out that it is caused by a bug at line 548. The } should come after the two close lines, since OUT and REPORT have already been closed in deduplicate_representative().

        The -B/--basename <basename> option can be found in the script, but not in the PDF version of the manual.

        BTW, I finished extracting the report-merging code into a stand-alone script. The most painful part is the global variables...

        Thanks.
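A stand-alone merge along these lines can be sketched in a few lines of shell and awk. This is not the code from Bismark's `merge_individual_mapping_reports`; it assumes, purely for illustration, that each report consists of `label:<TAB>value` lines, and it sums only the integer counts. Percentages such as the mapping efficiency cannot be added up and would have to be recomputed from the summed counts. The function name `merge_reports` is hypothetical.

```shell
#!/usr/bin/env bash

# merge_reports <report1> <report2> ...
# Prints one "label:<TAB>sum" line per counter, in first-seen order.
merge_reports() {
    awk -F'\t' '
        # Keep only lines of the form label<TAB>integer; anything else
        # (headers, blank lines, percentages) is skipped.
        NF == 2 && $2 ~ /^[0-9]+$/ {
            if (!($1 in total)) order[++n] = $1   # remember first-seen order
            total[$1] += $2
        }
        END {
            for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], total[order[i]]
        }
    ' "$@"
}

# Hypothetical usage (file names are placeholders):
#   merge_reports chunk_*_report.txt > merged_report.txt
```

Keeping all state inside the awk program sidesteps the global-variable problem mentioned above, at the cost of not reproducing the full report layout.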



        • Thanks for pointing out these issues. I have updated the manual and removed the superfluous closing statements. They will find their way into a new release which we'll be releasing later today.

          Edit: Just a quick word of warning: the --representative mode is almost certainly not what you want to use, because it will find the most highly amplified and thus biased sequence for a given position instead of a random one. I will probably hide it from use in the next release...
          Last edited by fkrueger; 08-19-2015, 01:19 AM.



          • Bismark v0.14.4. New functionality and allele-specific alignment support

            We have just released a new Bismark version (v0.14.4). This brings a few convenience features, adds some options and also fixes some bugs; further details are outlined below.

            It is also worth mentioning that it should now be possible to use Bismark in conjunction with SNPsplit to align Bisulfite-Seq data in an allele-specific fashion against an N-masked genome if both genotypes are known. More information about this may be found on the SNPsplit project page.

            o Bismark: Changed the FLAG values of paired-end alignments to the CTOT or CTOB strands so that reads can be properly displayed in SeqMonk when imported as BAM files. This change affects only paired-end alignments in --pbat or --non_directional mode. In detail we simply swapped the Read 1 and Read 2 FLAG values round so reads now resemble exactly concordant read pairs to the OT or OB strands. Note that results produced by the methylation extractor or further downstream of that are not affected by this change
            o Bismark: Input files specified with filepath information for FastA files are now handled properly in --multicore runs (this was fixed only for FastQ files in the previous patch)
            o Bismark: Unmapped and ambiguous files (options --unmapped and --ambiguous) are now written out as gzip compressed files by default
            o Bismark: Changed the default mode of operation to --bowtie2. Bowtie (1) alignments may still be chosen using the option --bowtie1

            o Bismark Genome Preparation: Changed the execution of the genome indexing of the parent process to system() rather than an exec() call since this seemed to lead to interesting faults when run in a pipeline setting
            o Bismark Genome Preparation: Changed the default indexing mode to --bowtie2. Bowtie (1) indexing is still available via the option --bowtie1

            o bismark2bedGraph: The coverage (.cov) and bedGraph (.bedGraph) files are now written out as gzip compressed files by default

            o coverage2cytosine: Added new option '--gc/--gc_context' to reprocess the genome to find methylation in GpC context. This might be useful for specialist applications where GpC methylases had been employed. The output format is exactly the same as for the normal cytosine report, and only positions covered by at least one read are reported (output file ends in .GpC_report.txt). In addition this will write out a Bismark coverage file (ending in GpC.cov)

            o deduplicate_bismark: Removed redundant closing statements to get rid of warning messages
            o deduplicate_bismark: The option --representative is no longer displayed in the help text. The option was once useful to determine the PCR bias that had been introduced by over digestion with bisulfite and is nearly always not what should be used for deduplication (it will be left in and is still functional for the time being though)

            Bismark is available from the Babraham Bioinformatics project page.



            • I found one more bug in deduplicate_bismark. It is also in v0.14.4.
              There are several samtools calls that use "samtools" directly instead of $samtools_path, e.g. at line 269 and line 207.



              • Originally posted by lancelothk
                I found one more bug in deduplicate_bismark. It is also in v0.14.4.
                There are several samtools calls that use "samtools" directly instead of $samtools_path, e.g. at line 269 and line 207.
                Thanks for spotting that, I've fixed all these calls.



                • error with seedlen > 32

                  I get an error from bowtie2 when I try to define a seed length of 50 in bismark. I haven't found any mention of this problem elsewhere, nor any mention of seedlen limits in the bowtie2 manual. It seems particularly strange given that the recommended "typical" settings in the bismark manual use a seed length of 50. Can anyone help me trace the source of this error?

                  Using Bowtie 2 index: /home/tair10/Bisulfite_Genome/CT_conversion/BS_CT

                  Error: -L argument must be <= 32; was50
                  Error: Encountered internal Bowtie 2 exception (#1)
                  Command: /cm/shared/apps/bowtie/2-2.1.0/bowtie2-align --wrapper basic-0 -q -N 1 -L 50 --score-min L,0,-0.2 --ignore-quals --norc -x /home/tair10/Bisulfite_Genome/CT_conversion/BS_CT -U 4AB_trimmed_r1.fastq_C_to_T.fastq
                  bowtie2-align exited with value 1
                  The alignment does seem to work when no seedlen is defined. Here is a sample of a read from the relevant fastq, you will note the read length is 50bp but I don't think this is relevant since the error says -L must be <= 32.

                  @HWI-D00458:73:C6EBDANXX:1:1101:1728:1972 1:N:0:GCTCTA_A
                  AGCGTGGTTTATTGATTTTTTAGATTTTCGGAATTTGAAGTTAGAGGTGT
                  +
                  CG>EFF<EFECE>D1<111@/FG>CFGGGG///0=1:FGGD1FE1FGEBG
                  P.S. This is my first comment in the forum (though I have been stalking this place for years) so I apologise if it is out of place.
                  Last edited by marcusmchale; 09-01-2015, 10:10 AM.



                  • If you type bowtie2 --help you can find the following text:

                    Code:
                    -L <int>           length of seed substrings; must be >3, <32 (22)
                    This is not mentioned in the manual, but the seed length has to stay within this range (above 3 and, according to the help text, below 32). The default is 22. I hope this helps, Felix
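A small pre-flight check can catch an out-of-range seed length before Bowtie 2 is launched. The sketch below is only an illustration: the function name `valid_bt2_seed_length` is made up, and because the help text (`<32`) and the error message (`<= 32`) quoted in this thread disagree about whether 32 itself is allowed, it follows the error message and accepts 4 to 32.

```shell
#!/usr/bin/env bash

# valid_bt2_seed_length <value>
# Succeeds (exit status 0) if the value lies in the range quoted in this thread.
valid_bt2_seed_length() {
    local len=$1
    # Reject anything that is not a plain non-negative integer.
    case $len in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Assumed bounds: greater than 3 and at most 32 (taken from the error message).
    [ "$len" -gt 3 ] && [ "$len" -le 32 ]
}

# Example: the seed length of 50 from the question is rejected.
if valid_bt2_seed_length 50; then
    echo "seed length 50 is accepted"
else
    echo "seed length 50 is outside the accepted range"
fi
```

For 50 bp reads this means choosing a seed shorter than the read, for example the default of 22, rather than the full read length.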



                    • Thanks for the prompt reply, the manual for bismark suggests the following command:

                      bismark -n 1 -l 50 /data/genomes/homo_sapiens/GRCh37/ test_dataset.fastq
                      Which would call bowtie2 to use "-L 50".

                      Is there something I'm missing?

                      Oh, it's because of the differences in alignment strategy between bowtie1/bowtie2. Thanks for the lead!
                      Last edited by marcusmchale; 09-01-2015, 10:52 AM.



                      • Oh it seems I need to update the manual because we very recently changed the default aligner to Bowtie 2, and the command in the manual still refers to bowtie1 (if you use --bowtie1 you can use the command as in the manual). I'll have this changed soon, thanks for spotting this.

                        If you want to run the test dataset just leave out all options and try using the defaults. Best, Felix



                        • genome preparation

                          hi,
                          I am trying to run bismark genome preparation but unable to do so.
                          I have the unzipped bismark v 14.5 folder on the server, the unzipped bowtie-2.2.2.6 folder, and the genome files for human grch38, all three folders inside one folder. Do I need to run any installation step for bismark/bowtie before I run genome preparation?

                          I am new to methylation analysis, so it would be great if you could please help.

                          thanks in advance.



                          • Bismark just needs to be extracted as is outlined step by step in the manual (http://www.bioinformatics.babraham.a...User_Guide.pdf). I believe Bowtie 2 also only needs to be unzipped, then either you place the bowtie2 executable in the PATH (just google how to do this), or you specify the path with --path_to_bowtie in Bismark.

                            All other steps, including the genome preparation, are explained in detail in the manual, this protocol, or this methylation analysis course. The genome preparation command has the form:
                            Code:
                            bismark_genome_preparation [options] <path_to_genome_folder>
                            Good luck, Felix



                            • Hi,

                              I am unable to run the bismark_genome_preparation step yet.
                              I get an error "Command not found".
                              Any idea? I have been trying since yesterday and am not sure what I am doing wrong.



                              • Originally posted by daanum
                                Hi,

                                I am unable to run the bismark_genome_preparation step yet.
                                I get an error "Command not found".
                                Any idea? I have been trying since yesterday and am not sure what I am doing wrong.

                                I admire your perseverance but you might want to consider doing a basic Linux operation tutorial, I think you might benefit.

                                Here you've got a couple of options:
                                1) either you move to the folder containing the Bismark installation and then run ./bismark_genome_preparation (./ refers to the current directory, so the shell looks for the script there)
                                2) you can type /path/to/Bismark/bismark_genome_preparation which should work from anywhere.
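The options above, plus the PATH approach mentioned earlier in the thread, can be tried out safely with a stand-in script. Everything in this sketch is hypothetical: `demo_tool` is a dummy executable created in a temporary folder, standing in for `bismark_genome_preparation` inside an unpacked Bismark folder.

```shell
#!/usr/bin/env bash

# Create a stand-in for an unpacked tool folder (dummy script, temporary location).
install_dir=$(mktemp -d)
printf '#!/bin/sh\necho "demo_tool ran"\n' > "$install_dir/demo_tool"
chmod +x "$install_dir/demo_tool"

# Option 1: change into the folder and run the script with ./
# (the subshell keeps the current directory of this script unchanged).
( cd "$install_dir" && ./demo_tool )

# Option 2: call the script by its full path, which works from anywhere.
"$install_dir/demo_tool"

# Option 3: add the folder to PATH for this shell session,
# after which the bare command name is found.
PATH="$install_dir:$PATH"
demo_tool
```

Typing just the command name without one of these steps is what produces "Command not found", because the shell only searches the folders listed in PATH.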
