Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • This may become moot as bowtie2 can now support large genomes so separate alignments for different strands of the genome could be a thing of the past and simplify bisulfite alignment quite a bit! Of course this will take some work upgrading the various bisulfite alignment packages to be compatible with the new bowtie.

    Comment


    • If you need MAPQ values then you have a number of different options, including:
      1. BSmooth: Like bismark, it's a perl wrapper around bowtie2. It's always been slower than bismark when I've used it. It simply passes through the MAPQ of the best alignment, rather than recalculating it.
      2. bwa-meth: A python wrapper around bwa mem that happens to work like frozenlyse just mentioned (i.e., it concatenates two versions of the converted genome together and then aligns against that in one go). Only supports directional and paired-end libraries.
      3. Bison: A C wrapper around bowtie2 that otherwise functions similar to bismark, but recalculates MAPQ scores according to alignments to different strands using the same MAPQ algorithm in bowtie2. The setup can be non-trivial, so I wouldn't bother for a small one-off dataset (you might try bwa-meth in that case, unless you have a non-directional library).


      There are a number of others out there in addition to these, of course.
      Last edited by dpryan; 03-13-2014, 01:52 AM.

      Comment


      • Devon, can the calculation of MAPQ be summarized in a straight forward formula (described in detail in the Bowtie2 documentation?) or does one have to delve into the source code to find it? It shouldn't be hard to implement since all the mapping values are available and being processed anyway...

        Comment


        • You have to go through the source code to find it, unfortunately. It would seem straight-forward enough to add to bismark (more or less, there are some oddities so implementing it in perl might produce ever so slightly different results, though they'd be close enough). You can find a C version of the algorithm here (starting on line 109). It's never been clear to me how they came up with that MAPQ algorithm, so don't ask me to explain why it is the way it is

          Comment


          • Cheers for that. It doesn't look overly complicated (and only slightly random), I might have a go at this next time I am running out of other things to do

            Comment


            • Originally posted by fkrueger View Post
              I might have a go at this next time I am running out of other things to do
              Hah! I suppose you'll retire at some point... :P

              Comment


              • Originally posted by dpryan View Post

                bwa-meth: A python wrapper around bwa mem that happens to work like frozenlyse just mentioned (i.e., it concatenates two versions of the converted genome together and then aligns against that in one go). Only supports directional and paired-end libraries.
                it does support single-end as well, but yeah, only directional for now.

                glad to see bismark outputting the sorted header, that does simplify things for downstream analyses.

                what filtering to bismark/bison methylation extractors do?

                Comment


                • I can't speak to what bismark does at the moment, but bison filters by (1) read/pair MAPQ (2) base phred score and (3) position in a read (to ignore biased regions). There's also some filtering in paired-end reads if the pairs disagree on a methylation call (this is relatively rare).

                  Edit: I forgot to mention that flagged PCR duplicates are ignored.
                  Last edited by dpryan; 03-15-2014, 09:43 AM. Reason: Forgot PCR duplicates

                  Comment


                  • Hi Felix,

                    A seemingly harmless error is raised right at the end of running the methylation extractor with the --mbias_only flag (v0.10.1):

                    bismark_methylation_extractor -p --mbias_only sampleName.bam
                    # Various
                    # Lines
                    # Of
                    # Output
                    Failed to read from methylation extractor output file CpG_OT_sampleName.txt: No such file or directory

                    As far as I can tell nothing has actually gone wrong and there are no "orphan" files left behind by the methylation extractor.

                    Cheers,
                    Pete

                    Comment


                    • Hi Pete,
                      I have just fixed this, it should never actually attempt to look for files in the --mbias_only mode. Cheers, Felix

                      Comment


                      • We just released a new version of Bismark (v0.11.1) which adds the following functionality/fixes:

                        o Bismark: The option --pbat now also works for use with Bowtie 2, in both single-end and paired-end mode. The only limitation to this is that it only works with FastQ files and uncompressed temporary files
                        o Bismark: Changed the order in which the @SQ lines are written out to the SAM/BAM header from random to the same order they are being read in from the genomes folder (or the order of the files in which they occur within a multi-FastA file)
                        o Bismark: Included a new option '-B/--basename basename' for output files instead of deriving these names from the input file. --basename takes precedence over the option --prefix.
                        o Bismark: Unmapped or ambiguous files now end in .fq or.fa for FastA or FastQ files, respectively (instead of .txt files)
                        o Methylation extractor: willl no longer attempt to delete unused files if --mbias_only was speficied
                        o Methylation extractor: Added a test to see if a file that does not end in .bam is in fact a BAM file, and if this succeeds open the file using Samtools view

                        Bismark can be downloaded here: www.bioinformatics.babraham.ac.uk/projects/

                        Comment


                        • Hi Felix,

                          I took a crack at adding MAPQ values to Bismark. Following Devon's suggestion, I tried to apply the same MAPQ calculation used in bowtie2. I only made changes to paired-end mode.

                          The attached Perl script was modified from bismark-0.11.1. Let me know what you think if you have time and interest.

                          Cheers,
                          Andrew
                          Attached Files

                          Comment


                          • Originally posted by adeirossi View Post
                            Hi Felix,

                            I took a crack at adding MAPQ values to Bismark. Following Devon's suggestion, I tried to apply the same MAPQ calculation used in bowtie2. I only made changes to paired-end mode.

                            The attached Perl script was modified from bismark-0.11.1. Let me know what you think if you have time and interest.

                            Cheers,
                            Andrew
                            Hi Andrew,

                            Thanks a lot for having a go at this, I'll try to have a look next week and implement it for the next release. Let's hope this serves as an incentive for me to extend it to single-end mode as well.

                            Best, Felix

                            Comment


                            • Hi Andrew,

                              Thanks very much for your initiative with the MAPQ values! I have reviewed your suggestions and extended their implementation to single-end mode (for use with Bowtie 2 only). I have uploaded a tentative version 0.12.1 (attached), would you mind testing it to see if it works as expected?

                              Cheers, Felix
                              Attached Files

                              Comment


                              • Hi Felix,

                                Thank you for taking a look and extending the MAPQ value reporting to single-end mode so quickly. I have tested your bismark.pl script in both paired- and single-end modes and it works as expected.

                                There is, however, one minor issue that does not affect the SAM output. (It was not caused by the recent changes; it happens in v0.11.1 as well.) Occasionally I see warnings like this:
                                Use of uninitialized value in numeric ne (!=) at ./bismark line 3201, <$__ANONIO__> line 578.
                                Chromosomal sequence could not be extracted for SEQUENCER12:47:H7FFHADXX:1:1206:6069:83277_1:N:0:AGATGT J02459.1 107

                                The issue occurs in paired-end mode when read 2 maps to the first base of a chromosome (in this case, lambda DNA). When this happens, extract_corresponding_genomic_sequence_paired_ends_bowtie2() returns without defining $methylation_call_params->{$identifier}->{unmodified_genomic_sequence_1}, which is then tested on line 3201, resulting in the "uninitialized value" warning and the printing of position_1 instead of position_2. I added one line to your bismark.pl script (see attached) to fix this behavior; now only (what I assume to be) the intended warning is printed, showing that position_2 is equal to 1:
                                Chromosomal sequence could not be extracted for SEQUENCER12:47:H7FFHADXX:1:1206:6069:83277_1:N:0:AGATGT J02459.1 1

                                Can you confirm that this change is an improvement?

                                Thanks,
                                Andrew
                                Attached Files

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM
                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                25 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                28 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                24 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                52 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X