Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hi Sung,

    I am now off for weekend in Wells-next-the-Sea but to quickly answer your questions: to 1) Frank Wessely has come up with this table, so you need to ask him really. To 2) No, the BAM file does not store the AS, but it shouldn't be hard to get it to report it. To 3) The MAPQ value is calculated using the same algorithm that Bowtie 2 is using (or at least the Perl adaptation of the C code). More about this can be found further up here in the thread.
    Cheers, Felix

    Comment


    • Hi Felix,

      Surprisingly, replacement of the XX tag with the MD tag seems to have caused an error in my use of Bismark. Here is a bismark_methylation_extractor error that emerged when I upgraded from bismark-0.12.2 to 0.12.3:
      $ perl bismark_methylation_extractor --paired-end --no_overlap --output STD --report --samtools_path /mnt/ngs/analysis/software/samtools-0.1.19 --mbias_only 99_26_1_GATCAG_003.bam

      *** Bismark methylation extractor version v0.12.3 ***

      Summarising Bismark methylation extractor parameters:
      ===============================================================
      Bismark paired-end SAM format specified (default)
      Output path specified as: STD/


      Now testing Bismark result file meth/results/A73406CXU/99_26_1_GATCAG_003/99_26_1_GATCAG_003.bam for positional sorting (which would be bad...) ...passed!

      Now reading in Bismark result file meth/results/A73406CXU/99_26_1_GATCAG_003/99_26_1_GATCAG_003.bam

      skipping SAM header line: @HD VN:1.4 GO:none SO:queryname
      ...
      Unexpected combination of read and genome conversion: CT
      / CT

      From the error message above, $first_read_conversion (the value of the XR tag) is equal to "CT\n", and the trailing newline renders it unrecognizable. Currently, bismark_methylation_extractor removes the newline only when the line ends with the XG tag; when the line ends with the XR tag, it is not removed. For most people, this is probably a non-issue because BAM files directly output by bismark never have lines ending with the XR tag. However, I use Picard to manipulate BAM files between alignment and extraction, and that is where the trouble comes in.

      Picard AddOrReplaceReadGroups (v1.93) has the unfortunate behavior of reordering the tags in the input BAM file. Under bismark-0.12.2, it changed the order from "NM XX XM XR XG" to "RG XG NM XM XR XX". Here, there was no problem because the XX tag was not used by bismark_methylation_extractor. But under bismark-0.12.3 with the replacement of the XX tag with the MD tag, Picard now changes the tag order from "NM MD XM XR XG" to "MD RG XG NM XM XR". With the XR tag occurring last, its value ends in an unchomped newline and causes the error.

      Though Picard is not without blame here, I think it might be generally more robust for bismark_methylation_extractor not to depend on a specific order of tags. To this end, I have attached a modification for your review that should ensure consistent chomping of the newline from the SAM input stream.

      Thanks,
      Andrew
      Attached Files

      Comment


      • Hi Andrew, thanks for breaking the methylation extractor in yet another completely new way! I have now implemented the new and earlier chomping, and it will be available in the next release. Cheers, Felix

        Comment


        • Bismark v0.12.4 released. This release addresses several issues, most importantly the handling of which read alignments count as ambiguous in Bowtie 2 mode. This change might improve overall mapping efficiency slightly, so Bowtie 2 users are highly encouraged to upgrade to the latest version which is available here: https://www.bioinformatics.babraham....jects/bismark/!

          Bismark: Improved the way ambiguous alignments are handled in Bowtie 2 mode. Previously, sequences were classified as ambiguously aligning as soon as a sequence produced several equally good alignments within the same alignment thread. Under certain circumstances however there may exist equally good alignments within the same alignment thread, but the sequence might have a better (unique) alignment in another thread. Such a unique alignment will now trump the ambiguous alignment as it should
          Bismark: Got rid of 2 warning messages of MD-tag information for reads containing deletions (Bowtie 2 mode only) which accidentally made it through to the release
          Bismark: Added '-x' to the invocation of Bowtie 2 for FastA sequences so that it works again (It used to work previously only because Bowtie 2 did not check it properly and automatically used bowtie2-align-s, but now it does check...)
          Methylation Extractor: Line endings are now chomped at an earlier stage so that interfering with the optional fields in the Bismark BAM file doesn't break the methylation extractor (e.g. reordering of optional tags by Picard)

          Comment


          • We had to add one additional check to the way ambiguous alignments are handled in Bowtie 2 mode. In more detail this adds a test whether the current ambiguous alignment is worse than the best alignment so far, in which case the sequence does not get flagged as ambiguous. A hotfix (v0.12.5) is available for download now. Sorry for that.

            Comment


            • bismark_methylation_extractor

              Hi,
              I am wondering whether bismark_methylation_extractor is written to use multi threads? Any plan, if not?
              Thanks,
              Sung

              Comment


              • Originally posted by sunggong View Post
                Hi,
                I am wondering whether bismark_methylation_extractor is written to use multi threads? Any plan, if not?
                Thanks,
                Sung
                As it stands the extractor is only using a single core. There have been plans to extend this to multi-core usage for quite some time however I never quite got round to doing it. We were thinking however to lock ourselves away for a day or two in the not too distant future where we would all have a go at doing something that never gets done due to the day-to-day business, and multi-threading of Bismark is very high up on that list!

                For the time being you might want to split up the BAM file into several smaller chunks and launch them individually to gain some speed. Cheers, Felix

                Comment


                • Originally posted by fkrueger View Post
                  As it stands the extractor is only using a single core. There have been plans to extend this to multi-core usage for quite some time however I never quite got round to doing it. We were thinking however to lock ourselves away for a day or two in the not too distant future where we would all have a go at doing something that never gets done due to the day-to-day business, and multi-threading of Bismark is very high up on that list!

                  For the time being you might want to split up the BAM file into several smaller chunks and launch them individually to gain some speed. Cheers, Felix
                  I got this error while trying to run bismark_methylation_extractor by chromosome-wise (BAM files splitted by chromosome):
                  "This might be a result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead".

                  It seems I need to provide unsorted BAM files, but I thought it would be great if bismark could deal with a sorted BAM file as this makes splitting the original BAM file by chromosome easier using samtools.

                  Comment


                  • Sorting the BAM file into smaller files by chromosome shouldn't matter, you only need to take care that you don't sort by position as well so that you do not disturb the Read 1/ Read 2 order of the file. Since read pairs should always align to the same chromosome you could probably resort the reads by name (samtools sort -n) in case they were sorted by position.
                    It is very unlikely that we will change the methylation extractor so that it will be able to cope with sorted or otherwise random R1/R2 order since this may be a huge (t)ask memory-wise because BAM files can easily reach several tens of GB in size.

                    Comment


                    • Hi Felix,

                      Sorry to perpetuate my reputation for breaking the methylation extractor, but I recently noticed an issue with the patch I submitted that was incorporated in the v0.12.2 release.

                      Background on the old patch: Running v0.12.1 of bismark_methylation_extractor, I experienced a failure when I used high --ignore or --ignore_r2 values that equaled or exceeded the alignment length. The old patch I came up with simply called "next" when it encountered this condition, and this indeed prevented the fatal error.

                      However, I recently noticed an unfortunate behavior: with --ignore set to 0 and --ignore_r2 set to a high value, when a read 2 alignment is encountered that is shorter than --ignore_r2, the read 1 alignment is discarded as well. Scrutinizing the code, the behavior is caused by the "next" call from the old patch. Since the while loop iterates over each read pair (not each read), calling "next" discards the whole read pair though only one of the two reads may be problematic.

                      I came up with a new patch that corrects this behavior of the old one (see attached): now when one read is too short for its "ignore" value, its mate is not automatically discarded. If you could take a look when you get the chance, that would be much appreciated. My apologies for not catching this the first time.

                      Andrew
                      Attached Files
                      Last edited by adeirossi; 08-01-2014, 03:37 PM. Reason: Fix attached script to not spew Perl warnings

                      Comment


                      • Hi Andrew,

                        I am currently away on holiday but I will definitely take a look once I am back. Thanks for your continued support of the methylationXtractor ;D

                        Comment


                        • Hi. I am just wondering if it is advisable to merge bam files produced by Bismark using samtools merge, and then run methyl_extrator?

                          The reason why I am asking this question is because I have hundreds of gzipped single-end fastq files and I am very reluctant to merge all the fastq files. The most convenient way is to run bismark on every fastq files, and merge the bamfiles before the running methyl_extrator.

                          Is it "legal"?

                          Comment


                          • Hi Wilson,

                            If you've got many small FastQ files it should be absolutely fine to merge the BAM files and then run the methylation extractor. This is at least true for single-end files, however we have observed that samtools merge might interfere with the Read1/Read2 order of paired-end files, and they might need to be sorted by read name afterwards (e.g. with samtools sort -n). Again, since you seem to have single-end files it should be fine.

                            The only (minor) inconvenience is that you won't be able to generate the swish html report for the entire sample with bismark2report because the sample is split into many small reports.

                            Comment


                            • Hi Felix,

                              I thought it would be really nice if 'methyl_extrator' have an option to search a specified interval only (e.g. -L chr2, -L chr2:1000-2000, or -L <target_region.bed>).

                              Thanks,
                              Sung

                              Comment


                              • Hi Sung,

                                This might be a useful feature but I think it would be more sensible to apply this on e.g. the coverage report which has all values sorted by position with percentage methylation and count methylated/unmethyalted already, rather than resticting the methylation extractor itself. It should be relatively straight forward to filter the .cov file with a small script and should only run for a few seconds, but I'll be off on holiday for a while now...

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM
                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                25 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                28 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                24 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                52 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X