Seqanswers Leaderboard Ad

**fkrueger** · 06-20-2014, 06:36 AM

Hi Sung,

I am now off for weekend in Wells-next-the-Sea but to quickly answer your questions: to 1) Frank Wessely has come up with this table, so you need to ask him really. To 2) No, the BAM file does not store the AS, but it shouldn't be hard to get it to report it. To 3) The MAPQ value is calculated using the same algorithm that Bowtie 2 is using (or at least the Perl adaptation of the C code). More about this can be found further up here in the thread.
Cheers, Felix

**adeirossi** · 06-26-2014, 05:00 PM

Hi Felix,

Surprisingly, replacement of the XX tag with the MD tag seems to have caused an error in my use of Bismark. Here is a bismark_methylation_extractor error that emerged when I upgraded from bismark-0.12.2 to 0.12.3:

$ perl bismark_methylation_extractor --paired-end --no_overlap --output STD --report --samtools_path /mnt/ngs/analysis/software/samtools-0.1.19 --mbias_only 99_26_1_GATCAG_003.bam

*** Bismark methylation extractor version v0.12.3 ***

Summarising Bismark methylation extractor parameters:
===============================================================
Bismark paired-end SAM format specified (default)
Output path specified as: STD/

Now testing Bismark result file meth/results/A73406CXU/99_26_1_GATCAG_003/99_26_1_GATCAG_003.bam for positional sorting (which would be bad...) ...passed!

Now reading in Bismark result file meth/results/A73406CXU/99_26_1_GATCAG_003/99_26_1_GATCAG_003.bam

skipping SAM header line: @HD VN:1.4 GO:none SO:queryname
...
Unexpected combination of read and genome conversion: CT
/ CT

From the error message above, $first_read_conversion (the value of the XR tag) is equal to "CT\n", and the trailing newline renders it unrecognizable. Currently, bismark_methylation_extractor removes the newline only when the line ends with the XG tag; when the line ends with the XR tag, it is not removed. For most people, this is probably a non-issue because BAM files directly output by bismark never have lines ending with the XR tag. However, I use Picard to manipulate BAM files between alignment and extraction, and that is where the trouble comes in.

Picard AddOrReplaceReadGroups (v1.93) has the unfortunate behavior of reordering the tags in the input BAM file. Under bismark-0.12.2, it changed the order from "NM XX XM XR XG" to "RG XG NM XM XR XX". Here, there was no problem because the XX tag was not used by bismark_methylation_extractor. But under bismark-0.12.3 with the replacement of the XX tag with the MD tag, Picard now changes the tag order from "NM MD XM XR XG" to "MD RG XG NM XM XR". With the XR tag occurring last, its value ends in an unchomped newline and causes the error.

Though Picard is not without blame here, I think it might be generally more robust for bismark_methylation_extractor not to depend on a specific order of tags. To this end, I have attached a modification for your review that should ensure consistent chomping of the newline from the SAM input stream.

Thanks,
Andrew

Attached Files

bismark_methylation_extractor-0.12.3-mod.pl (183.7 KB, 21 views)

**fkrueger** · 06-27-2014, 01:27 AM

Hi Andrew, thanks for breaking the methylation extractor in yet another completely new way!

I have now implemented the new and earlier chomping, and it will be available in the next release. Cheers, Felix

**fkrueger** · 07-21-2014, 04:52 AM

Bismark v0.12.4 released. This release addresses several issues, most importantly the handling of which read alignments count as ambiguous in Bowtie 2 mode. This change might improve overall mapping efficiency slightly, so Bowtie 2 users are highly encouraged to upgrade to the latest version which is available here: https://www.bioinformatics.babraham....jects/bismark/!

Bismark: Improved the way ambiguous alignments are handled in Bowtie 2 mode. Previously, sequences were classified as ambiguously aligning as soon as a sequence produced several equally good alignments within the same alignment thread. Under certain circumstances however there may exist equally good alignments within the same alignment thread, but the sequence might have a better (unique) alignment in another thread. Such a unique alignment will now trump the ambiguous alignment as it should
Bismark: Got rid of 2 warning messages of MD-tag information for reads containing deletions (Bowtie 2 mode only) which accidentally made it through to the release
Bismark: Added '-x' to the invocation of Bowtie 2 for FastA sequences so that it works again (It used to work previously only because Bowtie 2 did not check it properly and automatically used bowtie2-align-s, but now it does check...)
Methylation Extractor: Line endings are now chomped at an earlier stage so that interfering with the optional fields in the Bismark BAM file doesn't break the methylation extractor (e.g. reordering of optional tags by Picard)

**fkrueger** · 07-21-2014, 12:36 PM

We had to add one additional check to the way ambiguous alignments are handled in Bowtie 2 mode. In more detail this adds a test whether the current ambiguous alignment is worse than the best alignment so far, in which case the sequence does not get flagged as ambiguous. A hotfix (v0.12.5) is available for download now. Sorry for that.

**sunggong** · 07-23-2014, 03:49 AM

bismark_methylation_extractor

Hi,
I am wondering whether bismark_methylation_extractor is written to use multi threads? Any plan, if not?
Thanks,
Sung

**fkrueger** · 07-23-2014, 05:33 AM

Originally posted by sunggong View Post

Hi,
I am wondering whether bismark_methylation_extractor is written to use multi threads? Any plan, if not?
Thanks,
Sung

As it stands the extractor is only using a single core. There have been plans to extend this to multi-core usage for quite some time however I never quite got round to doing it. We were thinking however to lock ourselves away for a day or two in the not too distant future where we would all have a go at doing something that never gets done due to the day-to-day business, and multi-threading of Bismark is very high up on that list!

For the time being you might want to split up the BAM file into several smaller chunks and launch them individually to gain some speed. Cheers, Felix

**sunggong** · 07-25-2014, 02:10 AM

Originally posted by fkrueger View Post

As it stands the extractor is only using a single core. There have been plans to extend this to multi-core usage for quite some time however I never quite got round to doing it. We were thinking however to lock ourselves away for a day or two in the not too distant future where we would all have a go at doing something that never gets done due to the day-to-day business, and multi-threading of Bismark is very high up on that list!

For the time being you might want to split up the BAM file into several smaller chunks and launch them individually to gain some speed. Cheers, Felix

I got this error while trying to run bismark_methylation_extractor by chromosome-wise (BAM files splitted by chromosome):
"This might be a result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead".

It seems I need to provide unsorted BAM files, but I thought it would be great if bismark could deal with a sorted BAM file as this makes splitting the original BAM file by chromosome easier using samtools.

**fkrueger** · 07-25-2014, 02:30 AM

Sorting the BAM file into smaller files by chromosome shouldn't matter, you only need to take care that you don't sort by position as well so that you do not disturb the Read 1/ Read 2 order of the file. Since read pairs should always align to the same chromosome you could probably resort the reads by name (samtools sort -n) in case they were sorted by position.
It is very unlikely that we will change the methylation extractor so that it will be able to cope with sorted or otherwise random R1/R2 order since this may be a huge (t)ask memory-wise because BAM files can easily reach several tens of GB in size.

**adeirossi** · 07-28-2014, 04:11 PM

Hi Felix,

Sorry to perpetuate my reputation for breaking the methylation extractor, but I recently noticed an issue with the patch I submitted that was incorporated in the v0.12.2 release.

Background on the old patch: Running v0.12.1 of bismark_methylation_extractor, I experienced a failure when I used high --ignore or --ignore_r2 values that equaled or exceeded the alignment length. The old patch I came up with simply called "next" when it encountered this condition, and this indeed prevented the fatal error.

However, I recently noticed an unfortunate behavior: with --ignore set to 0 and --ignore_r2 set to a high value, when a read 2 alignment is encountered that is shorter than --ignore_r2, the read 1 alignment is discarded as well. Scrutinizing the code, the behavior is caused by the "next" call from the old patch. Since the while loop iterates over each read pair (not each read), calling "next" discards the whole read pair though only one of the two reads may be problematic.

I came up with a new patch that corrects this behavior of the old one (see attached): now when one read is too short for its "ignore" value, its mate is not automatically discarded. If you could take a look when you get the chance, that would be much appreciated. My apologies for not catching this the first time.

Andrew

Attached Files

bismark_methylation_extractor-0.12.5-fix_ignore_behavior.pl (184.1 KB, 22 views)

**fkrueger** · 07-29-2014, 11:11 AM

Hi Andrew,

I am currently away on holiday but I will definitely take a look once I am back. Thanks for your continued support of the methylationXtractor ;D

**wilson90** · 08-06-2014, 06:00 PM

Hi. I am just wondering if it is advisable to merge bam files produced by Bismark using samtools merge, and then run methyl_extrator?

The reason why I am asking this question is because I have hundreds of gzipped single-end fastq files and I am very reluctant to merge all the fastq files. The most convenient way is to run bismark on every fastq files, and merge the bamfiles before the running methyl_extrator.

Is it "legal"?

**fkrueger** · 08-07-2014, 12:43 AM

Hi Wilson,

If you've got many small FastQ files it should be absolutely fine to merge the BAM files and then run the methylation extractor. This is at least true for single-end files, however we have observed that samtools merge might interfere with the Read1/Read2 order of paired-end files, and they might need to be sorted by read name afterwards (e.g. with samtools sort -n). Again, since you seem to have single-end files it should be fine.

The only (minor) inconvenience is that you won't be able to generate the swish html report for the entire sample with bismark2report because the sample is split into many small reports.

**sunggong** · 08-14-2014, 04:48 AM

Hi Felix,

I thought it would be really nice if 'methyl_extrator' have an option to search a specified interval only (e.g. -L chr2, -L chr2:1000-2000, or -L <target_region.bed>).

Thanks,
Sung

**fkrueger** · 08-14-2014, 05:32 AM

Hi Sung,

This might be a useful feature but I think it would be more sensible to apply this on e.g. the coverage report which has all values sorted by position with percentage methylation and count methylated/unmethyalted already, rather than resticting the methylation extractor itself. It should be relatively straight forward to filter the .cov file with a small script and should only run for a few seconds, but I'll be off on holiday for a while now...

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News