#421
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
Quote:
Quote:
Quote:
#422
Junior Member
Location: Phoenix Join Date: May 2010
Posts: 7
Quote:
Sorry for the late message. Thanks, Andrew, for your reply to my question regarding M-bias. I had indeed run the extractor with the --no_overlap option before. As suggested, I reran the Bismark methylation extractor without --no_overlap and it worked: I no longer see the number of calls falling for CHH and CHG. Thanks, Felix, for confirming Andrew's answer. Yes, you are right, the CpG calls were also falling slightly, but this is better now.
I also noticed that the R2 call counts are more heterogeneous across the read length than the R1 call counts. I do not know if anyone else has noticed that.
Thanks,
Best,
Christophe
#423
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Hi Felix,
I have another question about the alignment files. I checked the BAM file produced by the Bismark alignment and it is unsorted. When I use that BAM file as input for deduplication, the resulting deduplicated.bam is also unsorted. My first question: does the deduplication step not require the input BAM to be sorted?
I was trying to merge several deduplicated.bam files from different lanes into one big final BAM and run the methylation extractor on it, and the resulting coverage file (.zero.cov, with --zero_based specified) has methylation calls at non-C locations such as chr1 133 134. That is how I traced it back to the unsorted BAM. So would it be better to build sorting into the Bismark pipeline?
#424
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
Quote:
#425
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
Quote:
#426
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Quote:
I then tried to sort the deduplicated BAM with the command
Code:
samtools sort -n deduplicated.bam deduplicated_sort
Thanks!
#427
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
Indeed, you should not sort the files by coordinates at all before running the deduplication.
#428
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Should I sort it by read name, using -n in samtools sort as I listed above, before doing the deduplication? Or should I not sort it at all?
Last edited by gene_x; 09-11-2014 at 08:43 AM.
#429
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
If you can, just use the Bismark files as they are generated. If you have to sort them, using samtools sort -n should do the trick as well.
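A quick way to confirm that a paired-end file still has both mates of each pair on adjacent lines (the order Bismark writes, and the order a read-name sort should bring back) is to scan the read names. The following is a minimal Python sketch, not part of Bismark: the helper names `strip_mate_suffix` and `unpaired_records` are made up for illustration, and the handling of a trailing `/1` or `/2` on read names is an assumption about how the mates are labelled.

```python
def strip_mate_suffix(qname):
    """Drop a trailing /1 or /2 mate tag if present (assumed naming convention)."""
    if qname.endswith(("/1", "/2")):
        return qname[:-2]
    return qname


def unpaired_records(sam_lines):
    """Return the read names of records whose mate is not on the adjacent line.

    sam_lines: iterable of SAM text lines (header lines starting with '@' are skipped).
    An empty list means every record sits directly next to its mate.
    """
    names = [
        strip_mate_suffix(line.split("\t", 1)[0].strip())
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    ]
    problems = []
    i = 0
    while i < len(names):
        if i + 1 < len(names) and names[i] == names[i + 1]:
            i += 2  # mates are adjacent: consume the whole pair
        else:
            problems.append(names[i])  # this record has no mate right after it
            i += 1
    return problems
```

The lines could come from `samtools view` output read line by line. A non-empty result suggests the file was reordered (for example by coordinate sorting) and is not in the order the deduplication step expects.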
#430
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Quote:
Another question: I'm not sure if my previous run without doing any sorting was correct. Here are my commands
Code:
samtools merge input.bam plate1/plate1_all_sort.bam plate2/plate2_all_sort.bam
deduplicate_bismark -p --bam input.bam
bismark_methylation_extractor -p --no_overlap --ignore 3 --ignore_r2 3 --bedGraph --counts --zero_based --report input.deduplicated.bam 2> input.meth_extractor_log.txt
Quote:
#431
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
Hi gene_x,
I'm not quite sure what is going on here or which genome you are aligning your reads to, but for at least human or mouse chromosome 2 the first couple of megabases are masked by Ns, and there is no way that Bismark would map any reads to these sequences or extract methylation calls from them.
#432
Junior Member
Location: Boston Join Date: Feb 2012
Posts: 6
Has anyone here actually tried this?
I'm seeing some strange RRBS results with high non-CpG methylation levels (~4%) in my samples. I suspect this is due to poor(er) conversion, but my client thinks it is due to the non-standard areas we selected for RRBS (larger fragments targeted to non-CpG island areas). I could use a more objective measure of conversion to help settle the issue. I would appreciate any and all pointers/scripts to get this to work.
Best,
John
#433
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
Hi John,
To see whether the difference comes from the captured regions, I would suggest looking at non-CG context only in CpG islands. This could be done with a subset of your reads (maybe methylation calls from some 10M reads): import them into SeqMonk, design probes over CGIs and then look at the average methylation (it shouldn't take more than 5 mins to find out). Alternatively, you could look at overall methylation levels on the mitochondrial genome, which normally has very low methylation. I am not sure, however, how well the MT is covered in an RRBS setting...
Basically, whatever lowest average methylation level you find in any larger aggregate of regions can be considered the upper limit of the bisulfite conversion error.
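The last point can be written as a small calculation: pool the methylated and unmethylated call counts for each aggregate of regions, then take the lowest pooled percentage as the upper limit. Below is a minimal Python sketch of that idea. The function names, the aggregate labels and the example counts are all invented for illustration and are not SeqMonk or Bismark output.

```python
def mean_methylation(calls):
    """Pooled methylation percentage for a list of (methylated, unmethylated) counts."""
    methylated = 0
    total = 0
    for meth, unmeth in calls:
        methylated += meth
        total += meth + unmeth
    if total == 0:
        raise ValueError("no calls in this aggregate")
    return 100.0 * methylated / total


def conversion_error_upper_bound(aggregates):
    """Return (name, percent) of the aggregate with the lowest pooled methylation.

    aggregates: dict mapping a label to a list of (methylated, unmethylated) counts.
    """
    levels = {name: mean_methylation(calls) for name, calls in aggregates.items()}
    lowest = min(levels, key=levels.get)
    return lowest, levels[lowest]


if __name__ == "__main__":
    # Hypothetical counts, purely for illustration.
    example = {
        "non-CG in CpG islands": [(2, 98), (1, 99)],
        "mitochondrial genome": [(1, 199)],
    }
    print(conversion_error_upper_bound(example))
```

Pooling counts rather than averaging per-position percentages keeps low-coverage positions from dominating the estimate, which seems in the spirit of using a "larger aggregate of regions".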
#434
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Quote:
A related question: if you have multiple lanes of data (FastQ), do you run the alignment on each individual lane first and then merge the resulting BAM files, or do you merge all FastQ files first and then do one big alignment on the merged FastQ file?
#435
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
If done correctly either way is fine. I find it easier to merge FastQ up front because that way you can set off a pipeline without having to intervene until you get the final reports.
#436
Junior Member
Location: Boston Join Date: Feb 2012
Posts: 6
Quote:
#437
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Quote:
I've tried a few different ways of doing it, but the resulting .zero.cov file still has methylation calls on non-CpG sites. I can't figure out how this could happen. Can you take a look at my pipeline and see if there is anything suspicious?
1. Run adapter trimming and PE bismark (bowtie1) mapping on individual lanes
Code:
bismark -n 2 -l 50 --chunkmbs 1024 -X 800 -un --ambiguous --bam $index -1 lane1_read1.fastq -2 lane1_read2.fastq
2. Sort each lane's BAM file by read name
Code:
samtools sort -n lane1.bam lane1_sort
3. Merge all sorted bam files
Code:
samtools merge all_lanes.bam lane1_sort.bam lane2_sort.bam ...
4. Deduplicate the merged BAM file
Code:
deduplicate_bismark -p --bam all_lanes.bam
5. Run the methylation extractor
Code:
bismark_methylation_extractor -p --no_overlap --ignore 3 --ignore_r2 3 --bedGraph --counts --zero_based --report all_lanes.deduplicated.bam
Also, I'm wondering whether you have actually tried mapping individual lanes first and then merging the BAM files? Is it possible that there is a hidden bug somewhere in the process? Thank you very much!
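One way to narrow down where the unexpected positions come from is to check each line of the coverage file against the reference base at that position. This is a minimal Python sketch, not a Bismark feature. It assumes tab-separated coverage lines whose first three columns are chromosome, start and end, that the start column of the .zero.cov file is 0-based, and that the reference is already loaded as a dict of sequence strings. `off_cytosine_calls` is a hypothetical helper name. A G is accepted because a cytosine on the reverse strand appears as a G in the reference sequence.

```python
def off_cytosine_calls(cov_lines, genome):
    """List coverage positions whose reference base is neither C nor G.

    cov_lines: iterable of tab-separated lines (chromosome, 0-based start, end, ...).
    genome:    dict mapping chromosome name to its sequence string (assumed in memory).
    Returns a list of (chromosome, start, base); base is None if the position
    cannot be looked up in the supplied reference.
    """
    offenders = []
    for line in cov_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start = fields[0], int(fields[1])
        seq = genome.get(chrom)
        if seq is None or not 0 <= start < len(seq):
            offenders.append((chrom, start, None))
            continue
        base = seq[start].upper()
        if base not in ("C", "G"):
            offenders.append((chrom, start, base))
    return offenders
```

If the list is non-empty with the start column read as 0-based but empty when the same file is read as 1-based, the discrepancy is more likely a coordinate convention issue than a problem with the alignments themselves.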
#438
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
You can probably skip the sorting/merging and just use "samtools cat". The alignments will then be in the same order as they would have had you merged prior to alignment and this will be significantly faster.
#439
Senior Member
Location: MO Join Date: May 2010
Posts: 108
#440
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
We have just released a new version of Bismark (v0.13.0), which is available from the Babraham Bioinformatics website. This version adds a couple of useful options and changes some default behavior. Perhaps most notably the methylation extractor may now optionally be run in a multithreaded manner which greatly reduces its processing time (in a preliminary benchmark the elapsed time went down almost linearly when more cores were being used for this process, see below for more details). Here is a list of all changes:
o Bismark: Fixed renaming issue for SAM to BAM files (which would have replaced any occurrence of sam in the file name, e.g. sample1_... instead of the file extension .sam)
o Methylation Extractor: Added new option '--multicore INT' to set the number of cores to be used for the methylation extraction process. If system resources are plentiful this is a viable option to speed up the extraction process (we observed a near linear speed increase for up to 10 cores specified). Please note that a typical process of extracting a BAM file and writing out '.gz' output streams will in fact use ~3 cores per value of --multicore INT specified (1 for the methylation extractor itself, 1 for a Samtools stream, 1 for a GZIP stream), so --multicore 10 is likely to use around 30 cores of system resources. This option has no bearing on the speed of the bismark2bedGraph or genome-wide cytosine report processes
o Methylation Extractor: Added two new options '--ignore_3prime INT' (for single-end alignments and Read 1 of paired-end alignments) and '--ignore_3prime_r2 INT' (for Read 2 of paired-end alignments) to remove positions that display a methylation call bias on the 3' end of reads
o Methylation Extractor: The option --no_overlap is now the default for paired-end data. One may explicitly choose to include overlapping data with the option '--include_overlap'
o Methylation Extractor: The splitting report will now be written out by default (previously optional --report)
o Methylation Extractor: In paired-end mode, read-pairs which had been skipped because either read was shorter than a specified (very high) value of '--ignore' or '--ignore_r2' will now have the information of the other read extracted if it meets the length criteria (if applicable). Thanks to Andrew Dei Rossi for contributing a patch
o bismark2bedGraph: Fixed the location of the sorting directory which could have failed if an output directory had been specified
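The core-count arithmetic in the --multicore note can be written out as follows. This is only an illustrative Python sketch: `estimated_cores` is a made-up helper, and the assumption that the Samtools and GZIP streams drop out when the input is not BAM or the output is not gzipped is inferred from the breakdown in the note, not taken from the Bismark documentation.

```python
def estimated_cores(multicore, bam_input=True, gzip_output=True):
    """Rough number of cores in use for a given --multicore value.

    Per the release note: 1 core for the extractor itself, plus 1 for a
    Samtools stream (BAM input) and 1 for a GZIP stream ('.gz' output).
    """
    if multicore < 1:
        raise ValueError("--multicore must be a positive integer")
    per_worker = 1 + (1 if bam_input else 0) + (1 if gzip_output else 0)
    return multicore * per_worker


if __name__ == "__main__":
    print(estimated_cores(10))  # the BAM input, gzipped output case from the note
```

On a shared machine it may be worth picking the --multicore value by dividing the cores you are allowed to use by three rather than passing the core count directly.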