![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
CisGenome -- an integrated tool for ChIP-seq data analysis | hji | Bioinformatics | 66 | 12-30-2014 02:55 PM |
Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context | fkrueger | Bioinformatics | 27 | 10-11-2013 06:40 AM |
Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments | fkrueger | Bioinformatics | 6 | 03-19-2012 06:06 AM |
Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data | gcarbajosa | Bioinformatics | 2 | 12-13-2011 09:43 AM |
![]() |
|
Thread Tools |
![]() |
#381 |
Junior Member
Location: Cambridge, UK Join Date: May 2014
Posts: 6
|
![]()
Hi,
I am wondering whether bismark_methylation_extractor is written to use multi threads? Any plan, if not? Thanks, Sung |
![]() |
![]() |
![]() |
#382 | |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]() Quote:
For the time being you might want to split up the BAM file into several smaller chunks and launch them individually to gain some speed. Cheers, Felix |
|
![]() |
![]() |
![]() |
#383 | |
Junior Member
Location: Cambridge, UK Join Date: May 2014
Posts: 6
|
![]() Quote:
"This might be a result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead". It seems I need to provide unsorted BAM files, but I thought it would be great if bismark could deal with a sorted BAM file as this makes splitting the original BAM file by chromosome easier using samtools. |
|
![]() |
![]() |
![]() |
#384 |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]()
Sorting the BAM file into smaller files by chromosome shouldn't matter, you only need to take care that you don't sort by position as well so that you do not disturb the Read 1/ Read 2 order of the file. Since read pairs should always align to the same chromosome you could probably resort the reads by name (samtools sort -n) in case they were sorted by position.
It is very unlikely that we will change the methylation extractor so that it will be able to cope with sorted or otherwise random R1/R2 order since this may be a huge (t)ask memory-wise because BAM files can easily reach several tens of GB in size. |
![]() |
![]() |
![]() |
#385 |
Member
Location: California Join Date: Jan 2012
Posts: 11
|
![]()
Hi Felix,
Sorry to perpetuate my reputation for breaking the methylation extractor, but I recently noticed an issue with the patch I submitted that was incorporated in the v0.12.2 release. Background on the old patch: Running v0.12.1 of bismark_methylation_extractor, I experienced a failure when I used high --ignore or --ignore_r2 values that equaled or exceeded the alignment length. The old patch I came up with simply called "next" when it encountered this condition, and this indeed prevented the fatal error. However, I recently noticed an unfortunate behavior: with --ignore set to 0 and --ignore_r2 set to a high value, when a read 2 alignment is encountered that is shorter than --ignore_r2, the read 1 alignment is discarded as well. Scrutinizing the code, the behavior is caused by the "next" call from the old patch. Since the while loop iterates over each read pair (not each read), calling "next" discards the whole read pair though only one of the two reads may be problematic. I came up with a new patch that corrects this behavior of the old one (see attached): now when one read is too short for its "ignore" value, its mate is not automatically discarded. If you could take a look when you get the chance, that would be much appreciated. My apologies for not catching this the first time. Andrew Last edited by adeirossi; 08-01-2014 at 04:37 PM. Reason: Fix attached script to not spew Perl warnings |
![]() |
![]() |
![]() |
#386 |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]()
Hi Andrew,
I am currently away on holiday but I will definitely take a look once I am back. Thanks for your continued support of the methylationXtractor ;D |
![]() |
![]() |
![]() |
#387 |
Member
Location: Singapore Join Date: May 2012
Posts: 48
|
![]()
Hi. I am just wondering if it is advisable to merge bam files produced by Bismark using samtools merge, and then run methyl_extrator?
The reason why I am asking this question is because I have hundreds of gzipped single-end fastq files and I am very reluctant to merge all the fastq files. The most convenient way is to run bismark on every fastq files, and merge the bamfiles before the running methyl_extrator. Is it "legal"? |
![]() |
![]() |
![]() |
#388 |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]()
Hi Wilson,
If you've got many small FastQ files it should be absolutely fine to merge the BAM files and then run the methylation extractor. This is at least true for single-end files, however we have observed that samtools merge might interfere with the Read1/Read2 order of paired-end files, and they might need to be sorted by read name afterwards (e.g. with samtools sort -n). Again, since you seem to have single-end files it should be fine. The only (minor) inconvenience is that you won't be able to generate the swish html report for the entire sample with bismark2report because the sample is split into many small reports. |
![]() |
![]() |
![]() |
#389 |
Junior Member
Location: Cambridge, UK Join Date: May 2014
Posts: 6
|
![]()
Hi Felix,
I thought it would be really nice if 'methyl_extrator' have an option to search a specified interval only (e.g. -L chr2, -L chr2:1000-2000, or -L <target_region.bed>). Thanks, Sung |
![]() |
![]() |
![]() |
#390 |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]()
Hi Sung,
This might be a useful feature but I think it would be more sensible to apply this on e.g. the coverage report which has all values sorted by position with percentage methylation and count methylated/unmethyalted already, rather than resticting the methylation extractor itself. It should be relatively straight forward to filter the .cov file with a small script and should only run for a few seconds, but I'll be off on holiday for a while now... |
![]() |
![]() |
![]() |
#391 |
Junior Member
Location: Phoenix Join Date: May 2010
Posts: 7
|
![]()
Hi Everyone,
I have questions regarding M-bias plot and CHH,CHG methylation calls generated with bismark. I hope it is the right thread to ask otherwise let me know where should have I posted it. Has anyone seen a drop off of CHH and CHG total calls within a M-bias plot generated by bismark tool? I have observed this drop off only in the M-bias read2 plot. (See Attached png plots) The data are from Whole Genome BiSulphite with reads of 101 bases Library Prep was performed using Ultralow Ovation MS kit. How to interpret the falling number of calls after 50 bases? and .... How this can impact the alignment of read2 in bismark? and the Methylation calls? Thanks for your help, Christophe |
![]() |
![]() |
![]() |
#392 |
Member
Location: California Join Date: Jan 2012
Posts: 11
|
![]()
Hi Christophe,
I have also seen a falling number of methylation calls for read 2 in my data. I have always assumed this was because I ran bismark_methylation_extractor with the --no_overlap option and many of the read pairs had overlaps (i.e. insert length less than twice the read length). With --no_overlap, when the extractor encounters an overlap between read 1 and read 2, I believe it uses the read 1 calls only and discards the read 2 calls, which would explain why the number of read 2 calls decreases as you go toward its 3' end. Because this "no overlap" filtering occurs after mapping, I assume there is no impact on the alignment of read 2. If, on the other hand, you are not using the --no_overlap option or you don't have many reads pairs with inserts <200 bp, then I have no idea what's going on. Andrew |
![]() |
![]() |
![]() |
#393 | |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#394 |
Member
Location: London Join Date: Jan 2013
Posts: 25
|
![]()
Dear all,
Apologies if this was answered previously, but I couldn't find the information by Googling and browsing the thread. I am a postgraduate bioinformatics student dealing with the first methylation dataset in our group. Issue My issue is that I see strikingly low alignment rate by bismark (28-45% for bisulfite libraries, 49-55% for control non-bisulfite libraries). Details I was given a dataset of 2 x 150bp reads of whole-genome bisulfite sequencing libraries (WGBS) using the Epignome Methyl-Seq kit, sequencing using Illumina platform. I have clipped adapter sequences and trimmed low-quality bases with Trimmomatic, as suggested in the Epicentre guide (http://www.epibio.com/applications/b...eq-kit?details). This left over 90% of the reads in each sample with some workable sequence. From reading I have done so far, I feel that the read length is much longer than necessary, and might even cause more problems than it brings information. It seem 50-75 read length is better suited to this type of analysis. I would like to know if anyone could advise me appropriate non-default alignment parameters suited to my data. Do I have to control the number of mismatches allowed in the seed alignments? The length of the seeds? The overall alignment score? etc etc. I prefer asking experienced users rather than test and overfit parameters of my own. Other In the meantime, I truncated the forward read file of the sample with worst alignment rate (28.4%) to 75 bp, and try to align that as single-end to see if I get a better alignment rate. It turned out to align 63.9% ! Rather than clip all my reads, I'd be very grateful to learn how to salvage information from the full length reads ![]() Thanks in advance! |
![]() |
![]() |
![]() |
#395 |
Senior Member
Location: Australia Join Date: Sep 2008
Posts: 136
|
![]()
kevinrun - what does fastqc look like on your forward and reverse reads?
We've recently switched over to the same kit, and now only do 71bp PE reads (we can get that out of 2 50 cycle SBS kits). I similarly got not great alignment using bismark (even after trimming the 6bp from each end) - however I get good alignment using bwameth, and methylation % obtained on the same dataset run through either pipeline were very similar (r=0.97 IIRC) - so perhaps bwa-meth is worth a look? I get about 80% of reads aligning with a mapping quality >=40. |
![]() |
![]() |
![]() |
#396 | ||
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]()
A recent post in this thread was asking something quite similar, maybe the reply is helpful in some way:
Quote:
|
||
![]() |
![]() |
![]() |
#397 |
Member
Location: London Join Date: Jan 2013
Posts: 25
|
![]()
Hi,
Thank you both for the very quick answers. Much appreciated. It took me a bit of time to get this answer together. I hope it's clear. I didn't mention yet that I am working with the Bos taurus UMD3.1 genome. 3 billion base pairs on 29 autosomes + 2 sex chromosomes: comparable to human. frozenlyse I attached the fastQC archives (recompressed in tar format to satisfy the upload requirements of SeqAnswer), as I am not sure which part of it you were interested in. Due to the C->T conversion caused by the bisulfite treatment, I am not particularly concerned by:
bwameth sounds like a good alternative. I will look for a standard pipeline of analysis from alignment to differential methylation calls.. except if you have a link to offer ![]() Felix Sorry I missed that post, thanks for forwarding. I started reading it to understand the entire procedure, although it sounds convoluted. I suppose that after all those alignments, I would have to aggregate the methylation calls for each cytosine (from the paired alignment + single end forward + pbat single end reverse), and then feed it into an analytic software package. What tools do you recommend for statistical analysis of differential methylation by the way? Info I just tried to align 1 million non-truncated forward reads in single end mode and got 60.7% again. Looks like the problem of chimeric reads indeed. Here is a copy paste of the full bismark report if it is any use: Bismark report for: /workspace/scratch/krue/Methylation/2014-08-27_analysis/Trimmomatic/Paired/C1_ATCACG_1P.fastq (version: v0.12.1) Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed) Bismark was run with Bowtie 2 against the bisulfite genome of /workspace/storage/genomes/bostaurus/UMD3.1.75/source_file/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals Final Alignment report ====================== Sequences analysed in total: 1000000 Number of alignments with a unique best hit from the different alignments: 606793 Mapping efficiency: 60.7% Sequences with no alignments under any condition: 179520 Sequences did not map uniquely: 213687 Sequences which were discarded because genomic sequence could not be extracted: 5 Number of sequences with unique best (first) alignment came from the bowtie output: CT/CT: 303497 ((converted) top strand) CT/GA: 303291 ((converted) bottom strand) GA/CT: 0 (complementary to (converted) top strand) GA/GA: 0 (complementary to (converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Final Cytosine Methylation Report ================================= Total number of C's analysed: 14776304 Total methylated C's in CpG context: 644197 Total methylated C's in CHG context: 32496 Total methylated C's in CHH context: 56579 Total methylated C's in Unknown context: 756 Total unmethylated C's in CpG context: 249170 Total unmethylated C's in CHG context: 3536727 Total unmethylated C's in CHH context: 10257135 Total unmethylated C's in Unknown context: 8798 C methylated in CpG context: 72.1% C methylated in CHG context: 0.9% C methylated in CHH context: 0.5% C methylated in Unknown context (CN or CHN): 7.9% C1_ATCACG_1P.fastq_bismark_bt2_SE_report.txt (END) |
![]() |
![]() |
![]() |
#398 |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]()
Hi Kevin,
The FastQC reports look completely normal, and the results you posted look quite decent as well. So for the record, can we just state that it's not really Bismark's fault but the PBAT protocol itself is generating wild artefacts ![]() Regarding DMR analysis: I don't think there anything is accepted as gold standard just yet, new tools are being published almost every week. Promising and widely used tools are: bsseq, BiSeq, methylKit, MOABS, RADMeth, DMAP, RnBeads and a few more, most of them interface rather nicely with the output of Bismark. Please find attached a review of the underlying concepts of some of these tools by Mark Robinson: http://biorxiv.org/content/biorxiv/e...07120.full.pdf, I hope it may help. |
![]() |
![]() |
![]() |
#399 |
Member
Location: London Join Date: Jan 2013
Posts: 25
|
![]()
Hi Felix,
Thanks for the feedback on the fastQC and alignment report. Plus the list of DMR tools. I only tested BiSeq so far during my initial attempt to an analytic pipeline, and didn't find any significant DMR, but Katja told me the pacakge was meant for RRBS analyses. Anyway, as you can see the proper alignment of my reads might have been the limiting factor. Indeed, as far as I can see, bismark does indeed a good job aligning reads, with good quality reads and adequate parameters ![]() Given my recent arrival in the bisulfite-sequencing field (I have only dealt with transcriptomics data for the past two years, microarray and RNA-seq), I can only strive to understand all the complexity of the bisulfite data, the sources during their obtention and their subsequent analysis which you are much more familiar with than I am. New result I have a little update which surprised me. I have attached a screenshot of the alignment rate for SingleEnd/PE reads mode, with or without truncating to 75bp. Two of them were only performed on the first million read/pairs using "-u 1000000" to post you this answer today ![]() What surprised me, as I believed chimeric read/pairs were being discarded during the alignment, is that truncating both mates to 75 bp (or fewer if quality trimming left less than that) restored an alignment rate of 56.8% (of 1 million pair), compared to 28.4% (of 8.4 million pairs, in) for full-length read pairs. If inserts were indeed chimeric, truncating the read pairs should identified those reads as chimeric and left them out again, shouldn't it? Or am I missing something? Now, I think I will repeat tonight the alignment of truncated read pairs with the entire 8.4 million read pairs to satisfy my OCDs and give you a table with fully comparable numbers, but I am fairly certain that these figures are representative enough. Many thanks for your help and time Kevin Last edited by kevinrue; 09-01-2014 at 01:48 PM. Reason: missing image upload?! |
![]() |
![]() |
![]() |
#400 |
Senior Member
Location: Cambridge, UK Join Date: Sep 2009
Posts: 625
|
![]()
I am keen to hear more about what you discover!
|
![]() |
![]() |
![]() |
Tags |
alignment, bisulfite, bisulphite, methylation, sequencing |
Thread Tools | |
|
|