SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
CisGenome -- an integrated tool for ChIP-seq data analysis hji Bioinformatics 66 12-30-2014 02:55 PM
Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context fkrueger Bioinformatics 27 10-11-2013 06:40 AM
Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments fkrueger Bioinformatics 6 03-19-2012 06:06 AM
Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data gcarbajosa Bioinformatics 2 12-13-2011 09:43 AM

Reply
 
Thread Tools
Old 07-26-2013, 11:43 PM   #281
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Thanks for noticing, luckily it was only a mixup in the documentation and not the code. I have still corrected it in User manual, the Bismark help and the release notes, and put a new tar file for download.
fkrueger is offline   Reply With Quote
Old 07-29-2013, 12:50 AM   #282
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64
Default

Quote:
Originally Posted by fkrueger View Post
Thanks for noticing, luckily it was only a mixup in the documentation and not the code. I have still corrected it in User manual, the Bismark help and the release notes, and put a new tar file for download.
Thanks for the new tar file. There's some issue with the archive utility on my macbook because it works fine for me on other machines but not on my macbook. Sorry for the confusion.

The description of the --old_flag argument is still not quite right for the old_flag column in the Release Notes or when you run bismark --help.
PeteH is offline   Reply With Quote
Old 07-31-2013, 02:07 AM   #283
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Thansk Pete, all should be fine now.
fkrueger is offline   Reply With Quote
Old 08-06-2013, 11:25 AM   #284
yzizhen
Junior Member
 
Location: Seattle

Join Date: Jun 2011
Posts: 2
Default

Hi,

I have some pair-end RRBS data that suffer from poor qualities at the 3' end of R2 reads (Unfortunately about 20 out of 36 cycles are affected). What is the best strategy to deal with it? It seems with bowtie1, quality scores will be taken into account. Will mismatches at read positions with poor quality be ignored? With bowtie2, --ignore-quals are disabled.
Do you think choosing bowtie1 will make the difference? Or should I simply trim R2 sequences? Would different lengths of R1 and R2 have any unexpected downstream effect?

It appears to me that a better solution is to use bowtie2 "local" mode, with which the reads with poor quality at the 3' end will be trimmed. Will it be difficult to support "local" mode for bismark?

Thank you very much for your help!
yzizhen is offline   Reply With Quote
Old 08-06-2013, 12:50 PM   #285
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Quote:
Originally Posted by yzizhen View Post
Hi,

I have some pair-end RRBS data that suffer from poor qualities at the 3' end of R2 reads (Unfortunately about 20 out of 36 cycles are affected). What is the best strategy to deal with it? It seems with bowtie1, quality scores will be taken into account. Will mismatches at read positions with poor quality be ignored? With bowtie2, --ignore-quals are disabled.
Do you think choosing bowtie1 will make the difference? Or should I simply trim R2 sequences? Would different lengths of R1 and R2 have any unexpected downstream effect?

It appears to me that a better solution is to use bowtie2 "local" mode, with which the reads with poor quality at the 3' end will be trimmed. Will it be difficult to support "local" mode for bismark?

Thank you very much for your help!
If 20 out of 36 cycles are badly affected by poor qualities, there won't be much good sequence left you can make use of... In any case, I would suggest trimming with Trim Galore and the following parameters:

trim_galore --paired --rrbs --trim1 file1.fq file2.fq

This will remove adapters, qualities lower than Phred 20 and also trim one additional bp off the 3' end to facilitate alignments with Bowtie (1). It also deals with fill-in problems with unmethylated cytosines, for more information please refer to the Trim Galore and RRBS guide on the Babraham Bioinformatics website.

Just to clarify a few points: Bowtie 1 takes qualities into account, but only as much as that it would allow more mismatches in poor quality sequence than it would allow high confidence mismatches. For example in its default mode it would allow up to 7 mismatches in the entire read (0-3 within the seed, the rest beyond that), but only 2 good quality ones (with a Phred score of 30+). Thus, this is certainly the wrong approach for BS-Seq.

The way Bowtie 2 is implemented in Bismark, every mismatch will get a penalty score of -6 (regardless of its quality), so for poor quality sequences it would allow even fewer mismatches than Bowtie 1 and thus be more stringent.

If you quality-trim your data first, the Bowtie 1 mode will allow roughly 2 mismatches (sometimes 3) per read, while Bowtie 2 mode will only allow 1 for shorter reads, but already 3 by the time you reach a read length 90bp. Bowtie 2 is specifically not designed for short read lengths, so I would favour the Trim Galore -> Bowtie 1 model.

Alternatively, if read 2 is likely to be trimmed so much that it would throw out a big portion of your data either in the trimming process or during mapping if the reads got way to short, you might want to consider forgetting about read 2 entirely, trim read 1 alone and run alignments in single-end mode.

I hope this helps, best, Felix
fkrueger is offline   Reply With Quote
Old 08-07-2013, 12:24 PM   #286
yzizhen
Junior Member
 
Location: Seattle

Join Date: Jun 2011
Posts: 2
Default

Thanks a lot for your quick response and insightful input. I will trim the reads as suggested, remove the reads entirely if they become too short and run bowtie 1.
yzizhen is offline   Reply With Quote
Old 08-16-2013, 03:46 AM   #287
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default Bismark release v0.9.0

We have just released a new version of Bismark (v0.9.0) which most notably introduces a new HTML report to produce a visual summary of the alignment, deduplication and M-bias statistics of a BS-Seq experiment (requires Javascript). Here are examples for a standard paired-end BS-Seq or a single-end PBAT report. It also adds a new Unknown cytosine context for Bowtie 2 alignments for Cs that are very close to Ns or insertions in the reference sequence. Here are all the changes in more detail:

Bismark: Implemented the new methylation call symbols 'U' and 'u' for methylated or unmethylated cytosines in Unknown sequence context, respectively. If the sequence context bases contain an N, e.g. CN or CHN, the context cannot be determined accurately (previously, these cases were assumed to be in CHH context). These situations may arise whenever the reference sequence contains Ns, or when insertions in the read occur close to a cytosine position (bases inserted into the read have no direct equivalent in the reference sequence and were assumed to be Ns for the methylation call). In practical terms, the 'U/u' methylation calls will only occur for Bowtie 2 alignments because Bowtie 1 does not support gapped alignments or read alignments if the reference contains any N's. The Bismark report will now also include the 'U/u' statistics, such as count and % methylation, however only if run in Bowtie 2 mode.

bismark2report: this new module generates a graphical interactive HTML report of the Bismark alignment, deduplication, splitting and M-bias statistics for convenient visualisation of what is going on. Since several different modules of Bismark may be included into this report that may or may not have been run, bismark2report requires the user to specify the relevant reports as input files. Many thanks to Phil Ewels for the conceptual design and his help with this report.

Bismark: Fixed a bug affecting the generation of the alignment overview pie chart which occurred for PBAT libraries only

Methylation Extractor: Added handling of the newly introduced methylation call U/u for cytosines in Unknown sequence context (CN or CHN). These methylation calls are simply ignored in the extraction process to not cause too much confusion for downstream analysis

bismark2bedGraph: Added a check to see whether input files start with CpG_* or not. If they don't, please include the option '--CX' when running bismark2bedGraph as a stand-alone tool

The new bismark2report module appears to be working well with the latest version Bismark, but I didn’t have the time to investigate how far backward-compatible it is. Also, error handling or documentation might need some further updating. However, as it is the summer holiday period I won’t be able to fix any potential bugs immediately, but comments/criticism or suggestions are very welcome and I will deal with them upon my return.

Bismark is available for download from http://www.bioinformatics.babraham.a...jects/bismark/.
fkrueger is offline   Reply With Quote
Old 08-16-2013, 07:53 PM   #288
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64
Default

The new HTML reports are very slick!
PeteH is offline   Reply With Quote
Old 08-20-2013, 09:53 AM   #289
oria34
Member
 
Location: Finland

Join Date: Feb 2013
Posts: 15
Default

Hi Felix,

Very good update, this soft is really awesome

I have run the deduplicate script with a SAM file created with the previous version of Bismark (0.8.3). It works fine, tells you on screen the number of alignments finally printed out of the total analyzed . However, the report file is totally blank (0 bytes)

Any thoughts?

This is what you can see on screen:
Code:
Now printing out alignments with the most representative methylation call(s)

Total number of alignments analysed in BP2_FIS17_trim_galore_1.fq_bismark_bt2_pe.sam: 39944049
Total number of representative alignments printed from B2FIS17_trim_galore_1.fq_bismark_bt2_pe.sam in total:        38271493 (95.81%)
Regarding the options, Will you recommend to run the deduplication script with the option --representative for paired end WGBS samples?



Cheers

PS. I have a last minute doubt, Does the --no_overlap option make any sense for directional experiments (only OT and OB)?

Last edited by oria34; 08-20-2013 at 10:11 AM.
oria34 is offline   Reply With Quote
Old 08-20-2013, 05:51 PM   #290
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64
Default

The --no_overlap option is for paired-end libraries, regardless of whether they are directional or non-directional. Basically, you don't want to "double count" methylation calls from any overlapping sequence of read_1 and read_2 from a paired-end read. The --no_overlap option means that bismark_methylation_extractor will only use the methylation calls from read_1 of a read-pair in the overlapping region.
PeteH is offline   Reply With Quote
Old 08-21-2013, 12:24 AM   #291
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Hi oria,

Regarding the empty deduplication report, I would guess that this is probably a bug in --representative mode. Speaking of which, I would not recommend using --representative mode anyway because it will give you the most highly amplified pcr duplicate for each position and not a random one. I might lose this option entirely for future versions since it seems to cause confusion only. Can you just rerun the deduplication in default mode?
fkrueger is offline   Reply With Quote
Old 08-21-2013, 09:18 AM   #292
oria34
Member
 
Location: Finland

Join Date: Feb 2013
Posts: 15
Default

Quote:
Originally Posted by PeteH View Post
The --no_overlap option is for paired-end libraries, regardless of whether they are directional or non-directional. Basically, you don't want to "double count" methylation calls from any overlapping sequence of read_1 and read_2 from a paired-end read. The --no_overlap option means that bismark_methylation_extractor will only use the methylation calls from read_1 of a read-pair in the overlapping region.
Well, to my understanding the reads of a pair always mapp to different strands (always?). Methylations calls from reads mapping to one strand will be informative only for that strand and therefore the pair will never be informative (actually Bismark doesn't take into consideration that reads for the methylation call, Does it?).

Please, correct me if I am wrong. I am quite new in all this so any new information will be very welcome



Quote:
Originally Posted by fkrueger View Post
Hi oria,

Regarding the empty deduplication report, I would guess that this is probably a bug in --representative mode. Speaking of which, I would not recommend using --representative mode anyway because it will give you the most highly amplified pcr duplicate for each position and not a random one. I might lose this option entirely for future versions since it seems to cause confusion only. Can you just rerun the deduplication in default mode?
Yeah, you are right. Rerunning without the option generates the full report.

Many thanks for the tip about the representative option, that was also my guest but I wanted to try it anyway just to compare

Cheers
oria34 is offline   Reply With Quote
Old 08-21-2013, 09:27 AM   #293
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by oria34 View Post
Well, to my understanding the reads of a pair always mapp to different strands (always?). Methylations calls from reads mapping to one strand will be informative only for that strand and therefore the pair will never be informative (actually Bismark doesn't take into consideration that reads for the methylation call, Does it?).

Please, correct me if I am wrong. I am quite new in all this so any new information will be very welcome
Sort of, read pairs are associated with a real (OT or OB) or theoretical (CTOT or CTOB) strand (remember that both reads arise from the same original template, which represents a single strand), which may or may not be the same as the orientation of the reads. So, if a read pair arises from the original top strand, even the read with a reverse orientation is giving information about the + strand, rather than the - strand.
dpryan is offline   Reply With Quote
Old 08-21-2013, 02:01 PM   #294
oria34
Member
 
Location: Finland

Join Date: Feb 2013
Posts: 15
Default

Quote:
Originally Posted by dpryan View Post
Sort of, read pairs are associated with a real (OT or OB) or theoretical (CTOT or CTOB) strand (remember that both reads arise from the same original template, which represents a single strand), which may or may not be the same as the orientation of the reads. So, if a read pair arises from the original top strand, even the read with a reverse orientation is giving information about the + strand, rather than the - strand.
Many thanks for your reply!

Is that true also in the case your experiment is directional? i.e. if you only have only OT and OB mapped reads?
oria34 is offline   Reply With Quote
Old 08-21-2013, 02:12 PM   #295
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Read 2 is always informative for the alignment strand of read 1, so yes, it will be relevant for any kind of library.
fkrueger is offline   Reply With Quote
Old 08-23-2013, 04:23 AM   #296
gerald2545
Member
 
Location: Toulouse

Join Date: Nov 2008
Posts: 21
Default bismark_methylation_extraction sort

Hi all,
I have a question regarding sorting the methXtractor.temp files
I may not have enough background using this tool, but I wonder why the sort command in bismark2bedGraph script
Quote:
open my $ifh, "sort -S $sort_size -T $sort_dir -k3,3 -k4,4n $in |" or die "Input file could not be sorted. $!";
has the parameter -k3,3

In each methXtractor.temp file, isn't the third column always the same in all the lines of the file (name of the chromosome)?

only specifying the "-k4,4n" parameter seems to be faster :
for a methXtractor.temp containing 254 882 387 lines, sorting with only "-k4,4n" takes about 100 minutes to complete, whereas the same file sorted with "-k3,3 -k4,4n" is still running after 240 minutes.

Do I miss something? What can happen if I delete the "-k3,3" parameter from the sort command line?

Thank you for your help

Gérald
my full command line :
bismark_methylation_extractor file.bam --paired-end --no_overlap --report --bedGraph --counts --cytosine_report --zero_based --CX_context --buffer_size 24G --genome_folder /save/bismarkGenome/gg4/ -o /work/gg/

Last edited by gerald2545; 08-23-2013 at 04:26 AM.
gerald2545 is offline   Reply With Quote
Old 08-23-2013, 06:37 AM   #297
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Sorting by chromosome in addition to the position might indeed be a relict of former versions of the script back when files weren't sorted into individual chromosome files. I'll take a look at this once I am back, but for the moment you should be fine just deleting the -k 3,3 from the sort command.
fkrueger is offline   Reply With Quote
Old 08-23-2013, 07:28 AM   #298
gerald2545
Member
 
Location: Toulouse

Join Date: Nov 2008
Posts: 21
Default

Thank you Felix for your time, but don't forget : you are on holidays

Gérald
gerald2545 is offline   Reply With Quote
Old 08-24-2013, 01:12 PM   #299
gerald2545
Member
 
Location: Toulouse

Join Date: Nov 2008
Posts: 21
Default

jute a follow-up, the same file took 1500 minutes to sort with -k3,3 parameter

gerald
gerald2545 is offline   Reply With Quote
Old 08-24-2013, 01:24 PM   #300
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Seems like a point worthy addressing... But now I shall focus on holiday!
fkrueger is offline   Reply With Quote
Reply

Tags
alignment, bisulfite, bisulphite, methylation, sequencing

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:52 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO