SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
CisGenome -- an integrated tool for ChIP-seq data analysis hji Bioinformatics 66 12-30-2014 02:55 PM
Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context fkrueger Bioinformatics 27 10-11-2013 06:40 AM
Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments fkrueger Bioinformatics 6 03-19-2012 06:06 AM
Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data gcarbajosa Bioinformatics 2 12-13-2011 09:43 AM

Reply
 
Thread Tools
Old 04-25-2017, 02:46 AM   #621
pig_raffles
Member
 
Location: Sheffield, UK

Join Date: Feb 2012
Posts: 15
Default

Excellent, thank you for the very clear explanation
pig_raffles is offline   Reply With Quote
Old 05-11-2017, 02:39 PM   #622
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by Ahra View Post
hello. I am a newbie using bismark. I'm not a bioinformation and also i don't know much about NGS alignment, especially how to running alignment programs. But, i could use bismark thanks to kindly well made bismark user guide book.

i ran bismark and got the report file and sam file. so to get the position of methylC, i ran bismark_methylation_extractor especially used --bedGraph option to see methlylation %.

According to user guide book,

A typical command including the optional bedGraph --counts output could look like this:
bismark_methylation_extractor -s --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


my data is paired-end so i edited little that one. like this:

bismark_methylation_extractor -p --no_overlap --comprehensive --CX
--bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


but, i couldn't get additional information about <chromosome> <start position> <end position> <methylation percentage>

my result was like thins

Bismark methylation extractor version v0.7.11
HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 -

how can i get the methylation percentage information and run further bedGraph2Cytosines process!

i am looking forward anyone's reply
I have the same question as yours. Did you solve your problem? Then which command should I use the generate the methylation percentage?
twotwo is offline   Reply With Quote
Old 05-11-2017, 02:51 PM   #623
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

A typical command to extract methylation calls and get a coverage file is:

Code:
bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
Do you have any problems running this command, and if so can you post details?
fkrueger is offline   Reply With Quote
Old 05-12-2017, 08:04 AM   #624
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by fkrueger View Post
A typical command to extract methylation calls and get a coverage file is:

Code:
bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
Do you have any problems running this command, and if so can you post details?
Hi, Thanks fkrueger! I ran your code. The following is the head what I have from the bedgraph file:

track type=bedGraph
chr11 110189 110190 100
chr11 110211 110212 100
chr11 113464 113465 100
chr11 113508 113509 100
chr11 113509 113510 100
chr11 113524 113525 100
chr11 113525 113526 100
chr11 123420 123421 100
chr11 123449 123450 100
So my question is: Is there all of the probe having 100% methylation percentage? That is a little weird....
By the way, I have another question: If I have paired samples, can I compare the methylation percentage with bismark? Thanks!
twotwo is offline   Reply With Quote
Old 05-12-2017, 08:32 AM   #625
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations to about what methylation analysis in SeqMonk looks like. https://www.bioinformatics.babraham....ing.html#bsseq

Best, Felix
fkrueger is offline   Reply With Quote
Old 05-12-2017, 11:03 AM   #626
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by fkrueger View Post
You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations to about what methylation analysis in SeqMonk looks like. https://www.bioinformatics.babraham....ing.html#bsseq

Best, Felix
Hi, Felix,
Thanks for your quick answer. Here is the head of the coverage file. Does that mean that I should merge the data (get all the information for one SNP) and get the methylation percentage?


chr11 110190 110190 100 1 0
chr11 110212 110212 100 2 0
chr11 113465 113465 100 1 0
chr11 113509 113509 100 4 0
chr11 113510 113510 100 1 0
chr11 113525 113525 100 2 0
chr11 113526 113526 100 1 0
chr11 123421 123421 100 1 0
chr11 123450 123450 100 1 0
chr11 123849 123849 100 5 0
twotwo is offline   Reply With Quote
Old 05-12-2017, 01:33 PM   #627
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

I am not quite sure if I understand your question here to be honest.

Code:
chr11 113509 113509 100 4 0
This example line means that for the position 113509 on chromosome 11 you had 4 methylation calls in total that were methylated (in the entire dataset), and 0 calls that were unmethylated. This translates into a 100% methylation percentage at this position (column 4). Also, the positions here are simply cytosines in the genome but not SNP.

Just to remind you this this the format:

Code:
The coverage output looks like this (tab-delimited, 1-based genomic coords):
============================================================================================================================================

<chromosome>  <start position>  <end position>  <methylation percentage>  <count methylated>  <count non-methylated>
I hope this helps.
fkrueger is offline   Reply With Quote
Old 05-12-2017, 01:43 PM   #628
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Thank you very much!
twotwo is offline   Reply With Quote
Old 07-27-2017, 01:27 PM   #629
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Hi, fkrueger,
If I want to compare paired sample (one vs one). Can I do it with seqmonk in unix? Like using some unix command, and obtain a table with p-value per probe?
twotwo is offline   Reply With Quote
Old 07-28-2017, 01:58 AM   #630
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

Erm, what would you like to compare exactly? SeqMonk can certainly do a number of things, I suggest you follow the guidelines and practical of this methylation analysis course: http://www.bioinformatics.babraham.a...ing.html#bsseq.

Cheers, Felix
fkrueger is offline   Reply With Quote
Old 08-07-2017, 02:30 PM   #631
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Hi, fkrueger,
Is there any annotation file for the methylation data? Like after I do the alignment, I can get a probe like: chr, start position.... How can I know which gene is it from? Thank you.
twotwo is offline   Reply With Quote
Old 08-08-2017, 03:53 AM   #632
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

Quote:
Originally Posted by twotwo View Post
Hi, fkrueger,
Is there any annotation file for the methylation data? Like after I do the alignment, I can get a probe like: chr, start position.... How can I know which gene is it from? Thank you.
We tend to use SeqMonk for this purpose (https://www.bioinformatics.babraham....jects/seqmonk/). It is a very fast and powerful genome viewer and quantitation tool. Some examples can be found in the documentation of this training course: https://www.bioinformatics.babraham....ing.html#bsseq.
fkrueger is offline   Reply With Quote
Old 08-08-2017, 06:53 AM   #633
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Thanks fkrueger!
twotwo is offline   Reply With Quote
Old 10-02-2017, 09:15 AM   #634
akramdi
Junior Member
 
Location: Paris, France

Join Date: Mar 2016
Posts: 9
Default

Hi Felix,

I'm a bit confused about the reads reported as "ambiguous" by Bismark, which made me doubt my understanding of how Bismark reports alignments (using Bowtie2).

So to my understanding, Bismark does not use -a mode nor -k N mode, it uses Bowtie2 default mode: "search for multiple alignments, report the best one" and how hard to looks for alignments is determined by the effort options (-D,-R). So reads that have multiple distinct alignments are reported once: their best alignment is reported or a randomly chosen alignment when equally-good choices are found.

On the other hand, ambiguous reads are defined as:

Quote:
--ambiguous Write all reads which produce more than one valid alignment with the same number of lowest
mismatches or other reads that fail to align uniquely to a file in the output directory.
Written reads will appear as they did in the input, without any of the translation of quality
values that may have taken place within Bowtie or Bismark. Paired-end reads will be written to two
parallel files with _1 and _2 inserted in theit filenames, i.e. _ambiguous_reads_1.txt and
_ambiguous_reads_2.txt. These reads are not written to the file specified with --un.
I think that reads that produce more that one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?
Same with reads that fail to align uniquely, I think they should be reported once (by the way, to me, failing to align uniquely is equivalent to producing more than one valid alignment..)

Could you please correct my understanding of how Bismark reports alignments and anything I wrote that might be off..

Thanks a lot!!

Amira
akramdi is offline   Reply With Quote
Old 10-03-2017, 02:51 AM   #635
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

Hi Amira,

I think that reads that produce more that one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?

>> Just generally, if a read aligns with Bowtie2 it will get an alignment score (the AS:i field):

Code:
AS:i:<N>	Alignment score. Can be negative. Only present if SAM record is for an aligned read.
If a read has a second alignment, the alignment score of the second best alignment is reported as well:

Code:
XS:i:<N>	Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.

So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files. In Bismark we do not want to go down the route of reporting a random alignment if there are several different (equally good) alignments, because if we canít be sure where a read came from we also donít want to assign a methylation status to that region. If you are interested in seeing where in the genome a read aligned to you might want to consider using the option --ambig_bam (even if this this procedure won't give you any methylation information).

Does this help clearing things up?
fkrueger is offline   Reply With Quote
Old 10-03-2017, 08:54 AM   #636
akramdi
Junior Member
 
Location: Paris, France

Join Date: Mar 2016
Posts: 9
Default

Hi Felix,

Thanks for the quick reply,

Quote:
So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files
This statement clears things up for me. To make things clearer, here are the cases I was wondering about when using Bismark and how I see things now:

- If AS and XS are equally good ==> Read is considered ambiguous because Bismark does not report random alignments for the reason you stated.
- If AS is better than XS ==> Read is not considered ambiguous because, even though it produces multiple alignments, the reported one is the best and the rest are not equally good. Right?
- If XS is better than AS ==> not sure whether that's possible because I would expect the reported alignment to be the best (unless the read is part of a concordantly-aligned pair, says Bowtie2 manual). How do you deal with this case?

Good to know about "--ambig_bam" option, I've been working with an older version of Bismark so I didn't see it.. I've updated now.
akramdi is offline   Reply With Quote
Old 10-04-2017, 05:39 AM   #637
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

Hi Amira,

Your understanding is now spot-on! For the last point: For paired-end reads we always use the sum of the alignment scores (AS) and sum of the next best alignment scores (XS) of both reads to make decisions. So yes, the best reported hit is supposed to be the best hit for the entire read pair even though this might occasionally look different on a per-read basis.

All the best, Felix
fkrueger is offline   Reply With Quote
Old 10-04-2017, 06:59 AM   #638
akramdi
Junior Member
 
Location: Paris, France

Join Date: Mar 2016
Posts: 9
Default

Thank you very much Felix!
akramdi is offline   Reply With Quote
Old 10-20-2017, 07:54 AM   #639
pig_raffles
Member
 
Location: Sheffield, UK

Join Date: Feb 2012
Posts: 15
Default

Hi,

I am having trouble getting the bismark2bedGraph software in the Bismark package to run. I have successfully aligned my RRBS reads to my reference genome and then used bismark_methylation_extractor to generate CpG methylation call files

As my reference genome is in the forms of scaffolds rather than chromosomes, I have used the --scaffolds command

However, when i attempt to run the program (v.0.17.0), I get the following error message output:

Using these input files: CpG_context_G08F07.txt

Summary of parameters for bismark2bedGraph conversion:
======================================================
bedGraph output: G08F07_Astato.bedGraph.gz
output directory: >xxxx/epigenetics_project/Alignment/Bismark_min20/Methylation_reports/Comp/Bedgraph/<
remove whitespaces: no
CX context: no (CpG context only, default)
No-header selected: no
Sorting method: Unix sort-based (smaller memory footprint, but slower)
Sort buffer size: 2G
Coverage threshold: 1
=============================================================================
Methylation information will now be written into a bedGraph and coverage file
=============================================================================

Using the following files as Input:
Writing bedGraph to file: G08F07_Astato.bedGraph.gz
Also writing out a coverage file including counts methylated and unmethylated residues to file: G08F07_Astato.bismark.cov.gz

Changed directory to xxxx/epigenetics_project/Alignment/Bismark_min20/Methylation_reports/Comp/Bedgraph/
The genome of interest was specified to contain gazillions of chromosomes or scaffolds. Sorting everything in memory instead of writing out individual chromosome files ...
Sorting input file CpG_context_G08F07.txt by positions (using -S of 2G)
sort: open failed: CpG_context_G08F07.txt: No such file or directory
Died at /panfs/panasas01/bisc/ah15824/bin/Bismark-master/bismark2bedGraph line 486.


If I am interpreting the error message correctly it seems to be the sort command that is failing. I have read that this may be due to the format of the scaffold names. My CpG_context.txt file format is:

NB501345:32:HVVJLBGXY:1:11101:23754:1048_1:N:0:CACGAT + 000352F|quiver 66253 Z


I have tried removing the F|quiver part of the scaffold name but I still get the same error message

Any suggestions as to how to solve this problem would be gratefully received

Thanks!
pig_raffles is offline   Reply With Quote
Old 10-20-2017, 08:05 AM   #640
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 597
Default

Hi there,

Could you please update to the latest version (0.19.0; https://github.com/FelixKrueger/Bismark/releases) to see if the problem is still there? If you are still seeing the same problem can you please send me an email with some part of your CpG data and a information on the genome used.

Cheers, Felix
fkrueger is offline   Reply With Quote
Reply

Tags
alignment, bisulfite, bisulphite, methylation, sequencing

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:01 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO