
**rpauly** · 12-08-2011, 09:43 AM

It is still not working... I also tried putting the bismark file in the folder /media/3TBpt1/bismark_intermediate_results/.

I will now try it with the new Bismark version.

**fkrueger** · 12-08-2011, 09:55 AM

Bismark itself does not have to be in the analysis folder, but you need to start the analysis from within that directory. Specifying relative or absolute paths for the input filenames will cause it to fail.

cd /media/3TBpt1/bismark_intermediate_results/

and then

./bismark --path_to_bowtie /home/rini/bismark/bowtie-0.12.7 /home/rini/bismark/bowtie-0.12.7/genomes/ -1 1725-SB-5_1_sequence.fastq -2 1725-SB-5_2_sequence.fastq -o /home/rini/bismark/bismark_v0.5.4/

The same is also true for 0.6.beta1. Hope it'll work now.

**rpauly** · 12-08-2011, 10:30 AM

I am within the folder, but it still does not seem to work!

**fkrueger** · 12-08-2011, 10:33 AM

Can you please send me the exact command you are using, including the error message, via email to [email protected]? Cheers, Felix

**shawpa** · 12-14-2011, 05:53 AM

I am having a lot of issues getting the alignment to run. I have tried many different things. I think the problem is that (1) I don't understand which folder I need to be in to execute the command, and (2) sometimes it seems to want paths like ../../ and sometimes just /.../.../

The following is something I tried. Please help if you can.

/usr/local/bin/bismark_v0.6.beta1/bismark --path_to_bowtie /usr/local/bin/bowtie /mnt/DATA/Cores/hiseq2000/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes/ -1 /mnt/DATA/Cores/hiseq2000/111123_SN874_0071_AD0F4CACXX/unaligned/Project_kinome_WGBS/Sample_22647_BS/read_1/22647_BS_GATCAG_L002_R1_001.fastq -o /mnt/DATA/Cores/hiseq2000/111123_SN874_0071_AD0F4CACXX/bismark/

**fkrueger** · 12-14-2011, 07:15 AM

It is really quite simple: the input files you specify with -1 and -2 must not contain full path information, as this will screw up the naming of the output files. (In your example, you might also have forgotten to specify -2.)

For this to work you need to be in the folder containing the files to be aligned (e.g. cd /mnt/DATA/Cores/hiseq2000/111123_SN874_0071_AD0F4CACXX/unaligned/Project_kinome_WGBS/Sample_22647_BS/read_1/) and then use:

-1 file1.fastq -2 file2.fastq

For everything else you should be able to use path information as well.

Hope this helps.
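The rule described above (start Bismark from the folder that holds the reads, pass bare filenames to -1/-2, and use full paths for everything else) can be scripted. Below is a minimal Python sketch, assuming a Bismark version with the behaviour discussed in this thread; the helper names `build_bismark_command` and `run_bismark` are hypothetical, and the only Bismark options used are the ones that appear in the commands above (`--path_to_bowtie`, the positional genome folder, `-1`, `-2`, `-o`).

```python
import os
import subprocess
from typing import List


def build_bismark_command(bismark: str, bowtie_dir: str, genome_dir: str,
                          read1: str, read2: str, out_dir: str) -> List[str]:
    """Assemble a paired-end Bismark call in which -1/-2 get bare filenames."""
    return [
        bismark,
        "--path_to_bowtie", bowtie_dir,
        genome_dir,                       # genome folder, positional argument
        "-1", os.path.basename(read1),    # no path information for the reads
        "-2", os.path.basename(read2),
        "-o", out_dir,                    # paths are fine for everything else
    ]


def run_bismark(bismark: str, bowtie_dir: str, genome_dir: str,
                read1: str, read2: str, out_dir: str):
    """Start Bismark from inside the folder that holds both read files.

    bismark, bowtie_dir, genome_dir and out_dir should be absolute paths
    (or, for bismark, a program name on PATH): once the working directory
    changes, relative paths would be resolved against the reads folder.
    """
    reads_dir = os.path.dirname(os.path.abspath(read1))
    if os.path.dirname(os.path.abspath(read2)) != reads_dir:
        raise ValueError("both read files must be in the same folder")
    cmd = build_bismark_command(bismark, bowtie_dir, genome_dir,
                                read1, read2, out_dir)
    # cwd= plays the role of `cd /path/to/reads` before launching Bismark
    return subprocess.run(cmd, cwd=reads_dir, check=True)
```

Passing `cwd=` to `subprocess.run` rather than calling `os.chdir` keeps the calling script's own working directory unchanged, so several samples can be launched from one script.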

**sbst** · 12-30-2011, 08:30 AM

Two questions:

1. How does Bismark handle haplotype variation with regard to methylation? In other words, what happens to a methylation call when 10 reads show CpG methylation at a particular site, while 10 reads do not show it at the same site?

2. In our analysis, the number of Cs analysed in the Final Cytosine Methylation Report is 5x larger than the total number of base pairs in the genome (all As, Cs, Gs and Ts). I think I am misunderstanding the output here. Why is this?

**fkrueger** · 12-30-2011, 01:36 PM


Hi sbst,

1. Bismark itself doesn't perform any sophisticated haplotype analysis. It will simply determine unique best alignments, and then perform its methylation call. For cytosine positions in the genome, and only for these, Bismark determines whether it was methylated (C in the read) or unmethylated (T in the read). Bases other than C or T at the position in question will be ignored.

2. The total number of Cs analysed is simply the sum of all cytosine positions, across all reads, for which a methylation call has been performed. Because a cytosine covered by many reads is counted once per read, this total can easily exceed the size of the genome. The report is intended to provide a rough idea of the methylation state of the sample analysed and is totally independent of the genome used for the alignments.

I hope this helps,
Felix
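To see why the total can exceed the genome size, here is a small Python sketch that tallies calls the way point 2 describes: every cytosine call in every read counts once. It assumes per-read call strings in the style of Bismark's methylation call field (Z/X/H for methylated and z/x/h for unmethylated C in CpG/CHG/CHH context, '.' for non-cytosine positions); treat that encoding and the function name `summarise_calls` as assumptions for illustration.

```python
from collections import Counter
from typing import Iterable

METHYLATED = set("ZXH")    # assumed: methylated C in CpG, CHG, CHH context
UNMETHYLATED = set("zxh")  # assumed: unmethylated C in CpG, CHG, CHH context


def summarise_calls(call_strings: Iterable[str]) -> Counter:
    """Tally methylation calls over all reads; '.' (non-C) is skipped."""
    totals = Counter()
    for calls in call_strings:          # one call string per read
        for symbol in calls:
            if symbol in METHYLATED:
                totals["methylated"] += 1
            elif symbol in UNMETHYLATED:
                totals["unmethylated"] += 1
    totals["total_C_analysed"] = totals["methylated"] + totals["unmethylated"]
    return totals
```

For three reads covering the same 4 bp stretch with two cytosines, `summarise_calls(["Z.z.", "Z.z.", "z.z."])` reports 6 cytosines analysed (2 methylated, 4 unmethylated), even though the stretch is only 4 bp long.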

**sbst** · 12-30-2011, 03:16 PM


Thanks Felix. That's helpful! So at any specific cytosine, if there are 10 calls as methylated and 10 calls as not methylated (from a total of 20 reads), then it will have a methylation status of 50%. If this is correct, then I totally understand now.

**fkrueger** · 12-30-2011, 03:25 PM

That's indeed right, such a position would have an overall methylation rate of 50%. Bismark itself determines methylation only on a read-by-read basis, so the actual quantitation would be accomplished by your analysis program (or script) of choice.
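A minimal sketch of that downstream quantitation step in Python, assuming the read-level calls have already been collected as hypothetical `(chromosome, position, is_methylated)` tuples; the input layout and the name `methylation_levels` are illustrative and not part of Bismark.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Call = Tuple[str, int, bool]  # (chromosome, position, methylated?)


def methylation_levels(calls: Iterable[Call]) -> Dict[Tuple[str, int], float]:
    """Percentage of methylated calls per cytosine position."""
    counts: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for chrom, pos, methylated in calls:
        # index 0 counts methylated calls, index 1 unmethylated calls
        counts[(chrom, pos)][0 if methylated else 1] += 1
    return {key: 100.0 * m / (m + u) for key, (m, u) in counts.items()}
```

With 10 methylated and 10 unmethylated calls at the same cytosine, the function returns 50.0 for that position, matching the example above. A real script would usually also apply a minimum-coverage filter before reporting a percentage.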

**fkrueger** · 01-04-2012, 05:11 AM

New Bismark version 0.6.3

We have just released a new version of Bismark (v0.6.3) and updated its documentation extensively to account for the recent changes that arose from implementing Bowtie 2 and changing the default output format to SAM.

Main changes:

- The methylation extractor now also works with Bismark SAM output files
- Fixed a bug that occurred when a read was named 0 (zero)
- Changed the XX:Z mismatch field in the SAM output to display the mismatching nucleotides of the reference sequence (instead of those of the read sequence)

More information can be found here or on the Bismark project page.

**wilhelml** · 01-18-2012, 06:11 PM

Hi Felix,

Since v0.6.3 now produces SAM files, do you see any reason why I can't use samtools rmdup or Picard to remove alignment duplicates? Would it be better to output to vanilla format and use the old de-duplication script? With SAM it will be necessary to convert to BAM, run rmdup, then convert back to SAM to run the methylation extractor, so perhaps running vanilla output is the best way to go if you want to remove alignment duplicates.

Apologies if this question has been asked already.

Larry W.

**fkrueger** · 01-19-2012, 12:16 AM


Hi Larry,

I have adapted the de-duplication script to handle both SAM and vanilla output, so there should be no need to run Bismark in vanilla mode just for this reason. I would imagine that rmdup or Picard could also be used for deduplication; I just didn't want an out-of-date version of the deduplication script floating around. I am not quite sure whether they would get confused by the somewhat unusual FLAG values used for paired-end BS-Seq files. I have not compared the outputs, but it would certainly be worth a try.

Best,
Felix
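For readers weighing the options above, here is a simplified Python sketch of position-based deduplication on single-end SAM records: it keeps the first alignment seen for each (chromosome, start, strand) and drops the rest. This is an illustration of the general idea only, not the Bismark deduplication script, samtools rmdup or Picard; it deliberately ignores paired-end handling and the special FLAG values mentioned above, and `deduplicate_sam` is a hypothetical name.

```python
from typing import Iterable, Iterator


def deduplicate_sam(lines: Iterable[str]) -> Iterator[str]:
    """Keep the first single-end alignment per (chromosome, POS, strand)."""
    seen = set()
    for line in lines:
        if line.startswith("@"):
            yield line                  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        strand = "-" if flag & 16 else "+"  # SAM FLAG bit 0x10: reverse strand
        key = (chrom, pos, strand)
        if key not in seen:
            seen.add(key)
            yield line
```

Keying on the leftmost POS is a simplification: reverse-strand reads with different CIGAR strings can share a 5' end but differ in POS, which dedicated tools take into account.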

**fkrueger** · 02-06-2012, 08:16 AM

We have just released a new version of Bismark (v0.6.4) to address a few minor issues.

The changes include:

- Adjusted the options -u and -s so that only the non-skipped part of the input file will be transcribed and analysed. This allows very large files to be split into smaller chunks for parallel processing, e.g. -s 10000000 -u 20000000 would analyse sequences 10000001 to 20000000. The alignment report will be based on this reduced number of reads analysed
- In paired-end mode, the options --unmapped and --ambiguous now write unaligned or multiply aligned reads, respectively, to their correct output files as intended
- Sequences in FastA format now receive Phred quality scores of 40 throughout (ASCII 'I') to prevent the SAM-to-BAM conversion in SAMtools from failing
- If a genomic sequence could not be extracted, it will now also be counted and reported when using Bowtie 1
- Suppressed debugging warning messages that were printed in error for Bowtie 2 alignments (single-end mode only)
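Two points from the list above lend themselves to a short Python sketch. First, the chunking example implies that -s gives the number of reads to skip and -u the absolute read number to stop at, so chunk boundaries can be computed up front. Second, the FastA quality value 'I' is simply Phred 40 in Phred+33 encoding (40 + 33 = 73). The function names are hypothetical, and the -s/-u semantics are taken from the release note, not verified against other Bismark versions.

```python
from typing import List, Tuple


def chunk_ranges(total_reads: int, chunk_size: int) -> List[Tuple[int, int]]:
    """(skip, upto) pairs; each pair covers reads skip+1 .. upto."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(skip, min(skip + chunk_size, total_reads))
            for skip in range(0, total_reads, chunk_size)]


def skip_upto_args(skip: int, upto: int) -> List[str]:
    """Bismark -s/-u options for one chunk; -s is left out for the first."""
    args = ["-u", str(upto)]
    if skip > 0:
        args = ["-s", str(skip)] + args
    return args


def fasta_quality(sequence: str) -> str:
    """Constant Phred 40 quality string; 40 + 33 = 73 is ASCII 'I'."""
    return chr(40 + 33) * len(sequence)
```

For a 25-million-read file in 10-million-read chunks, `chunk_ranges(25_000_000, 10_000_000)` yields three chunks, and the second one, `(10000000, 20000000)`, corresponds to the `-s 10000000 -u 20000000` example above. Each chunk can then be started as a separate Bismark job.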

Bismark is available here.

**fwessely** · 02-13-2012, 04:26 AM

I have some questions regarding the directional (strand-specific) vs non-directional protocol.

1. As far as I understand, the reads from a non-directional library (based on the Cokus protocol) have some sequence tags left (FW reads: TCTGT and RC reads: TCCAT).
Do I have to remove these tags in all the reads before using Bismark? Or does Bismark handle them internally, perhaps, by using their information to find the correct alignment on the correct strand? I think BS Seeker does exploit this.

2. What about reads from a non-directional library that do not have such sequence tags. I do not know why some reads (it seems to be the minority) do not have these tags. Are they artefacts?

3. Is there a way to infer the underlying experimental protocol (directional vs non-directional), if it is not clear from the information of the metadata of the sequencing run?
One idea would be to scan the reads for these tags. However, I do not know if these tags are always present in the raw data based on non-directional protocols.
A workaround could be to run Bismark with its current default of having a directional library, then check in the summary report whether there are lots of rejected alignments to the complementary strands indicating a non-directional protocol.
Should there be approximately equal amounts of alignments to all the four strands in case of a non-directional library?
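The two checks proposed in point 3 can be sketched in Python: count the fraction of reads that start with one of the tags, and compute the share of alignments falling on the complementary strands. The tag sequences are taken from the question and are not verified here. The per-strand counts are a hypothetical dict keyed by the strand names OT, OB, CTOT and CTOB (original top/bottom and their complements); how to read those numbers from a Bismark report is left open. All function names are illustrative.

```python
from typing import Dict, Iterable, Iterator, Tuple

TAGS: Tuple[str, ...] = ("TCTGT", "TCCAT")  # tags as quoted in the question


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def tag_fraction(sequences: Iterable[str], tags: Tuple[str, ...] = TAGS) -> float:
    """Fraction of reads whose sequence starts with any of the tags."""
    total = tagged = 0
    for seq in sequences:
        total += 1
        if seq.startswith(tags):    # str.startswith accepts a tuple
            tagged += 1
    return tagged / total if total else 0.0


def complementary_strand_fraction(strand_counts: Dict[str, int]) -> float:
    """Share of alignments on the CTOT/CTOB strands (hypothetical input)."""
    comp = strand_counts.get("CTOT", 0) + strand_counts.get("CTOB", 0)
    total = sum(strand_counts.values())
    return comp / total if total else 0.0
```

If the reasoning in the question holds, a directional library would show a complementary-strand fraction near zero, whereas a non-directional one would approach 0.5 when all four strands are represented roughly equally.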

Any comments highly appreciated.
