SEQanswers

Old 10-22-2017, 01:11 AM   #641
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 609

Hi Alan,

Thanks for making the files available. I have just tried to run your files with the following command line:

Code:
bismark2bedGraph --dir output_dir -o test --scaffold *G10G06*
And it finished just fine within a few minutes:

Created output directory output_dir/
Using these input files: CpG_OB_G10G06.sam.gz.txt CpG_OT_G10G06.sam.gz.txt

Summary of parameters for bismark2bedGraph conversion:
======================================================
bedGraph output: test.gz
output directory: >/bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/output_dir/<
remove whitespaces: no
CX context: no (CpG context only, default)
No-header selected: no
Sorting method: Unix sort-based (smaller memory footprint, but slower)
Sort buffer size: 2G
Coverage threshold: 1
=============================================================================
Methylation information will now be written into a bedGraph and coverage file
=============================================================================

Using the following files as Input:
/bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/CpG_OB_G10G06.sam.gz.txt /bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/CpG_OT_G10G06.sam.gz.txt

Writing bedGraph to file: test.gz
Also writing out a coverage file including counts methylated and unmethylated residues to file: test.gz.bismark.cov.gz

Changed directory to /bi/home/fkrueger/Bismark_issues_3rd_parties/Alan_RRBS/output_dir/
The genome of interest was specified to contain gazillions of chromosomes or scaffolds. Merging all input files and sorting everything in memory instead of writing out individual chromosome files...
Writing all merged methylation calls to temp file test.gz.methylation_calls.merged

Finished writing methylation calls from CpG_OB_G10G06.sam.gz.txt to merged temp file
Finished writing methylation calls from CpG_OT_G10G06.sam.gz.txt to merged temp file
Sorting input file test.gz.methylation_calls.merged by positions (using -S of 2G)
Successfully deleted the temporary input file test.gz.methylation_calls.merged

Since the error message you reported is actually:
sort: open failed: CpG_context_G08F07.txt: No such file or directory

is there a possibility that you gave it the wrong input file by accident?
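A quick way to rule out a wrong or missing input file is to expand the file pattern yourself and check that every file really exists before calling bismark2bedGraph. The following is only a minimal Python sketch of that check; the helper names `resolve_inputs` and `missing_inputs` are made up for illustration and are not part of Bismark.

```python
import glob
from pathlib import Path


def resolve_inputs(pattern, directory="."):
    """Expand a shell-style pattern inside `directory`, returning sorted matches."""
    return sorted(glob.glob(str(Path(directory) / pattern)))


def missing_inputs(paths):
    """Return the paths that do not point at an existing regular file."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Pattern taken from the command line above; run this in the directory
    # that holds the methylation extractor output.
    files = resolve_inputs("*G10G06*")
    print("Matched input files:", files)
    # The file named in the reported error message, checked explicitly.
    print("Missing:", missing_inputs(files + ["CpG_context_G08F07.txt"]))
```

If the file named in the error message shows up under "Missing", the problem is with the input path or working directory rather than with the sorting step.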
Old 10-24-2017, 10:39 AM   #642
pig_raffles
Member
 
Location: Sheffield, UK

Join Date: Feb 2012
Posts: 15

Hi Felix,

I managed to get the script to work by putting all the files (input and output) in the same directory as the bismark2bedGraph script.

I could not find anything wrong with the file locations or file names that I had used previously, but it works now!

Thanks again for your time and assistance.
Old 09-02-2018, 06:54 PM   #643
daisyko
Junior Member
 
Location: Hong Kong

Join Date: Aug 2018
Posts: 2

Quote:
Originally Posted by fkrueger
Hi seqfast,

When you are generating and sequencing PCR amplified regions it is quite common that you see only the top strand (with both the OT and CTOT strands) or the bottom strand (with both OB and CTOB as in your case), depending on which strand you targeted when designing the primers.

To get your head around this it really helps to draw the sequence of an amplicon of interest out on a sheet of paper. You should see that the bisulfite treatment will change one of the two strands so much that the oligos will only amplify one of the two strands, and this PCR product is then usually sequenced from both sides, e.g. the CTOB and OB strands, but this may be different depending on how the library preparation was done.

So in short: PCR amplicons are not normally directional libraries but you only sequence both versions of either the top or the bottom strand, depending on how the primers were designed. I hope this helps, let me know if I can be of any further assistance. Cheers, Felix
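The "draw it on paper" exercise can also be done in a few lines of Python. This is only a toy sketch: the 9 bp sequence is invented, and it assumes complete conversion of a fully unmethylated template (every C becomes T). It shows that after bisulfite treatment the two strands are no longer complementary, so primers designed against one converted strand will not match the other.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (upper-case A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bisulfite_convert(seq):
    """Convert every C to T, i.e. assume no cytosine is methylated."""
    return seq.replace("C", "T")


def converted_strands(top):
    """Return (converted top strand, converted bottom strand), both 5'->3'."""
    return bisulfite_convert(top), bisulfite_convert(revcomp(top))


if __name__ == "__main__":
    top = "ACGTTCGCA"  # hypothetical amplicon fragment
    ot, ob = converted_strands(top)
    print("converted top strand:   ", ot)
    print("converted bottom strand:", ob)
    # Before conversion revcomp(bottom) equals the top strand; afterwards it does not.
    print("still complementary?", revcomp(ob) == ot)
```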

Hi Felix or any other experts,

I encountered a similar problem, in which most of the reads from a paired-end library aligned to the bottom strand after running Bismark (with either the directional or the non-directional option). I would like to know how we should proceed with further analysis, and how we should calculate the coverage for each site if the reads from the bottom and top strands are extremely unbalanced.
Sorry for the inconvenience, and I hope to get a response from any of you.

Best regards,
Daisy
Old 09-04-2018, 02:19 AM   #644
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 609

Hi Daisy,

Were you expecting to see imbalances in your library, i.e. was it some kind of target enrichment or (PCR) amplification library? Depending on what you expect from the library design you should only analyse alignments from the strands you are expecting, so if you only expect OT or OB alignments you should not perform non-directional alignments.

If your library strategy was designed in a way that you only amplified or pulled down one of the two strands prior to preparing the libraries, you will indeed only see the methylation of either the top or the bottom strand, but not both. The coverage in that case would simply be the number of reads you see for a region (unless you used UMIs, in which case you could also take uniqueness of fragments into account...).
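To make the two ways of counting concrete, here is a small Python sketch. It is deliberately simplified and rests on assumptions that are not from your data: each read is reduced to a hypothetical (chromosome, position, UMI) tuple, and two reads at the same position with the same UMI are treated as copies of one fragment. Real UMI handling would normally also look at alignment start positions and tolerate sequencing errors in the UMI.

```python
from collections import Counter, defaultdict


def read_coverage(reads):
    """Coverage per site as the plain number of reads overlapping it."""
    return dict(Counter((chrom, pos) for chrom, pos, _umi in reads))


def umi_coverage(reads):
    """Coverage per site as the number of distinct UMIs seen at it."""
    umis = defaultdict(set)
    for chrom, pos, umi in reads:
        umis[(chrom, pos)].add(umi)
    return {site: len(tags) for site, tags in umis.items()}


if __name__ == "__main__":
    # Invented example: three reads at chr1:100, two of them sharing a UMI.
    reads = [
        ("chr1", 100, "AAT"),
        ("chr1", 100, "AAT"),
        ("chr1", 100, "GGC"),
        ("chr1", 250, "TTA"),
    ]
    print(read_coverage(reads))
    print(umi_coverage(reads))
```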
Old 09-04-2018, 04:31 AM   #645
daisyko
Junior Member
 
Location: Hong Kong

Join Date: Aug 2018
Posts: 2

Dear Felix,

Thanks a lot for answering.

So I think that, since my enrichment sequencing targeted one strand, I should only look at that strand. That really solves my problem.
Old 09-04-2018, 10:08 AM   #646
shawpa
Member
 
Location: Pittsburgh

Join Date: Aug 2011
Posts: 71
Divergent CpG dinucleotide methylation

I am not sure if this is the proper thread to post this in, but since I am using Bismark output, I thought I would ask here. I have very low coverage WGBS data. Ideally I would have higher coverage and/or more samples per group, but right now I only have 2 low-coverage samples to compare. I was trying to see if I could identify any DMRs using a sliding window analysis. After I did that calculation, I wanted to see what the individual sites in those regions looked like in terms of per-CpG methylation.

What I think I am seeing in many cases is the same CpG site on the forward and the reverse strand, so the two entries appear 1 base apart. However, they show pretty different methylation values at what should be the same CpG site, and really different coverage values too. I assume this is some kind of mapping bias, but I am not sure. Is what I am seeing "normal" for this type of data, or should it be more consistent at the same CpG site? I am having trouble trusting my window analysis when I see the methylation values jump around so wildly. Attached is a text file with all the sites in 1 region of my 2 samples.
Attached Files
File Type: txt sample_region.txt (4.3 KB, 2 views)
Old 09-05-2018, 03:06 AM   #647
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 609

Just generally, in order to get reads for both the forward and reverse strands you need fairly good sequencing depth and a library with good complexity. While your Sample 1 has several CpG dinucleotides covered on both strands, Sample 2 has hardly any covered on more than one strand.

Just glancing over the values in your example, they don't seem to differ all that wildly, to be honest. Take for example:

Code:
224012575	224012575	10	50	50
224012576	224012576	3	66.66666667	33.33333333
Here the 2 positions don't agree perfectly, but with 3 reads in total there are simply only a limited number of percentages you can get, namely 0, 33, 67 or 100%. It will always be difficult to match numbers perfectly at shallow read depth, which is why you need to average the values over larger distances to even those limitations out.
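The arithmetic behind this can be sketched in a few lines of Python. Two assumptions here are mine and should be checked against your file: I am reading the third column as the read count and the fourth as percent methylated, so the two lines above would correspond to 5 methylated / 5 unmethylated reads and 2 methylated / 1 unmethylated read. The function names are only for illustration.

```python
def possible_percentages(n_reads):
    """All methylation percentages that n_reads can produce (k of n methylated)."""
    return [round(100 * k / n_reads, 2) for k in range(n_reads + 1)]


def pooled_methylation(sites):
    """Percent methylation after summing counts over several positions.

    `sites` is an iterable of (methylated_count, unmethylated_count) pairs,
    e.g. the two strands of one CpG or all positions within a window.
    Returns None if there are no reads at all.
    """
    sites = list(sites)
    meth = sum(m for m, _ in sites)
    total = sum(m + u for m, u in sites)
    return 100 * meth / total if total else None


if __name__ == "__main__":
    print(possible_percentages(3))   # only four values are possible with 3 reads
    print(possible_percentages(10))  # 10 reads give steps of 10%
    # Counts assumed from the two example lines: (5, 5) and (2, 1).
    print(pooled_methylation([(5, 5), (2, 1)]))
```

Pooling the counts rather than averaging the two percentages gives the better covered position more weight, which is usually what you want at low depth.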

Tags
alignment, bisulfite, bisulphite, methylation, sequencing
