SEQanswers

Old 10-25-2013, 07:12 AM   #1
Mimmy86
Junior Member
 
Location: Italy

Join Date: Oct 2013
Posts: 3
Bisulfite seq file format, help!

Hi everyone,
I'm new to methylation analysis. I downloaded a public bisulfite-seq dataset, but I cannot tell what the file format is.
The file name ends in bs-call.basecall.

The file contents are:
chr1 131398 CC 12 0 -
chr1 131399 CC 12 0 -
chr1 131400 CC 13 0 +
chr1 131401 CC 13 0 +
chr1 131402 CG 2 11 +
chr1 131403 CG 4 10 -
chr1 131404 CA 13 0 +
chr1 131407 CC 13 0 +
chr1 131408 CC 13 0 +
chr1 131409 CC 15 0 +
chr1 131410 CA 15 0 +
chr1 131412 CA 15 0 +


Do you know this kind of file?

Thanks a lot!
Mimmy

Last edited by Mimmy86; 10-25-2013 at 07:20 AM.
Old 10-25-2013, 10:58 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

It would help if you gave a URL. Particularly if this is a GEO dataset, there's probably a description somewhere. Having said that, it looks like: chromosome, position, context, unmethylated count, methylated count, strand. Context gives the C in question plus the nucleotide that follows it (these days, you'd see CpG, CHG, or CHH rather than what you have).
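
To make that column reading concrete, here is a minimal Python sketch that parses lines in this layout. The field names and the `parse_call` helper are hypothetical labels based on the guess above, not on any documentation for the file, so treat the meaning of columns 4 and 5 as an assumption.

```python
from typing import NamedTuple


class BaseCall(NamedTuple):
    chrom: str
    pos: int
    context: str       # the C plus the base that follows it, e.g. "CG"
    unmethylated: int  # assumed meaning of column 4
    methylated: int    # assumed meaning of column 5
    strand: str


def parse_call(line: str) -> BaseCall:
    """Split one whitespace-delimited line into typed fields."""
    chrom, pos, context, unmeth, meth, strand = line.split()
    return BaseCall(chrom, int(pos), context, int(unmeth), int(meth), strand)


# Three lines copied from the sample posted in this thread.
sample = [
    "chr1 131402 CG 2 11 +",
    "chr1 131403 CG 4 10 -",
    "chr1 131404 CA 13 0 +",
]
calls = [parse_call(line) for line in sample]

# Keep only CpG sites, which is often the first filter applied.
cpg_sites = [c for c in calls if c.context == "CG"]
```

A line with more or fewer than six fields raises a `ValueError` on unpacking, which is a cheap way to notice if the real file deviates from this layout.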
Old 10-25-2013, 11:24 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060

From this GEO record: http://www.ncbi.nlm.nih.gov/geo/quer...?acc=GSM922329 comes the following tidbit:

Quote:
Supplementary_files_format_and_content: Methylation calls files for C's in the bsmap alignment files were generated using methratio.py
Seems to be similar to what Mimmy86 is reporting.
Old 10-25-2013, 11:07 PM   #4
Mimmy86
Junior Member
 
Location: Italy

Join Date: Oct 2013
Posts: 3

Hi,
thanks for your reply!
As you said, I took this data from GEO (GSM922329).
I looked at the methratio.py manual in BSMAP and found an output format description that confused me.
The description is:
Quote:
Output format: tab delimited txt file with the following columns:
1) chromosome
2) coordinate (1-based)
3) strand
4) sequence context (2nt upstream to 2nt downstream in Watson strand direction)
5) methylation ratio, calculated as #C_counts / #eff_CT_counts
6) number of effective total C+T counts on this locus (#eff_CT_counts)
CT_SNP="no action", #eff_CT_counts = #CT_counts
CT_SNP="correct", #eff_CT_counts = #CT_counts * (#rev_G_counts / #rev_GA_counts)
7) number of total C counts on this locus (#C_counts)
8) number of total C+T counts on this locus (#CT_counts)
9) number of total G counts on this locus of reverse strand (#rev_G_counts)
10) number of total G+A counts on this locus of reverse strand (#rev_GA_counts)
11) lower bound of 95% confidence interval of methylation ratio, calculated by Wilson score interval for binomial proportion.
12) upper bound of 95% confidence interval of methylation ratio, calculated by Wilson score interval for binomial proportion.
Old 10-26-2013, 05:40 AM   #5
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 624

If the description on GEO (...produced by methratio.py) and the actual format do not match, you should probably contact the authors directly (also so that they can update the description on GEO). Looking at the file, though, I am pretty sure that Devon's assessment is correct.
Old 10-26-2013, 05:51 AM   #6
Mimmy86
Junior Member
 
Location: Italy

Join Date: Oct 2013
Posts: 3

Thanks, I also think that Devon's assessment is correct. I tried to calculate the C methylation frequency using the 4th and 5th columns. Do you think I can continue with these frequencies?
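
The calculation described here could look roughly like the sketch below. It assumes, following Devon's reading, that column 4 is the unmethylated count and column 5 the methylated count; if the two were actually swapped, every frequency would become one minus the value shown. The `methylation_frequency` helper is a hypothetical name used for illustration.

```python
def methylation_frequency(unmethylated, methylated):
    """Return methylated / (methylated + unmethylated), or None with no coverage."""
    total = unmethylated + methylated
    if total == 0:
        return None
    return methylated / total


# Lines copied from the sample at the top of the thread.
lines = [
    "chr1 131401 CC 13 0 +",
    "chr1 131402 CG 2 11 +",
    "chr1 131403 CG 4 10 -",
]

for line in lines:
    chrom, pos, context, col4, col5, strand = line.split()
    freq = methylation_frequency(int(col4), int(col5))
    print(chrom, pos, context, strand, freq)
```

Returning `None` for uncovered sites, rather than 0, keeps "no data" distinct from "unmethylated" in any downstream averaging.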