SEQanswers

Old 11-02-2011, 02:57 PM   #1
HelenM
Junior Member
 
Location: New Zealand

Join Date: Nov 2011
Posts: 4
Default Programs for GC content and CpG Islands

Hi everyone,

I am interested in determining G+C rich regions in a whole genome sequence as well as identifying possible CpG Islands.

Can anyone recommend their favourite resources for either of these tasks?

So far, for G+C content, I have tried Picard's CollectGCBiasMetrics (which doesn't give me the information I need) and GATK's GCContentByInterval walker (which gives me a persistent error message). I am now trying to run GCProfile.

If anyone has used the GCContentByInterval walker, could you perhaps give me an example of your command so that I can compare it with mine and see where mine is going wrong?

For CpG Islands I have found 'CpGIslands' but have not yet tried it.

I am new to programming so any help would be much appreciated.

Many thanks
Helen
Old 11-02-2011, 05:41 PM   #2
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64
Default

If you are interested in identifying CpG islands I can recommend reading Wu et al. Biostatistics (2010) (http://www.ncbi.nlm.nih.gov/pubmed/20212320). The paper argues that some common definitions of CpG islands are too restrictive (such as the definition used by the UCSC genome browser). The authors develop a hidden Markov model to define CpG islands for arbitrary genomes.

The paper is accompanied by software that implements their method and tables of pre-computed CpG islands using their software for many popular genomes (see http://rafalab.jhsph.edu/CGI/index.html).
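
For comparison with the HMM approach, the kind of fixed-threshold definition the paper argues against can be sketched in a few lines of Python. This is only an illustration, assuming the commonly cited Gardiner-Garden and Frommer style thresholds (length of at least 200 bp, G+C fraction of at least 0.5, observed/expected CpG ratio of at least 0.6). The function names and the toy sequences are hypothetical, and none of this is taken from the paper's software or from UCSC's code.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Obs/Exp = (#CpG * N) / (#C * #G); reported as 0 when C or G is absent
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed fixed thresholds to one candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return (len(seq) >= min_len
            and gc_fraction >= min_gc
            and obs_exp >= min_obs_exp)


if __name__ == "__main__":
    candidate = "CGGCGCATCG" * 25   # toy 250 bp region, CpG-rich
    background = "ATTACAGGTA" * 25  # toy 250 bp region, no CpG at all
    print(cpg_stats(candidate), looks_like_cpg_island(candidate))
    print(cpg_stats(background), looks_like_cpg_island(background))
```

A real scan would slide this test along the genome and merge neighbouring windows that pass, which is exactly where the choice of thresholds starts to matter.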
Pete
Old 11-02-2011, 06:11 PM   #3
HelenM
Junior Member
 
Location: New Zealand

Join Date: Nov 2011
Posts: 4
Default

Pete,

Great, I think this will be very useful indeed!
I had been trying to find an existing set of CpG Islands for Bos taurus as well.
Many thanks!
Old 12-05-2011, 05:16 AM   #4
jamal
Member
 
Location: Denmark

Join Date: Jan 2010
Posts: 10
Default

Hi Helen

I used "makeCGI" for Sus scrofa and got an .rda file in the result folder. Did you use this software for Bos taurus, and if so, how did you extract the results from the .rda file?
Thank you in advance

Jamal
Old 12-05-2011, 05:36 AM   #5
cjp
Member
 
Location: Cambridge, United Kingdom

Join Date: Jun 2011
Posts: 58
Default

The GATK command worked for me (did you make the picard ".dict" file for your reference fasta file?):

% java -Xmx2g -Djava.io.tmpdir=/path/to/tmp -jar /path/to/GenomeAnalysisTK-1.1-23-g8072bd9/GenomeAnalysisTK.jar -T GCContentByInterval -R /path/to/human_g1k_v37.fasta -L 1:1-100000 -o chr1_1_100000_gc.txt

...

% cat chr1_1_100000_gc.txt
1:1-100000 0.38207
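
As a rough cross-check on a number like that, the G+C fraction of an interval can also be computed directly from the sequence. A minimal Python sketch follows. It assumes the sequence is already in memory as a string, and it leaves N (and any other non-ACGT character) out of the denominator, which is an assumption on my part and may not match how GCContentByInterval treats such bases. The function names and the window size are hypothetical.

```python
def gc_fraction(seq):
    """G+C fraction over A/C/G/T bases only; None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def gc_by_window(seq, window=1000):
    """Yield (start, end, gc) for consecutive non-overlapping windows.

    Coordinates are 1-based and inclusive, like the -L interval above.
    The last window may be shorter than the requested size.
    """
    for offset in range(0, len(seq), window):
        chunk = seq[offset:offset + window]
        yield offset + 1, offset + len(chunk), gc_fraction(chunk)


if __name__ == "__main__":
    toy = "NNNNN" + "ATGCGC" * 3 + "ATATAT"
    for start, end, gc in gc_by_window(toy, window=10):
        print(start, end, gc)
```

Windows made up entirely of Ns come back as None rather than 0, so assembly gaps are not mistaken for AT-rich regions.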

Chris
Old 12-05-2011, 06:08 AM   #6
jamal
Member
 
Location: Denmark

Join Date: Jan 2010
Posts: 10
Default

Hi Chris

I didn't make the Picard file for my genome. Please tell me how I can do that,
and please tell me more about GATK.

Thanks a lot

Jamal
Old 12-05-2011, 06:36 AM   #7
cjp
Member
 
Location: Cambridge, United Kingdom

Join Date: Jun 2011
Posts: 58
Default

There is a link here about making the picard dict file for GATK:

http://www.broadinstitute.org/gsa/wi...ference_genome

Download the latest picard from here into a new directory (for me $HOME/src on a Linux machine) and unzip it:

http://sourceforge.net/projects/pica...d?source=files

Something like this works for me:

java -jar /home/cjp64/src/picard-tools-1.53/CreateSequenceDictionary.jar R=/data/refs/archive/hg19/bowtie/hg19.fasta O=/data/refs/archive/hg19/bowtie/hg19.dict
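
The .dict file is essentially a SAM-style header that lists each reference sequence name with its length, so a quick sanity check is to compute the same pairs straight from the FASTA and compare them with the @SQ lines. A minimal Python sketch, assuming the sequence name is the first whitespace-delimited token of each ">" line. The function name is hypothetical and this is not how Picard itself is implemented.

```python
def fasta_lengths(lines):
    """Return [(name, length), ...] in file order from FASTA text lines."""
    records = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            # keep only the first token of the header as the sequence name
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


if __name__ == "__main__":
    demo = [">chr1 test sequence", "ACGTACGT", "ACG", ">chr2", "NNNN"]
    for name, length in fasta_lengths(demo):
        print(f"@SQ\tSN:{name}\tLN:{length}")
```

With a real reference, pass an open file handle instead of the demo list and compare the printed names and lengths with the SN and LN fields in the .dict file.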

The GATK help starts here (it is spread over many pages, though, and is aimed more at SNP calling):

http://www.broadinstitute.org/gsa/wi...alysis_Toolkit

Chris
Old 05-21-2013, 04:07 AM   #8
oria34
Member
 
Location: Finland

Join Date: Feb 2013
Posts: 15
Default

Hi all,

Has anyone tried "makeCGI" recently?

I am having some problems with this package.

First, it has a lot of trouble reading the chromosome/scaffold headers from the FASTA files and crashes. I reduced each header to just the chromosome/scaffold name (deleting everything else) and it seemed to work, but then it stopped with a new warning message:

Warning message:
In rm(pattern = "Ngc") : object 'Ngc' not found

Apparently, it doesn't cope well with "N"s in the sequence.

It creates the result file, but the file appears to be empty.
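
For anyone hitting the same header problem, trimming each FASTA header down to its first word can be scripted rather than edited by hand. A minimal Python sketch, assuming the chromosome/scaffold name is the first whitespace-delimited token after ">". The function name and the demo header are hypothetical, and I have not verified that this alone is enough to keep makeCGI happy.

```python
def simplify_headers(lines):
    """Yield FASTA lines with each '>' header cut down to its first token."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            # leave an empty header untouched rather than guess a name
            yield (">" + tokens[0]) if tokens else line
        else:
            # sequence lines pass through unchanged, Ns included
            yield line


if __name__ == "__main__":
    demo = [">scaffold_12 len=3400 source=assembly_v1\n", "ACGTNNNNACGT\n"]
    print("\n".join(simplify_headers(demo)))
```

On a real genome the same generator can be fed an open input file, writing each yielded line plus a newline to a new output file so that the original FASTA is left untouched.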

Any suggestions? I am really new to all this, so any advice will be very welcome.

Thanks in advance

jamal, maybe it is a bit late, but I have found this to convert RDA to CSV. I thought it might be useful for other people.
Old 09-21-2014, 07:21 PM   #9
jfeicheng
Junior Member
 
Location: shanghai

Join Date: Feb 2014
Posts: 2
Default makeCGI:object 'Ngc' not found

Hi
I've tried this program recently, and I ran into the same problem as you.

Warning message:
In rm(pattern = "Ngc") : object 'Ngc' not found

I would like to know whether you found any solution for this program.
Thank you in advance.


Tags
cpg, g+c

All times are GMT -8.

