  • Programs for GC content and CpG Islands

    Hi everyone,

    I am interested in determining G+C rich regions in a whole genome sequence as well as identifying possible CpG Islands.

    Can anyone recommend their favourite resources for either of these tasks?

    So far, for G+C content, I have tried Picard's CollectGCBiasMetrics (doesn't give me the right info) and GATK's GCContentByInterval walker (gives me a persistent error message) and I am just in the process of trying to run GCProfile.

    If anyone has used the GCContentByInterval walker, could you perhaps give me an example of your command so that I can compare and see where mine is going wrong?

    For CpG Islands I have found 'CpGIslands' but have not yet tried it.

    I am new to programming so any help would be much appreciated.

    Many thanks
    Helen
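
For G+C content along a sequence, a minimal Python sketch of a fixed-window calculation is given here. It is not what Picard, GATK or GCProfile do internally; the function names, the window and step sizes, and the choice to leave N and other ambiguous bases out of the denominator are all assumptions made for illustration.

```python
def gc_fraction(seq):
    """Return the G+C fraction of seq, ignoring N and other ambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # A window made only of Ns has no defined G+C fraction.
    return gc / total if total else float("nan")


def windowed_gc(seq, window=1000, step=1000):
    """Yield (start, end, gc_fraction) per window (0-based, end-exclusive)."""
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        yield start, start + len(chunk), gc_fraction(chunk)


if __name__ == "__main__":
    demo = "ATATATATGCGCGCGCNNNNGGCC"
    for start, end, frac in windowed_gc(demo, window=8, step=8):
        print(start, end, frac)
```

Setting step smaller than window gives overlapping (sliding) windows; G+C rich regions can then be picked out by thresholding the third column.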

  • #2
    If you are interested in identifying CpG islands I can recommend reading Wu et al. Biostatistics (2010) (http://www.ncbi.nlm.nih.gov/pubmed/20212320). The paper argues that some common definitions of CpG islands are too restrictive (such as the definition used by the UCSC genome browser). The authors develop a hidden Markov model to define CpG islands for arbitrary genomes.

    The paper is accompanied by software that implements their method and tables of pre-computed CpG islands using their software for many popular genomes (see http://rafalab.jhsph.edu/CGI/index.html).
    Pete
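
For comparison with the HMM approach, the simpler threshold-style definitions can be sketched as below. This is not the method from the paper. The default cut-offs (length of at least 200 bp, G+C content of at least 50%, observed/expected CpG ratio above 0.6) are the commonly quoted ones and are assumptions here, as are the function names; check them against whichever definition you want to reproduce.

```python
def cpg_stats(seq):
    """Return (gc_content, obs_exp_cpg) for one candidate region."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on this strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count is (C * G) / N, so obs/exp = CpG * N / (C * G).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three thresholds to a single region (thresholds are assumptions)."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and obs_exp > min_obs_exp
```

A genome scan would call this on sliding windows and merge adjacent hits; that part is left out to keep the sketch short.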


    • #3
      Pete,

      Great, I think this will be very useful indeed!
      I had been trying to find an existing set of CpG Islands for Bos taurus as well.
      Many thanks!


      • #4
        Hi Helen

        I used "makeCGI" for Sus scrofa and got an .rda file in the results folder. I would like to know whether you used this software for Bos taurus, and how you extracted the results from the .rda file.
        Thank you in advance.

        Jamal


        • #5
          The GATK command worked for me (did you make the Picard ".dict" file for your reference FASTA file?):

          % java -Xmx2g -Djava.io.tmpdir=/path/to/tmp -jar /path/to/GenomeAnalysisTK-1.1-23-g8072bd9/GenomeAnalysisTK.jar -T GCContentByInterval -R /path/to/human_g1k_v37.fasta -L 1:1-100000 -o chr1_1_100000_gc.txt

          ...

          % cat chr1_1_100000_gc.txt
          1:1-100000 0.38207

          Chris
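
As a cross-check on a number like the one above, here is a small Python sketch that computes the G+C fraction of a "contig:start-end" interval directly from a FASTA file. It is not GATK code. The function names are made up for illustration, the interval is assumed to be 1-based and inclusive, and every base in the interval (including N) is counted in the denominator; the thread does not say how GATK treats Ns, so the two results may differ slightly.

```python
def read_fasta(path):
    """Return {name: sequence}; name is the header up to the first whitespace."""
    seqs, name, parts = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(parts)
                fields = line[1:].split()
                name, parts = (fields[0] if fields else ""), []
            elif line and name is not None:
                parts.append(line)
    if name is not None:
        seqs[name] = "".join(parts)
    return seqs


def parse_interval(text):
    """Parse 'contig:start-end' (assumed 1-based, inclusive)."""
    contig, _, span = text.rpartition(":")
    start, _, end = span.partition("-")
    return contig, int(start), int(end)


def interval_gc(seqs, interval):
    """G+C fraction of the interval, with all bases in the denominator."""
    contig, start, end = parse_interval(interval)
    region = seqs[contig][start - 1:end].upper()
    gc = region.count("G") + region.count("C")
    return gc / len(region) if region else float("nan")
```

Usage would be along the lines of interval_gc(read_fasta("/path/to/reference.fasta"), "1:1-100000"). Note that read_fasta holds the whole file in memory, so for a large genome it is better to extract a single chromosome first.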


          • #6
            Hi Chris

            I didn't make the Picard file for my genome. Please tell me how I can do that,
            and please tell me more about GATK.

            Thanks a lot

            Jamal


            • #7
              There is a link here about making the Picard dict file for GATK:



              Download the latest Picard from here into a new directory (for me $HOME/src on a Linux machine) and unzip it:



              Something like this works for me:

              java -jar /home/cjp64/src/picard-tools-1.53/CreateSequenceDictionary.jar R=/data/refs/archive/hg19/bowtie/hg19.fasta O=/data/refs/archive/hg19/bowtie/hg19.dict

              GATK help starts here (it's on many pages though and is more for doing SNP calls):



              Chris


              • #8
                Hi all,

                Did anyone try "makeCGI" recently?

                I am having some problems with this package.

                First, it has a lot of trouble reading chromosome/scaffold headers from the FASTA files and crashes. I reduced the headers to just the chromosome/scaffold name (deleting everything else) and it seemed to work, but then it stopped with a new warning message:

                Warning message:
                In rm(pattern = "Ngc") : object 'Ngc' not found

                Apparently, it doesn't much like finding "Ns" in the sequence.

                It creates the result file, but the file appears to be empty.

                Any suggestions? I am really new to all this, so any advice would be very welcome.

                Thanks in advance

                Jamal, maybe it is a bit late, but I have found this to convert RDA to CSV. I thought it might be useful for other people.
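
The manual header trimming described in this post can be automated with a short Python sketch like the one below. It only rewrites each FASTA header to its first whitespace-delimited word and reports how many Ns each sequence contains; whether that is enough to make makeCGI run is not something this thread confirms. The function name and the returned dictionary layout are hypothetical.

```python
def simplify_fasta(in_path, out_path):
    """Copy a FASTA file, trimming each header to its first word.

    Returns {name: (length, n_count)} so N-rich sequences are easy to spot.
    Sequences that share a name after trimming overwrite each other's stats.
    """
    stats = {}
    name = None
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else "unnamed"
                stats[name] = (0, 0)
                dst.write(">" + name + "\n")
            else:
                dst.write(line)  # sequence lines are copied unchanged
                if name is not None:
                    bases = line.strip().upper()
                    length, n_count = stats[name]
                    stats[name] = (length + len(bases),
                                   n_count + bases.count("N"))
    return stats
```

Printing the returned dictionary shows which chromosomes or scaffolds are mostly N, which may help narrow down where the 'Ngc' warning comes from.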


                • #9
                  makeCGI: object 'Ngc' not found

                  Hi
                  I tried this program recently and ran into the same problem as you:

                  Warning message:
                  In rm(pattern = "Ngc") : object 'Ngc' not found

                  I would like to know whether you have found a solution to this problem.
                  Thank you in advance.

                  (In reply to post #8 by oria34.)
