
  • CpG island detection

    Dear All,
    We are a small research group working on NGS data analysis and epigenomics. In epigenomics, our research focus is CpG island detection, and we are currently researching methods to detect CpG islands automatically. We have the following questions and would appreciate any feedback:
    1. What is the ground truth for CpG islands? The datasets we have looked at seem to provide locations as detected by software (for example, EMBOSS from EBI). Clearly, these cannot be used as ground truth when developing new methods, because they only reflect the output of an existing algorithm. Could any of you shed light on this and suggest a good dataset with accompanying ground truth?

    2. In an automatic detection scenario, how harmful are false-positive CpG island calls?

    Thank you all in advance for any help you can provide.
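As a point of reference for automatic detection, the classic sequence-based baseline is the Gardiner-Garden and Frommer definition: a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count × length) / (C count × G count). The sketch below applies those thresholds with a sliding 200 bp window (1 bp step) and merges passing windows. It is a minimal illustration under those assumptions, not the EMBOSS implementation; the function name `ggf_islands`, the step size, and the merging rule are hypothetical choices.

```python
from typing import List, Tuple


def ggf_islands(seq: str, window: int = 200, min_gc: float = 0.5,
                min_oe: float = 0.6) -> List[Tuple[int, int]]:
    """Return merged 0-based, half-open (start, end) regions whose sliding
    windows pass Gardiner-Garden & Frommer-style thresholds (hypothetical helper)."""
    seq = seq.upper()
    n = len(seq)
    # Prefix sums: c[i], g[i] = number of C / G in seq[:i];
    # cg[i] = number of CpG dinucleotides starting before position i.
    c = [0] * (n + 1)
    g = [0] * (n + 1)
    cg = [0] * (n + 1)
    for i, base in enumerate(seq):
        c[i + 1] = c[i] + (base == "C")
        g[i + 1] = g[i] + (base == "G")
        cg[i + 1] = cg[i] + (base == "C" and i + 1 < n and seq[i + 1] == "G")

    islands: List[Tuple[int, int]] = []
    for start in range(0, n - window + 1):
        end = start + window
        n_c = c[end] - c[start]
        n_g = g[end] - g[start]
        # Count only CpGs whose G also lies inside the window.
        n_cg = cg[end - 1] - cg[start]
        gc = (n_c + n_g) / window
        # Observed/expected CpG = CpG count * length / (C count * G count).
        oe = n_cg * window / (n_c * n_g) if n_c and n_g else 0.0
        if gc > min_gc and oe > min_oe:
            # Windows are visited left to right, so a passing window either
            # overlaps/abuts the last region or starts a new one.
            if islands and start <= islands[-1][1]:
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands
```

For example, `ggf_islands("AT" * 200 + "CG" * 150 + "AT" * 200)` returns `[(301, 799)]`: the CpG-rich block spans positions 400 to 700, but taking the union of passing windows extends the call up to 99 bp into each flank. Boundary inflation of this kind is one concrete source of the false positives asked about in question 2.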

  • #2
    Originally posted by asb2718
    1. What is the ground truth for CpG islands? The datasets we have looked at seem to provide locations as detected by software (for example, EMBOSS from EBI). Clearly, these cannot be used as ground truth when developing new methods, because they only reflect the output of an existing algorithm. Could any of you shed light on this and suggest a good dataset with accompanying ground truth?
    You should look at the work of Adrian Bird's group. They have generated a set of functional CpG islands that are not defined by sequence analysis. We've been using this set for much of our analysis and have found that many of the islands in it which are missed by traditional algorithms are functionally interesting.
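Checking which reference islands a sequence-based caller misses, and which of its calls have no support in the reference, comes down to an interval overlap comparison. The sketch below assumes both sets are available as BED text (0-based, half-open) and treats any 1 bp overlap as a match, which is a deliberately lenient, hypothetical criterion. The helper names are illustrative. Calls absent from the reference are reported as "unsupported" rather than definitively false, since a functional reference set may itself be incomplete.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def parse_bed(text: str) -> List[Interval]:
    """Read the first three columns of BED text, skipping blank/header lines."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def compare_sets(reference: List[Interval], predicted: List[Interval]):
    """Interval-level recall/precision; O(len(reference) * len(predicted))."""
    missed = [r for r in reference if not any(overlaps(r, p) for p in predicted)]
    unsupported = [p for p in predicted if not any(overlaps(p, r) for r in reference)]
    recall = 1 - len(missed) / len(reference) if reference else 0.0
    precision = 1 - len(unsupported) / len(predicted) if predicted else 0.0
    return missed, unsupported, recall, precision
```

The pairwise scan is only meant for small examples. For genome-wide sets, a sorted sweep, or `bedtools intersect -v` to list entries with no overlap, would be the usual approach.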



    • #3
      Is there a way to get the CpG islands described by Illingworth et al. (PLoS Biology)? I did find them in the Ensembl browser as a Misc track (CPG island clones), but after trying all day I cannot figure out how to download the whole file. Is there a simple way to get a BED file for these CGIs? Any help would be great, thanks.



      • #4
        You might also be interested in work done in Rafael Irizarry's lab. Their method is also based on sequence analysis, but it uses a statistical model, a hidden Markov model, to define CpG islands rather than the heuristic definition given in the classic Gardiner-Garden and Frommer paper. The link includes references to the relevant papers as well as downloadable CpG island definitions for several species. There is also code for generating CpG islands for other organisms.
        Pete
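To illustrate how a hidden Markov model replaces fixed thresholds, here is a textbook-style two-state sketch (background vs. island) decoded with the Viterbi algorithm. This is not the Irizarry lab's procedure. All parameters are hypothetical: each state emits bases as a first-order Markov chain, the background chain down-weights C→G transitions to mimic CpG depletion, and the per-base switching probability controls how fragmented the calls are. The sketch assumes a sequence containing only A, C, G and T.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"

# Hypothetical parameters: state 0 = background, state 1 = CpG island.
# cg_factor scales P(G | previous C); the background suppresses CpG.
STATES = [
    {"freq": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}, "cg_factor": 0.2},
    {"freq": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}, "cg_factor": 1.0},
]


def log_emissions(freq: Dict[str, float], cg_factor: float) -> Dict[str, Dict[str, float]]:
    """log P(base | previous base) for one state."""
    table = {}
    for prev in BASES:
        weights = dict(freq)
        if prev == "C":
            weights["G"] *= cg_factor  # scale the C->G transition
        total = sum(weights.values())
        table[prev] = {b: math.log(w / total) for b, w in weights.items()}
    return table


def viterbi(seq: str, p_switch: float = 0.001) -> List[int]:
    """Most likely state (0 or 1) for each base of an A/C/G/T sequence."""
    seq = seq.upper()
    if not seq:
        return []
    emit = [log_emissions(s["freq"], s["cg_factor"]) for s in STATES]
    log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)
    # Equal prior on the two states; first base uses the state's base frequencies.
    score = [math.log(0.5) + math.log(s["freq"][seq[0]]) for s in STATES]
    back: List[List[int]] = []
    for i in range(1, len(seq)):
        prev, cur = seq[i - 1], seq[i]
        new_score, ptr = [], []
        for k in (0, 1):
            stay = score[k] + log_stay
            switch = score[1 - k] + log_switch
            ptr.append(k if stay >= switch else 1 - k)
            new_score.append(max(stay, switch) + emit[k][prev][cur])
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def island_segments(path: List[int]) -> List[Tuple[int, int]]:
    """Collapse a 0/1 state path into half-open island intervals."""
    segments, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments
```

On `"AT" * 150 + "CG" * 150 + "AT" * 150`, `island_segments(viterbi(seq))` gives `[(300, 600)]`. Unlike a fixed-window rule, the decoded boundaries follow the sequence composition, and the trade-off between sensitivity and fragmentation is set by explicit model parameters rather than by hard cut-offs. In practice those parameters would be estimated from data rather than fixed by hand as they are here.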



        • #5
          Originally posted by kshankar
          Is there a way to get the CpG islands described by Illingworth et al. (PLoS Biology)? I did find them in the Ensembl browser as a Misc track (CPG island clones), but after trying all day I cannot figure out how to download the whole file. Is there a simple way to get a BED file for these CGIs? Any help would be great, thanks.
          We've certainly got a file with all of these in it, but it was a while back, so I'd need to go back and check how we got them in the first place. I don't think we pulled them from Ensembl (we usually download these kinds of tracks through the UCSC Table Browser, but I'm not sure that was the case with this data). If all else fails, I can put our copy up on our website if you like.
