  • asb2718
    Junior Member
    • Mar 2011
    • 4

    CpG island detection

    Dear All,
    We are a small research group working on NGS data analysis and epigenomics. Within epigenomics, our focus is CpG island detection, and we are currently researching methods to detect CpG islands automatically. We have the following questions and would appreciate any feedback:
    1. What is the ground truth for CpG islands? We have looked at several datasets, but they seem to provide locations as detected by their own software (for example, EMBOSS by EBI). Clearly, these cannot be used as ground truth when we are developing newer methods. Could any of you shed light on this matter and suggest a good data set with an accompanying ground truth?

    2. In an automatic detection scenario, how harmful is the detection of false positives in CpG islands?

    We want to thank each one of you in advance for any help you can provide in this matter.
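
One way to make the false-positive question concrete is to score predicted islands against whichever reference set is eventually adopted. The following is a minimal sketch, not an established benchmark: the intervals are made-up illustrative values, the any-overlap matching rule is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end), assumed 0-based and half-open
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def precision_recall(predicted: List[Interval],
                     reference: List[Interval]) -> Tuple[float, float]:
    """Island-level precision and recall using any-overlap matching (an assumption)."""
    # Predicted islands that touch at least one reference island
    true_pos = sum(any(overlaps(p, r) for r in reference) for p in predicted)
    # Reference islands recovered by at least one prediction
    found = sum(any(overlaps(r, p) for p in predicted) for r in reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = found / len(reference) if reference else 0.0
    return precision, recall


# Made-up example intervals, for illustration only
reference = [("chr1", 100, 500), ("chr1", 2000, 2600), ("chr2", 50, 400)]
predicted = [("chr1", 150, 450), ("chr1", 900, 1200),
             ("chr2", 300, 700), ("chr2", 5000, 5300)]

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Predictions that match nothing in the reference lower precision, but as the replies in this thread suggest, some of them may be real islands that the reference set misses, so the score is only as good as the reference.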
  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    #2
    Originally posted by asb2718:
    1. What is the ground truth for CpG islands? We have looked at several datasets, but they seem to provide locations as detected by their own software (for example, EMBOSS by EBI). Clearly, these cannot be used as ground truth when we are developing newer methods. Could any of you shed light on this matter and suggest a good data set with an accompanying ground truth?
    You should look at the work of Adrian Bird's group. They have generated a set of functional CpG islands that isn't based on sequence analysis. We've been using this set for much of our analysis and have found that many of the islands they detect, but which traditional algorithms miss, are functionally interesting.

    • kshankar
      Member
      • Jul 2010
      • 12

      #3
      Is there a way to get the CpG islands described by Illingworth et al., PLoS Biol.? I did find them in the Ensembl browser as a Misc track (CPG island clones), but after trying all day I cannot figure out how to download the whole file. Is there a simple way just to get a BED file for these CGIs? Any help would be great, thanks.

      • PeteH
        Member
        • Jun 2010
        • 64

        #4
        You might also be interested in work done in Rafael Irizarry's lab. Their method is based on sequence analysis using a statistical procedure called a hidden Markov model to define CpG islands, rather than the heuristic definition given in the classic Gardiner-Garden and Frommer paper. The link includes references to the relevant papers as well as downloadable CpG island definitions for several species using their definition. There is also code for generating CpG islands for other organisms.
        Pete
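
For comparison, the heuristic definition mentioned above can be sketched in a few lines. This is a simplified illustration using the commonly cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). It tests one candidate sequence as a whole rather than scanning a genome with sliding windows, and the function names are hypothetical.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * length) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def meets_ggf_criteria(seq: str, min_length: int = 200,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Check a single candidate region against the three heuristic thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp


# Synthetic sequences, for illustration only
cpg_rich = "CG" * 100        # 200 bp, all CpG
gc_rich_no_cpg = "CCTGG" * 50  # 250 bp, GC-rich but contains no CpG dinucleotide

print(meets_ggf_criteria(cpg_rich))        # True
print(meets_ggf_criteria(gc_rich_no_cpg))  # False
```

The second sequence shows why the observed/expected ratio matters: high GC content alone does not make a region a CpG island under this definition.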

        • simonandrews
          Simon Andrews
          • May 2009
          • 870

          #5
          Originally posted by kshankar:
          Is there a way to get the CpG islands described by Illingworth et al., PLoS Biol.? I did find them in the Ensembl browser as a Misc track (CPG island clones), but after trying all day I cannot figure out how to download the whole file. Is there a simple way just to get a BED file for these CGIs? Any help would be great, thanks.
          We've certainly got a file with all of these in it, but we made it a while back, so I'd need to check how we got them in the first place. I don't think we pulled them from Ensembl (we usually download these kinds of tracks through the Table Browser at UCSC, but I'm not sure whether that was the case with this data). If all else fails, I can put our copy up on our website if you like?
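
Once the coordinates are in hand, writing them out as BED is straightforward. The sketch below assumes the source table uses 1-based inclusive coordinates (as Ensembl displays them), whereas BED uses a 0-based start and an exclusive end. The island records and names are made up for illustration, and `to_bed_line` is a hypothetical helper.

```python
def to_bed_line(chrom: str, start_1based: int, end_inclusive: int,
                name: str = ".") -> str:
    """Convert a 1-based inclusive interval to a tab-separated BED4 line."""
    # BED start is 0-based; BED end is exclusive, so the inclusive end is unchanged
    return f"{chrom}\t{start_1based - 1}\t{end_inclusive}\t{name}"


# Made-up records: (chromosome, 1-based start, inclusive end, name)
islands = [("chr1", 1001, 1500, "CGI_1"), ("chr2", 201, 650, "CGI_2")]

bed_lines = [to_bed_line(*island) for island in islands]
print("\n".join(bed_lines))
```

If the source is already 0-based and half-open (as UCSC tables generally are), the subtraction should be dropped.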
