  • Mappability or "why my mapping is biased?"

    I found this track to be a very important piece of information to keep in mind when mapping reads to the human genome. To be honest, I have not seen it discussed much elsewhere (I may be wrong). I hope it is helpful to someone.



    I copy-pasted the most relevant information from the link. It belongs to a track called "Mapability or Uniqueness of Reference Genome" in the UCSC Genome Browser.
    Description

    To see it properly, open the Genome Browser and go to your favourite locus; the track should appear in the track list below the display.



    These tracks display the level of sequence uniqueness of the reference hg18 genome. They were generated using different window sizes and high signal will be found in areas where the sequence is unique.
    Methods

    The Broad alignability track displays whether a region is made up of mostly unique or mostly non-unique sequence. To generate the track, every 36-mer in the genome was marked as "unique" if the most similar 36-mer elsewhere in the genome has at most 2 mismatches, and as "non-unique" otherwise. Position X in the alignable track is marked by 1 if >50% of the bases in [X-200,X+200] are "unique" and by 0 otherwise. Every point in the alignable track has a corresponding position in each of the ChIP signal tracks. The Broad alignability track was generated for the ENCODE project as a tool for development of the Broad Histone tracks.
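The windowing step described above can be sketched in a few lines of Python. This is only an illustration of the ">50% of bases in [X-200,X+200]" rule, not the Broad Institute's code: the function name `alignability`, the per-base boolean input, and the choice to truncate windows at the ends of the sequence are all assumptions, and the 36-mer comparison that produces the "unique" flags is left out.

```python
from itertools import accumulate


def alignability(unique_flags, half_window=200):
    """Return a 0/1 track: 1 where more than 50% of the bases in
    [x - half_window, x + half_window] are flagged as unique.

    unique_flags is a per-base list of booleans (hypothetical input;
    how 36-mer flags are assigned to bases is not specified here).
    Windows are truncated at the sequence ends (an assumption).
    """
    n = len(unique_flags)
    # prefix[i] = number of unique bases among the first i positions
    prefix = [0] + list(accumulate(int(f) for f in unique_flags))
    track = []
    for x in range(n):
        lo = max(0, x - half_window)
        hi = min(n, x + half_window + 1)  # exclusive upper bound
        unique = prefix[hi] - prefix[lo]
        # strictly more than half of the window must be unique
        track.append(1 if 2 * unique > (hi - lo) else 0)
    return track
```

With a tiny window for illustration, `alignability([True] * 5 + [False] * 5, half_window=1)` marks the first five positions with 1 and the last five with 0.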

    The Duke uniqueness tracks display how unique each sequence is on the positive strand, starting at a particular base and of a particular length. Thus, the 20 bp track reflects the uniqueness of all 20-base sequences, with the score being assigned to the first base of the sequence. Scores are normalized to between 0 and 1, with 1 representing a completely unique sequence and 0 indicating that the sequence occurs >4 times in the genome (excluding chrN_random and alternative haplotypes). A score of 0.5 indicates the sequence occurs exactly twice, likewise 0.33 for three times and 0.25 for four times. The Duke uniqueness tracks were generated for the ENCODE project as tools in the development of the Open Chromatin tracks.
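As a quick way to read the Duke scores, the mapping from occurrence count to score can be written out as below. It is a sketch based only on the description above; the function name `duke_uniqueness` is hypothetical, and rounding to two decimals is an assumption taken from the quoted value 0.33.

```python
def duke_uniqueness(occurrences):
    """Map the number of genome-wide occurrences of a sequence to a
    Duke-style uniqueness score: 1/n for n = 1..4, and 0 for n > 4."""
    if occurrences < 1:
        raise ValueError("a sequence taken from the genome occurs at least once")
    if occurrences > 4:
        return 0.0
    # two decimals is an assumption, matching the 0.33 in the description
    return round(1.0 / occurrences, 2)
```

So `duke_uniqueness(1)` gives 1.0, `duke_uniqueness(3)` gives 0.33, and anything occurring five or more times scores 0.0.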

    The Duke excluded regions track displays genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling for Duke/UNC/UTA's Open Chromatin tracks. This track contains problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). The Duke excluded regions track was generated for the ENCODE project.

    The Rosetta uniqueness track uses sequence 'tiles' of 35 bp. Each tile was aligned to the genome using the BWA aligner. Tiles that align uniquely and perfectly in hg18 receive a p-value of 1e-37, while those that align perfectly in multiple locations receive a p-value of 0. For each tile, the oligo midpoint coordinate was recorded along with the -log_10 p-value: 37 (unambiguous) to 0 (ambiguous). The Rosetta uniqueness track was generated independently of the ENCODE project.

    The UMass uniqueness track displays a uniqueness signal for each base which represents the sum of both plus and minus strand 15-mer occurrences of that particular 5'->3' (plus strand) sequence throughout the genome. Scores are normalized between 0 and 1 by calculating ( 1 / N ) where N is the number of genome wide occurrences of the 15-mer starting at position X. A score of 1 represents a single genome wide occurrence of that 15-mer. A 0.5 would represent either 2 plus strand occurrences or 1 plus and 1 minus strand occurrence, and so on. Ratios are rounded to 3 significant digits. Therefore a 0.000 would represent > 2000 occurrences. A 0 is reserved for a given 15-mer that is either not assembled or contains at least one N at position X. The UMass uniqueness track was generated for the ENCODE project.
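The 1/N scoring of the UMass track can be tried on a toy sequence with the sketch below. It is not the UMass pipeline: the names `reverse_complement` and `umass_uniqueness` are made up for illustration, any k-mer containing an N is scored 0 (one reading of the description), and minus-strand occurrences are counted as plus-strand occurrences of the reverse complement.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def umass_uniqueness(genome, k=15):
    """Score each k-mer start position as 1/N, where N counts
    occurrences of that k-mer on both strands.

    Use an odd k (such as 15): an odd-length k-mer can never equal its
    own reverse complement, so no occurrence is counted twice.
    """
    genome = genome.upper()
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    plus_counts = Counter(kmers)
    scores = []
    for kmer in kmers:
        if "N" in kmer:
            # assumption: k-mers overlapping unassembled bases get 0
            scores.append(0.0)
            continue
        # minus-strand hits are plus-strand hits of the reverse complement
        n = plus_counts[kmer] + plus_counts[reverse_complement(kmer)]
        scores.append(round(1.0 / n, 3))
    return scores
```

For example, with `k=3` the sequence `"AAATTT"` scores 0.5 at every position, because each 3-mer occurs once on the plus strand and once on the minus strand.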
    Credits

    The Broad alignability track was created by the Broad Institute (contact: [email protected]). Data generation and analysis were supported by funds from the NHGRI (the ENCODE project), the Burroughs Wellcome Fund, Massachusetts General Hospital and the Broad Institute.

    The Duke uniqueness and Duke excluded regions tracks were created by Terry Furey (contact: [email protected]) and Debbie Winter at Duke University's Institute for Genome Sciences & Policy (IGSP); and Stefan Graf at the European Bioinformatics Institute (EBI). We thank NHGRI for ENCODE funding support.

    The Rosetta uniqueness track was created by John Castle at Rosetta Inpharmatics (Merck) (contact: [email protected]), with assistance from Melissa Cline at UCSC.

    The UMass uniqueness track was created by Bryan Lajoie (contact: [email protected]) in Job Dekker's Lab at the University of Massachusetts Medical School. Funding Support: NIH grant HG003143 to JD. Keck Distinguished Young Scholar Award to JD. This track was generated as part of the ENCODE project funded by the NHGRI.
    Last edited by polivares; 08-28-2009, 06:47 AM. Reason: wrong link

  • #2
    Thanks, this is a very interesting resource.

    • #3
      I got the following error when I clicked on the link:

      Can't find wgEncodeMapability in track database ce6 chromosome chrX

      Could you please post the link again?

      Thanks

      Krys


      • #4
        Anyway, I would recommend visiting it directly.
        Last edited by polivares; 08-26-2009, 08:01 AM. Reason: couldn't delete a repeated post


        • #5
          This is what I found on UCSC. Perhaps this link works:
          --
          bioinfosm


            • #7
              Do they have a similar resource for modENCODE?

              • #8
                Hello everyone, I am a beginner in bioinformatics. I would appreciate it if anyone could help me with BEDTools and SAMTools.

                • #9
                  I like BEDTools. I think it may be enough just to follow the manual, which is very well written.
