
  • Mappability or Uniqueness of Reference Genome

    The UCSC genome browser has a track that displays the mappability of short fragments to a reference genome. My question is, "Is there a tool to estimate the mappability of longer read lengths?"

    I am trying to define the parameters needed to accurately align highly homologous gene family members and their respective pseudogenes, whose sequences can be up to 90-95% identical. I want a way to estimate the likelihood of mapping every possible 500 bp sequence correctly, and how many mismatches are present.

    Any ideas?

  • #2
    method for computing "mappable" regions of genome

    The program vmatch [1] can be used to find all pairs of regions longer than a given length that are similar (or identical) to each other, where similarity can be defined in terms of maximum edit or Hamming distance. Use the -dbnomatch option to report all regions NOT participating in any such match, i.e. regions that are dissimilar to everything else genome-wide (unique).

    [1] http://www.zbh.uni-hamburg.de/vmatch/

    I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?



    • #3
      Originally posted by RockChalkJayhawk
      The UCSC genome browser has a track that displays the mappability of short fragments to a reference genome. My question is, "Is there a tool to estimate the mappability of longer read lengths?"

      I am trying to define the parameters needed to accurately align highly homologous gene family members and their respective pseudogenes, whose sequences can be up to 90-95% identical. I want a way to estimate the likelihood of mapping every possible 500 bp sequence correctly, and how many mismatches are present.

      Any ideas?
      Follow the same approach as the mappability tracks, but with your 500 bp sequences. Sample 500 bp windows across the regions of interest and map them with your favorite short-read aligner (at a suitable sensitivity). Then filter and process the resulting alignments to generate your own track. If you need help, there are many starving bioinformaticians.
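The sampling step described above could be sketched as follows. This is only an illustration, not anyone's actual pipeline: the function names `sliding_windows` and `windows_to_fasta`, the `step` parameter, and the header layout `name:start-end` are all assumptions made for the example. The output is plain FASTA text that could then be handed to whichever aligner you choose.

```python
def sliding_windows(seq, length=500, step=1):
    """Yield (start, fragment) for every window of `length` bases.

    `start` is a 0-based offset into `seq`. Sequences shorter than
    `length` yield nothing.
    """
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]


def windows_to_fasta(name, seq, length=500, step=1):
    """Return FASTA text with one record per window.

    Each header records where the fragment came from (hypothetical
    layout: name:start-end, 0-based half-open), so alignments can be
    checked against the true origin afterwards.
    """
    records = []
    for start, frag in sliding_windows(seq, length, step):
        records.append(f">{name}:{start}-{start + length}\n{frag}")
    return "\n".join(records)
```

With `step=1` this produces every possible fragment of the locus; a larger step gives a sparser sample when the region is large.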

      Originally posted by malcook
      I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?
      Try the "Mapability" track under "Mapping and Sequencing Tracks".



      • #4
        hmmm - I don't see any mappability in either human or mouse at ucsc. (sounds of poking around) Oh, I see, it exists for human in hg18 but not most recent hg19. OK. Interesting. Thanks!

        The proposed `vmatch`-based solution produces a whole-genome Boolean mappability vector: a base is mappable if all the k-mers (e.g. 36-mers) that span it are sufficiently unique (e.g. no other match within an edit distance of 2).
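The per-base rule above can be sketched in a few lines. This is a toy illustration and not how vmatch works internally: `exact_unique_kmers` is a hypothetical stand-in that only checks exact, single-strand duplicates within one sequence (no edit distance, no reverse complement), and `base_mappability` then applies the "all spanning k-mers must be unique" rule to whatever per-k-mer flags you supply.

```python
from collections import Counter


def exact_unique_kmers(seq, k):
    """Flag each k-mer start as unique (True) or repeated (False).

    Simplifying assumption: exact matches on one strand only.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def base_mappability(kmer_unique, k):
    """Return one Boolean per base: True if every k-mer spanning it is unique.

    kmer_unique[i] describes the k-mer starting at base i, so the
    result has len(kmer_unique) + k - 1 entries.
    """
    n = len(kmer_unique) + k - 1
    # Difference array: +1 where a non-unique k-mer starts, -1 just past its end.
    diff = [0] * (n + 1)
    for i, unique in enumerate(kmer_unique):
        if not unique:
            diff[i] += 1
            diff[i + k] -= 1
    mappable = []
    covering_repeats = 0
    for pos in range(n):
        covering_repeats += diff[pos]
        mappable.append(covering_repeats == 0)
    return mappable
```

In practice the per-k-mer flags would come from a real similarity search rather than the exact-match helper; only the second function reflects the rule stated in the post.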



        • #5
          Originally posted by malcook
          hmmm - I don't see any mappability in either human or mouse at ucsc. (sounds of poking around) Oh, I see, it exists for human in hg18 but not most recent hg19. OK. Interesting. Thanks!

          The proposed `vmatch`-based solution produces a whole-genome Boolean mappability vector: a base is mappable if all the k-mers (e.g. 36-mers) that span it are sufficiently unique (e.g. no other match within an edit distance of 2).
          It should not be too hard to generate these tracks using a short-read aligner and a reference genome. How each track was computed is described in the track's details page. I find these tracks extremely useful.



          • #6
            What RockChalkJayhawk is asking for is challenging. For a 500 bp sequence, it is difficult to find a suboptimal alignment that is 5-10% divergent. I do not know how vmatch works, but it probably does not work well in this case.

            The right aligner for this task is SSAHA2 or BWA (bwasw). But even with these tools, there is still a good chance of missing a suboptimal alignment that is 5-10% divergent.



            • #7
              Mappability

              What I have done so far is isolate the locus I am interested in (either a 75 kb or a 250 kb locus), then write a custom Perl script to generate every possible n-length fragment. I then mapped the fragments back to the entire reference with SeqMap, looking for multiple matches and allowing up to 5 mismatches, and created a custom track to view the result in UCSC. It's probably not the best approach, but it seems like it's doing what I want it to.
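The last step, turning per-fragment hit counts into a custom track, might look roughly like this. It is a sketch under assumptions, not the Perl script mentioned above: `hits_to_bedgraph` is a hypothetical helper, and it assumes you have already reduced the aligner output to a list where entry i is the number of genomic locations hit by the fragment starting at locus offset i. Each bedGraph interval therefore describes fragment start positions, not the full extent of the fragments.

```python
def hits_to_bedgraph(chrom, locus_start, hit_counts, track_name="mappability"):
    """Return bedGraph lines (0-based, half-open) for per-start hit counts.

    Consecutive start positions with the same hit count are merged
    into a single interval to keep the track small.
    """
    lines = [f'track type=bedGraph name="{track_name}"']
    run_start = 0
    for i in range(1, len(hit_counts) + 1):
        # Close the current run at the end of the list or when the value changes.
        if i == len(hit_counts) or hit_counts[i] != hit_counts[run_start]:
            lines.append(
                f"{chrom}\t{locus_start + run_start}\t{locus_start + i}"
                f"\t{hit_counts[run_start]}"
            )
            run_start = i
    return lines
```

A value of 1 marks a fragment that mapped only to its own origin; anything higher flags a start position whose fragment also matched elsewhere within the mismatch limit you chose.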
