  • Mapability or Uniqueness of Reference Genome

    The UCSC genome browser has a track that displays the mappability of short fragments to a reference genome. My question is, "Is there a tool to estimate the mappability of longer read lengths?"

    I am trying to define the parameters necessary to accurately align highly homologous gene family members and their respective pseudogenes. They can have up to 90-95% identical sequences. I want a way to estimate the likelihood of mapping every possible 500 bp sequence correctly, and how many mismatches are present.

    Any ideas?

  • #2
    method for computing "mappable" regions of genome

    The program vmatch [1] can be used to find all pairs of regions longer than a given length that are similar (or identical) to each other, where similarity can be defined in terms of maximum edit or Hamming distance. Use the -dbnomatch option to report all regions NOT participating in any such match, i.e. regions that are dissimilar to the rest of the genome (or unique).

    [1] http://www.zbh.uni-hamburg.de/vmatch/

    I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?



    • #3
      Originally posted by RockChalkJayhawk View Post
      The UCSC genome browser has a track that displays the mappability of short fragments to a reference genome. My question is, "Is there a tool to estimate the mappability of longer read lengths?"

      I am trying to define the parameters necessary to accurately align highly homologous gene family members and their respective pseudogenes. They can have up to 90-95% identical sequences. I want a way to estimate the likelihood of mapping every possible 500 bp sequence correctly, and how many mismatches are present.

      Any ideas?
      Follow the same approach as the mappability tracks, but with your 500 bp sequences. Sample 500 bp fragments across the regions of interest and map them with your favorite short-read aligner (with the proper sensitivity). Then filter and process the resulting alignments to generate your own track. If you need help, there are many starving bioinformaticians.
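The "filter and process" step above could look roughly like the sketch below. It assumes the alignments have already been reduced to one hit count per fragment start position (how that is done depends on the aligner), and it scores each position as 1/hits, which is one possible convention and not necessarily what any particular track uses. The function name `hits_to_bedgraph` and its parameters are made up for this illustration; the output is plain bedGraph text (chrom, 0-based start, end, value) that can be loaded as a custom track.

```python
def hits_to_bedgraph(chrom, hit_counts, region_start=0):
    """Turn per-position hit counts into bedGraph lines.

    hit_counts[i] is the number of places in the reference where the
    fragment starting at region_start + i aligned (hypothetical input).
    Each position is scored 1/hits (0 if the fragment did not align),
    and consecutive positions with the same hit count are merged.
    """
    lines = []
    run_start = None
    run_hits = None

    def flush(end_index):
        # Emit the current run as one bedGraph interval.
        score = 1.0 / run_hits if run_hits > 0 else 0.0
        lines.append(
            f"{chrom}\t{region_start + run_start}\t{region_start + end_index}\t{score:g}"
        )

    for i, hits in enumerate(hit_counts):
        if hits != run_hits:
            if run_hits is not None:
                flush(i)
            run_start, run_hits = i, hits
    if run_hits is not None:
        flush(len(hit_counts))
    return lines


if __name__ == "__main__":
    # Toy example: seven fragment start positions beginning at offset 100.
    for line in hits_to_bedgraph("chrT", [1, 1, 2, 2, 2, 1, 0], region_start=100):
        print(line)
```

A score of 1 then means the fragment starting at that position aligned to exactly one place; lower scores mean more competing locations.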

      Originally posted by malcook View Post
      I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?
      Try the "Mapability" track under "Mapping and Sequencing Tracks".



      • #4
        hmmm - I don't see any mappability in either human or mouse at ucsc. (sounds of poking around) Oh, I see, it exists for human in hg18 but not most recent hg19. OK. Interesting. Thanks!

        The proposed `vmatch`-based solution produces a whole-genome boolean mappability vector: a base is mappable if all the k-mers (e.g. 36-mers) that span it are sufficiently unique (e.g. no match within an edit distance of 2).
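To make that definition concrete, here is a toy sketch in Python. It is not how vmatch works internally. It assumes exact matching only (a k-mer is "unique" if it occurs exactly once in the sequence) instead of an edit-distance threshold, looks at the forward strand only, and holds everything in memory, so it is only suitable for small test sequences. The function names are made up for the example.

```python
from collections import Counter


def kmer_unique_flags(seq, k):
    """flags[i] is True if the k-mer starting at i occurs exactly once in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def base_mappability(seq, k):
    """Boolean vector: a base is mappable if every k-mer spanning it is unique."""
    flags = kmer_unique_flags(seq, k)
    n = len(seq)
    mappable = []
    for pos in range(n):
        first = max(0, pos - k + 1)   # leftmost k-mer start that covers pos
        last = min(pos, n - k)        # rightmost k-mer start that covers pos
        mappable.append(first <= last and all(flags[first:last + 1]))
    return mappable


if __name__ == "__main__":
    # "ACGT" occurs twice, so every base touched by either copy is unmappable.
    print(base_mappability("ACGTACGTTT", 4))
```

Swapping the exact-match count for "no other match within edit distance d" gives the stricter definition from the post, at a much higher computational cost.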



        • #5
          Originally posted by malcook View Post
          hmmm - I don't see any mappability in either human or mouse at ucsc. (sounds of poking around) Oh, I see, it exists for human in hg18 but not most recent hg19. OK. Interesting. Thanks!

          The proposed `vmatch`-based solution produces a whole-genome boolean mappability vector: a base is mappable if all the k-mers (e.g. 36-mers) that span it are sufficiently unique (e.g. no match within an edit distance of 2).
          It should not be too hard to generate these tracks using a short-read aligner and a reference genome. The method used to compute each track is described on the track's details page. I find these tracks extremely useful.



          • #6
            What RockChalkJayhawk was asking for is challenging. It is difficult to find a suboptimal alignment 5-10% away for a 500bp sequence. I do not know how vmatch works, but probably it does not work well in this case.

            The right aligner for this task is ssaha2 or bwa/bwasw. But even with these tools, you still have a big chance of missing a suboptimal alignment 5-10% away.



            • #7
              Mappability

              What I have done so far is isolate the locus I am interested in (either a 75 kb or a 250 kb locus) and write a custom Perl script to generate every possible n-length fragment. I then mapped the fragments back to the entire reference with SeqMap, looking for multiple matches and allowing up to 5 mismatches, and created a custom track to view the result in UCSC. It's probably not the best approach, but it seems like it's doing what I want it to.
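For anyone wanting to reproduce the fragment-generation step, a minimal sketch of the tiling is below. It is written in Python here, not Perl, and is not the original script. The function names, the `chrom` label and the `offset` parameter are assumptions for illustration. Each fragment name encodes its 0-based, half-open coordinates so that alignments can be traced back to where the fragment came from.

```python
def tile_fragments(locus_seq, frag_len, chrom="chrN", offset=0, step=1):
    """Yield (name, fragment) for every frag_len window across locus_seq.

    offset is the 0-based position of the locus start on chrom, so the
    name "chrom:start-end" refers to reference coordinates.
    """
    for start in range(0, len(locus_seq) - frag_len + 1, step):
        name = f"{chrom}:{offset + start}-{offset + start + frag_len}"
        yield name, locus_seq[start:start + frag_len]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    # Toy locus; a real run would read the 75 kb or 250 kb locus from a file
    # and use the fragment length of interest (e.g. 500).
    print(to_fasta(tile_fragments("ACGTAC", 4, chrom="chrT", offset=100)), end="")
```

The resulting FASTA can be passed to whichever aligner is being used; a fragment that does not map back uniquely to the coordinates in its own name marks a position with poor mappability.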
