  • RockChalkJayhawk
    Senior Member
    • Mar 2009
    • 192

    Mapability or Uniqueness of Reference Genome

    The UCSC genome browser has a track that displays the mappability of short fragments to a reference genome. My question is, "Is there a tool to estimate the mappability of longer read lengths?"

    I am trying to define the parameters necessary to accurately align highly homologous gene family members and their respective pseudogenes, which can be 90-95% identical in sequence. I want a way to estimate the likelihood of mapping every possible 500 bp sequence correctly, and how many mismatches are present.

    Any ideas?
  • malcook
    Member
    • Sep 2009
    • 24

    #2
    method for computing "mappable" regions of genome

    The program vmatch [1] can be used to find all pairs of regions longer than a given length that are similar (or identical) to each other, where similarity can be defined in terms of maximum edit or Hamming distance. Use the -dbnomatch option to report all regions NOT participating in any such match, i.e. regions that are dissimilar to everything else in the genome (or unique).

    [1] http://www.zbh.uni-hamburg.de/vmatch/

    I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?


    • nilshomer
      Nils Homer
      • Nov 2008
      • 1283

      #3
      Follow the same approach as the mappability tracks for your 500 bp sequences. Sample 500 bp fragments across the regions of interest and map them with your favorite short-read aligner (with the proper sensitivity). Then filter and process the resulting alignments to generate your own track. If you need help, there are many starving bioinformaticians.
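The sampling step described above could be sketched roughly as follows. This is only an illustration: the function names `sliding_windows` and `to_fasta`, the FASTA header layout, and the `step` parameter are assumptions for this example and do not come from the UCSC track methods or from any aligner. The output would still need to be mapped with an aligner of your choice.

```python
from typing import Iterator, Tuple


def sliding_windows(seq: str, length: int = 500, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, fragment) for every window of `length` bases."""
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]


def to_fasta(region_name: str, seq: str, length: int = 500, step: int = 1) -> str:
    """Format all windows as FASTA; the header records where each fragment came from.

    The header layout (name:start-end, 0-based, end-exclusive) is a hypothetical
    convention chosen so the true origin can be compared with the mapped position.
    """
    records = []
    for start, frag in sliding_windows(seq, length, step):
        records.append(f">{region_name}:{start}-{start + length}\n{frag}")
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    # Toy sequence and a short window so the output is easy to inspect.
    print(to_fasta("chrT", "ACGTACGGTA", length=4, step=2), end="")
```

Because each read name carries its true origin, a fragment can be scored as correctly mapped when the aligner reports the same coordinates.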

      Originally posted by malcook
      I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?
      Try the "Mapability" track under "Mapping and Sequencing Tracks".


      • malcook
        Member
        • Sep 2009
        • 24

        #4
        hmmm - I don't see any mappability track for either human or mouse at UCSC. (sounds of poking around) Oh, I see, it exists for human in hg18 but not in the most recent hg19. OK. Interesting. Thanks!

        The proposed `vmatch`-based solution produces a whole-genome boolean mappability vector: a base is mappable if all the k-mers (e.g. 36-mers) that span it are sufficiently unique (e.g. no match within an edit distance of 2).
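The per-base rule above can be illustrated with a small sketch. It makes strong simplifying assumptions that differ from the vmatch approach: uniqueness here means an exact k-mer occurring only once on the forward strand, with no edit-distance tolerance and no reverse-complement check. The function name `mappability_vector` is hypothetical.

```python
from collections import Counter
from typing import List


def mappability_vector(genome: str, k: int = 36) -> List[bool]:
    """Return one boolean per base: True if every k-mer spanning it is unique.

    Simplification: "unique" means the exact k-mer occurs once on the forward
    strand. Near matches (e.g. edit distance <= 2) are NOT detected here.
    """
    n = len(genome)
    if n < k:
        return [False] * n  # no complete k-mer spans any base

    counts = Counter(genome[i:i + k] for i in range(n - k + 1))

    # bad[i] = number of non-unique k-mers among start positions 0..i-1
    bad = [0]
    for i in range(n - k + 1):
        bad.append(bad[-1] + (0 if counts[genome[i:i + k]] == 1 else 1))

    mappable = []
    for pos in range(n):
        lo = max(0, pos - k + 1)  # first k-mer start that covers pos
        hi = min(pos, n - k)      # last k-mer start that covers pos
        mappable.append(bad[hi + 1] - bad[lo] == 0)
    return mappable


if __name__ == "__main__":
    # "AAA" occurs twice, so every base it touches is marked unmappable.
    print(mappability_vector("AAAACGT", k=3))
```

The prefix sum keeps the second pass linear in genome length. Holding every k-mer in a `Counter` is only practical for small loci, not a whole genome.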


        • nilshomer
          Nils Homer
          • Nov 2008
          • 1283

          #5
          It should not be too hard to generate these tracks using a short-read aligner and a reference genome. The methods of how each track is computed can be found in the track's details. I find these tracks extremely useful.


          • lh3
            Senior Member
            • Feb 2008
            • 686

            #6
            What RockChalkJayhawk was asking for is challenging. It is difficult to find a suboptimal alignment 5-10% away for a 500bp sequence. I do not know how vmatch works, but probably it does not work well in this case.

            The right aligner for this task is ssaha2 or bwa/bwasw. But even with these tools, there is still a good chance of missing a suboptimal alignment that is 5-10% divergent.


            • RockChalkJayhawk
              Senior Member
              • Mar 2009
              • 192

              #7
              Mappability

              What I have done so far is isolate the locus I am interested in (either a 75 kb or a 250 kb locus), then write a custom Perl script to generate every possible n-length fragment. I then mapped the fragments back to the entire reference with SeqMap, looking for multiple matches and allowing up to 5 mismatches, and created a custom track to view the results in UCSC. It's probably not the best approach, but it seems like it's doing what I want it to.
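The last step, turning per-fragment hit counts into a custom track, might look something like the sketch below. It is not the script used in the post. It assumes the number of reference hits per fragment start has already been tallied from the aligner output (parsing SeqMap output is not shown), and the function name `hits_to_bedgraph` and the track name are hypothetical. The output uses the bedGraph layout (chrom, 0-based start, end, value) that the UCSC browser accepts for custom tracks.

```python
from typing import Iterable


def hits_to_bedgraph(chrom: str, offset: int, hit_counts: Iterable[int],
                     track_name: str = "fragment_hits") -> str:
    """Convert per-position hit counts into bedGraph text.

    `offset` is the 0-based chromosome coordinate of the first count.
    Consecutive positions with the same value are merged into one interval.
    """
    counts = list(hit_counts)
    lines = [f'track type=bedGraph name="{track_name}"']
    run_start = 0
    for i in range(1, len(counts) + 1):
        # Close the current run at the end of the data or when the value changes.
        if i == len(counts) or counts[i] != counts[run_start]:
            lines.append(f"{chrom}\t{offset + run_start}\t{offset + i}\t{counts[run_start]}")
            run_start = i
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A value of 1 means the fragment starting there hit only its own locus.
    print(hits_to_bedgraph("chr1", 1000, [1, 1, 3, 1]), end="")
```

Values above 1 mark fragment starts that also matched elsewhere in the reference within the allowed mismatches.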
