  • Confusing definition of uniquely mapped reads...

    Hi,

    Over the past few days I have read a few assembly papers, and all of them mention "uniquely mapped reads" a lot.
    Would anybody be willing to share the definition of uniquely mapped reads with me?
    I would appreciate it if someone could provide some simple examples to help me better understand what "uniquely mapped reads" means in assembly and bioinformatics.
    Thanks a lot for your advice and suggestions.

  • #2
    I would use this term to describe a read which maps to only one place in the genome within a given number of mismatches. Ideally the match would be a unique exact match, but if the read contained a single SNP, it would still count as a uniquely mapped read so long as there was no other place in the genome where the read could match with only one mismatch.

    You normally find that once you go above 2 mismatches in a 36 bp read, you're very unlikely to be able to map it uniquely, so the majority of uniquely mapping reads will be exact matches or have just 1 or 2 mismatches.
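One possible reading of this definition can be sketched in Python: a read counts as uniquely mapped when exactly one ungapped placement in the genome has at most a chosen number of mismatches. This is a brute-force toy, not how a real aligner indexes a genome; the function names, the toy genome and the forward-strand-only search are all illustrative assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_hits(read: str, genome: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every ungapped placement of `read`
    on the forward strand of `genome` with at most `max_mismatches` mismatches.
    Assumption: the reverse-complement strand is ignored to keep the toy short."""
    hits = []
    for pos in range(len(genome) - len(read) + 1):
        mm = hamming(read, genome[pos:pos + len(read)])
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


def is_uniquely_mapped(read: str, genome: str, max_mismatches: int) -> bool:
    """A read is 'unique' here if exactly one placement fits the mismatch budget."""
    return len(find_hits(read, genome, max_mismatches)) == 1


if __name__ == "__main__":
    genome = "TTTTACGTAGGGGGACGTCCCCC"  # hypothetical toy reference
    read = "ACGTA"
    print(find_hits(read, genome, 0))           # [(4, 0)]
    print(is_uniquely_mapped(read, genome, 0))  # True
    print(find_hits(read, genome, 1))           # [(4, 0), (14, 1)]
    print(is_uniquely_mapped(read, genome, 1))  # False
```

With a 0-mismatch budget the read is unique, but allowing 1 mismatch admits a second placement at position 14, so the same read becomes non-unique: the more mismatches you allow, the harder it is for a read to map uniquely.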


    • #3
      Hi Simon,

      Thanks a lot for your suggestion. It is very clear and easy to understand.
      Thanks for helping me resolve my doubts.
      In your opinion, does the definition of uniquely mapped reads that you just explained also apply to longer reads, such as 454 or Sanger reads?
      I have read some bioinformatics journal papers recently.
      In some of them, scientists use uniquely mapped reads to assemble a high-quality consensus sequence of a specific organism's genome.
      Do you know what the purpose is of using uniquely mapped reads to assemble a high-quality consensus sequence of a specific organism's genome?
      Thanks again for your help.


      • #4
        In your opinion, does the definition of uniquely mapped reads that you just explained also apply to longer reads, such as 454 or Sanger reads?
        Certainly it can. If I had a bunch of long 1000-base Sanger reads, they could be mapped either uniquely or non-uniquely to a reference genome. Depending on the number of SNPs expected, the number of allowed mismatches may need to be raised.
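To make "the number of allowed mismatches may need to be raised" concrete, here is a rough sketch that picks a mismatch budget from read length and an expected SNP rate. It assumes SNPs fall independently along the read, so the count per read is roughly Poisson with mean read_length * snp_rate, and it ignores sequencing errors entirely; the function name, the 0.1% rate and the 99% coverage target are illustrative choices, not values from this thread.

```python
import math


def mismatch_budget(read_length: int, snp_rate: float, coverage: float = 0.99) -> int:
    """Smallest k such that a read carries at most k SNPs with probability >= coverage.

    Assumes a Poisson model (mean = read_length * snp_rate) and ignores sequencing
    errors, indels and reference errors; all of these are simplifications.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    lam = read_length * snp_rate
    if lam > 700:
        raise ValueError("expected SNP count too large for this simple sketch")
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    cumulative = term
    while cumulative < coverage:
        k += 1
        term *= lam / k  # P(X = k) computed from P(X = k - 1)
        cumulative += term
    return k


if __name__ == "__main__":
    for length in (36, 1000):
        print(length, mismatch_budget(length, snp_rate=0.001))
    # 36 1
    # 1000 4
```

Under these assumptions a 1000-base read needs a budget of about 4 mismatches where a 36 bp read needs only 1, which is why longer reads usually call for a larger allowance.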

        Do you know what the purpose is of using uniquely mapped reads to assemble a high-quality consensus sequence of a specific organism's genome?
        Probably for the same reason that anyone wants a sequence: to find out what makes that specific organism's genome different from other genomes (SNPs, InDels, unique genes, unique control mechanisms, etc.).

        It may be obvious, but you can assemble a sequence via:

        1) De-novo assembly
        or
        2) Mapping unique reads onto a reference
        or
        3) Mapping unique and non-unique reads onto a reference
        or
        4) A combination of the above
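Option 2 (mapping unique reads onto a reference) can be sketched as a simple majority-vote pileup. This assumes reads have already been placed without gaps and flagged as unique or not; the function name, the (position, sequence, is_unique) tuples and the `min_depth` threshold are hypothetical. Real consensus callers also weigh base qualities and handle indels, which this toy does not.

```python
from collections import Counter


def consensus_from_unique_reads(ref_length, alignments, min_depth=2):
    """Majority-vote consensus built only from uniquely mapped reads.

    alignments: iterable of (position, read_sequence, is_unique), where position
    is the 0-based reference coordinate of the read's first base (ungapped).
    Positions covered by fewer than `min_depth` unique reads are reported as 'N'.
    """
    columns = [Counter() for _ in range(ref_length)]
    for pos, seq, is_unique in alignments:
        if not is_unique:
            continue  # a non-unique read might really belong to another locus
        for offset, base in enumerate(seq):
            ref_pos = pos + offset
            if 0 <= ref_pos < ref_length:
                columns[ref_pos][base] += 1
    consensus = []
    for counts in columns:
        if sum(counts.values()) >= min_depth:
            # ties go to the base counted first (Counter.most_common ordering)
            consensus.append(counts.most_common(1)[0][0])
        else:
            consensus.append("N")
    return "".join(consensus)


if __name__ == "__main__":
    alignments = [
        (0, "ACGTTC", True),
        (2, "GTTCAA", True),
        (0, "ACGATC", True),   # carries a different base at position 3
        (2, "CCCCCC", False),  # non-unique: ignored
    ]
    print(consensus_from_unique_reads(8, alignments))  # ACGTTCNN
```

Dropping non-unique reads keeps reads from repeats from piling onto the wrong copy, at the cost of leaving low-coverage positions as 'N'; option 3 would feed the non-unique reads in as well.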


        • #5
          Thanks a lot, westerman.
          Your reply helps me better understand how scientists analyze the data.

          Sadly, I am still not very clear about why scientists would use uniquely mapped reads from organism A's genome to assemble a high-quality consensus sequence of organism B's genome.
          What is the purpose of analyzing the data this way?
          Do you know the general pipeline for analyzing 454 or Illumina data?
          I really appreciate your suggestions and opinions.


          • #6
            It is easy to define uniqueness when you require the entire read to be aligned without gaps. But things get complicated when you allow clipping and gaps: both are governed by the underlying scoring system, so uniqueness depends on the scoring system too. In addition, although we may define a read as unique when its best match scores strictly higher than its second-best match (that is, the two best matches do not have identical scores), such a definition is not useful in practice. What if the second-best match scores lower by just 1 or 2?
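Both points in this post can be sketched together: the score of each candidate hit depends on the scoring scheme, and "unique" can be defined as the best score beating the second-best by at least some margin. The linear penalties, the `Hit`/`Scoring` types and the `min_gap` threshold below are hypothetical; real aligners typically use affine gap penalties and report their own mapping-quality measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scoring:
    """Hypothetical linear scoring scheme; all values are illustrative."""
    match: int = 1
    mismatch: int = -4
    gap: int = -6  # per gapped base (real aligners usually use affine gaps)


@dataclass(frozen=True)
class Hit:
    """Edit summary of one candidate placement of a read."""
    matches: int
    mismatches: int
    gap_bases: int


def score(hit: Hit, s: Scoring) -> int:
    return hit.matches * s.match + hit.mismatches * s.mismatch + hit.gap_bases * s.gap


def classify(hits: list[Hit], s: Scoring, min_gap: int = 1) -> str:
    """'unique' if the best score beats the runner-up by at least min_gap.

    With integer scores, min_gap=1 is the strict definition (best two not identical)."""
    if not hits:
        return "unmapped"
    ranked = sorted((score(h, s) for h in hits), reverse=True)
    if len(ranked) == 1 or ranked[0] - ranked[1] >= min_gap:
        return "unique"
    return "non-unique"


if __name__ == "__main__":
    # Two placements of the same 36 bp read: one with a mismatch, one with a 1-base gap.
    hits = [Hit(matches=35, mismatches=1, gap_bases=0),
            Hit(matches=35, mismatches=0, gap_bases=1)]
    print(classify(hits, Scoring()))                     # unique (31 vs 29)
    print(classify(hits, Scoring(mismatch=-5, gap=-5)))  # non-unique (30 vs 30)
    print(classify(hits, Scoring(), min_gap=5))          # non-unique (lead of 2 < 5)
```

The same read flips between unique and non-unique purely by changing the penalties or the required margin, which is the practical problem with a strict "best two scores differ" rule.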


            • #7
              Thanks for your reply.
              What you mention makes sense too.
              I will try to find out more about "uniquely mapped reads" and share it with everybody.
