  • newbler problem

    Hi, I found that the 454Isotigs.fna file contains many sequences that are 100% identical but have different lengths (i.e. one sequence contains another, shorter one). Isn't this supposed not to happen? Shouldn't they be assembled as one? Thanks ...
    Last edited by bioben; 09-30-2010, 07:49 PM.

  • #2
    Forgot to say that I am trying to assemble ~10 million 454 ESTs and ~1 million Sanger ESTs. I also tried CAP3 and TGICL. They all output more or less identical sequences in the contigs and singlets files.


    • #3
      Originally posted by bioben
      Hi, I found that the 454Isotigs.fna file contains many sequences that are 100% identical but have different lengths (i.e. one sequence contains another, shorter one). Isn't this supposed not to happen? Shouldn't they be assembled as one? Thanks ...
      This is the gsAssembler (Newbler) saying that it believes there are two isoforms of the gene, one being shorter than the other. Is it correct? That's where your biological expertise comes in. Personally, I would bet a large number of donuts that it's not correct. gsAssembler seems to be overzealous in finding isoforms.
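One way to see how widespread the containment is before judging the isoform calls is to list every isotig that occurs exactly inside a longer one. The following is only a minimal sketch: the function names and the toy sequences are made up for illustration, it only catches exact containment on either strand, and the all-against-all comparison is quadratic, so it suits a few thousand isotigs rather than millions of reads.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {id: sequence}; the id is the first word of the header."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_contained(seqs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (shorter_id, longer_id) pairs where the shorter sequence
    occurs exactly inside the longer one on either strand."""
    # Longest first, so each sequence is only compared with shorter or equal ones.
    ordered = sorted(seqs.items(), key=lambda item: len(item[1]), reverse=True)
    hits = []
    for i, (long_id, long_seq) in enumerate(ordered):
        for short_id, short_seq in ordered[i + 1:]:
            if short_seq in long_seq or reverse_complement(short_seq) in long_seq:
                hits.append((short_id, long_id))
    return hits


# Toy input; with real data, pass an open file handle for 454Isotigs.fna instead.
demo = [">isotig1", "ACGTACGTAA", ">isotig2", "GTACG", ">isotig3", "TTACGT", ">isotig4", "GGGGG"]
contained = find_contained(read_fasta(demo))
for short_id, long_id in contained:
    print(f"{short_id} is contained in {long_id}")
```

Whether a contained isotig is a real shorter isoform or an assembly artifact still has to be decided from the biology; the script only tells you which pairs to look at.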


      • #4
        Thanks, kmcarr. I think you are right. They are probably splice variants.

        Then how about singlets? I tried to recover them by parsing the 454ReadStatus.txt file. The resulting singlets file also contains many identical reads. To me, they are supposed to be assembled as one and show up in the isotigs file. Do people usually care about singlets or not? Thanks ...
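For anyone wanting to do the same, pulling singleton read names out of the status file can be sketched roughly as below. This assumes the file is tab-delimited with the read accession in the first column and the read status in the second, and that singletons carry the status label "Singleton"; check those assumptions against your own 454ReadStatus.txt. The function name and the toy read names are hypothetical.

```python
from typing import Iterable, List


def singleton_ids(status_lines: Iterable[str], label: str = "Singleton") -> List[str]:
    """Collect accessions whose status column equals `label`.

    Assumes tab-delimited lines: accession in column 1, status in column 2.
    A header line is skipped naturally because its status field is not `label`.
    """
    ids = []
    for line in status_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        if fields[1].strip() == label:
            ids.append(fields[0].strip())
    return ids


# Toy lines standing in for the real file; with real data, pass an open file handle.
demo_status = [
    "Accno\tRead Status\n",
    "read001\tAssembled\n",
    "read002\tSingleton\n",
    "read003\tSingleton\n",
    "\n",
]
singletons = singleton_ids(demo_status)
print(singletons)
```

The resulting list of names can then be used to pull the matching reads out of the FASTA or SFF file.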


        • #5
          Originally posted by bioben
          Then how about singlets? I tried to recover them by parsing the 454ReadStatus.txt file. The resulting singlets file also contains many identical reads. To me, they are supposed to be assembled as one and show up in the isotigs file. Do people usually care about singlets or not? Thanks ...
          I suspect that the singletons are not assembled together simply because they are identical and thus considered to be technical duplicates. It is hard to have a contig made up of exactly one identical read. If the reads overlap, then they could be assembled. Unfortunately, I do not know of a 454 file that describes which reads are true singletons and which are duplicate singletons.
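In the absence of such a file, one possible workaround is to split the singleton reads yourself by grouping reads with identical sequence. This is a rough sketch, not anything Newbler provides: the function name and the toy reads are hypothetical, and it treats only exact, same-strand, full-length matches as duplicates.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_singletons(seqs: Dict[str, str]) -> Tuple[List[str], List[List[str]]]:
    """Split reads into true singletons and groups of exact duplicates.

    `seqs` maps read id to sequence. Returns (unique_ids, duplicate_groups),
    where each duplicate group lists the ids that share one sequence.
    """
    by_sequence: Dict[str, List[str]] = defaultdict(list)
    for read_id, seq in seqs.items():
        by_sequence[seq.upper()].append(read_id)
    unique_ids = [ids[0] for ids in by_sequence.values() if len(ids) == 1]
    duplicate_groups = [ids for ids in by_sequence.values() if len(ids) > 1]
    return unique_ids, duplicate_groups


# Toy reads for illustration only.
demo_reads = {
    "read002": "ACGTTGCA",
    "read003": "acgttgca",  # same sequence as read002, different case
    "read007": "TTTTGGGG",
}
unique_ids, duplicate_groups = split_singletons(demo_reads)
print("true singletons:", unique_ids)
print("duplicate groups:", duplicate_groups)
```

Keeping one representative per duplicate group gives a non-redundant singleton set.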


          • #6
            Hi bioben
            I think you should read this thread: Detection of alternative splicing events from 454 output
            it should answer a lot of questions


            • #7
              Originally posted by westerman
              I suspect that the singletons are not assembled together simply because they are identical and thus considered to be technical duplicates. It is hard to have a contig made up of exactly one identical read. If the reads overlap, then they could be assembled. Unfortunately, I do not know of a 454 file that describes which reads are true singletons and which are duplicate singletons.
              I don't think so. Singletons are reads from regions poorly covered by emPCR. Also, some reads may have overlapped originally, but after trimming or because of sequencing errors Newbler did not find the overlap. Set these in 454AssemblyProject.xml before you start the assembly:

              <minimumReadLength>45</minimumReadLength>
              <overlapSeedStep>1</overlapSeedStep>
              <overlapMinMatchLength>60</overlapMinMatchLength>
              <overlapMinMatchIdentity>96</overlapMinMatchIdentity>
              <ripMode>true</ripMode>

              Make a new cDNA assembly rather than re-running it from the current assembly directory, because in my opinion Newbler does not recompute the overlaps, and hence not all of the changes will take effect. With these settings I got 50% more assembled contigs than with the loose defaults!
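If you would rather script the change than edit the file by hand, something along these lines could work. It is only a sketch: it assumes the five elements already exist somewhere in 454AssemblyProject.xml and simply overwrites their text, reporting any it cannot find instead of guessing where they belong. The `Project`/`Config` wrapper in the toy XML is invented for the demo and is not the real file layout.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Values taken from the settings listed above.
SETTINGS = {
    "minimumReadLength": "45",
    "overlapSeedStep": "1",
    "overlapMinMatchLength": "60",
    "overlapMinMatchIdentity": "96",
    "ripMode": "true",
}


def apply_settings(xml_text: str, settings: Dict[str, str]) -> Tuple[str, List[str]]:
    """Overwrite the text of every element whose tag is in `settings`.

    Returns the updated XML as a string plus the list of tags that were
    not found (those are left for manual editing).
    """
    root = ET.fromstring(xml_text)
    missing = []
    for tag, value in settings.items():
        found = list(root.iter(tag))
        if not found:
            missing.append(tag)
            continue
        for element in found:
            element.text = value
    return ET.tostring(root, encoding="unicode"), missing


# Toy document; the wrapper element names are hypothetical.
toy = (
    "<Project><Config>"
    "<minimumReadLength>40</minimumReadLength>"
    "<ripMode>false</ripMode>"
    "</Config></Project>"
)
new_xml, missing = apply_settings(toy, SETTINGS)
print(new_xml)
print("not found, edit by hand:", missing)
```

Work on a copy of the project file and compare it with the original before starting the new assembly.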
