  • De novo assembly of highly expressed transcripts

    I am working on a transcriptome project with ~400 Mb of 454 mRNA-seq reads sequenced from a non-normalized cDNA library. I was using MIRA 3 for de novo assembly of my reads, and it produced a decent assembly for transcripts with moderate expression levels. However, MIRA has a hard time assembling the highly expressed transcripts (1000 copies or more), and it's the same with CAP3. The TIGR assembler (TGICL) offers some ways to deal with highly expressed transcripts, but it doesn't have a great answer either.

    I wonder if anyone has insight into assembling highly expressed transcripts. Could de Bruijn graph-based assemblers work in this scenario?

    Many thanks,
    Hao

  • #2
    I am also doing a de novo assembly of a transcriptome, and Velvet/Oases (de Bruijn graph-based) work fine, especially for highly expressed transcripts. These assemble especially well when you choose a high k-mer length.

    Comment


    • #3
      Have you tried Roche's Newbler in cDNA mode?

      Comment


      • #4
        Originally posted by Thorondor View Post
        I am also doing a de novo assembly of a transcriptome, and Velvet/Oases (de Bruijn graph-based) work fine, especially for highly expressed transcripts. These assemble especially well when you choose a high k-mer length.
        That's great to know. Just to clarify, do Velvet/Oases also work well on 454 reads?

        Comment


        • #5
          Originally posted by sklages View Post
          Have you tried Roche's Newbler in cDNA mode?
          I don't have a copy of Newbler. I emailed Roche for one weeks ago but am still waiting for a reply.

          Comment


          • #6
            Originally posted by foryvonne View Post
            I don't have a copy of Newbler. I emailed Roche for one weeks ago but am still waiting for a reply.
            Did you just send an e-mail to a general contact address, or did you use their online software request form?

            Comment


            • #7
              I sent an email. I'll try sending a request form too. Thanks for letting me know.

              Comment


              • #8
                Originally posted by foryvonne View Post
                That's great to know. Just to clarify, do Velvet/Oases also work well on 454 reads?
                Well, it should work fine, I guess, especially for the highly expressed transcripts. But I can't say that for sure, since I am working with Illumina reads.

                Comment


                • #9
                  Does a non-normalized cDNA library have an impact on the number of reads used by the assembler?
                  I'm asking because we're also working with Illumina reads. We're using Velvet and SOAPdenovo at the moment. Velvet, for example, only uses 15,594,122 of 87,419,634 reads. After quality trimming (to mean_qual = 20 and min_len = 35) our reads are between 35 and 60 bp long; the k-mer value for this assembly was set to 29, and Velvet was run in -shortPaired mode. Anyway, there are about 330,000 contigs with N50 = 106, and 2,300 contigs longer than 500 bp with N50 = 696.
                  Using lower k-mer values decreases the number of contigs but increases the number of used reads, which is accompanied by a decrease in N50 for both all contigs and long contigs only.
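                  For anyone comparing the N50 values quoted here: N50 is the contig length L such that contigs of length L or longer contain at least half of the total assembled bases. A minimal sketch of how it's computed (the contig lengths below are made-up examples, not from this assembly):

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

# Hypothetical contig lengths for illustration only
contigs = [1000, 800, 500, 300, 200, 100]
print(n50(contigs))  # 800, since 1000 + 800 already covers half of 2900 bases
```

                  So an N50 of 106 over ~330,000 contigs means half of the assembled bases sit in contigs of 106 bp or longer.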
                  Last edited by Jenzo; 05-04-2011, 12:55 AM.

                  Comment


                  • #10
                    So what exactly is your question? This all sounds reasonable to me. Do you have a comparison to a normalized library? What is your expected coverage? A k-mer of 29 might be a bit high if your calculated expected coverage is around 10-20.

                    Comment


                    • #11
                      Thanks for the reply, Thorondor! We have a normalized library, sequenced with 454, and assembled it with MIRA using nearly 90% of all reads. The question is: why does Velvet use only about 15% of all reads, and could that be due to the non-normalization?
                      Mean coverage is (according to Velvet's own measurement in contigs.fa) between 21 and 26 for all long contigs (> 500 bp).
                      Perhaps someone can recommend an assembler that uses more reads on a non-normalized library.

                      FYI, we did 8 assemblies with Velvet, using the following k-mer values: 21 (-short, for scaffolding with other algorithms using PE information), 23 (-shortPaired), 23 (-short), 25 (-short), 27 (-shortPaired), 29 (-shortPaired), 31 (-shortPaired), 35 (-short). With k = 23 in -shortPaired mode, Velvet uses about 25% of all reads, which was the maximum across all assemblies. Because scaffolding with other algorithms currently increases N50 up to 950, we would like to use Velvet only in -short mode, where the number of used reads is low (~11%).
                      Does that make my question clear? :-)
                      Last edited by Jenzo; 05-04-2011, 01:21 AM. Reason: error correction

                      Comment


                      • #12
                        No, I don't think the non-normalization is the reason here, but keep in mind that your coverage is not consistent across all transcripts. So some transcripts might assemble better with exp_cov set really high, and some better with low values; this will influence the number of reads used.

                        Also try to estimate your expected coverage yourself, e.g. (total amount of bp in your reads) / (expected transcriptome size).
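                        Spelled out, that back-of-the-envelope estimate looks like this (all numbers below are illustrative placeholders, not Jenzo's actual read counts or transcriptome size):

```python
# Rough expected-coverage estimate:
#   total read bases / expected transcriptome size
# All values here are hypothetical placeholders.
num_reads = 87_419_634            # total reads after trimming
mean_read_len = 47                # mean length for 35-60 bp trimmed reads
transcriptome_size = 200_000_000  # assumed transcriptome size in bp

total_bases = num_reads * mean_read_len
exp_cov = total_bases / transcriptome_size
print(round(exp_cov, 1))  # ~20.5x under these assumptions
```

                        Note that Velvet's exp_cov parameter is expressed in k-mer coverage, which is lower than base coverage by a factor of roughly (read_len - k + 1) / read_len, so don't plug the base-coverage number in unadjusted.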

                        Are the paired ends shuffled correctly into one file after trimming? Some reads are discarded during trimming, so did you use select_paired.pl from the contrib folder of Velvet?

                        Since it seems like you're doing de novo transcriptome assembly, why not try Oases?

                        Comment


                        • #13
                          Dear Thorondor, thanks a lot for these suggestions! I'll try to estimate coverage and then try some values for exp_cov.
                          I'm really sure the reads are shuffled correctly, because trimming did not discard reads at all (low-quality reads were reduced to just a single N after quality trimming), and I wrote the length-filtering script myself, always considering both reads (/1 and /2) and discarding either none or both.
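                          A pair-respecting length filter like the one described can be sketched roughly as follows (toy sequences and a 35 bp cutoff for illustration; a real script would parse FASTQ records):

```python
def filter_pairs(pairs, min_len=35):
    """Keep a read pair only if BOTH mates pass the length cutoff;
    otherwise discard both, so the /1 and /2 files stay in sync."""
    return [(r1, r2) for r1, r2 in pairs
            if len(r1) >= min_len and len(r2) >= min_len]

# Toy sequences: the second pair is dropped because one mate is too short.
pairs = [("A" * 40, "C" * 38), ("G" * 50, "T" * 20), ("A" * 35, "T" * 35)]
kept = filter_pairs(pairs)
print(len(kept))  # 2
```

                          Discarding both mates whenever either fails is what keeps an interleaved file valid for Velvet's -shortPaired mode.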
                          And you're right, we're doing de novo transcriptome assembly, but Oases runs out of memory (32 GB RAM). I have now set up a new virtual machine with 32 GB of physical RAM and about 60 GB of swap and will try to run Oases on velvetg's output. (I already know it will take a while ^^)
                          Thanks again a lot for help :-)

                          Comment
