  • Pipeline for de novo RNA sequencing, and Galaxy

    Hi all, I'm trying to figure out a good pipeline for a de novo RNA sequencing project (hybrid assembly). I figure I will have to use MIRA. I will have paired ~100 bp Illumina HiSeq data and am hoping to someday also get 454 FLX data (they are having trouble with the new chemistry, so I don't know when this will be). I'll be making a hybrid transcriptome assembly, then mapping the Illumina reads back to the assembly to quantify them. This is the first time I've ever done anything like this. Can people suggest pipelines that would be good to try? Also, what metrics should I use to determine whether my assembly is good or not? I don't have a reference genome to map to.

    I'm not used to command line interfaces and if anyone has used MIRA and has an example of commands they used that they can share with me, I'd be grateful.

    Also, I've encountered Galaxy, which apparently lets you run MIRA. Has anyone done this, and had problems? Has anyone had problems in general with Galaxy not allowing programs to work correctly?

    Thanks for any help you can provide to this noob.

  • #2
    If you know someone else who has done RNA-Seq on your organism, you could test the assembly with their data. Any de novo assembly you do should map your own reads back really well, but it might not do so well with someone else's reads.

    • #3
      I have had good results with Trinity for Illumina data. I guess it won't be too happy about 454 reads though, unless you pre-process them to correct homopolymer errors.

      Metrics for de novo transcriptomes are difficult to define. We have tried mapping the transcript contigs to the transcripts of similar organisms to get an idea of the completeness. You could also look at the contig length distribution and compare it to that of a similar organism.
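
For the contig length distribution, here is a minimal Python sketch, assuming the assembler writes its contigs as a plain FASTA file. The function names are made up for illustration, and N50 is only one possible summary; it describes contiguity and says nothing about whether the contigs are correct.

```python
def read_fasta_lengths(lines):
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the sequence being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarize(lengths):
    """Collect a few basic numbers that can be compared between assemblies."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

To use it, open the contig file (the path is whatever your assembler produced) and pass the file handle to `read_fasta_lengths`, then pass the result to `summarize`. Running the same thing on the published transcripts of a related organism gives you the comparison.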

      For MIRA I suggest you ask on the mailing list. Bastien is quite fast in helping out new users there. It might choke on big Illumina sets though, so make sure you have lots of RAM and time for your analysis, or subset your dataset to get a manageable run.
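
One way to subset a paired Illumina dataset is to draw read pairs at random while keeping mates together. The sketch below is not taken from MIRA or any other tool. It assumes two uncompressed FASTQ files with exactly four lines per record and the mates in the same order in both files; the function names and the seed are only for illustration.

```python
import random
from itertools import islice


def fastq_records(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes unwrapped 4-line records)."""
    handle = iter(handle)
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(reads1, reads2, k, seed=0):
    """Reservoir-sample k read pairs so that each mate stays with its partner.

    Only the k chosen pairs are held in memory. If one file is shorter than
    the other, the extra records in the longer file are silently ignored.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir = []
    for i, pair in enumerate(zip(reads1, reads2)):
        if i < k:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = pair
    return reservoir
```

Pass two open file handles wrapped in `fastq_records` and write the first element of each returned pair to one output file and the second element to another. The pairs come back in no particular order, which should not matter as long as both output files are written from the same list.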

      • #4
        Trinity vs MIRA / de novo assembly

        Hi all,

        I'm keen to see how others are getting on with de novo assemblies, particularly with Trinity. It's interesting to me that their Nature Biotechnology paper doesn't mention MIRA, and I was wondering if anyone has compared the two programs.

        I'm doing de novo assemblies using 50 bp single-read Illumina data, with a little 454 data thrown in. When Trinity first came out, it crashed pretty quickly. But now that they have different options for the first step, Inchworm (I've been trying Jellyfish), I've been able to assemble 100 million reads on my local machine (24 GB RAM) in less than a day. This has been the case for Illumina data alone, and with the 454 data pooled in. I suspect, however, that the 454 data had little impact on the outcome, because I only have about 200,000 reads!

        So far, Trinity gives me more long contigs and has less redundancy (according to TGICL). But it's always difficult to assess these assemblies. In particular, I can't find out how much of my data is being used by Trinity. Is there a handy report file with this information somewhere? The webpage suggests using Bowtie to figure out what has gone into the assembly, but this will throw out anything that aligns ambiguously. Is there an easier way? Does anyone else have experience with Trinity that they can share?
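
In case it is useful to anyone reading along, here is a rough Python sketch for getting that number out of a SAM file after mapping the reads back to the contigs. It is not a Trinity feature. It assumes the aligner writes standard SAM and was set up to report reads that align to several places instead of dropping them; how to do that depends on the aligner and is not shown here. The function name is made up.

```python
def mapped_fraction(sam_lines):
    """Count primary SAM records and how many of them are mapped.

    Returns (mapped, total). Secondary (0x100) and supplementary (0x800)
    records are skipped so that each read, or each mate of a pair, is
    counted once no matter how many places it aligns to.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x900:  # secondary or supplementary alignment
            continue
        total += 1
        if not flag & 0x4:  # 0x4 is set when the read is unmapped
            mapped += 1
    return mapped, total
```

Dividing `mapped` by `total` gives the share of reads that land somewhere on the assembly. A read that hits several contigs still counts once, so this is an upper bound on what the assembler actually used, not an exact accounting.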

        Also, Trinity is able to assemble all of my data at once, whereas MIRA was crashing when I tried to assemble it all together (even on a cluster with 96 GB RAM). I was getting around this by partitioning my data in MIRA, so it was working. But doing it all in one assembly is a plus.

        And Liz, you might have found this already, but the example inputs in the MIRA HTML guide are quite useful: http://mira-assembler.sourceforge.ne...ideToMIRA.html

        Thanks!
        -Alice
