  • Pipeline for de novo RNA sequencing, and Galaxy

    Hi all, I'm trying to figure out a good pipeline for a de novo RNA sequencing project (hybrid assembly), and I figure I will have to use MIRA. I will have paired ~100 bp Illumina HiSeq data and am hoping to eventually get 454 FLX data as well (they are having trouble with the new chemistry, so I don't know when this will be). I'll be making a hybrid transcriptome assembly, then mapping the Illumina reads back to the assembly to quantify them. This is the first time I've ever done anything like this. Can people suggest pipelines that would be good to try? Also, what metrics should I use to judge whether my assembly is good or not? I don't have a reference genome to map to.

    I'm not used to command-line interfaces, so if anyone has used MIRA and can share an example of the commands they ran, I'd be grateful.

    Also, I've come across Galaxy, which apparently lets you run MIRA. Has anyone done this, and did you have problems? Has anyone had problems in general with Galaxy not letting programs work correctly?

    Thanks for any help you can provide to this noob.

  • #2
    If you know someone else who has done RNA-Seq on your organism, you could test the assembly with their data. Any de novo assembly you do should map your own reads back really well, but it might not do so well with someone else's reads.


    • #3
      I have had good results with Trinity for Illumina data. I guess it won't be too happy about 454 reads though, unless you pre-process them to correct homopolymer errors.

      Metrics for de novo transcriptomes are difficult to define. We have tried mapping the transcript contigs to the transcripts of similar organisms to get an idea of completeness. You could also look at the contig length distribution and compare it to that of a similar organism.
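To make the contig length comparison concrete, here is a minimal Python sketch that summarizes the length distribution of an assembly in FASTA format. The function names are made up for illustration, and N50 is an extra summary not mentioned above that is commonly reported alongside the lengths. It assumes plain, uncompressed FASTA.

```python
import statistics


def read_fasta_lengths(lines):
    """Return the length of each sequence found in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths):
    """Collect a few simple summaries of a non-empty list of contig lengths."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "n50": n50(lengths),
    }
```

To use it, open the contig file (for example `with open("contigs.fasta") as handle:`, where the file name is a placeholder) and pass the handle to `read_fasta_lengths`, then run `summarize` on the result for your assembly and for the transcript set of the related organism. These numbers only describe contig sizes; they say nothing about whether the contigs are correct.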

      For MIRA, I suggest asking on the mailing list; Bastien is quite fast at helping out new users there. It might choke on big Illumina sets though, so make sure you have lots of RAM and time for your analysis, or subset your dataset to get a manageable run.
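One way to subset a paired Illumina dataset is to keep a random fraction of read pairs. The Python sketch below is only an illustration: the function names are hypothetical, it assumes standard 4-line FASTQ records, and it assumes both files list the mates in the same order.

```python
import random
from itertools import islice


def fastq_records(lines):
    """Group an iterable of lines into 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_lines, r2_lines, fraction, seed=0):
    """Yield (mate1, mate2) records, keeping each pair with probability `fraction`.

    One random draw is made per pair, so mates always stay together.
    Note that zip() stops at the shorter input if the files differ in length.
    """
    rng = random.Random(seed)
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rng.random() < fraction:
            yield rec1, rec2
```

Each kept record is a list of four lines that can be written out with `writelines` to two new files. Because selection is random per pair, the number of pairs kept is only approximately `fraction` times the input; fixing `seed` makes the subset reproducible.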


      • #4
        Trinity vs MIRA / de novo assembly

        Hi all,

        I'm keen to see how others are getting on with de novo assemblies, particularly with Trinity. It's interesting to me that their Nature Biotechnology paper doesn't mention MIRA, and I was wondering if anyone has compared the two programs.

        I'm doing de novo assemblies using 50 bp single-read Illumina data, with a little 454 data thrown in. When Trinity first came out, it crashed pretty quickly. But now that there are different options for the first step, Inchworm (I've been trying Jellyfish), I've been able to assemble 100 million reads on my local machine (24 GB RAM) in less than a day. This has been the case for the Illumina data alone, and with the 454 data pooled in. I suspect, however, that the 454 data had little impact on the outcome, because I only have about 200,000 of those reads!

        So far, Trinity gives me more long contigs and less redundancy (according to TGICL). But it's always difficult to assess these assemblies. In particular, I can't find out how much of my data is being used by Trinity. Is there a handy report file with this information somewhere? The webpage suggests using bowtie to figure out what has gone into the assembly, but this will throw out anything that aligns ambiguously. Is there an easier way? Does anyone else have experience with Trinity that they can share?
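For the read-usage question, one rough proxy is the fraction of reads that align back to the contigs. The sketch below is a minimal Python example (not something Trinity itself reports) that counts mapped primary records in a SAM file using the FLAG field. The function name is hypothetical, and whether ambiguously aligning reads show up as mapped depends on how the aligner was run.

```python
def mapped_fraction(sam_lines):
    """Return (mapped, total) counts of primary records in SAM-formatted lines.

    Header lines (starting with '@') are skipped. Secondary (0x100) and
    supplementary (0x800) records are ignored so each read, or each mate of
    a pair, is counted once. A record is unmapped when FLAG bit 0x4 is set.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped, total
```

Calling it on an open SAM file handle gives the two counts, and `mapped / total` is the fraction of reads placed on the assembly. A read that aligns is not proof that the assembler used it, so treat the number as an upper-level sanity check rather than an exact answer.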

        Also, Trinity is able to assemble all of my data at once, whereas MIRA was crashing when I tried to assemble it all together (even on a cluster with 96 GB RAM). I was getting around this by partitioning my data in MIRA, so it was working. But doing it all in one assembly is a plus.

        And Liz, you might have found this already, but the example inputs in the MIRA HTML guide are quite useful: http://mira-assembler.sourceforge.ne...ideToMIRA.html

        Thanks!
        -Alice
