
  • New to Next Gen not sure where to go from here

    Hello all,

    My supervisor and I recently jumped on the next gen bandwagon. Another professor in our department purchased lanes on an RNAseq plate and couldn't fill the entire thing, so we took 4 spots. We are working on glyphosate-resistant giant ragweed, for which there is no reference genome and little to no genetic data available.

    We have a few questions:
    1) What do you recommend we do now?
    2) In addition to the whole transcriptome data, we are also interested specifically in looking at the expression of two or three genes (Catalase, SOD1 Cu/Zn, and EPSPS); however, we don't have giant ragweed sequences for those genes. Can we search our data for those genes and their expression levels, or is that just out of the question?
    3) This data will not be in my MSc and I won't be doing much work with it beyond posting here, but I would like to include a short section in my thesis on the data we collect here. Is it possible to get average read lengths, fold number and other quality statistics about the data to include in my thesis, or does that not make any sense?

    Thanks everyone,
    Taylor

  • #2


    Looks like a good place to start for transcriptome assembly.



    • #3
      1) As vivek alluded to, if you're thinking of doing some RNAseq, then you'll need to do assembly. Knowing absolutely nothing about this, I would think an interesting experiment would be to compare glyphosate-resistant vs. glyphosate-sensitive ragweed, but without a reference genome/transcriptome you might need more sequencing capacity to do that easily in one shot (hopefully someone who does assembly can chime in). BTW, if a related plant has been sequenced, you might have luck aligning to that (I haven't a clue how related the various plants are, I work on mice and humans!). This is particularly true for transcriptome alignments, since there's selective pressure on that.
      2) After assembly, you'll have to blast the various contigs to try to figure out what they are. Presumably you'll pick up the genes that you're most interested in. There's no great way to search the raw alignments for just a few genes; you'll likely end up getting a lot of false-positive matches.
      3) Just a listing of some various quality metrics probably wouldn't be interesting enough for inclusion in a thesis, at least without knowing the exact thesis topic.



      • #4
        4 lanes should give some useful data. Use Trinity to assemble the transcriptomes. Use blast and perhaps something like blast2go for annotation.

        I can't give out details but I am 100% sure that you will find potential differential Catalase and Cu-SOD1 activity. I agree with dpryan that listing numbers in your thesis would not be interesting. However, such numbers can be obtained if you really want them.

        Good luck with the analysis. Expect a steep learning curve but, it is to be hoped, a high payback.
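
        If you do want those numbers, here is a minimal sketch of how read count, mean read length and mean base quality could be pulled from a FASTQ file. It assumes plain 4-line FASTQ records with Phred+33 quality encoding; the function name `fastq_stats` and the toy records are made up for illustration, and a dedicated QC tool will give you far more than this.

```python
import io


def fastq_stats(handle):
    """Return (read_count, mean_read_length, mean_phred) for a FASTQ text stream.

    Assumes 4-line records and Phred+33 quality encoding (an assumption,
    not something the sequencing facility has confirmed).
    """
    n_reads = 0
    total_bases = 0
    total_qual = 0
    for i, line in enumerate(handle):
        line = line.rstrip("\n")
        if i % 4 == 1:        # second line of each record: the sequence
            n_reads += 1
            total_bases += len(line)
        elif i % 4 == 3:      # fourth line of each record: the qualities
            total_qual += sum(ord(c) - 33 for c in line)
    if n_reads == 0 or total_bases == 0:
        return n_reads, 0.0, 0.0
    return n_reads, total_bases / n_reads, total_qual / total_bases


# Toy data: two made-up reads of length 4 and 6, all bases at Phred 40 ('I').
toy = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(fastq_stats(toy))
```

        For a real run you would pass an open file instead of the toy stream, for example `gzip.open(path, "rt")` for a compressed file.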



        • #5
          You can also use MIRA and Newbler for transcriptome assembly.



          • #6
            4 lanes is a lot these days. You can expect to get about 800 Million reads from that, and you really only need maybe 30-50 Million per replicate, and 3 replicates per sample. So you can easily stick in about 12-20 total samples in those 4 lanes.
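
            As a back-of-envelope check on that budget, here is a small sketch. It only uses the figures above (about 800 million reads, 30-50 million reads per replicate, 3 replicates per sample); the function name `max_libraries` is made up, and the result is a ceiling that ignores uneven pooling and failed libraries.

```python
def max_libraries(total_reads, reads_per_library):
    """Upper bound on how many libraries fit in a read budget."""
    return total_reads // reads_per_library


TOTAL_READS = 800_000_000   # rough yield assumed for 4 lanes
REPLICATES = 3              # replicates per sample/condition

for depth in (30_000_000, 50_000_000):
    libraries = max_libraries(TOTAL_READS, depth)
    conditions = libraries // REPLICATES
    print(f"{depth // 1_000_000}M reads each: "
          f"up to {libraries} libraries, i.e. {conditions} conditions in triplicate")
```

            The 12-20 figure sits a bit below this ceiling, which leaves some headroom.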

            So, you should have some sort of experimental design. Don't just throw some random stuff in each lane. You'll oversequence them and end up regretting it later. Now, you're probably not going to find many people able to help with the experimental design. But you should think about whether there is some sort of developmental time course, drug/condition treatment and control, or different tissues from an adult, that would actually give some interesting comparisons. Once you have a few conditions/tissues/time-points picked out, you need to have 3 (or more) replicates for any sort of meaningful statistical analysis.

            Now, if you can't come up with more than 2-3 different samples to sequence, you may want to consider genome sequencing. But how big is your genome, or do you have any idea? Because if you want maybe 9 samples for RNA-seq, that only leaves you with about 400M reads for the genome. With 2x100bp reads, you're really only going to have useful depth of sequencing for genomes of 2Gbp or smaller. And even then, it's going to be pretty fragmented due to the lack of mate-pair reads (though you could do 300bp and 800bp libraries now). But if you plan to continue working on this species, it may be useful to get the genome sequencing effort started, adding things like mate-pair libraries at a later time. This is something your advisor should be heavily involved in deciding, since most genome sequencing projects outlive a single graduate student (especially if you're already in year 3 or 4).
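
            The depth estimate is just total sequenced bases divided by genome size. A rough sketch, assuming the 400M figure counts individual 100bp reads (if it counts read pairs, double the result); the function name `coverage` is made up for illustration:

```python
def coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


# 400M reads of 100 bp over a 2 Gbp genome.
print(coverage(400_000_000, 100, 2_000_000_000))
```

            Swap in your own genome size estimate to see how quickly the depth drops for larger genomes.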

            Now, for RNAseq analysis without a genome, I highly recommend Trinity (linked above); it makes assembly, orthology assignment and expression analysis all very user friendly (for command line stuff).



            • #7
              Thanks for your replies everyone.

              I feel I should clarify my third point. I don't necessarily want to include the metrics about the data as a part of my thesis. I guess I would like to be able to include a paragraph and part of a slide in my defense about where the research is going beyond the work I've done so far. Being able to cite some metrics about the data sounds a little more scientific than "We ran RNAseq and got back a lot of data"

              I also should clarify that we didn't buy four lanes. Our colleague bought a lane of 12 to himself and we bought 4 spots on that lane. We provided the sequencing facility with RNA from 4 plants representing 4 different states: Resistant sprayed (after 2 hours) and unsprayed to look for differential expression and susceptible sprayed (after 2 hours) and unsprayed to eliminate differences that are just a normal response to glyphosate.

              The closest plant with a sequenced and aligned genome is Sunflower, which is much too far to be of use.

              Using c-value our genome size is about 1.8 x 10^10 bp.

              Like I said, this isn't part of my project. I did the RNA extraction and the paperwork, but that is where my responsibility ends in my opinion. Looks like I need to make the recommendation to my supervisor that if he really wants to work with this data he needs to get a genome sequence first. Otherwise he could use Trinity, but he'll probably need to hire a new grad student or postdoc to do it.

              Thanks for all of the answers everyone.



              • #8
                Originally posted by tjeffe01 View Post

                I also should clarify that we didn't buy four lanes. Our colleague bought a lane of 12 to himself and we bought 4 spots on that lane. We provided the sequencing facility with RNA from 4 plants representing 4 different states: Resistant sprayed (after 2 hours) and unsprayed to look for differential expression and susceptible sprayed (after 2 hours) and unsprayed to eliminate differences that are just a normal response to glyphosate.
                If this is a single lane of sequencing with 12 samples (if that is what you mean by lane of 12) then that would not be a lot of data.



                • #9
                  Originally posted by GenoMax View Post
                  If this is a single lane of sequencing with 12 samples (if that is what you mean by lane of 12) then that would not be a lot of data.
                  Indeed. In fact, it's probably far too little. At best, you're looking at about 12M reads per sample, which just isn't enough sequencing depth. A single lane really shouldn't be split more than 6 ways, or the equivalent (i.e. 12 samples spread over 2 lanes). And the lack of a reference genome makes it even harder: to do any meaningful analysis, genes/transcripts first need to be assembled, which requires much higher coverage than pure DE analysis.

                  So, it's probably a good thing this is just a "future direction" for the OP's thesis.



                  • #10
                    Originally posted by tjeffe01 View Post
                    Using c-value our genome size is about 1.8 x 10^10 bp.
                    So 18Gbp? That genome isn't getting sequenced anytime soon.



                    • #11
                      Originally posted by Wallysb01 View Post
                      So 18Gbp? That genome isn't getting sequenced anytime soon.
                      Most plants aren't.

                      The bright side of working with the un-characterized part of Life is that experiments don't have to stick to rigorous statistical principles. Which, in your case, is a good thing since you have neither biological nor technical replicates. Instead you can treat this as a "fishing expedition".

                      @Genomax. I agree that there isn't a lot of data but they should be able to get enough even with 1/3 of a lane. For rnaSeq we shoot for at least 30M reads per sample. A recent one-lane 8-sample experiment (similar to tjeffe01's) that recently came through our center yielded a total of 450M reads. So assuming that tjeffe01's sequencing center can balance across those 12 samples then he will get over 30M reads per sample. From that Trinity will be able to provide a nice assembly. Not to human/mouse standards but for us plant & animal guys ... well, we just take what we can.


                      I should emphasize what Wallysb01 said. Trinity does the assembly and, via its Trinotate package, the annotation and expression analysis. I am a bit behind the times by still using Blast and Blast2Go for my annotation but Trinity is becoming a one-stop solution.



                      • #12
                        Originally posted by westerman View Post
                        A recent one-lane 8-sample experiment (similar to tjeffe01's) that recently came through our center yielded a total of 450M reads. So assuming that tjeffe01's sequencing center can balance across those 12 samples then he will get over 30M reads per sample. From that Trinity will be able to provide a nice assembly. Not to human/mouse standards but for us plant & animal guys ... well, we just take what we can.
                        Do you mean 450M reads as in 225M PE reads? I don't think counting a read on the same fragment twice is the right thing to do here, if that's indeed what you're doing.

                        But I guess we should ask tjeffe01, how many PE reads did you get for each sample? Or is it not completed yet?
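
                        To make the difference concrete, a quick sketch. It assumes the 450M figure counts both mates of each pair and that the lane is balanced evenly across samples; the function name `per_sample` is made up for illustration.

```python
def per_sample(total_reads, n_samples, paired_end=True):
    """Return (reads, fragments) per sample for an evenly balanced lane.

    With paired-end data each fragment contributes two reads, so the
    fragment count is half the read count.
    """
    reads = total_reads / n_samples
    fragments = reads / 2 if paired_end else reads
    return reads, fragments


for n in (8, 12):
    reads, fragments = per_sample(450_000_000, n)
    print(f"{n} samples: {reads / 1e6:.2f}M reads, {fragments / 1e6:.2f}M fragments each")
```

                        Counted as fragments, a 12-way split comes in under 20M per sample rather than over 30M.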



                        • #13
                          Since it is a roundup-resistant weed, you have obvious candidate genes. Part of the problem is that you don't have biological replicates, if I read it right, at least not for expression.

                          What you do have is biological replicates in sequence....

                          The first thing I would try to do is mine this data for any sequence variants in candidate genes...especially EPSPS. Imagine if you find variants in EPSPS in the resistant variety that are not in the non-resistant variety. That would be a very obvious candidate for resistance.

                          You could try to make a de novo transcriptome assembly. I would actually combine the reads from samples to make the assembly, or at least combine the reads from resistant varieties and combine the reads from non-resistant varieties. Then realign your reads back to this reference transcriptome to get differential expression.
                          Last edited by chadn737; 08-09-2013, 04:42 PM.



                          • #14
                            Originally posted by chadn737 View Post
                            I would actually combine the reads from samples to make the assembly, or at least combine the reads from resistant varieties and combine the reads from non-resistant varieties. Then realign your reads back to this reference transcriptome to get differential expression.
                            This, definitely this. I'd suggest assembling them all together, given your fairly limited sequencing depth. But I'd do both myself, and compare.



                            • #15
                              I understand the pitfalls of this suggestion, so nobody rip me to pieces.

                              I suggest it only because there are VERY obvious candidate genes in this experimental design. For those not familiar with how roundup works, the target enzyme is EPSPS and roundup-resistant crops carry a resistant EPSPS gene (no offense to those who know all this already, I just don't want to be attacked for suggesting this).

                              Find some sequences of candidate genes, whether from sunflower or other organisms. You may even be able to find the sequence of EPSPS from ragweed in a database somewhere. Then just align your sequences against this small reference. Obviously, this can lead to a lot of misalignment, but it would give a very quick look at any reads aligning to candidate genes. I would suggest this only as an initial quick and dirty look at your data while you are running a de novo assembly or something, not as an approach to getting your data published.
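
                              Just to illustrate the idea, here is a toy sketch of that kind of screen. To be clear, it is not a real aligner: it swaps in simple shared k-mer matching, which is even cruder than alignment, and the reference string below is made up, not an actual EPSPS sequence. The names `kmers`, `revcomp` and `screen_reads` and the `k` and `min_shared` defaults are arbitrary choices for the example.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def screen_reads(reads, reference, k=8, min_shared=2):
    """Keep reads sharing at least min_shared k-mers with the reference,
    checking both the read and its reverse complement."""
    ref_kmers = kmers(reference, k)
    hits = []
    for read in reads:
        forward = len(kmers(read, k) & ref_kmers)
        reverse = len(kmers(revcomp(read), k) & ref_kmers)
        if max(forward, reverse) >= min_shared:
            hits.append(read)
    return hits


# Made-up stand-in for a candidate gene sequence (NOT real EPSPS).
toy_reference = "ATGGCTCAAGTTAGCAGAATCTGCAATGGTGTGCAGAACCCATCTCTT"
toy_reads = [
    toy_reference[5:25],            # read taken from the forward strand
    revcomp(toy_reference[20:40]),  # read taken from the reverse strand
    "T" * 20,                       # unrelated read
]
print(len(screen_reads(toy_reads, toy_reference)))
```

                              On real data you would hand the candidate sequences to a proper aligner instead; this only shows why a small reference gives a fast first look and why spurious matches are a risk.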

                              What do people think?
                              Last edited by chadn737; 08-09-2013, 05:08 PM.
