  • Paired-end or Single-end?

    Hello all,

    I am new to RNA-seq analysis and am trying to decide whether to do single-end or paired-end RNA-seq for differential expression and alternative splicing analysis.
    We use the Illumina Genome Analyzer IIx and intend to analyse the reads using TopHat and Cufflinks.

    - Would you recommend single-end or paired-end?
    - Which read length would you recommend, 2x36 or more?

    Thanks
    Reut

  • #2
    Go for paired end if you can - that will help you with the alternative splicing.

    • #3
      We have run 2x36 paired-end and obtained approx. 6.5 GB of FASTQ from one lane. Unfortunately, the per-base qualities are low, and we are trying to figure out why together with technical support. I think we will run an alignment despite the quality and check the coverage.
      Tomasz Stokowy
      www.sequencing.io.gliwice.pl

      • #4
        I'll add my two cents. Definitely go for paired-end when doing RNA-seq. Certain protocols (ChIP-seq, tag-seq) might make more sense with single-end data, but RNA-seq is definitely not one of them. A read length of 2 x 36 is OK, but 2 x 76 would be far better.

        • #5
          At the risk of being contentious I'd say there is a case to be made for single end - depending on what you're interested in.

          If you generate paired end data you will (by definition) just get a second sequence from the same transcript as the first end, albeit from a bit further along. You will improve the mapping efficiency a bit by having the second read, but not by huge amounts. If you're just counting hits to transcripts then paired end data isn't going to increase the diversity of your results over single end and will double your sequencing costs.

          If you're interested in splicing then you ideally need to see a single read which spans across two exons so you can tell where the splice point is. Seeing hits in different exons from paired end reads can be useful, but it doesn't tell you how you got from one end to another. In terms of cost benefit you might therefore do better by running a longer single end than a paired end run (which would be quicker too).
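To make the cost-benefit argument about junction-spanning reads concrete, here is a minimal sketch that counts how many read start positions produce a read covering a given splice junction with at least `min_anchor` bases on each side. It assumes uniformly distributed read starts, and the anchor value of 8 is a hypothetical requirement, not a documented TopHat default. The brute-force function simply checks the closed-form count.

```python
def junction_spanning_starts(read_len: int, min_anchor: int) -> int:
    """Count read start positions from which a read of length read_len covers
    a given splice junction with at least min_anchor bases on each side.

    A read occupying [s, s + read_len) crosses a junction at j with overhangs
    j - s and s + read_len - j, so we need
    j - read_len + min_anchor <= s <= j - min_anchor.
    """
    return max(0, read_len - 2 * min_anchor + 1)


def junction_spanning_starts_bruteforce(read_len: int, min_anchor: int, junction: int = 1000) -> int:
    """Same count by enumerating start positions, as a check on the formula."""
    count = 0
    for s in range(junction - read_len, junction + 1):
        left = junction - s              # bases before the junction
        right = s + read_len - junction  # bases after the junction
        if left >= min_anchor and right >= min_anchor:
            count += 1
    return count


MIN_ANCHOR = 8  # hypothetical anchor requirement, not a documented TopHat default

for read_len in (36, 40, 50, 72, 76):
    print(read_len, junction_spanning_starts(read_len, MIN_ANCHOR))
```

Under these assumptions a single 72 bp read has 57 qualifying start positions per junction, while each mate of a 2x36 pair has 21 (42 for the pair), which is in line with the point that a longer single read sees splice points directly more often. Real coverage is not uniform, so treat this as a rough guide only.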

          There are some situations (chimeric transcripts, connecting widely spaced exons within a transcript, working without a reference genome etc.) for which paired end data will be great, but how useful it is to you will depend on what you're really looking for in your data.

          Our current RNA-Seq samples are run with single-end 40bp reads (though there's a good case to be made for moving up to 50bp). As a replacement for expression arrays and for detecting novel transcripts this works pretty well for us. If cost and time weren't an issue then we'd happily collect long paired end reads - but our current call is that they wouldn't tell us enough extra about our data to be justified.

          • #6
            Low quality of long single-end reads?

            Originally posted by simonandrews:
            If you're interested in splicing then you ideally need to see a single read which spans across two exons so you can tell where the splice point is. Seeing hits in different exons from paired end reads can be useful, but it doesn't tell you how you got from one end to another. In terms of cost benefit you might therefore do better by running a longer single end than a paired end run (which would be quicker too).
            As I understand it, read quality deteriorates as reads get longer.
            Our current data are single-end 72bp reads, and after about 36-40bp the quality score starts to fall. (However, the reads do map to the genome, and TopHat finds better junctions than when using just the first 36bp.)
            Maybe a 2x36bp paired-end read would be of better quality?
            (I don't know which options Illumina offers. Is it possible to sequence single-end reads of 50bp?)
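A quick way to check where quality starts to fall, as described above, is to average the Phred score at each cycle across reads. This is a minimal sketch assuming Phred+33 encoding; older Illumina pipelines, including many GA IIx runs, used Phred+64, so the `offset` parameter may need changing. The helper names are hypothetical.

```python
from statistics import mean


def fastq_qualities(lines):
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def per_cycle_mean_quality(qual_strings, offset=33):
    """Mean Phred quality at each read position (cycle), across all reads."""
    quals = list(qual_strings)
    if not quals:
        return []
    max_len = max(len(q) for q in quals)
    means = []
    for i in range(max_len):
        # Only reads long enough to have a base at this cycle contribute.
        scores = [ord(q[i]) - offset for q in quals if len(q) > i]
        means.append(mean(scores))
    return means


def first_cycle_below(means, threshold):
    """1-based cycle where the mean quality first drops below threshold, or None."""
    for cycle, m in enumerate(means, start=1):
        if m < threshold:
            return cycle
    return None


records = "@r1\nACGTAC\n+\nIIIII#\n@r2\nACGTAA\n+\nIIIII5\n"
means = per_cycle_mean_quality(fastq_qualities(records.splitlines()))
print(means)                         # mean quality per cycle
print(first_cycle_below(means, 30))  # first cycle whose mean is below Q30
```

On real data you would stream the FASTQ file instead of a string, but the per-cycle averaging is the same.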

            • #7
              Originally posted by reut:
              As I understand, the read quality deteriorates as the read is longer.
              Our current data is single end reads of 72bp and after about 36-40bp the score starts to fall down. (however the reads do map to the genome and Tophat does find better junctions than just using the first 36bp).
              Maybe a paired-end read of 2x36bp will be of better quality?
              (I don't know which options Illumina has - is it possible to map single end of length 50bp?)
              We're now seeing good quality data (>Q30 for >90% of reads) for at least 50bp from our Illumina runs so whereas we used to be reluctant to go above 40bp I'd be a lot happier to look towards 80bp these days where this would benefit us. As I said before I reckon 50bp could be a sweet spot for RNA-Seq - but we're quite happy with our 40bp data.

              What we do find is that as we go beyond 40bp we start to get read-through of the inserts into the adapters on the other end, which completely mess up the mapping stats unless they are carefully removed. We were surprised at how much of the library contained these shorter fragments even when the input material had been carefully size selected. Obviously, for this subset of reads the single-end / paired-end distinction is irrelevant.
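Because adapter read-through of this kind is common, here is a minimal exact-match sketch of 3' adapter trimming. The adapter string is an assumption (the common start of Illumina TruSeq adapters, `AGATCGGAAGAGC`), so substitute your kit's sequence. Dedicated tools such as cutadapt also tolerate sequencing errors in the adapter, which this sketch does not.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: common start of Illumina TruSeq adapters


def trim_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove 3' adapter read-through from a read using exact matching only.

    min_overlap guards against trimming a few bases that match the adapter
    by chance; it must be at least 1.
    """
    # Case 1: the insert is shorter than the read, so the whole adapter appears.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway into the adapter; try the longest
    # adapter prefix first so we trim as much read-through as possible.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


reads = [
    "ACGTACGTAC" + ADAPTER + "TTTT",  # full adapter inside the read
    "ACGTACGTACAGATC",               # partial read-through at the 3' end
    "ACGTACGTACGTTTTT",              # no adapter
]
for r in reads:
    print(r, "->", trim_adapter(r))
```

Raising `min_overlap` reduces false trimming of genuine sequence that happens to end like the adapter, at the cost of leaving very short read-through fragments in place.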

              • #8
                Paired end reads are much better for RNA-Seq, both for assembly, and also for quantification. There are a number of reasons:
                0. As already alluded to on this thread, paired-end reads improve mapping. Granted, as also pointed out here, this improvement diminishes as read lengths increase.
                1. For assembly, having a paired-end read effectively increases the fragment length from the read length to a few hundred bases. This allows for phasing of distant exons in multi-isoform genes. Furthermore, for de novo assembly paired-ends are even more important as they really help in traversing the de Bruijn graph structures underlying most current algorithms.
                2. In genes with multiple isoforms, paired-end reads improve accuracy of expression estimates by improving the assignment of fragments to transcripts. For an evaluation of the improvement obtained by paired-end over single-end sequencing see Hui Jiang's thesis: http://www.stanford.edu/dept/ICME/do...Jiang-2009.pdf
                In our analyses with Cufflinks, we see similar results and also find improvement with paired-ends. We get something like an overall improvement of 10% in accuracy; however, this is an average. In some genes the improvement can be much greater.
                Simon is right that single end reads can identify splicing differences, but they don't allow for phasing such differences across distant parts of transcripts.
                3. When modeling non-uniformity of reads, paired-end sequencing allows for modeling different biases at the 5' and 3' ends of fragments. We have recently shown that these biases are different and that modeling them separately leads to further improvements in expression estimates.
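The phasing argument in point 2 can be illustrated with a toy model: treat each isoform as an ordered tuple of exon IDs and each sequenced end as the run of exons it touches. A fragment is called compatible with an isoform if every end matches a contiguous run of that isoform's exons, in transcript order. This is a simplified sketch, not how Cufflinks assigns fragments (it ignores fragment-length likelihoods and bias models), and the exon and isoform names are hypothetical.

```python
# Hypothetical gene with two independent alternative sites (exon 1 and exon 3).
ISOFORMS = {
    "A": ("e1a", "e2", "e3a"),
    "B": ("e1a", "e2", "e3b"),
    "C": ("e1b", "e2", "e3a"),
    "D": ("e1b", "e2", "e3b"),
}


def _find_block(isoform, block):
    """Index where block occurs as a contiguous run of exons in isoform, or -1."""
    n = len(block)
    for i in range(len(isoform) - n + 1):
        if tuple(isoform[i:i + n]) == tuple(block):
            return i
    return -1


def compatible_isoforms(isoforms, mates):
    """Names of isoforms consistent with every sequenced end of a fragment.

    mates: list of exon-ID tuples, one per end, in 5'->3' transcript order.
    """
    hits = []
    for name, exons in isoforms.items():
        pos = -1
        ok = True
        for block in mates:
            i = _find_block(exons, block)
            if i == -1 or i < pos:  # end missing from isoform, or out of order
                ok = False
                break
            pos = i
        if ok:
            hits.append(name)
    return hits


print(compatible_isoforms(ISOFORMS, [("e1a",)]))           # single end in exon 1a
print(compatible_isoforms(ISOFORMS, [("e3a",)]))           # single end in exon 3a
print(compatible_isoforms(ISOFORMS, [("e1a",), ("e3a",)])) # pair spanning both sites
```

A single read at either alternative site leaves two candidate isoforms, while a pair with one end at each site narrows the fragment to one isoform, which is the phasing benefit described above.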

                A final remark: although the internal sequence is missing in paired-end sequencing, it is almost always uniquely determined by the mapping, so that one is effectively increasing read length from 10s of bp to hundreds. At Berkeley the extra cost for paired-ends is ~50%. With so much more that one can extract and learn from the data, that seems like a no-brainer.
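The point about effectively increasing read length comes down to simple coordinate arithmetic. This is a minimal sketch assuming 0-based, half-open coordinates on the transcript (in genomic coordinates a spliced fragment's span would also include introns); `fragment_geometry` is a hypothetical helper.

```python
def fragment_geometry(fwd_start, rev_end, read_len):
    """Return (fragment length, unsequenced inner distance) for a proper pair.

    fwd_start: transcript position of the forward mate's first base (0-based)
    rev_end:   transcript position just past the reverse mate's last base
    read_len:  length of each mate
    """
    if rev_end <= fwd_start:
        raise ValueError("reverse mate must end after forward mate starts")
    fragment_len = rev_end - fwd_start
    inner_distance = fragment_len - 2 * read_len  # negative means the mates overlap
    return fragment_len, inner_distance


# Two 36 bp mates whose outer ends sit 250 bases apart on the transcript.
frag, inner = fragment_geometry(fwd_start=100, rev_end=350, read_len=36)
print(frag, inner)
```

Here 72 sequenced bases define a 250-base fragment; once the mates are placed on a transcript, the 178 unsequenced bases in between are given by the reference.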

                • #9
                  Very interesting comments, lpachter.
                  It certainly seems paired-end can add value to RNA-seq studies. Would you say the same for ChIP-seq? You still get fragments longer than the read length, but does that justify doubling the cost?

                  • #10
                    I know you didn't ask me but no, in my opinion paired end does not add enough value for ChIP-seq.

                    • #11
                      Thanks kopi-o, I appreciate you commenting.
                      What are your thoughts on using multiplexing for ChIP-seq (4 samples in one lane, Illumina)?

                      • #12
                        Hm, is it Illumina HiSeq? In that case, multiplexing 4 samples in one lane should give you reasonable results (estimating 60 million reads/lane -> 15 million reads/sample).

                        • #13
                          It is still to be decided; we can use HiSeq. But the strange thing is that I just cannot find a published paper on multiplexing ChIP-seq in mouse. I'm just trying to see if someone has experience with it.
                          I agree with you; logically it should be possible.

                          • #14
                            By the way, 15 million reads = 750 million bases.
                            If there are 10,000 binding regions, each ~1 kb, that is 10 Mb total, so ~75X coverage.
                            Does this calculation make sense? I'm just wondering how to know how many reads are enough.
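The back-of-envelope estimate above can be written out as follows. It keeps the post's implicit assumption that every read falls in a binding region (`frac_in_regions=1.0`); in practice only a fraction of ChIP-seq reads land in peaks, so `frac_in_regions` is a hypothetical parameter you would estimate from your own data. The 750 million bases figure implies 50 bp reads, and the 10M mappable reads case comes from a later reply in this thread.

```python
def region_coverage(n_reads, read_len, n_regions, region_len, frac_in_regions=1.0):
    """Average fold coverage across binding regions."""
    bases_in_regions = n_reads * read_len * frac_in_regions
    region_bases = n_regions * region_len
    return bases_in_regions / region_bases


reads_per_sample = 60_000_000 // 4  # one lane split four ways (estimate from this thread)
print(reads_per_sample)
print(region_coverage(reads_per_sample, 50, 10_000, 1_000))       # all reads in regions, as in the post
print(region_coverage(10_000_000, 50, 10_000, 1_000))             # only 10M reads mappable
print(region_coverage(reads_per_sample, 50, 10_000, 1_000, 0.2))  # hypothetical 20% of reads in peaks
```

The last line shows why the post's ~75X figure is an upper bound: coverage scales directly with the fraction of reads that actually fall in the binding regions.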

                            • #15
                              Well, from my experience, 10-15M reads have given nice results for ChIP-seq. The more the better of course, but say 15M total reads where 10M are mappable to the genome should be fine.
