  • RNA-seq analysis: two single-end samples, the rest paired-end

    I have RNA-seq data for 5 time points, all paired-end (PE). In addition, I have two single-end (SE) samples: one is a replicate of one of the 5 time points, and the other is a separate, additional time point. (I know this is an unusual situation, with two SE samples and the rest PE; long story short, part of the study was done by another group.) My questions are:
    1. Should I keep the PE and SE reads separate and analyze them separately? In that case, how would I include one of them as a replicate?
    2. Is it OK to combine both types of reads? In that case, won't I lose information if I treat everything as single-end?
    3. Is there any rule about which tool is best for analyzing such data? I was planning to use TopHat > Cufflinks. Alternatively, I could use DESeq.
    My aim is to find differentially expressed transcripts and splicing events.

    Thanks for your help and attention.

  • #2
    one option

    You could try:
    1) map (e.g. with RUM)
    2) count reads with htseq-count
    3) analyse with the R packages limma/edgeR (voom() function) and an appropriate contrast matrix
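Between steps 2 and 3, the per-sample count tables have to be merged into one matrix. htseq-count writes a two-column, tab-separated table per sample (feature ID, count), followed by a few summary rows such as `__no_feature` (older versions omit the leading underscores). A minimal Python sketch of the merge could look like this; the function names and sample labels are made up for illustration, and it parses text held in memory rather than real files.

```python
# Summary rows that htseq-count appends after the per-feature counts.
# Newer versions prefix them with "__"; older ones do not.
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}


def parse_counts(text):
    """Parse one htseq-count style table into {feature_id: count}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        feature, value = line.split("\t")
        if feature.lstrip("_") in SUMMARY_ROWS:
            continue  # skip the summary rows
        counts[feature] = int(value)
    return counts


def merge_counts(tables):
    """Merge {sample: {feature: count}} into (features, samples, matrix)."""
    samples = sorted(tables)
    features = sorted(set().union(*(t.keys() for t in tables.values())))
    matrix = [[tables[s].get(f, 0) for s in samples] for f in features]
    return features, samples, matrix


# Hypothetical toy input: one PE and one SE sample.
pe_text = "geneA\t10\ngeneB\t0\n__no_feature\t3\n"
se_text = "geneA\t7\ngeneB\t2\n__no_feature\t1\n"
tables = {"t1_PE": parse_counts(pe_text), "t1_SE": parse_counts(se_text)}
features, samples, matrix = merge_counts(tables)
```

Each row of `matrix` is one feature and each column one sample, which is the layout the R count-based packages expect once the table is written out.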


    • #3
      I've had to run some comparisons between single-end and paired-end data myself. The researchers and I came to the conclusion that, regardless of the sequencing method, we should align each sample in whatever way gives the most complete set of alignments for that sample. The library prep kit in each case is designed to produce a particular type of read, so we should align the reads the way the kit intended.

      The touchy thing would probably come down to normalization between samples and I think tools like DESeq and edgeR do a good job of that.
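On the normalization point: as I understand it, DESeq estimates library size factors with a median-of-ratios approach, where each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across samples. A rough pure-Python sketch of the idea follows. It is not DESeq's code, and the toy counts are invented.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of per-gene rows, each a list of per-sample counts."""
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue  # geometric mean is zero, so the gene is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in ratios]


# Invented example: the second sample was sequenced twice as deeply.
toy = [[10, 20], [50, 100], [7, 14], [0, 3]]
factors = size_factors(toy)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy]
```

Dividing each column by its factor puts the samples on a common scale, which is the part that matters when PE and SE libraries differ in depth.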

      So my version of the above post would be:

      1) align with TopHat
      2) count reads with htseq-count
      3) perform pairwise DE tests with DESeq
      4) ...play with results of pairwise tests...
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */


      • #4
        normalization PE and SE

        I have not seen much in the literature that explicitly states the best protocols for RNA-Seq analysis using PE reads, let alone for mixing PE and SE.

        So here are some ideas to consider.

        Each PE read pair, like each SE read, comes from a single fragment in your library. If the idea is to count the number of fragments that are supposed to represent the RNA in your sample, then after a paired-end alignment the PE read counts for a particular feature (gene, exon, etc.) should be divided by 2.
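To make the divide-by-2 idea concrete, here is a small Python sketch. It assumes the counting tool tallied each mate of a pair as a separate read, which is not true of every tool, and the sample names and counts are hypothetical.

```python
def to_fragment_counts(read_counts, is_paired):
    """Convert per-read counts to approximate per-fragment counts.

    read_counts: {sample: {feature: reads}}
    is_paired:   {sample: True for PE libraries, False for SE}
    Assumes both mates of a PE pair were counted as separate reads.
    """
    fragments = {}
    for sample, counts in read_counts.items():
        divisor = 2 if is_paired[sample] else 1
        fragments[sample] = {f: n / divisor for f, n in counts.items()}
    return fragments


# Hypothetical counts for one gene in a PE and an SE library.
reads = {"t1_PE": {"gene1": 18}, "t1_SE": {"gene1": 9}}
layout = {"t1_PE": True, "t1_SE": False}
fragments = to_fragment_counts(reads, layout)
```

Halving can produce non-integer values, and count-based tools such as DESeq expect integer counts, so counting fragments directly is usually cleaner than correcting afterwards.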

        The other option would be to take only the forward read of each PE pair and run your analysis so everything is equal. I would liken this to taking the first 36 bases of a 50-base read so that it matches the read length of your other libraries. You will lose information, but the libraries will be comparable (SE, same read length).
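A minimal sketch of that second option, assuming plain four-line FASTQ records with no line wrapping: only the forward-mate file is read, and every read is cut to a common length. The function name and the toy records are invented for illustration.

```python
def trim_fastq(lines, length):
    """Yield FASTQ records with sequence and qualities cut to `length`."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            yield header, seq[:length], plus, qual[:length]
            record = []


# Hypothetical forward-mate (read 1) records; read 2 is simply not used.
forward_reads = [
    "@read1/1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2/1", "TTGCA", "+", "IIIII",
]
trimmed = list(trim_fastq(forward_reads, 8))
```

Reads already shorter than the target length pass through unchanged, so a real run would still need a decision on whether to keep or drop them.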

        If you align the PE reads as pairs you will get more uniquely mappable alignments, so there will be some bias from the SE reads mapping to more locations.


        • #5
          Thanks Andrew. So, the next logical question: what are the methods that we're all using every day (HTSeq-count + DESeq, DEXSeq, baySeq, edgeR, Cuffdiff) doing about counting with paired-end reads?


          • #6
            bwa

            The BWA manual appears to suggest that it maps only PE reads that are concordant (they map in the proper orientation and within the specified fragment-size distance of each other).

            GSNAP accounts for all the possible combinations of alignments and also outputs half-mapped pairs, where only one read maps and the other does not. These can be written to separate files, but I haven't seen any methods for combining them.

            In other words, the question of how to normalize and count PE aligned reads is important when performing RNA-Seq differential gene expression analysis.


            • #7
              Potential source of false positives?

              I was thinking about this a little more and this doubling could lead to false positives too if not corrected for.

              Consider the following made-up example:

                                                        condition1  condition2
              gene1 (corrected for PE double count)     5           9
              gene1 (uncorrected for PE double count)   10          18

              For low read counts, the doubling could make a difference look more significant than it actually is, unless I am missing something fundamental here, not to mention the havoc it would likely wreak on overdispersion estimates between biological replicates.
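The made-up numbers above can be checked with a simple exact test. This sketch uses a conditional binomial test (given the total, each count falls in either condition with probability 0.5), which assumes equal sequencing depth and no overdispersion. It is a back-of-the-envelope check, not what DESeq or edgeR actually compute.

```python
from math import comb


def binomial_two_sided_p(x1, x2):
    """Exact two-sided test that x1 and x2 share one rate (equal depth)."""
    n = x1 + x2
    k = max(x1, x2)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


corrected = (5, 9)      # fragment counts
uncorrected = (10, 18)  # the same data with every pair counted twice

fold_corrected = corrected[1] / corrected[0]
fold_uncorrected = uncorrected[1] / uncorrected[0]
p_corrected = binomial_two_sided_p(*corrected)
p_uncorrected = binomial_two_sided_p(*uncorrected)
```

The fold change is 1.8 either way, but the p-value drops from roughly 0.42 to roughly 0.18 when the counts are doubled: the same evidence looks stronger purely because each fragment was counted twice.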

              Thoughts anyone?


              • #8
                You could use my htseq-count script, which does not double-count paired-end reads. Rather, a read pair is counted once for a gene if both ends map to the same gene, and is discarded otherwise.
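As a toy illustration of the rule described in this post (not the script's actual implementation, which works on alignments and annotation and whose behaviour depends on the chosen overlap mode), the pair-level counting could be sketched like this. The input format, with one gene label per mate, is an assumption made for the example.

```python
def count_pairs(pair_genes):
    """Count read pairs per gene.

    pair_genes: list of (gene_of_mate1, gene_of_mate2); None = no gene.
    A pair counts once if both mates hit the same gene, else is dropped.
    """
    counts = {}
    discarded = 0
    for gene1, gene2 in pair_genes:
        if gene1 is not None and gene1 == gene2:
            counts[gene1] = counts.get(gene1, 0) + 1
        else:
            discarded += 1
    return counts, discarded


# Hypothetical mate-to-gene assignments for four read pairs.
pairs = [("geneA", "geneA"), ("geneA", "geneB"),
         ("geneB", "geneB"), ("geneA", None)]
counts, discarded = count_pairs(pairs)
```

Because the unit being counted is the pair, a PE library and an SE library both end up with one count per sequenced fragment.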

                Furthermore, have a look at the vignette of DESeq, where we present an example of a mixed paired-end/single-end data set.

                However, if treatment and library type are confounded, it is better to discard the second mates to avoid bias.


                • #9
                  Similar problem

                  I have a similar problem to the OP. I've downloaded RNA-seq data (Hs) from a published paper and, to my surprise, the control condition is PE and the 2 other conditions are SE. No replicates.

                  They used this data to determine differential exon usage between control vs condition 1 and control vs condition 2 using a "home-brewed" analysis after mapping. For the particular biological question I am interested in, Cufflinks (and Cuffdiff) is ideal because it should allow me to extract exactly the information I am interested in.

                  The question is: is it even possible to use Cufflinks (+ Cuffdiff) to analyse control (PE) vs condition (SE)?


                  • #10
                    Originally posted by turnersd:
                    Thanks Andrew. So, the next logical question: what are the methods that we're all using every day (HTSeq-count + DESeq, DEXSeq, baySeq, edgeR, Cuffdiff) doing about counting with paired-end reads?
                    I'd like to second this question. Even delving deep into the documentation for some of these doesn't turn up an answer.

                    I am quite interested in the effect of changing the counting scheme on gene expression estimates. It seems to me that with paired ends, you need to be counting fragments, not reads, and you need to be normalizing for fragment length as well. Shouldn't a transcript that gets chopped into 2x250bp fragments count the same as a transcript that gets chopped into 5x100bp fragments?
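One possible reading of that last question, sketched in Python: raw fragment counts differ between the two cases, while a fragment-length-weighted total does not. The two helper functions are hypothetical and are not taken from any of the tools named above.

```python
def fragment_count(fragment_lengths):
    """Number of fragments observed for a transcript."""
    return len(fragment_lengths)


def length_weighted_signal(fragment_lengths):
    """Total fragment bases observed for a transcript."""
    return sum(fragment_lengths)


# The two cases from the post: the same 500 bp of transcript,
# chopped into 2 x 250 bp or into 5 x 100 bp fragments.
long_fragments = [250, 250]
short_fragments = [100] * 5

raw = (fragment_count(long_fragments), fragment_count(short_fragments))
weighted = (length_weighted_signal(long_fragments),
            length_weighted_signal(short_fragments))
```

Raw counts come out as 2 versus 5, whereas the weighted totals are both 500, which is the sense in which the two transcripts "count the same" once fragment length is taken into account.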
