  • Using DEXSeq to compare differential exon usage from different technical replicates

    Hi All:

    I have a general question about testing differential exon usage using DEXSeq. Suppose I have a sample called SLR (from a cell line), and it is sequenced on 5 lanes, so I have BAM files like L1_SLR, L2_SLR, L3_SLR, L5_SLR and L7_SLR.bam. Here, the letter "L" denotes "lane". In a typical RNA-Seq experiment, we should expect reads from those 5 lanes to show little variation.

    Now, similarly, I have three technical replicates of the cell line SLR, let's call them TEC1, TEC2 and TEC3. For each technical replicate (say TEC1), it is also sequenced using 5 lanes. For example, I have L1_TEC1, L3_TEC1, L4_TEC1, L5_TEC1 and L7_TEC1. Similarly for TEC2 and TEC3.

    Since TEC1, TEC2 and TEC3 are technical replicates of cell line SLR, ideally no differential exon usage should be detected. Our goal is to focus on these technical replicates and see whether we obtain more differential exon usage than expected. The tool we are considering is DEXSeq. I looked at this manual here: http://bioconductor.org/packages/dev...te_objects.pdf. In Table 1, I think the author "concatenated" all the lanes (BAM files) into a single one for each replicate (treat1fb, etc), so we see the "total" number of reads and exon counts. In my situation, how do I obtain

    (1) the combined BAM file for SLR? Shall I cat the 5 lanes of BAM files? (after combination, I can get SAM file using samtools, and then use dexseq_count.py GFF_file \ SLR_sorted.sam SLR.txt)

    (2) the GFF file is converted from GTF file using dexseq_prepare_annotation.py; can I know if the GTF file is just downloaded from Ensembl website (choose Homo Sapiens if my cell line is from human)?

    (3) since my ultimate objective is to compare (maybe pairwise?) those technical replicates: SLR, TEC1, TEC2, TEC3, instead of the traditional situation of comparing biological sample 1 (with several tech reps) with biological sample 2 (with several tech reps). Would there be any problem? I can see that maybe comparing SLR vs. TEC1 is impossible (also for other pairwise comparisons) as there is no "replicates" of the replicate.

    Thank you so much!

  • #2
    Originally posted by alittleboy View Post
    (1) the combined BAM file for SLR? Shall I cat the 5 lanes of BAM files? (after combination, I can get SAM file using samtools, and then use dexseq_count.py GFF_file \ SLR_sorted.sam SLR.txt)
    "samtools merge", or the picard tools equivalent.
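
A minimal sketch of the "samtools merge" suggestion, assuming the per-lane files follow the L<lane>_SLR.bam naming from the question. The output name SLR_merged.bam and the helper build_merge_command are made up for illustration. The script only builds and prints the command; running it needs samtools installed.

```python
import shlex


def build_merge_command(output_bam, lane_bams):
    """Return the argument list for `samtools merge OUTPUT IN1 IN2 ...`."""
    if len(lane_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    return ["samtools", "merge", output_bam, *lane_bams]


# Lane numbers taken from the question; the file names are assumed.
lanes = [1, 2, 3, 5, 7]
lane_bams = [f"L{lane}_SLR.bam" for lane in lanes]

# "SLR_merged.bam" is a hypothetical output name.
cmd = build_merge_command("SLR_merged.bam", lane_bams)
print(shlex.join(cmd))

# To actually run the merge (requires samtools on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The same pattern would apply to TEC1, TEC2 and TEC3, each with its own list of lanes.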

    Originally posted by alittleboy View Post
    (2) the GFF file is converted from GTF file using dexseq_prepare_annotation.py; can I know if the GTF file is just downloaded from Ensembl website (choose Homo Sapiens if my cell line is from human)?
    At least the version of that script that I have won't download things for you. Go ahead and get the human GTF annotation from Ensembl (don't get the one from UCSC, you'll thank me later).

    Originally posted by alittleboy View Post
    (3) since my ultimate objective is to compare (maybe pairwise?) those technical replicates: SLR, TEC1, TEC2, TEC3, instead of the traditional situation of comparing biological sample 1 (with several tech reps) with biological sample 2 (with several tech reps). Would there be any problem? I can see that maybe comparing SLR vs. TEC1 is impossible (also for other pairwise comparisons) as there is no "replicates" of the replicate.
    The normal experiment is to compare group 1 with multiple biological replicates to group 2 with multiple biological replicates. That's why you see people concatenating their datasets when they have technical replicates: it gives them higher depth. Can you describe the biological question that you're trying to answer with this? That might give people better insight into how best to help you.



    • #3
      Originally posted by dpryan View Post
      "samtools merge", or the picard tools equivalent.

      Thanks for the suggestion!

      At least the version of that script that I have won't download things for you. Go ahead and get the human GTF annotation from Ensembl (don't get the one from UCSC, you'll thank me later).

      Yes, I downloaded the Ensembl GTF file (thanks for the reminder! I also heard that Ensembl is better to use), but here is another question I have: please see this post.

      The normal experiment is to compare group 1 with multiple biological replicates to group 2 with multiple biological replicates. That's why you see people concatenating their datasets when they have technical replicates: it gives them higher depth. Can you describe the biological question that you're trying to answer with this? That might give people better insight into how best to help you.
      The situation in my case is kind of different: we try to focus on the comparison of technical replicates, to see if different methods are consistent with the results (ideally there shouldn't be any differential exon usage since they're tech. reps.). We have four technical replicates, each having 5 lanes, and we sum up the counts in the 5 lanes for each replicate, so that we have a table like this:

      Exon_ID CL_1 TR_1 TR_2 TR_3
      E001 XX XX XX XX
      ... ...
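
Summing the per-lane counts into one column per replicate, as described above, could look roughly like this. It is only a sketch: the exon IDs and numbers are made up, and each lane is assumed to be already loaded as a mapping from exon ID to count. The helper name sum_lane_counts is hypothetical.

```python
from collections import Counter


def sum_lane_counts(lane_counts):
    """Add up exon counts from several lanes of the same replicate."""
    total = Counter()
    for counts in lane_counts:
        # Counter.update adds counts instead of overwriting them.
        total.update(counts)
    return dict(total)


# Made-up counts for three lanes of one technical replicate.
tr1_lanes = [
    {"E001": 10, "E002": 3},
    {"E001": 12, "E002": 5},
    {"E001": 9, "E002": 4},
]

tr1_total = sum_lane_counts(tr1_lanes)
for exon_id, count in sorted(tr1_total.items()):
    print(exon_id, count)
```

Repeating this for CL_1, TR_1, TR_2 and TR_3 gives the four columns of the table.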

      In DEXSeq, I don't think we can compare cell line 1 (CL_1) with technical replicate 1 (TR_1) as there is no "replicate". How about, say, treating CL_1 and TR_1 as one group, and TR_2 and TR_3 as the other group, and compare the two groups? In this case, each group has two "replicates" that make DEXSeq estimation possible.

      Thanks!



      • #4
        Regarding your question in the other thread/on biostars, while you don't have to use the Ensembl GTF file, it really is the path of least resistance. I've previously used an annotation using Entrez IDs and just wrote a couple of scripts to pacify dexseq_prepare_annotation.py. In effect, this resulted in the annotation resembling that from Ensembl. Out of curiosity, I've poked around the DEXSeq code a bit. read.HTSeqCounts will work without a GFF file, though I assume that at least the plotting functions won't then work. Unless you have a great desire to go through the DEXSeq code to see what other uses it makes of the annotation file, you're probably best off just contacting one of its authors (maybe a PM if neither of them happens to see the threads you started).

        If you think of your technical replicates as a single group, do they not show approximately Poisson variance? You're correct that DEXSeq isn't intended to compare individual samples (I've seen Simon Anders reply to that idea on this forum more times than I can count...the guy has the patience of a saint!). You could do as you suggested and just divide the 4 replicates into 2 groups. Of course, that's not really that informative for those of us doing normal experiments, since our variance will be higher. At the end of the day, I wonder if you're just testing how well the various programs estimate Poisson noise with a small number of replicates (which is more of an argument that we should use more replicates than anything else).
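
One rough way to look at the Poisson question is to compare the variance of each exon's counts across the technical replicates with their mean, since the two are equal for a Poisson distribution. The sketch below uses made-up counts, assumes the replicates have comparable sequencing depth (no normalization is done), and is not part of DEXSeq. With only 4 replicates the ratio is a noisy estimate.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by mean; about 1 under Poisson noise."""
    m = mean(counts)
    if m == 0:
        return None  # ratio is undefined for an exon with no reads
    return variance(counts) / m


# Made-up counts for two exons across four technical replicates.
exon_counts = {
    "E001": [100, 110, 90, 100],  # spread is small relative to the mean
    "E002": [20, 60, 5, 35],      # spread is large relative to the mean
}

ratios = {exon: dispersion_index(c) for exon, c in exon_counts.items()}
for exon, ratio in ratios.items():
    print(exon, ratio)
```

A ratio far above 1 for many exons would suggest more than Poisson noise between the technical replicates.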
