  • Analysing only exons from ONE gene using RNAseq

    I know that DEXSeq and/or Cuffdiff etc. may be used for exon-level analyses.

    However, I am only interested in ONE gene, because a novel exon has been discovered in this gene by other researchers. There is one "old" known transcript using exon-1a and the rest of the gene, and this "new" transcript using exon-1b and the rest of the gene.

    Now, I want to see whether the usage of exon-1a vs. exon-1b differs between lean and obese people.

    My approach is:
    1. Make a new custom GTF annotation file with the novel exon, since it does not exist in any known repositories yet. This GTF file thus contains info about the "old" and the "new" transcript.
    2. Run TopHat2, providing only chromosome 7 (where the gene is located) as reference and this GTF as annotation.
    3. Count the reads uniquely mapped to exon-1a and exon-1b, and also sum all reads on the gene.
    4. Normalize the exon expression to total gene expression and perhaps to library size?
    5. Simply use linear regression i.e. exon~sum.of.reads+library.size+obese
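
    Steps 3 and 4 could be sketched with plain awk, assuming the counts have already been collected into a tab-separated table. The file names, sample names and counts below are made up for illustration, not real data, and the exon-1b share among the two first exons is only one possible way to express relative usage.

    ```shell
    # Hypothetical count table: sample, exon-1a reads, exon-1b reads, total gene reads
    printf 'sample\texon1a\texon1b\tgene\n'  > exon_counts.tsv
    printf 'lean_01\t40\t10\t500\n'         >> exon_counts.tsv
    printf 'obese_01\t20\t30\t400\n'        >> exon_counts.tsv

    # For each sample (header skipped): exon-1a / gene, exon-1b / gene,
    # and the exon-1b share among reads hitting either first exon.
    # Samples with zero gene or exon counts would need separate handling.
    awk -F'\t' 'NR > 1 {
        printf "%s\t%.3f\t%.3f\t%.3f\n", $1, $2 / $4, $3 / $4, $3 / ($2 + $3)
    }' exon_counts.tsv > exon_usage.tsv

    cat exon_usage.tsv
    ```

    The resulting table could then be read into R or similar for the regression in step 5.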

    Is this "valid"? Am I overlooking something major? I know I am ignoring dispersion estimates etc. and that I could of course run the whole DEXSeq pipeline. But I want to save time and get intuitive answers. DEXSeq is hard to understand and hard to explain to coworkers.

    Thanks!
    Last edited by sindrle; 03-11-2017, 08:01 AM.

  • #2
    If you're looking at a single gene and have the full transcript sequences, why not just map to the transcripts and skip the complicated intron/exon searching?

    If you're using Tophat2 for exon searches, you can switch to mapping to the transcripts with Bowtie2 without much change in the mapping pipeline.

    • #3
      That's interesting!

      Yes, I have the full transcript sequences. What do you think are the pros and cons of these approaches?

      • #4
        If the gene (or a subsequence of the gene) is not genome-unique, then non-specific matches to the gene of interest may happen.

        I can't really think of any other cons, except for the whole throwing out 99.9% of your data thing, but that may be an advantage in this case because the needle is in the remainder.

        • #5
          Ok, so I got the transcript sequences for the two transcripts of interest. Can you please explain how I should use them as a reference for Bowtie2 (file format etc.)? An example with code would be very nice!

          I can probably figure it out myself, but I would greatly appreciate your input, also to check that I'm doing it correctly.

          Thank you!
          Last edited by sindrle; 03-13-2017, 11:17 AM.

          • #6
            This should produce a count table for each transcript. It assumes that the most recent version of SAMtools is installed:

            Code:
            cat transcript1.fa transcript2.fa > allTranscripts_myGene.fa # combine transcripts into a single file
            bowtie2-build allTranscripts_myGene.fa allTranscripts_myGene.fa # make bowtie2 index for transcripts
            bowtie2 -x allTranscripts_myGene.fa -1 left_reads.fq.gz -2 right_reads.fq.gz | samtools sort -O BAM > reads_vs_myGene.bam # map reads to transcripts
            samtools index reads_vs_myGene.bam # generate index file for mapped sequences
            samtools idxstats reads_vs_myGene.bam # show mapping counts for transcripts
            Note that reads mapping to regions shared by both transcripts will be distributed randomly between the two transcripts. A better idea of the differential expression may be obtained by looking at the reads in a pileup viewer like Tablet or IGV and only counting the reads at places where the transcripts differ. This will also indicate whether unspliced sequences are present in the RNA-Seq reads, which would also affect the count results.
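
            To turn the idxstats output into a two-column count table, something like the following may work. idxstats prints tab-separated columns (reference name, reference length, mapped read-segments, unmapped read-segments), ending with a `*` line for unplaced reads. The file names, transcript names and numbers here are placeholders standing in for real `samtools idxstats` output.

            ```shell
            # Placeholder standing in for: samtools idxstats reads_vs_myGene.bam > idxstats.tsv
            # Columns: reference name, length, mapped read-segments, unmapped read-segments
            printf 'transcript1\t2100\t340\t0\n'  > idxstats.tsv
            printf 'transcript2\t2050\t120\t0\n' >> idxstats.tsv
            printf '*\t0\t0\t9540\n'             >> idxstats.tsv

            # Keep the name and mapped count, dropping the "*" line for unplaced reads
            awk -F'\t' '$1 != "*" { print $1 "\t" $3 }' idxstats.tsv > transcript_counts.tsv

            cat transcript_counts.tsv
            ```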

            • #7
              Thank you very much!

              The two transcripts have two unique sections between them (exon-1a for one, and exon-1b for the other). I guess I can just define where these sequences are in a GTF file and use HTSeq or similar to count the reads?
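
              Because the reads are mapped to the transcripts, such a GTF file would presumably use the transcript names as sequence names and transcript coordinates rather than genome coordinates. A minimal sketch, in which the transcript names, coordinates and attribute values are all invented placeholders:

              ```shell
              # Placeholder annotation: one unique first exon per transcript, in transcript
              # coordinates (1-based, inclusive). Names and positions are hypothetical.
              # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
              {
                  printf 'transcript1\tcustom\texon\t1\t150\t.\t+\t.\tgene_id "myGene"; transcript_id "exon1a";\n'
                  printf 'transcript2\tcustom\texon\t1\t180\t.\t+\t.\tgene_id "myGene"; transcript_id "exon1b";\n'
              } > unique_exons.gtf

              # Sanity check: every line should have exactly 9 tab-separated fields
              awk -F'\t' 'NF != 9 { bad++ } END { exit (bad > 0) }' unique_exons.gtf \
                  && echo "GTF looks well-formed"
              ```

              The sequence names in the first column have to match the FASTA headers used when building the Bowtie2 index.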

              • #8
                Yes, that will probably work. You need to be careful that you're not excluding reads that only partially overlap the unique regions (i.e. check the htseq-count options); any hit to the unique region of a transcript should be counted as an isoform-specific hit.
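
                As a rough sketch of what that could look like: htseq-count's `union` mode counts a read that overlaps a feature only partially, whereas `intersection-strict` would report it as `no_feature`. The wrapper name and the file names in the usage comment are hypothetical, and the strandedness setting (`-s no`) is an assumption that depends on the library prep.

                ```shell
                # Hypothetical wrapper around htseq-count; arguments: BAM file, GTF file.
                count_unique_exon_reads() {
                    # -f bam  : input is BAM
                    # -r pos  : paired-end alignments are sorted by position
                    # -s no   : assume an unstranded library (adjust to the actual protocol)
                    # -t exon : count features of type "exon"
                    # -i transcript_id : report counts per transcript_id attribute
                    # -m union: also count reads that only partially overlap a feature
                    htseq-count -f bam -r pos -s no -t exon -i transcript_id -m union "$1" "$2"
                }

                # Usage (file names are placeholders):
                # count_unique_exon_reads reads_vs_myGene.bam unique_exons.gtf > unique_exon_counts.txt
                ```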

                • #9
                  That's great!

                  Any input on how to normalise the counts, e.g. "counts/library size"?
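
                  Taking "counts/library size" literally, one common scaling is counts per million reads. A minimal sketch with awk, where the table layout and all numbers are invented placeholders (library size here is assumed to mean the total number of reads in the sample, not only those hitting the gene):

                  ```shell
                  # Hypothetical table: sample, exon-1a reads, exon-1b reads, library size
                  printf 'lean_01\t40\t10\t20000000\n'   > counts_with_libsize.tsv
                  printf 'obese_01\t20\t30\t25000000\n' >> counts_with_libsize.tsv

                  # Counts per million: count / library size * 1,000,000
                  awk -F'\t' '{
                      printf "%s\t%.2f\t%.2f\n", $1, $2 / $4 * 1000000, $3 / $4 * 1000000
                  }' counts_with_libsize.tsv > exon_cpm.tsv

                  cat exon_cpm.tsv
                  ```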
