  • sindrle
    Senior Member
    • Aug 2013
    • 266

    Analysing only exons from ONE gene using RNAseq

    I know that DEXSeq and/or Cuffdiff, etc., can be used for exon-level analyses.

    However, I am only interested in ONE gene, because other researchers have discovered a novel exon in it. There is one "old", known transcript that uses exon-1a plus the rest of the gene, and one "new" transcript that uses exon-1b plus the rest of the gene.

    Now, I want to see if there is a different usage of exon-1a vs. exon-1b between lean and obese people.

    My approach is:
    1. Make a custom GTF annotation file that includes the novel exon, since it does not yet exist in any public annotation. This GTF file thus describes both the "old" and the "new" transcript.
    2. Run TopHat2, providing only chromosome 7 (where the gene is located) as the reference and this GTF as the annotation.
    3. Count the reads mapped uniquely to exon-1a and to exon-1b, and also the total number of reads on the gene.
    4. Normalize the exon expression to total gene expression, and perhaps to library size?
    5. Simply use linear regression, i.e. exon ~ sum.of.reads + library.size + obese
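Steps 3 to 5 boil down to one number per sample plus a group comparison. Below is a minimal shell/awk sketch, assuming a hypothetical tab-separated table `counts.tsv` (header line, then columns `sample`, `group`, `exon1a_reads`, `exon1b_reads`) produced in step 3. Instead of normalising to total gene reads, it computes exon-1a usage as exon-1a / (exon-1a + exon-1b); sequencing depth cancels in this ratio. It then averages per group. For a model containing only the lean/obese indicator (plus intercept), the linear-regression coefficient equals the difference between these two group means.

```shell
#!/bin/sh
# Sketch only. Assumes a hypothetical tab-separated file with a header line:
#   sample  group  exon1a_reads  exon1b_reads
# where group is e.g. "lean" or "obese".

# Per-sample exon-1a usage = exon1a / (exon1a + exon1b).
# Samples with no reads on either first exon are reported as NA.
exon1a_usage() {
  awk -F'\t' -v OFS='\t' 'NR > 1 {
    total = $3 + $4
    if (total > 0) print $1, $2, $3 / total
    else print $1, $2, "NA"
  }' "$1"
}

# Mean exon-1a usage per group (NA samples skipped), sorted by group name.
group_mean_usage() {
  exon1a_usage "$1" | awk -F'\t' -v OFS='\t' '$3 != "NA" {
    sum[$2] += $3
    n[$2]++
  } END {
    for (g in sum) print g, sum[g] / n[g]
  }' | sort
}
```

For example, `group_mean_usage counts.tsv` prints one line per group. Ratios from samples with only a handful of reads are noisy, so it is worth keeping the read totals next to them when interpreting the result.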

    Is this "valid"? Am I overlooking something major? I know I'm ignoring dispersion estimates etc., and that I could of course run the whole DEXSeq pipeline, but I want to save time and get intuitive answers. DEXSeq is hard to understand and hard to explain to coworkers.

    Thanks!
    Last edited by sindrle; 03-11-2017, 08:01 AM.
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    If you're looking at a single gene and have the full transcript sequences, why not just map to the transcripts and skip the complicated intron/exon searching?

    If you're using TopHat2 for exon searches, you can switch to mapping against the transcripts with Bowtie2 without much change to the mapping pipeline.

    • sindrle
      Senior Member
      • Aug 2013
      • 266

      #3
      That's interesting!

      Yes, I have the full transcript sequences. What do you think are the pros and cons of each approach?

      • gringer
        David Eccles (gringer)
        • May 2011
        • 845

        #4
        If the gene (or part of it) is not unique in the genome, reads that actually come from similar sequences elsewhere may map non-specifically to the gene of interest, because they have nowhere else to go.

        I can't really think of any other cons, except for throwing out 99.9% of your data, but that may be an advantage in this case because the needle is in the remaining 0.1%.

        • sindrle
          Senior Member
          • Aug 2013
          • 266

          #5
          OK, so I now have the sequences of the two transcripts of interest. Could you please explain how I should use them as a reference for Bowtie2 (file format, etc.)? An example with code would be very nice!

          I can probably figure it out myself, but I would greatly appreciate your input, also to check that I'm doing it correctly.

          Thank you!
          Last edited by sindrle; 03-13-2017, 11:17 AM.

          • gringer
            David Eccles (gringer)
            • May 2011
            • 845

            #6
            This should produce a count table for each transcript. It assumes that the most recent version of SAMtools is installed:

            Code:
            cat transcript1.fa transcript2.fa > allTranscripts_myGene.fa # combine transcripts into a single file
            bowtie2-build allTranscripts_myGene.fa allTranscripts_myGene.fa # make bowtie2 index for transcripts
            bowtie2 -x allTranscripts_myGene.fa -1 left_reads.fq.gz -2 right_reads.fq.gz | samtools sort -O BAM > reads_vs_myGene.bam # map reads to transcripts
            samtools index reads_vs_myGene.bam # generate index file for mapped sequences
            samtools idxstats reads_vs_myGene.bam # show mapping counts for transcripts
            Note that reads from regions shared by both transcripts will be assigned at random to one transcript or the other. A better idea of the differential expression may be obtained by looking at the reads in a pileup viewer such as Tablet or IGV and only counting reads in places where the transcripts differ. This will also show whether unspliced sequence is present in the RNA-seq reads, which would also affect the counts.
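The idxstats output above can be reduced to a two-column count table, and samtools can also count alignments overlapping a chosen region of the indexed BAM, which is one way to count only where the transcripts differ. This is a minimal sketch: the reference names (`transcript1`, `transcript2`) and the region coordinates are hypothetical and must be replaced with the actual FASTA header names and the position of each unique first exon on its transcript.

```shell
#!/bin/sh
# Sketch: per-transcript read counts from `samtools idxstats` output on stdin.
# idxstats columns: reference name, sequence length, mapped read segments,
# unmapped read segments. The final "*" line (unplaced reads) is dropped.
idxstats_counts() {
  awk -F'\t' -v OFS='\t' '$1 != "*" { print $1, $3 }'
}

# Count mapped, primary alignments overlapping one region of an indexed BAM.
# -F 0x904 skips unmapped (0x4), secondary (0x100) and supplementary (0x800)
# records. Each mate of a read pair is counted separately.
# $1 = BAM file, $2 = region such as "transcript1:1-200" (hypothetical).
region_count() {
  samtools view -c -F 0x904 "$1" "$2"
}
```

Usage would look like `samtools idxstats reads_vs_myGene.bam | idxstats_counts` for the whole-transcript table, and `region_count reads_vs_myGene.bam "transcript1:1-200"` for the (hypothetical) exon-1a region.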

            • sindrle
              Senior Member
              • Aug 2013
              • 266

              #7
              Thank you very much!

              The two transcripts have two unique sections between them (exon-1a in one, exon-1b in the other). I guess I can just define where these sequences are in a GTF file and use HTSeq or similar to count the reads?

              • gringer
                David Eccles (gringer)
                • May 2011
                • 845

                #8
                Yes, that will probably work. You need to be careful that you're not excluding reads that only partially overlap the unique regions (i.e. check the htseq-count options); any read hitting the unique region of a transcript should be counted as an isoform-specific hit.
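A sketch of the GTF-plus-htseq-count route, assuming the BAM from the Bowtie2 commands earlier in the thread. Because reads were aligned to the transcript sequences, the GTF coordinates are positions on each transcript (1-based, inclusive). The sequence names, coordinates and the `exon1a_unique`/`exon1b_unique` IDs are hypothetical placeholders. htseq-count's `-m union` mode counts a read for a feature if it overlaps that feature at all (and no other feature), so reads that only partly cover a unique region are kept, whereas `-m intersection-strict` would discard them.

```shell
#!/bin/sh
# Sketch: write a GTF describing only the transcript-specific regions.
# Sequence names must match the FASTA headers given to bowtie2-build;
# the names and coordinates below are HYPOTHETICAL placeholders.
# GTF columns: seqname source feature start end score strand frame attributes
write_unique_regions_gtf() {
  # printf reuses its format string for each group of four arguments.
  printf '%s\tcustom\texon\t%s\t%s\t.\t+\t.\tgene_id "%s";\n' \
    transcript1 1 200 exon1a_unique \
    transcript2 1 150 exon1b_unique
}

# Example use (requires HTSeq; check `htseq-count --help` for your version):
#   write_unique_regions_gtf > unique_regions.gtf
#   htseq-count -f bam -r pos -s no -t exon -i gene_id -m union \
#       reads_vs_myGene.bam unique_regions.gtf
# -s no assumes an unstranded library (htseq-count defaults to -s yes).
```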

                • sindrle
                  Senior Member
                  • Aug 2013
                  • 266

                  #9
                  That's great!

                  Any input on how to normalise the counts, e.g. "counts/library size"?
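A minimal sketch of counts-per-million (CPM) scaling, i.e. count / library size times 10^6, assuming a hypothetical tab-separated table with a header line and columns `sample`, `exon1a_reads`, `exon1b_reads`, `library_size`. Here the library size is taken to be the total number of sequenced reads in each sample, not just those that mapped to the two transcripts, since mapping was restricted to this one gene. If the question is only the exon-1a versus exon-1b split, the within-sample ratio exon-1a / (exon-1a + exon-1b) needs no library-size correction, because depth cancels.

```shell
#!/bin/sh
# Sketch: counts-per-million (CPM) for the two isoform-specific counts.
# Assumes a hypothetical tab-separated file with a header line:
#   sample  exon1a_reads  exon1b_reads  library_size
# library_size = total sequenced reads (or pairs) for the sample.
# Samples with library_size <= 0 are skipped.
cpm_table() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print "sample", "exon1a_cpm", "exon1b_cpm"; next }
    $4 > 0  { print $1, $2 * 1e6 / $4, $3 * 1e6 / $4 }' "$1"
}
```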