  • all_your_base
    Member
    • Mar 2012
    • 40

    Six reading frame question... why all contain ORF??

    Hi,

    So I have a conceptual question I'm trying to get my head around. I have some RNA-seq data and was trying to determine the ORF of each read. Of course a six reading frame translation of a given nucleotide sequence would be expected to have a significant ORF in at least one frame, as long as the sequence comes from a gene region.

    However, I find short sequences (my 150 bp RNA-seq reads) that appear to have continuous ORFs in all 6 frames of translation, without any stop codons at all... how can this be? Since there are 3 stop codons out of 64 possibilities, we should statistically see a stop every 21 AA (63 bases) or so.

    I realize this could be an anomaly, but this seems to be the case with about 10% of all my RNA-seq reads. I realize this could happen with repetitive sequence, but I don't think that is the case, since it is RNA-seq data.

    Any thoughts or speculations are gladly welcomed!!
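
The "one stop every 21 AA" figure is a mean, so it may help to see how often a single frame stays open by chance. The sketch below assumes independent, equally likely bases (the same simplification the question makes); the function names are made up for illustration. Under that model roughly 9% of 50-codon windows have no stop in a given frame, but all six frames being open at once would be far rarer, which suggests the uniform model itself is what fails for these reads.

```python
# Simplifying assumption: bases are independent and equally likely,
# so any codon is a stop (TAA, TAG, TGA) with probability 3/64.
P_STOP_UNIFORM = 3 / 64


def mean_codons_to_stop(p_stop=P_STOP_UNIFORM):
    """Mean number of codons up to and including the first stop (geometric mean)."""
    return 1 / p_stop


def prob_frame_open(n_codons, p_stop=P_STOP_UNIFORM):
    """Probability that n_codons consecutive codons contain no stop."""
    return (1 - p_stop) ** n_codons


if __name__ == "__main__":
    codons_per_read = 150 // 3  # a 150 bp read holds 50 codons in frame 1
    print("mean codons to first stop:", round(mean_codons_to_stop(), 1))
    print("P(no stop in one frame of a 150 bp read):",
          round(prob_frame_open(codons_per_read), 4))
```

Frames 2 and 3 of a 150 bp read hold 49 complete codons rather than 50, and the six frames share the same bases, so simply multiplying per-frame values would only be a rough guide.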
  • kcchan
    Senior Member
    • Jul 2012
    • 186

    #2
    RNA-seq libraries are almost never full length; the strands are fragmented into shorter fragments before sequencing. Therefore the reads you get are only a portion of the full mRNA. If you want to get the complete AA sequence of an RNA, you'll have to assemble your reads back together first.


    • all_your_base
      Member
      • Mar 2012
      • 40

      #3
      Thanks for the reply. I understand this is just a small fragment of a whole mRNA, but for a span of 150 bases, I can't understand why we should find no stop codons on all 6 reading frames.
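
To check which reads really are open in all six frames, a small scan like the following could be used. It is a minimal sketch in plain Python, assuming the standard stop codons (TAA, TAG, TGA) and reads given as DNA or RNA strings; the helper names are hypothetical and not taken from any library.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_counts_six_frames(seq):
    """Count stop codons in frames +1..+3 and -1..-3 of one read."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    counts = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Walk complete codons only; a trailing partial codon is ignored.
            codons = (s[i:i + 3] for i in range(offset, len(s) - 2, 3))
            counts[f"{strand}{offset + 1}"] = sum(c in STOP_CODONS for c in codons)
    return counts


def open_in_all_frames(seq):
    """True if no frame on either strand contains a stop codon."""
    return all(n == 0 for n in stop_counts_six_frames(seq).values())


if __name__ == "__main__":
    print(stop_counts_six_frames("ATGTAAGGG"))
    print(open_in_all_frames("GCGCGCGCGCGC"))
```

Running this over all reads and inspecting the ones that pass (for example their base composition) would show whether the 10% share something obvious, such as a strong GC bias.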


      • syfo
        Just a member
        • Nov 2012
        • 103

        #4
        Originally posted by all_your_base
        Since there are 3 stop codons out of 64 possibilities, we should statistically see a stop every 21AA (63 bases) or so.
        ...assuming the same frequency for each base, which is usually not the case. What is the GC% of this genome? Also, base distribution is not uniform and often differs between regions (gene/intergenic, exon/intron, etc). You might find GC-rich repeats in 3'UTRs for instance. Last, this subset of 10% might come from the same genomic locus.
        Have you first tried fastqc on your reads?
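
The effect of GC content on stop-codon frequency can be sketched numerically. All three stop codons are AT-rich, so they become rarer as GC% rises. The example below assumes independent bases with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2, which is a crude model and not a description of any real genome; the function names are illustrative.

```python
def p_stop_from_gc(gc):
    """Per-codon stop probability for a given GC fraction (0..1).

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2.
    """
    p_at = (1 - gc) / 2  # probability of A, and likewise of T
    p_gc = gc / 2        # probability of G, and likewise of C
    # TAA + TAG + TGA = T*A*A + T*A*G + T*G*A
    return p_at * p_at * (p_at + 2 * p_gc)


def prob_frame_open(n_codons, p_stop):
    """Probability that n_codons consecutive codons contain no stop."""
    return (1 - p_stop) ** n_codons


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        p = p_stop_from_gc(gc)
        print(f"GC={gc:.0%}  P(stop)={p:.4f}  "
              f"mean spacing={1 / p:.1f} codons  "
              f"P(50 codons open)={prob_frame_open(50, p):.3f}")
```

At 50% GC this reduces to the familiar 3/64; at 70% GC the per-codon stop probability drops to about 0.019, so a much larger share of 150 bp windows would be stop-free in any one frame.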


        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          Originally posted by all_your_base
          Since there are 3 stop codons out of 64 possibilities, we should statistically see a stop every 21AA (63 bases) or so.
          Bases and codons aren't randomly distributed, nor should one expect them to be.
