
  • Expression in RNA-seq

    Hi everyone,

    We recently sent some samples for RNA sequencing (SOLiD, 50 bp, strand-specific) and realized that a gene we consider important is not covered by sequencing reads. A few reads align to intronic sequence and one to an exon, but these look like chance alignments. We analyzed the alignments with TopHat and Cufflinks and, not surprisingly, the FPKM for this gene is 0.
    Does anyone know how likely it is that this is caused by a technical problem? It's tumor tissue, and normal tissue of that organ should express the gene quite highly according to UCSC. Most other genes we've looked at had quite high coverage.
    We got ~100M reads.
    Any suggestions?

    Best regards
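
For reference, the FPKM value mentioned in the question can be illustrated with the basic definition (fragments per kilobase of transcript per million mapped fragments). This is only a sketch of the arithmetic: Cufflinks estimates abundance with a statistical model rather than this bare formula, and the 2,000 bp transcript length below is a hypothetical placeholder, not the length of the gene in question.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Basic FPKM definition: fragments per kilobase per million mapped fragments.

    For single-end reads (as in this thread) one read counts as one fragment.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # One stray exonic read, hypothetical 2,000 bp transcript, ~100M mapped reads.
    print(fpkm(1, 2000, 100_000_000))  # 0.005
```

Under these assumed numbers a single read corresponds to an FPKM of about 0.005, so a reported value of 0 is consistent with essentially no usable exonic coverage.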

  • #2
    The reads that would have aligned there could have been thrown out because they mapped to too many places in the genome. Is that possible?

    You could test this by cutting the gene up into read-sized pieces and trying to align them to the genome. TopHat by default will throw out reads that map to more than 40 places.

    If you think it's expressed, you could test it with qPCR too.
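
The "cut up the gene" test suggested above could be sketched roughly as follows: tile the gene sequence into overlapping 50-mers and write them out as FASTA pseudo-reads for the aligner. The function names and the toy sequence are made up for illustration. The sketch works in base space; for SOLiD data the pseudo-reads would presumably need converting to colorspace (or aligning in base-space mode), which is not handled here.

```python
import io


def tile_sequence(seq, read_len=50, step=1):
    """Yield (start, window) for every read_len-sized window along seq."""
    seq = seq.upper()
    for start in range(0, len(seq) - read_len + 1, step):
        yield start, seq[start:start + read_len]


def write_fasta(seq, name, handle, read_len=50, step=1):
    """Write tiled windows as FASTA records named <name>_<start>; return the count."""
    count = 0
    for start, window in tile_sequence(seq, read_len, step):
        handle.write(f">{name}_{start}\n{window}\n")
        count += 1
    return count


if __name__ == "__main__":
    toy_gene = "ACGT" * 30  # 120 bp placeholder, not a real gene
    out = io.StringIO()
    n = write_fasta(toy_gene, "toy_gene", out, read_len=50, step=10)
    print(n, "pseudo-reads written")
```

If most of these pseudo-reads fail to align uniquely with the same settings used for the real data, multi-mapping is a likely explanation for the missing coverage.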



    • #3
      I agree with mgogol. One possible explanation is mapping, although with 50mers you would think some part of the gene would be mappable. How big is this gene? How many exons?

      Try a BLAT of some of the exons of this gene against the genome (do they hit everywhere?). You could also try mapping reads directly against a database of transcripts to see whether there are any matching reads. If there are, that would again suggest that the reads are missing from your TopHat analysis because they are ambiguous in the genome. It would also give you a subset of reads to try different mapping approaches on, and might help rule out a technical issue with your TopHat run.
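
As a quick stand-in for mapping reads against a transcript, one could look for reads that share an exact k-mer with the transcript sequence (either strand). This is a crude sketch, not a replacement for a real aligner: it tolerates no mismatches within the shared k-mer, it assumes base-space reads (SOLiD colorspace reads would need converting first), and all names below are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement; anything other than A/C/G/T becomes N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def reads_hitting_transcript(reads, transcript, k=25):
    """Return names of reads sharing at least one exact k-mer with the transcript.

    reads: iterable of (name, sequence) pairs.
    """
    index = kmers(transcript, k) | kmers(revcomp(transcript), k)
    return [name for name, seq in reads if kmers(seq, k) & index]
```

Any reads recovered this way form a small test set that can be re-run through different aligners or parameter settings.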



      • #4
        Another possibility could be small exons, and then junctions come into play. I am not sure whether TopHat/Cufflinks take a de novo approach that overcomes this?
        --
        bioinfosm



        • #5
          We have seen that trimming reads down to 25 bp before running Bowtie and TopHat can help with this.



          • #6
            One easy way to check whether it is a mappability issue is to go to the UCSC browser for hg19,
            click the Mappability link under "Mapping and Sequencing Tracks",
            select the 50bp option,
            and look at your gene.


            Otherwise, maybe you just found some biology in your experiment.



            • #7
              TopHat/Cufflinks does try to predict novel exons and junctions. I'm not sure how well it handles small exons... Mapping short reads to the genome and then trying to find small exons is a challenging problem. A read that overlaps a small exon may require an alignment split into three or more short blocks with potentially large gaps between them (e.g. <exon>-intron-<exon>-intron-<exon>). Doing this with full-length cDNAs (never mind short reads) can be difficult. I suspect that most methods in use right now have fairly low sensitivity for short exons. One strategy to at least capture the short exons from known transcripts is to integrate alignment to transcripts and to the genome (as advocated in Griffith et al.). This way, reads that hit short exons can be aligned to a transcript sequence without gaps, which is much easier. This of course does not work for novel transcripts, and it relies on the accuracy and completeness of the transcript annotations for the species being analyzed.
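
To illustrate why transcript alignment sidesteps the gapped-alignment problem described above, here is a minimal sketch that projects an ungapped hit in transcript coordinates back onto the genome as exon blocks. It assumes a forward-strand transcript, half-open coordinates, and made-up exon positions; it is not taken from any of the tools named in this thread.

```python
def transcript_to_genome(exons, t_start, t_end):
    """Map the half-open transcript interval [t_start, t_end) to genomic blocks.

    exons: list of half-open genomic (start, end) intervals in 5'->3' order
    on the forward strand.
    """
    blocks = []
    offset = 0  # transcript coordinate of the current exon's first base
    for g_start, g_end in exons:
        exon_len = g_end - g_start
        lo = max(t_start, offset)
        hi = min(t_end, offset + exon_len)
        if lo < hi:  # the read overlaps this exon
            blocks.append((g_start + lo - offset, g_start + hi - offset))
        offset += exon_len
    return blocks


if __name__ == "__main__":
    # Hypothetical transcript with a 20 bp middle exon.
    exons = [(1000, 1100), (5000, 5020), (9000, 9200)]
    # A 50 bp read aligned without gaps at transcript positions 85..135.
    print(transcript_to_genome(exons, 85, 135))
    # [(1085, 1100), (5000, 5020), (9000, 9015)]
```

The single ungapped transcript hit becomes three genomic blocks of 15, 20 and 15 bp, which is exactly the kind of split alignment that is hard to find when mapping a 50-mer directly to the genome.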



              • #8
                Exactly, and I think aligning to both the genome and transcripts raises an important question: which one should get priority? There would certainly be reads that align well to both, and in that case I suppose the transcript alignment would get preference.

                A similar issue comes up when using a separate junction database or contamination database: given equally good alignments of a read to multiple datasets, which one should get preference?
                --
                bioinfosm
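
One way to make the priority question above concrete is a tie-breaking rule: keep the hit with the fewest mismatches, and when hits are equally good, prefer the reference type that ranks higher in a fixed order. The ranking below (transcript, then junction, then genome, then contaminant) is only an assumption for illustration; the thread does not settle which order is right, and the `Hit` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    read: str
    source: str      # "transcript", "junction", "genome" or "contaminant"
    mismatches: int


# Assumed preference order for equally good hits (lower rank wins).
PRIORITY = {"transcript": 0, "junction": 1, "genome": 2, "contaminant": 3}


def pick_best(hits, priority=PRIORITY):
    """Return {read name: best Hit}, ranking by mismatches, then source priority."""
    best = {}
    for hit in hits:
        key = (hit.mismatches, priority[hit.source])
        current = best.get(hit.read)
        if current is None or key < (current.mismatches, priority[current.source]):
            best[hit.read] = hit
    return best
```

Changing the `priority` mapping is enough to try a different policy, for example ranking contaminant hits first so that ambiguous reads are discarded rather than counted.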



                • #9
                  Regarding the comment that the observation may simply be due to biology: good point! Once you have convinced yourself that the observation is not an artifact of the analysis (mappability, small exons, etc.), you should definitely consider this.

                  One of the benefits of RNA-seq over microarrays (IMHO) is the excellent signal-to-noise ratio. I have observed many cases where a gene appears to be turned off in one condition and on in another (tissue type, treatment, etc.). Even in very deep RNA-seq libraries, I have been amazed to see just how few reads are reported for the 'off' state. And since in these cases we have RNA-seq libraries (analyzed by the same method) for alternate conditions in which the gene does get covered by reads, we can reassure ourselves that the lack of coverage is not due to some artifact of the analysis.

                  For example, consider this data set (ALEXA-seq by Griffith et al.) consisting of four cell types (normal luminal breast, normal myoepithelial breast, hESCs, and vHMECs). These libraries have ~150 million paired-end 75-mers each, and their quality was very high. After performing differential expression analysis we can find many examples where a particular gene has 0 reads (or just a few) in one of these conditions and thousands in one or more of the others. And we can find 'off' genes for any of the four cell types. For example, CCL2 has 0 reads in the vHMECs library and 124,664 in the myoepithelial breast library. Similarly, COL17A1 has 776,907 reads in the vHMECs library but only 142 in the luminal epithelial breast tissue. You can explore further examples in this list of DE genes.



                  • #10
                    Yes. A paired or comparative analysis cancels out a lot of sequencing and mapping biases, so differential expression comparisons are more or less accurate.

                    I would really like to give the Trans-ABySS tool a try. It seems like a one-stop solution for RNA-seq analysis, but it has a huge list of prerequisites and config files to figure out.
                    --
                    bioinfosm
