  • Remove reverse complement redundancy in stranded transcriptome

    Hi All,
    We are working with a de novo transcriptome assembly of Illumina HiSeq data - 20x 100 bp paired-end, stranded libraries. The raw data underwent standard trimming and were assembled in Trinity with mostly default settings and the appropriate RF flag for stranded data.

    However, our libraries appear to be less 'stranded' than we had hoped. When I searched our assembly for about 15 common qPCR reference genes, in each case I found a strong hit in the expected orientation as well as a near-identical contig in reverse complement.

    By mapping each library to the contigs of the combined transcriptome assembly, we estimated that our 'stranded' libraries actually contain 13-25% reverse-mapping reads. This won't be a perfect estimate, because some forward- and reverse-strand transcripts will overlap and we don't have a reference genome.
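A rough sketch of how a strand estimate like this could be computed from the SAM flags of mapped reads. It assumes an RF (dUTP-style) library, where read 1 is expected to map to the reverse strand of a sense-oriented contig and read 2 to the forward strand, and it assumes the contigs themselves are in sense orientation, which will not hold for every contig in a de novo assembly. The function name `strand_fractions` is made up for illustration.

```python
# SAM flag bits (from the SAM specification)
UNMAPPED = 0x4
REVERSE = 0x10
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def strand_fractions(flags):
    """Return (expected, opposite) strand fractions for an RF library.

    `flags` is an iterable of integer SAM flags, one per alignment record.
    Assumption: reference contigs are in sense orientation.
    """
    expected = 0
    opposite = 0
    for flag in flags:
        # Skip unmapped reads and non-primary alignments
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        is_reverse = bool(flag & REVERSE)
        if flag & READ1:
            consistent = is_reverse        # RF: read 1 maps antisense
        elif flag & READ2:
            consistent = not is_reverse    # RF: read 2 maps sense
        else:
            continue                       # not flagged as part of a pair
        if consistent:
            expected += 1
        else:
            opposite += 1
    total = expected + opposite
    if total == 0:
        return 0.0, 0.0
    return expected / total, opposite / total
```

The flags could be taken from the second column of a SAM file. Because antisense contigs in the assembly attract reads in the "wrong" orientation, the opposite-strand fraction from such a count is only an approximation.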

    We have another transcriptome for a related species (same treatments), where the 'strandedness' appears more efficient (estimated 5-15% reads mapping to reverse strand).

    My questions are:
    Has anyone come across this problem in their own data and what might lead to a low efficiency in the stranded protocol?

    Can anyone suggest an approach for redundancy removal that would also recognize reverse complement contigs? Programs such as CD-HIT don't seem to search in reverse complement.

    Thanks in advance for your thoughts!

  • #2
    The dedupe program in the BBTools package will handle this. It removes duplicate contigs when they are identical, or when one is fully contained within another, up to a maximum edit distance or Hamming distance that you can specify, and it handles reverse complements.

    Syntax:
    dedupe.sh in=assembly.fa out=deduplicated.fa
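To make the idea concrete, here is a minimal Python sketch of reverse-complement-aware redundancy removal. It is not how dedupe is implemented: it only handles exact matches and exact containment (no edit or Hamming distance tolerance), and it compares every contig against every kept contig, so it is only practical for small sets. The names `reverse_complement` and `dedupe_contigs` are hypothetical.

```python
# Translation table for complementing DNA (N is left as N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dedupe_contigs(contigs):
    """Drop contigs that duplicate, or are contained in, a longer contig.

    `contigs` maps contig name -> sequence. A contig is dropped if it, or
    its reverse complement, occurs exactly within a contig already kept.
    Returns a dict of the kept contigs (sequences upper-cased).
    """
    # Process longest first so shorter contigs are tested against longer ones;
    # ties are broken by name to keep the result deterministic.
    ordered = sorted(contigs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept = {}
    for name, seq in ordered:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue
        kept[name] = seq
    return kept
```

For example, given a contig, its exact reverse complement, and a fragment of either, only the first contig is kept.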

    • #3
      I don't have any solution for your Trinity issues, since I've mainly done human/mouse RNA-Seq, but here are a few possibilities for your strandedness issue (I'm assuming that you're using a dUTP based method):

      ActD freshness) The protocol is only ~80% strand-specific without actinomycin D (ActD) to prevent spurious 2nd strand synthesis, and ActD has a really terrible shelf life in solution at -20 °C.
      Nucleotide carryover from 1st strand) If you don't sufficiently remove dTTP after the 1st strand step, it can be incorporated into the 2nd strand cDNA, preventing UDG digestion.
      USER/UDG freshness) If the UDG enzyme has gone off, or wasn't incubated long enough, you could retain some of the 2nd strand cDNA.

      It very likely could be a combination of the three. I'm not sure how you're determining correct strand vs. antisense, but I've seen >99% correct strand, based on ERCCs, using all fresh ingredients.

      • #4
        To cmbetts' comments I would add the possibility of a biological cause: antisense transcription. It is well known that in some regions both strands are transcribed.

        • #5
          Thanks all for your helpful responses - dedupe sounds like what we are after, and it's very helpful to know about potential library prep issues. We've discussed the observation with our sequencing service provider and will pass these suggestions on.
