SEQanswers

Old 10-27-2014, 03:20 PM   #1
evt8
Junior Member
 
Location: New Zealand

Join Date: Aug 2014
Posts: 7
Remove reverse complement redundancy in stranded transcriptome

Hi All,
We are working with a de novo transcriptome assembly of Illumina HiSeq data - 20x 100 bp paired-end, stranded libraries. Raw data underwent standard trimming and was assembled using mainly default settings in Trinity with the appropriate RF flag for stranded data.

However, our libraries appear to be less 'stranded' than we had hoped. After searching our assembly for about 15 common qPCR reference genes, I found that in each case the assembly contains a strong hit in the expected orientation as well as a near-identical contig in reverse complement.

We estimated that our 'stranded' libraries actually have 13-25% reverse-mapping reads by mapping each library to contigs from the combined transcriptome assembly. This won't be a perfect estimate, because some of the forward- and reverse-strand transcripts will overlap and we don't have a reference genome.

We have another transcriptome for a related species (same treatments), where the 'strandedness' appears more efficient (estimated 5-15% reads mapping to reverse strand).

My questions are:
Has anyone come across this problem in their own data, and what might lead to low efficiency in the stranded protocol?

Can anyone suggest an approach for redundancy removal that would also recognize reverse complement contigs? Programs such as CD-HIT don't seem to search in reverse complement.

Thanks in advance for your thoughts!
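
Editor's sketch: the strand estimate described above can be approximated by counting read orientations in a SAM file of reads mapped to the contigs. This is only an illustration, not the poster's actual procedure. It assumes an RF (dUTP-style) library, where read 1 is expected on the reverse strand of a sense-oriented contig and read 2 on the forward strand, and it assumes the contigs themselves are in sense orientation, which is exactly what cannot be guaranteed here. The function name `antisense_fraction` is hypothetical.

```python
def antisense_fraction(sam_lines):
    """Fraction of mapped primary paired reads whose orientation contradicts
    an RF library, assuming the reference contigs are in sense orientation."""
    consistent = 0
    inconsistent = 0
    for line in sam_lines:
        if line.startswith("@"):           # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        is_read1 = bool(flag & 0x40)
        is_read2 = bool(flag & 0x80)
        if not (is_read1 or is_read2):     # not part of a pair; ignore
            continue
        is_reverse = bool(flag & 0x10)
        # RF assumption: read 1 maps to the reverse strand, read 2 to the forward strand.
        expected_reverse = is_read1
        if is_reverse == expected_reverse:
            consistent += 1
        else:
            inconsistent += 1
    total = consistent + inconsistent
    return inconsistent / total if total else 0.0
```

Applied per library, for example with `open("library.sam")` passed as `sam_lines`, this gives one number per library that can be compared against the 13-25% range quoted above. Contigs assembled from antisense reads will distort the result in either direction.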
Old 10-27-2014, 03:39 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

The BBTools package's dedupe program will handle this. It can remove duplicate contigs as long as they are identical, or one is fully contained within the other, up to some maximum edit distance or Hamming distance that you can specify, and it handles reverse complements.

Syntax:
dedupe.sh in=assembly.fa out=deduplicated.fa
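
Editor's sketch: to show what "handles reverse-complements" means in practice, the idea can be reduced to comparing each contig by a strand-independent key. This is not how dedupe.sh is implemented. It is a minimal illustration that catches only exact duplicates (no containments, no mismatches), and the helper names `reverse_complement`, `canonical` and `dedupe_exact` are hypothetical.

```python
# Translation table for complementing DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def canonical(seq):
    """Strand-independent key: the lexicographically smaller of the
    sequence and its reverse complement (both uppercased)."""
    forward = seq.upper()
    return min(forward, reverse_complement(forward))

def dedupe_exact(records):
    """Keep the first of each group of contigs that are identical
    either directly or as reverse complements.

    records: iterable of (name, sequence) tuples.
    """
    seen = set()
    kept = []
    for name, seq in records:
        key = canonical(seq)
        if key not in seen:
            seen.add(key)
            kept.append((name, seq))
    return kept
```

Because both orientations of a contig map to the same key, a contig and its reverse complement collapse into one entry regardless of which appears first in the assembly file.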
Old 10-27-2014, 04:55 PM   #3
cmbetts
Senior Member
 
Location: Bay Area

Join Date: Jun 2012
Posts: 113

I don't have any solution for your Trinity issues, since I've mainly done human/mouse RNA-Seq, but here are a few possibilities for your strandedness issue (I'm assuming that you're using a dUTP-based method):

ActD freshness) The protocol is only ~80% strand-specific without actinomycin D (ActD) to prevent spurious 2nd strand synthesis, and that stuff has a really terrible shelf life in solution at -20 °C.
Nucleotide carryover from 1st strand) If you don't sufficiently remove dTTP from the 1st strand step, it can be incorporated into the 2nd strand cDNA, preventing UDG digestion.
USER/UDG freshness) If the UDG enzyme has gone off, or wasn't incubated long enough, you could retain some of the 2nd strand cDNA.

It very likely could be a combination of the three. I'm not sure how you're determining correct strand vs. antisense, but I've seen >99% correct strand, based on ERCCs, using all fresh ingredients.
Old 10-27-2014, 08:32 PM   #4
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,231

I would add the possibility of a biological process (antisense transcription) to cmbetts' comments. It is well known that in some regions both strands are transcribed.
Old 11-05-2014, 03:28 PM   #5
evt8
Junior Member
 
Location: New Zealand

Join Date: Aug 2014
Posts: 7

Thanks all for your helpful responses - dedupe sounds like what we are after, and it's very helpful to know about potential library prep issues. We've discussed the observation with our sequencing service provider and will pass these suggestions on.

Tags
redundancy removal, reverse complement, strandedness
