SEQanswers

05-08-2011, 11:03 PM   #1
Hobbe
Member
 
Location: Uppsala, Sweden

Join Date: Apr 2010
Posts: 29
Splice site prediction with SOLiD RNA-seq data

Hi all

We are having problems predicting splice sites from our SOLiD RNA-seq data. We have a draft genome (125 Mb, a eukaryote) assembled from 454 data and are now trying to map our SOLiD reads to this genome to predict splice sites. The idea is to use these predicted splice sites as intron hints for the gene finder Augustus, so that it produces correct gene models.

We are currently trying Bowtie/TopHat, but get weird results. For example, when working with a subset of our reads we find some splice sites that are no longer found when we add more data. We have also tried Corona Light together with SplitSeek, and Bowtie/TopHat does not find sites that were found with Corona Light/SplitSeek. On the other hand, Corona Light/SplitSeek is time-consuming and awkward to run and often reports splice sites that are a few bp off, so that is not an ideal choice either.
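
One way to quantify how much two junction sets disagree (a read subset versus the full data, or one pipeline versus another) is to compare them with a small tolerance, so that calls a few bp off still count as the same site. The following is a minimal Python sketch. The tuple layout, the function name `match_junctions`, and the 3 bp default tolerance are illustrative assumptions, not part of any of the tools mentioned here.

```python
from typing import Dict, Iterable, List, Tuple

# A junction is (contig, intron_start, intron_end); this layout is an assumption.
Junction = Tuple[str, int, int]


def match_junctions(query: Iterable[Junction],
                    reference: Iterable[Junction],
                    tol: int = 3) -> List[Junction]:
    """Return the query junctions that have a reference junction on the same
    contig whose start and end both lie within `tol` bp."""
    by_contig: Dict[str, List[Tuple[int, int]]] = {}
    for contig, start, end in reference:
        by_contig.setdefault(contig, []).append((start, end))

    matched: List[Junction] = []
    for contig, start, end in query:
        for ref_start, ref_end in by_contig.get(contig, []):
            if abs(ref_start - start) <= tol and abs(ref_end - end) <= tol:
                matched.append((contig, start, end))
                break  # one supporting reference junction is enough
    return matched


# Made-up coordinates, purely for illustration.
subset_run = [("contig1", 100, 200), ("contig1", 500, 900)]
full_run = [("contig1", 102, 199), ("contig2", 500, 900)]

print(match_junctions(subset_run, full_run))         # tolerant comparison
print(match_junctions(subset_run, full_run, tol=0))  # exact comparison
```

Running the comparison with `tol=0` and again with a few bp of slack separates junctions that are genuinely missing from those that are only shifted.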

This cannot be an uncommon situation, so what are the rest of you doing in these situations? No closely related genomes have been sequenced.
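
For the step of turning predicted junctions into intron hints, a minimal Python sketch is given below. It assumes the junctions come as a TopHat-style BED12 file (`junctions.bed`), where the feature spans both flanking blocks and the score column holds the number of supporting reads, and it assumes the GFF-style hint layout shown in the Augustus RNA-seq instructions (`intron` feature, `mult=...;pri=...;src=E` attributes). The source label `tophat`, the priority of 4, and the function names are hypothetical choices; check them against your Augustus extrinsic configuration before use.

```python
from typing import Iterable, Iterator


def bed_junction_to_hint(line: str, source: str = "tophat", priority: int = 4) -> str:
    """Convert one BED12 junction line into one tab-separated intron hint line."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start, chrom_end = fields[0], int(fields[1]), int(fields[2])
    support, strand = fields[4], fields[5]
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]

    # BED is 0-based, half-open; GFF is 1-based, inclusive.
    # The intron lies between the left and right flanking blocks.
    intron_start = chrom_start + block_sizes[0] + 1
    intron_end = chrom_end - block_sizes[-1]

    attributes = f"mult={support};pri={priority};src=E"
    return "\t".join([chrom, source, "intron", str(intron_start),
                      str(intron_end), "0", strand, ".", attributes])


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield hint lines, skipping blank lines and the BED 'track' header."""
    for line in lines:
        if not line.strip() or line.startswith("track"):
            continue
        yield bed_junction_to_hint(line)


# Made-up junction: blocks of 20 and 30 bp flanking an intron on contig1.
example = [
    "track name=junctions\n",
    "contig1\t100\t300\tJUNC1\t7\t+\t100\t300\t255,0,0\t2\t20,30\t0,170\n",
]
hints = list(convert(example))
for hint in hints:
    print(hint)
```

The off-by-one handling is the part most worth checking against a junction you have verified by hand, since hints that are a base off will not match the true intron boundaries.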
05-13-2011, 05:36 AM   #2
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

Another reasonable choice might be HMMSplicer, at least for comparison. I've had what look to be reasonable results from it in the past. I take it you're working in sequence space, not colour space?
05-13-2011, 06:15 AM   #3
Hobbe
Member
 
Location: Uppsala, Sweden

Join Date: Apr 2010
Posts: 29

Quote:
Originally Posted by colindaven View Post
Another reasonable choice might be hmmSplicer, at least for comparison. I've had what look to be reasonable results from it in the past. I take it you're working in sequence space, not colour space ?

Thanks for the reply. No, we are working in color space. Reads converted to sequence space go wrong too easily: because each color encodes the transition between two adjacent bases, a single color error corrupts every base after it in the converted read. However, if you or anyone else has had good success with converting to sequence space, I would love to hear about it. The general recommendation seems to be to map in color space.
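
The error-propagation concern can be illustrated with a small Python sketch of two-base decoding. It assumes a read written as a known primer base followed by color digits 0-3 (for example `T0123`) with no missing calls (`.`), and the function name `decode_colorspace` is hypothetical. With A, C, G, T numbered 0 to 3, each color is the XOR of the two adjacent base codes, so decoding walks along the read applying one color at a time.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3; color = code(base_i) XOR code(base_i+1)


def decode_colorspace(read: str) -> str:
    """Decode a color-space read (primer base + color digits) into bases.

    The primer base itself is not part of the returned sequence.
    """
    current = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        current ^= int(color)  # each base depends on all previous colors
        decoded.append(BASES[current])
    return "".join(decoded)


correct = decode_colorspace("T0123")
one_error = decode_colorspace("T0023")  # second color miscalled (1 -> 0)

print(correct)
print(one_error)
# Positions at or after the miscalled color all differ from the correct read.
print([i for i, (a, b) in enumerate(zip(correct, one_error)) if a != b])
```

A mapper working in color space sees that same miscall as a single mismatch, which is why converting first and mapping afterwards loses so many reads.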
08-15-2011, 06:16 AM   #4
darked89
Member
 
Location: Barcelona, Spain

Join Date: Jun 2009
Posts: 36

Quote:
Originally Posted by Hobbe View Post
Hi all

We are having problems predicting splice sites from our Solid rna-seq data. We have a draft genome (125Mb, a eukaryote) assembled from 454-data and are now trying to map our Solid reads to this genome to predict splice sites. The idea is to use these predicted splice sites to make intron hints for the gene finder Augustus to create correct gene models.
Augustus can cope with "hints" created by mapping Illumina reads (converted to fasta) with splice-agnostic blat. So as long as you have some gene models for training, unspliced mappings should work, I hope.

Quote:
Originally Posted by Hobbe View Post
We are currently trying Bowtie/Tophat, but get weird results. For example, when working with a subset of our reads we find some splice sites, but these are not found when we add more data. Also, we have earlier tried Corona Light together with Splitseek, and Bowtie/Tophat does not find sites that were found with Corona Light/Splitseek. On the other hand, Corona Light/Splitseek is timeconsuming/awkward to run and often reports splice sites that are a few bp off, so that is not an ideal choice either.

This cannot be an uncommon situation, so what are the rest of you doing in these situations? No closely related genomes have been sequenced.
I got strange results comparing TopHat with Bowtie when mapping SOLiD reads without a GFF gene-model guide (draft+ mammalian genome): Bowtie in colorspace mapped _more_ reads than TopHat. I used the latest versions (TopHat 1.3.1 and Bowtie 0.12.7).
08-15-2011, 10:46 PM   #5
Hobbe
Member
 
Location: Uppsala, Sweden

Join Date: Apr 2010
Posts: 29

Quote:
Originally Posted by darked89 View Post
Augustus can cope with "hints" created by mapping Illumina reads (converted to fasta) with splice-agnostic blat. So as long as you have some gene models for training, unspliced mappings should work, I hope.

BLAT is the preferred program for spliced mapping (see the Augustus RNA-seq instructions). You really need those intron hints to get correct gene models. BLAT doesn't work on SOLiD data, though.

The most important thing in our case was to have Augustus trained on the actual organism. We did this using our 454 cDNA data, and with this training the number of correctly found genes in our small set of 14 known test genes increased from 6 to 9 (compared with using the training files for distantly related organisms that come with Augustus). After adding intron hints we are now up to 11 out of 14 genes, but this is with only a small part of our SOLiD RNA-seq data, and we are now working on adding more hints. The only solution we have right now is the old Corona Light pipeline together with SplitSeek by Adam Ameur. Slow, but it seems to work.

IMO, there is still a great need for a good spliced mapper for Solid data.
08-16-2011, 05:18 AM   #6
darked89
Member
 
Location: Barcelona, Spain

Join Date: Jun 2009
Posts: 36

Quote:
Originally Posted by Hobbe View Post
Blat is the preferred program to use for spliced mapping (see the Augustus Rnaseq instructions). You really need those intron hints to get correct gene models. Blat doesn't work on Solid data though.
The same goes for FASTQ input. Maybe there is something to be gained from converting color space to FASTA and mapping with BLAT.

Quote:
Originally Posted by Hobbe View Post
Of biggest importance in our case was to have Augustus trained on the actual organism. We did this using our 454 cDNA data, and using this training the number of correctly found genes in our small set (14) of known test genes increased from 6 to 9 (compared to using the training files for distantly related organisms that came with Augustus). Adding intron hints we are now up to 11 out of 14 genes, but this is only with a small part of our Solid rnaseq data, and we are now working on adding more hints.
You could also try CEGMA (http://korflab.ucdavis.edu/Datasets/cegma/) to produce another training or testing set. At times there is no way around semi-manual annotation, again for either the training or the testing set. Blastp your Augustus predictions: genes with high conservation and 100% coverage in other species are likely to be real.
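
The blastp check can be scripted roughly as follows. This Python sketch assumes tab-separated BLAST+ output produced with a custom column list (`-outfmt "6 qseqid sseqid pident length qstart qend qlen evalue"`), so the column order, the function name `well_supported`, and the identity and coverage thresholds are all assumptions to adapt, not recommended values.

```python
from typing import Iterable, Set


def well_supported(rows: Iterable[str],
                   min_coverage: float = 1.0,
                   min_identity: float = 50.0) -> Set[str]:
    """Return IDs of predicted proteins with at least one hit that covers
    `min_coverage` of the query at `min_identity` percent identity or more."""
    supported: Set[str] = set()
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank and comment lines
        qseqid, _sseqid, pident, _length, qstart, qend, qlen, _evalue = (
            row.rstrip("\n").split("\t")
        )
        # Fraction of the query protein spanned by this single alignment.
        coverage = (int(qend) - int(qstart) + 1) / int(qlen)
        if coverage >= min_coverage and float(pident) >= min_identity:
            supported.add(qseqid)
    return supported


# Made-up hits, purely for illustration.
hits = [
    "g1.t1\tsp|X1\t80.0\t300\t1\t300\t300\t1e-150\n",  # full-length hit
    "g2.t1\tsp|X2\t90.0\t191\t10\t200\t300\t1e-60\n",  # partial hit
]
supported_ids = well_supported(hits)
print(sorted(supported_ids))
```

Predictions that pass such a filter are candidates for a training or test set, while those that fail still need manual inspection before being discarded, since lineage-specific genes will have no full-length hit.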

Quote:
Originally Posted by Hobbe View Post
The only solution we have just now is using the old Corona Light pipeline together with Splitseek by Adam Ameur. Slow, but seems to work.
Is that the setup currently recommended by the SplitSeek author? In the SplitSeek 1.3.4 manual the recommended one is the Whole Transcriptome Pipeline.

Quote:
Originally Posted by Hobbe View Post
IMO, there is still a great need for a good spliced mapper for Solid data.
Indeed. I have found some other software (X-MATE), but it requires junction libraries and uses yet another pipeline (http://solidsoftwaretools.com/gf/project/mapreads/).
See:
http://openwetware.org/wiki/Wikiomic...OLiD_data_only
09-08-2011, 11:31 AM   #7
adameur
Member
 
Location: Uppsala, Sweden

Join Date: Nov 2009
Posts: 23

Hi,

Just a few words about SplitSeek from the author. It only works with the split-read mapper from the AB Whole Transcriptome Pipeline (WTP), and always has. I'm aware it is awkward, but unfortunately there are currently no good alternatives.

The good news is that AB WTP actually works fine once you get it to run. I even managed to run some 75bp reads from the SOLiD5500 through WTP and SplitSeek (using 25bp anchors in the mapping) so it might be an option also in the future.

/Adam

Tags
gene finding, solid, splice sites
