SEQanswers

08-19-2010, 09:29 AM   #1
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
SOLiD WTP alignment file: representation of spliced reads

Hi

I've got a SOLiD RNA-Seq data set that was aligned with SOLiD's whole transcriptome analysis pipeline (WTP 1.2.1). This software produces a GFF file that represents each read with one line or, if the read straddles a splice junction, with two lines (which are usually not next to each other).

I have trouble understanding how the spliced reads are represented.

Here is a normal read:
Code:
chr2L   wtp     read    75079   75113   30      +       .       bd=1445_1152_746_F3;rs=16;mm=0;g=T12003120303000210002013210322101003330110312122223;i=1;
There are 0 mismatches (mm=0) and 16 bases skipped (rs=16). If I convert the read to sequence space and extract the part at the indicated coordinates from my reference FASTA, the two align nicely:

Code:
TGAAATGAATTAAAAGTTTTCCATCAATCTGGTTTATAACAATGACTCTCG  [read]
                TTTTCCATCAATCTGGTTTATAACAATGACTCTCG  [reference, 2L:75079-75113]
----------------  [ <-- 16 skipped]
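
For reference, the colour-space conversion used above can be sketched in Python as follows. This is a minimal illustration, not WTP code: the function names `decode_colourspace` and `trim_skipped` are made up for this example, and the sketch assumes the standard SOLiD two-base encoding (with A=0, C=1, G=2, T=3, each colour is the XOR of two adjacent bases). It also assumes, as in the alignment shown above, that the leading primer base is kept and that `rs` counts positions from the start of that string. Missing colour calls (`.`) are not handled.

```python
# Sketch of SOLiD colour-space decoding (assumed two-base encoding:
# with A=0, C=1, G=2, T=3, colour = base[i] XOR base[i+1]).
BASES = "ACGT"

def decode_colourspace(read, keep_primer=True):
    """Decode a read such as 'T1200...' into sequence space."""
    primer, colours = read[0], read[1:]
    prev = BASES.index(primer)
    out = [primer] if keep_primer else []
    for c in colours:
        prev ^= int(c)            # next base = previous base XOR colour
        out.append(BASES[prev])
    return "".join(out)

def trim_skipped(seq, rs):
    """Drop the first `rs` positions (hypothetical reading of the rs attribute)."""
    return seq[rs:]

read = "T12003120303000210002013210322101003330110312122223"
decoded = decode_colourspace(read)
print(decoded)
print(trim_skipped(decoded, 16))
```

For the unspliced read above, the second printed line is the 35-base stretch that matches the reference at 2L:75079-75113.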
Now for a spliced read. This bead ID here appears twice:

Code:
chr2L   wtp     read    108217  108226  45      +       .       bd=1636_459_310_F3;rs=1;mm=0;g=T32012102331321332201132130130000113020000230013032;i=1;jp=108588;jt=k;
chr2L   wtp     read    108588  108622  45      +       .       bd=1636_459_310_F3;rs=1;mm=0;g=T32012102331321332201132130130000113020000230013032;i=1;jp=108217;jt=k;
The two lines refer to each other's starting positions via the 'jp' attribute. However, if I extract the indicated positions, there is no match:

Code:
TAGGTCAAGCGTAGTATCTTGTAGTAACGGGGGTGCCTTTTTCGGGTAATC   [read]
 CTCAGAATCA                                           [reference, 2L:108217-108226]
           CTCCACCAACAATTTAGCCGACCGGAACTCGGGTT        [reference, 2L:108588-108622]
I can't find these reference parts anywhere in the read.
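
To check many reads at once, the two halves of a spliced read can be collected by bead ID and cross-checked through their 'jp' values. The following Python sketch is only an illustration under the interpretation given in this post (bd = bead ID, jp = start position of the partner line); `parse_wtp_line` and `pair_spliced` are hypothetical helper names, not part of WTP.

```python
from collections import defaultdict

def parse_wtp_line(line):
    """Split one GFF line into coordinates plus a dict of key=value attributes."""
    cols = line.strip().split(None, 8)
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
    return {"chrom": cols[0], "start": int(cols[3]), "end": int(cols[4]),
            "strand": cols[6], "attrs": attrs}

def pair_spliced(records):
    """Group records carrying a 'jp' attribute by bead ID and keep the
    pairs whose 'jp' values point at each other's start positions."""
    by_bead = defaultdict(list)
    for rec in records:
        if "jp" in rec["attrs"]:
            by_bead[rec["attrs"]["bd"]].append(rec)
    pairs = {}
    for bead, parts in by_bead.items():
        parts.sort(key=lambda r: r["start"])
        if (len(parts) == 2
                and int(parts[0]["attrs"]["jp"]) == parts[1]["start"]
                and int(parts[1]["attrs"]["jp"]) == parts[0]["start"]):
            pairs[bead] = (parts[0], parts[1])
    return pairs
```

Applied to the two lines above, this yields a single pair for bead 1636_459_310_F3, with segments of 10 and 35 reference bases.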

I tried many different reads, and always, the non-spliced ones agree with the reference (unless there are mismatches, causing the colour space decoding to lose sync) and the spliced ones don't. Do I have to do something different if I decode colour space for a spliced read? Do I misunderstand the WTP output format? Or is something going severely wrong here?

Thanks for any hints

Simon

Last edited by Simon Anders; 08-23-2010 at 02:55 AM. Reason: corrected GFF excerpt

Tags
solid, spliced read, wtp
