SEQanswers


Simon Anders 08-19-2010 09:29 AM

SOLiD WTP alignment file: representation of spliced reads
 
Hi

I've got a data set with SOLiD RNA-Seq data that was aligned with SOLiD's whole transcriptome analysis pipeline (WTP 1.2.1). This software produces a GFF file that represents each read with one line, or, if the read straddles a splice junction, with two lines (which are usually not next to each other).

I have trouble understanding how the spliced reads are represented.

Here is a normal read:
Code:

chr2L  wtp    read    75079  75113  30      +      .      bd=1445_1152_746_F3;rs=16;mm=0;g=T12003120303000210002013210322101003330110312122223;i=1;
There are 0 mismatches (mm=0) and 16 bases skipped (rs=16). If I convert the read to sequence space and extract the part at the indicated coordinates from my reference FASTA, this aligns nicely:

Code:

TGAAATGAATTAAAAGTTTTCCATCAATCTGGTTTATAACAATGACTCTCG  [read]
                TTTTCCATCAATCTGGTTTATAACAATGACTCTCG  [reference, 2L:75079-75113]
----------------  [ <-- 16 skipped]
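
For anyone who wants to reproduce the conversion to sequence space, here is a minimal Python sketch. It assumes the standard SOLiD two-base encoding (colour 0 keeps the base, 1 swaps A/C and G/T, 2 swaps A/G and C/T, 3 swaps A/T and C/G) and keeps the leading primer base in the output, as in the alignment shown above. The function name `decode_colourspace` is made up for illustration and is not part of WTP.

```python
# Two-bit codes: with A=0, C=1, G=2, T=3 the next base is (previous base XOR colour).
BASES = "ACGT"

def decode_colourspace(cs_read):
    """Decode a colour-space read such as 'T1200...' into bases.

    The first character is the primer base; it is kept in the output.
    Assumes only colours 0-3 (no '.' for missing calls).
    """
    current = BASES.index(cs_read[0])
    decoded = [cs_read[0]]
    for colour in cs_read[1:]:
        current ^= int(colour)
        decoded.append(BASES[current])
    return "".join(decoded)

read = decode_colourspace("T12003120303000210002013210322101003330110312122223")
print(read)
print(read[16:])  # the part after the 16 skipped positions (rs=16)
```

Because every base depends on the one before it, a single wrong colour call changes all bases downstream, which is why a mismatch makes the decoded read lose sync with the reference.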

Now for a spliced read. This bead ID here appears twice:

Code:

chr2L  wtp    read    108217  108226  45      +      .      bd=1636_459_310_F3;rs=1;mm=0;g=T32012102331321332201132130130000113020000230013032;i=1;jp=108588;jt=k;
chr2L  wtp    read    108588  108622  45      +      .      bd=1636_459_310_F3;rs=1;mm=0;g=T32012102331321332201132130130000113020000230013032;i=1;jp=108217;jt=k;

The two lines refer to each other's start positions via the 'jp' attribute. However, if I extract the indicated positions, there is no match:

Code:

TAGGTCAAGCGTAGTATCTTGTAGTAACGGGGGTGCCTTTTTCGGGTAATC  [read]
 CTCAGAATCA                                          [reference, 2L:108217-108226]
          CTCCACCAACAATTTAGCCGACCGGAACTCGGGTT        [reference, 2L:108588-108622]

I can't find these reference parts anywhere in the read.
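
To make the pairing of the two lines concrete, here is a rough Python sketch that groups WTP GFF lines by bead ID and checks that the 'jp' attributes point at each other's start positions. It assumes whitespace-separated columns and `key=value;` attributes as in the lines shown above; the helper names `parse_wtp_line` and `pair_spliced` are hypothetical and not part of WTP.

```python
from collections import defaultdict

def parse_wtp_line(line):
    """Split one GFF line into coordinates and an attribute dictionary."""
    fields = line.split(None, 8)  # 8 fixed columns, then the attribute string
    chrom, _source, _feature, start, end, _score, strand, _frame, attr_text = fields
    attrs = dict(item.split("=", 1) for item in attr_text.strip().split(";") if item)
    return {"chrom": chrom, "start": int(start), "end": int(end),
            "strand": strand, "attrs": attrs}

def pair_spliced(lines):
    """Return (left, right) record pairs whose 'jp' values reference each other."""
    by_bead = defaultdict(list)
    for line in lines:
        rec = parse_wtp_line(line)
        by_bead[rec["attrs"]["bd"]].append(rec)
    pairs = []
    for recs in by_bead.values():
        if len(recs) != 2:
            continue  # unspliced read, or something this sketch does not handle
        left, right = sorted(recs, key=lambda r: r["start"])
        if (int(left["attrs"].get("jp", -1)) == right["start"]
                and int(right["attrs"].get("jp", -1)) == left["start"]):
            pairs.append((left, right))
    return pairs

lines = [
    "chr2L  wtp    read    108217  108226  45      +      .      bd=1636_459_310_F3;rs=1;mm=0;g=T32012102331321332201132130130000113020000230013032;i=1;jp=108588;jt=k;",
    "chr2L  wtp    read    108588  108622  45      +      .      bd=1636_459_310_F3;rs=1;mm=0;g=T32012102331321332201132130130000113020000230013032;i=1;jp=108217;jt=k;",
]
pairs = pair_spliced(lines)
for left, right in pairs:
    # Length of each aligned block in reference coordinates (inclusive ends).
    print(left["end"] - left["start"] + 1, right["end"] - right["start"] + 1)
```

This only confirms that the two records belong together and reports the block lengths; it does not explain why the decoded read fails to match the reference.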

I tried many different reads, and always, the non-spliced ones agree with the reference (unless there are mismatches, causing the colour space decoding to lose sync) and the spliced ones don't. Do I have to do something different if I decode colour space for a spliced read? Do I misunderstand the WTP output format? Or is something going severely wrong here?

Thanks for any hints

Simon

