SEQanswers

09-23-2010, 06:58 AM   #1
jorgebm
Member
 
Location: Spain

Join Date: Feb 2010
Posts: 18
Right way to convert splice-aligned Solexa reads (s_?_export.txt) to SAM format

Hello,


I would like to ask about the right way to convert splice-aligned Solexa reads (s_?_export.txt) to SAM format.

Some time ago I performed an "eland_rna" analysis using GERALD.pl v.1.15 (CASAVA 1.6). I have now converted the s_?_export.txt files to SAM format using the "illumina_export2sam.pl" script (bundled with CASAVA 1.7).

Focusing on splice-aligned reads, I found a weakness in this format conversion script. For instance, this s_?_export.txt read line:

HWUSI-EAS1597 1 7 1 1 1467 0 1 ATGGAAGCTCTGCGGTNATACAACCAGGAGCACTC aa^`]aa`a`a_`]`ZB`___[a_`XZWU[UWTUZ splicesites34.fa RIC8A_34_34_chr11.fa_198938_199271 5 F 16C18 105 Y


is converted to the following SAM line:

HWUSI-EAS1597_1:7:1:1:1467 0 splicesites34.fa/RIC8A_34_34_chr11.fa_198938_199271 5 105 35M * 0 0 ATGGAAGCTCTGCGGTNATACAACCAGGAGCACTC BB?A>BBABAB@A>A;#A@@@<B@A9;86<6856; XD:Z:16C18 SM:i:105


This splice junction is (from refFlat):

chr11 hg18_refFlat exon 198530 198938 0.000000 + . ID=exon:RIC8A:1;Parent=RIC8A;
chr11 hg18_refFlat exon 199271 199318 0.000000 + . ID=exon:RIC8A:2;Parent=RIC8A;

And in Illumina splicesites34.fa format

>RIC8A_34_34_chr11.fa_198938_199271
GATTATGGAAGCTCTGCGGTCATACAACCAGGAGCACTCCCAGAGCTTCACGTTTGATGATGCCCAAC

So, the alignment is

----ATGGAAGCTCTGCGGTNATACAACCAGGAGCACTC (read)

GATTATGGAAGCTCTGCGGTCATACAACCAGGAG (last 34 bases of RIC8A:1)
----------------------------------CACTCCCAGAGCTTCACGTTTGATGATGCCCAAC (first 34 bases of RIC8A:2)
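
As a quick sanity check of the alignment above, the following Python sketch slices the junction sequence at the reported offset and rebuilds the match descriptor from it. The helper name `match_descriptor` is hypothetical, and it assumes (as this example suggests) that the letter in the descriptor is the reference base at a mismatch; it only handles plain substitutions, not indels.

```python
def match_descriptor(read, ref_window):
    """Build a descriptor such as '16C18' for a gapless alignment.

    Assumption: digits count matching bases and a letter is the
    reference base at a mismatching position. Indels are not handled.
    """
    parts = []
    run = 0
    for read_base, ref_base in zip(read, ref_window):
        if read_base == ref_base:
            run += 1
        else:
            if run:
                parts.append(str(run))
            parts.append(ref_base)
            run = 0
    if run:
        parts.append(str(run))
    return "".join(parts)


# Sequences copied from the post.
junction = "GATTATGGAAGCTCTGCGGTCATACAACCAGGAGCACTCCCAGAGCTTCACGTTTGATGATGCCCAAC"
read = "ATGGAAGCTCTGCGGTNATACAACCAGGAGCACTC"
pos = 5  # 1-based offset within the junction sequence (export match position)

# Reference bases covered by the read on the junction sequence.
window = junction[pos - 1 : pos - 1 + len(read)]
descriptor = match_descriptor(read, window)

print(window)
print(descriptor)
```

The rebuilt descriptor agrees with the one in the export line, so offset 5 is indeed relative to the junction sequence and the only mismatch is the N in the read.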



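Note that field 4 of the SAM line (POS), which is formally defined as "1-based leftmost POSition/coordinate of clipped sequence", is set to 5. That is the offset within the splicesites34.fa junction sequence, not a position in the genome reference file (chr11.fa). Likewise, the reference name is the junction contig rather than chr11, and the CIGAR is a plain 35M that does not represent the intron at all.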

I think this is wrong, isn't it? I mean, this is not a fully compliant format conversion.
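
To make the point concrete, here is a rough Python sketch of what a genome-aware conversion might compute for this read. It is not what illumina_export2sam.pl does. The function name `junction_to_genome` is hypothetical, and the contig naming scheme (`<gene>_<left flank>_<right flank>_<chrom>.fa_<last base of exon 1>_<first base of exon 2>`, with 1-based inclusive coordinates) is only inferred from the single example above, so chromosome names containing underscores would need extra handling.

```python
import re

# Assumed naming scheme, inferred from one example only:
#   <gene>_<left flank>_<right flank>_<chrom>.fa_<exon 1 end>_<exon 2 start>
JUNCTION_NAME = re.compile(
    r"^(?P<gene>.+)_(?P<left>\d+)_(?P<right>\d+)_"
    r"(?P<chrom>[^_]+)\.fa_(?P<donor>\d+)_(?P<acceptor>\d+)$"
)


def junction_to_genome(contig, pos, read_len):
    """Map a forward-strand hit on a junction contig to (chrom, POS, CIGAR).

    pos is the 1-based offset on the junction sequence. Coordinates in the
    contig name are assumed to be 1-based and inclusive.
    """
    m = JUNCTION_NAME.match(contig)
    if m is None:
        raise ValueError(f"unrecognised junction contig name: {contig}")
    left = int(m["left"])          # bases taken from the end of exon 1
    donor = int(m["donor"])        # last genomic base of exon 1
    acceptor = int(m["acceptor"])  # first genomic base of exon 2

    in_exon1 = left - pos + 1      # read bases that fall in exon 1
    if in_exon1 <= 0:
        # Read lies entirely in exon 2.
        return m["chrom"], acceptor + (pos - left - 1), f"{read_len}M"

    genome_pos = donor - left + pos
    if in_exon1 >= read_len:
        # Read lies entirely in exon 1.
        return m["chrom"], genome_pos, f"{read_len}M"

    intron = acceptor - donor - 1  # skipped genomic bases (CIGAR 'N')
    in_exon2 = read_len - in_exon1
    return m["chrom"], genome_pos, f"{in_exon1}M{intron}N{in_exon2}M"


result = junction_to_genome("RIC8A_34_34_chr11.fa_198938_199271", 5, 35)
print(result)
```

Under those assumptions, the read would be reported on chr11 with the intron expressed as an N operation in the CIGAR, instead of an offset of 5 on the junction contig with 35M.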

Last edited by jorgebm; 09-23-2010 at 07:03 AM.