SEQanswers

Old 01-12-2010, 10:49 AM   #1
agroster
Member
 
Location: Bay Area

Join Date: Jan 2010
Posts: 13
sff_extract: combining data from 454 FLX and Titanium data sets

Hi,
I am attempting to feed my 454 sequencing data sets into MIRA for assembly.
Here is my data, all in .sff files:
FLX unpaired reads (6 separate .sff files)
Titanium unpaired reads (3 separate .sff files)
Titanium paired-end reads (2 separate .sff files).

I can extract to fasta and xml with the individual sets (unpaired-only and paired-only), but the MIRA assembler requires all data to be combined into a single data file.

Does anyone know if this can be done with sff_extract? If so, can you explain to me how? The sff_extract webpage doesn't explain how to do this.
Old 01-12-2010, 10:58 PM   #2
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70

Have you tried extracting them together? Just like:

sff_extract 1.sff 2.sff
Old 01-13-2010, 09:51 AM   #3
agroster
Member
 
Location: Bay Area

Join Date: Jan 2010
Posts: 13

Thanks for your reply.

Since one of my data sets is paired-end, I believe I need to make sff_extract split those reads by using the -l option to look for the linker. Also, MIRA asks for the paired-end library size and standard deviation, but I wouldn't want that applied to the unpaired reads.

I tried the cat command, but mira quit inexplicably when loading these files.
Old 01-14-2010, 02:10 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,540

Quote:
Originally Posted by agroster View Post
I tried the cat command, but mira quit inexplicably when loading these files.
You can't just cat SFF files together - they have a complex header structure. It isn't really needed, but you can use the Roche tools (or 3rd party software) to combine SFF files, but they must all have the same number of flow cycles. In your case you have both FLX and Titanium runs so you can't do this.

In any case, don't you want to give MIRA two separate sets of 454 data
(the pairs and the unpaired reads)?
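To see which SFF files share a flow count before trying to merge them, the count can be read from the file header. The following is a minimal sketch, assuming the standard SFF common header layout (big-endian, 31 fixed bytes starting with the magic string ".sff"). The function names are made up for illustration and are not part of sff_extract or the Roche tools.

```python
import struct

# Fixed part of the SFF common header (assumed layout): magic, version,
# index offset, index length, number of reads, header length, key length,
# number of flows per read, flowgram format code. All values big-endian.
SFF_COMMON_HEADER = struct.Struct(">4s4sQIIHHHB")


def parse_sff_header(raw):
    """Return (number_of_reads, flows_per_read) from the first header bytes."""
    if len(raw) < SFF_COMMON_HEADER.size:
        raise ValueError("too short to be an SFF header")
    fields = SFF_COMMON_HEADER.unpack(raw[:SFF_COMMON_HEADER.size])
    magic, number_of_reads, flows_per_read = fields[0], fields[4], fields[7]
    if magic != b".sff":
        raise ValueError("missing .sff magic number")
    return number_of_reads, flows_per_read


def group_by_flow_count(paths):
    """Group SFF file paths by their flows-per-read value."""
    groups = {}
    for path in paths:
        with open(path, "rb") as handle:
            _, flows = parse_sff_header(handle.read(SFF_COMMON_HEADER.size))
        groups.setdefault(flows, []).append(path)
    return groups
```

Calling `group_by_flow_count` on a list of .sff paths returns a dictionary keyed by flow count. Files that land under different keys (as FLX and Titanium runs would) cannot be merged into one SFF file.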
Old 01-14-2010, 06:04 AM   #5
agroster
Member
 
Location: Bay Area

Join Date: Jan 2010
Posts: 13

Quote:
Originally Posted by maubp View Post
You can't just cat SFF files together - they have a complex header structure. It isn't really needed, but you can use the Roche tools (or 3rd party software) to combine SFF files, but they must all have the same number of flow cycles. In your case you have both FLX and Titanium runs so you can't do this.

In any case, don't you want to give MIRA two separate sets of 454 data
(the pairs and the unpaired reads)?
I concatenated the separate .fasta, .qual, and .xml files that were generated post sff_extract for each type of data, not the .sff files before sff_extract.

I DO want to give MIRA two separate sets of 454 data, but I don't know how to do this (since MIRA looks for only one file name). Does anyone know how to do this?
Old 01-14-2010, 06:19 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,540

Quote:
Originally Posted by agroster View Post
I concatenated the separate .fasta, .qual, and .xml files that were generated post sff_extract for each type of data, not the .sff files before sff_extract.
Concatenating FASTA and QUAL files is fine, but I don't think you should concatenate XML files together. MIRA may cope, but I would expect any XML validator to reject such a file.
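If separately extracted XML files ever do need to be combined by hand, they can be merged at the element level instead of with cat. This is a rough sketch, assuming each file is a TraceInfo-style document with a single `trace_volume` root holding `trace` children. The function name is hypothetical, and sff_extract's own append mode may make this unnecessary.

```python
import xml.etree.ElementTree as ET


def merge_trace_volumes(xml_documents):
    """Merge several <trace_volume> documents (as strings) into one string."""
    merged = ET.Element("trace_volume")
    for text in xml_documents:
        root = ET.fromstring(text)
        if root.tag != "trace_volume":
            raise ValueError("unexpected root element: %s" % root.tag)
        # Move every child (normally <trace> records) under the single new root.
        merged.extend(list(root))
    return ET.tostring(merged, encoding="unicode")


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

With two such documents `a` and `b`, `is_well_formed(a + b)` is False because the result has two root elements, while `merge_trace_volumes([a, b])` yields one document that parses cleanly.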
Quote:
Originally Posted by agroster View Post
I DO want to give MIRA two separate sets of 454 data, but I don't know how to do this (since MIRA looks for only one file name). Does anyone know how to do this?
Have you read the "Walkthrough: combined unpaired and paired-end assembly of Brucella ceti" example in the MIRA manual?
http://chevreux.org/uploads/media/mi...tml#section_21
Old 01-14-2010, 08:08 AM   #7
agroster
Member
 
Location: Bay Area

Join Date: Jan 2010
Posts: 13

OK, I now see the issue: I need to append my subsequent sff_extract runs using the "-a" option, i.e. do three successive extractions, each appending to the previous one. I'll see if this works.
Old 01-14-2010, 10:19 AM   #8
BaCh
Member
 
Location: Germany

Join Date: May 2008
Posts: 79

Quote:
Originally Posted by agroster View Post
OK, I now see the issue: I need to append my subsequent sff_extract runs using the "-a" option, i.e. do three successive extractions, each appending to the previous one. I'll see if this works.
Ah, I'm too late. Actually, two extractions are enough. First all the unpaired, then all paired with -a.

Regards,
Bastien
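
The two-step extraction described above could be scripted roughly as follows. This sketch only builds and prints the two command lines so they can be checked before running. The -a and -l options are the ones mentioned in this thread. The -o (output base name) and -i (extra XML info such as insert size) options are written from memory of the sff_extract help, so please verify them with `sff_extract --help`. All file names, the project name, and the insert size values are placeholders.

```python
import shlex


def build_extract_commands(project, unpaired_sffs, paired_sffs,
                           linker_fasta, insert_size, insert_stdev):
    """Return the two sff_extract command lines as lists of arguments."""
    # Run 1: all unpaired files (FLX and Titanium together), creating new output.
    unpaired_cmd = ["sff_extract", "-o", project] + list(unpaired_sffs)
    # Run 2: all paired-end files, appended (-a) to the same output, with the
    # linker file (-l) and the library size info (-i, assumed option name).
    info = "insert_size:%d,insert_stdev:%d" % (insert_size, insert_stdev)
    paired_cmd = ["sff_extract", "-a", "-o", project,
                  "-l", linker_fasta, "-i", info] + list(paired_sffs)
    return [unpaired_cmd, paired_cmd]


# Placeholder inputs, matching the 6 + 3 unpaired and 2 paired files in this thread.
commands = build_extract_commands(
    project="myproject",
    unpaired_sffs=["flx%d.sff" % n for n in range(1, 7)]
                  + ["ti%d.sff" % n for n in range(1, 4)],
    paired_sffs=["ti_pe1.sff", "ti_pe2.sff"],
    linker_fasta="linker.fasta",
    insert_size=3000,
    insert_stdev=900,
)
for command in commands:
    print(shlex.join(command))
```

Because the insert size information is only passed in the second run, it is attached to the paired-end reads and not to the unpaired ones.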