SEQanswers

Old 02-21-2010, 12:30 PM   #1
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
How to extract paired-end reads from .sff 454?

I have a Titanium paired end .sff and want to convert it to fasta and qual files. But I want the paired end linker removed and the reads containing it split into "right" and "left" side reads. (Best if the distal part of the paired end would also be reverse complemented.)

Just want to try some other assembly engines. Small bacterial genome using 3kb paired end Titanium protocol.

Best way to do this? I can write a script to parse the trim info file, but that is work. Would prefer something like an sffinfo option or a program someone else has already written.

--
Phillip
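
The split described in the question above can be sketched in a few lines of Python. This is only an illustration of the idea, not what sff_extract or the 454 software actually does: it assumes the reads have already been pulled out of the .sff as (name, sequence, quality list) tuples, it looks for an exact linker match only (real tools tolerate sequencing errors in the linker), and the LINKER value and the ".left"/".right" name suffixes are hypothetical placeholders.

```python
# Hypothetical linker, for illustration only; use the linker from your library kit.
LINKER = "GTTGGAACCGAAAGGG"

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_on_linker(name, seq, qual, linker=LINKER, min_len=1):
    """Split one read at the first exact (case-insensitive) linker match.

    Returns a list of (name, seq, qual) tuples:
      - no linker found: the read unchanged;
      - both flanks at least min_len long: a left mate and a
        reverse-complemented right mate (quality list reversed to match);
      - only one flank long enough: that flank, unpaired, original orientation;
      - neither flank long enough: an empty list.
    """
    pos = seq.upper().find(linker.upper())
    if pos == -1:
        return [(name, seq, qual)]
    end = pos + len(linker)
    left_seq, left_qual = seq[:pos], qual[:pos]
    right_seq, right_qual = seq[end:], qual[end:]
    left_ok = len(left_seq) >= min_len
    right_ok = len(right_seq) >= min_len
    if left_ok and right_ok:
        return [
            (name + ".left", left_seq, left_qual),
            (name + ".right", reverse_complement(right_seq), right_qual[::-1]),
        ]
    if left_ok:
        return [(name, left_seq, left_qual)]
    if right_ok:
        return [(name, right_seq, right_qual)]
    return []
```

Whether the right-hand mate should be reverse complemented depends on what the downstream assembler expects, so treat that step as an assumption to check against the assembler's documentation.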
Old 02-21-2010, 02:04 PM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

According to its documentation sff_extract (http://bioinf.comav.upv.es/sff_extract/index.html) can do this. I have used sff_extract but not on paired end data so I can't offer any first hand information.
Old 02-21-2010, 02:46 PM   #3
forevermark4
Junior Member
 
Location: Europe

Join Date: Jan 2009
Posts: 6

Hi everyone

I have just started working with next-generation sequencing data and have the following query. Could you help me with a read simulation and reassembly problem? I have a FASTA file and need to generate reads from that sequence. I think the maq tool could be used for the simulation, but I have not been able to work it out.

The goal is to set up a simulation of reassembling a sequence from NGS data. It will build up from reassembling a simple 1 Mb sequence with no repeats in the haploid state to including genetic variation and polyploidy.
- Simulate an NGS run from a 1 Mb segment of human sequence with few or no repeats: average fragment size 500 bp with a normal distribution, paired end with 75 bp reads. Assume perfect sequencing. Check out other simulation methods.
- Align the reads back to the 1 Mb sequence. How much variation in coverage is there?
- Reassemble the reads WITHOUT using the reference sequence.

Thanks
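
The first two steps of the plan above (error-free paired-end simulation and a look at coverage variation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a replacement for maq or another simulator: the fragment-length standard deviation (`frag_sd=50`) is not given in the post and is an assumed value, the function names are hypothetical, and coverage is computed from the known fragment positions instead of from a real alignment.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pairs(ref, n_pairs, read_len=75, frag_mean=500, frag_sd=50, seed=0):
    """Draw error-free paired-end reads from ref.

    Fragment lengths are sampled from a normal distribution (frag_sd is an
    assumed value). Returns a list of (start, frag_len, read1, read2), where
    read2 is the reverse complement of the fragment's last read_len bases.
    """
    if len(ref) < read_len:
        raise ValueError("reference is shorter than the read length")
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        frag_len = int(round(rng.gauss(frag_mean, frag_sd)))
        # Reject fragments that cannot hold a full read or do not fit the reference.
        if frag_len < read_len or frag_len > len(ref):
            continue
        start = rng.randrange(len(ref) - frag_len + 1)
        frag = ref[start:start + frag_len]
        pairs.append((start, frag_len, frag[:read_len],
                      reverse_complement(frag[-read_len:])))
    return pairs


def coverage(ref_len, pairs, read_len=75):
    """Per-base read depth implied by the simulated read positions."""
    depth = [0] * ref_len
    for start, frag_len, _, _ in pairs:
        for i in range(start, start + read_len):
            depth[i] += 1
        for i in range(start + frag_len - read_len, start + frag_len):
            depth[i] += 1
    return depth
```

Calling `simulate_pairs(ref, n_pairs)` on the 1 Mb sequence and then summarizing `coverage(len(ref), pairs)` (for example its minimum, maximum and mean) gives a first impression of how uneven coverage is from random sampling alone, before any sequencing errors or repeats are introduced.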
Old 02-21-2010, 05:18 PM   #4
themerlin
Member
 
Location: Flagstaff, AZ

Join Date: Feb 2010
Posts: 51

I have had good luck with sff_extract. All you need is the linker sequence, insert length and insert length standard deviation. Then you run:

sff_extract -l linker.fasta yoursff.sff -i "insert_size:XXXX, insert_stdev:XXX" -o prefix

-Jason
Old 02-22-2010, 07:53 AM   #5
maven
Member
 
Location: United States

Join Date: Oct 2009
Posts: 11

This can be done with 454 software too, although there are bound to be differences in the result based on the specifics of the linker-recognition algorithms.

runAssembly -tr -noa -no myfile.sff

It's not the friendliest of output in that it generates an assembly directory and a few extra files that are unneeded for this use case, but it gets the job done. I've done this with version 2.3, I don't know about earlier versions.
Old 02-22-2010, 08:50 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by maven
This can be done with 454 software too, although there are bound to be differences in the result based on the specifics of the linker-recognition algorithms.

runAssembly -tr -noa -no myfile.sff

It's not the friendliest of output in that it generates an assembly directory and a few extra files that are unneeded for this use case, but it gets the job done. I've done this with version 2.3, I don't know about earlier versions.

That looks like just what I want. Alas:

runAssembly -tr -noa -no GB71BC401.sff

gives me:

Error: Invalid option: -noa.
Usage: runAssembly [-o projdir] [-nrm] [-p (sfffile | [regionlist:]analysisDir)]... (sfffile | [regionlist:]analysisDir)...

I am running v. 2.3

--
Phillip
Old 02-22-2010, 09:05 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

The closest option to '-noa' is '-noace' which skips the output of ACE files, etc.
Old 02-22-2010, 09:09 AM   #8
maven
Member
 
Location: United States

Join Date: Oct 2009
Posts: 11

-noa is supposed to tell it to not actually bother doing the assembly itself. The -no option turns off most output generation, since the goal here is to just generate the split fasta (and qual) file. Both options are .... optional ... in the sense that once it gets past the first stage of the assembly you can manually kill it if you don't want to sit around waiting for an assembly to complete. The fasta file should still be there, as it's generated prior to actually starting the assembly.
Old 02-22-2010, 09:17 AM   #9
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by maven
-noa is supposed to tell it to not actually bother doing the assembly itself. The -no option turns off most output generation, since the goal here is to just generate the split fasta (and qual) file. Both options are .... optional ... in the sense that once it gets past the first stage of the assembly you can manually kill it if you don't want to sit around waiting for an assembly to complete. The fasta file should still be there, as it's generated prior to actually starting the assembly.

Alright! Leaving out the -noa worked. It did create a new assembly directory and do the assembly, but that didn't take long.

Thanks!
--
Phillip
Tags
re-assembling, simulation
