SEQanswers

Old 06-16-2010, 02:57 PM   #1
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default Using sff tools to Extract Read Subsets

How can I extract a subset of reads (longer than 150 bases but shorter than 200) from a sequencing run using the sfftools? I think this can be accomplished using either sffinfo or fnafile (-t), but I just cannot get it to work.
Any help will be very much appreciated.
Old 06-17-2010, 05:31 AM   #2
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

This is not directly possible with sfftools. You'll probably need to work on fasta files.

cheers,
Sven
Old 06-17-2010, 05:35 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

You can't do this with sfftools directly. You can use sfffile to select a subset of reads from an input SFF file by using the -i option to pass a file containing a list of accession numbers to include in the output. Of course, you will first have to use some other tool to create the list of accession numbers for the reads that match your criteria, whatever they may be.
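A minimal sketch of that list-building step, assuming the reads have already been exported to a FASTA file. The function names `read_fasta` and `ids_in_length_range` are hypothetical, and the 150/200 bounds come from the original question; only the idea of passing an accession list to sfffile with -i comes from this thread.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()  # also removes a trailing "\r" from DOS files
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the accession is the first token of the header line.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def ids_in_length_range(
    lines: Iterable[str], min_len: int = 150, max_len: int = 200
) -> List[str]:
    """Return accessions of reads strictly longer than min_len and
    strictly shorter than max_len."""
    return [
        name
        for name, seq in read_fasta(lines)
        if min_len < len(seq) < max_len
    ]
```

Writing the returned accessions to a text file, one per line, gives the kind of list described above. Note that this counts every character on a sequence line, so gapped (aligned) FASTA would need its gaps removed first.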

(ETA: Oh, Sven beat me to it.)

Last edited by kmcarr; 06-17-2010 at 05:37 AM.
Old 06-17-2010, 05:57 AM   #4
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default How about fnafile?

Could someone please explain to me how to use the -t option with fnafile? I mean, if I use -i accno 1-200, wouldn't I be able to 'extract' the reads that are 150 bases long?
Old 06-17-2010, 06:09 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

No, the -t option is used for changing trim point information stored in the file, not for extracting specific reads.
Old 06-17-2010, 12:39 PM   #6
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default fnafile -t syntax

Hmmm! OK, I understand the limitations. Still, would you mind explaining to me the syntax for fnafile using the -t option (accno 1 200)? I just want to see what that tool is capable of and whether I can use it down the road.
Thanks!

Last edited by Xterra; 06-17-2010 at 01:19 PM.
Old 06-17-2010, 12:50 PM   #7
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
fnafile --help
I have never used this program; it seems to be the fasta counterpart to sffinfo.

Both sffinfo and fnafile can filter by read names (IDs) via '-e' or '-i'; they cannot filter by any other characteristic (e.g. length, GC content, etc.).

-t / -tr *set* trim points according to a file given by the user. They are not filters.
Old 06-17-2010, 01:30 PM   #8
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default sklages

I just cannot get it to work. Would you mind being a little more specific? Let's assume I have the file A.fas, which is a FASTA file, and I am using 1 200 instead of 12 543 as described in the manual:
Quote:
The specified “trimfile” should contain one or more lines
consisting of (1) a read accession number, (2) a starting trimpoint
and (3) an ending trimpoint, separated by whitespace characters or
where the trimpoints are separate by a dash (e.g., “accno 12 543” or
“accno 12-543”)
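To make the two accepted line layouts concrete, here is a small sketch of a parser for one trim-file line. It is based only on the manual excerpt quoted above; the function name `parse_trim_line` is hypothetical and this is not how fnafile itself reads the file.

```python
import re
from typing import Tuple

# Accession, whitespace, start, then either a dash or whitespace, then end.
_TRIM_LINE = re.compile(r"^\s*(\S+)\s+(\d+)\s*(?:-|\s)\s*(\d+)\s*$")


def parse_trim_line(line: str) -> Tuple[str, int, int]:
    """Parse 'accno 12 543' or 'accno 12-543' into (accno, start, end)."""
    match = _TRIM_LINE.match(line)
    if match is None:
        raise ValueError(f"not a valid trim line: {line!r}")
    accno, start, end = match.groups()
    return accno, int(start), int(end)
```

Both `parse_trim_line("accno 12 543")` and `parse_trim_line("accno 12-543")` give the same tuple, so a checker like this can confirm that a trim file is well-formed before it is handed to the real tool.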
Old 06-17-2010, 01:58 PM   #9
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by Xterra View Post
I just cannot get it to work. Would you mind being a little more specific? Let's assume I have the file A.fas, which is a FASTA file, and I am using 1 200 instead of 12 543 as described in the manual
Could you please post the exact command you were using, as well as small samples of the FASTA and trim files you tried to use.
Old 06-17-2010, 02:03 PM   #10
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default I have tried so many different commands

but none is working. That's why I would like to get the right syntax.
I have uploaded an example of the FASTA file I would like to process using fnafile. As I said, I am only trying to find out what fnafile can do.
Thanks.
Attached Files
File Type: zip Test.zip (9.6 KB, 6 views)
Old 06-17-2010, 02:17 PM   #11
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by Xterra View Post
but none is working. That's why I would like to get the right syntax.
O.K., how about an example of just one command that didn't work.

And your FASTA file looks like it contains gapped sequence.

Quote:
>GF2FOAC04ISOQO
ACGAG-TG----GTGATGT-GCCAGC-TG-CCGTTGGTGT-TAATGAGCTGAA-TGTTCT
GCTGA-G-------GGC--ATGGC-T-GAACAC-GACGG-CAAATCACGT----TGTGAA
CGTG-CAA-CACGCG-CC--TCAA-CGGT-GGTGGT-G--CCCG-CGT--CCACCCCA-G
CGG-CCAG-C-AGAAGGA--TGA-CAAT-GACCCTT-C--G-CCCACGACT---------
>GF2FOAC04J0H2I
ACGAA-TGCG-TTTGATGT-GCCAGC-TG-CCGTTGGTGT-TAATGAGCTGAA-TGTTCT
GCTGA-G---G---GCCGAGTGGCGTAGAACAC-GCCGG-CAAT-CA-GT-TGGTGG-AA
CGTG-CAA-CA-GCG-CC-TCCAA-C----GGGGGTCG--CCCG-CGC--CCACCCCA-G
CGG-CCAG-C-AGAAGGA--TGA-CAAT-GA-CCTT-C--G-CCCA--------------
Where did this FASTA come from?

Last edited by kmcarr; 06-17-2010 at 02:19 PM.
Old 06-17-2010, 02:25 PM   #12
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default Here you have one without gaps.

Does it matter if you have gaps in the fasta file? Would it affect the performance of fnafile? Anyways, I have uploaded a file with no gaps. That's a 454 run using the Titanium amplicon sequencing kit. The previous file was the aligned file and that's why you could see the indels.
Attached Files
File Type: zip No gaps.zip (9.1 KB, 6 views)
Old 06-17-2010, 11:03 PM   #13
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by Xterra View Post
Does it matter if you have gaps in the fasta file? Would it affect the performance of fnafile? Anyways, I have uploaded a file with no gaps. That's a 454 run using the Titanium amplicon sequencing kit. The previous file was the aligned file and that's why you could see the indels.
You still haven't provided the exact command you used.

Your fasta file contains aligned sequences created by some "windows" program? OK, you provided a DOS-formatted file; I have no idea whether fnafile is happy with that.
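If the line endings or the alignment gaps are a concern, one option is to clean the file before passing it on. The following is only a sketch under assumptions: the function name `clean_fasta_text` is hypothetical, gaps are assumed to be written as "-", and whether fnafile actually needs this preprocessing is not established in this thread.

```python
def clean_fasta_text(text: str) -> str:
    """Return FASTA text with Unix line endings and with '-' gap
    characters removed from sequence lines (headers are left alone)."""
    unix_text = text.replace("\r\n", "\n").replace("\r", "\n")
    cleaned = []
    for line in unix_text.split("\n"):
        if line.startswith(">"):
            cleaned.append(line)
            continue
        stripped = line.replace("-", "")
        # Keep originally empty lines, but drop lines that held only gaps.
        if stripped or not line:
            cleaned.append(stripped)
    return "\n".join(cleaned)
```

Reading the file in binary mode and decoding it before calling this function avoids Python's own newline translation hiding the DOS line endings.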

1) create a file with trim points, just like:
Code:
GF2FOAC04I8T0F 1 200
GF2FOAC04J305H 1 200
GF2FOAC04J3QXL 1 200
2) run fnafile
Code:
$ fnafile -o out.fa -tr tp.txt UniqueHapsUnix.fas
3) see the differences

a) original sequence:
Code:
>GF2FOAC04I8T0F
ACGAGTGCGTTTGATGTGCCAGCTGCCGTTGGTGTTAATGAGCTGAATGTTCTGCTGAGGGCCATGGCTG
AACACGCCGGCAATCACGTTGGTGGAACGTGCAACAGCGCCTCCAACGGTGGTGGTGCCCGCGTCCACCC
CAGCGGCCAGCAGAAGGATGACAATGACCTTCGCCCAC
b) "trimmed" sequence
Code:
>GF2FOAC04I8T0F trim=1-200
ACGAGTGCGTTTGATGTGCCAGCTGCCGTTGGTGTTAATGAGCTGAATGTTCTGCTGAGG
GCCATGGCTGAACACGCCGGCAATCACGTTGGTGGAACGTGCAACAGCGCCTCCAACGGT
GGTGGTGCCCGCGTCCACCCCAGCGGCCAGCAGAAGGATGACAATGACCTTCGCCCAC
The sequence itself remains unchanged, but a flag has been introduced (trim=1-200) directing the assembler (newbler) to use only the sequence within this range. Again, this is not a filter, and no physical trimming of the reads takes place. You need to use an external tool that works on either fasta or sff files to trim your sequences.
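For comparison, physical trimming by an external script could look like the sketch below. It assumes the trim points are 1-based and inclusive, which is a guess from the "1 200" example and not something confirmed here; `trim_sequence` and `trim_records` are hypothetical names.

```python
from typing import Dict, Tuple


def trim_sequence(seq: str, start: int, end: int) -> str:
    """Return bases start..end of seq (assumed 1-based, inclusive).
    An end beyond the read length simply returns the rest of the read."""
    if start < 1 or end < start:
        raise ValueError(f"invalid trim points: {start}-{end}")
    return seq[start - 1:end]


def trim_records(
    records: Dict[str, str], trim_points: Dict[str, Tuple[int, int]]
) -> Dict[str, str]:
    """Physically trim every read that has an entry in trim_points;
    reads without an entry are passed through unchanged."""
    trimmed = {}
    for accno, seq in records.items():
        if accno in trim_points:
            start, end = trim_points[accno]
            trimmed[accno] = trim_sequence(seq, start, end)
        else:
            trimmed[accno] = seq
    return trimmed
```

With trim points of 1-200, a read shorter than 200 bases comes back whole, which matches the unchanged sequence in the example above.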
Old 06-18-2010, 06:32 AM   #14
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default Thanks!

That's the answer I was looking for!
Old 03-01-2011, 06:39 AM   #15
TheLight
Junior Member
 
Location: USA

Join Date: Sep 2008
Posts: 5
Default SFF editor/convert

There is a free tool with a graphical interface that allows you to view, edit, and convert SFF files here
Old 03-01-2011, 06:50 AM   #16
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Talking

oh, it says

$ ./Biology-tools-package.exe
bash: ./Biology-tools-package.exe: cannot execute binary file



SCNR,
Sven
Old 03-01-2011, 08:53 AM   #17
Xterra
Member
 
Location: London

Join Date: Jun 2010
Posts: 27
Default Great!

TL,

I have installed it on my machine.
Thank you very much!

X
Old 05-11-2011, 08:26 AM   #18
TheLight
Junior Member
 
Location: USA

Join Date: Sep 2008
Posts: 5
Default

Quote:
Originally Posted by sklages View Post
bash: ./Biology-tools-package.exe: cannot execute binary file

Oh, sorry, it looks like I forgot to mention that it is a Windows GUI program.
Old 08-06-2012, 04:26 AM   #19
HeidiLee
Member
 
Location: Earth

Join Date: Jul 2011
Posts: 20
Default

Quote:
Originally Posted by kmcarr View Post
You can't do this with sfftools directly. You can use sfffile to select a subset of reads from an input SFF file by using the -i option to pass a file containing a list of accession numbers to include in the output. Of course, you will first have to use some other tool to create the list of accession numbers for the reads that match your criteria, whatever they may be.

(ETA: Oh, Sven beat me to it.)
Could you please tell me where I can download sfffile so that I can select a subset of reads from an input SFF file?
Old 08-06-2012, 04:34 AM   #20
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by HeidiLee View Post
Could you please tell me where I can download sfffile so that I can select a subset of reads from an input SFF file?
Heidi,

Look at my response to the thread you started the other day. sfffile is part of the Roche/454 software tools; you can request the software here.