SEQanswers > Sequencing Technologies/Companies > 454 Pyrosequencing

06-18-2010, 07:31 AM #1
Xterra
Member
Location: London
Join Date: Jun 2010
Posts: 27
sffinfo -s inputfile

Is there any way to change the 'format' of the output FASTA file generated by sffinfo -s?
I am using the following command:
Code:
sffinfo -s Inputfile.sff > Outputfile.fna
This is what I get:
Quote:
>GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_
TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA
AGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCC
CGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGG
AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAG
CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
TCATATCCACGCAC
>GHXCZCC01APUO5 length=312 xy=0177_1303 region=1 run=R_2010_05_27_13_55_50_
TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAG
AGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCG
TCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAA
TAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCT
TGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATC
ATATCCACGCAC
>GHXCZCC01AQSRP length=314 xy=0188_0403 region=1 run=R_2010_05_27_13_55_50_
TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTG
AAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCC
CGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGG
AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAG
CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
TCATATCCACGCAC
Would it be possible to change it to this?
Quote:
>GHXCZCC01AJ8CJ
TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
>GHXCZCC01APUO5
TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
>GHXCZCC01AQSRP
TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTGAAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
Any help will be greatly appreciated!
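For reference, the "extra information" on each header is a run of space-separated key=value fields after the read ID (length, xy, region and run in the output above). A minimal Python sketch of splitting one such header, assuming every field after the ID has that key=value shape (true for the lines shown here, not checked against the SFF specification); parse_sffinfo_header is a name made up for this example:

```python
def parse_sffinfo_header(line):
    """Split an sffinfo -s header into (read_id, {key: value}).

    Assumes the layout shown in this thread:
    >READ_ID length=314 xy=0113_1201 region=1 run=R_...
    """
    line = line.rstrip("\r\n")
    fields = line[1:].split()
    if not line.startswith(">") or not fields:
        raise ValueError("not a FASTA header line: " + repr(line))
    read_id = fields[0]
    attrs = {}
    for field in fields[1:]:
        # "length=314" -> ("length", "314"); values are kept as strings
        key, _, value = field.partition("=")
        attrs[key] = value
    return read_id, attrs


read_id, attrs = parse_sffinfo_header(
    ">GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_"
)
print(read_id)          # GHXCZCC01AJ8CJ
print(attrs["length"])  # 314
```

Keeping only read_id is exactly the trimming asked for in this post.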
06-18-2010, 10:07 AM #2
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

Quote:
Originally Posted by Xterra
Is there any way to change the 'format' of the output FASTA file generated by sffinfo -s?
I am using the following command:
Code:
sffinfo -s Inputfile.sff > Outputfile.fna

Code:
sffinfo -s Inputfile.sff  | perl -lpe 's/^(\>\S+).+/$1/'  > Outputfile.fna
should work in most cases :-)
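The same header trim can be written in Python. One detail behind "most cases": because .+ needs at least one character after the ID, a header that is only >ID makes the regex engine backtrack, and the substitution clips the ID's last character (>ABC becomes >AB). The sketch below is an equivalent written for this thread, not sklages' code; it requires a space or tab after the ID instead, so bare headers pass through unchanged. HEADER_RE and trim_header are made-up names.

```python
import re

# Keep ">ID" on header lines and drop everything from the first space or tab on.
# Unlike ".+", requiring [ \t] leaves a bare ">ID" header untouched.
HEADER_RE = re.compile(r"^(>\S+)[ \t].*")


def trim_header(line):
    """Strip annotations from a FASTA header line; other lines pass through as-is.

    "." does not match a newline, so a trailing newline on the input is preserved.
    """
    return HEADER_RE.sub(r"\1", line)


for line in [
    ">GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_\n",
    "TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA\n",
]:
    print(trim_header(line), end="")
```

In a standalone script the loop would read sys.stdin instead, giving the same pipeline shape as the Perl one-liner (sffinfo -s Inputfile.sff | python3 trim_headers.py > Outputfile.fna, where trim_headers.py is a hypothetical file name).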
06-18-2010, 10:26 AM #3
Xterra
Member
Location: London
Join Date: Jun 2010
Posts: 27
Very nice!

However, I still have a problem: the file now looks like this (60 characters per line):

Quote:
>GHXCZCC01AJ8CJ
TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC
CCCGGGGC
>GHXCZCC01APUO5
TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCC
CGGGGCGA
>GHXCZCC01AQSRP
TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACG
CCCCGGGG
And I need something like this (the entire sequence on one line):
Quote:
>GHXCZCC01AJ8CJ
TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGC
>GHXCZCC01APUO5
TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGA
>GHXCZCC01AQSRP
TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGG
Thanks in advance!

Last edited by Xterra; 06-18-2010 at 10:56 AM.
06-18-2010, 11:28 AM #4
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

Look at your previous post and see what you requested.

Write a little Perl script to remove the newlines at the end of the sequence lines.
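A minimal sketch of such a script, written in Python rather than Perl: sequence lines are collected until the next header and then joined with no separator. It assumes the headers were already trimmed (for example with the one-liner in post #2) and that every header starts with '>'; unwrap_fasta is a made-up name.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    `lines` can be any iterable of text lines, such as an open file; the
    yielded strings carry no trailing newline.
    """
    seq_parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # New record: flush the previous record's sequence first.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        elif line:
            # Sequence line: collect it; blank lines are skipped.
            seq_parts.append(line)
    if seq_parts:
        yield "".join(seq_parts)


# First record from post #3, still wrapped.
wrapped = [
    ">GHXCZCC01AJ8CJ\n",
    "TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC\n",
    "CCCGGGGC\n",
]
for out_line in unwrap_fasta(wrapped):
    print(out_line)
```

This prints the header and then the whole sequence on one line, matching the layout requested above. Passing an open file, e.g. unwrap_fasta(open("Outputfile.fna")), works the same way.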
06-18-2010, 11:45 AM #5
Xterra
Member
Location: London
Join Date: Jun 2010
Posts: 27
sklages

I am trying to combine sffinfo with code that gets rid of the extra information in the ID line and, at the same time, removes the newline at the end of each sequence line. Originally, I was hoping sffinfo had an option that could do exactly what I needed. Not being a scripter makes finding the right code a little more challenging.
Quote:
Write a little perl script to remove the newline at the end of the sequence lines.
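Not Perl, but AWK:
Code:
awk '/^>/ {
    # header line: print the sequence collected for the previous read (if any), then this header
    print (buff ? buff RS : "") $0
    buff = ""; next
}
{
    # sequence line: append with no separator (not FS) so each read ends up on a single line
    buff = buff $0
}
END { print buff }' infile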
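Both steps can also be done in one pass, which is the combination this post is after. Here is a sketch of a Python filter that keeps only the read ID on each header and writes each read's sequence on one line. It assumes sffinfo -s output shaped like post #1; sffinfo_to_fasta and the script name below are hypothetical.

```python
import io


def sffinfo_to_fasta(in_stream, out_stream):
    """Rewrite sffinfo -s FASTA as '>READ_ID' plus one unwrapped sequence line per read."""
    seq_parts = []

    def flush():
        # Write the sequence collected for the current read, if any.
        if seq_parts:
            out_stream.write("".join(seq_parts) + "\n")
            seq_parts.clear()

    for raw in in_stream:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            flush()
            # Keep the first whitespace-separated token, e.g. ">GHXCZCC01AJ8CJ".
            out_stream.write(line.split()[0] + "\n")
        elif line:
            seq_parts.append(line)
    flush()


# Re-wrapped excerpt (first 40 bases) of the first read in post #1.
sample = io.StringIO(
    ">GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_\n"
    "TTGATGTGCCAGCTGCCGTT\n"
    "GGTGTGTATCAGCTGGATTT\n"
)
result = io.StringIO()
sffinfo_to_fasta(sample, result)
demo_output = result.getvalue()
print(demo_output, end="")
```

To use it in the pipeline, call sffinfo_to_fasta(sys.stdin, sys.stdout) under an if __name__ == "__main__": guard in a script and run sffinfo -s Inputfile.sff | python3 sff_to_fasta.py > Outputfile.fna.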

Last edited by Xterra; 06-18-2010 at 02:16 PM.

All times are GMT -8.