SEQanswers

Old 05-27-2008, 09:20 AM   #1
Farhat
Member
 
Location: Pune, India

Join Date: Apr 2008
Posts: 21
Default Fasta to Ace conversion

Is there a program to convert a Fasta file to an Ace assembly file? While googling I came across references to fasta2ace.pl but no program itself.

Thanks.
Old 05-27-2008, 10:24 AM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

I am looking for the exact same tool (FASTA to ACE) but have not found one yet.
If it can use quality values, even better.

The ACE file can then be used by EagleView to visualize reads on the reference.
Old 05-27-2008, 10:57 AM   #3
Farhat
Member
 
Location: Pune, India

Join Date: Apr 2008
Posts: 21
Default

I am looking for it for EagleView as well.

-Farhat
Old 06-05-2008, 04:10 AM   #4
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by Farhat View Post
Is there a program to convert a Fasta file to an Ace assembly file?
Can you be a bit more precise on what you require?
A FASTA file is just a bunch of sequences with an ID and a description.
What form do you want the ACE file to take?
Old 06-05-2008, 05:34 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Farhat,

I don't think it is possible to do what you are asking. FASTA files only contain ID/definition line(s) followed by sequence line(s). You may also have an accompanying quality score file. An ACE file contains much more information than this. For each contig (an ACE file may include more than one contig) it will contain the gapped sequence and quality scores, the gapped sequences of the constituent reads as well as offset information indicating where each of the constituent reads is located on the contig (reference). This information does not exist in the FASTA files so it would be impossible to construct a meaningful ACE file.
Old 06-05-2008, 06:06 AM   #6
Farhat
Member
 
Location: Pune, India

Join Date: Apr 2008
Posts: 21
Default

Thanks for the replies. Yes, I realize the FASTA file by itself doesn't have enough information to construct the ACE file. I wrote my own script that takes a FASTA file, a FASTQ quality file and the output from a SOAP or ELAND aligner and converts them to ACE, which does work with EagleView.
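
As a rough illustration of the idea (not the actual script mentioned above, which was not posted in this thread), the sketch below builds ACE text from a reference sequence plus reads whose positions came from an aligner. The function names and input layout are hypothetical. The record layout (AS, CO, BQ, AF, RD, QA, DS) follows the commonly documented ACE structure, it assumes ungapped alignments already oriented to the reference, it declares zero BS (base segment) records, and it has not been tested against EagleView or any other viewer.

```python
def wrap(seq, width=50):
    """Split a sequence into fixed-width lines."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def write_ace(contig_name, consensus, consensus_quals, reads):
    """Return ACE text for a single contig.

    reads is a list of (name, sequence, start, strand) tuples, where start is
    the 1-based position of the read's first base on the consensus and strand
    is "U" (forward) or "C" (complemented). These inputs are assumptions for
    this sketch; a real converter would parse them from the aligner output.
    """
    # AS: number of contigs, total number of reads
    lines = [f"AS 1 {len(reads)}", ""]
    # CO: name, bases, reads, base segments (0 here), orientation
    lines.append(f"CO {contig_name} {len(consensus)} {len(reads)} 0 U")
    lines.extend(wrap(consensus))
    lines.append("")
    # BQ: quality scores for the consensus bases
    lines.append("BQ")
    lines.append(" ".join(str(q) for q in consensus_quals))
    lines.append("")
    # AF: where each read sits on the consensus
    for name, seq, start, strand in reads:
        lines.append(f"AF {name} {strand} {start}")
    lines.append("")
    # RD/QA/DS: the read sequences themselves
    for name, seq, start, strand in reads:
        lines.append(f"RD {name} {len(seq)} 0 0")
        lines.extend(wrap(seq))
        lines.append("")
        lines.append(f"QA 1 {len(seq)} 1 {len(seq)}")
        lines.append("DS")  # placeholder description line
        lines.append("")
    return "\n".join(lines)


ace_text = write_ace(
    "ref1",
    "ACGTACGTAC",
    [30] * 10,  # made-up consensus qualities
    [("read1", "ACGTA", 1, "U"), ("read2", "TACGT", 4, "U")],
)
print(ace_text)
```

The point is only that the read placement (the AF lines) has to come from somewhere other than the FASTA file, which is why the aligner output is needed.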
Old 06-05-2008, 07:51 AM   #7
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Quote:
Originally Posted by Farhat View Post
Thanks for the replies. Yes, I realize the FASTA file by itself doesn't have enough information to construct the ACE file. I wrote my own script that takes a FASTA file, a FASTQ quality file and the output from a SOAP or ELAND aligner and converts them to ACE, which does work with EagleView.
That's great!
I started writing a script of my own, but then got on to other things.

Farhat - is it possible for you to share the script for format conversion?
Old 06-05-2008, 08:38 AM   #8
Farhat
Member
 
Location: Pune, India

Join Date: Apr 2008
Posts: 21
Default

Quote:
Originally Posted by bioinfosm View Post
That's great!
I started writing a script of my own, but then got on to other things.

Farhat - is it possible for you to share the script for format conversion?
Yes, but it is not very mature and has limitations. It works fine with EagleView, but there seem to be issues making it work with pbShort. If you want it, PM me with your email.

-F
Old 08-22-2008, 12:23 PM   #9
jia
Junior Member
 
Location: Florida

Join Date: Aug 2008
Posts: 1
Default

I'm looking for exactly the same thing for EagleView too!
Would you mind sharing your script with me? I'll send you a message shortly. Thanks!
Jia


Quote:
Originally Posted by Farhat View Post
Yes, but it is not very mature and has limitations. It works fine with EagleView, but there seem to be issues making it work with pbShort. If you want it, PM me with your email.

-F
Old 05-10-2010, 12:20 AM   #10
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23
Default

Can I have a look at your script?
Thanks

nico l'allias
Old 05-10-2010, 01:12 AM   #11
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146
Default

This still makes no sense.

ACE is an assembly output, while FASTA is just a bunch of sequences with no assembly information. Are you asking for advice on what assembler to use? This will obviously depend a lot on the type of data and whether you want a de novo or mapped assembly.

James

PS. Contrary to above, I don't believe ACE supports quality values. At least I've never seen any - instead the authors of ace preferred to store qualities in "phd" files (in possibly the most inefficient format known to man). I'd love to be wrong on this though as it'll make my life easier. :-)
Old 05-10-2010, 01:59 AM   #12
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by jkbonfield View Post
PS. Contrary to above, I don't believe ACE supports quality values. At least I've never seen any - instead the authors of ace preferred to store qualities in "phd" files (in possibly the most inefficient format known to man). I'd love to be wrong on this though as it'll make my life easier. :-)
You can store PHRED qualities for a contig in an ACE file on BQ lines. I don't think the quality scores of the reads themselves are stored, which is probably what you meant.
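
To make the BQ point concrete, here is a minimal sketch that pulls the consensus quality scores for each contig out of ACE text. It assumes the usual layout where a `CO` header line names the contig and a later `BQ` line is followed by whitespace-separated integers up to the next blank line; the function name and the example data are made up for illustration.

```python
def contig_qualities(ace_text):
    """Map each contig name to its list of consensus quality scores."""
    quals = {}
    current = None   # contig named by the most recent CO line
    in_bq = False    # True while reading the numbers after a BQ line
    for line in ace_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "CO":
            current = fields[1]
            in_bq = False
        elif line.strip() == "BQ" and current is not None:
            quals[current] = []
            in_bq = True
        elif in_bq:
            if not fields:
                in_bq = False  # a blank line ends the BQ block
            else:
                quals[current].extend(int(q) for q in fields)
    return quals


# Made-up example data, not taken from a real assembly.
example = """AS 1 1

CO Contig1 8 1 0 U
ACGTACGT

BQ
20 25 30 30 40 40 35 35

AF read1 U 1
"""
print(contig_qualities(example))
```

Note that this only recovers the contig-level qualities; as said above, the per-read qualities are not in the ACE file.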

P.S. The MIRA assembly format (MAF, which is a bit like ACE) stores both, using a FASTQ-like encoding which is much more space efficient:
http://mira-assembler.sourceforge.ne.../mira_maf.html
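
A small sketch of why a FASTQ-style encoding is more compact than space-separated numbers: each score becomes a single character. The offset of 33 used here is the Sanger FASTQ convention; that MAF uses exactly this offset is an assumption on my part, so check the MAF documentation before relying on it. The helper names are hypothetical.

```python
def phred_to_fastq(quals, offset=33):
    """Encode a list of PHRED scores as one ASCII character per score."""
    return "".join(chr(q + offset) for q in quals)


def fastq_to_phred(text, offset=33):
    """Decode a FASTQ-style quality string back into PHRED scores."""
    return [ord(c) - offset for c in text]


quals = [40, 40, 38, 30, 20, 10, 2]             # made-up scores
as_numbers = " ".join(str(q) for q in quals)    # space-separated numbers
as_chars = phred_to_fastq(quals)                # FASTQ-style string

print(len(as_numbers), "characters as numbers")
print(len(as_chars), "characters as a quality string")
```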
Old 05-10-2010, 02:39 AM   #13
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146
Default

Getting off-topic, sorry.

However, MAF looks like a nice format. The problems of random ordering of data in CAF and the complete lack of sequence quality in ACE are one reason why I produced BAF, although it never really went anywhere and I only use it locally as an interchange format.

Certainly it's true that ACE and CAF are very cumbersome for next-gen data, while SAM/BAM have other major issues when it comes to mixed technologies (such as not supporting older capillary style assemblies with potentially more than two sequences per template).

A good find. :-)
Old 05-10-2010, 02:57 AM   #14
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by jkbonfield View Post
Getting off-topic, sorry.

However, MAF looks like a nice format. The problems of random ordering of data in CAF and the complete lack of sequence quality in ACE are one reason why I produced BAF, although it never really went anywhere and I only use it locally as an interchange format.
I think Bastien was thinking along the same lines when he came up with MAF for internal use in MIRA.
Quote:
Originally Posted by jkbonfield View Post
Certainly it's true that ACE and CAF are very cumbersome for next-gen data, while SAM/BAM have other major issues when it comes to mixed technologies (such as not supporting older capillary style assemblies with potentially more than two sequences per template).
I'd like the option to include the reference sequences (not just their names and lengths; and as a further option the reference quality scores) to make a SAM/BAM file self-contained. This is probably not important for people working on model organisms, but would seem useful for early stages of projects with draft assemblies, or if working on a new strain etc. It's something that ACE and other assembly formats have.
Old 05-10-2010, 03:01 AM   #15
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by jkbonfield View Post
This still makes no sense.

ACE is an assembly output, while FASTA is just a bunch of sequences with no assembly information. Are you asking for advice on what assembler to use? This will obviously depend a lot on the type of data and whether you want a de novo or mapped assembly.
The original question was probably misleading. Farhat did later on say he was able to convert FASTQ reads into an ACE assembly by getting the missing information from the SOAP/ELAND alignment.
Old 05-13-2010, 11:39 PM   #16
sundar
Junior Member
 
Location: india

Join Date: May 2010
Posts: 7
Default

Quote:
Originally Posted by Farhat View Post
Is there a program to convert a Fasta file to an Ace assembly file? While googling I came across references to fasta2ace.pl but no program itself.

Thanks.
Hi Farhat,

I want to know how to extract a contig file. I am running the Velvet assembler and got output in .afg format, which I have viewed in Hawkeye, and I want to extract a specific contig. Can you help me with that? It would be wonderful to get a positive reply from you.

thanks
Old 05-13-2010, 11:47 PM   #17
sundar
Junior Member
 
Location: india

Join Date: May 2010
Posts: 7
Default

Quote:
Originally Posted by Torst View Post
Can you be a bit more precise on what you require?
A FASTA file is just a bunch of sequences with an ID and a description.
What form do you want the ACE file to take?
Hi Torst,


I have a question about de novo assembly. May I ask it here?
Old 05-13-2010, 11:53 PM   #18
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by sundar View Post
Hi Farhat,
I want to know how to extract a contig file. I am running the Velvet assembler and got output in .afg format, which I have viewed in Hawkeye, and I want to extract a specific contig. Can you help me with that? It would be wonderful to get a positive reply from you.
Thanks
Velvet produces a file called "contigs.fa" with all the contigs. Just cut+paste the contig you want out of that file.
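
If the file is too large to cut and paste comfortably, a few lines of script will do the same job. This is a minimal sketch using only the standard library; the function names are hypothetical, and the contig names in the demo merely imitate Velvet's `NODE_<n>_length_<len>_cov_<cov>` naming rather than coming from a real run. It matches on the first word of the header line.

```python
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_contig(path, contig_id):
    """Return (header, sequence) for the first record whose ID matches, else None."""
    for header, seq in read_fasta(path):
        fields = header.split()
        if fields and fields[0] == contig_id:
            return header, seq
    return None


# Demo on a tiny made-up file; with real data, pass the path to contigs.fa.
demo = (
    ">NODE_1_length_8_cov_10.0\nACGTACGT\n"
    ">NODE_2_length_12_cov_7.5\nTTGACC\nGGATCA\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(demo)
result = extract_contig(tmp.name, "NODE_2_length_12_cov_7.5")
os.remove(tmp.name)
print(result)
```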
Old 05-14-2010, 07:37 AM   #19
sundar
Junior Member
 
Location: india

Join Date: May 2010
Posts: 7
Unhappy Need to view the extracted contig file

Hi all, I can extract the best contig using a Perl program, but I want to view the extracted contig file.


Need help
Old 05-15-2010, 06:08 PM   #20
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by sundar View Post
Hi all, I can extract the best contig using a Perl program, but I want to view the extracted contig file.
Please clarify. What do you mean by "view"? Just open it with a text editor?