Seqanswers Leaderboard Ad

**gpertea** · 01-08-2010, 10:11 AM

I don't know about Mosaik but I worked a lot with ace files produced by assemblers like CAP3, Phrap etc. and if Mosaik is a real assembler and follows the same format then in the ace file the sequence block following the "CO " line should be the consensus (assembly) sequence that you are after, e.g.:

Code:

CO CL10003_1 836 2 0 U
ACAACTCAGCTAAGATACAGAAAAAGTCAAGAAACAAACAATAGTGAATGCTAGGAAGGC
AAAAAGAGAGATTCAATCAACATACCAAGCTCCATGTAAGTATATCAAGTATATTAACGA
CATTGAAAGCCACAAGATAGCCATTCCTTTAGCTCTTAAATGGAAAAACGACCTAACCCC
CATTGAAAGCCACAAGATA [i]... etc. (until an empty line)[/i]

So if you want just to pull these consensus sequences you can use a simple script taking all these sequence blocks found between a line starting with "CO " and the next empty line, something like this (using perl and assuming Linux/Unix as your platform):

Code:

{
local $/="\n\n";
while (<>) {
  s/^(\s+)//s;
  if (length>20 && s/^CO\s+(\S+)[^\n]+//s) { 
    chomp;
    print ">$1\n";
    tr/-*\n//d; # removing gaps and line endings
    # reformat and print sequence:
    print join("\n", (unpack('(A72)*',$_)))."\n";
    }
 }# while text blocks
}

**kajsa** · 01-22-2010, 12:19 AM

.ace consensus sequence

Hello,

I am also assembling Solexa data to a reference using MOSAIK and I am looking for a program to create a consensus sequence from an .ace file.

But I don't think the sequence after the "CO" is the consensus of the assembled reads. I think it is the reference sequence? Correct me if I am wrong.

Doe's anyone have a good solution to this?

Thank you!

Kajsa

**maubp** · 01-22-2010, 07:28 AM

For traditional de-novo assembly, the contig consensus should be stored in the sequence lines just after the CO line (as discussed above). Getting this sequence (and removing any gaps which are stored as * characters) to convert into FASTA format is quite easy.

A Python alternative using Biopython 1.52 or later would be just two lines:

Code:

from Bio import SeqIO
SeqIO.convert("input.ace", "ace", "output.fasta", "fasta")

However, if you are mapping reads onto an existing reference, it sounds like the CO section contains the original reference (plus probably some extra gap characters where an insertion was required to map a read). The ACE file doesn't explicitly contain a consensus of the reference and reads - you would have to look at the aligned reads, and calculate one. This is possible, but much more complicated. For example, how would you proceed if the middle of the reference had no reads mapped to it?

I may have misunderstood, but I think you want to look at reference guided assembly instead of mapping reads onto a reference.

**kajsa** · 01-22-2010, 07:39 AM

Yeas that's correct, I am mapping reads onto an existing reference. I guess I have to calculate the consensus myself in some way. If no one else already have done a program for that.......?

Kajsa

**maubp** · 01-22-2010, 08:02 AM

Originally posted by kajsa View Post

Yeas that's correct, I am mapping reads onto an existing reference. I guess I have to calculate the consensus myself in some way. If no one else already have done a program for that.......?

Do you want a consensus if the reference AND the reads, or just the reads?

Either way, if you know Python this cookbook entry on treating an ACE file as an alignment might be a useful start. Once you have a Biopython alignment object it would be easier to calculate the consensus: http://www.biopython.org/wiki/ACE_contig_to_alignment

**kajsa** · 01-22-2010, 11:51 AM

I want to have a consensus of the assembly of only the reads I recon. Thank you for your help, I will look at that coockbook entry.

Kajsa

**jjohnson** · 01-22-2010, 12:10 PM

I find the AMOS toolset convenient for converting between many differnt assembly outputs/inputs.

So, in this case you would use

toAmos which yields a amos Bank file (useful in using Hawkeye for viewing if you like) and the bonus is that you can input information on scaffolding, library sizes, etc and this will be maintained through conversions.

from this bank file, you could convert to several differnt formats, dump the consensus fasta, reads, etc.

**kajsa** · 01-23-2010, 10:31 AM

That also looks very interesting. I will look at that too.

Thanks!

Kajsa

**skycreative** · 01-23-2010, 10:10 PM

really powerful code, thanks and study the coding ...

**sklages** · 01-25-2010, 04:10 AM

But the problem remains: ACE format does not consider and thus does not store the consensus sequence ...
You should probably use a different alignment format?

cheers,
Sven

**kajsa** · 01-26-2010, 05:26 AM

Sven: Do you have any suggestions on what alignment format to use?

Kajsa

**jjohnson** · 01-26-2010, 06:44 AM

It looks like you are trying to generate a consensus fasta file from a mapping/alignment, and you are using ace only because it is what is produced by mosaik (mosaik also produces sam/bam, but not useful for producing a consensus from a mapping as far as I can tell). One option is to use the Amos tools to produce a fasta consensus from the ace (after converting to the native amos format .bnk) . Most alignment formats do not include the reference and do not generate or recall a consensus from a multiple sequence alignment as far as I know, since most people only want to do variant detection downstream of the mapping. Mosaik will create a consensus of the mapping assembly, but only stores it in ace format. It will also produce SAM/BAM if you want to use the SAMtools downstream for further analysis.

**kajsa** · 02-09-2010, 05:13 AM

We didn't get the Amos tools to work on our mac computers. But we found a solution: We can export a BAM file and then use SAM tools to extract the consensus sequence.

**[email protected]** · 02-09-2010, 02:23 PM

Originally posted by kajsa View Post

We didn't get the Amos tools to work on our mac computers. But we found a solution: We can export a BAM file and then use SAM tools to extract the consensus sequence.

I think you are talking about using 'pileup' command with -c (consensus calling parameter). Am i right?

I also have .ace file from mosaik but i dont know whats the next step to do with it. Can you guide me?

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Today, 08:47 AM	0 responses 12 views 0 likes	Last Post by seqadmin Today, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 59 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

.ace consensus sequence

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News