SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
How do I export consensus sequence from IGV? Bean General 4 06-17-2013 07:04 AM
how to use samtools to get consensus sequence? f0415007 Bioinformatics 1 09-26-2011 11:59 PM
Consensus sequence nitinkumar Bioinformatics 0 05-21-2011 06:38 PM
How do I convert 454 ace to a regular ace? lskatz Bioinformatics 5 11-22-2010 08:31 AM
maq consensus sequence BertieWooster Bioinformatics 0 06-09-2010 10:25 AM

Reply
 
Thread Tools
Old 01-08-2010, 08:53 AM   #1
jgibbons1
Senior Member
 
Location: Worcester, MA

Join Date: Oct 2009
Posts: 133
Default .ace consensus sequence

Hello all,
Does anyone know a method to create a consensus sequence from an .ace file? I am assembling solexa data to a reference using MOSAIK. I can convert the .ace file to .fasta using a script, however this does not make a consensus, instead it creates an individual sequence for each read:

>.MosaikReference
AAGGACTTATGAGGACAGAGAGCGGGATGACTCTTCATATGCCACATGCAGCCCACATCCCAACACCTCT
CCCATGTCTCAAGTTGAAGGATGCGAGACCATTTCGCAGAGAAGCCGTACCAAACGCAAACACGACCATA
TCACCTGTTAGTGCCGAAGTTGGAAGCTTCCCCACCCTTGACTGCAGCACCTGGGAAGAAAGCACTTAAG
AAACCCAGGCATCCTACTGTGGATGACGATGGGGTTCATAGAGGCCATCGACTGACTAGGGCAGTGACGC
AAAGACAAGCCACTGGCGCTCGAAGTGGCCACATTTCAAGACCCAAGCGGCTCTACCCCAGACTGGACTG
ATGTGGATATTCCATTGCTGCCTAGCCCTTGGAGACGATGCGATTCGCTCTCCCTGGAGGGTAATTTGTC
AATAGATCCCATGAGACATCGAGATTGATTGAGGGCCGGTAAGGCTGAAACGGACAATGATTCAGTCATA
AAGTGTAGTGGTCACTTGATGTTGAGATGCGACTTGCCTGCAAATCGCTGTGTGCTTCGTGTACAGTCAC
TATTTGGATAACATCGAGTCTGGAGTTTTCTTTCAAGCGCCAGTTATCACAGTACTTTGAGTTTTTCTGT
TTATTGGTTGACCACCCAAAGCATCCGATTTCACGGAAGGGACACGATGGCGGTCTGCACTTATTTCCGT
GATTGAACCGGTTCAAATTAGAAAACGGGCGACAAAGTGCCACGTGCTATGCCATCGAGTATCACTTACG
AGGTTCTACCATGCTGGATGTAGGGAGCGGGGCAAAAGCCATAATTGCTGCTTTTCGGTGCCAGGTCGTA
CAAGATACGGAAAGGCATGTGTACCATACATGCACCAAGATTTCAACCTGCGGTGTAGTTGTGGCCCCCT
TCTTTGACAACACATCCCTAGATTACCCCTAAACGCTTTCCCTCAGCTTACCAGACATAATCTGCCTCTT
ATCTTATTCAGTCAGCCACG
>HWI-EAS240_0001:2:70:591:1131#0/1
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
-------------------------------------------------GCTTTTCGGTGCCAGGTCGTA
CAAGATACGGAAAGG
>HWI-EAS240_0001:2:22:830:70#0/1
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
-----------------------------------------------------TTCGGTGCCAGGTCGTA
CAAGATACGGAAAGGCATG
>HWI-EAS240_0001:2:59:1080:696#0/1
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
-------CGGAAAGGCATGTGCACCATACATGCACCAAGATTT
>HWI-EAS240_0001:2:16:926:1755#0/1
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
---------GAAAGGCATGTGCACCATACATGCACCAAGATTTCA

Ideally I would like an aligned consensus sequence of the reads to the reference. Any ideas?

Thanks!

John
jgibbons1 is offline   Reply With Quote
Old 01-08-2010, 10:11 AM   #2
gpertea
Member
 
Location: Maryland, US

Join Date: Jan 2010
Posts: 21
Default

I don't know about Mosaik but I worked a lot with ace files produced by assemblers like CAP3, Phrap etc. and if Mosaik is a real assembler and follows the same format then in the ace file the sequence block following the "CO " line should be the consensus (assembly) sequence that you are after, e.g.:
Code:
CO CL10003_1 836 2 0 U
ACAACTCAGCTAAGATACAGAAAAAGTCAAGAAACAAACAATAGTGAATGCTAGGAAGGC
AAAAAGAGAGATTCAATCAACATACCAAGCTCCATGTAAGTATATCAAGTATATTAACGA
CATTGAAAGCCACAAGATAGCCATTCCTTTAGCTCTTAAATGGAAAAACGACCTAACCCC
CATTGAAAGCCACAAGATA ... etc. (until an empty line)
So if you want just to pull these consensus sequences you can use a simple script taking all these sequence blocks found between a line starting with "CO " and the next empty line, something like this (using perl and assuming Linux/Unix as your platform):
Code:
{
local $/="\n\n";
while (<>) {
  s/^(\s+)//s;
  if (length>20 && s/^CO\s+(\S+)[^\n]+//s) { 
    chomp;
    print ">$1\n";
    tr/-*\n//d; # removing gaps and line endings
    # reformat and print sequence:
    print join("\n", (unpack('(A72)*',$_)))."\n";
    }
 }# while text blocks
}
gpertea is offline   Reply With Quote
Old 01-22-2010, 12:19 AM   #3
kajsa
Member
 
Location: Uppsala, Sweden

Join Date: Jan 2010
Posts: 19
Default .ace consensus sequence

Hello,

I am also assembling Solexa data to a reference using MOSAIK and I am looking for a program to create a consensus sequence from an .ace file.

But I don't think the sequence after the "CO" is the consensus of the assembled reads. I think it is the reference sequence? Correct me if I am wrong.

Doe's anyone have a good solution to this?

Thank you!

Kajsa
kajsa is offline   Reply With Quote
Old 01-22-2010, 07:28 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

For traditional de-novo assembly, the contig consensus should be stored in the sequence lines just after the CO line (as discussed above). Getting this sequence (and removing any gaps which are stored as * characters) to convert into FASTA format is quite easy.

A Python alternative using Biopython 1.52 or later would be just two lines:
Code:
from Bio import SeqIO
SeqIO.convert("input.ace", "ace", "output.fasta", "fasta")
However, if you are mapping reads onto an existing reference, it sounds like the CO section contains the original reference (plus probably some extra gap characters where an insertion was required to map a read). The ACE file doesn't explicitly contain a consensus of the reference and reads - you would have to look at the aligned reads, and calculate one. This is possible, but much more complicated. For example, how would you proceed if the middle of the reference had no reads mapped to it?

I may have misunderstood, but I think you want to look at reference guided assembly instead of mapping reads onto a reference.
maubp is offline   Reply With Quote
Old 01-22-2010, 07:39 AM   #5
kajsa
Member
 
Location: Uppsala, Sweden

Join Date: Jan 2010
Posts: 19
Default

Yeas that's correct, I am mapping reads onto an existing reference. I guess I have to calculate the consensus myself in some way. If no one else already have done a program for that.......?

Kajsa
kajsa is offline   Reply With Quote
Old 01-22-2010, 08:02 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by kajsa View Post
Yeas that's correct, I am mapping reads onto an existing reference. I guess I have to calculate the consensus myself in some way. If no one else already have done a program for that.......?
Do you want a consensus if the reference AND the reads, or just the reads?

Either way, if you know Python this cookbook entry on treating an ACE file as an alignment might be a useful start. Once you have a Biopython alignment object it would be easier to calculate the consensus: http://www.biopython.org/wiki/ACE_contig_to_alignment
maubp is offline   Reply With Quote
Old 01-22-2010, 11:51 AM   #7
kajsa
Member
 
Location: Uppsala, Sweden

Join Date: Jan 2010
Posts: 19
Default

I want to have a consensus of the assembly of only the reads I recon. Thank you for your help, I will look at that coockbook entry.

Kajsa
kajsa is offline   Reply With Quote
Old 01-22-2010, 12:10 PM   #8
jjohnson
Member
 
Location: Washington DC Metro Area

Join Date: Aug 2009
Posts: 20
Default

I find the AMOS toolset convenient for converting between many differnt assembly outputs/inputs.

So, in this case you would use

toAmos which yields a amos Bank file (useful in using Hawkeye for viewing if you like) and the bonus is that you can input information on scaffolding, library sizes, etc and this will be maintained through conversions.

from this bank file, you could convert to several differnt formats, dump the consensus fasta, reads, etc.
__________________
Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
jjohnson is offline   Reply With Quote
Old 01-23-2010, 10:31 AM   #9
kajsa
Member
 
Location: Uppsala, Sweden

Join Date: Jan 2010
Posts: 19
Default

That also looks very interesting. I will look at that too.

Thanks!

Kajsa
kajsa is offline   Reply With Quote
Old 01-23-2010, 10:10 PM   #10
skycreative
Member
 
Location: GuangXi China

Join Date: Jan 2010
Posts: 27
Default

really powerful code, thanks and study the coding ...
skycreative is offline   Reply With Quote
Old 01-25-2010, 04:10 AM   #11
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

But the problem remains: ACE format does not consider and thus does not store the consensus sequence ...
You should probably use a different alignment format?

cheers,
Sven
sklages is offline   Reply With Quote
Old 01-26-2010, 05:26 AM   #12
kajsa
Member
 
Location: Uppsala, Sweden

Join Date: Jan 2010
Posts: 19
Default

Sven: Do you have any suggestions on what alignment format to use?

Kajsa
kajsa is offline   Reply With Quote
Old 01-26-2010, 06:44 AM   #13
jjohnson
Member
 
Location: Washington DC Metro Area

Join Date: Aug 2009
Posts: 20
Default

It looks like you are trying to generate a consensus fasta file from a mapping/alignment, and you are using ace only because it is what is produced by mosaik (mosaik also produces sam/bam, but not useful for producing a consensus from a mapping as far as I can tell). One option is to use the Amos tools to produce a fasta consensus from the ace (after converting to the native amos format .bnk) . Most alignment formats do not include the reference and do not generate or recall a consensus from a multiple sequence alignment as far as I know, since most people only want to do variant detection downstream of the mapping. Mosaik will create a consensus of the mapping assembly, but only stores it in ace format. It will also produce SAM/BAM if you want to use the SAMtools downstream for further analysis.
__________________
Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
jjohnson is offline   Reply With Quote
Old 02-09-2010, 05:13 AM   #14
kajsa
Member
 
Location: Uppsala, Sweden

Join Date: Jan 2010
Posts: 19
Default

We didn't get the Amos tools to work on our mac computers. But we found a solution: We can export a BAM file and then use SAM tools to extract the consensus sequence.
kajsa is offline   Reply With Quote
Old 02-09-2010, 02:23 PM   #15
niazi84@hotmail.com
Member
 
Location: Uppsala

Join Date: Jan 2010
Posts: 25
Default

Quote:
Originally Posted by kajsa View Post
We didn't get the Amos tools to work on our mac computers. But we found a solution: We can export a BAM file and then use SAM tools to extract the consensus sequence.
I think you are talking about using 'pileup' command with -c (consensus calling parameter). Am i right?

I also have .ace file from mosaik but i dont know whats the next step to do with it. Can you guide me?
__________________
~Adnan~
niazi84@hotmail.com is offline   Reply With Quote
Old 02-09-2010, 02:40 PM   #16
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

If you have an ACE file from MOSAIK, you should have the original (sorted) alignment from MOSAIK, which you can easily use to create a BAM file with MosaikText,

Quote:
------------------------------------------------------------------------------
MosaikText 1.0.1388 2010-02-01
Michael Stromberg Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

Description: exports reads and alignments from MOSAIK.

Usage: MosaikText -in <filename> -out <filename> -ia <filename>

Read Archive Options:
-ir <MOSAIK read filename> the input read file
-fastq <FASTQ filename> stores the data in a FASTQ file
-screen displays the reads on the screen

Alignment Archive Options:
-in <MOSAIK alignment filename> the input alignment file
-axt <axt filename> stores the data in an AXT file
-bam <bam filename> stores the data in a BAM file
-bed <bed filename> stores the data in a BED file
-eland <eland filename> stores the data in an Eland file
-ref <reference sequence name> displays output for a specific reference
-sam <sam filename> stores the data in a SAM file
-screen displays the alignments on the screen
-u limit output to unique reads
$ MosaikText -in PROJ_sorted.dat -bam PROJ.bam

Then use samtools or whatever you want :-)
sklages is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:47 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO