SEQanswers

10-12-2011, 07:39 AM   #1
PatrickReed
Junior Member
 
Location: Chicago, Il

Join Date: Oct 2011
Posts: 2
Super Large Reference Genome

I am working on a project in which I am analyzing RNAseq data from fused interspecific cell types, specifically mouse cells and rat cells, and then performing. Being confident that a given read came from the mouse genome or the rat genome is crucial, so the optimal reference genome would be the union of mm9.fa and rn4.fa, but that is too large to build with bowtie/tophat. Is there any way to build this reference genome? Why is there a set limit on the size that a reference genome can be? Any help would be greatly appreciated. I know there are workarounds, such as performing alignments to one genome and then the other and looking at the differences and overlap, but this is not optimal.
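For reference, the two-pass workaround mentioned above could look roughly like the sketch below. It assumes the per-read mismatch counts of the best hit have already been pulled out of each alignment into plain dictionaries; the function name `classify_reads` and the input layout are made up for illustration and are not part of bowtie or tophat.

```python
def classify_reads(mouse_hits, rat_hits):
    """Assign each read to a species from two separate alignments.

    mouse_hits / rat_hits: dict mapping read name -> mismatch count (int)
    of the best alignment to that genome. Reads that did not align to a
    genome are simply absent from that dict. (Hypothetical input layout.)
    """
    calls = {}
    for read in sorted(set(mouse_hits) | set(rat_hits)):
        m = mouse_hits.get(read)
        r = rat_hits.get(read)
        if r is None or (m is not None and m < r):
            calls[read] = "mouse"      # only aligned to mouse, or better there
        elif m is None or r < m:
            calls[read] = "rat"        # only aligned to rat, or better there
        else:
            calls[read] = "ambiguous"  # equally good in both genomes
    return calls


# Toy data, not real alignments
mouse_hits = {"r1": 0, "r2": 3, "r3": 1}
rat_hits = {"r2": 0, "r3": 1, "r4": 2}
calls = classify_reads(mouse_hits, rat_hits)
print(calls)
```

The drawback is the one already noted: each aligner run only sees one genome, so the comparison has to be done after the fact rather than letting the aligner pick the best location across both.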

Cheers,
10-12-2011, 08:56 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Are you trying to use a single concatenated sequence? If so, why not use a multi-entry FASTA file containing both the rat and the mouse chromosomes?
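A minimal sketch of building such a multi-entry FASTA is below. Both genomes have a chr1, chr2 and so on, so this version prefixes each header with a species label to keep the names unique. The helper `merge_fasta` and the `mm9_`/`rn4_` prefixes are just illustrative choices, and the demo uses tiny in-memory strings in place of the real mm9.fa and rn4.fa files.

```python
import io


def merge_fasta(sources, out):
    """Write several FASTA streams into one multi-entry FASTA.

    sources: iterable of (prefix, text_stream) pairs. A header such as
    '>chr1' is rewritten to '>prefix_chr1' so mouse and rat names differ.
    """
    for prefix, handle in sources:
        for line in handle:
            if not line.endswith("\n"):
                line += "\n"  # guard against a missing final newline
            if line.startswith(">"):
                out.write(">" + prefix + "_" + line[1:])
            else:
                out.write(line)


# Tiny stand-ins for the real genome files
mouse = io.StringIO(">chr1\nACGT\n>chr2\nGGCC\n")
rat = io.StringIO(">chr1\nTTAA")
merged = io.StringIO()
merge_fasta([("mm9", mouse), ("rn4", rat)], merged)
print(merged.getvalue())
```

With real files you would pass open file handles instead of `io.StringIO` objects. This only solves the naming problem; it does not get around any total-size limit in the aligner.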

The SAM/BAM format itself has a limit of 2^31 - 1 base pairs for each reference sequence, or about 2Gbp (2 billion base pairs). In theory this could be raised to 2^32 - 1 or about 4Gbp but it would cause trouble for Java tools. However, you are much more likely to hit a limitation in the current BAM indexing scheme (BAI files) of 512Mbp (or half a billion base pairs), which is a problem for some organisms - but not for mice, rats or humans!
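To make those numbers concrete, here is a small sketch that checks a table of reference sequence lengths against the two limits above. The function name and the example lengths are hypothetical; only the limits themselves come from the discussion.

```python
SAM_MAX_REF_LEN = 2**31 - 1  # per-reference limit in SAM/BAM (about 2 Gbp)
BAI_MAX_REF_LEN = 2**29      # 512 Mbp limit of the BAI indexing scheme


def check_reference_lengths(lengths):
    """Return (too long for SAM/BAM, too long for BAI) sequence names.

    lengths: dict mapping sequence name -> length in base pairs.
    """
    too_long_sam = [n for n, size in lengths.items() if size > SAM_MAX_REF_LEN]
    too_long_bai = [n for n, size in lengths.items() if size > BAI_MAX_REF_LEN]
    return too_long_sam, too_long_bai


# Made-up lengths: a typical mammalian chromosome, a chromosome from a
# large-genome organism, and a whole genome concatenated into one sequence
example = {
    "typical_chr": 200_000_000,
    "large_chr": 600_000_000,
    "concatenated": 3_000_000_000,
}
sam_problems, bai_problems = check_reference_lengths(example)
print(sam_problems, bai_problems)
```

Note that both limits apply per reference sequence, not to the whole file, which is another reason to keep the chromosomes as separate FASTA entries.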

Perhaps there is some other limiting factor in bowtie/tophat as well?
10-12-2011, 09:32 AM   #3
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

Yes, bowtie indices use 32-bit unsigned integers, which limits them to about 4 billion bases (4 Gbp).
Going up to 64-bit integers would double the memory requirement and probably also slow down the alignment process.
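A quick back-of-the-envelope sketch of that arithmetic follows. The genome sizes are assumed round figures of about 2.7 Gbp each, not exact assembly sizes, and the offset table is only a stand-in to show why wider integers double the memory; it is not a description of bowtie's actual index layout.

```python
UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned integer can hold


def total_bases(fasta_lines):
    """Count sequence characters in FASTA lines, skipping header lines."""
    return sum(len(line.strip()) for line in fasta_lines
               if not line.startswith(">"))


def fits_in_uint32(n_bases):
    """True if every position in the reference fits in 32 bits."""
    return n_bases <= UINT32_MAX


def offset_table_bytes(n_positions, bytes_per_int):
    """Size of a table holding one integer offset per position."""
    return n_positions * bytes_per_int


# Assumed round figures, for illustration only
mouse_bp = 2_700_000_000
rat_bp = 2_700_000_000
combined = mouse_bp + rat_bp

print(fits_in_uint32(mouse_bp))   # one genome on its own
print(fits_in_uint32(combined))   # both genomes together
print(offset_table_bytes(combined, 8) / offset_table_bytes(combined, 4))
```

Under these assumptions either genome fits on its own, but the union does not, which matches the failure when building the combined index.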

You could extend bowtie to allow larger genomes - the SeqAn library it uses should even make this a pretty straightforward endeavour. Better ask the authors how to go about it though.
10-12-2011, 10:07 AM   #4
PatrickReed
Junior Member
 
Location: Chicago, Il

Join Date: Oct 2011
Posts: 2

Thanks ffinkernagel, that's really helpful. I'll start looking at the SeqAn library and try to get in contact with the authors.

Tags
bowtie-build, reference genome
