SEQanswers

Old 10-22-2009, 02:46 PM   #1
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 436
building a Mosaik reference for Mouse

Can anybody point me in the right direction for building a single, full-genome reference for the mouse that I can then align my Illumina read data to? I have FASTA reference files for the mouse, one per chromosome, which I downloaded from UCSC. I can pass those one at a time to MosaikBuild to produce a .dat file for each chromosome, but that seems a little crazy because I'd then have to run each lane of data against every chromosome, one at a time.

If this is how other people do it then that's totally fine - it just seems like I should be able to build a single reference file for the entire genome.
Old 10-22-2009, 05:20 PM   #2
snownebula
Junior Member
 
Location: Boston, MA

Join Date: Oct 2009
Posts: 9

Hi there,

All you have to do is create a concatenated FASTA file and you'll be all set with MOSAIK.

For example, if I wanted to combine the first four mouse chromosomes into one file, I could type:

cat mm_ref_chr1.fa >> mouse_ref.fa
cat mm_ref_chr2.fa >> mouse_ref.fa
cat mm_ref_chr3.fa >> mouse_ref.fa
cat mm_ref_chr4.fa >> mouse_ref.fa

You could keep doing this for all of the mouse chromosomes or, if you're comfortable writing bash scripts, automate the above in a small script.
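A minimal sketch of such a script, assuming the files follow the mm_ref_chrN.fa naming used in the example above (the function name build_reference is made up for illustration). One thing to watch: `>>` appends, so running the commands twice would duplicate every chromosome. The sketch empties the output file first to avoid that.

```shell
# Concatenate per-chromosome FASTA files into one reference, in the order given.
# ASSUMPTION: input files are named mm_ref_chr<NAME>.fa in the current directory.
build_reference() {
    out="$1"
    shift
    : > "$out"                      # start from an empty file so re-runs do not duplicate data
    for chr in "$@"; do
        fa="mm_ref_chr${chr}.fa"
        if [ ! -f "$fa" ]; then
            echo "missing $fa" >&2  # stop rather than silently build a partial genome
            return 1
        fi
        cat "$fa" >> "$out"
    done
}

# Usage (mouse has chromosomes 1-19 plus X and Y):
# build_reference mouse_ref.fa 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y
```

Listing the chromosomes explicitly keeps them in karyotype order instead of whatever order a wildcard would give.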

Cheers,

// Michael
Old 10-23-2009, 09:15 AM   #3
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 436

Cool, thanks. It just wasn't clear in the documentation that you could simply cat files together to make one larger reference. Now I just need to figure out this jump database thing and I'll be off and running. Mosaik chews up some serious RAM and I've only got 16 GB on the system I'm running it on. It looks like a jump database will help for running full-genome alignments on this system.
Old 10-23-2009, 02:36 PM   #4
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 436

So I made this cat'd reference file (2.6 GB) and compiled it down. Then I made a jump database and started a run with MosaikAligner using the jump database of this full-genome reference. I ran pretty much the default settings listed in the manual, except with only 4 CPU cores. It munched up about 19 GB of RAM to load the jump database files into memory, but once the alignment actually started it wasn't using all 4 cores: it was using only about 4% of the CPU and processing only 3.5 reads per second, with an ETA of 53 DAYS. What could be going wrong?
Old 07-21-2010, 12:07 PM   #5
donniemarco
Member
 
Location: USA

Join Date: Aug 2009
Posts: 17
cat all files

Concatenating the files one at a time might get a little tedious. I tried:
cat chr*.fa >> human_ref.fa

It worked well.
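Two things are worth checking with the wildcard approach: the shell expands chr*.fa in lexical order (chr1, chr10, chr11, ... chr2), and `>>` appends to any human_ref.fa that already exists. A quick sanity check is to count and list the FASTA header lines in the result. This is a rough sketch; the function names are made up for illustration.

```shell
# Count the sequences in a FASTA file: every record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the sequence names in the order they appear in the file.
list_fasta_headers() {
    grep '^>' "$1" | sed 's/^>//'
}

# Usage:
# rm -f human_ref.fa              # make sure we are not appending to an old copy
# cat chr*.fa >> human_ref.fa
# count_fasta_records human_ref.fa
# list_fasta_headers human_ref.fa
```

If the count is higher than the number of input files, either an input file holds several sequences or the output was appended to twice.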
Old 07-21-2010, 03:02 PM   #6
mkeehan
Member
 
Location: Hamilton NZ

Join Date: Feb 2010
Posts: 13

Are you still using your system with 16GB of RAM?
You are probably swapping if it's using 19GB...
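A rough way to check this before starting a run, assuming a Linux box where /proc/meminfo is available (the function name fits_in_ram is made up for illustration, and the requirement in GB is whatever you expect the job to need):

```shell
# Report whether a job needing NEED_GB gigabytes fits in physical RAM.
# Reads MemTotal (in kB) from a meminfo-style file.
# ASSUMPTION: Linux; the second argument defaults to /proc/meminfo.
fits_in_ram() {
    need_gb="$1"
    meminfo="${2:-/proc/meminfo}"
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    need_kb=$((need_gb * 1024 * 1024))
    if [ "$total_kb" -ge "$need_kb" ]; then
        echo "ok: ${need_gb} GB fits in RAM"
    else
        echo "warning: ${need_gb} GB exceeds RAM, expect swapping"
        return 1
    fi
}

# Usage:
# fits_in_ram 19
```

During a run, the si/so columns of `vmstat 1` show swap activity directly; sustained non-zero values there would fit the very low CPU usage described earlier in the thread.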

I found reading the manual to get the right parameters made a huge difference to the reads per second. The magic parameters I found were
-bw 13 -act 20 -mm 4 -mhp 100
That took me from a few reads per second to 700-800 per second.
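For context, here is a sketch of where those parameters would sit on a MosaikAligner command line. Only -bw, -act, -mm and -mhp come from this post. The other flags (-in, -out, -ia, -j, -p) are how I recall the MOSAIK 1.x manual and should be checked against your version, and every file name is a placeholder. The function only prints the command so it can be inspected before running.

```shell
# Assemble (and print, not run) a MosaikAligner command using the parameters above.
# ASSUMPTIONS: -in/-out/-ia/-j/-p are recalled from the MOSAIK 1.x manual and may
# differ in your version; the function name and all file names are placeholders.
mosaik_align_cmd() {
    reads="$1"          # read archive, e.g. from MosaikBuild
    ref="$2"            # reference archive built from the concatenated FASTA
    jump="$3"           # jump database prefix
    out="$4"            # output alignment archive
    threads="${5:-4}"   # number of CPU cores to use
    echo "MosaikAligner -in $reads -out $out -ia $ref -j $jump -p $threads -bw 13 -act 20 -mm 4 -mhp 100"
}

# Usage:
# mosaik_align_cmd reads.dat mouse_ref.dat mouse_ref_jump aligned.dat 4
```

The right values depend on read length and genome, so treat the four numbers as a starting point rather than a recipe.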

I also needed around 20GB of RAM for the jump database.
Old 07-21-2010, 09:09 PM   #7
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 436

thanks for sharing the magic.