SEQanswers




03-22-2012, 06:25 PM #1
jmartin
Member
 
Location: St. Louis

Join Date: Dec 2009
Posts: 74
GATK newbie question

I'm trying to run the GATK Unified Genotyper (UG) on a set of BAM files that I have coordinate-sorted, but when I run the UG it gives me this error message:

"Input files reads and reference have incompatible contigs: Order of contigs differences, which is unsafe."

From what I've read, I believe I need to reorder my reference FASTA file to match the contig order in the header of my coordinate-sorted BAM files, but I'm not sure how to do that. Is there existing code that will reorder my reference file to match a given BAM file's contig order?
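A minimal in-memory sketch of the reordering asked about here, in Python rather than Perl: read the contig names from the `@SQ` lines of the BAM header (for example the text written by `samtools view -H sample.bam > header.sam`), then write the FASTA records in that order. The helper names (`sq_names`, `read_fasta`, `reorder_fasta`) and file names are hypothetical. It assumes each record's name is the first word after `>` (which is what aligners typically copy into `SN:`), and it holds the whole reference in memory, so it suits a single-genome reference rather than the full 7.3 GB file.

```python
from typing import Dict, Iterable, List, TextIO


def sq_names(header_lines: Iterable[str]) -> List[str]:
    """Contig names from the @SQ lines of a SAM header, in header order."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def read_fasta(handle: TextIO) -> Dict[str, List[str]]:
    """Map each record name (first word after '>') to its lines, header included.
    Every line is normalised to end in a newline so records can be concatenated."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.rstrip("\n") + "\n"
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = [line]
        elif name is not None:
            records[name].append(line)
    return records


def reorder_fasta(fasta: TextIO, order: List[str], out: TextIO) -> None:
    """Write the records named in `order`, in that order.
    Raises KeyError if `order` names a contig the FASTA lacks."""
    records = read_fasta(fasta)
    for name in order:
        out.writelines(records[name])
```

Records in the FASTA that the header does not list are dropped. Usage would look like `reorder_fasta(open("ref.fa"), sq_names(open("header.sam")), open("ref.reordered.fa", "w"))`; the reordered reference would then need fresh `.fai` and `.dict` files (e.g. from `samtools faidx` and Picard `CreateSequenceDictionary`).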

I'm using a reference that comprises ~5k small genomes, some of which are in pieces (~188k sequence records in total). The file is 7.3 GB.

I also think that maybe (probably?) I need to pull out single references and run the UG on one reference at a time. It's a metagenomic project, and I was hoping to get results for the whole thing in one go, but that might not be realistic. Even if I pull out single, well-covered genome references, though, some of them are in hundreds of pieces themselves, so I'd still need a way to order my reference. I could probably write something in Perl to do this, but I'm not a strong coder, and I'm worried I'd run into memory issues trying to hash 188k sequences and juggle them around.
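On the memory worry: one way to avoid hashing 188k sequences is to hash only where each record sits in the file. The sketch below makes one pass to record the byte offsets of every record, then seeks to each requested record and copies it in chunks, so memory holds names and offsets rather than sequence. Passing a subset of names doubles as a way to pull out single genomes. The function names are hypothetical, and record names are assumed to be the first word after `>`. If samtools is available, `samtools faidx ref.fa contigA contigB` (placeholder names) prints the named sequences from an indexed FASTA in the order given, which covers the same need.

```python
from typing import BinaryIO, Dict, Iterable, Tuple


def index_offsets(path: str) -> Dict[str, Tuple[int, int]]:
    """One pass over the FASTA, recording (start, end) byte offsets per record.
    Only the name and two integers per record are kept in memory."""
    offsets: Dict[str, Tuple[int, int]] = {}
    name = None
    start = pos = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                if name is not None:
                    offsets[name] = (start, pos)
                name = line[1:].split()[0].decode()
                start = pos
            pos += len(line)
    if name is not None:
        offsets[name] = (start, pos)
    return offsets


def write_records(path: str, names: Iterable[str], out: BinaryIO,
                  chunk: int = 1 << 20) -> None:
    """Copy the named records, in the order given, by seeking to each one.
    Raises KeyError for a name that is not in the FASTA."""
    offsets = index_offsets(path)
    with open(path, "rb") as fh:
        for name in names:
            start, end = offsets[name]
            fh.seek(start)
            remaining = end - start
            last = b""
            while remaining > 0:
                data = fh.read(min(chunk, remaining))
                if not data:
                    break
                out.write(data)
                last = data[-1:]
                remaining -= len(data)
            # The file's final record may lack a trailing newline; add one so
            # it does not run into the next record written.
            if last != b"\n":
                out.write(b"\n")
```

Combined with the `@SQ` names from the BAM header, `write_records("ref.fa", names, open("ref.reordered.fa", "wb"))` would write the reference in header order without loading the sequence itself.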

Can anyone offer me some guidance on this?
03-27-2012, 02:43 PM #2
spreeth84
Junior Member
 
Location: Boston

Join Date: Jan 2011
Posts: 9

Is this what you are looking for? Picard ReorderSam:
http://picard.sourceforge.net/comman...tml#ReorderSam

You can reorder your reads to match the contig order of your reference.
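Whether reordering is needed at all (and whether it worked) comes down to the condition behind the GATK error quoted above: the BAM header and the reference should list their shared contigs in the same relative order. A rough sketch of that comparison, assuming the two name lists are already in hand (BAM names from the `@SQ` lines of `samtools view -H`, reference names from the first column of the `.fai` index). `first_order_difference` is a hypothetical helper and only approximates GATK's check; it is not GATK's code.

```python
from typing import Optional, Sequence, Tuple


def first_order_difference(
    bam_contigs: Sequence[str], ref_contigs: Sequence[str]
) -> Optional[Tuple[int, str, str]]:
    """Compare the relative order of contigs present in both lists.

    Returns (index, bam_name, ref_name) at the first disagreement among the
    shared contigs, or None if the shared contigs appear in the same order.
    """
    in_ref = set(ref_contigs)
    in_bam = set(bam_contigs)
    # Keep only contigs found in both lists, each in its own file's order.
    shared_bam = [c for c in bam_contigs if c in in_ref]
    shared_ref = [c for c in ref_contigs if c in in_bam]
    for i, (b, r) in enumerate(zip(shared_bam, shared_ref)):
        if b != r:
            return i, b, r
    return None


# Example: the BAM lists g2 before g1, the reference lists g1 first.
# first_order_difference(["g2", "g1"], ["g1", "g2"]) returns (0, "g2", "g1")
```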
03-27-2012, 02:54 PM #3
jmartin
Member
 
Location: St. Louis

Join Date: Dec 2009
Posts: 74

I will take a look at that, thanks for the suggestion. I had thought my alignment SAM file needed to be sorted in coordinate order, so I assumed it was my reference that needed reordering.
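The SAM specification defines `SO:coordinate` relative to the header: reads are ordered by reference sequence, in the order of the `@SQ` lines, then by leftmost position. So "coordinate-sorted" doesn't imply any fixed contig order, and reads sorted against one `@SQ` order can be re-sorted against the reference's order instead. A toy sketch, with reads reduced to hypothetical `(contig, position)` tuples and unmapped reads ignored; it illustrates the ordering rule, not how Picard's ReorderSam is implemented.

```python
from typing import List, Sequence, Tuple

# Toy stand-in for a mapped SAM record: (contig name, 1-based leftmost position).
Read = Tuple[str, int]


def coordinate_sorted(reads: Sequence[Read], contig_order: Sequence[str]) -> List[Read]:
    """Sort by (rank of contig in contig_order, position), i.e. coordinate
    order as defined relative to a header's @SQ list."""
    rank = {name: i for i, name in enumerate(contig_order)}
    return sorted(reads, key=lambda r: (rank[r[0]], r[1]))


reads = [("g1", 5), ("g2", 9), ("g1", 1)]
# Sorted against a header listing g2 first:
#   [("g2", 9), ("g1", 1), ("g1", 5)]
# Sorted against a reference listing g1 first:
#   [("g1", 1), ("g1", 5), ("g2", 9)]
# Both results are "coordinate-sorted", each relative to its own contig list.
```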

I've been having more luck, though, extracting single references and working on them one at a time. When I do that, the reference FASTA is small enough that a simple Perl script can reorder it. I may just give up on trying to call SNPs on my whole subject database in a single go.
03-27-2012, 03:43 PM #4
spreeth84
Junior Member
 
Location: Boston

Join Date: Jan 2011
Posts: 9

The reordering can only be done after sorting: you will still have to sort your reads (and reference) by coordinate first. I guess that doesn't solve the reference sorting problem you're facing.