SEQanswers

09-21-2015, 07:34 AM   #1
HeidiLee
Member
 
Location: Earth

Join Date: Jul 2011
Posts: 20
.dict file created by picard and by samtools

There are 2 ways of generating a .dict file for the human genome:

java -jar picard.jar CreateSequenceDictionary REFERENCE=reference.fa OUTPUT=reference.dict
samtools faidx ref.fasta

I found that the 2 files generated by the 2 different commands have different file sizes.

I need a .dict file for GATK and Picard. Which one is the correct one?

Thank you very much.
09-21-2015, 07:59 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,795

Are you sure samtools faidx is creating a .dict file? As I recall, it only creates a .fai file.
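For reference, a .fai index is one tab-separated line per sequence with five columns (name, length, byte offset of the first base, bases per line, bytes per line including the newline), which is a different file from the .dict header that GATK and Picard read. Below is a minimal Python sketch of how those columns are derived, assuming an ASCII FASTA with Unix newlines and a fixed line width within each sequence; `fai_entries` is a hypothetical helper, not part of samtools.

```python
def fai_entries(fasta_text):
    """Return (name, length, offset, linebases, linewidth) for each sequence.

    Assumes ASCII text (so character counts equal byte counts), Unix
    newlines, and a constant line width within each sequence.
    """
    entries = []
    offset = 0       # byte offset of the start of the current line
    current = None   # [name, length, seq_offset, linebases, linewidth]
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            if current is not None:
                entries.append(tuple(current))
            name = line[1:].split()[0]  # name = first word of the header
            current = [name, 0, offset + len(line), 0, 0]
        elif current is not None:
            bases = line.rstrip("\n")
            if current[3] == 0:  # the first sequence line fixes the widths
                current[3] = len(bases)
                current[4] = len(line)
            current[1] += len(bases)
        offset += len(line)
    if current is not None:
        entries.append(tuple(current))
    return entries


fasta = ">chr1 test\nACGT\nAC\n>chr2\nGGGG\nGGGG\n"
for entry in fai_entries(fasta):
    print("\t".join(map(str, entry)))
# prints (tab-separated): chr1 6 11 4 5, then chr2 8 25 4 5
```

There are no @SQ, LN or M5 fields here, only byte offsets for random access into the FASTA.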
08-24-2018, 12:45 AM   #3
dvdleest
Junior Member
 
Location: the netherlands

Join Date: Aug 2018
Posts: 1

There are actually 2 ways of generating .dict files.

Either
Code:
java -jar picard.jar CreateSequenceDictionary REFERENCE=reference.fa OUTPUT=reference.dict
or
Code:
samtools dict reference.fa -o reference.dict
Both commands are supposed to produce the exact same .dict file. However, because of a bug in older versions of samtools dict (since fixed), there can be subtle differences: in some contigs, positions whose bases fall outside the usual [ACGT] character set were dropped, resulting in different sequence lengths and a different md5sum.
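A .dict file is a SAM header whose @SQ lines carry each contig's name (SN), length (LN) and MD5 checksum (M5); per the SAM specification, M5 is computed on the sequence after uppercasing it and removing characters outside ASCII 33-126. The Python sketch below assumes the contig is already joined into one string without whitespace, and `drop_non_acgt` is a hypothetical stand-in for the old behaviour described above, not the actual samtools code. It illustrates why losing non-ACGT positions changes both LN and M5.

```python
import hashlib


def dict_fields(seq):
    """Return (LN, M5) for one contig, following the SAM spec's M5 rule."""
    # Remove characters outside ASCII 33-126 and uppercase the rest.
    cleaned = "".join(c for c in seq if 33 <= ord(c) <= 126).upper()
    return len(cleaned), hashlib.md5(cleaned.encode("ascii")).hexdigest()


def drop_non_acgt(seq):
    """Hypothetical model of the old bug: keep only A/C/G/T (any case)."""
    return "".join(c for c in seq if c.upper() in "ACGT")


contig = "ACGTNNNNacgtRYAC"  # contains N and the IUPAC codes R and Y
correct = dict_fields(contig)
buggy = dict_fields(drop_non_acgt(contig))
print(correct[0], buggy[0])    # 16 10
print(correct[1] == buggy[1])  # False
```

Lowercase (soft-masked) bases do not change M5, but any dropped N or IUPAC code does.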

So, to answer your question ("which one is [the] correct one?"): Picard works correctly, so use either Picard or an updated version of samtools to create the FASTA sequence dictionary.
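To check whether two dictionaries really disagree, comparing SN, LN and M5 contig by contig is more informative than comparing file sizes, since header details such as the @HD line or the UR path may differ between tools. A minimal Python sketch that ignores those fields:

```python
def parse_dict(text):
    """Map each @SQ name (SN) to its (LN, M5) pair."""
    contigs = {}
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header lines
        tags = dict(field.split(":", 1)
                    for field in line.split("\t")[1:] if field)
        contigs[tags["SN"]] = (int(tags["LN"]), tags.get("M5"))
    return contigs


def compare_dicts(text_a, text_b):
    """List (name, fields_a, fields_b) for contigs whose LN or M5 differ."""
    a, b = parse_dict(text_a), parse_dict(text_b)
    return [(name, a.get(name), b.get(name))
            for name in sorted(a.keys() | b.keys())
            if a.get(name) != b.get(name)]
```

For example, `compare_dicts(open("picard.dict").read(), open("samtools.dict").read())` (hypothetical file names) returns an empty list when every contig matches; any contig it lists is one where the two tools computed a different length or checksum.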

Last edited by dvdleest; 08-24-2018 at 01:11 AM.
All times are GMT -8.