This might be a simple question. But since I'm a molecular archaeologist, I'm years behind the bioinfo times (or it feels that way), and I'm hoping this forum will be a good place to start.
I just got back a boatload of Illumina PE sequencing reads for a handful of species in one genus. To start any analysis, I need a reference genome in one neat little file (ok, maybe not 'little').
There are FTP sites (specifically, Sanger and NCBI genome) where I can access the sequence data for the three previously completed genomes in my genus of interest. But upon initial examination, each chromosome is represented by files with the following eleven extensions: *.asn *.faa *.ffn *.fna *.frn *.gbk *.gff *.ptt *.rnt *.rpt *.val.
I know that faa, ffn, fna, etc. are all FASTA file formats with different types of information. Do I just need to cat the fna files for each chromosome?
Simply put, how do I build the one file to rule them all? And is this how others have approached creating a reference genome file, to index in BWA, for example?
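To make the question concrete, here is roughly what I have in mind: a minimal sketch, assuming each *.fna file holds the nucleotide sequence of one chromosome (or plasmid) as a FASTA record. The function name `build_reference` and the file names are just placeholders I made up, and I have not confirmed this is the recommended approach.

```python
def build_reference(fna_paths, out_path):
    """Concatenate per-chromosome FASTA files into one multi-FASTA file.

    Returns the number of FASTA records (header lines) written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path in fna_paths:
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1  # one header per sequence record
                    out.write(line)
                    last = line
            # If a file lacks a trailing newline, add one so the next
            # header does not get glued onto the end of this sequence.
            if not last.endswith("\n"):
                out.write("\n")
    return n_records
```

I would call it as something like `build_reference(["chr1.fna", "chr2.fna"], "reference.fna")` (hypothetical file names) and then run `bwa index reference.fna` on the result. As far as I can tell this is the same as a plain `cat` of the fna files, apart from the guard against a missing final newline.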
Any insight is appreciated!