Hello!
I have a bit of a conundrum with BLAST, of all things. I have it installed locally on a Mac. I have individual FASTA files of sequences from NCBI (about 300 tiny ones, each an individual gene from [GeneFamily1]). I have a reference genome of [ReferenceOrganism] (though it has about a dozen files associated with it). I also have EST sequences from [DiffOrganism] (which is somewhat closely related to the reference organism), from which I'm going to mine its own versions of the gene family.
So my plan was to BLAST all 300 [GeneFamily1] variations against [ReferenceOrganism] to find the most similar [GeneFamily1] genes in [ReferenceOrganism]. I will then pull those sequences out and BLAST THEM against the EST sequences I have from the somewhat closely related [DiffOrganism]. I'm working with a polyploid, so I'm expecting an expansion of [GeneFamily1] compared to its diploid reference.
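To make the plan concrete, here is a minimal Python sketch of the two-step search as I currently picture it. It only builds and prints the command lines; it does not run BLAST. The file names, database names, and the e-value cutoff are placeholders I made up for illustration, and I am assuming nucleotide-versus-nucleotide searches with blastn. Which reference file should go in `reference_fasta` is exactly the part I'm unsure about.

```python
import shlex


def makeblastdb_cmd(fasta, db_name, dbtype="nucl"):
    """Command line that builds a BLAST database from a FASTA file.

    My understanding is that makeblastdb wants an uncompressed FASTA,
    so a .fa.gz download would need to be gunzipped first.
    """
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", db_name]


def blastn_cmd(query, db_name, out_file, evalue="1e-5"):
    """Command line for a nucleotide search with tabular output (outfmt 6).

    The e-value cutoff is an arbitrary placeholder, not a recommendation.
    """
    return ["blastn", "-query", query, "-db", db_name,
            "-evalue", evalue, "-outfmt", "6", "-out", out_file]


def plan(family_fasta, reference_fasta, reference_hits_fasta, est_fasta):
    """Return the four commands of the two-step search, in order.

    Between step 2 and step 3 the hit sequences still have to be pulled
    out of the reference into reference_hits_fasta; that extraction is
    not shown here.
    """
    return [
        # Step 1: database from the reference organism
        makeblastdb_cmd(reference_fasta, "reference_db"),
        # Step 2: gene family sequences against the reference
        blastn_cmd(family_fasta, "reference_db", "family_vs_reference.tsv"),
        # Step 3: database from the ESTs of the related organism
        makeblastdb_cmd(est_fasta, "est_db"),
        # Step 4: reference hits against the ESTs
        blastn_cmd(reference_hits_fasta, "est_db", "hits_vs_ests.tsv"),
    ]


if __name__ == "__main__":
    # All four file names below are hypothetical.
    for cmd in plan("family_all.fa", "reference.fa",
                    "reference_hits.fa", "ests.fa"):
        print(shlex.join(cmd))
```

The sketch assumes the 300 individual FASTA files have already been concatenated into a single multi-FASTA query file, which BLAST accepts as a query.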
Alright, so I poked around BLAST, and as far as I can tell it definitely needs a database. No problem, I just need to download my reference genome from Phytozome and run makeblastdb, right?
Except I downloaded the reference and have over a dozen files somehow. There are three "assembly" files (plant.fa.gz, plant.hardmasked.fa.gz, plant.softmasked.fa.gz) and twelve "annotation" files (plant.annotation_info.txt, plant.cds_primaryTranscriptOnly.fa.gz, plant.cds.fa.gz, plant.defline.txt, plant.gene_exons.gff3.gz, plant.gene.gff3.gz, plant.protein_primaryTranscriptOnly.fa.gz, plant.protein.fa.gz, plant.repeatmasked_assembly_2.gff3.gz, plant.synonym.txt, plant.transcript_primaryTranscriptOnly.fa.gz, plant.transcript.fa.gz).
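Going by the extensions alone, the files seem to fall into a few types. Here is a small Python sketch of that guess. The categories are my own assumption from the file names (`.fa` for FASTA, `.gff3` for GFF3 annotation coordinates, "protein" in the name for protein sequences), not something I have confirmed against the download's documentation.

```python
from collections import defaultdict

# The fifteen file names from my download, as listed above.
FILES = [
    "plant.fa.gz", "plant.hardmasked.fa.gz", "plant.softmasked.fa.gz",
    "plant.annotation_info.txt", "plant.cds_primaryTranscriptOnly.fa.gz",
    "plant.cds.fa.gz", "plant.defline.txt", "plant.gene_exons.gff3.gz",
    "plant.gene.gff3.gz", "plant.protein_primaryTranscriptOnly.fa.gz",
    "plant.protein.fa.gz", "plant.repeatmasked_assembly_2.gff3.gz",
    "plant.synonym.txt", "plant.transcript_primaryTranscriptOnly.fa.gz",
    "plant.transcript.fa.gz",
]


def classify(filename):
    """Guess a file's type from its name only (an assumption, not a check)."""
    name = filename.lower()
    if name.endswith((".gff3", ".gff3.gz")):
        return "gff3"              # feature coordinates, not sequences
    if name.endswith((".fa", ".fa.gz")):
        if ".protein" in name:
            return "protein_fasta"     # would be -dbtype prot
        return "nucleotide_fasta"      # would be -dbtype nucl
    return "other"                 # plain-text tables


def group(files):
    """Group file names by their guessed type."""
    groups = defaultdict(list)
    for f in files:
        groups[classify(f)].append(f)
    return dict(groups)


if __name__ == "__main__":
    for kind, names in group(FILES).items():
        print(kind)
        for n in names:
            print("   ", n)
```

If that guess is right, only the FASTA files could be given to makeblastdb at all, which narrows the question to which of those is the right one for my purpose.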
I have read the user manual to see what commands to give makeblastdb, but I have no idea which files I should be using, or why some of the files I have are important. Alternatively, I'm not sure whether what I want to do is even the correct way to go about mining this blasted gene family from my polyploid ESTs.