Hello there,
for most of you my problem is probably trivial, but somehow I am still struggling with it. So here is the problem.
I have data from several Illumina sequencing runs: 7 or 8 samples, each containing between 20 and 30 million reads. After clipping and merging, I want to run blastx on these reads against viral nr, because I am mainly interested in viruses. In a perfect world I would use Diamond, because it is so much faster than BLAST for aligning short reads against a reference. But Diamond is also my worst nightmare when I try to import its output into MEGAN. When I import the output of a normal blastx run, it works perfectly: I import the BLAST output, the reads and the GI mapping file and get a nice tree. With the Diamond output, however, not a single read is assigned to any taxon.
Here is my Diamond command:
./diamond blastx -d viralprot -q ~/D4439_megan_analyse/D4439merged.fa --sam ~/diamond_output/diamondout.sam
As output I get a SAM file, and when I try to import it into MEGAN the terror begins (it does not work with the BLAST-format output from Diamond either).
Quote from the manual:
MEGAN can now parse files in SAM format [10]. Note, however, that SAM files usually do not contain the names of the taxa associated with the reference sequences and so one must supply suitable mapping files that map identifiers used for the reference sequences to NCBI taxa, KEGG, COG and/or SEED identifiers, see below.
I understand that I must provide a mapping file that tells MEGAN which identifiers belong to which NCBI taxa, but I have no idea which file I have to load. Am I just missing the point here? Where can I find a suitable file? It would be very nice if someone could help me with this. I have been struggling with it for a long time, and being able to work with Diamond would be such an improvement.
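In case it helps to see what the mapping file would have to cover, here is a minimal sketch for listing the reference identifiers that actually occur in the SAM file. The function name `sam_ref_ids` is hypothetical and only for illustration. The sketch assumes nothing more than the SAM layout itself: header lines start with `@`, and the third tab-separated column of each record holds the reference name (`*` for unaligned reads).

```shell
# Hypothetical helper: print "count<TAB>reference_id" for every reference
# name found in a SAM file, most frequently hit references first.
sam_ref_ids() {
    # Skip header lines ('@...') and unaligned records (reference name '*'),
    # then count how often each reference name (column 3) occurs.
    awk -F'\t' '
        !/^@/ && $3 != "*" { n[$3]++ }
        END { for (id in n) printf "%d\t%s\n", n[id], id }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Calling `sam_ref_ids ~/diamond_output/diamondout.sam | head` would show the most frequently hit identifiers, which can then be compared with the identifiers the mapping file expects.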
With kind regards and thanks in advance,
Julian