Hi all,
I am new to bioinformatics analysis. I have a bacterial genome sequenced on the PacBio RS II platform (20 kb library); the assembly yielded 5 contigs (mean coverage 57.25) with the following lengths:
unitig_0 5490211 bp (coverage: 56.87)
unitig_1 246309 bp (coverage: 72.21)
unitig_2 146220 bp (coverage: 63.34)
unitig_3 141768 bp (coverage: 44.39)
unitig_4 15343 bp (coverage: 13.3)
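For reference, here is a minimal Python sketch of how the coverage figures above relate to each other. It only re-uses the numbers listed; treating unitig_0 as the chromosome (because it is by far the largest contig) is my assumption, and the function names are just illustrative.

```python
# Contig lengths (bp) and coverages copied from the assembly summary above.
contigs = {
    "unitig_0": (5490211, 56.87),
    "unitig_1": (246309, 72.21),
    "unitig_2": (146220, 63.34),
    "unitig_3": (141768, 44.39),
    "unitig_4": (15343, 13.3),
}


def weighted_mean_coverage(contigs):
    """Length-weighted mean coverage across all contigs."""
    total_bases = sum(length for length, _ in contigs.values())
    weighted_sum = sum(length * cov for length, cov in contigs.values())
    return weighted_sum / total_bases


def coverage_relative_to(contigs, reference="unitig_0"):
    """Coverage of each contig divided by the coverage of the reference contig.

    Assumption: the reference (unitig_0 by default) is the chromosome.
    """
    ref_cov = contigs[reference][1]
    return {name: cov / ref_cov for name, (_, cov) in contigs.items()}


mean_cov = weighted_mean_coverage(contigs)
ratios = coverage_relative_to(contigs)

print(f"length-weighted mean coverage: {mean_cov:.2f}")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the coverage of unitig_0")
```

The length-weighted mean reproduces the reported 57.25, and unitig_4 comes out at roughly 0.23x the coverage of unitig_0, which is the gap I am referring to when I call its coverage low.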
When I checked the contigs with a dot plot (Gepard), the first four (unitig_0 to unitig_3) all showed overlapping ends, while unitig_4 appears to be one large block of repeats. Since its coverage is so low, is it safe to discard it?
I wanted to find out whether any of the smaller contigs are plasmids, so I submitted the assembled genome to PlasmidFinder with an identity cutoff of 95%. PlasmidFinder reported a perfect match on unitig_1 (identity 100%, query length with hit: 439, position 218092..218530) to Klebsiella pneumoniae plasmid pNDM-MAR. However, when I ran a BLAST search of unitig_1 against the NCBI nucleotide database, the top hit was Klebsiella pneumoniae subsp. pneumoniae PittNDM01 plasmid1 (86% query cover, 32493/32688 (99%) identity, max score 59273), whereas the alignment to Klebsiella pneumoniae plasmid pNDM-MAR was slightly weaker (84% query cover, 31159/31388 (99%) identity, max score 56691). Both hits have the same E-value (0.0).
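To put the two results side by side, here is a small Python sketch of the arithmetic, using only the numbers quoted above. I am assuming the PlasmidFinder coordinates are 1-based and inclusive (that is what makes them agree with the reported hit length of 439), and the variable names are only illustrative.

```python
CONTIG_LENGTH = 246309  # length of unitig_1 in bp

# PlasmidFinder hit on unitig_1.
# Assumption: coordinates are 1-based and inclusive.
hit_start, hit_end = 218092, 218530
hit_length = hit_end - hit_start + 1
hit_fraction = hit_length / CONTIG_LENGTH

# BLAST identity counts as reported: (matching bases, aligned bases).
blast_hits = {
    "PittNDM01 plasmid1": (32493, 32688),
    "pNDM-MAR": (31159, 31388),
}
identity = {
    name: 100 * matches / aligned
    for name, (matches, aligned) in blast_hits.items()
}

print(f"PlasmidFinder hit: {hit_length} bp "
      f"({100 * hit_fraction:.2f}% of unitig_1)")
for name, pct in identity.items():
    print(f"BLAST identity to {name}: {pct:.2f}%")
```

So the PlasmidFinder match spans 439 bp, under 0.2% of unitig_1, while the BLAST hits cover 84-86% of the contig, and the two BLAST identities differ by only about 0.13 percentage points. The two tools are therefore comparing very different amounts of sequence, which may be part of why they rank the plasmids differently.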
I am not sure which result I should trust. Should I build a phylogenetic tree to find the closest match to unitig_1? And is there a more accurate tool for identifying plasmids in the genome?
Thanks a million for helping.