Hello
I have two sets of FASTQ files (control and treatment) from Illumina DNA sequencing of a bacterial strain. The strain is new and has no very closely related strains. I used the CLC Genomics Workbench to assemble contigs de novo and then used Sequencher to improve the quality and length of the CLC contigs. I now need to compare these two sets of contigs (FASTA sequences) to find similarities and differences. The treatment introduced an insertion into the genome, and I want to study how it affects the genome and/or its genes. Since I don't have a way to perform a genome-wide study, is there a way to run a pairwise Control_contigs vs. Treatment_contigs search to study the differences? And how easy or hard would it be to parse the result files from the suggested tools?
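One way to do the pairwise search is to BLAST the treatment contigs against the control contigs and look for stretches of treatment sequence that no control contig covers, since an inserted sequence should appear as such an uncovered region. BLAST+ tabular output is tab-separated, so it is simple to parse. The Python sketch below assumes (hypothetically) that blastn was run with `-outfmt "6 qseqid sseqid pident length qstart qend qlen"`; the file names and the `min_pident`/`min_gap` thresholds are illustrative, not recommended values.

```python
"""Sketch: report regions of treatment contigs not covered by any BLAST hit
against the control contigs. Assumes (hypothetically) a run such as
  blastn -query treatment.fasta -subject control.fasta \
         -outfmt "6 qseqid sseqid pident length qstart qend qlen" -out hits.tsv
so every row has these seven tab-separated columns in this order."""

from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def uncovered_regions(lines, min_pident=95.0, min_gap=50):
    """Return {query_id: [(start, end), ...]} listing stretches of each query
    contig not covered by any hit with identity >= min_pident.
    Uncovered stretches shorter than min_gap bases are ignored."""
    covered = defaultdict(list)
    qlens = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, _sseqid, pident, _length, qstart, qend, qlen = fields[:7]
        qlens[qseqid] = int(qlen)
        if float(pident) >= min_pident:
            lo, hi = sorted((int(qstart), int(qend)))
            covered[qseqid].append((lo, hi))

    gaps = {}
    for qid, qlen in qlens.items():
        pos, found = 1, []  # pos = first query base not yet accounted for
        for start, end in merge_intervals(covered[qid]):
            if start - pos >= min_gap:
                found.append((pos, start - 1))
            pos = max(pos, end + 1)
        if qlen - pos + 1 >= min_gap:
            found.append((pos, qlen))  # uncovered tail of the contig
        if found:
            gaps[qid] = found
    return gaps


if __name__ == "__main__":
    with open("hits.tsv") as fh:  # hypothetical output file name
        for contig, regions in uncovered_regions(fh).items():
            print(contig, regions)
```

Note that treatment contigs with no hits at all never appear in BLAST tabular output, so they have to be found separately, for example by comparing the query IDs seen in the hits file against the FASTA headers.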
Statistics: contig sizes range from 100 bases to 128 kb in both sets, with most between 5 and 15 kb. Each set has around 140 contigs. Most contigs have a GC content of about 40%. Some contigs contain Ns. Average contig length: 12 kb.
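Statistics like these are easy to recompute for both sets so the control and treatment assemblies can be compared directly. Below is a minimal pure-Python sketch that reads a FASTA file and reports contig count, length range, mean length, the number of contigs containing N, and per-contig GC content. The input file name is hypothetical, and GC is computed over unambiguous A/C/G/T bases only, which is one choice among several.

```python
"""Sketch: summary statistics for a contig FASTA file (count, length range,
mean length, contigs containing N, per-contig GC fraction). Pure Python."""


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """GC fraction over A/C/G/T only; N and other ambiguity codes are excluded."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def contig_stats(lines):
    records = list(read_fasta(lines))
    if not records:
        raise ValueError("no FASTA records found")
    lengths = [len(seq) for _, seq in records]
    return {
        "n_contigs": len(records),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": sum(lengths) / len(lengths),
        "n_with_N": sum(1 for _, seq in records if "N" in seq.upper()),
        "gc_per_contig": {name: gc_fraction(seq) for name, seq in records},
    }


if __name__ == "__main__":
    with open("control_contigs.fasta") as fh:  # hypothetical file name
        stats = contig_stats(fh)
    print({k: v for k, v in stats.items() if k != "gc_per_contig"})
```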
PS: I have tried CD-HIT-EST-2D and standalone BLAST+, but the results were inconclusive or the jobs failed.
Thanks,