Hello,
I'm new to bioinformatics and am trying to use the RATT annotation tool to transfer annotation from an already sequenced bacterial strain to another very closely related one.
RATT has run successfully, but I'm not sure how to interpret the results.
The main goal of my project is to find out how many (and which) genes are common to both strains and which genes are unique to either strain.
RATT generated .final.embl files, .embl files, .report files, .txt files and .mutations.gff files.
The overview of the results looks like this:
Overview of transfere of annotation elements:
3500 elements found.
2543 Elements were transfered.
0 Elements could be transfered partially.
61 Elements split.
957 Parts of elements (i.e.exons tRNA) not transferred.
957 Elements couldn't be transferred.
1. What should I do next? Should I assume that the 2543 transferred genes are found in both strains?
2. Should I manually check the 957 genes that were not transferred to ascertain whether they are present or absent in each strain?
3. Since my query file is the SOAPdenovo assembly .scafseq file, the sequences are named with 'C' numbers like C12345 and scaffolds like 'scaffold 234'. Should I look at the generated .final.embl files (e.g. C12345.final.emb) and assume that all the C and scaffold numbers that are missing there are present only in my query, not in the reference genome?
4. How do I determine the number of genes in my query genome with the help of RATT?
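For question 4, one thing I could try is simply counting the features in the .final.embl files. This is only a rough sketch and not something RATT itself provides: the function names, the `*.final.embl` glob pattern and the output directory are my own assumptions, and it relies on the EMBL convention that a feature starts on a line with `FT`, three spaces and then the feature key. It would only count what RATT transferred, not genes that exist only in my query.

```python
import re
from collections import Counter
from pathlib import Path

# EMBL feature table: "FT", three spaces, then the feature key (CDS, tRNA, ...).
# Qualifier/continuation lines have spaces where the key would be, so they do not match.
FEATURE_KEY = re.compile(r"^FT {3}(\S+)\s")


def count_features(embl_text):
    """Count feature keys (e.g. CDS, tRNA) in the text of one EMBL file."""
    counts = Counter()
    for line in embl_text.splitlines():
        match = FEATURE_KEY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_features_in_dir(directory, pattern="*.final.embl"):
    """Sum feature counts over all files matching the (assumed) pattern."""
    total = Counter()
    for path in sorted(Path(directory).glob(pattern)):
        total += count_features(path.read_text())
    return total
```

Calling `count_features_in_dir("ratt_output")` (a hypothetical directory name) and reading the `"CDS"` entry would give the number of transferred coding sequences across all contigs and scaffolds.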
If there's a better way to use RATT to pick out the genes common to both strains and ascertain the total number of genes, please let me know.
As I said, I'm new to bioinformatics and would really appreciate any help.
bgansw