Hello all,
This is my first thread. I hope this is not a redundant query.
I am a beginner in NGS data analysis. I work on a server with 32 GB of allotted RAM. I have been asked to assemble and annotate a highly repetitive, unassembled eukaryotic genome (silkworm).
Since the memory is too low, I have used Rnnotator on my RNA-seq data. I have Illumina paired-end reads, with 4 GB of data per end. The combined read count is 33,968,114 (about 34 million). I do not know the coverage exactly, as most of the genome is unannotated. The data I have been given are just reads that were sequenced before I joined.
My aim is to assemble the genome and see if there are any genes from other organisms (bacterial) with a high occurrence in my sample, so that I can work on their interaction at the genome level. I wanted to do this by first assembling the genome and then searching for similarities of the contigs produced. I also want to look at the expression levels of transcripts at a later stage.
So I tried the assembly with Rnnotator with default options as follows:
$ perl /opt/Apps/Rnnotator-2.4.12/scripts/rnnotator.pl -strP 200 25_comb.fq -n 4 -trim off -kmer_length 25 -a oases -o /home/guest/Rohit/index_25/i25_k25
I tried different k-mer values from 19 to 79. I did not find much change in the assembly from 21 to 65. I also used even k-mer values, which I should not have, but I needed to see whether they would change the results. The table of values is as follows:
K-mer k21 k22 k23 k25 k26 k27 k28 k31 k32 k33 k36 k41 k48 k51 k53 k57 k59 k66 k73 k77 k79
contigs 12325 12368 12399 12389 12345 12349 12339 12380 12354 12354 12332 12299 12276 12387 12366 12411 12358 12368 12379 12356
sequence total 10097 10118 10126 10132 10064 10093 10070 10114 10101 10095 10084 10079 10051 10124 10118 10120 10089 10110 10107 10101
Total bp 3991665 3995059 3990798 4003725 3964061 3981526 3957553 3975101 3971942 3970755 3965013 3973767 3973763 3970775 3992825 3956562 3957711 3970696 3970478 3986746
N50 Length(bp) 657 657 653 656 653 652 650 657 650 653 656 651 653 651 654 647 650 653 651 653
Largest contig 3530 3530 3528 3530 3528 3530 3530 3528 3528 3528 3530 3528 3530 3530 3528 3528 3528 3528 3530 3528
median contig 230 230 230 230 230 230 230 229 230 230 230 230 230 229 229 228 228 230 229 230
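For anyone comparing these numbers, here is a minimal sketch of how statistics like the ones in the table are commonly computed from a list of contig lengths. The function name `assembly_stats` and the toy lengths are only for illustration; they are not Rnnotator output, and Rnnotator may define N50 slightly differently.

```python
from statistics import median


def assembly_stats(lengths):
    """Summarise contig lengths: count, total bp, N50, largest, median.

    N50 here is the length of the contig at which the running sum of
    lengths (sorted longest first) reaches at least half of the total bp.
    This is the common definition, assumed rather than taken from Rnnotator.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "n50": n50,
        "largest": ordered[0],
        "median": median(ordered),
    }


# Toy example with made-up contig lengths (not from my data).
toy_lengths = [100, 200, 300, 400, 500]
stats = assembly_stats(toy_lengths)
print(stats)
```

With statistics this close across k-mer values, the same function could be run on each assembly's contig lengths to check whether the differences are larger than rounding noise.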
1) Which is the best one I can choose as my assembly? Is there one particular k-mer in this table?
Is it k-mer 25, as it has a high number of bases and contigs and a high N50?
2) Are there any additional options to get a more accurate assembly with Rnnotator?
I tried using the Oases assembler in Rnnotator instead of Velvet but it is giving an error.
3) Is there any other assembler which runs on low RAM of 8-32 GB?
Please do not suggest commercial ones.
--
Regards,
Rohit