How to perform a Genome Assembly with Rnnotator?

rohitngs

#1

10-18-2012, 12:12 AM

Hello all,

This is my first thread. I hope it is not a redundant query.

I am a beginner in NGS data analysis. I work on a server with 32 GB of allotted RAM. I have been asked to assemble and annotate a highly repetitive, unassembled eukaryotic genome (silkworm).

Since the memory is too low, I have used Rnnotator on my RNA-seq data. I have Illumina paired-end reads, 4 GB of data per end. The combined total is 33968114 reads (about 34 million). I do not know the exact coverage, as most of the genome is unannotated. The data I have been given are just reads that were sequenced before I joined.

My aim is to assemble the genome and see whether genes from any other organism (bacterial) occur at high frequency in my sample, so that I can study their interaction at the genome level. I planned to do this by first assembling the genome and then searching the resulting contigs for similarities. At a later stage I also want to look at the expression levels of the transcripts.

So I tried the assembly with Rnnotator, using default options, as follows:

$ perl /opt/Apps/Rnnotator-2.4.12/scripts/rnnotator.pl -strP 200 25_comb.fq -n 4 -trim off -kmer_length 25 -a oases -o /home/guest/Rohit/index_25/i25_k25

I tried different k-mer values ranging from 19 to 79. I did not find much change in the assembly from 21 to 65. I also used even k-mer values, which I should not have, but I needed to see whether they would change the result. The table of values is as follows:

K-mer k21 k22 k23 k25 k26 k27 k28 k31 k32 k33 k36 k41 k48 k51 k53 k57 k59 k66 k73 k77 k79

contigs 12325 12368 12399 12389 12345 12349 12339 12380 12354 12354 12332 12299 12276 12387 12366 12411 12358 12368 12379 12356

sequence total 10097 10118 10126 10132 10064 10093 10070 10114 10101 10095 10084 10079 10051 10124 10118 10120 10089 10110 10107 10101

Total bp 3991665 3995059 3990798 4003725 3964061 3981526 3957553 3975101 3971942 3970755 3965013 3973767 3973763 3970775 3992825 3956562 3957711 3970696 3970478 3986746

N50 Length(bp) 657 657 653 656 653 652 650 657 650 653 656 651 653 651 654 647 650 653 651 653

Largest contig 3530 3530 3528 3530 3528 3530 3530 3528 3528 3528 3530 3528 3530 3530 3528 3528 3528 3528 3530 3528

median contig 230 230 230 230 230 230 230 229 230 230 230 230 230 229 229 228 228 230 229 230
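The summary statistics in the table above (contig count, total bp, N50, largest and median contig) can be recomputed from any contig FASTA file, which helps when comparing runs. The following is only a minimal sketch in Python, not part of Rnnotator; the function names and the tiny in-memory example are hypothetical. It uses the usual definition of N50: the contig length at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembled bases.

```python
from statistics import median


def read_fasta_lengths(lines):
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length at which the longest-first running sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if 2 * running >= total:
            return n
    return 0


def assembly_stats(lengths):
    """Collect the same kind of summary values as the rows of the table."""
    lengths = list(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
        "largest": max(lengths, default=0),
        "median": median(lengths) if lengths else 0,
    }


# Hypothetical toy input; for real data pass an open file handle instead,
# e.g. open("contigs.fa"), where the file name is a placeholder.
example = [">c1", "ACGTAC", ">c2", "ACG", "T", ">c3", "AC"]
print(assembly_stats(read_fasta_lengths(example)))
```

Running the same function on the output of each k-mer run gives directly comparable numbers, although similar N50 values alone do not show which assembly is more correct.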

1) Which is the best one I can choose as my assembly? Is there one particular k-mer in this table?
Is it k-mer 25, since it has a high number of bases and contigs and a high N50?

2) Are there any additional options to get a more accurate assembly with Rnnotator?
I tried using the Oases assembler in Rnnotator instead of Velvet, but it gives an error.

3) Is there any other assembler which runs on low RAM of 8-32 GB?
Please do not suggest commercial ones.

--
Regards,
Rohit

Last edited by rohitngs; 10-18-2012, 01:37 AM.
rohitngs

#2

10-18-2012, 11:15 PM

There has been no reply so far, but I emailed the developers of Rnnotator and have just received this reply.

1. Best K-mer value for assembly
There is no single k-mer assembly that gives the best results for all genes. The software therefore runs multiple Velvet assemblies and then merges the resulting contigs using Minimus2. Merging the Velvet-assembled contigs gives a much better assembly than any single k-mer assembly.

2. Oases not running on Rnnotator
Most likely oases is not in your PATH or has not been installed. You can set the software path using the $PATH variable. Type "which oases" to see whether the software is available. I recommend using the Velvet assembler.
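The PATH check described in this point can also be scripted, which is handy when several external tools have to be present. This is a minimal sketch in Python, assuming the executables carry their usual names (`oases` for Oases, `velveth` and `velvetg` for Velvet); adjust the list if your installation names them differently. The helper `find_tools` is a hypothetical name, not part of Rnnotator.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Assumed executable names; change these to match your installation.
for tool, path in find_tools(["oases", "velveth", "velvetg"]).items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

A tool reported as not found either is not installed or lives in a directory that still has to be added to $PATH.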

3. Additional options for better results
There are several sets of advanced options for this, e.g., ADVANCED ASSEMBLY OPTIONS, ADVANCED POLISHING OPTIONS and ACCURACY ASSESSMENT OPTIONS. You can experiment with these options to improve or assess the contigs.

4. Transcript levels of expression
If you use the --keep_rundir option, intermediate files are kept in the rnnotator_run directory. counts.txt shows the number of reads that align to each gene (contig). This file provides an accurate estimation of gene expression levels.
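Raw read counts favour longer contigs and deeper sequencing runs, so they are often normalised before expression levels are compared. The following is a minimal sketch in Python of one common normalisation, reads per kilobase per million mapped reads (RPKM); it is not something Rnnotator is stated to do. The layout of counts.txt is not described in this thread, so the sketch takes already-parsed dictionaries, and the contig names and numbers below are made up for illustration.

```python
def rpkm(counts, lengths):
    """Return reads per kilobase per million mapped reads for each contig.

    counts  -- dict mapping contig name to number of aligned reads
    lengths -- dict mapping contig name to contig length in bp
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return {
        name: count * 1e9 / (lengths[name] * total)
        for name, count in counts.items()
    }


# Hypothetical values: equal read counts, but contig_b is twice as long.
example_counts = {"contig_a": 500, "contig_b": 500}
example_lengths = {"contig_a": 1000, "contig_b": 2000}
print(rpkm(example_counts, example_lengths))
```

In this toy case both contigs have the same raw count, but the longer one ends up with half the normalised value.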

5. Final output
The final output file is called final_contigs.fa, which contains de novo assembled final transcripts.

That covers the Rnnotator details for now. I would still appreciate any further advice on Rnnotator, or on other assemblers that run with low RAM.

Last edited by rohitngs; 10-19-2012, 12:24 AM.