05-05-2014, 08:34 PM   #1
bioman1
Getting nuclear genome

I have plant genomic reads (Illumina paired-end reads, HiSeq 2000, WGS approach). I would like to recover the nuclear genome by removing probable contaminating sequences (mitochondrial, chloroplast, bacterial, and vector sequences). After removing them, I would like to do a de novo assembly.

I would like to know a good workflow for this; please help by suggesting one. My planned workflow is:

1. I am planning to map the filtered Illumina paired-end reads (filtered with Trimmomatic) to the Arabidopsis thaliana mitochondrial and chloroplast genomes using BWA, and then extract the unmapped reads using samtools.

2. The unmapped reads will be treated as nuclear reads. Vector and bacterial contamination is then removed by mapping against the UniVec database (http://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/) and bacterial genomes (http://www.ncbi.nlm.nih.gov/genomes/...l_taxtree.html).
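To make the plan concrete, the two steps above could be scripted roughly as follows. This is only a sketch under assumptions: it combines all contaminant references into one FASTA, the file names (cp.fa, mt.fa, r1.fq, r2.fq, the "organelle" prefix) and the helper name unmapped_pair_commands are hypothetical, and it assumes a samtools version that provides the "fastq" subcommand. It only builds the shell commands as strings and does not run them.

```python
import shlex


def unmapped_pair_commands(references, r1, r2, prefix, threads=4):
    """Build shell commands that keep read pairs where neither mate maps.

    references: FASTA files of sequences to screen out (hypothetical names).
    r1, r2:     trimmed paired-end FASTQ files.
    prefix:     prefix for the combined reference and the output FASTQ files.
    """
    q = shlex.quote
    combined = f"{prefix}.ref.fa"
    out1 = f"{prefix}_R1.fastq"
    out2 = f"{prefix}_R2.fastq"
    return [
        # Concatenate all contaminant references into a single FASTA file.
        "cat " + " ".join(q(r) for r in references) + " > " + q(combined),
        # Build the BWA index for the combined reference.
        "bwa index " + q(combined),
        # Map the pairs, keep records where read AND mate are unmapped (-f 12),
        # drop secondary alignments (-F 256), and write paired FASTQ files.
        # bwa mem keeps mates adjacent in its output, so no name sort is added.
        f"bwa mem -t {int(threads)} {q(combined)} {q(r1)} {q(r2)}"
        " | samtools view -b -f 12 -F 256 -"
        f" | samtools fastq -1 {q(out1)} -2 {q(out2)} -0 /dev/null -s /dev/null -n -",
    ]


if __name__ == "__main__":
    # Step 1: organelle screen; step 2 would reuse the same function with
    # UniVec and bacterial FASTA files and the step-1 output as input.
    for cmd in unmapped_pair_commands(["cp.fa", "mt.fa"], "r1.fq", "r2.fq", "organelle"):
        print(cmd)
```

The same function could be called a second time with the UniVec and bacterial FASTA files as references and the organelle-screened FASTQ files as input, so that both steps share one filtering rule.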

My questions are about how to map the reads:

(i) Do I need to index each reference genome individually, or can I combine the chloroplast and mitochondrial genomes into one reference? If I need to index them separately, how can I get the reads that map to neither the chloroplast nor the mitochondrial genome as paired-end FASTQ files?
(ii) After getting the unmapped reads, how can I remove bacterial and vector contamination from them?
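For the paired-end part of question (i), my understanding is that the decision comes down to the SAM flag of each record: a pair stays intact in the output only if both mates are unmapped. A minimal sketch of that rule, assuming the standard SAM flag bits (0x4 read unmapped, 0x8 mate unmapped, 0x100 secondary alignment); the function name keep_as_clean is hypothetical, and the rule is meant to mirror "samtools view -f 12 -F 256":

```python
# Standard SAM flag bits used by the filter.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100


def keep_as_clean(flag):
    """Return True if the record belongs to a pair where neither mate aligned."""
    both_unmapped = (flag & READ_UNMAPPED) and (flag & MATE_UNMAPPED)
    return bool(both_unmapped) and not (flag & SECONDARY)


if __name__ == "__main__":
    # 77 and 141: first and second mate of a pair with both mates unmapped.
    # 73: read mapped, mate unmapped. 69: read unmapped, mate mapped.
    # 99: properly paired, mapped read.
    for flag in (77, 141, 73, 69, 99):
        print(flag, keep_as_clean(flag))
```

Requiring both bits means a pair is discarded as soon as one mate hits the contaminant reference, which keeps the two output FASTQ files in sync. A less strict rule (discard only when both mates map) would need different flag handling.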