![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Is a server with 512 GB RAM enough for denovo genome assembly? | ngs_per | Bioinformatics | 15 | 03-26-2014 11:04 PM |
cuffmerge assembly vs denovo assembly of RNAseq data | skm | Bioinformatics | 0 | 10-16-2013 10:16 PM |
Denovo assembly | Thenna | Bioinformatics | 2 | 05-06-2013 07:09 AM |
Denovo assembly problem | huma Asif | Illumina/Solexa | 1 | 03-27-2013 10:20 PM |
denovo assembly | nagaraj | Bioinformatics | 5 | 07-11-2012 07:13 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: US Join Date: May 2012
Posts: 80
|
![]()
Hi all,
I am trying to assemble plant mitochondria genome. The method I follow is to extract mitochondria reads from genomic reads (sequenced WGS approach using hiseq 2000, illumina paired-end reads) 1. I have downloaded all mitochondrial genomes of plants and indexed as reference genome using BWA 2. The raw paried-end reads were filtered (adapter & low quality reads filtered) which passed fastqc tool test. The fastqc passed filtered reads were interleaved using using perl script and used as single-end sequence. These single-end sequence were mapped to mitochondiral reference genome using BWA 3. Then mapped reads are extracted using samtools -F 4 option and got output in bam format 4. Using picard, bam format converted to fastq format 5.Before doing denovo assembly, I checked with fastqc, it failed in following (i)FAIL-Per sequence GC content (ii)FAIL-Sequence Duplication Levels (iii)FAIL-Overrepresented sequences (iv)FAIL-Kmer Content My questions (i) what I can I improve the reads before denovo assembly of mitochondrial reads? (ii) Which better tool to assembly mitochondrial genome velvet or soapdenovo?. How much kmer size can be used? |
![]() |
![]() |
![]() |
#2 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
![]()
I would try MITObim, which uses the MIRA assembler iteratively:
https://github.com/chrishah/MITObim http://dx.doi.org/10.1093/nar/gkt371 |
![]() |
![]() |
![]() |
#3 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#4 |
Member
Location: New Zealand Join Date: Mar 2014
Posts: 20
|
![]()
It's always good to have more than one way to do something to add confidence that your results are correct.
So for a completely different approach to MITObim, why not just de novo assemble your reads? Since mitochondrial coverage levels are typically much higher than other genomic data, you only need to use a small fraction of a full data set to achieve sufficient mitochondrial coverage. This has worked well for me when I tried it. For example, with no trimming or other quality control, I took an entire Illumina Genome Analyzer II chimpanzee shotgun sequencing dataset and de novo assembled 5% of the data (which works out at around 1.4 million pairs) with default settings in Geneious apart from turning the allow circular contigs setting on. It produced around 260,000 contigs, the largest of which was a 16,500 bp circular contig which agrees with the known chimpanzee mitochondrial genome apart from SNPs. The whole process only took a few minutes of my time and just over an hour of CPU time. It may or may not work as well for plants and/or other assemblers but it's worth a go since it's such a simple solution. Note - for anyone not familiar with Geneious, it is commercial software with a 2 week trial, so there's plenty of time to freely use it on your own data sets. Here are the exact steps I did with http://www.ncbi.nlm.nih.gov/sra/ERX012402 1) Drag and drop the 2 fastq files into Geneious 2) Select them both and go to 'Sequence -> Set Paired Reads' and set the insert size to 250 3) Select 'Align/Assemble -> De Novo Assemble' 4) Turned on "Use x% of the data" checkbox, setting x to 5. 5) In the advanced options turned on the circular contigs option 6) Clicked OK and a little over an hour later I had my circular mitochondrial sequence. |
![]() |
![]() |
![]() |
#5 |
Member
Location: US Join Date: May 2012
Posts: 80
|
![]()
@Matt kearse: Did you use 2 fastqfiles of only mitchondrial reads?
--------------------------------------------- @maubp I find difficult in installing MITObim, can you please help me? I am trying to use MITObim for mitchondrial genome assembly project. I have tried tutorial provided in MITObim:https://github.com/chrishah/MITObim I have installed Mira 3.4.1.1 and MITObim 1.6. Mapping assembly with MIRA 3.4.1.1 was sucessful, however wrapper script MITObim 1.6 stops at iteration 1, it should iterate upto 8 according the the tutorial, it says error "mirabait seems to have failed". What will be the reason mapping assembly with MIRA 3.4.1.1: mira --project=initial-mapping- testpool-to-Salpinus-mt --job=mapping,genome,accurate,solexa -MI:somrnl=0 -AS:nop=1 -SB:bft=fasta:bbq=30:bsn=Salpinus-mt-genome SOLEXA_SETTINGS -CO:msr=no -SB:dsn=testpool for MITObim 1.6: MITObim.pl -start 1 -end 10 -strain testpool -ref Salpinus_mt_genome -readpool initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq -maf initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf &> log below is the log file: MITObim - mitochondrial baiting and iterative mapping version 1.6 author: Christoph Hahn, (c) 2012-2013 Full command run: ../MITObim_1.6.pl -start 1 -end 10 -strain testpool -ref Salpinus_mt_genome -readpool initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq -maf initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf All paramters seem to make sense: startiteration: 1 enditeration: 10 strainname: testpool refname: Salpinus_mt_genome readpool: /PATH/MITObim-master/testdata1/initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq maf: /PATH/Downloads/MITObim-master/testdata1/initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf quick: 0 paired: 0 denovo: 0 (mapping=0, denovo=1) noshow: 0 read trimming: 0 (off=0, on=1) kmer baiting: 31 platform: SOLEXA clean: 0 (off=0, on=1) proofread: 0 Starting MITObim ============== ITERATION 1 ============== May 8 10:20:53 recover backbone by running convert_project on maf file Parsing special MIRA parameters: SOLEXA_SETTINGS -CO:fnicpst=yes Ok. Loading from maf, saving to: fasta fastaqual First counting reads: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Now loading and processing data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.Localtime: Thu May 8 10:20:53 2014 Generated 1692 unique template ids for 2548 valid reads. Localtime: Thu May 8 10:20:53 2014 Seeing strain 1: "testpool" Seeing strain 2: "Salpinus-mt-genome" Generated 2 unique strain ids for 2548 reads. makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc ... [100%] Data conversion process finished, no obvious errors encountered. SC Read:: read name issorted (1) capacity 4294967295(4) size 1 SC Read:: scf path name issorted (1) capacity 255(1) size 1 SC Read:: exp path name issorted (1) capacity 255(1) size 1 SC Read:: machine type issorted (1) capacity 255(1) size 1 SC Read:: primer issorted (1) capacity 255(1) size 1 SC Read:: strain issorted (1) capacity 255(1) size 3 SC Read:: base caller issorted (1) capacity 255(1) size 1 SC Read:: dye issorted (1) capacity 255(1) size 1 SC Read:: process status issorted (1) capacity 255(1) size 1 SC Read:: clone vector name issorted (1) capacity 65535(2) size 1 SC Read:: sequencing vector name issorted (1) capacity 65535(2) size 1 SC asped issorted (1) capacity 4294967295(4) size 1 fishing readpool using mirabait (k = 31) mirabait seems to have failed - see detailed output above |
![]() |
![]() |
![]() |
#6 |
Member
Location: US Join Date: May 2012
Posts: 80
|
![]()
@Matt kearse: Did you use 2 fastqfiles of only mitchondrial reads?
--------------------------------------------- @maubp I find difficult in installing MITObim, can you please help me? I am trying to use MITObim for mitchondrial genome assembly project. I have tried tutorial provided in MITObim:https://github.com/chrishah/MITObim I have installed Mira 3.4.1.1 and MITObim 1.6. Mapping assembly with MIRA 3.4.1.1 was sucessful, however wrapper script MITObim 1.6 stops at iteration 1, it should iterate upto 8 according the the tutorial, it says error "mirabait seems to have failed". What will be the reason mapping assembly with MIRA 3.4.1.1: mira --project=initial-mapping- testpool-to-Salpinus-mt --job=mapping,genome,accurate,solexa -MI:somrnl=0 -AS:nop=1 -SB:bft=fasta:bbq=30:bsn=Salpinus-mt-genome SOLEXA_SETTINGS -CO:msr=no -SB:dsn=testpool for MITObim 1.6: MITObim.pl -start 1 -end 10 -strain testpool -ref Salpinus_mt_genome -readpool initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq -maf initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf &> log below is the log file: MITObim - mitochondrial baiting and iterative mapping version 1.6 author: Christoph Hahn, (c) 2012-2013 Full command run: ../MITObim_1.6.pl -start 1 -end 10 -strain testpool -ref Salpinus_mt_genome -readpool initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq -maf initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf All paramters seem to make sense: startiteration: 1 enditeration: 10 strainname: testpool refname: Salpinus_mt_genome readpool: /PATH/MITObim-master/testdata1/initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq maf: /PATH/Downloads/MITObim-master/testdata1/initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf quick: 0 paired: 0 denovo: 0 (mapping=0, denovo=1) noshow: 0 read trimming: 0 (off=0, on=1) kmer baiting: 31 platform: SOLEXA clean: 0 (off=0, on=1) proofread: 0 Starting MITObim ============== ITERATION 1 ============== May 8 10:20:53 recover backbone by running convert_project on maf file Parsing special MIRA parameters: SOLEXA_SETTINGS -CO:fnicpst=yes Ok. Loading from maf, saving to: fasta fastaqual First counting reads: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Now loading and processing data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.Localtime: Thu May 8 10:20:53 2014 Generated 1692 unique template ids for 2548 valid reads. Localtime: Thu May 8 10:20:53 2014 Seeing strain 1: "testpool" Seeing strain 2: "Salpinus-mt-genome" Generated 2 unique strain ids for 2548 reads. makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc makeIntelligentConsensus() complete calc ... [100%] Data conversion process finished, no obvious errors encountered. SC Read:: read name issorted (1) capacity 4294967295(4) size 1 SC Read:: scf path name issorted (1) capacity 255(1) size 1 SC Read:: exp path name issorted (1) capacity 255(1) size 1 SC Read:: machine type issorted (1) capacity 255(1) size 1 SC Read:: primer issorted (1) capacity 255(1) size 1 SC Read:: strain issorted (1) capacity 255(1) size 3 SC Read:: base caller issorted (1) capacity 255(1) size 1 SC Read:: dye issorted (1) capacity 255(1) size 1 SC Read:: process status issorted (1) capacity 255(1) size 1 SC Read:: clone vector name issorted (1) capacity 65535(2) size 1 SC Read:: sequencing vector name issorted (1) capacity 65535(2) size 1 SC asped issorted (1) capacity 4294967295(4) size 1 fishing readpool using mirabait (k = 31) mirabait seems to have failed - see detailed output above |
![]() |
![]() |
![]() |
#7 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
![]()
Double check the version of MIRA matches what the version of MITObim recommends. In particular MIRA 3 and 4 are very different, but the exact version may matter too. Also you could try the MIRA mailing list (assuming MITObim doesn't have its own forum/list).
Last edited by maubp; 10-03-2014 at 06:20 AM. Reason: typo |
![]() |
![]() |
![]() |
#8 | |
Member
Location: New Zealand Join Date: Mar 2014
Posts: 20
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#9 |
Member
Location: US Join Date: May 2012
Posts: 80
|
![]()
@maubp
I have used stable version of MIRA 3.4.1.1 and MITObim 1.6 (double checked I am using correct version). The problem it can find mirabait executables while running MITObim 1.6. I tried export MIRA path in .bashrc and also tried with --MIRAPATH option. --------------------- @Matt Kears: Well it seems easy method in getting mitochondrial as circular contigs. I don't understand how Geneious gets mitchondrial reads without mapping to any reference mitochondria reads or align the assembled contigs to reference mitochondria genome?. I think from your steps (i) Geneious first do denovo assemble the whole genomic reads (ii) then you use only 5% of them to form circular contigs, but how sure if it is from mitochondria? may be it from chloroplast reads (if plants). I am sorry for misinterpretation. Could you please explain a bit?. |
![]() |
![]() |
![]() |
#10 |
Senior Member
Location: Pathum Thani, Thailand Join Date: Nov 2009
Posts: 190
|
![]()
With plant mitochondria you aren't necessarily expecting a single cyclic strand of DNA (for example: http://www.ncbi.nlm.nih.gov/pubmed/?...Mutation+Rates).
|
![]() |
![]() |
![]() |
#11 |
Member
Location: Carrollton, GA Join Date: May 2011
Posts: 27
|
![]()
I was able to use Geneious to assemble mitochondrial DNA form Illumina reads of fish DNA but in a different way than that suggested by Matt Kearse. I started with a small number of mitochondrial genes from a related species and then assembled to each of the genes with the Map to Reference function with an "Iterate up to X times" option. This iteratively extends the assembly of the contig from the initial gene outward. Eventually, the resulting contigs will overlap giving rise to the entire mitochondria DNA sequence. This may be similar to what MITObim does? It may be necessary to only use a fraction of the reads if the mtDNA sequences are highly enriched.
|
![]() |
![]() |
![]() |
#12 | |
Member
Location: New Zealand Join Date: Mar 2014
Posts: 20
|
![]() Quote:
Yes you're right - you may get chloroplast contigs too if you sequenced a region of the plant that has high chloroplast levels. In this case you'll need to do some analysis of the contigs to determine which are chloroplast or mitochondria. |
|
![]() |
![]() |
![]() |
#13 |
Member
Location: Innsbruck Join Date: Jul 2010
Posts: 28
|
![]()
Dears, I found this thread interesting and I'm trying to use this procedure.
I have 140M RNAseq reads in paired-ends (140M+140M) too much. MIRA give me clear warning, too much coverage. So I'm splitting reads in blocks in order to get a better coverage. Everything works, but as expected each read block gives a differente result and, because I'm using RNAseq data I have mainly sequences from coding genes. That means I have uncomplete contigs. What I'd like to do, to use as much information as possible, is to assemble all these condings (one for each read block) and obtain a sort of consensus. Do you have any advise how to do it? I was trying to use CAGOB (wgs-assembler), but since contigs have "x" (gaps) apparently it does not work! Thank you so much for your help Francesco |
![]() |
![]() |
![]() |
#14 |
Senior Member
Location: Pathum Thani, Thailand Join Date: Nov 2009
Posts: 190
|
![]()
You need to use a transcriptome assembler (SOAPdenovo-Trans, Trinity, Trans-abyss etc.), of course a genome assembler will fail with RNA-seq data, almost all of the assumptions regarding the data are violated.
|
![]() |
![]() |
![]() |
#15 | |
Member
Location: Innsbruck Join Date: Jul 2010
Posts: 28
|
![]() Quote:
I'm expecting to have a fragmented reconstruction of the genome, with no coverage on polyadenylated transcripts or shorter than my insert and that is what I have. I just would like take advantage of my high amount of reads and determine the most accurate reconstruction using replicas. F |
|
![]() |
![]() |
![]() |
#16 |
Senior Member
Location: Pathum Thani, Thailand Join Date: Nov 2009
Posts: 190
|
![]()
How can you reconstruct the genome using transcriptome data?
|
![]() |
![]() |
![]() |
#17 |
Member
Location: Innsbruck Join Date: Jul 2010
Posts: 28
|
![]()
Well, definitely not. But mtDNA is quite smaller than the nuclear genome; it's a single molecule while nuDNA has chromosomes; mtDNA genes do not splice or have different isoforms. I'd say is a totally different approach.
Still I do not see contraindications, what do you think is the misleading effect? F |
![]() |
![]() |
![]() |
#18 | |
Member
Location: Innsbruck Join Date: Jul 2010
Posts: 28
|
![]() Quote:
F |
|
![]() |
![]() |
![]() |
#19 |
Senior Member
Location: Pathum Thani, Thailand Join Date: Nov 2009
Posts: 190
|
![]()
If you are talking about a mammalian mitochondrial genome maybe. Plant mitochondrial genomes are very very different.
|
![]() |
![]() |
![]() |
#20 |
Member
Location: Innsbruck Join Date: Jul 2010
Posts: 28
|
![]() |
![]() |
![]() |
![]() |
Tags |
bioinformatics, genome, mitochondria, plastid |
Thread Tools | |
|
|