SEQanswers

Old 02-07-2012, 06:51 AM   #1
zmartine
Junior Member
 
Location: NIH, NIAID (USA)

Join Date: Nov 2009
Posts: 4
Assisted de novo genome assembly? Create new consensus mapping reads to reference?

Greetings.

My issue:

I need the SNP-difference(s) between clone 1 and clone 2 of a haploid eukaryote. I do not have an assembled genome of this species for mapping, but there is one that is closely related with 100% synteny (I'll call it X, here).

I have Illumina 50 bp paired-end reads. I tried mapping clone 1 and clone 2 to X separately, then extracting the SNPs, discarding those shared by both sets and keeping only the ones unique to each clone. But there are too many SNPs (about 1 per 100 bp), and the sequence quality differs problematically between the samples (the clone 1 sample is gorgeous; clone 2 is a little iffy but has higher coverage). As a result the SNP list for clone 2 is more than twice as long as that for clone 1, yet the spurious SNPs have coverage above 100 and Q scores over 100 in many cases.

(Pipeline = BWA -> sampe -> SAMtools mpileup -> BCFtools vcfutils.pl -> SNPs)
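The set comparison described above (keep only SNPs called in one clone but not the other) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original pipeline: the function names `load_snps` and `clone_specific` and the toy records are hypothetical, and the input is assumed to be tab-separated VCF-style text with CHROM, POS, ID, REF, ALT as the first five columns.

```python
def load_snps(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-style text lines."""
    snps = set()
    for line in vcf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep single-base substitutions only; indels are ignored here.
        if len(ref) == 1 and len(alt) == 1:
            snps.add((chrom, pos, ref, alt))
    return snps


def clone_specific(snps1, snps2):
    """Return (calls only in clone 1, calls only in clone 2)."""
    return snps1 - snps2, snps2 - snps1


# Toy records standing in for the two call sets made against X.
clone1 = ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\tG", "chr1\t250\t.\tC\tT"]
clone2 = ["chr1\t100\t.\tA\tG", "chr1\t900\t.\tG\tA"]

only1, only2 = clone_specific(load_snps(clone1), load_snps(clone2))
print(sorted(only1), sorted(only2))
```

Note that a SNP missing from one call set may simply reflect low coverage or a filtered call at that position in that sample rather than a real difference, which is one reason uneven sample quality inflates these lists.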

I'm thinking of assembling one clone against X, exporting the result as a new reference sequence, and then mapping clone 2 to that, cutting out the middle man. I am the only one in my immediate area working on this and I am just a mapping monkey: I don't know much about how to assemble a new genome and use it as a reference. Any advice on how to do this, or alternative ways of finding the SNPs between clone 1 and clone 2, would be much appreciated.

Last edited by zmartine; 02-07-2012 at 08:00 AM.
Old 02-07-2012, 07:51 AM   #2
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

This is a bit tricky but I would say your most likely options are:

1) do a true de novo assembly of clone 1, exclude the repetitive contigs using coverage depth as a guide and call SNPs by mapping clone 2 against those contigs - you could use Velvet, SOAPdenovo etc. for the assembly, and your regular aligner for mapping

2) do a reference-guided de novo assembly of clone 1 and then map clone 2 against that - you could use CLC Bio or MIRA for that

3) do a true de novo assembly of clone 1 with the same method as 1), scaffold the contigs against your reference using something like BAMBUS and map SNPs back
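For the "exclude the repetitive contigs using coverage depth" step in options 1 and 3, a rough sketch might look like the following. It rests on assumptions that are not in the post: contig headers are taken to follow the Velvet-style `NODE_<id>_length_<n>_cov_<depth>` pattern, the helper names are hypothetical, and the cutoff of 1.5 times the median coverage is an arbitrary illustration that would need tuning on real data.

```python
import re
from statistics import median

# Assumed Velvet-style contig header, e.g. NODE_1_length_5000_cov_30.2
HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def parse_coverage(header):
    """Extract the coverage value from a contig header."""
    match = HEADER.search(header)
    if match is None:
        raise ValueError(f"unrecognised contig header: {header!r}")
    return float(match.group(3))


def single_copy_contigs(headers, max_ratio=1.5):
    """Keep contigs whose coverage is at most max_ratio times the median.

    Collapsed repeats tend to show inflated coverage, so contigs far above
    the typical depth are dropped. max_ratio is an illustrative guess.
    """
    coverage = {h: parse_coverage(h) for h in headers}
    typical = median(coverage.values())
    return [h for h in headers if coverage[h] <= max_ratio * typical]


headers = [
    "NODE_1_length_5000_cov_30.2",
    "NODE_2_length_800_cov_95.0",   # about 3x typical depth: likely a repeat
    "NODE_3_length_4200_cov_28.7",
    "NODE_4_length_3000_cov_31.5",
]
kept = single_copy_contigs(headers)
print(kept)
```

Clone 2 would then be mapped against only the retained contigs with the usual aligner.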

Have a look at http://www.molecularevolution.org/re...owtie_activity for a simple tutorial on Velvet.
Old 02-07-2012, 08:47 AM   #3
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

If you want a non de novo method, you could try aligning the higher quality data to your reference, make a new reference by correcting for the SNPs you found, and then realign to that corrected reference. Hopefully, you should find that the number of new SNPs is drastically lower. You can try iterating that another time, then align sample 2 to your corrected reference.
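The "make a new reference by correcting for the SNPs you found" step could be sketched as below. This is only an illustration under stated assumptions: the function name `apply_snps` is hypothetical, positions are assumed to be 1-based as in VCF, only single-base substitutions are handled (indels would shift coordinates), and the sequence is a toy example.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with each substitution applied.

    `snps` is an iterable of (pos, ref, alt) with 1-based positions.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        # Guard against applying calls to the wrong sequence or coordinates.
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference base mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


reference = "ACGTACGTAC"
corrected = apply_snps(reference, [(2, "C", "T"), (9, "A", "G")])
print(corrected)
```

Each round of the iteration would realign the higher-quality sample to `corrected`, call SNPs again, and apply them, stopping once few new SNPs appear.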
Old 02-07-2012, 09:32 AM   #4
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51

I'm beginning to feel a bit self-conscious about the fact that I seem to do so much self-promotion on SEQanswers recently. Apologies to those of you bored of hearing from me; I'll be brief. Martine, there is a de novo assembly variant caller called Cortex, designed for this kind of question, which works on haploid and diploid organisms.
See
http://cortexassembler.sourceforge.n...ortex_var.html
and the paper here:
http://dx.doi.org/10.1038/ng.1028
It will assemble and look for differences directly, and spit out variants in
flank-allele1-allele2-flank format, which if you want can then be turned into VCF with respect to your X outgroup/related genome, or VCF with respect to a consensus.

If, as seems to be the case, you have a tonne of coverage, you can tell it to only pay attention to high quality reads.
Old 02-07-2012, 09:45 AM   #5
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

That sounds cool Zam, I definitely will be giving it a try for my own projects.
Old 02-07-2012, 01:19 PM   #6
zmartine
Junior Member
 
Location: NIH, NIAID (USA)

Join Date: Nov 2009
Posts: 4

Zam, this looks like just what I was searching for!
Old 02-09-2012, 12:47 PM   #7
MQ-BCBB
Member
 
Location: Maryland

Join Date: May 2009
Posts: 25
error

Quote:
Originally Posted by nickloman
That sounds cool Zam, I definitely will be giving it a try for my own projects.

Hello Zam, I am working with Martine on trying to run Cortex. I ran the following command and received the error below:

command:
cortex_var_31_c1 --pe_list PyN67C_R1.fastq,PyN67C_R2.fastq --format FASTQ --quality_score_threshold 5 --remove_pcr_duplicates --remove_seq_errors --dump_binary PyN67C_R1-R2.ctx --kmer_size 21 --max_read_len 50

error:
Start loading @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0: and @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 2:Y:0:
cannot open file:@HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0:
Old 02-09-2012, 12:59 PM   #8
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51

Hi!
The --pe_list option wants a pair of filelists, i.e. two lists of FASTQ files.
You have given it a pair of FASTQ files, so it is treating the first line of
each FASTQ as a filename and failing to open it. The manual goes through
some explicit examples. It is also worth doing the first two examples
in the demo directory.
Cheers
Zam
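A minimal sketch of the fix, assuming a filelist is a plain text file with one FASTQ path per line (please check the Cortex manual for the exact format). The list filenames `PyN67C_R1.list` and `PyN67C_R2.list` and the helper `write_filelist` are hypothetical; every other argument is copied from the command in post #7.

```python
from pathlib import Path


def write_filelist(list_path, fastq_paths):
    """Write one FASTQ path per line and return the filelist path."""
    Path(list_path).write_text("".join(f"{p}\n" for p in fastq_paths))
    return str(list_path)


# One filelist per mate; presumably line i of each list should name
# the two files of the same read pair.
list1 = write_filelist("PyN67C_R1.list", ["PyN67C_R1.fastq"])
list2 = write_filelist("PyN67C_R2.list", ["PyN67C_R2.fastq"])

# Same options as the original command, but --pe_list now receives
# the two filelists instead of the FASTQ files themselves.
command = [
    "cortex_var_31_c1",
    "--pe_list", f"{list1},{list2}",
    "--format", "FASTQ",
    "--quality_score_threshold", "5",
    "--remove_pcr_duplicates",
    "--remove_seq_errors",
    "--dump_binary", "PyN67C_R1-R2.ctx",
    "--kmer_size", "21",
    "--max_read_len", "50",
]
print(" ".join(command))
```

The script only prints the command rather than running it, so it can be inspected before being pasted into a shell.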
Old 02-10-2012, 12:31 AM   #9
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

You could also look into ICORN http://icorn.sourceforge.net/ as this is an iterative correction of the reference to pull in more data and to find more SNPs. It should work well on closely related species, but I'm not sure how far it can push things.
Tags
assembly programs, de novo assembly, snp calling
