#1
Junior Member
Location: NIH, NIAID (USA) Join Date: Nov 2009
Posts: 4
Greetings.
My issue: I need the SNP differences between clone 1 and clone 2 of a haploid eukaryote. I do not have an assembled genome of this species for mapping, but there is a closely related one with 100% synteny (I'll call it X here). I have Illumina 50 bp paired-end reads.

I tried mapping clone 1 and clone 2 to X separately, extracting the SNPs, then removing the SNPs common to both sets and keeping only those unique to one clone. But there are too many SNPs, about 1 per 100 bp, and the sequence quality differs problematically between the samples (the clone 1 sample is gorgeous; clone 2 is a little iffy but has higher coverage). As a result, the SNP list for clone 2 is more than twice as long as the list for clone 1, yet in many cases the spurious SNPs have coverage > 100 and Q scores over 100.

Pipeline: BWA -> sampe -> SAMtools mpileup -> BCFtools vcfutils.pl -> SNPs

I'm thinking of assembling one clone to X, exporting that as a new sequence, and then mapping clone 2 to it, cutting out the middle man. I am the only one in my immediate area working on this and I am just a mapping monkey; I don't know much about how to assemble a new genome and use it as a reference. Any advice on how to do this, or on alternative ways to solve the "find the SNPs between clone 1 and clone 2" problem, is much appreciated.

Last edited by zmartine; 02-07-2012 at 09:00 AM.
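For reference, the "remove what both clones share" step described in this post could look roughly like the sketch below. It is only an illustration: it assumes tab-separated VCF text as produced by the pipeline above, and the names `parse_snps`, `clone_specific`, and the `min_qual` parameter are hypothetical, not part of SAMtools or BCFtools.

```python
from typing import Iterable, Set, Tuple

# A SNP is identified by (chromosome, 1-based position, REF base, ALT base).
Snp = Tuple[str, int, str, str]


def parse_snps(vcf_lines: Iterable[str], min_qual: float = 0.0) -> Set[Snp]:
    """Collect single-base substitutions from VCF text lines.

    Header lines, indels, and multi-allelic records are skipped.
    Records with a missing QUAL (".") are kept regardless of min_qual.
    """
    snps: Set[Snp] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if len(ref) != 1 or len(alt) != 1:
            continue  # not a simple SNP
        if qual != "." and float(qual) < min_qual:
            continue
        snps.add((chrom, int(pos), ref, alt))
    return snps


def clone_specific(snps1: Set[Snp], snps2: Set[Snp]) -> Tuple[Set[Snp], Set[Snp]]:
    """Return (SNPs only in clone 1, SNPs only in clone 2)."""
    return snps1 - snps2, snps2 - snps1
```

Note that a site only counts as shared if position, REF, and ALT all match, so a low-quality call in one sample still shows up as "clone-specific". That is one way the inflated clone 2 list described above can arise.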
#2
Senior Member
Location: Birmingham, UK Join Date: Jul 2009
Posts: 356
This is a bit tricky but I would say your most likely options are:
1) Do a true de novo assembly of clone 1, exclude the repetitive contigs using coverage depth as a guide, and call SNPs by mapping clone 2 against those contigs. You could use Velvet, SOAPdenovo etc. for the assembly, and your regular aligner for mapping.
2) Do a reference-guided de novo assembly of clone 1 and then map clone 2 against that. You could use CLC Bio or MIRA for that.
3) Do a true de novo assembly of clone 1 with the same method as 1), scaffold the contigs against your reference using something like BAMBUS, and map SNPs back.

Have a look at http://www.molecularevolution.org/re...owtie_activity for a simple tutorial on Velvet.
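As a rough illustration of "exclude the repetitive contigs using coverage depth as a guide" from option 1), here is a minimal Python sketch. It assumes Velvet-style contig names such as `NODE_12_length_3456_cov_27.81`; the function names and the `max_ratio` cutoff of 1.5 times the median are arbitrary choices for the example, not a recommended threshold.

```python
import re
from statistics import median
from typing import Dict, List

# Assumption: the coverage is embedded in the contig name as "_cov_<number>".
_COV = re.compile(r"_cov_([0-9]+(?:\.[0-9]+)?)")


def contig_coverage(names: List[str]) -> Dict[str, float]:
    """Extract the coverage value from each contig name that carries one."""
    cov: Dict[str, float] = {}
    for name in names:
        match = _COV.search(name)
        if match:
            cov[name] = float(match.group(1))
    return cov


def non_repetitive(cov: Dict[str, float], max_ratio: float = 1.5) -> List[str]:
    """Keep contigs whose coverage is at most max_ratio times the median.

    Contigs well above the median are likely collapsed repeats.
    """
    if not cov:
        return []
    cutoff = max_ratio * median(cov.values())
    return [name for name, depth in cov.items() if depth <= cutoff]
```

The surviving contig names can then be used to subset the assembly FASTA before mapping clone 2 against it.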
#3
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
If you want a non-de novo method, you could try aligning the higher-quality data to your reference, making a new reference by correcting for the SNPs you found, and then realigning to that corrected reference. Hopefully, you will find that the number of new SNPs is drastically lower. You can try iterating that another time, then align sample 2 to your corrected reference.
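The "make a new reference by correcting for the SNPs you found" step could be sketched in Python as below. This is a simplified illustration that handles substitutions only (indels would shift coordinates) and keeps sequences in memory as a dict; `apply_snps` is a hypothetical helper, not a function of any tool mentioned in this thread.

```python
from typing import Dict, Iterable, Tuple


def apply_snps(
    reference: Dict[str, str],
    snps: Iterable[Tuple[str, int, str, str]],
) -> Dict[str, str]:
    """Return a copy of the reference with each SNP's ALT base substituted.

    Each SNP is (sequence name, 1-based position, REF base, ALT base).
    A mismatch between REF and the actual reference base raises ValueError,
    which guards against off-by-one or wrong-reference mistakes.
    """
    corrected = {name: list(seq) for name, seq in reference.items()}
    for chrom, pos, ref, alt in snps:
        bases = corrected[chrom]
        idx = pos - 1  # VCF positions are 1-based
        if bases[idx].upper() != ref.upper():
            raise ValueError(
                f"{chrom}:{pos} expected {ref}, found {bases[idx]}"
            )
        bases[idx] = alt
    return {name: "".join(bases) for name, bases in corrected.items()}
```

Each round of the iteration would be: map clone 1, call SNPs, apply them with something like the above, write out the corrected FASTA, and map again.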
#4
Member
Location: Oxford Join Date: Apr 2010
Posts: 51
I'm beginning to feel a bit self-conscious about the fact that I seem to do so much self-promotion on SEQanswers recently. Apologies to those of you bored of hearing from me; I'll be brief. Martine - there is a de novo assembly variant caller, called Cortex, designed for this kind of question, which works on haploid and diploid organisms.
See http://cortexassembler.sourceforge.n...ortex_var.html and the paper here: http://dx.doi.org/10.1038/ng.1028
It will assemble and look for differences directly, and spit out variants in flank-allele1-allele2-flank format, which, if you want, can then be turned into VCF with respect to your X outgroup/related genome, or VCF with respect to a consensus. If, as seems to be the case, you have a tonne of coverage, you can tell it to only pay attention to high quality reads.
#5
Senior Member
Location: Birmingham, UK Join Date: Jul 2009
Posts: 356
That sounds cool Zam, I definitely will be giving it a try for my own projects.
#6
Junior Member
Location: NIH, NIAID (USA) Join Date: Nov 2009
Posts: 4
Zam, this looks like just what I was searching for!
#7
Member
Location: Maryland Join Date: May 2009
Posts: 25
Quote:
command:
cortex_var_31_c1 --pe_list PyN67C_R1.fastq,PyN67C_R2.fastq --format FASTQ --quality_score_threshold 5 --remove_pcr_duplicates --remove_seq_errors --dump_binary PyN67C_R1-R2.ctx --kmer_size 21 --max_read_len 50
error:
Start loading @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0: and @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 2:Y:0:
cannot open file:@HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0:
#8
Member
Location: Oxford Join Date: Apr 2010
Posts: 51
Hi!
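The --pe_list option wants a pair of filelists, i.e. two lists of FASTQ files. You have given it a pair of FASTQ files, so it read the first line of each FASTQ as a filename, which is why the error shows it trying to open a read header as a file. The manual goes through some explicit examples. It is also worth doing the first two examples in the demo directory.
Cheers
Zam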
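To make the filelist point concrete, here is a small Python sketch that writes two filelists and rebuilds the command from the earlier post with `--pe_list` pointing at them. It assumes each filelist is a plain text file with one FASTQ path per line and that mates sit at the same position in both lists; check the Cortex manual for the exact expectations. The helper name `write_pe_filelists` and the filelist names are made up for the example, and the demo writes into a temporary directory, whereas in practice you would use your working directory.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def write_pe_filelists(pairs: List[Tuple[str, str]], directory: str) -> str:
    """Write two filelists and return the 'list1,list2' value for --pe_list.

    pairs holds (mate 1 FASTQ, mate 2 FASTQ) tuples; mates end up on the
    same line number in their respective lists.
    """
    list1 = Path(directory) / "pe_list_1.txt"
    list2 = Path(directory) / "pe_list_2.txt"
    list1.write_text("".join(f"{r1}\n" for r1, _ in pairs))
    list2.write_text("".join(f"{r2}\n" for _, r2 in pairs))
    return f"{list1},{list2}"


# Demo only: a temporary directory stands in for the real working directory.
workdir = tempfile.mkdtemp()
pe_arg = write_pe_filelists([("PyN67C_R1.fastq", "PyN67C_R2.fastq")], workdir)

# Same flags as in the quoted command; only the --pe_list value changes.
command = [
    "cortex_var_31_c1",
    "--pe_list", pe_arg,
    "--format", "FASTQ",
    "--quality_score_threshold", "5",
    "--remove_pcr_duplicates",
    "--remove_seq_errors",
    "--dump_binary", "PyN67C_R1-R2.ctx",
    "--kmer_size", "21",
    "--max_read_len", "50",
]
print(" ".join(command))
```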
#9
Senior Member
Location: Cambridge, UK Join Date: Jul 2008
Posts: 146
You could also look into ICORN (http://icorn.sourceforge.net/), which iteratively corrects the reference to pull in more data and find more SNPs. It should work well on closely related species, but I'm not sure how far it can push things.
Tags: assembly programs, de novo assembly, snp calling