**nickloman** · 02-07-2012, 08:51 AM

This is a bit tricky but I would say your most likely options are:

1) do a true de novo assembly of clone 1, exclude the repetitive contigs using coverage depth as a guide and call SNPs by mapping clone 2 against those contigs - you could use Velvet, SOAPdenovo etc. for the assembly, and your regular aligner for mapping

2) do a reference-guided de novo assembly of clone 1 and then map clone 2 against that - you could use CLC Bio or MIRA for that

3) do a true de novo assembly of clone 1 with the same method as 1), scaffold the contigs against your reference using something like BAMBUS and map SNPs back

Have a look at http://www.molecularevolution.org/re...owtie_activity for a simple tutorial on Velvet.
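
As a rough illustration of option 1's "exclude the repetitive contigs using coverage depth" step, here is a minimal Python sketch. It assumes Velvet-style contig headers (`NODE_<id>_length_<n>_cov_<depth>`); the example headers, the `max_ratio` cutoff of 1.5 times the median, and the function names are hypothetical choices for illustration, not anything prescribed in this thread.

```python
import re
from statistics import median

# Assumed header layout, as written by Velvet: NODE_<id>_length_<n>_cov_<depth>
HEADER_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def parse_coverage(header):
    """Pull the coverage value out of a Velvet-style contig header."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"unrecognised contig header: {header}")
    return float(match.group(3))


def non_repetitive(headers, max_ratio=1.5):
    """Keep contigs whose coverage is at most max_ratio times the median.

    Contigs well above the typical depth are treated as likely collapsed
    repeats. The 1.5 cutoff is an arbitrary illustrative value.
    """
    coverage = {h: parse_coverage(h) for h in headers}
    typical = median(coverage.values())
    return [h for h in headers if coverage[h] <= max_ratio * typical]


# Made-up headers purely for demonstration.
headers = [
    "NODE_1_length_5000_cov_30.0",
    "NODE_2_length_4200_cov_28.5",
    "NODE_3_length_800_cov_95.2",
    "NODE_4_length_3100_cov_33.1",
]
kept = non_repetitive(headers)
print(kept)
```

With real data it is worth plotting the coverage distribution first and picking the cutoff from that, rather than trusting a fixed ratio.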

**swbarnes2** · 02-07-2012, 09:47 AM

If you want a non de novo method, you could try aligning the higher quality data to your reference, make a new reference by correcting for the SNPs you found, and then realign to that corrected reference. Hopefully, you should find that the number of new SNPs is drastically lower. You can try iterating that another time, then align sample 2 to your corrected reference.
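
The "correct the reference for the SNPs you found" step can be sketched in a few lines of Python. This is a toy under stated assumptions: SNPs are given as hypothetical `(1-based position, reference base, alternate base)` tuples, the sequence and SNP list are made up, and the alignment and SNP calling themselves (which would be done with your usual aligner and caller) are not shown.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with each SNP substituted in.

    `snps` is a list of (position, ref_base, alt_base) tuples with
    1-based positions. This tuple layout is an assumption made for
    the sketch, not the output format of any particular caller.
    """
    seq = list(reference)
    for pos, ref_base, alt_base in snps:
        # Guard against applying calls to the wrong coordinate system.
        if seq[pos - 1] != ref_base:
            raise ValueError(f"reference base mismatch at position {pos}")
        seq[pos - 1] = alt_base
    return "".join(seq)


# Made-up reference and first-round SNP calls.
reference = "ACGTACGTAC"
round1_snps = [(2, "C", "T"), (7, "G", "A")]
corrected = apply_snps(reference, round1_snps)
print(corrected)
```

Each iteration would realign the reads to `corrected`, call SNPs again, and stop once the number of new calls levels off. Indels shift coordinates and would need more care than this substitution-only sketch gives them.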

**Zam** · 02-07-2012, 10:32 AM

I'm beginning to feel a bit self-conscious about how much self-promotion I seem to have been doing on SEQanswers recently. Apologies to those of you bored of hearing from me; I'll be brief. Martine - there is a de novo assembly variant caller, called Cortex, designed for this kind of question, which works on haploid and diploid organisms.
See

CORTEX website

http://cortexassembler.sourceforge.net/index_cortex_var.html

and the paper here:

http://dx.doi.org/10.1038/ng.1028

It will assemble and look for differences directly, and spit out variants in flank-allele1-allele2-flank format, which if you want can then be turned into VCF with respect to your X outgroup/related genome, or VCF with respect to a consensus.

If, as seems to be the case, you have a tonne of coverage, you can tell it to only pay attention to high quality reads.

**nickloman** · 02-07-2012, 10:45 AM

That sounds cool Zam, I definitely will be giving it a try for my own projects.

**zmartine** · 02-07-2012, 02:19 PM

Zam, this looks like just what I was searching for!

**MQ-BCBB** · 02-09-2012, 01:47 PM

Hello Zam, I am working with Martine to get Cortex running. I ran the following command and received the error below:

command:
cortex_var_31_c1 --pe_list PyN67C_R1.fastq,PyN67C_R2.fastq --format FASTQ --quality_score_threshold 5 --remove_pcr_duplicates --remove_seq_errors --dump_binary PyN67C_R1-R2.ctx --kmer_size 21 --max_read_len 50

error:
Start loading @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0: and @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 2:Y:0:
cannot open file:@HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0:

**Zam** · 02-09-2012, 01:59 PM

Hi!
The --pe_list option wants a pair of filelists, i.e. two lists of FASTQ files. You have given it a pair of FASTQ files, which is why it tried to open a read header as a filename. The manual goes through some explicit examples. It is also worth doing the first two examples in the demo directory.
Cheers
Zam
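
A minimal sketch of that fix in Python: write one filelist per read direction, then pass the two filelists to `--pe_list` instead of the FASTQ files. The filelist names (`PyN67C_R1.list`, `PyN67C_R2.list`) and the one-path-per-line layout are assumptions; all other flags are copied unchanged from the command posted above. The script only builds and prints the command, it does not run Cortex, so check the manual's examples before relying on it.

```python
import shlex
from pathlib import Path


def write_filelist(list_path, fastq_paths):
    """Write FASTQ paths to a filelist, one per line (assumed layout)."""
    Path(list_path).write_text("".join(f"{p}\n" for p in fastq_paths))
    return str(list_path)


def build_command(r1_list, r2_list):
    """Rebuild the posted command, swapping the FASTQs for filelists."""
    return [
        "cortex_var_31_c1",
        "--pe_list", f"{r1_list},{r2_list}",
        "--format", "FASTQ",
        "--quality_score_threshold", "5",
        "--remove_pcr_duplicates",
        "--remove_seq_errors",
        "--dump_binary", "PyN67C_R1-R2.ctx",
        "--kmer_size", "21",
        "--max_read_len", "50",
    ]


# Hypothetical filelist names; the FASTQ names come from the post above.
r1_list = write_filelist("PyN67C_R1.list", ["PyN67C_R1.fastq"])
r2_list = write_filelist("PyN67C_R2.list", ["PyN67C_R2.fastq"])
cmd = build_command(r1_list, r2_list)
print(shlex.join(cmd))
```

If there are several lanes, each filelist would simply carry more lines, with the mate files kept in the same order in both lists.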

**jkbonfield** · 02-10-2012, 01:31 AM

You could also look into ICORN http://icorn.sourceforge.net/ as this is an iterative correction of the reference to pull in more data and to find more SNPs. It should work well on closely related species, but I'm not sure how far it can push things.

Assisted de novo genome assembly? Create new consensus mapping reads to reference?
