
**mastal** · 03-17-2014, 02:51 AM

Which assembler to use depends partly on what type of data you have (Illumina, Ion Torrent, 454, etc.) and on whether there is a reference genome for the species you are working on.

Popular assemblers include Velvet, ABySS, SOAPdenovo, MIRA and 454's Newbler, but there are many others.

**tw7649116** · 03-17-2014, 02:54 AM

Originally posted by mastal
> Which assembler to use depends partly on what type of data you have (Illumina, Ion Torrent, 454, etc.) and on whether there is a reference genome for the species you are working on.
>
> Popular assemblers include Velvet, ABySS, SOAPdenovo, MIRA and 454's Newbler, but there are many others.

Thank you for your reply. I'm using Illumina paired-end FASTQ data. It's rice, so I have a reference!

**gringer** · 03-17-2014, 12:23 PM

You can use samtools/bcftools to do this. I have modified the vcf2fq script from vcfutils.pl (link here) to work on VCF files that contain only variant sites, and to handle INDELs properly (i.e. changing the sequence, rather than just masking out INDEL regions in lower case). You can use it with a small modification to the pipeline given in the SAMtools man page:

Code:

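# 1. Pile up reads against the reference and keep variant sites only
#    (legacy samtools/bcftools 0.1.x syntax; ${maxdepth} is a shell variable you set)
$ samtools mpileup -L ${maxdepth} -uf \
     chloroplast_ref.fasta results/sample1.bam | \
     bcftools view -v -g - > sample1.vcf
# 2. Apply the variants in sample1.vcf to the reference to build a consensus FASTQ
$ ./vcf2fq.pl -Q 20 -L 20 -f chloroplast_ref.fasta sample1.vcf \
     > con_sample1.fastq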

Basic help for the vcf2fq script:

Code:

Usage:   vcf2fq.pl [options] <variant-sites.vcf>

Options: -d INT    minimum depth                   [3]
         -D INT    maximum depth                   [100000]
         -Q INT    min RMS mapQ                    [10]
         -L INT    min INDEL Qual                  [100]
         -f FASTA  file with reference sequence(s) []

Note: without a reference sequence, sites that are not specified
      in the VCF file will be filled with Ns
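For readers on current releases: the `samtools mpileup ... | bcftools view -v -g` pipeline above uses the legacy 0.1.x syntax, and in bcftools 1.x variant calling moved from `view` to `bcftools call`. A roughly equivalent route is `bcftools mpileup` → `bcftools call` → `bcftools consensus`. The sketch below assumes bcftools 1.x is on your PATH and the BAM is coordinate-sorted and indexed; `make_consensus` is a hypothetical helper name, and unlike vcf2fq it writes a FASTA consensus rather than FASTQ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a consensus FASTA with bcftools 1.x.
# Assumes bcftools 1.x is on PATH and the BAM is coordinate-sorted and indexed.
#   $1 reference FASTA   $2 BAM file   $3 output prefix
make_consensus() {
  local ref=$1 bam=$2 prefix=$3
  # Pile up reads and call variant sites only (-v), writing bgzipped VCF (-Oz)
  bcftools mpileup -Ou -f "$ref" "$bam" \
    | bcftools call -mv -Oz -o "$prefix.vcf.gz" || return 1
  # bcftools consensus needs an indexed VCF
  bcftools index "$prefix.vcf.gz" || return 1
  # Apply the called SNPs and INDELs to the reference sequence
  bcftools consensus -f "$ref" "$prefix.vcf.gz" > "$prefix.consensus.fa"
}
```

Usage, with the file names from the post: `make_consensus chloroplast_ref.fasta results/sample1.bam sample1`.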

**tw7649116** · 03-17-2014, 04:04 PM

Thank you very much!
I will try it first.

**concitacantarella** · 04-03-2014, 02:04 AM

Velvet inquiry about assembling from multiple libraries

Hi,
I'm trying to assemble a plastidial genome (about 160,000 bp) from a whole-genome NGS experiment.

I installed Velvet using these configuration parameters:

make 'CATEGORIES=20' 'MAXKMERLENGTH=77' 'LONGSEQUENCES=1' 'OPENMP=1'

I have 4 paired-end libraries with different insert lengths.
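The compile-time settings and the planned run have to agree: `MAXKMERLENGTH` must be at least the upper end of the velveth k-mer range, k values should be odd (so the range ends should be odd and the step even), and `CATEGORIES` must cover the number of separately numbered short-read libraries (`-shortPaired` through `-shortPaired4` need at least 4). A small sanity check, as a sketch with the hypothetical function name `check_velvet_build`:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a Velvet build against a planned velveth run.
#   $1 kmin  $2 kmax  $3 step   (as in "velveth dir kmin,kmax,step")
#   $4 MAXKMERLENGTH  $5 CATEGORIES   (values given to make)
#   $6 number of separately numbered short-read libraries
# Prints one line per problem and returns non-zero if any were found.
check_velvet_build() {
  local kmin=$1 kmax=$2 step=$3 maxk=$4 categories=$5 nlibs=$6
  local problems=0
  if (( kmin % 2 == 0 || kmax % 2 == 0 )); then
    echo "k-mer range ends should be odd (got $kmin,$kmax)"; problems=$((problems + 1))
  fi
  if (( step % 2 != 0 )); then
    echo "step should be even so every k stays odd (got $step)"; problems=$((problems + 1))
  fi
  if (( kmax > maxk )); then
    echo "largest k ($kmax) exceeds MAXKMERLENGTH ($maxk)"; problems=$((problems + 1))
  fi
  if (( nlibs > categories )); then
    echo "$nlibs libraries need CATEGORIES >= $nlibs (got $categories)"; problems=$((problems + 1))
  fi
  (( problems == 0 ))
}

# Values from the post: make CATEGORIES=20 MAXKMERLENGTH=77, velveth 43,69,2, four paired libraries
check_velvet_build 43 69 2 77 20 4 && echo "settings look consistent"
```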

First, I aligned the reads to the reference and extracted the mapped reads.
Second, I tried a de novo assembly with Velvet:

velveth velvet/ 43,69,2 -fastq -separate \
  -shortPaired mappedReads/paired/lib_400_paired_forward.fastq mappedReads/paired/lib_400_paired_reverse.fastq \
  -shortPaired2 mappedReads/paired/lib_500_paired_forward.fastq mappedReads/paired/lib_500_paired_reverse.fastq \
  -shortPaired3 mappedReads/paired/lib_700_paired_forward.fastq mappedReads/paired/lib_700_paired_reverse.fastq \
  -shortPaired4 mappedReads/paired/lib_700_BP_paired_forward.fastq mappedReads/paired/lib_700_BP_paired_reverse.fastq \
  -fastq -short \
  mappedReads/unpaired/lib_400_unpaired.fastq \
  mappedReads/unpaired/lib_500_unpaired.fastq \
  mappedReads/unpaired/lib_700_unpaired.fastq \
  mappedReads/unpaired/lib_700_BP_unpaired.fastq \
  mappedReads/unpaired/lib_single.fastq

and then, for each k-mer:

velvetg velvet/_43/ -exp_cov auto -cov_cutoff auto -min_contig_lgth 200 -ins_length 250 -ins_length2 400 -ins_length3 600 -ins_length4 600
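Running velvetg for each k-mer by hand is error-prone. A small loop over the `_NN` directories that velveth created (one per k, e.g. `velvet/_43`) keeps the options identical across runs and then collects the summary line from each run for comparison. This is a sketch: `run_velvetg_all` is a hypothetical name, the velvetg options are copied from the command above, and it assumes velvetg records its "Final graph has ..." summary in a file called `Log` inside each output directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run velvetg with the same options in every k-mer
# directory that velveth created under $1 (e.g. velvet/_43, velvet/_45, ...).
run_velvetg_all() {
  local outdir=$1 dir
  for dir in "$outdir"/_*/; do
    [ -d "$dir" ] || continue   # skip if the glob matched nothing
    velvetg "$dir" -exp_cov auto -cov_cutoff auto -min_contig_lgth 200 \
      -ins_length 250 -ins_length2 400 -ins_length3 600 -ins_length4 600 || return 1
  done
  # Summarise: last "Final graph" line from each run's Log file (assumed location)
  for dir in "$outdir"/_*/; do
    if [ -f "${dir}Log" ]; then
      echo "${dir}: $(grep 'Final graph' "${dir}Log" | tail -n 1)"
    fi
  done
}
```

Usage: `run_velvetg_all velvet`. Keeping every option in one place means a change such as a new `-cov_cutoff` is applied to all k values at once.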

Unfortunately, the N50 for each k-mer is identical to the k-mer value.
(e.g. for k = 59 the result is: Final graph has 24686 nodes and n50 of 59, max 897, total 781924, using 2297000/54940402 reads)

Could anyone help me understand what I am doing wrong?
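N50 is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. Recomputing it from the `contigs.fa` that velvetg writes into each output directory is a useful cross-check when comparing k values. A minimal bash/awk sketch; `fasta_n50` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N50 of the sequences in a FASTA file.
# Step 1 (awk): one length per record, summing wrapped sequence lines.
# Step 2 (sort): longest first.
# Step 3 (awk): walk down the sorted lengths until the running total
#               reaches half of the total, and print that length.
fasta_n50() {
  awk '/^>/ { if (len > 0) print len; len = 0; next }
       { len += length($0) }
       END { if (len > 0) print len }' "$1" |
    sort -rn |
    awk '{ l[NR] = $1; total += $1 }
         END { for (i = 1; i <= NR; i++) { run += l[i]; if (2 * run >= total) { print l[i]; exit } } }'
}
```

Usage across all k values: `for d in velvet/_*/; do echo "$d $(fasta_n50 "${d}contigs.fa")"; done`.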

**mastal** · 04-03-2014, 02:47 AM

How long are your reads? Maybe the k-mer values you tried aren't the best ones for Velvet.

What coverage do you have?
You could use velvetk to estimate the k-mer coverage you would get for different k values.

How closely related is the reference genome that you are aligning your reads to?

Maybe you could try a reference-assisted assembly with velvet instead of de novo.

**concitacantarella** · 04-03-2014, 04:52 AM

> How long are your reads? Maybe the range of kmer values you tried aren't the best ones for velvet.

The reads are 100 bp. The k-mer range is 43-69; I think that is wide enough.

> What kind of coverage do you have?
About 57 million reads map to the reference, which is very high coverage for a plastidial genome (about 160,000 bp).
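To put numbers on that: with N reads of length L on a genome of size G, nucleotide coverage is C = N * L / G, and the Velvet manual relates it to k-mer coverage as Ck = C * (L - k + 1) / L. A bash/awk sketch using the approximate figures from this post (57 million mapped 100 bp reads, a genome of about 160,000 bp); `coverage_table` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: nucleotide and k-mer coverage for a range of k.
#   C  = reads * read_length / genome_size
#   Ck = C * (read_length - k + 1) / read_length   (relation given in the Velvet manual)
# Arguments: reads, read length, genome size, kmin, kmax, step
coverage_table() {
  awk -v n="$1" -v L="$2" -v g="$3" -v a="$4" -v b="$5" -v s="$6" 'BEGIN {
    C = n * L / g
    printf "nucleotide coverage: %.0fx\n", C
    for (k = a; k <= b; k += s)
      printf "k=%d  k-mer coverage: %.0fx\n", k, C * (L - k + 1) / L
  }'
}

# Figures from the post: ~57 million mapped 100 bp reads, ~160,000 bp plastid genome
coverage_table 57000000 100 160000 43 69 2
```

With these inputs the nucleotide coverage comes out at 35,625x, and the k-mer coverage ranges from 11,400x at k = 69 up to roughly 20,700x at k = 43.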

**mastal** · 04-03-2014, 06:39 AM

In that case, the problem could be your very high coverage; Velvet doesn't do so well with very high coverage.

**concitacantarella** · 04-04-2014, 04:04 AM

Thanks for the suggestion.
It has been very useful.
After setting the right coverage cutoff, I obtained a good result.

Originally posted by mastal
> In that case, the problem could be your very high coverage; Velvet doesn't do so well with very high coverage.

**laisuliang** · 07-17-2014, 07:17 PM

Originally posted by concitacantarella
> Thanks for the suggestion.
> It has been very useful.
> After setting the right coverage cutoff, I obtained a good result.

Hi concitacantarella!
Can you post the Velvet parameters you used? I have the same problem as you. Thanks!

**concitacantarella** · 07-18-2014, 12:16 AM

Hi,
I used velvet-estimate-exp_cov.pl to choose the coverage cutoff.

This is the command line I ran:

velvetg _73/ -exp_cov auto -cov_cutoff 500 -min_contig_lgth 200 -ins_length 280
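velvet-estimate-exp_cov.pl is typically run on the `stats.txt` file that velvetg writes. The sketch below follows the same general idea but is not that script's code: it builds a histogram of node coverage weighted by node length and reports the peak as a rough expected coverage. It assumes `stats.txt` is tab-separated with a header row containing `lgth` (node length) and `short1_cov` (coverage from the first short-read category); `estimate_exp_cov` and the bin-width argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate expected k-mer coverage from Velvet's stats.txt.
# Assumes a tab-separated header row with "lgth" and "short1_cov" columns.
#   $1 path to stats.txt   $2 histogram bin width (default 1)
estimate_exp_cov() {
  local stats=$1 binwidth=${2:-1}
  awk -F'\t' -v w="$binwidth" '
    NR == 1 {
      # locate the length and coverage columns by name
      for (i = 1; i <= NF; i++) { if ($i == "lgth") lc = i; if ($i == "short1_cov") cc = i }
      next
    }
    lc && cc { weight[int($cc / w)] += $lc }   # length-weighted coverage histogram
    END {
      best = -1
      for (b in weight) if (weight[b] > best) { best = weight[b]; peak = b }
      if (best >= 0) print peak * w            # lower edge of the most heavily weighted bin
    }' "$stats"
}
```

Usage: `estimate_exp_cov _73/stats.txt 10`. A wider bin smooths out noisy coverage values at the cost of resolution.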

**laisuliang** · 07-20-2014, 07:34 AM

Originally posted by concitacantarella
> Hi,
> I used velvet-estimate-exp_cov.pl to choose the coverage cutoff.
>
> This is the command line I ran:
>
> velvetg _73/ -exp_cov auto -cov_cutoff 500 -min_contig_lgth 200 -ins_length 280

Thanks! I am using these parameters and it works great.


Hello, everybody! Chloroplast genome assembly Q!!
