Hi, all:
We sequenced two Bacillus strains to about 300X coverage using Illumina MiSeq paired-end sequencing. The library was built with the Nextera XT kit, which produces a wide range of fragment sizes: the average fragment size is about 1,000 bp, ranging from 300 to 3,000 bp.
I ran the Velvet assembler with these command lines:
1) ./bin/velvet_1.2.10/velveth out 33 -fastq -short ./P06_S1_L001_R1_001.fastq ./P06_S1_L001_R2_001.fastq
2) velvetg out -exp_cov auto -ins_length 1044 -scaffolding yes
The contig length distribution is below (length range in bp, then number of contigs):
100:199 199670
200:299 9825
300:399 287
400:499 73
500:599 24
600:699 18
700:799 9
800:899 2
900:999 4
1000:1099 1
1100:1199 2
1200:1299 2
These are extremely short contigs, especially considering that the reads are 250 bp long!
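For anyone who wants to reproduce a binned table like this from their own assembly, here is a minimal sketch in Python. It assumes the contigs are in a FASTA file (velvetg writes contigs.fa into the output directory) and that 100 bp bins are wanted. The function names read_fasta_lengths, bin_lengths and format_bins are made up for illustration and are not part of Velvet.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def bin_lengths(lengths, bin_size=100):
    """Count sequences per length bin; returns sorted (bin_start, count) pairs."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return sorted(counts.items())


def format_bins(bins, bin_size=100):
    """Format bins as 'start:end count' rows, matching the table layout above."""
    return [f"{start}:{start + bin_size - 1} {count}" for start, count in bins]


# Tiny in-memory example (made-up sequences, not real contigs).
example = [">c1", "ACGT" * 30, ">c2", "A" * 150, "C" * 100, ">c3", "G" * 199]
for row in format_bins(bin_lengths(read_fasta_lengths(example))):
    print(row)
```

To run it on a real assembly, pass an open file handle instead of the example list, for example the handle returned by open("out/contigs.fa"), where "out" is the output directory used in the commands above.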
1) First, I suspected that the sequenced samples might be contaminated with DNA from another organism, so I mapped the reads to a close reference. About 90% of the reads map with Bowtie using default parameters. We were also afraid that the sequencing might be biased, so we plotted the coverage depth distribution (shown in the attached Rplot.jpeg). Some loci are strongly biased and were sequenced at 7,000-fold depth, but they make up only a small proportion.
2) We also plotted the k-mer distribution to screen for possible abnormal repeats or sequencing errors (shown in the attached histogram-k29.histo.pdf), but the figure looks normal.
What would you do if you ran into these problems?