SEQanswers

07-18-2014, 02:19 AM   #1
bioman1
Member
 
Location: US

Join Date: May 2012
Posts: 80
Low-coverage genome assembly: suggestions

Hello,

We have sequenced the genome of a fruit crop on a HiSeq 2000. The data are Illumina paired-end FASTQ reads. The raw reads were filtered with Trimmomatic using default settings and checked with FastQC. After filtering, the total number of sequences dropped from 505,956,290 to 418,812,062, with 38% GC and a sequence length of 101. The data passed all FastQC tests, with warnings for per-base sequence content and sequence duplication levels.

My single filtered FASTQ file is 108 GB, and the genome size estimated with KmerGenie and SGA preqc is around 2 Gbp, so the coverage is expected to be below 20x. Which genome assembler works well at low coverage? How can I improve my genome assembly computationally? Please share your suggestions and any pointers to journal papers that succeeded in assembling a low-coverage plant genome.
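The coverage estimate can be checked with simple arithmetic: sequencing depth is roughly the number of sequenced bases divided by the genome size. The sketch below is a minimal Python illustration, assuming the filtered read count above includes both mates (as in the interleaved file), every read keeps its full 101 bp after trimming, and the genome is about 2 Gbp. The function name `estimated_coverage` is hypothetical.

```python
def estimated_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Upper-bound sequencing depth: total sequenced bases / genome size.

    Assumes every read still has its full length after trimming, so the
    real depth is somewhat lower when trimming has shortened reads.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    total_bases = num_reads * read_length
    return total_bases / genome_size


if __name__ == "__main__":
    raw_reads = 505_956_290      # before Trimmomatic (from the post)
    kept_reads = 418_812_062     # after Trimmomatic, assumed to count both mates
    read_length = 101            # bp, maximum read length after trimming
    genome_size = 2_000_000_000  # ~2 Gbp, KmerGenie / SGA preqc estimate

    print(f"reads retained: {kept_reads / raw_reads:.1%}")
    depth = estimated_coverage(kept_reads, read_length, genome_size)
    print(f"coverage (upper bound): {depth:.1f}x")
```

With these numbers about 82.8% of the reads are retained and the upper bound is roughly 21x; because trimming shortens some reads, the actual depth is likely close to, or below, 20x. The 108 GB file size is not the base count, since a FASTQ file also stores read headers and quality strings.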
07-18-2014, 04:02 AM   #2
lorendarith
Guest
 

Posts: n/a

Do you have only one short-fragment library making up this 20x coverage, or is it the sum of several libraries?

If you want to assemble genomes with short-read technologies, it is crucial to have several libraries of different types, with different insert sizes (and possibly different read lengths).

Is it not possible for you to sequence more, or do you really need to make something out of this 20x?
07-18-2014, 06:43 AM   #3
zatelmar
Junior Member
 
Location: Oxford

Join Date: Apr 2013
Posts: 1

Hi - I would first consider running error correction on your reads (e.g. with Musket). Are your reads paired? That would be important for improving the assembly. SOAPdenovo could be a good starting point; you could also try ABySS or Velvet, which are also relatively easy to install and run, though Velvet can be quite memory-demanding for a dataset as big as yours. It is important to optimise the k-mer size; KmerGenie should already have suggested one. However, with the coverage you have, you cannot expect a really high N50. Hope this helps.
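On choosing k: what a de Bruijn graph assembler actually sees is k-mer coverage, which falls as k grows because a read of length L contains only L - k + 1 k-mers. The relation Ck = C * (L - k + 1) / L (given, for example, in the Velvet documentation) can be tabulated with a short Python sketch. The ~20x base coverage, the list of candidate k values, and the function name `kmer_coverage` are assumptions for illustration.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage Ck = C * (L - k + 1) / L.

    Each read of length L yields L - k + 1 k-mers, so a larger k leaves
    fewer k-mers per read and lowers the depth the assembler works with.
    """
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    coverage = 20.0   # assumed base coverage, from the thread's ~20x
    read_length = 101  # bp
    for k in (21, 31, 41, 51, 63, 71):
        ck = kmer_coverage(coverage, read_length, k)
        print(f"k={k:>2}: k-mer coverage ~ {ck:.1f}x")
```

At 20x with 101 bp reads, k = 31 leaves about 14x k-mer coverage while k = 63 leaves under 8x. This illustrates the trade-off: a larger k resolves more repeats but leaves fewer k-mers to support each node at this depth, so KmerGenie's suggested k remains the more principled starting point.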
07-18-2014, 06:46 AM   #4
bioman1
Member
 
Location: US

Join Date: May 2012
Posts: 80

I have paired-end reads (read1.fastq, read2.fastq), which I interleaved into a single file, single.fastq. This file has 38% GC and a sequence length of 101, and its coverage is about 20x. I cannot sequence more because of my boss's budget, so I would like to make something publishable out of these reads. Any suggestions for producing a draft genome for publication?
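Some assemblers take paired reads as two files and others as one interleaved file. A minimal Python sketch of interleaving follows, assuming read1.fastq and read2.fastq are the paired outputs of the trimming step with mates in the same order, stored as plain 4-line FASTQ records. The function names are hypothetical, and the mate-name check assumes either `/1` and `/2` suffixes or headers where the mate information follows a space.

```python
from itertools import zip_longest
from typing import Iterator, List, TextIO


def read_fastq(handle: TextIO) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records as [header, sequence, '+', qualities]."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        lines = [header] + [handle.readline() for _ in range(3)]
        if not header.startswith("@") or not lines[3]:
            raise ValueError(f"malformed or truncated FASTQ record: {header.strip()!r}")
        yield [line.rstrip("\r\n") for line in lines]


def pair_id(header: str) -> str:
    """Read name without '@', trailing comment, or a /1 or /2 mate suffix."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write mate 1 then mate 2 for every pair; return the number of pairs."""
    pairs = 0
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("read1 and read2 contain different numbers of records")
        if pair_id(rec1[0]) != pair_id(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        out.write("\n".join(rec1) + "\n")
        out.write("\n".join(rec2) + "\n")
        pairs += 1
    return pairs
```

Open the inputs with `open(path)` (or `gzip.open(path, "rt")` for compressed files) and pass the handles to `interleave`. It raises `ValueError` instead of silently writing out-of-sync pairs, which matters because a single missing mate would shift every later pair.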
Tags
assembly, genome, lowcoverage

All times are GMT -8.