SEQanswers

Old 05-02-2014, 07:30 AM   #1
wrch
Junior Member
 
Location: india

Join Date: Jan 2014
Posts: 7
Problem with low coverage genome sequencing

Hi,
I am doing whole genome sequencing of a fungus with a genome size of ~20 Mb. I have done paired-end 2x75 bp Illumina MiSeq sequencing of a 500 bp library at 50x coverage, but several genes (~8% of core genes) remain uncovered. It is not an assembly issue: I have checked for several short fragments of the genes in the raw fastq file and they are absent. Now what should I do to cover the maximum number of genes? Will a mate pair library with larger inserts help? Or would sequencing with Ion Torrent, with its larger product size, be a better option?
Old 05-02-2014, 08:24 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

If it's a platform-specific bias then a different platform might help. Illumina has problems with super-high GC areas, for example. Also, overamplification will increase coverage variability.

But if the uncovered genes are completely uncovered, maybe they don't actually exist in your sample? And anyway, I don't understand what you mean by "I have checked several short fragments of the genes in the raw fastq file and they are absent". What exactly did you do?
Old 05-02-2014, 08:53 AM   #3
wrch
Junior Member
 
Location: india

Join Date: Jan 2014
Posts: 7

Actually, I have some gene sequences from PCR products, so the genes are there. I searched for short portions of them (say, 35 bases) in the raw sequences and did not find them. Several core genes are also absent, so I think it is a coverage-related problem. Can you please suggest which approach would be better: Ion Torrent or a large mate pair library?
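For reference, a check like the one described here could be sketched as follows. This is only a minimal illustration, not the exact procedure used in this thread: the function names and the file path are hypothetical, and it assumes a standard four-line-per-record FASTQ file. It also looks for the reverse complement of the probe, since reads can come from either strand, and a search on one strand only would miss half of the matching reads.

```python
import gzip

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def count_reads_with_probe(fastq_path, probe):
    """Count reads containing the probe or its reverse complement.

    Assumes four lines per FASTQ record, with the sequence on the
    second line of each record. Only exact matches are counted.
    """
    probe = probe.upper()
    rc = reverse_complement(probe)
    hits = 0
    with open_text(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of the record
                read = line.strip().upper()
                if probe in read or rc in read:
                    hits += 1
    return hits
```

An exact-match search like this reports nothing if the 35-base probe differs from the reads by even one base (a sequencing error, a SNP, or an error in the PCR-derived sequence), so aligning the full gene sequences against the reads is likely a more robust test.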
Old 05-02-2014, 08:58 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I don't know much about the Ion Torrent platform, so all I can say is "maybe". I would not expect a long-mate-pair library to fill in low-coverage areas of a fragment library. PacBio data has a very unbiased coverage distribution, though, so if you have access to it, I would try that.
Old 05-02-2014, 10:56 AM   #5
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Looks like you’re just hitting the random chance of uncovered regions that comes with the shotgun strategy. Mate pair libraries are for scaffolding, so they don’t directly help in recovering genes, but they can still be of some benefit: they will allow you to close the gaps between the scaffolds generated from your original PE reads. While some regions might not have assembled into contigs from the PE reads, more will come up with gap filling after scaffolding. With Illumina mate pairs you’ll also have to understand that few cores will do that library prep, so you’ll have to call around to find a good one. I’ve heard Hudson Alpha’s core is good and does this, but it’s been a while since I was part of pricing this out.

Given that your genome is ~20 Mb, you should have a lot of options if you have the money to do additional sequencing. One easy option could be to track down a MiSeq run with 2x250 reads and use a library size of roughly 400 bp. Another good option would be PacBio, as suggested above, but you’ll have to figure out the best approach for combining the PacBio and Illumina data for your use case. Mixing data types is something that hasn’t been worked out as thoroughly as an all-Illumina assembly, but there are good reasons to go with different types of technologies, and PacBio in particular.

PacBio can be used to create an assembly all by itself if you have the money to get the required coverage. If each SMRT cell produces about 300 Mbp, you’d probably want 3 or 4 of them if you’re going to take the PacBio-only assembly approach (thinking you should get 50x coverage or so). Then you could use your current PE Illumina reads to just correct errors via alignment to your draft genome, using something like iCORN, which comes with PAGIT. Now, PacBio reads can also be corrected with your Illumina data before assembly, and this requires less PacBio read depth, but if you’re concerned you don’t have what you want in your Illumina data, then you might be left with low quality (in PacBio-only regions) or still have gaps in your assembly.
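The "3 or 4 SMRT cells" figure follows from simple arithmetic on the numbers above. A minimal sketch, assuming the ~20 Mb genome size from the original post and the rough 300 Mbp per cell and 50x target quoted here (the function names are made up for illustration, and actual per-cell yield varies):

```python
import math


def smrt_cells_needed(genome_size_bp, target_coverage, yield_per_cell_bp):
    """Smallest whole number of cells that reaches the target coverage."""
    return math.ceil(genome_size_bp * target_coverage / yield_per_cell_bp)


def coverage_from_cells(cells, genome_size_bp, yield_per_cell_bp):
    """Average coverage obtained from a given number of cells."""
    return cells * yield_per_cell_bp / genome_size_bp


GENOME_SIZE = 20_000_000      # ~20 Mb, from the original post
YIELD_PER_CELL = 300_000_000  # ~300 Mbp per SMRT cell, rough figure from this post

print(smrt_cells_needed(GENOME_SIZE, 50, YIELD_PER_CELL))     # 4
print(coverage_from_cells(3, GENOME_SIZE, YIELD_PER_CELL))    # 45.0
print(coverage_from_cells(4, GENOME_SIZE, YIELD_PER_CELL))    # 60.0
```

So 3 cells lands just under 50x and 4 cells comfortably over it.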

So, I’d think you have several options which could be mixed and matched depending on budget and how “finished” you ultimately want this genome to be.

1) High coverage PacBio (1 library prep and 3-4 SMRT cells ~$1250-$2000 depending on pricing available to you)
2) Low coverage PacBio (1 library prep and 1 SMRT cell ~$500-$1000)
3) Additional PE Illumina sequencing (1 MiSeq lane, 1 library prep $1500)
4) Illumina mate pair library (1 MiSeq lane 1 library prep $1500)

Now I’ve only dealt with vertebrate genomes, so I might be off, but personally I’d lean towards a high coverage PacBio, potentially with Illumina used for corrections after the fact. This would be especially true if you have a university core that will do preps for ~$400 and sequencing for $250 per SMRT cell.
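As a rough check on the budget, the university-core prices mentioned above can be plugged into a one-line cost estimate. This is only a sketch under those quoted prices (~$400 per library prep, $250 per SMRT cell); the function name is hypothetical and real pricing will differ between cores.

```python
def pacbio_cost(n_cells, prep_cost=400, cost_per_cell=250, n_preps=1):
    """Estimated cost in dollars: library prep(s) plus SMRT cells."""
    return n_preps * prep_cost + n_cells * cost_per_cell


# Option 1 (high coverage): 1 prep and 3-4 SMRT cells
print(pacbio_cost(3), pacbio_cost(4))  # 1150 1400

# Option 2 (low coverage): 1 prep and 1 SMRT cell
print(pacbio_cost(1))  # 650
```

At those prices both PacBio options come in at or below the ranges listed in options 1 and 2.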