SEQanswers

Old 08-26-2011, 12:49 PM   #1
lkral
Member
 
Location: Carrollton, GA

Join Date: May 2011
Posts: 27
Feasibility questions

As the starting point of a new project, I want to characterize the sequences of, perhaps, a dozen genes in a species of fish. While I could make a BAC library and try to fish out those genes from it, I’m thinking that I could obtain the genome sequence with an Illumina run and fish out the relevant gene fragments by alignment to orthologs in characterized fish genomes. Based on the information in the “Field guide to next-generation DNA sequencers” paper (http://www.ncbi.nlm.nih.gov/pubmed/21592312), I have made the following calculations:

While not known, the likely size of the fish genome is ~1x10^9 bases. A single lane of a GAIIx flow cell utilizing 150+150 paired reads should produce 1.3x10^10 bases of sequence, thus providing me with ~13x coverage. Similarly, on a HiSeq utilizing 100+100 paired reads, ~1.2x10^10 bases of sequence should be produced per lane for ~12x coverage. Alternatively, if HiSeq version 3 is available, a lane should yield 3.6x10^10 bases of sequence for ~36x coverage.
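
The arithmetic here is simply total bases sequenced divided by genome size. A minimal sketch of that calculation follows; the genome size is the estimate given in this post and the per-lane yields are the figures quoted from the paper, so both are assumptions rather than measured values, and the function and variable names are only illustrative.

```python
def fold_coverage(bases_sequenced, genome_size):
    """Average fold coverage: total bases sequenced divided by genome size."""
    return bases_sequenced / genome_size


GENOME_SIZE = 1e9  # assumed genome size in bases (the estimate from the post)

# Per-lane yields as quoted in the post (assumptions, not measured output)
lane_yields = {
    "GAIIx 150+150": 1.3e10,
    "HiSeq 100+100": 1.2e10,
    "HiSeq v3": 3.6e10,
}

coverage = {
    name: fold_coverage(bases, GENOME_SIZE) for name, bases in lane_yields.items()
}

for name, cov in coverage.items():
    print(f"{name}: ~{cov:.0f}x average coverage")
```

If the real genome turns out larger than the estimate, or the lane yields less than quoted, the coverage drops in direct proportion, so it may be worth rerunning this with pessimistic inputs.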

So, my questions for those familiar with this technology are:

1) Are these numbers realistic in terms of the output I can expect or would I likely see lower sequence yields?

2) Will the indicated levels of coverage provide a high enough likelihood that I will be able to assemble each of the genes of interest (and hopefully immediately adjacent genes as well)?

3) The paper cited above indicates a cost of about $3,000 to $3,500 at an academic core facility. My institution does not have such a core facility so I would have to utilize a commercial provider. Any idea of what the likely cost of my proposed sequencing would be? Any recommendations for a facility I could use?

4) I assume I would only provide the facility with some amount of genomic DNA and the facility would shear and prep the DNA samples. I usually use the Qiagen DNeasy tissue kit for genomic DNA isolation. Is this acceptable for Illumina sequencing or is there a recommended purification kit/process?

5) Finally, any recommendations for free software that would allow me to do the targeted alignments and assembly?

Thanks in advance for any words of wisdom.

Leos
Old 09-06-2011, 05:54 AM   #2
lkral
Member
 
Location: Carrollton, GA

Join Date: May 2011
Posts: 27

Perhaps I asked too many questions. If I could try again with just one main question:

For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?
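
One rough way to frame the "sufficient coverage" question is an idealized Poisson model, in which reads land uniformly at random and the depth at any base follows a Poisson distribution with mean equal to the average coverage. The sketch below is only that idealization: real Illumina data has GC and library biases, so these numbers are optimistic, and the depth threshold of 5 is an arbitrary illustrative choice. The coverage values are the ~12x, ~13x and ~36x estimates from the first post.

```python
import math


def prob_depth_at_least(k, mean_coverage):
    """P(depth >= k) at a single base under a Poisson model of read depth."""
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_below


# Coverage estimates taken from the first post (assumed, not measured)
for c in (12, 13, 36):
    covered = prob_depth_at_least(1, c)
    deep = prob_depth_at_least(5, c)  # 5 is an arbitrary example threshold
    print(f"{c}x: P(depth >= 1) = {covered:.6f}, P(depth >= 5) = {deep:.4f}")
```

Under this model nearly every base is covered at these depths, which suggests that raw coverage is less likely to be the limiting factor than repeats, bias, and assembly difficulty.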
Old 09-06-2011, 06:10 AM   #3
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

I have not done this myself, but there is certainly a lot of literature on de novo assembly of vertebrate genomes. The major challenge is assembly of the data; this is an evolving area and you will require a beefy machine. It may be possible to avoid this by aligning the reads to related species. The Software Wiki is a good place to start looking for programs that might suit you.

Depending on your genes, you might find things easier doing RNA-Seq on an appropriate tissue, as that assembly is much easier. But if you aren't sure where they will be expressed or they are typically expressed only at very low levels, this may not help you much.

If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).
Old 09-07-2011, 02:25 AM   #4
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

Quote:
Originally Posted by lkral
Perhaps I asked too many questions. If I could try again with just one main question:

For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?
The only answer is 'possibly'.

If your orthologs are close enough on a nucleotide level, reference based alignment should give you what you need (though it's inherently biased).

If you need to go de-novo, then genome complexity is a big factor - if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns so you'll end up with the exons of a single gene in several contigs. You might then need to do reference based scaffolding of de-novo contigs.

It's a tough call between GAII and HiSeq - read length is very nice to have especially for de-novo, but quantity also matters.
Old 09-07-2011, 03:03 AM   #5
swaseq
Junior Member
 
Location: India

Join Date: Sep 2011
Posts: 1

Hi,
I am new to the forum and to Illumina sequencing too (so backward!). I have done lots of sequencing by BigDye chain termination. Thanks to ABI!
I hope I will understand and my brain will digest this new technology (new at least for me). I may bore you in the near future with my questions, so kindly bear with me.

Keep sequencing,
Old 09-13-2011, 06:23 AM   #6
lkral
Member
 
Location: Carrollton, GA

Join Date: May 2011
Posts: 27

Quote:
Originally Posted by krobison
If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).
Thanks for the reference. An approach worth exploring.
Old 09-13-2011, 06:34 AM   #7
lkral
Member
 
Location: Carrollton, GA

Join Date: May 2011
Posts: 27

Quote:
Originally Posted by tonybolger

If your orthologs are close enough on a nucleotide level, reference based alignment should give you what you need (though it's inherently biased).

If you need to go de-novo, then genome complexity is a big factor - if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns so you'll end up with the exons of a single gene in several contigs. You might then need to do reference based scaffolding of de-novo contigs.
The approach I was thinking of was to use tblastn to pull out the Illumina obtained sequences that correspond to the exons of the genes of interest, "pin" these in place in the proper order and in reference to these assemble the contigs and scaffolds. The genes I'm after are single copy genes.
Old 09-13-2011, 06:49 AM   #8
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

Quote:
Originally Posted by lkral
The approach I was thinking of was to use tblastn to pull out the Illumina obtained sequences that correspond to the exons of the genes of interest, "pin" these in place in the proper order and in reference to these assemble the contigs and scaffolds. The genes I'm after are single copy genes.
BLAST of any kind on raw Illumina data is brave - the volume is so big, and the tblast varieties are particularly slow.

I would first try reference based assembly on the related genome or subset thereof, and see if you get enough coverage within the genes of interest and check that snps are consistent when they occur.
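
As a rough illustration of the "enough coverage within the genes of interest" check, the sketch below summarizes per-base depths for each gene region. It assumes you already have a list of read depths per position for each gene, extracted from your alignment by whatever tool you use; the input layout, the gene names, the depth values, and the `min_depth` threshold of 5 are all made up for illustration.

```python
def summarize_gene_coverage(depths, min_depth=5):
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    if not depths:
        return 0.0, 0.0
    mean_depth = sum(depths) / len(depths)
    breadth = sum(1 for d in depths if d >= min_depth) / len(depths)
    return mean_depth, breadth


# Made-up per-base depths for two hypothetical gene regions
example_depths = {
    "geneA": [12, 14, 13, 11, 15, 12],
    "geneB": [0, 0, 3, 9, 10, 2],
}

for gene, depths in example_depths.items():
    mean_depth, breadth = summarize_gene_coverage(depths)
    print(f"{gene}: mean depth {mean_depth:.1f}, "
          f"{breadth:.0%} of positions at depth >= 5")
```

A gene with a reasonable mean depth but low breadth, like the second made-up example, would suggest that parts of it are too diverged from the reference for reads to align, which is where the de novo route may be needed.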

Alternatively, assemble de novo and then try to align the resulting contigs using tblast.