
**lkral** · 09-06-2011, 05:54 AM

Perhaps I asked too many questions. If I could try again with just one main question:

For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?

**krobison** · 09-06-2011, 06:10 AM

I have not done this myself, but there is certainly a lot of literature on de novo assembly of vertebrate genomes. The major challenge is assembly of the data; this is an evolving area and you will require a beefy machine. It may be possible to avoid this by aligning the reads to related species. The Software Wiki is a good place to start looking for programs that might suit you.

Depending on your genes, you might find things easier doing RNA-Seq on an appropriate tissue, as that assembly is much easier. But if you aren't sure where they will be expressed or they are typically expressed only at very low levels, this may not help you much.

If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).

**tonybolger** · 09-07-2011, 02:25 AM

> Originally posted by lkral
>
> Perhaps I asked too many questions. If I could try again with just one main question:
>
> For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?

The only answer is 'possibly'.

If your orthologs are close enough at the nucleotide level, reference-based alignment should give you what you need (though it's inherently biased).

If you need to go de novo, then genome complexity is a big factor: if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns, so you'll end up with the exons of a single gene in several contigs. You might then need to do reference-based scaffolding of the de novo contigs.

It's a tough call between GAII and HiSeq: read length is very nice to have, especially for de novo, but quantity also matters.
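To make the read length versus quantity trade-off concrete, here is a rough back-of-the-envelope sketch. It uses the standard mean-depth formula (total sequenced bases divided by genome size) and the Lander-Waterman approximation for the fraction of bases left uncovered. The read-pair counts and read lengths below are hypothetical placeholders, not actual GAII or HiSeq yields, so substitute the figures your sequencing facility quotes.

```python
import math

def expected_coverage(read_pairs, read_length, genome_size):
    """Mean depth: total sequenced bases (two reads per pair) / genome size."""
    return 2 * read_pairs * read_length / genome_size

def fraction_uncovered(coverage):
    """Lander-Waterman approximation: chance that a given base gets no reads."""
    return math.exp(-coverage)

GENOME_SIZE = 1_000_000_000  # 1 x 10^9 nt, as stated in the question

# Hypothetical lane yields (read pairs, read length); NOT vendor specifications.
scenarios = {
    "longer reads, fewer pairs": (30_000_000, 150),
    "shorter reads, more pairs": (100_000_000, 100),
}

for label, (pairs, length) in scenarios.items():
    c = expected_coverage(pairs, length, GENOME_SIZE)
    print(f"{label}: {c:.1f}x mean depth, "
          f"uncovered fraction ~ {fraction_uncovered(c):.2e}")
```

This idealized estimate assumes reads land uniformly at random, so real coverage of any particular gene will be patchier than the number suggests.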

**swaseq** · 09-07-2011, 03:03 AM

Hi,
I am new to the forum and to Illumina sequencing too (so backward!). I have done lots of sequencing by BigDye chain termination, thanks to ABI!
I hope I will understand this technology and that my brain will digest it (it is new to me, at least). I may bore you in the near future with my questions, so kindly bear with me.

Keep sequencing,

**lkral** · 09-13-2011, 06:23 AM

> Originally posted by krobison
>
> If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).

Thanks for the reference. An approach worth exploring.

**lkral** · 09-13-2011, 06:34 AM

> Originally posted by tonybolger
>
> If your orthologs are close enough on a nucleotide level, reference based alignment should give you what you need (though it's inherently biased).
>
> If you need to go de-novo, then genome complexity is a big factor - if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns so you'll end up with the exons of a single gene in several contigs. You might then need to do reference based scaffolding of de-novo contigs.

The approach I was thinking of was to use tblastn to pull out the Illumina reads that correspond to the exons of the genes of interest, "pin" these in place in the proper order, and then assemble the contigs and scaffolds relative to them. The genes I'm after are single-copy genes.

**tonybolger** · 09-13-2011, 06:49 AM

> Originally posted by lkral
>
> The approach I was thinking of was to use tblastn to pull out the Illumina obtained sequences that correspond to the exons of the genes of interest, "pin" these in place in the proper order and in reference to these assemble the contigs and scaffolds. The genes I'm after are single copy genes.

BLAST of any kind on raw Illumina data is brave: the volume is so big, and the tblast varieties are particularly slow.

I would first try reference-based assembly against the related genome or a subset thereof, see if you get enough coverage within the genes of interest, and check that SNPs are consistent where they occur.
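One way to check "enough coverage within the genes of interest" is to average per-base depth over each gene interval. The sketch below assumes three-column, tab-separated input (reference name, 1-based position, depth), which is the layout `samtools depth` writes. The gene names and coordinates are hypothetical and would come from the related species' annotation.

```python
from collections import defaultdict

def mean_depth_per_gene(depth_lines, genes):
    """Average read depth over each gene interval.

    depth_lines: iterable of 'ref<TAB>pos<TAB>depth' lines (1-based positions).
    genes: dict of name -> (ref, start, end), 1-based inclusive coordinates.
    Positions missing from the input count as zero depth.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        ref, pos, depth = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos), int(depth)
        for name, (g_ref, start, end) in genes.items():
            if ref == g_ref and start <= pos <= end:
                totals[name] += depth
    return {
        name: totals[name] / (end - start + 1)
        for name, (_, start, end) in genes.items()
    }

# Toy example with made-up coordinates and depths.
example_depth = ["chr1\t1\t4", "chr1\t2\t6", "chr1\t5\t10", "chr2\t1\t3"]
example_genes = {"geneA": ("chr1", 1, 4), "geneB": ("chr2", 1, 1)}
print(mean_depth_per_gene(example_depth, example_genes))
```

The inner loop over genes is fine for a handful of target genes; for thousands of intervals an interval index would be the better choice.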

Alternatively, assemble de novo and then try to align the resulting contigs using tblast.
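For the contig-alignment route, a minimal sketch of the "pin in order" step might look like the following. It assumes tblastn was run with the ortholog proteins as queries against the contigs, with default 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and the E-value cutoff are illustrative choices, not recommendations from this thread.

```python
import csv
from collections import defaultdict

def order_contigs_by_protein(tabular_lines, max_evalue=1e-5):
    """For each query protein, list hit contigs in order along the protein.

    Each contig is placed by the query start (qstart) of its best-scoring hit,
    so a contig spanning several exons appears once, at its strongest match.
    """
    best = {}  # (protein, contig) -> (bitscore, qstart)
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        protein, contig = row[0], row[1]
        qstart, evalue, bitscore = int(row[6]), float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        key = (protein, contig)
        if key not in best or bitscore > best[key][0]:
            best[key] = (bitscore, qstart)

    ordered = defaultdict(list)
    # Sort all retained hits by their position on the query protein.
    for (protein, contig), _ in sorted(best.items(), key=lambda kv: kv[1][1]):
        ordered[protein].append(contig)
    return dict(ordered)

# Toy hits with made-up identifiers and scores.
example_hits = [
    "protA\tcontig7\t80.0\t50\t10\t0\t120\t169\t300\t151\t1e-20\t95.0",
    "protA\tcontig2\t85.0\t40\t6\t0\t1\t40\t10\t129\t1e-18\t88.0",
    "protA\tcontig9\t30.0\t20\t14\t0\t60\t79\t5\t64\t0.5\t20.0",
]
print(order_contigs_by_protein(example_hits))
```

The resulting order is only a scaffold hint: it says which contigs carry which part of the protein, not how far apart they sit in the genome, since intron lengths are unknown.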


Feasibility questions
