  • lkral
    Member
    • May 2011
    • 27

    Feasibility questions

    As the starting point of a new project, I want to characterize the sequences of, perhaps, a dozen genes in a species of fish. While I could make a BAC library and try to fish out those genes from it, I’m thinking that I could obtain the genome sequence with an Illumina run and fish out the relevant gene fragments by alignment to orthologs in characterized fish genomes. Based on the information in the “Field guide to next-generation DNA sequencers” paper (http://www.ncbi.nlm.nih.gov/pubmed/21592312), I have made the following calculations:

    While not known, the likely size of the fish genome is ~1x10^9 bp. A single lane on a GAIIx using 150+150 paired reads should produce 1.3x10^10 bases of sequence, giving me ~13x coverage. Similarly, on a HiSeq using 100+100 paired reads, ~1.2x10^10 bases of sequence should be produced per lane for ~12x coverage. Alternatively, if HiSeq version 3 is available, a lane should yield 3.6x10^10 bases of sequence for ~36x coverage.
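To check the arithmetic above, divide each quoted per-lane yield by the assumed genome size. The Python sketch below does that. It also estimates the number of read pairs per lane and the idealized Lander-Waterman fraction of bases left with zero reads, exp(-c) for mean depth c. The yields and genome size are the post's assumptions, not measured values. Real runs lose reads to quality filtering and PCR duplicates, and coverage is not uniform (GC bias, repeats). Mean depth alone also says nothing about how contiguous an assembly will be.

```python
import math

GENOME_SIZE_BP = 1e9  # assumed fish genome size from the post, not measured

# Per-lane yields (bases) and read lengths quoted in the post.
LANE_YIELDS = {
    "GAIIx 2x150": (1.3e10, 150),
    "HiSeq 2x100": (1.2e10, 100),
    "HiSeq v3 2x100": (3.6e10, 100),
}


def coverage(yield_bp: float, genome_bp: float) -> float:
    """Mean depth c = total sequenced bases / genome size."""
    return yield_bp / genome_bp


def read_pairs(yield_bp: float, read_len: int) -> float:
    """Number of read pairs; each pair contributes 2 * read_len bases."""
    return yield_bp / (2 * read_len)


def fraction_uncovered(c: float) -> float:
    """Idealized Lander-Waterman estimate: P(a base has zero reads) = exp(-c),
    assuming reads land uniformly at random across the genome."""
    return math.exp(-c)


if __name__ == "__main__":
    for name, (yield_bp, read_len) in LANE_YIELDS.items():
        c = coverage(yield_bp, GENOME_SIZE_BP)
        print(f"{name}: {c:.0f}x, {read_pairs(yield_bp, read_len):.2e} pairs, "
              f"uncovered fraction ~{fraction_uncovered(c):.1e}")
```

At ~13x the idealized uncovered fraction is already tiny. In practice, gaps and fragmented contigs come mostly from uneven coverage and repeats rather than from the raw mean depth.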

    So, my questions for those familiar with this technology are:

    1) Are these numbers realistic in terms of the output I can expect or would I likely see lower sequence yields?

    2) Will the indicated levels of coverage provide a high enough likelihood that I will be able to assemble each of the genes of interest (and hopefully immediately adjacent genes as well)?

    3) The paper cited above indicates a cost of about $3,000 to $3,500 at an academic core facility. My institution does not have such a core facility, so I would have to use a commercial provider. Any idea what the likely cost of my proposed sequencing would be? Any recommendations for a facility I could use?

    4) I assume I would only provide the facility with some amount of genomic DNA and the facility would shear and prep the DNA samples. I usually use the Qiagen DNeasy tissue kit for genomic DNA isolation. Is this acceptable for Illumina sequencing or is there a recommended purification kit/process?

    5) Finally, any recommendations for free software that would allow me to do the targeted alignments and assembly?

    Thanks in advance for any words of wisdom.

    Leos
  • lkral
    Member
    • May 2011
    • 27

    #2
    Perhaps I asked too many questions. If I could try again with just one main question:

    For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?


    • krobison
      Senior Member
      • Nov 2007
      • 734

      #3
      I have not done this myself, but there is certainly a lot of literature on de novo assembly of vertebrate genomes. The major challenge is assembly of the data; this is an evolving area and you will require a beefy machine. It may be possible to avoid this by aligning the reads to related species. The Software Wiki is a good place to start looking for programs that might suit you.

      Depending on your genes, you might find things easier doing RNA-Seq on an appropriate tissue, as that assembly is much easier. But if you aren't sure where they will be expressed or they are typically expressed only at very low levels, this may not help you much.

      If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).


      • tonybolger
        Senior Member
        • Feb 2010
        • 156

        #4
        Originally posted by lkral
        Perhaps I asked too many questions. If I could try again with just one main question:

        For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?
        The only answer is 'possibly'.

        If your orthologs are close enough on a nucleotide level, reference based alignment should give you what you need (though it's inherently biased).

        If you need to go de-novo, then genome complexity is a big factor - if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns so you'll end up with the exons of a single gene in several contigs. You might then need to do reference based scaffolding of de-novo contigs.

        It's a tough call between GAII and HiSeq - read length is very nice to have especially for de-novo, but quantity also matters.


        • swaseq
          Junior Member
          • Sep 2011
          • 1

          #5
          Hi,
          I am new to the forum and to Illumina sequencing too (so backward!). I have done lots of sequencing by BigDye chain termination, thanks to ABI!
          I hope I will understand and that my brain will digest this new technology (new at least for me). I may bore you in the near future with my questions, so kindly bear with me.

          Keep sequencing,


          • lkral
            Member
            • May 2011
            • 27

            #6
            Originally posted by krobison
            If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).
            Thanks for the reference. An approach worth exploring.


            • lkral
              Member
              • May 2011
              • 27

              #7
              Originally posted by tonybolger

              If your orthologs are close enough on a nucleotide level, reference based alignment should give you what you need (though it's inherently biased).

              If you need to go de-novo, then genome complexity is a big factor - if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns so you'll end up with the exons of a single gene in several contigs. You might then need to do reference based scaffolding of de-novo contigs.
              The approach I was thinking of was to use tblastn to pull out the Illumina reads that correspond to the exons of the genes of interest, "pin" these in place in the proper order, and then use them as anchors to assemble the contigs and scaffolds. The genes I'm after are single-copy genes.
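The sketch below covers only the post-processing half of this idea. It assumes the reads have been formatted as a BLAST nucleotide database and searched with tblastn, with the orthologous proteins as queries and `-outfmt 6` (the standard 12-column tabular output). It keeps each read's best-scoring hit below an e-value cutoff; 1e-5 is an arbitrary choice. It then orders the reads along each protein by query start, which is the "pinning" step. It does not perform the search or the assembly itself. `parse_hits`, `pin_reads` and the `Hit` fields are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    protein: str        # query: ortholog protein (or exon) ID
    read: str           # subject: Illumina read ID in the BLAST database
    qstart: int         # match start on the protein (amino-acid coordinates)
    qend: int
    minus_strand: bool  # tblastn reports sstart > send for reverse-frame hits
    evalue: float
    bitscore: float


def parse_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Hit]:
    """Parse tabular tblastn output, dropping hits above the e-value cutoff."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        r = dict(zip(FIELDS, row))
        if float(r["evalue"]) > max_evalue:
            continue
        hits.append(Hit(r["qseqid"], r["sseqid"], int(r["qstart"]), int(r["qend"]),
                        int(r["sstart"]) > int(r["send"]),
                        float(r["evalue"]), float(r["bitscore"])))
    return hits


def pin_reads(hits: List[Hit]) -> Dict[str, List[Hit]]:
    """Keep each read's best hit, then order reads along each protein by qstart."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.read not in best or h.bitscore > best[h.read].bitscore:
            best[h.read] = h
    by_protein: Dict[str, List[Hit]] = {}
    for h in best.values():
        by_protein.setdefault(h.protein, []).append(h)
    for reads in by_protein.values():
        reads.sort(key=lambda h: (h.qstart, h.qend))
    return by_protein


if __name__ == "__main__":
    example = io.StringIO(
        "geneA\tread7\t80.0\t50\t10\t0\t60\t109\t150\t1\t1e-20\t90.1\n"
        "geneA\tread3\t75.0\t48\t12\t0\t1\t48\t2\t145\t1e-18\t85.0\n"
        "geneB\tread3\t40.0\t30\t18\t0\t5\t34\t10\t99\t1e-3\t30.0\n"
    )
    for protein, reads in pin_reads(parse_hits(example)).items():
        print(protein, [(h.read, h.qstart, h.minus_strand) for h in reads])
```

Keeping only each read's best hit is a simple way to stop one read from being pinned to two genes. For genuinely single-copy targets this is reasonable, but it would hide reads from close paralogs.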


              • tonybolger
                Senior Member
                • Feb 2010
                • 156

                #8
                Originally posted by lkral
                The approach I was thinking of was to use tblastn to pull out the Illumina reads that correspond to the exons of the genes of interest, "pin" these in place in the proper order, and then use them as anchors to assemble the contigs and scaffolds. The genes I'm after are single-copy genes.
                BLAST of any kind on raw Illumina data is brave: the volume is so big, and the tblast variants are particularly slow.

                I would first try reference-based assembly against the related genome (or a subset of it), see whether you get enough coverage within the genes of interest, and check that SNPs are consistent where they occur.
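After a reference-based alignment, the two checks suggested above can be sketched as follows. This assumes you have already extracted each mapped read's 0-based, half-open start and end on the related species' reference (for example from the aligner's BAM output), plus ref/alt allele counts at each variant site. The function names, the minimum depth of 5 and the allele-fraction tolerance of 0.2 are hypothetical choices for illustration. The SNP heuristic also assumes a diploid, single-copy locus, as the original poster described.

```python
from typing import Iterable, List, Tuple


def depth_over_region(alignments: Iterable[Tuple[int, int]],
                      region_start: int, region_end: int) -> List[int]:
    """Per-base read depth over [region_start, region_end) on the reference.

    `alignments` are 0-based, half-open (start, end) intervals of mapped reads.
    A difference array keeps this O(reads + region length).
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, end in alignments:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # the read overlaps the region
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def fraction_covered(depth: List[int], min_depth: int = 5) -> float:
    """Fraction of region bases with at least `min_depth` reads (assumed threshold)."""
    return sum(d >= min_depth for d in depth) / len(depth) if depth else 0.0


def snp_looks_consistent(ref_count: int, alt_count: int, tol: float = 0.2) -> bool:
    """Heuristic for a diploid, single-copy locus, applied at sites where a
    difference from the reference was seen: the alternate-allele fraction should
    be near 0.5 (heterozygous) or near 1.0 (fixed difference from the related
    species). Other fractions may hint at collapsed paralogs or sequencing errors."""
    total = ref_count + alt_count
    if total == 0:
        return False
    frac = alt_count / total
    return abs(frac - 0.5) <= tol or frac >= 1.0 - tol
```

For example, `depth_over_region([(0, 10), (5, 15), (8, 12)], 5, 12)` gives the depth at reference positions 5 to 11. Applying `fraction_covered` to the result summarizes how much of that gene region reaches the chosen depth.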

                Alternatively, assemble de novo and then try to align the resulting contigs using tblast.
