  • Feasibility questions

    As the starting point of a new project, I want to characterize the sequences of, perhaps, a dozen genes in a species of fish. While I could make a BAC library and try to fish out those genes from it, I’m thinking that I could obtain the genome sequence with an Illumina run and fish out the relevant gene fragments by alignment to orthologs in characterized fish genomes. Based on the information in the “Field guide to next-generation DNA sequencers” paper (http://www.ncbi.nlm.nih.gov/pubmed/21592312), I have made the following calculations:

    While not known, the likely size of the fish genome is ~1x10^9 bp. A single lane on a GAIIx flow cell using 150+150 paired reads should produce 1.3x10^10 bases of sequence, giving ~13x coverage. Similarly, a HiSeq lane using 100+100 paired reads should produce ~1.2x10^10 bases of sequence for ~12x coverage. Alternatively, if HiSeq version 3 is available, a lane should yield 3.6x10^10 bases of sequence for ~36x coverage.
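The coverage figures above are simply total sequenced bases divided by genome size. A minimal sketch of that arithmetic, using the per-lane yields quoted in the post and the assumed (not measured) 1x10^9 bp genome; the function and variable names are illustrative only:

```python
def expected_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


GENOME_SIZE = 1e9  # assumption from the post; the real genome size is unknown

# Per-lane yields as quoted in the post (bases of sequence per lane)
lane_yields = {
    "GAIIx 2x150": 1.3e10,
    "HiSeq 2x100": 1.2e10,
    "HiSeq v3": 3.6e10,
}

coverage = {
    name: expected_coverage(bases, GENOME_SIZE)
    for name, bases in lane_yields.items()
}

if __name__ == "__main__":
    for name, depth in coverage.items():
        print(f"{name}: ~{depth:.0f}x")
```

These are average depths under ideal yield; a lane that underperforms, or a genome larger than assumed, lowers them proportionally.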

    So, my questions for those familiar with this technology are:

    1) Are these numbers realistic in terms of the output I can expect or would I likely see lower sequence yields?

    2) Will the indicated levels of coverage provide a high enough likelihood that I will be able to assemble each of the genes of interest (and hopefully immediately adjacent genes as well)?

    3) The paper cited above indicates a cost of about $3,000 to $3,500 at an academic core facility. My institution does not have such a core facility, so I would have to use a commercial provider. Any idea what the likely cost of my proposed sequencing would be? Any recommendations for a facility I could use?

    4) I assume I would only provide the facility with some amount of genomic DNA and the facility would shear and prep the DNA samples. I usually use the Qiagen DNeasy tissue kit for genomic DNA isolation. Is this acceptable for Illumina sequencing or is there a recommended purification kit/process?

    5) Finally, any recommendations for free software that would allow me to do the targeted alignments and assembly?

    Thanks in advance for any words of wisdom.

    Leos

  • #2
    Perhaps I asked too many questions. If I could try again with just one main question:

    For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?
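One rough way to frame "sufficient coverage" is the idealized Poisson (Lander-Waterman style) model, in which reads land uniformly at random and the depth at any base follows a Poisson distribution with mean equal to the average coverage. The sketch below rests on that assumption; real Illumina data show GC bias and repeat-related gaps, so actual results will be less favourable. The function name and the depth threshold of 5 are illustrative choices, not values from this thread.

```python
import math


def prob_depth_at_least(mean_coverage: float, min_depth: int) -> float:
    """P(X >= min_depth) for X ~ Poisson(mean_coverage).

    Assumes reads are placed uniformly at random along the genome.
    """
    if min_depth <= 0:
        return 1.0
    # Sum the Poisson probabilities of depths 0 .. min_depth - 1
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


if __name__ == "__main__":
    for cov in (12, 13, 36):
        p = prob_depth_at_least(cov, 5)
        print(f"{cov}x average: P(depth >= 5) = {p:.4f}")
```

Under this model the fraction of bases with zero reads is exp(-coverage), so uniform sampling alone leaves very little uncovered at 12x; the practical limits come from bias, repeats, and divergence from the reference.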



    • #3
      I have not done this myself, but there is certainly a lot of literature on de novo assembly of vertebrate genomes. The major challenge is assembly of the data; this is an evolving area and you will require a beefy machine. It may be possible to avoid this by aligning the reads to related species. The Software Wiki is a good place to start looking for programs that might suit you.

      Depending on your genes, you might find things easier doing RNA-Seq on an appropriate tissue, as that assembly is much easier. But if you aren't sure where they will be expressed or they are typically expressed only at very low levels, this may not help you much.

      If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).



      • #4
        Originally posted by lkral
        Perhaps I asked too many questions. If I could try again with just one main question:

        For those with direct experience with Illumina sequencing – assuming a fish genome with 1 x 10^9 nucleotides, will paired end reads on a single lane provide, in practice, sufficient coverage that I can likely assemble individual protein coding genes using orthologs from genomes of other fish species?
        The only answer is 'possibly'.

        If your orthologs are close enough on a nucleotide level, reference based alignment should give you what you need (though it's inherently biased).

        If you need to go de-novo, then genome complexity is a big factor - if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns so you'll end up with the exons of a single gene in several contigs. You might then need to do reference based scaffolding of de-novo contigs.

        It's a tough call between GAII and HiSeq - read length is very nice to have especially for de-novo, but quantity also matters.



        • #5
          Hi,
          I am new to the forum and to Illumina sequencing too (so backward!). I have done lots of sequencing by BigDye chain termination, thanks to ABI!
          I hope I will be able to understand and digest this technology, which is new to me at least. I may bother you with my questions in the near future, so kindly bear with me.

          Keep sequencing,



          • #6
            Originally posted by krobison
            If your genes are available from a closely related species, there are a number of papers showing hybridization selection as a useful way to snag orthologs (such as here).
            Thanks for the reference. An approach worth exploring.



            • #7
              Originally posted by tonybolger

              If your orthologs are close enough on a nucleotide level, reference based alignment should give you what you need (though it's inherently biased).

              If you need to go de-novo, then genome complexity is a big factor - if these genes are from families of recently duplicated genes, you're less likely to get a comprehensive and correct assembly from a single lane. You may also have problems assembling introns so you'll end up with the exons of a single gene in several contigs. You might then need to do reference based scaffolding of de-novo contigs.
              The approach I was thinking of was to use tblastn to pull out the Illumina obtained sequences that correspond to the exons of the genes of interest, "pin" these in place in the proper order and in reference to these assemble the contigs and scaffolds. The genes I'm after are single copy genes.



              • #8
                Originally posted by lkral
                The approach I was thinking of was to use tblastn to pull out the Illumina obtained sequences that correspond to the exons of the genes of interest, "pin" these in place in the proper order and in reference to these assemble the contigs and scaffolds. The genes I'm after are single copy genes.
                Running BLAST of any kind on raw Illumina data is brave: the volume is so big, and the translated searches (tblastn, tblastx) are particularly slow.

                I would first try reference-based assembly against the related genome, or a subset of it, then see whether you get enough coverage within the genes of interest and check that SNPs are consistent where they occur.

                Alternatively, assemble de novo and then try to align the resulting contigs to the orthologs using tblastn.
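As a sketch of that last route: once contigs have been searched with the orthologous protein sequences (tblastn) and the hits written in BLAST+ tabular format (`-outfmt 6`, the standard 12 columns), the contigs can be ordered along each protein to place exon-bearing contigs in gene order. The function name, the E-value cutoff, and the sample hit lines below are all hypothetical, made up for illustration; they are not results from any real run.

```python
from collections import defaultdict

# Standard 12 columns of BLAST+ tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def order_contigs_by_query(tabular_text, max_evalue=1e-5):
    """Return {protein_id: [contig ids ordered by position along the protein]}."""
    hits = defaultdict(dict)
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = dict(zip(COLUMNS, line.split("\t")))
        if float(fields["evalue"]) > max_evalue:
            continue  # drop weak hits (cutoff is an arbitrary example value)
        qstart = int(fields["qstart"])
        contig = fields["sseqid"]
        best = hits[fields["qseqid"]]
        # Keep the left-most protein position at which each contig matches
        if contig not in best or qstart < best[contig]:
            best[contig] = qstart
    return {query: sorted(c, key=c.get) for query, c in hits.items()}


# Hypothetical hits for one protein query against four contigs
EXAMPLE = "\n".join([
    "geneA\tcontig_17\t82.5\t40\t7\t0\t121\t160\t300\t181\t1e-20\t85.1",
    "geneA\tcontig_03\t90.0\t50\t5\t0\t1\t50\t10\t159\t1e-30\t110.0",
    "geneA\tcontig_42\t35.0\t20\t13\t0\t60\t79\t5\t64\t0.5\t22.0",
    "geneA\tcontig_08\t88.0\t60\t7\t0\t55\t114\t400\t579\t1e-25\t98.0",
])

if __name__ == "__main__":
    print(order_contigs_by_query(EXAMPLE))
```

Ordering by position along the protein only suggests a contig order; strand, overlaps between hits, and paralogous matches would still need to be checked before treating the result as a scaffold.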
