  • What approach for ~700Mb genome?

    Hi there. I've been wondering what approach would be suggested for a de novo sequencing project of an organism whose genome is ~600-700Mb?

    I have been thinking of doing low coverage with 454 shotgun reads, possibly ~5X, and would use the Titanium XL+ seq kit for longer reads when it's launched (which supposedly could give ~1x coverage per run). Is 1 shotgun library OK for this, or would 2 be needed? Along with this, I would add either 1 or 2 Illumina paired-end libraries, possibly with different insert sizes, sequenced on a HiSeq @2x100bp. Then I would combine this with either 454 paired-end reads (8K &/or 20K inserts) &/or Illumina mate pairs w/ a couple of different insert sizes (do they go larger than 5K?).

    At this point this is mostly for a proposal, with the realisation that after assembly of the initial sequence data more sequencing could be needed. Any suggestions would be greatly appreciated.
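For a proposal, the coverage arithmetic above can be sketched roughly as follows. This is only a back-of-the-envelope illustration: the 700 Mb genome size, the ~5X 454 target and the ~1x per run figure come from the post, while the 30X Illumina target and the function names are assumptions made up for the example.

```python
import math

def bases_needed(genome_size_bp, coverage):
    """Total sequenced bases required for a target average coverage."""
    return genome_size_bp * coverage

def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads of a fixed length required for a target coverage."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)

def runs_needed(coverage, coverage_per_run):
    """Number of instrument runs, given the coverage one run delivers."""
    return math.ceil(coverage / coverage_per_run)

genome = 700_000_000  # upper end of the ~600-700 Mb estimate

# 454 shotgun at ~5X, assuming ~1x coverage per run as quoted in the post
print(bases_needed(genome, 5))        # 3500000000 bases
print(runs_needed(5, 1.0))            # 5 runs

# Illumina at an ASSUMED 30X target with 100 bp reads (the target is not from the post)
print(reads_needed(genome, 30, 100))  # 210000000 reads
```

Average coverage says nothing about how evenly reads are distributed, so these numbers are a lower bound for budgeting rather than a guarantee of assembly quality.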

  • #2
    If I were to review your proposal I would not have many problems with it; it looks good. I would add transcriptome sequencing, though (using the long 454 reads), to help with the gene prediction (annotation).


    • #3
      Thanks flxlex. Would you go w/ Illumina mate pairs over 454 paired ends? I was wondering if the much higher number of reads would be better if there was already a decent (hopefully) backbone to the assembly.
      I did recently speak to someone from Max Planck who mentioned they have used a modified protocol that is essentially the 454 PE prep but with Illumina adaptors so that they could sequence 100bp reads of extra long inserts on Illumina.


      • #4
        Originally posted by RCJK View Post
        Thanks flxlex. Would you go w/ Illumina mate pairs over 454 paired ends? I was wondering if the much higher number of reads would be better if there was already a decent (hopefully) backbone to the assembly.
        I did recently speak to someone from Max Planck who mentioned they have used a modified protocol that is essentially the 454 PE prep but with Illumina adaptors so that they could sequence 100bp reads of extra long inserts on Illumina.
        I would go with the 454 / Illumina hybrid for long mate pairs - the number of reads is so much higher, and scaffolding is all about the numbers. You lose a massive portion to both ends mapping to the same contig, ambiguous or no mapping on one or the other end, cloned reads etc., and you really want a reasonable number of links (e.g. 5+) justifying each scaffold bridge.

        Our 'useful link' rate has been around 10% at best (2% at worst), so you can easily see the cost of getting 5+ links per bridge given the $ per read of 454.

        The only disadvantage of Illumina in this game is that you don't usually see the linker sequence, so you can't be so confident that a read really spanned the mate-pair (circularization) junction.
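To make the cost argument concrete, here is a rough sketch of how many raw mate pairs would be needed at the useful-link rates quoted above (10% at best, 2% at worst) to reach 5 links per bridge. The number of bridges (10,000) is a purely hypothetical figure for illustration, and the calculation assumes links spread evenly over bridges, which is optimistic.

```python
def raw_pairs_needed(bridges, links_per_bridge, useful_percent):
    """Raw mate pairs required so that, on average, every bridge gets
    `links_per_bridge` useful links when only `useful_percent` percent
    of pairs end up as useful links. Integer arithmetic, rounded up."""
    if not 0 < useful_percent <= 100:
        raise ValueError("useful_percent must be in (0, 100]")
    needed_links = bridges * links_per_bridge
    return (needed_links * 100 + useful_percent - 1) // useful_percent

bridges = 10_000  # HYPOTHETICAL number of contig joins to support

print(raw_pairs_needed(bridges, 5, 10))  # 500000 pairs at a 10% useful rate
print(raw_pairs_needed(bridges, 5, 2))   # 2500000 pairs at a 2% useful rate
```

Because links are not evenly distributed in practice, the real requirement for 5+ links on most bridges would be noticeably higher than these averages.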


        • #5
          Originally posted by flxlex View Post
          If I were to review your proposal I would not have many problems with it; it looks good. I would add transcriptome sequencing, though (using the long 454 reads), to help with the gene prediction (annotation).
          Agreed with the need for some RNA-seq, but I'd probably go with HiSeq for that, and save the 454 budget for the genome.


          • #6
            Thanks Tony. Do you know if the protocol for the hybrid mate pair approach has been published or if it's available anywhere? I haven't come across it.


            • #7
              I agree with all the above, although I am uncertain whether a de novo RNAseq assembly using short reads would be as useful as one based on (soon 750-800 bases) 454 reads...

              I also would like to see this hybrid protocol!


              • #8
                Originally posted by RCJK View Post
                Thanks Tony. Do you know if the protocol for the hybrid mate pair approach has been published or if it's available anywhere? I haven't come across it.
                I don't know of a published protocol - I'm not sure either company would like to see it.

                AFAIK, you follow the Roche mate-pair protocol for fragmentation / size selection / circularization / fragmentation / enrichment, then use the result with the Illumina PE protocol starting at adapter ligation (skipping the early fragmentation / size selection steps).

                And of course you have to trim the resulting reads if they contain the linker sequence.
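The trimming step could look something like the minimal sketch below. It only handles an exact, full-length linker match; real data would also need mismatch tolerance and handling of a partial linker at the read end. The linker string used here is a made-up placeholder, not the actual Roche linker sequence, and `trim_at_linker` and `min_len` are hypothetical names.

```python
def trim_at_linker(read, linker, min_len=20):
    """Cut a read at the first exact occurrence of the linker.

    Returns the read unchanged if no linker is found, the portion before
    the linker if it is at least `min_len` bases long, or None if the
    remaining portion is too short to be useful for mapping.
    """
    pos = read.find(linker)
    if pos == -1:
        return read
    kept = read[:pos]
    return kept if len(kept) >= min_len else None

LINKER = "GATTACAGATTACA"  # PLACEHOLDER sequence, not a real linker

print(trim_at_linker("ACGTACGTAC" + LINKER + "TTTT", LINKER, min_len=8))  # ACGTACGTAC
print(trim_at_linker("ACG" + LINKER, LINKER, min_len=8))                  # None
print(trim_at_linker("ACGTACGTACGG", LINKER, min_len=8))                  # ACGTACGTACGG
```

Reads where the linker is found also confirm that the fragment really came from a circularized molecule, which is useful information to carry along to the scaffolding step.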


                • #9
                  Strobe sequencing on the PacBio might be another scaffolding method, though the early reports are that it doesn't yet work as well as one might expect. And informatics support is a bit lacking.

                  I've heard that some groups have had good luck with very long distance mate pair libraries using SOLiD kits, but the informatics support for these may be lacking. But, with the new Exact Call Chemistry you could get FASTQ out and probably plug this into existing assemblers.


                  • #10
                    Originally posted by flxlex View Post
                    I agree with all the above, although I am uncertain whether a de novo RNAseq assembly using short reads would be as useful as one based on (soon 750-800 bases) 454 reads...
                    Agreed - I was suggesting using the HiSeq transcriptome data for gene finding, given a genome built from 454 / Illumina hybrid data.

                    Using Illumina for de novo RNA-seq (without a reference genome) would be a pain in itself.


                    • #11
                      Originally posted by krobison View Post
                      Strobe sequencing on the PacBio might be another scaffolding method, though the early reports are that it doesn't yet work as well as one might expect. And informatics support is a bit lacking.
                      Even without strobe sequencing just doing long fragment sequencing on PacBio for scaffolding coupled with high depth Illumina sequencing is another way to go.

                      PacBio's read lengths are appealing because they exceed the length of common repeats generally (at least for human--this is contingent upon the species in question of course, but I think most mammals at least have very few repeats above ~2kb in size). The downsides are that a PacBio RS might be hard for you to find or get time on at this juncture and the error rate, which is still relatively high.

                      PacBio has a program called AHA (A Hybrid Assembler -- http://www.pacificbiosciences.com/pr...are/algorithms) and there's a program called ALLPATHS from the Broad (http://www.broadinstitute.org/softwa...paths-lg/blog/). Both of these take Illumina short reads and couple them with PacBio long reads for assembly.

                      Then again, if you're talking about a conservative grant agency or something, it might be best to go with 454 in the proposal since it's so tried-and-true.


                      • #12
                        Thanks for all the suggestions. 454 and Illumina are what we have in house to perform the sequencing with and are also what the bioinformaticians are more familiar with, hence those are the two options.


                        • #13
                          Originally posted by RCJK View Post
                          Thanks for all the suggestions. 454 and Illumina are what we have in house to perform the sequencing with and are also what the bioinformaticians are more familiar with, hence those are the two options.
                          RCJK,

                          You have not said anything about the characteristics of the genome you are trying to sequence. Percentage of repeats, GC composition, number of chromosomes and availability of a related genome sequence could all influence the recommendations offered here.
