  • De novo assembly of 454 exome sequencing

    Hi people, just wanted some advice on a situation.

    I have a 454 genomic dataset that was enriched for gene-rich regions using biotinylated RNA as bait, i.e. anything that sticks to the RNA gets selected. The enriched DNA was then sequenced on a 454.

    I know people would normally map to a reference genome, but there isn't one and won't be one for the foreseeable future. I also can't map to a transcriptome, because there isn't one either, and because the whole point of the exercise is to improve a set of gene models I already have. Those models are based on ESTs and are therefore likely to be missing introns.

    So I am stuck with de novo assembly. My initial assemblies (using Velvet) are not awful, but the coverage is obviously highly heterogeneous. I know a lot of assemblers assume homogeneous coverage (e.g. CABOG states flat out that it's rubbish for exome sequencing and other data with heterogeneous coverage).

    I am basically just preprocessing on quality scores, splitting reads wherever there's a probable homopolymer error (a typical 454 artifact), and running Velvet assemblies with a very low expected coverage across a range of k-mer sizes.
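    To make that preprocessing step concrete, here is a minimal Python sketch of trimming low-quality 3' ends and then cutting a read at long homopolymer runs. The thresholds (`MIN_QUAL`, `MAX_HOMOPOLYMER`, `MIN_FRAGMENT`), the choice to drop the run itself, and all function names are illustrative assumptions, not the poster's actual pipeline.

```python
import re

# Illustrative thresholds (assumptions; tune for your own data).
MIN_QUAL = 20         # trailing bases below this Phred score are trimmed
MAX_HOMOPOLYMER = 6   # runs longer than this are treated as probable 454 errors
MIN_FRAGMENT = 30     # pieces shorter than this are discarded after splitting


def trim_3prime(seq, quals, min_qual=MIN_QUAL):
    """Remove bases from the 3' end until one reaches min_qual."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


def split_at_homopolymers(seq, quals, max_run=MAX_HOMOPOLYMER, min_len=MIN_FRAGMENT):
    """Cut the read at every homopolymer longer than max_run, dropping the run itself."""
    # ([ACGT])\1{max_run,} matches one base repeated at least max_run + 1 times.
    run = re.compile(r"([ACGT])\1{%d,}" % max_run)
    pieces, start = [], 0
    for match in run.finditer(seq):
        pieces.append((seq[start:match.start()], quals[start:match.start()]))
        start = match.end()
    pieces.append((seq[start:], quals[start:]))
    # Keep only fragments long enough to be useful to the assembler.
    return [(s, q) for s, q in pieces if len(s) >= min_len]


def preprocess(seq, quals):
    """Quality-trim a read, then split it at suspect homopolymers."""
    seq, quals = trim_3prime(seq, quals)
    return split_at_homopolymers(seq, quals)
```

    Dropping the homopolymer rather than collapsing it to a fixed length is a deliberately conservative choice here: it avoids guessing the true run length, at the cost of losing a few bases per split.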

    Then I assess assembly quality by checking whether a set of gene models known to be good are present, since N50 etc. isn't really applicable in this case... or is it?
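    On the N50 question: N50 is easy to compute, but it only summarizes contig contiguity and says nothing about whether the genes you care about were recovered. A complementary check, in the spirit of the gene-model test described above, is to measure what fraction of each trusted gene model's k-mers appears in the assembly on either strand. Because the models are EST-based and may lack introns, an exact substring search would miss genes whose contigs contain introns, whereas exonic k-mers should still be found. The Python sketch below is illustrative only: the function names are hypothetical, k = 31 is an arbitrary default, and sequences are assumed to be uppercase.

```python
# Complement table for uppercase DNA (assumes no soft-masked lowercase bases).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def n50(lengths):
    """Contig length at which the cumulative length, longest first, reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assembly_kmers(contigs, k=31):
    """All k-mers of the contigs on both strands, built once and reused per gene."""
    found = set()
    for contig in contigs:
        found |= kmers(contig, k)
        found |= kmers(revcomp(contig), k)
    return found


def gene_model_containment(gene, asm_kmers, k=31):
    """Fraction of a gene model's k-mers that occur somewhere in the assembly."""
    gene_kmers = kmers(gene, k)
    if not gene_kmers:
        return 0.0
    return len(gene_kmers & asm_kmers) / len(gene_kmers)
```

    A gene model whose exons are all assembled but separated by an intron in the contig should still score close to 1, losing only the k-mers that span each exon-exon junction.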

    Is this an OK approach? Any comments / suggestions appreciated.

    Cheers!

  • #2
    Try using MIRA (http://www.chevreux.org/projects_mira.html). I got much better de novo assemblies using it than Velvet.

    • #3
      Is there a reason you are not using Roche's gsAssembler (a.k.a. Newbler) for this project? I've tried a number of other de novo assemblers and found Roche's own software produces better results with 454 data. Depending on the size of your data set, you may also find MIRA useful.

      • #4
        Originally posted by SES
        Is there a reason you are not using Roche's gsAssembler (a.k.a. Newbler) for this project? I've tried a number of other de novo assemblers and found Roche's own software produces better results with 454 data. Depending on the size of your data set, you may also find MIRA useful.
        The latest release of MIRA handles very large data sets much better than previous releases.

        • #5
          Originally posted by JackieBadger
          The latest release of MIRA handles very large data sets much better than previous releases.
          That is good to know. What is "very large" in this case (I haven't consulted the MIRA docs in a while)?

          • #6
            Thanks, yes, I'll try MIRA. The main reasons I used Velvet are that 1) I'm familiar with it and 2) I know how to preprocess fastq files based on quality and homopolymer runs, but not so much .sff files. Having read the MIRA manual, I realise that's not a problem, since it takes fasta and fasta.qual files fine, doh! A Newbler assembly had already been run; I think it was rubbish mainly because it treated all 4 runs as equally good, which they were not. But I may throw that into the mix too.

            Generally speaking, I was wondering whether any assemblers handle very heterogeneous coverage better? Anyway, thanks for the responses, I've got plenty to work on!
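            Since MIRA takes fasta plus a matching quality file, reads that were already quality-trimmed and homopolymer-split as FASTQ can be converted rather than reprocessed from the .sff files. Below is a minimal Python sketch of that conversion, assuming unwrapped four-line FASTQ records and Phred+33 quality encoding; the function name is hypothetical, and it does not handle whatever file naming conventions MIRA expects (check the MIRA manual for those).

```python
def fastq_to_fasta_qual(fastq_lines, offset=33):
    """Convert unwrapped 4-line FASTQ records into FASTA text and matching .qual text.

    Assumes Phred+33 quality encoding by default.
    """
    # Drop blank lines and trailing newlines (including Windows-style \r\n).
    lines = [line.rstrip("\r\n") for line in fastq_lines if line.strip()]
    if len(lines) % 4:
        raise ValueError("input is not a whole number of 4-line FASTQ records")
    fasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, q = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(q):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        name = header[1:]
        # Decode each ASCII quality character back to an integer Phred score.
        scores = [ord(c) - offset for c in q]
        fasta.append(f">{name}\n{seq}\n")
        qual.append(f">{name}\n{' '.join(map(str, scores))}\n")
    return "".join(fasta), "".join(qual)
```

            Pass it an open file handle (e.g. `open("reads.fastq")`, a hypothetical filename) and write the two returned strings to a fasta file and its .qual companion.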

            • #7
              There are assemblers designed for very heterogeneous coverage such as MetaVelvet but I don't think they'd be useful in your case.

              • #8
                FYI, in case anyone else runs into this: I used MIRA3, and it seems to have produced good results. It has specific switches for heterogeneous coverage. You can either put "est" in the -job switch (i.e. tell MIRA3 to assemble as if it's an EST sequencing project), or set uniform_read_distribution (urd), which accepts on|yes|1 or off|no|0, to "no" within the -AS (-ASSEMBLY) parameter section. By my reading of the manual, either switch tells MIRA3 to stop assuming that the coverage is homogeneous. I simply used the est job setting, even though EST data are likely to be even more heterogeneous in coverage than what I have (exome sequencing). It seemed to work OK.

                • #9
                  There are multiple papers on the subject. Here is just one: http://www.biomedcentral.com/1471-2164/11/571

                  • #10
                    Thanks. That's a paper on transcriptome data, though. Exome sequencing will be far less heterogeneous in coverage, although still heterogeneous enough that just throwing it into a normal genome assembly pipeline would be inadvisable. I agree there are obvious similarities. This special issue is probably more relevant: http://genomebiology.com/content/12/9
