  • 454 assembly of repetitive region

    Hello,

    I have been given a set of 454 data which was assembled using newbler. It is a repetitive region from a relative of grape, with lots of transposons.
    The current assembly has misassemblies due to the repeats, and I'd like to take the original reads, clean them up and reassemble.

    There are about 40000 reads from 454 and maybe 2000 reads from Sanger sequencing of the two BACs that cover this region, where I might be able to use paired sequence information. I'm told by the sequencing facility there is no paired sequence information to take advantage of with 454 data.

    I was thinking about trying to use RepeatMasker with the known vectors and plant repeat database and then using CAP3 or PCAP to assemble the result.

    (1) Does anyone know which of the publicly available assembly engines works best on 454 data?
    (2) If you recommend using CAP3, which parameter settings would you modify from the default and what values would you use?
    (3) Are there any other sequence cleaning utilities you'd recommend?
    (4) When using RepeatMasker, is the cross_match engine better, or would you use RMblast?

    Thanks,
    Steph

  • #2
    In your case I would suggest using phrap or Celera, and raising the minmatch value to 25-40. The standard phredPhrap will need some serious tweaking (especially on the sff import side).

    Also, build a BLAST database from a draft assembly (to identify the repeat borders), extract the repeat sequences from the draft FASTA or ACE file, and then use them as "vector" sequence if you want to blank them out.
    I don't recommend a newbler assembly for that BLAST database: newbler usually clips off repeat borders at the consensus level.
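
    The extraction step above could be sketched roughly as follows. This is only an illustration in plain Python: the function names (read_fasta, extract_repeats, to_fasta), the flank parameter and the output naming scheme are all made up for the example, and the repeat coordinates are assumed to be 1-based inclusive (contig, start, end) tuples that you have already collected yourself, for instance from tabular BLAST hits.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}; the id is the first word of the header."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract_repeats(
    contigs: Dict[str, str],
    regions: Iterable[Tuple[str, int, int]],
    flank: int = 0,
) -> List[Tuple[str, str]]:
    """Cut repeat regions out of draft contigs.

    regions holds (contig_id, start, end) with 1-based inclusive coordinates
    (an assumption of this sketch). Reversed coordinates are normalised, and
    `flank` extra bases are kept on each side, clipped to the contig ends.
    """
    out = []
    for contig_id, start, end in regions:
        seq = contigs[contig_id]
        lo, hi = min(start, end), max(start, end)
        lo = max(1, lo - flank)
        hi = min(len(seq), hi + flank)
        out.append((f"{contig_id}_{lo}_{hi}", seq[lo - 1:hi]))
    return out


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text wrapped at `width` columns."""
    chunks = []
    for name, seq in records:
        chunks.append(f">{name}")
        chunks.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(chunks) + "\n"


if __name__ == "__main__":
    # Toy draft assembly; real input would be read from the draft FASTA file.
    draft = ">contig1 draft\nACGTACGTAC\nGGGGCCCC\n>contig2\nTTTTAAAA\n"
    contigs = read_fasta(draft.splitlines())
    repeats = extract_repeats(contigs, [("contig1", 11, 14), ("contig2", 8, 5)])
    print(to_fasta(repeats), end="")
```

    The resulting FASTA can then be handed to whichever screening step you already use for vector sequence. Keeping a few flanking bases may help when the repeat borders from the draft are uncertain, but that is a judgment call for your data.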

    Let me know if you need any more help.
    Last edited by Markiyan; 09-22-2010, 05:09 AM.
