This topic is closed.
  • Hello from Dundee! (Scotland!)

    Hi,

    My name is Dan Bolser, and I recently started a Post Doc. in Dundee working on the potato genome sequencing project. Although this is a really exciting and far-reaching project, I should say that I am very new to the field of sequencing!

    I have a degree in biochemistry, and subsequently I did a masters, PhD and Post Doc. in structural bioinformatics and interactomics. So, although I studied 'DNA' and molecular genetics during my degree, it's all a bit hazy these days ;-)

    The first 'bulk' of sequencing data that we have here in the UK / Ireland consortium (Chromosome 4) has been generated from several 'interesting' BACs using "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers). There are a few things that I am keen to learn more about. (I'll ask similar questions in the other appropriate forums, but I may as well list the main issues that I am facing as a beginner.)


    1) What is "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers)? ;-)

    OK, it's not that bad, I do have the broad idea, but where can I find out more? What books should I be reading? Which websites have the best information? How does this kind of sequencing compare to NextGen sequencing in terms of speed, throughput, cost, coverage, de-novo 'assemble-ability', etc?


    2) What kinds of questions should I be asking of the sequence data? So far I just have a bunch of chromatogram files (ABI format) broken down into groups by BAC. I think I need to know (or it would be useful to know) the following basic things about the data:

    * sub-cloning (sequencing) vector sequence
    * cloning vector sequence
    * insert size
    * BAC size
    * ...

    What else should I be asking (before starting the assembly)?


    3) What kind of assembly pipelines are routinely used on this kind of data?

    Currently I am playing with phred/phrap, but perhaps this is considered old hat? Not that I want (or need) to be pushing the bleeding edge, but I would like to be doing something relatively 'standard'. For this kind of sequencing data, is phred/phrap more or less a popular choice?


    4) Once I have run (vanilla) phred/phrap, how should I be visualizing the results? I had a look at consed, but it gives me very detailed views of the contigs. I would like to be able to compare different sets of contigs in 'overview'. While I think it should be relatively easy to parse the phred/phrap output and produce some visual assembly and quality reports, I don't want to start coding something that has already been done. What are common visualization methods for sets of similar 'contigs'? i.e. if I am varying assembly stringency and want to compare the output of the assembler.
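    As a rough sketch of the kind of overview report meant here (not an existing tool, and not part of phred/phrap or consed), the following assumes each assembly run leaves its contigs in a FASTA file. The function names `read_fasta_lengths`, `n50` and `summarize` are made up for illustration. It reduces one set of contigs to a few summary numbers that can be lined up side by side for runs at different stringencies.

```python
from typing import Dict, List


def read_fasta_lengths(text: str) -> List[int]:
    """Return the length of each sequence in FASTA-formatted text."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect a few overview statistics for one set of contigs."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Tiny made-up example; real input would be read from a contig FASTA file.
    example = ">contig1\nACGTACGT\nAC\n>contig2\nACGT\n>contig3\nAC\n"
    print(summarize(read_fasta_lengths(example)))
```

    Calling `summarize` once per assembly run and printing the results as rows of a table gives a crude 'overview' comparison; it says nothing about base quality or misassemblies, which would still need a closer look.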


    5) What other questions should I be asking? I know it's not easy to assess, but what kinds of things do beginners tend to be ignorant of? What are the 'key texts' that I should read before asking anything else?


    Well, there are my '5 potatoes of ignorance' - I'd be delighted for any kind of feedback on any of them!

    Dan.
    Homepage: Dan Bolser
    MetaBase the database of biological databases.

  • #2
    Hi Dan, welcome to SEQanswers.

    I'll take a quick shot at answering your first question, though I should mention that this forum, in general, is for people using the Next-Gen platforms, so this might be the wrong place to be asking these questions. (I've never really done any serious capillary-based sequencing or assembly.) Regardless, we're all friendly here, I believe, and I'm certain that some of the members migrated to next-gen after years on the "Big Dye chemistry" platforms. Hopefully one of them will chime in and give you better answers than I can.

    Originally posted by dan:
    Hi,
    OK, it's not that bad, I do have the broad idea, but where can I find out more? What books should I be reading? Which websites have the best information? How does this kind of sequencing compare to NextGen sequencing in terms of speed, throughput, cost, coverage, de-novo 'assemble-ability', etc?

    If you're asking about non-Next-Gen sequencing, you're basically referring to all of the sequencing done before the next-gen platforms arrived in 2006/2007. If you pick up any reasonable molecular biology or biotech textbook, it'll probably have a few paragraphs on it. (Look up dideoxy or Sanger DNA sequencing for the chemistry, and capillary sequencing for the machines. I'd be surprised if you couldn't find a few hundred web pages on it - there are nearly a million on capillary sequencing.)

    In terms of cost, Sanger sequencing can be an order of magnitude (or more) more expensive per base, but it has some very good features: it's accurate, it's targeted (using primer pairs), and it's a trusted method. The key is that it's not competing with Next-Gen sequencing; they have very different applications. Now that Next-Gen is available, I don't think it's particularly cost-effective to sequence a genome using Sanger sequencing, though it has been done (e.g. the human genome). That said, pretty much everyone doing next-gen work will use Sanger sequencing to verify any predictions they make.

    Generally, I would sum it up like this: Sanger sequencing is used to look at a single site of DNA (e.g. a BAC) with great specificity and with reads of about 1000bp in length. Next-Gen sequencing is more of the "pick X million random locations" type (length and number of sequences depend on the technology used), which wouldn't make sense if you wanted to look at a single BAC.

    (Of course, if you have access to next-gen sequencing, you wouldn't be making a BAC library in the first place.)

    As for information, I suspect another good place to start is pubmed. Papers before 2006 will all be discussing how they did assembly with this type of sequencing information. I'm certain there are many applications out there to assist in this task. Their manuals would also be full of helpful hints.

    Hopefully that's enough to get you pointed in the right direction, though I've left most of your questions unanswered.

    Good luck
    The more you know, the more you know you don't know. —Aristotle
