SEQanswers


dan 09-15-2008 06:12 AM

Hello from Dundee! (Scotland!)
 
Hi,

My name is Dan Bolser, and I recently started a Post Doc. in Dundee working on the potato genome sequencing project. Although this is a really exciting and far-reaching project, I should say that I am very new to the field of sequencing!

I have a degree in biochemistry, and subsequently I did a masters, PhD and Post Doc. in structural bioinformatics and interactomics. So, although I studied 'DNA' and molecular genetics during my degree, it's all a bit hazy these days ;-)

The first 'bulk' of sequencing data that we have here in the UK / Ireland consortium (Chromosome 4) has been generated from several 'interesting' BACs using "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers). There are a few things that I am keen to learn more about. (I'll ask similar questions in the other appropriate forums, but I may as well list the main issues that I am facing as a beginner.)


1) What is "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers)? ;-)

OK, it's not that bad, I do have the broad idea, but where can I find out more? What books should I be reading? Which websites have the best information? How does this kind of sequencing compare to NextGen sequencing in terms of speed, throughput, cost, coverage, de-novo 'assemble-ability', etc?


2) What kinds of questions should I be asking of the sequence data? So far I just have a bunch of chromatogram files (ABI format) broken down into groups by BAC. I think I need to know (or it would be useful to know) the following basic things about the data:

* sub-cloning (sequencing) vector sequence
* cloning vector sequence
* insert size
* BAC size
* ...

What else should I be asking (before starting the assembly)?
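
One quantity that follows directly from the items listed above is the expected read coverage of each BAC, i.e. (number of reads x mean read length) / BAC size. Here is a minimal sketch; the function names and all of the numbers are hypothetical, chosen only for illustration and not taken from the potato project. The "fraction covered" figure uses the standard Lander-Waterman approximation, 1 - exp(-c), which assumes reads land uniformly at random.

```python
import math


def expected_coverage(n_reads, mean_read_len, target_len):
    """Average sequencing depth c = N * L / G."""
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    return n_reads * mean_read_len / target_len


def expected_fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by >= 1 read."""
    return 1.0 - math.exp(-coverage)


# Hypothetical example: 1500 usable reads of ~700 bp from a 120 kb BAC.
c = expected_coverage(n_reads=1500, mean_read_len=700, target_len=120_000)
print(f"coverage ~ {c:.2f}x, "
      f"expected fraction covered ~ {expected_fraction_covered(c):.4f}")
```

Real read lengths should be measured after quality and vector trimming, so the value computed from raw chromatograms is only an upper bound.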


3) What kind of assembly pipelines are routinely used on this kind of data?

Currently I am playing with phred/phrap, but perhaps this is considered old hat? Not that I want (or need) to be pushing the bleeding edge, but I would like to be doing something relatively 'standard'. For this kind of sequencing data, is phred/phrap more or less a popular choice?


4) Once I have run (vanilla) phred/phrap, how should I be visualizing the results? I had a look at consed, but it gives me very detailed views of the contigs. I would like to be able to compare different sets of contigs in 'overview'. While I think it should be relatively easy to parse the phred/phrap output and produce some visual assembly and quality reports, I don't want to start coding something that has already been done. What are common visualization methods for comparing sets of similar 'contigs', e.g. when I am varying assembly stringency and want to compare the output of the assembler?
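
As a rough illustration of the kind of 'overview' comparison meant here, the sketch below summarizes one or more contig sets by contig count, total length, longest contig and N50. It assumes each assembly's contigs are available as a plain FASTA file (phrap normally writes one with a `.contigs` suffix, but check your own output); the function names are made up for this example, and it is not a replacement for an existing tool.

```python
import sys


def contig_lengths(lines):
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths):
    """Collect a few overview statistics for one contig set."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Usage: python contig_summary.py run1.contigs run2.contigs ...
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(contig_lengths(handle)))
```

Running it over the contig files from several assemblies (one per stringency setting) gives one line per run, which is often enough to spot the settings worth inspecting in detail in consed.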


5) What other questions should I be asking? I know it's not easy to assess, but what kinds of things do beginners tend to be ignorant of? What are the 'key texts' that I should read before asking anything else?


Well, there are my '5 potatoes of ignorance' - I'd be delighted for any kind of feedback on any of them!

Dan.

apfejes 09-15-2008 09:05 AM

Hi Dan, welcome to SEQanswers.

I'll take a quick shot at answering your first question, though I should mention that this forum, in general, is for people using the Next-Gen platforms, so this might be the wrong place to be asking these questions. (I've never really done any serious capillary-based sequencing or assembly.) Regardless, we're all friendly here, I believe, and I'm certain that some of the members migrated to next-gen after years on the "Big Dye chemistry" platforms. Hopefully one of them will chime in and give you better answers than I can.

Quote:

Originally Posted by dan (Post 1544)
Hi,
OK, it's not that bad, I do have the broad idea, but where can I find out more? What books should I be reading? Which websites have the best information? How does this kind of sequencing compare to NextGen sequencing in terms of speed, throughput, cost, coverage, de-novo 'assemble-ability', etc?

If you're asking about non-Next-Gen sequencing, you're basically referring to all of the sequencing done before the next-gen platforms arrived in 2006/2007. If you pick up any reasonable molecular biology or biotech textbook, it'll probably have a few paragraphs on it. (Look up Dideoxy or Sanger DNA sequencing for the chemistry, and capillary sequencing for the machines.) I'd be surprised if you couldn't find a few hundred web pages on it - there are nearly a million on capillary sequencing.

In terms of cost, Sanger sequencing can be an order of magnitude (or more) more expensive per base, but it has some very good features: it's accurate, it's targeted (using primer pairs) and it's a trusted method. The key is that it's not competing with Next-Gen sequencing - they have very different applications. Now that Next-Gen is available, I don't think it's particularly cost effective to sequence a genome using Sanger sequencing, though it has been done (e.g. the human genome). That said, pretty much everyone doing next-gen work will use Sanger sequencing to verify any predictions they make.

Generally, I would sum it up as this: Sanger sequencing is used to look at a single site of DNA (e.g. a BAC) with great specificity and for reads of about 1000bp in length. Next-Gen sequencing is more of the "pick X million random locations" type (length and number of sequences depend on the technology used), which wouldn't make sense if you wanted to look at a single BAC.

(Of course, if you have access to next-gen sequencing, you wouldn't be making a BAC library in the first place.)

As for information, I suspect another good place to start is PubMed. Papers from before 2006 will all be discussing how they did assembly with this type of sequencing information. I'm certain there are many applications out there to assist in this task. Their manuals would also be full of helpful hints.

Hopefully that's enough to get you pointed in the right direction, though I've left most of your questions unanswered.

Good luck

