SEQanswers

04-03-2008, 12:58 PM   #1
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,355
First Helicos Publication! Single Molecule DNA Seq of a "Viral" Genome



Science has just released the first publication detailing real data from the Heliscope.

They undertook the extremely exciting project of resequencing the M13 genome (was I hoping for too much?) in collaboration with researchers at Ohio University and Stanford University.

Quote:
Single-Molecule DNA Sequencing of a Viral Genome

Timothy D. Harris,1* Phillip R. Buzby,1 Hazen Babcock,1 Eric Beer,1 Jayson Bowers,1 Ido Braslavsky,2 Marie Causey,1 Jennifer Colonell,1 James DiMeo,1 J. William Efcavitch,1 Eldar Giladi,1 Jaime Gill,1 John Healy,1 Mirna Jarosz,1 Dan Lapen,1 Keith Moulton,1 Stephen R. Quake,3 Kathleen Steinmann,1 Edward Thayer,1 Anastasia Tyurina,1 Rebecca Ward,1 Howard Weiss,1 Zheng Xie1

The full promise of human genomics will be realized only when the genomes of thousands of individuals can be sequenced for comparative analysis. A reference sequence enables the use of short read length. We report an amplification-free method for determining the nucleotide sequence of more than 280,000 individual DNA molecules simultaneously. A DNA polymerase adds labeled nucleotides to surface-immobilized primer-template duplexes in stepwise fashion, and the asynchronous growth of individual DNA molecules was monitored by fluorescence imaging. Read lengths of >25 bases and equivalent phred software program quality scores approaching 30 were achieved. We used this method to sequence the M13 virus to an average depth of >150x and with 100% coverage; thus, we resequenced the M13 genome with high-sensitivity mutation detection. This demonstrates a strategy for high-throughput low-cost resequencing.


1 Helicos BioSciences Corporation, One Kendall Square, Cambridge, MA 02139, USA.
2 Department of Physics and Astronomy, Ohio University, Athens, OH 45701, USA.
3 Department of Bioengineering, Stanford University, and Howard Hughes Medical Institute, Stanford, CA 94305, USA.
The paper abstract and full text (for subscribers) are located here:
http://www.sciencemag.org/cgi/conten...t/320/5872/106

I'm reading it as we speak, and would welcome interpretations of the data!
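For anyone checking the depth and coverage numbers in the abstract, the usual back-of-envelope arithmetic is: average depth c = N·L/G for N aligned reads of mean length L on a genome of length G, and under the idealized Lander-Waterman model (reads placed uniformly at random) the expected fraction of bases left uncovered is e^(-c). A minimal sketch; the read count, read length and genome size in the usage line are hypothetical placeholders, not numbers from the paper.

```python
import math


def average_depth(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return n_reads * mean_read_len / genome_len


def fraction_uncovered(depth: float) -> float:
    """Expected uncovered fraction e^(-c) under the Lander-Waterman model.
    Assumes uniformly random read placement, which real data never quite meet."""
    return math.exp(-depth)


# Hypothetical numbers for illustration only (not taken from the paper).
c = average_depth(n_reads=40_000, mean_read_len=25, genome_len=6_500)
print(f"depth ~ {c:.0f}x, uncovered fraction ~ {fraction_uncovered(c):.2e}")
```

At depths above 150x the idealized uncovered fraction is vanishingly small, so "100% coverage" is what the model predicts; gaps in practice come from bias, not from sampling.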

04-03-2008, 02:05 PM   #2
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,355

Official press release here:

http://www.businesswire.com/portal/s...53&newsLang=en

Quote:
Helicos BioSciences Announces Single Molecule DNA Sequence Data Published in Science Magazine

Data Validates the World’s First Single Molecule Sequencing of an Organism

CAMBRIDGE, Mass.--(BUSINESS WIRE)--Helicos BioSciences (NASDAQ: HLCS), a life science company focused on innovative genetic analysis technologies, today announced the publication of a report in Science Magazine demonstrating the first single molecule sequencing of an organism. The report depicts the use of Helicos’ proprietary True Single Molecule Sequencing (tSMS)™ technology to re-sequence the M13 viral genome. The report will appear in the April 4, 2008 print issue of Science Magazine.


The report demonstrates that the tSMS technology can reliably re-sequence a moderately complex genome without the associated errors, cost, and experimental complexity of amplification. The tSMS process captures images of single dye labeled nucleotides as they are incorporated to determine the sequence of the individual DNA strands. In addition, the tSMS method simplifies the DNA sample preparation process and maximizes throughput by packing individual strands of DNA at high densities onto the sequencing surface.


“The ability to sequence individual strands of genomic DNA has been a goal of the scientific community for more than 20 years,” said Timothy Harris, PhD, senior director of research at Helicos BioSciences and the report’s corresponding author. “The data in Science Magazine demonstrate the robustness of our single molecule method and demonstrate our ability to accurately detect single base mutations. Not only does this data represent the first of its kind, but a significant milestone in the genomics revolution.”
To validate its technology, Helicos scientists sequenced the M13 virus genome, examining more than 280,000 strands of captured DNA, directly visualizing the sequential incorporation of individual labeled nucleotides. Overall per-base accuracy was better than 99% and the accuracy of the consensus sequence was 100%. To assess accuracy and robustness of mutation detection, Helicos’ scientists introduced in silico single nucleotide changes into the reference M13 virus genome sequence and compared them to Helicos DNA sequences. The tSMS technology correctly found 98% of 500 simulated mutations with zero false positive errors.


“This data, remarkable as it is, was based on the first generation of our tSMS chemistry,” said Bill Efcavitch, PhD, senior vice president for product R&D at Helicos BioSciences. “We have since developed new generations of ‘one-base-at-a-time’ nucleotides which allow more accurate homopolymer sequencing, and lower overall error rates.”
The report published in Science Magazine initiates the path to many other scientific reports Helicos plans to publish in the upcoming months. These reports will highlight data recently announced at the AGBT meeting in Marco Island further demonstrating single molecule sequencing being applied to both BAC sequencing accuracy, and the ability to count microRNAs as well as identify putative novel miRNAs.
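The mutation-detection claim in the release (98% of 500 simulated single-nucleotide changes found, zero false positives) is a standard spike-in evaluation: plant known changes in silico, call differences, and compare the calls with the planted set. By this arithmetic, 98% of 500 is 490 detected changes and 10 missed. A minimal sketch, assuming calls arrive as a set of positions; `spike_snvs` and `score_calls` are hypothetical helpers, not Helicos' pipeline.

```python
import random


def spike_snvs(ref: str, n: int, seed: int = 0) -> tuple[str, set[int]]:
    """Return a copy of ref with n single-nucleotide changes at distinct
    random positions, plus the set of changed positions (the truth set)."""
    rng = random.Random(seed)
    positions = set(rng.sample(range(len(ref)), n))
    seq = list(ref)
    for p in positions:
        seq[p] = rng.choice([b for b in "ACGT" if b != seq[p]])
    return "".join(seq), positions


def score_calls(truth: set[int], calls: set[int]) -> dict[str, float]:
    """Sensitivity = TP / (TP + FN); false positives = calls not in truth."""
    tp = len(truth & calls)
    return {
        "sensitivity": tp / len(truth),
        "false_positives": len(calls - truth),
    }


# Hypothetical toy run: a "caller" that recovers 9 of 10 planted changes.
mutated, truth = spike_snvs("ACGT" * 50, n=10)
calls = set(sorted(truth)[:9])
print(score_calls(truth, calls))
```

Note that "zero false positives" is only meaningful alongside how many positions were tested; on a small genome at very high depth it is a much easier bar than on a human-sized one.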

04-04-2008, 03:14 AM   #3
terabase
Junior Member

Location: France

Join Date: Feb 2008
Posts: 4
And the stock is rising

It does not take much to boost their stock price: up 40% within a week!!

Their paper is more a proof of concept than the presentation of a machine that can compete in the next-gen sequencing market.

04-04-2008, 07:29 AM   #4
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,355

Something else I noticed, in their introduction they bad mouth the library preparation protocols for all the other platforms, basically saying that adding adapters is labor intensive, etc, then they go on to prove that they absolutely MUST use adapters to get bidirectional reads because their error rates are so high.

Seems like C incorporations are a killer...

04-04-2008, 08:17 AM   #5
terabase
Junior Member

Location: France

Join Date: Feb 2008
Posts: 4
they "sold" a Heliscope to Expression Analysis

Actually, their stock price first went up a week ago, when they announced the sale of the first machine to Expression Analysis (http://www.expressionanalysis.com/). It is actually the second time they have announced the first sale. They do not say what Expression Analysis paid, or how much Helicos had to pay to get them to try the machine. Maybe they will resequence bacteriophage lambda soon. It is about four times the size of M13.
At the current cash burn rate, Helicos has enough cash for about a year, so they absolutely need positive news to drive the stock price up, at least temporarily.

04-04-2008, 02:41 PM   #6
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,355

Quote:
Originally Posted by terabase View Post
Maybe they will resequence bacteriophage lambda soon. It is about four times the size of M13.
Tiny genome resequencing service. No one ever said how big the $1000 genome had to be!

04-04-2008, 03:03 PM   #7
Chipper
Senior Member

Location: Sweden

Join Date: Mar 2008
Posts: 324

Quote:
Originally Posted by ECO View Post
Tiny genome resequencing service. No one ever said how big the $1000 genome had to be!
Lol, but remember that this experiment was done on a pre-production machine, using only one lane (out of 2x25 per run) with about 100x coverage per strand. And the obvious advantage is the lack of amplification bias, not that you don't have to ligate linkers. And multipass reads are not the same as bidirectional reads. I guess we will see more in the coming days, but if they come close to what they claim, it will be hard for SOLiD / Solexa systems to compete at current reagent costs...

04-09-2008, 01:28 PM   #8
Mr. Gunn
Member

Location: USA

Join Date: Dec 2007
Posts: 10

Quote:
Originally Posted by ECO View Post
Something else I noticed, in their introduction they bad mouth the library preparation protocols for all the other platforms, basically saying that adding adapters is labor intensive, etc, then they go on to prove that they absolutely MUST use adapters to get bidirectional reads because their error rates are so high.
I noticed that too. I still think the killer for them is going to be the expensive optics. There are other ways of detecting really small amounts that don't require a million dollars in instrumentation, ya know?

08-01-2008, 06:17 AM   #9
kmay
Member

Location: Munich, Germany

Join Date: Aug 2008
Posts: 29
seen helicos data

Hi,

Helicos seems to be so popular here...
Well, making so much noise about M13 was maybe not such a wise decision...

However, I am currently analyzing a data set I received from Helicos: a DGE study of a human tissue. I have to say, it looks pretty good.

Can't tell more here... NDA!

But to summarize: biological results absolutely comparable to those derived from Solexa! I think they are getting their act together.

Cheers

Klaus

08-01-2008, 10:31 AM   #10
Chipper
Senior Member

Location: Sweden

Join Date: Mar 2008
Posts: 324

Hi Klaus,

good to see that they are generating usable data with the Heliscope. Could you share any numbers from the sequencing or is that also under NDA?...

08-01-2008, 01:09 PM   #11
kmay
Member

Location: Munich, Germany

Join Date: Aug 2008
Posts: 29
I'll see

Chipper,

I'll see what I can do, but not before next week. I can't access our secure servers from here...

Klaus

08-04-2008, 03:28 AM   #12
kmay
Member

Location: Munich, Germany

Join Date: Aug 2008
Posts: 29

Okay,

what I can share are numbers from our first-step analysis after mapping. Mapping was very stringent: best unique hit (= at least one shortest unique sub-sequence contained, point mutations allowed, no indels allowed).

Here is the summary:
The data set contains reads from the following organism:
Homo sapiens 4020914

Read length (bp)   Number of reads
11 86
12 1333
13 6904
14 20384
15 49695
16 91159
17 131835
18 153064
19 166469
20 164352
21 169349
22 171943
23 178216
24 185147
25 210388
26 230526
27 245471
28 178388
29 168223
30 143917
31 135030
32 122484
33 113991
34 106351
35 98137
36 90642
37 83322
38 77510
39 70555
40 63349
41 56651
42 50235
43 44120
44 38896
45 34378
46 30472
47 26112
48 22580
49 18993
50 15407
51 12211
52 9602
53 7529
54 5801
55 4115
56 2990
57 2174
58 1517
59 1204
60 953
61 821
62 623
63 664
64 700
65 464
66 572
67 514
68 655
69 356
70 400
71 287
72 158
73 140
74 103
75 80
76 55
77 33
78 25
79 26
80 7
81 15
82 4
83 7
84 1
85 1
86 4
87 3
88 2
89 2
90 3
92 2
93 3
94 6
95 1
96 4
98 1
101 1
102 2
107 1
110 1
112 1
113 3
116 2
123 1
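A long read-length histogram like this is easier to compare between runs once it is reduced to a few numbers. A minimal sketch that parses two-column `length count` lines and reports total reads, mean and (lower) median read length; the function names are illustrative, and the toy input in the usage line is not the posted data.

```python
def parse_histogram(text: str) -> dict[int, int]:
    """Parse lines of the form '<read length> <number of reads>'."""
    hist: dict[int, int] = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        length, count = line.split()
        hist[int(length)] = int(count)
    return hist


def summarize(hist: dict[int, int]) -> tuple[int, float, int]:
    """Return (total reads, mean read length, lower median read length)."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    mean = sum(length * n for length, n in hist.items()) / total
    running = 0
    for length in sorted(hist):
        running += hist[length]
        if running >= total / 2:  # first length that reaches half the reads
            return total, mean, length
    return total, mean, max(hist)  # not reached for non-negative counts


# Toy usage (not the posted data).
print(summarize(parse_histogram("10 1\n20 2\n30 1")))
```

Pasting the two columns above into `parse_histogram` gives the summary for this run; a heavy right tail (reads past 100 bp) pulls the mean above the median, so both are worth reporting.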

Annotation:
Intergenic regions 1810570558bp 58.8%
Promoters 44676168bp 1.5%
Exons 97616725bp 3.2%
Introns 1172232197bp 38.1%

Read distribution:
Intergenic regions 1694883 42.2%
Promoters 325016 8.1%
Exon 1079293 26.8%
Intron 1167470 29.0%
Partial 79268 2.0%
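One quick way to read the annotation and read-distribution tables together is an enrichment ratio: a category's share of mapped reads divided by its share of the genome. With the percentages as posted, exons come out about 8-fold enriched (26.8 / 3.2) and promoters about 5-fold (8.1 / 1.5), while intergenic and intronic regions fall below 1, which is what one would expect from a transcript-derived DGE library. The categories overlap, so neither column sums to 100%; the dictionary keys below are normalized names chosen for this sketch.

```python
# Percentages exactly as posted: share of the genome vs. share of mapped reads.
genome_pct = {"Intergenic": 58.8, "Promoters": 1.5, "Exons": 3.2, "Introns": 38.1}
read_pct = {"Intergenic": 42.2, "Promoters": 8.1, "Exons": 26.8, "Introns": 29.0}


def enrichment(reads: dict[str, float], genome: dict[str, float]) -> dict[str, float]:
    """Ratio of read share to genome share for every category present in both."""
    return {k: reads[k] / genome[k] for k in genome if k in reads}


# Print categories from most to least enriched.
for category, ratio in sorted(enrichment(read_pct, genome_pct).items(),
                              key=lambda kv: -kv[1]):
    print(f"{category:<11} {ratio:5.2f}x")
```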

=======================

Next step: clustering
summary output:

Cluster detection:
window size: 100
reads/window: 7
probability: 1.1e-10
clusters detected: 35496
reads in clusters: 2118299 52.68%
min. cluster length: 13
max. cluster length: 5876
avg. cluster length: 117
min. number of reads: 7
max. number of reads: 251937
avg. number of reads: 59

Classification
intergenic regions 10945 30.8%
promoters 3369 9.5%
exon 10501 29.6%
intron 8883 25.0%
partial 5167 14.6%
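The thread does not say which statistic the clustering tool uses. One common choice, assumed here, is a Poisson background: if reads landed uniformly, the count in a 100 bp window is Poisson with mean λ = reads × window / genome size, and a window is called when P(X ≥ 7) is small. Assuming roughly 4.02 million reads on a ~3.1 Gb genome (the genome size is an assumption), that tail comes out on the order of 1e-10, the same order as the posted probability. The sliding-window scan and merge below are a hypothetical sketch, not the tool's implementation.

```python
import math


def poisson_tail(k: int, lam: float, terms: int = 200) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail
    (avoids 1 - CDF cancellation when the tail is tiny). Assumes lam > 0."""
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k, k + terms))


def dense_windows(starts: list[int], window: int = 100,
                  min_reads: int = 7) -> list[tuple[int, int]]:
    """Windows [p, p + window) anchored at each read start p that contain
    at least min_reads read starts; returns (p, count) pairs."""
    pos = sorted(starts)
    hits, j = [], 0
    for i, p in enumerate(pos):
        while j < len(pos) and pos[j] < p + window:
            j += 1  # j = first read start at or beyond p + window
        if j - i >= min_reads:
            hits.append((p, j - i))
    return hits


def merge_windows(hits: list[tuple[int, int]],
                  window: int = 100) -> list[tuple[int, int]]:
    """Merge overlapping dense windows into (start, end) clusters."""
    clusters: list[list[int]] = []
    for p, _ in hits:
        if clusters and p <= clusters[-1][1]:
            clusters[-1][1] = max(clusters[-1][1], p + window)
        else:
            clusters.append([p, p + window])
    return [(s, e) for s, e in clusters]


# Background rate under the assumed uniform placement and ~3.1 Gb genome.
lam = 4_020_914 * 100 / 3.1e9
print(f"P(>=7 reads in 100 bp) ~ {poisson_tail(7, lam):.1e}")
```

Merging overlapping windows is why cluster lengths in the summary can run far past the 100 bp window (up to 5876 bp here).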

==========================

expression analysis:

analyzed transcripts: 85562
expressed transcripts: 72514 84.8%
normalized expression value (NE):
minimum: 0.000
maximum: 95.675
average: 0.061
analyzed loci: 32514
expressed loci: 26160 80.5%

NE Transcripts
(0.000:0.020] 48993
(0.020:0.040] 9557
(0.040:0.060] 4390
(0.060:0.080] 2294
(0.080:0.100] 1465
(0.100:0.120] 1020
(0.120:0.140] 707
(0.140:0.160] 576
(0.160:0.180] 453
(0.180:0.200] 353
(0.200:0.220] 245
(0.220:0.240] 251
(0.240:0.260] 174
(0.260:0.280] 166
(0.280:0.300] 131
(0.300:0.320] 121
(0.320:0.340] 100
(0.340:0.360] 69
(0.360:0.380] 81
(0.380:0.400] 107
(0.400:95.675] 1261
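The `(a:b]` labels are right-closed bins, so a transcript with NE exactly 0 falls outside the first bin. That matches the posted counts: the bins add up to 72,514, exactly the number of expressed transcripts, leaving 13,048 of the 85,562 analyzed transcripts (presumably those with NE = 0) unbinned. A minimal sketch of right-closed binning with `bisect`; the edge list mirrors the table, and how NE itself is normalized is not described in the post, so it is not modeled here.

```python
from bisect import bisect_left


def right_closed_histogram(values: list[float], edges: list[float]) -> list[int]:
    """Count values in bins (edges[i], edges[i+1]], matching the '(a:b]'
    labels; values <= edges[0] or > edges[-1] are not counted."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_left(edges, v)  # edges[i-1] < v <= edges[i]
        if 0 < i < len(edges):
            counts[i - 1] += 1
    return counts


# Edges mirroring the table: 0.02-wide bins up to 0.4, then one catch-all bin.
edges = [round(0.02 * i, 2) for i in range(21)] + [95.675]

# Bin counts as posted.
posted = [48993, 9557, 4390, 2294, 1465, 1020, 707, 576, 453, 353, 245,
          251, 174, 166, 131, 121, 100, 69, 81, 107, 1261]
print(len(edges) - 1, sum(posted))
```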

====================================

This was a very crude first analysis, run with all parameters at their defaults.
Mapping on our mapping station took 10 minutes
(the best-unique parameters are the least time-consuming).

The rest of the analysis took 7 minutes on GGA.

Cheers

Klaus

08-04-2008, 05:51 AM   #13
Chipper
Senior Member

Location: Sweden

Join Date: Mar 2008
Posts: 324

Klaus,

thanks for sharing the numbers, it sure looks promising. Was this data from one lane only?

08-04-2008, 06:05 AM   #14
kmay
Member

Location: Munich, Germany

Join Date: Aug 2008
Posts: 29

The raw reads were pooled from two channels.

Again, this was a quick and dirty first pass. Mapped tag numbers can be increased significantly with more relaxed mapping parameters. However, downstream pathway mining of the expressed transcripts 100% confirms the biological context of the sample.

Klaus

08-05-2008, 04:05 AM   #15
kmay
Member

Location: Munich, Germany

Join Date: Aug 2008
Posts: 29

Quote:
thanks for the Helicos data. What number of mismatches was allowed in the alignment?
Basically there was no limit on the number of point mutations allowed. The "unique best match" setting in our method works like that:

There is a tree of shortest unique words, one for each position in the genome; each shortest unique word matches exactly once in the genome. For example, one starts with a 5-mer and checks its uniqueness, extends it by one bp and checks again (6, 7, 8, ...) until the "word" is unique. SNPs are taken into account. The words in this library therefore vary in length.

For mapping, two kinds of tolerance can be introduced: point mutations and indels within those shortest unique words.

For "unique best match" neither is allowed (= most stringent). Helicos reads were checked for whether they contain at least one exact shortest unique word in full. The alignment then grows from that position into the read in both directions. Here point mutations were allowed, with no limit imposed. During this extension, in this case, SNPs were not taken into account, so some of the observed point mutations may originate from SNPs.

Very basic statistics:
Point mutations   Number of reads
0 509622
1 369486
2 318733
3 313244
4 297974
5 297301
6 298140
7 344822
8 233730
9 191911
10 153682
11 131893
12 113460
13 98302
14 82719
15 69071
16 57540
17 46855
18 37505
19 30710
20 24214

Keep in mind that we have read lengths up to 123 bp. The above numbers need to be normalized to read length and to the length and number of shortest unique words contained.
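The posted mismatch counts add up to 4,020,914 reads, the same total as the mapped reads, and average to about 5.9 point mutations per read. As noted above, that is not yet an error rate: it has to be divided by aligned length, and SNPs in the sample also show up as mismatches. A minimal sketch; `per_base_mismatch_rate` assumes per-read (aligned length, mismatches) pairs, which the post does not provide, so it is only exercised on toy data here.

```python
# Mismatch histogram as posted: index = number of point mutations, value = reads.
posted = [509622, 369486, 318733, 313244, 297974, 297301, 298140, 344822,
          233730, 191911, 153682, 131893, 113460, 98302, 82719, 69071,
          57540, 46855, 37505, 30710, 24214]


def mean_mismatches_per_read(hist: list[int]) -> float:
    """Average number of mismatches per aligned read."""
    return sum(k * n for k, n in enumerate(hist)) / sum(hist)


def per_base_mismatch_rate(alignments: list[tuple[int, int]]) -> float:
    """Total mismatches / total aligned bases from (aligned_length, mismatches)
    pairs. True SNPs are counted as mismatches too, so this overstates
    the sequencing error rate."""
    bases = sum(length for length, _ in alignments)
    if bases == 0:
        raise ValueError("no aligned bases")
    return sum(m for _, m in alignments) / bases


print(f"reads: {sum(posted)}, "
      f"mean mismatches/read: {mean_mismatches_per_read(posted):.2f}")
```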

09-05-2008, 06:05 AM   #16
new300
Member

Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

Quote:
Originally Posted by kmay View Post
Basically there was no limit on the number of point mutations allowed. The "unique best match" setting in our method works like that:
What was the error/mismatch rate in the reads that aligned?

All times are GMT -8.