hjtripp 04-06-2012 08:22 AM

Hello from Jim Tripp at JGI, Walnut Creek CA
Hi folks!

I registered here after reading Keith Robinson's blog post on Oxford Nanopore technology.

His last paragraph is:

A last thought: for any company considering doing this, please <snip> Do yourself a favor: build a registration-free data release site and just use SEQAnswers as the discussion forum. Your marketing people will grouse you've missed an opportunity to collect data, but just ignore them. You will have something far more precious: lots of ADHDish programmers trying to play with your platform's data.

"Lots of ADHDish programmers" sounded like a place I might want to visit, so I've stopped by. :)

I'm principally interested in genome annotation for metabolic modeling, and have published on those subjects. More recently, I stepped into the world of metatranscriptomics and metagenomics for environmental samples of microbial communities, and I really think that short reads (shorter than a typical prokaryotic gene, roughly 1000bp) are the wrong tools for this particular job. I think I'll hold off on plunging in deeper until someone (Oxford?) can come up with reads longer than 1000bp that are 99.99% accurate. In other words, at least as good as the ancient ABI machines, but faster. PacBio at 3kbp and 85% accuracy doesn't cut it for me. Oxford at 30kbp and 96% accuracy doesn't cut it for me either. Both companies propose "slow down and reread for error correction" as the answer to error rates that are dismal compared to the slow but accurate first-generation machines they replace. That is perhaps a solution, but how much slower, and how much better? I'm waiting. Like Keith Robinson, I want to see hard, realistic data, not hyped projections and fake data tuned to immature technology.
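To put rough numbers on "how much slower, and how much better?", here is a small sketch that computes the expected errors per read from the lengths and accuracies quoted above, and the per-base error of a simple majority-vote consensus over repeated reads of the same molecule. It assumes every pass errs independently and uniformly, which real single-molecule platforms do not guarantee (indels and systematic errors do not average away this cleanly), so treat it as an illustrative model, not either vendor's actual error-correction method.

```python
import math

def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of wrong bases in one read, given per-base accuracy."""
    return read_length * (1.0 - accuracy)

def majority_consensus_error(per_pass_error: float, passes: int) -> float:
    """Probability that the true base fails to win a strict majority over
    `passes` reads of the same position.

    Simplifying assumption: each pass is wrong independently with
    probability `per_pass_error`.
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd, positive number of passes to avoid ties")
    needed = passes // 2 + 1  # wrong calls needed to deny the true base a majority
    # Binomial tail: sum over every count of wrong calls that is large enough.
    return sum(
        math.comb(passes, k)
        * per_pass_error ** k
        * (1.0 - per_pass_error) ** (passes - k)
        for k in range(needed, passes + 1)
    )

if __name__ == "__main__":
    # Lengths and accuracies as quoted in the post.
    for label, length, accuracy in [
        ("PacBio, 3kbp at 85%", 3_000, 0.85),
        ("Oxford, 30kbp at 96%", 30_000, 0.96),
        ("Target, 1000bp at 99.99%", 1_000, 0.9999),
    ]:
        print(f"{label}: ~{expected_errors(length, accuracy):.1f} errors per read")
    # Consensus error for an 85%-accurate read after more and more passes.
    for n in (1, 5, 11, 21):
        print(f"{n} passes: per-base error {majority_consensus_error(0.15, n):.2e}")
```

Under these assumptions a 3kbp read at 85% carries roughly 450 errors, a 30kbp read at 96% roughly 1,200, and a 1000bp read at 99.99% about 0.1. Each extra pair of passes costs more instrument time per molecule while shrinking the consensus error, which is exactly the trade-off in question.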

The two domains of life making up the prokaryotes (sorry Norm Pace, it's a useful word in my opinion) are far too diverse to be studied fruitfully with current NextGen technology, again in my opinion. That technology is awesome for resequencing against a reference, e.g. human genomes for medical records. It is also claimed to be a solution for metagenomics and metatranscriptomics, on the theory that sequencing depth can cover up the inherent problems of 150bp reads from prokaryotes. Our choked pipelines at sequencing centers bear me out, I think.

Nice chatting. Cheers to all.

Peppe 11-26-2012 03:42 PM

Hello Jim,
I see you are at the JGI, so I thought I would ask you for advice. I am performing RNA-seq analysis on a yeast whose genome sequence is available on the JGI website. I downloaded both the unmasked genome and the annotation file. I need the annotation file for the Cufflinks and Cuffcompare steps, but the problem is that the downloaded file is in GFF format, and Galaxy (the public server for analyzing NGS data) accepts only GTF files. Do you have any contact there who can help me?
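For anyone landing here with the same problem: Cufflinks expects GTF records carrying `gene_id` and `transcript_id` attributes, and the `gffread` utility distributed with Cufflinks can write GTF with its `-T` option (e.g. `gffread annotation.gff3 -T -o annotation.gtf`). If that isn't available, the conversion is simple enough to sketch by hand. The function below is a minimal sketch that assumes the file is standard GFF3 with `ID=`/`Parent=` attributes and a gene, mRNA, exon/CDS hierarchy; `gff3_to_gtf` is a hypothetical name, and a JGI file with a different attribute layout would need the parsing adjusted.

```python
def gff3_to_gtf(gff3_lines):
    """Convert GFF3 exon/CDS records into GTF lines with gene_id/transcript_id.

    Assumes GFF3 attributes of the form ID=...;Parent=... (hypothetical input
    layout). Attribute values are not URL-decoded.
    """
    def parse_attrs(field):
        attrs = {}
        for item in field.strip().split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attrs[key.strip()] = value.strip()
        return attrs

    # First pass: keep feature rows and map each transcript ID to its gene ID.
    records = []
    transcript_to_gene = {}
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        attrs = parse_attrs(cols[8])
        if cols[2] in ("mRNA", "transcript") and "ID" in attrs:
            transcript_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])
        records.append((cols, attrs))

    # Second pass: rewrite exon/CDS rows with GTF-style attributes.
    gtf_lines = []
    for cols, attrs in records:
        if cols[2] not in ("exon", "CDS"):
            continue
        for transcript_id in attrs.get("Parent", "").split(","):
            if not transcript_id:
                continue
            gene_id = transcript_to_gene.get(transcript_id, transcript_id)
            attr_field = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            gtf_lines.append("\t".join(cols[:8] + [attr_field]))
    return gtf_lines

if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
        "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1",
        "chr1\tsrc\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mRNA1",
    ]
    for row in gff3_to_gtf(example):
        print(row)
```

The two-pass design matters because GFF3 does not require a transcript's row to appear before its exons.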

LVAndrews 11-27-2012 08:45 AM

Hi Jim,

I too study environmental microbial samples. Coming from the world of JGI and large sequencing projects, I can imagine how you would be dissatisfied with the short reads offered by the leading NGS instruments at present. However, just because the tools we want aren't yet available doesn't mean we shouldn't learn what we can with what we already have. Carl Woese didn't even have Sanger sequencing at his disposal, and yet he changed the way we view life on Earth. I have used 454 and Illumina for analysis of fungal and bacterial/archaeal communities in my work. Each has its advantages and drawbacks, but both provide far more information than tRFLP and clone library sequencing, and do it in a reasonable amount of time. Currently our lab uses a MiSeq for bacteria/archaea with the Earth Microbiome Project (EMP) protocol, and we hope to have a fungal assay in place soon. Our current workflow sequences a 250bp region of the 16S gene using 2x150 PE reads. Taxonomic assignments are made confidently, and this gives us our first real glimpse into the microbes that inhabit our various study sites. Additionally, knowing which species are present lets us design new testable hypotheses with regard to transcription of ecologically important gene products in the context of our established experiments.
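Because the two 150bp reads of a pair span a region of about 250bp, they overlap by roughly 150 + 150 - 250 = 50bp, which is what allows a pair to be joined into one sequence covering the whole region. The sketch below shows that arithmetic and a deliberately naive merge by exact overlap; it is an illustration under that assumption, not necessarily how this lab processes its reads, and real read mergers also tolerate mismatches and weigh base qualities.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def expected_overlap(read_len_1: int, read_len_2: int, amplicon_len: int) -> int:
    """Bases shared by the two reads of a pair that together span the amplicon."""
    return read_len_1 + read_len_2 - amplicon_len

def merge_pair(read1: str, read2: str, min_overlap: int = 20):
    """Merge a read pair by exact suffix/prefix overlap (simplified sketch).

    read2 is given as sequenced from the opposite strand, so it is
    reverse-complemented first. Returns the merged sequence, or None if no
    exact overlap of at least min_overlap bases exists.
    """
    rc2 = reverse_complement(read2)
    max_k = min(len(read1), len(rc2))
    for k in range(max_k, min_overlap - 1, -1):  # try the longest overlap first
        if read1[-k:] == rc2[:k]:
            return read1 + rc2[k:]
    return None

if __name__ == "__main__":
    print(expected_overlap(150, 150, 250))  # overlap for 2x150 over ~250bp
    amplicon = "ACGTTGCATGACCTAGGATCCGTAAGCTTC"  # toy 30bp "amplicon"
    read1 = amplicon[:20]
    read2 = reverse_complement(amplicon[10:])  # mate read from the other strand
    print(merge_pair(read1, read2, min_overlap=5))
```

Searching from the longest overlap downward avoids accepting a short, spurious match when a longer true one exists.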

Of course, for metatranscriptome work you need a lot more depth to make confident assemblies of reads, but the tools we have will do the job for you right now. We've all been waiting for even a single word from Oxford for almost a year now, so the best strategy is probably to keep embracing your functioning tools and adopt MinIONs as necessary to augment existing data sets.
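A back-of-the-envelope way to see why metatranscriptome assembly needs so much depth: a transcript only receives reads in proportion to its abundance, so rare transcripts end up thinly covered. The sketch below assumes reads land uniformly and independently along each transcript (the Poisson model behind Lander-Waterman coverage estimates); the function names and the numbers in the example are hypothetical, not taken from any dataset mentioned in this thread.

```python
import math

def expected_coverage(total_reads: int, read_length: int,
                      transcript_fraction: float, transcript_length: int) -> float:
    """Mean per-base coverage of one transcript, assuming it receives a fixed
    fraction of all reads and those reads land uniformly along it."""
    return total_reads * transcript_fraction * read_length / transcript_length

def fraction_uncovered(coverage: float) -> float:
    """Poisson estimate of the fraction of bases that receive no reads at all."""
    return math.exp(-coverage)

if __name__ == "__main__":
    # Hypothetical run: 10 million 150bp reads; a 1500bp transcript that
    # accounts for one read in a million.
    c = expected_coverage(10_000_000, 150, 1e-6, 1_500)
    print(f"mean coverage {c:.2f}x, uncovered fraction {fraction_uncovered(c):.2f}")
```

With those hypothetical numbers the transcript gets about 1x mean coverage, and roughly 37% of its bases (e^-1) would be expected to have no reads at all, far too patchy for a confident assembly.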


hjtripp 11-29-2012 09:30 AM

JGI contact person, GFF to GTF conversion
Hello, does work on RNA sequencing and has faced the issue of converting these files. He's willing to help if asked.

I don't really have any comment on sequencing technology at this time. To be honest, I think we collect more data than we can productively and adequately analyze, given the quality of the reference data, so I work on improving that.


Peppe 11-29-2012 09:35 AM

Thanks, hjtripp

