
**ECO** · 08-21-2008, 08:44 AM

Just saw the paper: http://seqanswers.com/forums/showthread.php?t=515

Darn NewsBot! He needs an upgrade; he missed it!

**bioinfosm** · 08-22-2008, 11:04 AM

As I understood it, it only works on paired-end data, nothing for single reads!

**ucpete** · 12-17-2008, 03:35 PM

We tried it here. We're not noticing a marked improvement (5%) over Illumina's software, but we only ran it on a subset of one lane. Paired-end data is definitely not required. The great thing is that we wouldn't need to run phi X and waste a lane...

**new300** · 12-18-2008, 04:23 AM

Originally posted by ucpete

We tried it here. We're not noticing a marked improvement (5%) over Illumina's software, but we only ran it on a subset of one lane. Paired-end data is definitely not required. The great thing is that we wouldn't need to run phi X and waste a lane...

My understanding was that Alta-Cyclic needs to be trained from the PhiX lane of each run. Also, wouldn't you still need the PhiX lane for calibrating the quality scores?

**dvh** · 12-18-2008, 09:00 AM

We've not been able to persuade GA Pipeline 1.0 to give decent calibrated base-call quality scores for our 45bp reads. I think this is due to its use of ELAND, and eland_extended isn't great for reads longer than about 36bp (I can't remember now whether the limit is 32 or 36). Calibration at positions 36-45 looks wrong.

Anyone else seen this? Have we done something stupid?

david

**ucpete** · 12-18-2008, 11:49 AM

Originally posted by new300

My understanding was that Alta-Cyclic needs to be trained from the PhiX lane of each run. Also, wouldn't you still need the PhiX lane for calibrating the quality scores?

You don't necessarily need the phi X lane, just any reference genome against which to align your reads. In our case, we're doing metagenomic studies and can use the host genome as our reference.

**new300** · 12-18-2008, 12:19 PM

Originally posted by ucpete

You don't necessarily need the phi X lane, just any reference genome against which to align your reads. In our case, we're doing metagenomic studies and can use the host genome as our reference.

That sounds like a neat experiment! I've not tried it but can the Illumina pipeline not use other genomes as a reference for calibration?

**ucpete** · 12-18-2008, 12:53 PM

Originally posted by new300

That sounds like a neat experiment! I've not tried it but can the Illumina pipeline not use other genomes as a reference for calibration?

Yes, technically. According to Illumina, you can use any genome as a reference for calibration as long as it has 50% GC content. They claim also that this is a very strict requirement, i.e. it can't sway by more than 0.5%. The crappy part about their error rate calculations is that it's only based on those reads that actually align to the reference genome, so if you have a read with > 2 mismatches it won't even align by ELAND to phi X and won't be considered in the error calculations...
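To make the bias concrete, here is a minimal Python sketch (not Illumina's code) that computes a per-base error rate from per-read mismatch counts, optionally keeping only reads at or under an aligner-style mismatch cap. The function name and the example counts are hypothetical.

```python
def per_base_error_rate(mismatch_counts, read_length, max_mismatches=None):
    """Return mismatches per aligned base.

    If max_mismatches is given, reads with more mismatches are dropped first,
    mimicking an aligner that only reports reads within a mismatch limit.
    """
    kept = [m for m in mismatch_counts
            if max_mismatches is None or m <= max_mismatches]
    if not kept:
        raise ValueError("no reads pass the mismatch filter")
    return sum(kept) / (len(kept) * read_length)


# Hypothetical mismatch counts for eight 36bp reads
counts = [0, 0, 1, 2, 0, 5, 7, 1]
all_reads = per_base_error_rate(counts, 36)
capped = per_base_error_rate(counts, 36, max_mismatches=2)
print(f"all reads: {all_reads:.4f}, <=2 mismatches only: {capped:.4f}")
# prints "all reads: 0.0556, <=2 mismatches only: 0.0185"
```

With these made-up counts, dropping the two reads with 5 and 7 mismatches lowers the estimate from about 5.6% to about 1.9%, which is the kind of underestimate being complained about.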

**new300** · 12-18-2008, 01:21 PM

Originally posted by ucpete

Yes, technically. According to Illumina, you can use any genome as a reference for calibration as long as it has 50% GC content. They claim also that this is a very strict requirement, i.e. it can't sway by more than 0.5%.

Yep, I guess what everyone wants ideally is a fixed calibration table. I'm surprised it makes that much of a difference though.

Originally posted by ucpete

The crappy part about their error rate calculations is that it's only based on those reads that actually align to the reference genome, so if you have a read with > 2 mismatches it won't even align by ELAND to phi X and won't be considered in the error calculations...

I think that should only make a difference if highly errored reads have a different error source than reads with one or two errors. The fraction of errors within a bin associated with a given feature will still be the same if you look at reads with few errors or many.
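The per-bin argument can be illustrated with a minimal calibration step. This sketch assumes each aligned base has already been reduced to a hypothetical `(bin_key, is_error)` pair, where the bin might be something like `(cycle, raw_quality)`; the empirical quality is the Phred transform, -10·log10 of the observed error fraction in that bin, with a small pseudocount so error-free bins don't produce log10(0). It is not the Illumina or Alta-Cyclic implementation.

```python
import math
from collections import defaultdict


def empirical_phred(observations, pseudocount=1):
    """Map each bin to an empirical Phred score.

    observations: iterable of (bin_key, is_error) pairs, one per aligned base.
    The pseudocount (Laplace-style smoothing) avoids log10(0) for bins with
    no observed errors.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for key, is_error in observations:
        totals[key] += 1
        errors[key] += int(is_error)
    table = {}
    for key, n in totals.items():
        rate = (errors[key] + pseudocount) / (n + 2 * pseudocount)
        table[key] = -10 * math.log10(rate)
    return table


# Hypothetical bins keyed by (cycle, raw quality)
obs = ([((1, 30), False)] * 998
       + [((36, 30), True)] * 9 + [((36, 30), False)] * 991)
table = empirical_phred(obs)
print(round(table[(1, 30)], 1), round(table[(36, 30)], 1))  # prints "30.0 20.0"
```

Because each bin's score depends only on the error fraction inside that bin, removing whole reads changes the estimate only if those reads have a different per-bin error fraction, which is the point being made above.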

What I think looking only at aligned reads does for you is discard contamination. This is useful, as those aren't really errors. For my own calibrator I found that letting reads with about 5 errors through was the sweet spot. So in general I think discarding reads that clearly don't come from the reference genome during calibration is a good thing.
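A minimal sketch of that kind of filter, assuming reads have already been placed at an ungapped offset on the reference. The function names are hypothetical, and the default cutoff of 5 comes from the post above rather than being a general recommendation.

```python
def count_mismatches(read, reference, offset):
    """Count mismatches of an ungapped alignment of read at reference[offset:]."""
    window = reference[offset:offset + len(read)]
    if len(window) != len(read):
        raise ValueError("alignment runs off the end of the reference")
    return sum(1 for a, b in zip(read, window) if a != b)


def reads_for_calibration(aligned_reads, reference, max_mismatches=5):
    """Keep (read, offset) pairs with at most max_mismatches mismatches.

    Reads above the cutoff are treated as likely contamination rather than
    sequencing errors and are left out of the calibration counts.
    """
    return [(read, off) for read, off in aligned_reads
            if count_mismatches(read, reference, off) <= max_mismatches]


ref = "ACGTACGTACGT"
aligned = [("ACGTACGT", 0), ("TTTTTTTT", 0)]  # second read has 6 mismatches
print(reads_for_calibration(aligned, ref))  # prints "[('ACGTACGT', 0)]"
```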

IIRC the Alta-Cyclic paper doesn't assess the quality scores they assign. Do you find the quality scores assigned by Alta-Cyclic accurate?

**dvh** · 12-18-2008, 03:43 PM

Originally posted by new300

For my own calibrator I found that letting reads with about 5 errors through was the sweet spot.

Nav,

Am interested:

1. Sweet spot = 5 errors, but at what read length: 36bp, 45bp, 70bp, etc.?

2. Did you remove homopolymer, and "low base quality across entire read" reads first, or rely on the alignment for this?

david

**new300** · 12-18-2008, 04:10 PM

Originally posted by dvh

Nav,

Am interested:

1. Sweet spot = 5 errors, but at what read length: 36bp, 45bp, 70bp, etc.?

2. Did you remove homopolymer, and "low base quality across entire read" reads first, or rely on the alignment for this?

david

I was looking at 36bp reads, filtered only by alignment. I was just using PhiX, so anything low-complexity like homopolymers should get filtered out by alignment. IIRC PhiX is unique at around 12bp, so even with 5 errors you're unlikely to misalign a 36bp read. Making sure I excluded SNP positions had more of an effect, but that's probably down to the fact that I was using a really naive algorithm...
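One way to check a claim like "unique at around 12bp" is to find the smallest k at which every k-mer in the reference occurs only once. This is a minimal sketch assuming a linear sequence (PhiX is actually circular, and the wrap-around is ignored here) and counting both strands, since reads can come from either one; loading the real PhiX FASTA is left out, so the sketch does not itself confirm the 12bp figure.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def all_kmers_unique(seq, k, both_strands=True):
    """True if no k-mer occurs more than once in seq, optionally counting
    k-mers from the reverse-complement strand as well."""
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    counts = Counter(s[i:i + k] for s in strands for i in range(len(s) - k + 1))
    return all(c == 1 for c in counts.values())


def min_unique_k(seq, both_strands=True):
    """Smallest k for which every k-mer in seq is unique, or None."""
    for k in range(1, len(seq) + 1):
        if all_kmers_unique(seq, k, both_strands):
            return k
    return None


print(min_unique_k("AAAC"))  # prints "3"
```

When both strands are counted, a k-mer that is its own reverse complement (for example `ACGT`) is always seen twice, so it never counts as unique; that is the intended behaviour for reads that could come from either strand.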

**clivey** · 12-19-2008, 01:45 AM

Originally posted by ucpete

Yes, technically. According to Illumina, you can use any genome as a reference for calibration as long as it has 50% GC content. They claim also that this is a very strict requirement, i.e. it can't sway by more than 0.5%. The crappy part about their error rate calculations is that it's only based on those reads that actually align to the reference genome, so if you have a read with > 2 mismatches it won't even align by ELAND to phi X and won't be considered in the error calculations...

Originally Solexa provided two aligners for precisely this reason. One is Eland, with a 2-error limit. The other was PhageAlign, with no limit. You were thus able to force-align all of the reads and count all of the errors. This was done deliberately, as policy, during the development of the tech and was carried over as part of the pipeline. There's nothing to stop you doing this. I think the real issue is the speed of PhageAlign. It's very, very slow, so you probably only want to run it on a chosen sub-sample of tiles rather than a whole lane.
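PhageAlign's actual algorithm isn't described in this thread. As a stand-in, this naive exhaustive ungapped aligner tries every offset on both strands and always reports the best hit, however many mismatches it has. It shows why force-aligning everything is slow: the work per read grows with reference length times read length, so a whole lane multiplies that by millions of reads.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_ungapped_hit(read, reference):
    """Exhaustively align read at every offset on both strands, with no
    mismatch limit.

    Returns (mismatches, offset, strand) for the best hit, where offset is the
    forward-reference position of the read (or of its reverse complement for
    strand "-"). Returns None if the read is longer than the reference.
    Cost is O(len(reference) * len(read)) per read.
    """
    best = None
    for strand, target in (("+", read), ("-", reverse_complement(read))):
        for off in range(len(reference) - len(read) + 1):
            window = reference[off:off + len(read)]
            mm = sum(1 for a, b in zip(target, window) if a != b)
            if best is None or mm < best[0]:
                best = (mm, off, strand)
    return best


ref = "ACGTTGCAAGGC"
print(best_ungapped_hit("TTGCA", ref))  # prints "(0, 3, '+')"
print(best_ungapped_hit("AACGT", ref))  # prints "(0, 0, '-')"
```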

In fact on 'normal' runs Eland only 'discards' about 3-5% of reads (last time I looked - things may have changed) - some of which will be truly erroneous - some will be contaminants that slipped filters and other oddness caused for example by imaging artifacts.

I still say, if you are getting a significant percentage of reads with more than two errors, then something is seriously awry with your system, because you'd be looking at error rates in the high single to double percentages.
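A rough way to check that reasoning is a binomial model: assume errors are independent and equally likely at every position of a 36bp read, then compute the chance of more than two errors for a few per-base error rates. Real Illumina errors are not independent (they rise toward the end of the read), so this is only a back-of-the-envelope sketch.

```python
from math import comb


def prob_more_than(k, read_length, per_base_error):
    """P(a read has more than k errors) under a binomial model with
    independent, equally likely errors at every position."""
    p_at_most_k = sum(
        comb(read_length, i)
        * per_base_error ** i
        * (1 - per_base_error) ** (read_length - i)
        for i in range(k + 1)
    )
    return 1 - p_at_most_k


for rate in (0.01, 0.02, 0.05, 0.10):
    print(f"per-base error {rate:.0%}: "
          f"P(>2 errors in 36bp) = {prob_more_than(2, 36, rate):.3f}")
```

Under this model a 1% per-base error rate leaves only about half a percent of 36bp reads with more than two errors, while 5% pushes the fraction above a quarter, consistent with the point that many such reads imply error rates well above the low single percentages.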

**yvan.wenger** · 10-07-2009, 01:05 AM

Hello everybody,

Has anybody tried comparing Alta-Cyclic, the Illumina Pipeline (GAP 1.4.0) and Ibis (http://genomebiology.com/2009/10/8/R83 )?

**yvan.wenger** · 12-10-2009, 02:10 AM

Update: I tried Ibis and it performed slightly better than GA Pipeline 1.4 on 3 lanes (~60x10⁶ raw reads, 76 bp). Great tool. For the comparison I tested raw reads (Ibis) vs. raw reads (GAP).

Does anyone know the exact criteria GERALD uses to decide which low-quality reads to discard?

Alta-Cyclic
