SEQanswers

10-25-2010, 09:52 AM   #1
wraithnot
Member
 
Location: SF bay area

Join Date: Apr 2009
Posts: 12
Duplicated bases in 100 bp GA2 reads

Hi All,

I recently found an odd artifact in some 100 bp Illumina GA2 reads we got from our sequencing provider. After some initial consternation, I realized that all of the raw data contained duplicated bases at specific cycle numbers. More precisely, every read in two of the samples that were run side by side had an insertion at the 37th and 74th positions that duplicated the base at the 36th and 73rd positions, respectively. A third sample, run at a later time, had an insertion at the 51st position that was identical to the base at the 50th position in every single read. When I removed the 37th and 74th bases from all reads in the first two datasets, and the 51st base from the third dataset, everything looked OK.
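A minimal Python sketch of that cleanup, assuming well-formed four-line FASTQ records and that the duplicated positions (37 and 74, or 51) are the same in every read. The function names are illustrative, not from any existing tool; the quality string is trimmed at the same positions so sequence and qualities stay aligned.

```python
from typing import Iterable, Iterator


def parse_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line is not needed
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def drop_positions(seq: str, qual: str, positions: Iterable[int]) -> tuple[str, str]:
    """Remove the given 1-based positions from both the sequence and the quality string."""
    drop = {p - 1 for p in positions}  # 1-based cycle numbers -> 0-based indices
    kept = [i for i in range(len(seq)) if i not in drop]
    return "".join(seq[i] for i in kept), "".join(qual[i] for i in kept)


def fix_records(lines: Iterable[str], positions: Iterable[int]) -> Iterator[str]:
    """Yield corrected FASTQ records with the duplicated positions removed."""
    positions = list(positions)  # reused for every record
    for header, seq, qual in parse_fastq(lines):
        new_seq, new_qual = drop_positions(seq, qual, positions)
        yield f"{header}\n{new_seq}\n+\n{new_qual}\n"
```

With `fin` and `fout` as open file handles, `fout.writelines(fix_records(fin, [37, 74]))` would apply the fix for the first two samples; `[51]` would be used for the third.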

Has anyone else experienced this type of artifact before? Any idea what could cause this sort of thing? I brought this to their attention and mentioned that the positions of the inserted bases bore a striking resemblance to the standard 36 bp and 50 bp read lengths, but they insisted their machines were working properly and that no one else had complained about the data. Thoughts? Thanks
10-25-2010, 01:23 PM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Ah, the old "nobody else has experienced this" dodge. HORSE PUCKEY!!

Contact your sequencing provider and ask them for details about these runs. Did they notice a significant drop in intensity across all channels and all lanes at the cycles you mentioned? Do your q-scores tank after this?

We observed these two symptoms during a number of runs at one point. I later correlated duplicated base calls like the ones you describe with the anomalous cycles observed during those runs.

Here's what we (and by "we" I mean me and the Illumina engineers) believe was happening. Something prevented dye-terminator cleavage after an imaging cycle, so no new base was incorporated, and at the next imaging cycle we were just detecting the bases from the previous cycle a second time. Because the dyes had already been photobleached during the first round of imaging, the signals were lower. The chemistry then appeared to return to normal for subsequent cycles. However, because of the anomalous intensities in that one cycle (and, we believe, significant phasing introduced because the effect was not 100% complete), the Q-scores took a nose dive after it. If this happened early enough in the run, we could extend the run with additional cycles, and I would then re-run the pipeline offline, starting at a cycle after the bad one. This helped clean up the data somewhat.
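If the failed-cleavage explanation is right, the stalled cycle simply re-reads the previous base, so the base at that cycle matches the base at the preceding cycle in nearly every affected read. One rough way to look for this in FASTQ data is to compute, per cycle, the fraction of reads whose base equals the one before it; in ordinary libraries this sits well below 1, so a spike toward 1 flags a suspect cycle. This is a sketch of a diagnostic idea, not something from Illumina's software; the 0.9 threshold is an arbitrary assumption, and since the effect may not be 100% complete, comparing each cycle against the others may work better than a fixed cutoff.

```python
def repeat_fraction_by_cycle(seqs: list[str]) -> dict[int, float]:
    """Map each 1-based cycle c >= 2 to the fraction of reads whose base at c
    equals the base at c - 1 (N calls never count as a match)."""
    seqs = [s for s in seqs if s]
    if not seqs:
        return {}
    length = min(len(s) for s in seqs)  # only compare cycles present in every read
    fractions = {}
    for i in range(1, length):
        same = sum(1 for s in seqs if s[i] == s[i - 1] and s[i] != "N")
        fractions[i + 1] = same / len(seqs)
    return fractions


def suspect_cycles(seqs: list[str], threshold: float = 0.9) -> list[int]:
    """Return cycles whose repeat fraction reaches the (assumed) threshold."""
    return [c for c, f in repeat_fraction_by_cycle(seqs).items() if f >= threshold]
```

Feeding it the sequence lines from a few thousand reads of each sample should be enough to make a duplicated cycle such as 37, 74 or 51 stand out.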

What finally resolved this issue for us was to have Illumina replace the VICI valve AND the controller board for this valve.

I can't say for certain that this is what is going on with your data. You need to talk to your sequencing provider and have them discuss this with Illumina.
10-25-2010, 06:34 PM   #3
wraithnot
Member
 
Location: SF bay area

Join Date: Apr 2009
Posts: 12

@kmcarr, thanks for the detailed response. I took a closer look at the raw FASTQ data they gave me. For the sample with two separate insertions spaced 36 bp apart, there was a clear q-score drop at the second insertion. For the sample with one insertion around base 50, there was also a clear q-score drop. So I think you're on to something.
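A per-cycle quality profile makes a drop like this easy to spot. The sketch below averages Phred scores at each cycle and reports the cycles with the largest fall from the previous one. The quality offset is an assumption to check against your files: Illumina pipeline versions 1.3 to 1.7 typically wrote Phred+64, while Sanger-style FASTQ uses Phred+33.

```python
def mean_quality_by_cycle(quals: list[str], offset: int = 64) -> list[float]:
    """Average Phred score at each cycle (index 0 = cycle 1) across quality strings."""
    totals: list[int] = []
    counts: list[int] = []
    for q in quals:
        for i, ch in enumerate(q):
            if i == len(totals):  # first read long enough to reach this cycle
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def largest_drops(means: list[float], top: int = 3) -> list[int]:
    """Return the 1-based cycles with the largest decrease versus the previous cycle."""
    drops = [(means[i - 1] - means[i], i + 1) for i in range(1, len(means))]
    drops.sort(reverse=True)
    return [cycle for _, cycle in drops[:top]]
```

If the duplicated cycles are behind the quality loss, they (or the cycle right after them) should appear among the largest drops.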

For the sake of argument, is this result consistent with originally setting the machine up to perform a 36 bp or 50 bp run, and then acquiring more data after the run ended and the operator remembered it was supposed to be a 100 bp run? Does the instrument perform dye-terminator cleavage after what is supposed to be the last imaging cycle?

Thanks,
wraithnot

10-26-2010, 09:37 AM   #4
mattanswers
Member
 
Location: Boston

Join Date: Oct 2009
Posts: 65

We once had a run in which the percentage of reads aligning was very low for all three lanes we ran. However, when I trimmed the reads to the first 28 bases from the 5' end before aligning, the alignment rate went up to what we would expect.

The cause turned out to be that when the image files were copied over to our server, the folders for two cycles, 29 and 32, were not copied for one of the lanes. This resulted in sequences that were 34 bases long. So in this case there was a deletion rather than an insertion, but it affected alignment for all lanes.
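The failure mode is easy to reproduce on paper: if whole cycles are missing when the reads are assembled, every base after the first gap shifts left by one, so only the bases before cycle 29 are still in register, which is why trimming to 28 bases rescued the alignments. A small illustration with a made-up 36-cycle read; the helper names are hypothetical.

```python
def read_with_missing_cycles(true_read: str, missing: set[int]) -> str:
    """Simulate base calls assembled when some 1-based cycles' images are absent."""
    return "".join(b for cycle, b in enumerate(true_read, start=1) if cycle not in missing)


def trim_to_first(seq: str, qual: str, n: int) -> tuple[str, str]:
    """Keep only the first n bases (and their qualities), counting from the 5' end."""
    return seq[:n], qual[:n]


true_read = "ACGT" * 9  # made-up 36-cycle read
observed = read_with_missing_cycles(true_read, {29, 32})
print(len(observed))                     # 34
print(observed[:28] == true_read[:28])   # True: bases before the gap are unchanged
print(observed[28] == true_read[28])     # False: position 29 now holds the cycle-30 base
```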

When running the Illumina pipeline there was a warning printed on the screen, but it came on and went off the screen so quickly it was difficult to read.
10-26-2010, 02:04 PM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Quote:
Originally Posted by wraithnot
For the sake of argument, is this result consistent with originally setting the machine up to perform a 36 bp or 50 bp run, and then acquiring more data after the run ended and the operator remembered it was supposed to be a 100 bp run? Does the instrument perform dye-terminator cleavage after what is supposed to be the last imaging cycle?
I am pretty certain that there is no cleavage after the last cycle (really, what would be the point? It would just be a waste of time and reagents). So I suppose there is some plausibility to what you suggest, but it would be a big hassle to do. You would need to aggregate all of the images into a single folder, rename all of the files and folders from the added cycles to reconcile them with a unified cycle-numbering scheme, and then run OLB on the merged image folders.
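The renaming step amounts to adding a fixed offset to each cycle number from the added cycles. A minimal sketch that only builds the old-to-new rename plan, assuming (hypothetically) that cycle folders are named like `C12.1` and that the first segment had `offset` cycles; check the actual folder and file naming on your instrument before relying on it.

```python
import re

# Assumed folder naming pattern ("C<cycle>.1"); adjust to what the run actually produced.
CYCLE_FOLDER = re.compile(r"^C(\d+)\.1$")


def renumber_plan(folders: list[str], offset: int) -> dict[str, str]:
    """Map each cycle folder from the extension to its cycle number in the merged run."""
    plan = {}
    for name in folders:
        match = CYCLE_FOLDER.match(name)
        if match is None:
            continue  # skip anything that is not a cycle folder
        cycle = int(match.group(1))
        plan[name] = f"C{cycle + offset}.1"
    return plan
```

Keeping the plan separate from the copy means it can be reviewed before anything is moved into the merged folder; image files whose names embed the cycle number would need the same treatment.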

It is possible to extend a recipe in progress, but RTA stops at the originally planned last cycle. The SBS and imaging cycles continue normally to the new run length, and you then analyze the run offline. Of course, this only works if you can save all of the images, so it's no longer an option with SCS 2.8.
All times are GMT -8.