SEQanswers

Old 03-12-2009, 06:41 AM   #1
BaCh
Member
 
Location: Germany

Join Date: May 2008
Posts: 79
GGc.G problem with Solexa data

Dear all,

(Note: the data I'm talking about came from a GAII and, if my assumption is correct, used a 36mer kit to read 40 bases.)

I've been working on a number of projects with Solexa data (bacterial resequencing including hand editing to get "100% certain" SNPs). Looking at cases where the programs report "there's a problem, we're unsure about the true situation", I've made the following observation:
In read direction, errors like to occur directly after GGC.G (the dot standing for any base). On a broader scope, GG..G is at risk.
There are probably other factors, as fortunately this does not occur everywhere in the genomes I'm working on, but when the problem strikes, it easily affects one third to one half of the reads, sometimes more. Also, the problem is more likely to occur in the second half of the read than in the first half.
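
To make the pattern concrete, here is a minimal Python sketch that lists the reference positions sitting directly after a GGC.G (or, more broadly, GG..G) motif in read direction, on both strands. It is not part of MIRA or any other tool mentioned in this thread; the names `risky_positions`, `STRICT` and `BROAD` are made up for illustration, and the sketch only flags candidate positions, it says nothing about whether errors actually occur there.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all reported.
STRICT = re.compile(r"(?=GGC[ACGT]G)")   # GGC.G
BROAD = re.compile(r"(?=GG[ACGT]{2}G)")  # GG..G

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (non-ACGT left as is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def risky_positions(reference, pattern=STRICT, motif_len=5):
    """Return sorted (position, strand) pairs.

    Each position is 0-based on the forward strand and marks the base that
    a read would call directly after the motif, in read direction.
    """
    ref = reference.upper()
    n = len(ref)
    hits = []

    # Forward-strand reads: the base after the motif lies to the right.
    for m in pattern.finditer(ref):
        pos = m.start() + motif_len
        if pos < n:
            hits.append((pos, "+"))

    # Reverse-strand reads: search the reverse complement, then map the
    # coordinate back onto the forward strand.
    rc = reverse_complement(ref)
    for m in pattern.finditer(rc):
        pos_rc = m.start() + motif_len
        if pos_rc < n:
            hits.append((n - 1 - pos_rc, "-"))

    return sorted(hits)


if __name__ == "__main__":
    for pos, strand in risky_positions("AAGGCTGAAA"):
        print(pos, strand)
```

Passing `BROAD` as the second argument switches to the wider GG..G definition. Comparing the flagged positions with the columns where the assembler reports uncertainty would be one way to check how much of the problem the motif explains.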

I've attached a small screenshot of a typical case as an example (the yellow C being correct, the blue Gs not).

Now, I have some routines that can filter out problematic cases, but then on rare occasions I lose almost the entire coverage. Not good.

Questions:
  1. Has anyone else seen this?
  2. Any idea if this can be reduced (I suppose more on the lab side)?

Regards,
Bastien
Attached Images
File Type: png scrshot0018.png (2.3 KB, 67 views)
Old 03-19-2009, 10:53 AM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

What are you using to do the alignments? How do you identify 'cases' or mis-calls?
Old 03-19-2009, 11:13 AM   #3
dvh
Member
 
Location: london, uk

Join Date: Jul 2008
Posts: 35

I had a look at our data; we don't seem to see this. GAII 45bp or 70bp x2 PE. Human resequencing. Novoalign.
Old 03-19-2009, 04:04 PM   #4
BaCh
Member
 
Location: Germany

Join Date: May 2008
Posts: 79

Quote:
Originally Posted by bioinfosm View Post
What are you using to do the alignments? How do you identify 'cases' or mis-calls?
I use MIRA. I've put together a walkthrough using public data (all from the NCBI) that shows the problem and gives a step-by-step recipe to reproduce what I see in several projects:
http://chevreux.org/GGCxG_problem.html
Disclaimer: yep, MIRA may be slow at times and can be a real memory hog. But it's mine and does exactly what I need.
Tags
gc solexa