SEQanswers > Sequencing Technologies/Companies > 454 Pyrosequencing

01-15-2009, 06:38 AM   #1
mingkunli
Member
 
Location: Germany

Join Date: Jan 2009
Posts: 41
How to explain this scenario?

While assembling the reads, I found this scenario:

TAA-CCTCCCCC-AAANTT-CAGA Consensus
TAA-CCTCCCCC-AAACTT
TAA-CCTCCCCC-AAACTTACAGA
TAA-CCT-CCCC-AAACTTACAGA
TAA-CCTCCCCCAAAACTT-CAGA
TAACCCTCCCCC-AAAATT-CAGA
TAA-CCTCCCCC-AAAATT-CAGA
TAA-CCTCCCCA-AAACTTACAGA
TAA-CCTCCCCC-AAAATT-CAGA
TAA-CCTCCCCC-AAAATT-CAGA
---A-CCTCCCCC-AAAATT-CAGA

The first line is the consensus sequence. Note the N in it: it appears because five reads with a C and five reads with an A map to that position.
Someone told me this is caused by the homopolymer, i.e. the
C observed at that position is likely to be part of the homopolymer
ahead of it. Have you run into this problem before? Do you think that explanation is plausible?
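A minimal sketch of how a 5 vs 5 column turns into an N, assuming a plain column-wise majority vote in which gaps count as votes and a tie is reported as N. The function names and the tie rule are illustrative assumptions, not what runAssembly or any other assembler actually does.

```python
from collections import Counter

def call_column(column):
    """Majority-vote symbol for one alignment column.

    Gaps ('-') count as votes so gap columns stay gaps; 'N' calls are ignored.
    A tie between the two most common symbols is reported as 'N'
    (a simple rule assumed for illustration).
    """
    votes = Counter(b for b in column if b != "N")
    if not votes:
        return "N"
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "N"
    return ranked[0][0]

def consensus(rows):
    """Column-wise consensus of gapped reads; a read shorter than the
    alignment simply stops voting after its last base."""
    width = max(len(r) for r in rows)
    return "".join(call_column([r[i] for r in rows if i < len(r)])
                   for i in range(width))

# The column under the N: five reads show C, five show A.
print(call_column("CCCCCAAAAA"))   # N
print(call_column("CCCCCAAAA"))    # C (drop one A read and the tie disappears)
```

Under this rule a single extra read at that position decides the call, which is why the quality values matter so much here.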
01-15-2009, 07:48 AM   #2
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

You really need to give a lot more information than you've supplied before anyone can offer a reasonable hypothesis. Off the top of my head, there are several plausible explanations.

Homopolymer errors are definitely possible, but the likelihood depends on the platform. (Oops! I didn't realize this was in the 454 forum! Homopolymer errors are more common with 454 than with some of the other platforms, so yes, this is possible. However, I think my other comments stand: homopolymers are far from the only reason you would see the above scenario.)
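To see why 454 is prone to this, here is a simplified sketch of flow-based base calling: each flow delivers one nucleotide (cycling in the TACG order 454 instruments use) and the signal is rounded to a whole homopolymer length. The flowgram values are made up, and real 454 base calling applies signal corrections and a quality model that this sketch leaves out.

```python
# Nucleotide flow order: one nucleotide per flow, repeated cyclically.
FLOW_ORDER = "TACG"

def flows_to_bases(flow_values, flow_order=FLOW_ORDER):
    """Turn a flowgram into bases by rounding each signal (half up) to the
    number of incorporated nucleotides. Simplified on purpose: it only
    shows why homopolymer lengths are fragile."""
    bases = []
    for i, signal in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = max(0, int(signal + 0.5))
        bases.append(nucleotide * run_length)
    return "".join(bases)

# Hypothetical ideal flowgram for TAACCTCCCCCAAAC.
ideal = [1, 2, 2, 0, 1, 0, 5, 0, 0, 3, 1]
# Same flowgram with made-up noise; the 5-mer C flow comes in low.
noisy = [1.02, 1.97, 2.05, 0.03, 0.98, 0.06, 4.45, 0.02, 0.04, 3.1, 0.95]

print(flows_to_bases(ideal))   # TAACCTCCCCCAAAC
print(flows_to_bases(noisy))   # TAACCTCCCCAAAC (one C lost)
```

The longer the run, the smaller the relative signal difference between n and n+1 bases, so an under- or over-called run shifts bases into neighbouring alignment columns.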

If it's from a diploid organism, there could be two alleles - and one of them has a SNP.
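One way to weigh the SNP idea against random errors is a quick binomial comparison. This sketch assumes reads sample the two alleles 50/50 at a heterozygous site, uses a made-up per-base error rate of 1%, and treats every error as independent; systematic errors such as homopolymer misplacement break that independence, so a large ratio here does not rule the homopolymer explanation out.

```python
from math import comb

def binom_lik(k, n, p):
    """Probability of exactly k reads showing one allele out of n reads."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_reads, n_c = 10, 5      # 5 C and 5 A at the N position
error_rate = 0.01         # assumed per-base error rate (hypothetical)

# Heterozygous SNP: each read carries C with probability 0.5.
het = binom_lik(n_c, n_reads, 0.5)
# Homozygous A: each C read would need an independent error (simplified).
hom_a = binom_lik(n_c, n_reads, error_rate)

print(f"het: {het:.3g}  hom: {hom_a:.3g}  ratio: {het / hom_a:.3g}")
```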

If it's from a haploid organism, there could be paralogs, one of which has a single-base difference compared to the other, while the reference genome has only one copy.

I'm sure there are many other biological explanations. Since you haven't given probability scores or any other useful information, all we can do is guess.

Good luck figuring it out.
__________________
The more you know, the more you know you don't know. —Aristotle

Last edited by apfejes; 01-15-2009 at 07:50 AM. Reason: didn't realize this was posted to the 454 forum!
01-15-2009, 08:45 AM   #3
mingkunli
Member
 
Location: Germany

Join Date: Jan 2009
Posts: 41

Hi apfejes,

Thanks for your reply. This is a pilot study on assembling a genome from 454 data. We found that with the 454 software (runAssembly, runMapping) the consensus is too long to be plausible because of homopolymer effects, and the result is even worse with SeqMan. We therefore wrote our own script to do the assembly. So far we haven't integrated the quality values (quality scores, flow values), which is why we ran into the problem described above. (With the 454 software there is no N; instead, these positions get a very low quality score.)
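As a rough idea of what integrating quality values into a home-made consensus could look like, here is a sketch that lets each read vote with its Phred quality instead of a count of one. Summing Phred scores is one simple heuristic assumed for illustration, not the method the 454 software uses, and the quality values below are invented.

```python
from collections import defaultdict

def weighted_call(bases, quals):
    """Pick the base whose reads have the largest summed Phred quality;
    report 'N' when the two best totals tie (heuristic, assumed here)."""
    score = defaultdict(int)
    for base, q in zip(bases, quals):
        score[base] += q
    ranked = sorted(score.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "N"
    return ranked[0][0]

# Hypothetical qualities for the ten reads covering the N position.
bases = list("CCCCCAAAAA")
quals = [12, 15, 9, 14, 11, 32, 35, 30, 33, 31]
print(weighted_call(bases, quals))   # A
```

With equal counts, the low-quality C calls (sum 61) lose to the high-quality A calls (sum 161), so the tie no longer forces an N.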

Now I am trying to figure out how the 454 software makes use of the "flow value" and the "quality score". Could anyone point me to a reference? It doesn't seem to be described in the manuals.

I have the feeling that the "quality score" is derived from the "flow value". Is that true?
02-19-2009, 07:49 AM   #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Mingkunli,
Any results from your 454 aligner on mito data? We are also looking at mtDNA using 454.
02-19-2009, 09:15 AM   #5
hlu
Member
 
Location: Branford, Connecticut

Join Date: Jan 2009
Posts: 32

mingkunli

The quality-score algorithm used by 454 Titanium and the most recent FLX software is based on a Broad Institute paper. The 454 offline toolkit also has a script called "sffrescore" that lets you rescore the quality scores of old reads with the new Broad Institute method.

Here is the Broad Institute paper that the 454 read quality scores are based on:
http://genome.cshlp.org/content/earl...7.107.abstract
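Whatever model produces them, 454 quality values are reported on the Phred scale, Q = -10 log10(P_error). A minimal sketch of the conversion in both directions (the function names are illustrative):

```python
import math

def phred_to_error(q):
    """Phred quality -> probability that the base call is wrong."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Error probability -> Phred quality, Q = -10 * log10(p)."""
    return -10 * math.log10(p)

for q in (10, 20, 30):
    print(q, phred_to_error(q))
```

So a position the 454 software marks with a very low quality, say Q10, is expected to be wrong about one time in ten.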

Last edited by hlu; 02-19-2009 at 09:25 AM.
02-24-2009, 08:02 PM   #6
fusu
Junior Member
 
Location: China

Join Date: Feb 2009
Posts: 8

A suggestion: to judge whether the 5th 'A' is a homopolymer error or a SNP, you can PCR-amplify this fragment, clone the product into a T vector, then pick 10 clones and sequence them on an ABI 3730. I think that will give you the correct answer.
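A quick check on why around 10 clones is enough, assuming the site is heterozygous and PCR and cloning pick each allele independently with equal probability (an idealization): the chance that all clones happen to carry the same allele, hiding the SNP, is small.

```python
def prob_single_allele(n_clones, allele_freq=0.5):
    """Chance that every clone carries the same allele when each clone
    independently carries allele 1 with probability allele_freq."""
    return allele_freq ** n_clones + (1 - allele_freq) ** n_clones

print(prob_single_allele(10))   # 0.001953125
```

About 0.2% for 10 clones; PCR bias toward one allele would raise it.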
All times are GMT -8.