SEQanswers

Old 02-23-2009, 01:12 AM   #1
BAJ
Member
 
Location: Paris

Join Date: Nov 2008
Posts: 15
Comparing results from two different reference genomes

We have a Solexa experiment that seems to be contaminated with a genome other than the one we were originally aiming at. The two genomes have very different sizes (mouse vs. pombe) and, if I understand correctly, the quality scores in the fastq output depend on the size of the reference genome.
In our particular case it seems to me that reads scored against the smaller genome will always get lower scores (because the reference genome is smaller). Is there a way to account for that and make the quality scores comparable?

To clarify my confusion a bit:
I got a Gerald output file, s_1_sequence.txt, with pombe as the reference genome, that starts like this:
@PF2:1:1:1644:1100
ATGAATTTCAGCCTCTGGTCAGGCAGGGTTCCTTTT
+PF2:1:1:1644:1100
OOOOOOOOPOKOOPPOKKOOOOKOOEKKOOGGGGGA
@PF2:1:1:1702:1050
ACCAAGCGCAAATTTACGATTTAATTAGTATTTATA
+PF2:1:1:1702:1050
OPOOOOPOOOOOOOOOOOOOOOOOOOOHOOCHAHHE
@PF2:1:1:1532:1901
TTCAAATATTCCTGATCCAATGACAAGTTGAACCGT


And I get another file with the mouse genome as reference:
@PF2:1:1:1644:1100
ATGAATTTCAGCCTCTGGTCAGGCAGGGTTCCTTTT
+PF2:1:1:1644:1100
VVVVVVVVVVOVVVVVOOVVVVMVVCMNVVQQRRRE
@PF2:1:1:1702:1050
ACCAAGCGCAAATTTACGATTTAATTAGTATTTATA
+PF2:1:1:1702:1050
VVVVVVVVVVVVVVVVVVVVVVVVVVVIVVGRCRRP
@PF2:1:1:1532:1901
TTCAAATATTCCTGATCCAATGACAAGTTGAACCGT


Clearly the quality scores are different.
This makes me believe that not only peak information but also alignment information is used. Peak information is used because the quality scores vary within the same sequence. The reference genome is also used in calculating the quality scores, because the exact same clusters get different scores with different reference genomes.
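To put numbers on the difference, the two quality strings for the first read can be decoded and compared base by base. The sketch below is only an illustration: it assumes the characters use the ASCII offset of 64 that early Solexa/Illumina pipelines used, and the function names are made up for this example.

```python
# Quality strings for read PF2:1:1:1644:1100, copied from the two files above.
POMBE_QUAL = "OOOOOOOOPOKOOPPOKKOOOOKOOEKKOOGGGGGA"
MOUSE_QUAL = "VVVVVVVVVVOVVVVVOOVVVVMVVCMNVVQQRRRE"

# Assumption: early Solexa/Illumina pipelines encode scores as ASCII code - 64.
OFFSET = 64


def decode(qual, offset=OFFSET):
    """Turn a quality string into a list of integer scores."""
    return [ord(ch) - offset for ch in qual]


def per_base_difference(qual_a, qual_b, offset=OFFSET):
    """Return score(qual_b) - score(qual_a) for every position of one read."""
    if len(qual_a) != len(qual_b):
        raise ValueError("quality strings must have the same length")
    return [b - a for a, b in zip(decode(qual_a, offset), decode(qual_b, offset))]


diffs = per_base_difference(POMBE_QUAL, MOUSE_QUAL)
print("pombe:", decode(POMBE_QUAL))
print("mouse:", decode(MOUSE_QUAL))
print("mouse - pombe:", diffs)
```

Under that offset the first base decodes to 15 in the pombe file ('O') and 22 in the mouse file ('V'), so the same base call from the same cluster differs by 7 depending on the reference.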
Now the question: how do I make the values comparable across different reference genomes? I want to identify sequences that align better to one reference genome than to the other, in order to get some understanding of the possible contamination.
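One way to approach this without relying on the quality scores at all is to align every read to both genomes and compare the alignments directly. The following is a minimal sketch under that assumption, not something the pipeline provides: `assign_reads` is a hypothetical helper, its inputs are assumed to be per-read mismatch counts extracted from whatever aligner output is available, and the counts in the example are invented for illustration.

```python
def assign_reads(pombe_mismatches, mouse_mismatches):
    """Label each read by the reference it matches with fewer mismatches.

    Both arguments map read id -> mismatch count; a read that did not
    align to a reference is simply absent from that mapping.
    """
    labels = {}
    for read_id in set(pombe_mismatches) | set(mouse_mismatches):
        p = pombe_mismatches.get(read_id)
        m = mouse_mismatches.get(read_id)
        if p is None and m is None:
            labels[read_id] = "unaligned"
        elif m is None or (p is not None and p < m):
            labels[read_id] = "pombe"
        elif p is None or m < p:
            labels[read_id] = "mouse"
        else:
            labels[read_id] = "ambiguous"  # equally good on both references
    return labels


# Invented mismatch counts, for illustration only.
example_pombe = {"PF2:1:1:1644:1100": 0, "PF2:1:1:1702:1050": 2}
example_mouse = {"PF2:1:1:1702:1050": 0, "PF2:1:1:1532:1901": 1}

example_labels = assign_reads(example_pombe, example_mouse)
for read_id in sorted(example_labels):
    print(read_id, example_labels[read_id])
```

Counting the reads in each label then gives a rough estimate of how much of the lane comes from each genome, independent of how the quality scores were calibrated.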


Any comment is appreciated.

Thanks,

Bernd

Last edited by BAJ; 02-23-2009 at 08:52 AM.
Old 02-24-2009, 12:25 AM   #2
BAJ
Member
 
Location: Paris

Join Date: Nov 2008
Posts: 15

I just heard back from tech support at Illumina: this is a property of pipeline 1.0, whereas in pipeline 1.3.2 the quality scores are independent of the alignment.
Old 02-24-2009, 08:38 AM   #3
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Oh, so that's changed in 1.3.
Yes, this was the case, because Illumina recalibrated its quality values based on the Gerald alignments. If you check the quality values in the Bustard folder, they should match irrespective of the reference.