comparing results from two different reference genomes

BAJ

Member

Join Date: Nov 2008

Posts: 15
#1

02-23-2009, 01:12 AM

We have a Solexa experiment that seems to be contaminated with a different genome from the one we were originally aiming at. The two genomes have very different sizes (mouse vs. pombe), and if I understand the quality scores in the fastq output correctly, they depend on the size of the reference genome.
In our particular case, it seems to me that the smaller genome will always get lower scores (because of the smaller reference genome). Is there a way to account for that and make the quality scores comparable?

To clarify my confusion a bit:
I got a Gerald output in s_1_sequence.txt with pombe as the reference genome that starts like this:
@PF2:1:1:1644:1100
ATGAATTTCAGCCTCTGGTCAGGCAGGGTTCCTTTT
+PF2:1:1:1644:1100
OOOOOOOOPOKOOPPOKKOOOOKOOEKKOOGGGGGA
@PF2:1:1:1702:1050
ACCAAGCGCAAATTTACGATTTAATTAGTATTTATA
+PF2:1:1:1702:1050
OPOOOOPOOOOOOOOOOOOOOOOOOOOHOOCHAHHE
@PF2:1:1:1532:1901
TTCAAATATTCCTGATCCAATGACAAGTTGAACCGT

And I get another file with mouse as the reference genome:
@PF2:1:1:1644:1100
ATGAATTTCAGCCTCTGGTCAGGCAGGGTTCCTTTT
+PF2:1:1:1644:1100
VVVVVVVVVVOVVVVVOOVVVVMVVCMNVVQQRRRE
@PF2:1:1:1702:1050
ACCAAGCGCAAATTTACGATTTAATTAGTATTTATA
+PF2:1:1:1702:1050
VVVVVVVVVVVVVVVVVVVVVVVVVVVIVVGRCRRP
@PF2:1:1:1532:1901
TTCAAATATTCCTGATCCAATGACAAGTTGAACCGT

Clearly, the quality scores are different.
This makes me believe that not only peak information but also alignment information is used. Peak information is used because I see different quality scores within the same sequence. The reference genome is used in the calculation of the quality score because the exact same clusters get different scores with different reference genomes.
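The quality strings above can be decoded into numbers to see the difference position by position. This is a minimal sketch, assuming the ASCII offset of 64 used by Illumina pipelines of this era (pipeline 1.0 wrote Solexa-scaled scores, so the decoded numbers are not strictly Phred values); `decode_quals` and `mean_quality` are hypothetical helper names.

```python
def decode_quals(qual: str, offset: int = 64) -> list[int]:
    """Convert an ASCII quality string into integer scores (offset 64 assumed)."""
    return [ord(ch) - offset for ch in qual]


def mean_quality(qual: str, offset: int = 64) -> float:
    """Average decoded quality over all positions of one read."""
    scores = decode_quals(qual, offset)
    return sum(scores) / len(scores)


# Quality strings of read PF2:1:1:1644:1100, copied from the two Gerald outputs above.
POMBE_QUAL = "OOOOOOOOPOKOOPPOKKOOOOKOOEKKOOGGGGGA"
MOUSE_QUAL = "VVVVVVVVVVOVVVVVOOVVVVMVVCMNVVQQRRRE"

if __name__ == "__main__":
    pombe = decode_quals(POMBE_QUAL)
    mouse = decode_quals(MOUSE_QUAL)
    # Print both decoded scores and their difference for every base position.
    for pos, (p, m) in enumerate(zip(pombe, mouse), start=1):
        print(f"pos {pos:2d}: pombe={p:3d} mouse={m:3d} diff={m - p:+d}")
    print(f"mean pombe={mean_quality(POMBE_QUAL):.1f} "
          f"mean mouse={mean_quality(MOUSE_QUAL):.1f}")
```

For this read, the values from the mouse run are higher at nearly every position, which is the reference dependence described above.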
Now the question: how do I make the values comparable across different reference genomes? I want to identify sequences that align better to one reference genome than to the other, in order to get some understanding of the possible contamination.
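Because the quality values here shift with the reference, one alternative is to compare the alignments themselves, for example the number of mismatches each read has against each genome. The sketch below assumes those mismatch counts have already been extracted from the two alignment runs and keyed by read ID (how to extract them is not shown); `assign_origin`, `tally`, and the `margin` parameter are hypothetical names.

```python
from typing import Optional


def assign_origin(mm_pombe: Optional[int], mm_mouse: Optional[int],
                  margin: int = 1) -> str:
    """Label a read by the reference it matches with fewer mismatches.

    None means the read did not align to that reference. A read is only
    assigned when one reference beats the other by at least `margin`
    mismatches; otherwise it is reported as ambiguous.
    """
    if mm_pombe is None and mm_mouse is None:
        return "unaligned"
    if mm_mouse is None:
        return "pombe"
    if mm_pombe is None:
        return "mouse"
    if mm_mouse + margin <= mm_pombe:
        return "mouse"
    if mm_pombe + margin <= mm_mouse:
        return "pombe"
    return "ambiguous"


def tally(reads: dict[str, tuple[Optional[int], Optional[int]]],
          margin: int = 1) -> dict[str, int]:
    """Count reads per label; values are (pombe mismatches, mouse mismatches)."""
    counts = {"pombe": 0, "mouse": 0, "ambiguous": 0, "unaligned": 0}
    for mm_pombe, mm_mouse in reads.values():
        counts[assign_origin(mm_pombe, mm_mouse, margin)] += 1
    return counts
```

Keying the input by read ID (e.g. `PF2:1:1:1644:1100`) lets the two runs be joined on the same cluster; raising `margin` trades fewer assignments for more confident ones.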

Any comment is appreciated.

Thanks,

Bernd

Last edited by BAJ; 02-23-2009, 08:52 AM.
BAJ

Member

Join Date: Nov 2008

Posts: 15
#2

02-24-2009, 12:25 AM

I just heard back from tech support at Illumina: this is a property of pipeline 1.0, whereas in pipeline 1.3.2 the quality scores are independent of the alignment.
bioinfosm

Senior Member

Join Date: Jan 2008

Posts: 482
#3

02-24-2009, 08:38 AM

Oh, so that's changed in 1.3.
Yes, this was the case, as Illumina updated its quality values based on the Gerald alignments. If you check the quality values in the Bustard folder, they should match irrespective of the reference.

--
bioinfosm
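One way to follow the suggestion above and confirm that two runs agree is to join their records on read ID and compare sequence and quality. This is a minimal sketch, assuming both runs have been exported to the four-line fastq layout shown in the first post (Bustard's own output files may be laid out differently); `parse_fastq` and `differing_qualities` are hypothetical names. Applied to the two complete Gerald records in the first post, both reads would be reported; for output from before the alignment step, the list should come back empty if the qualities really are reference-independent.

```python
def parse_fastq(text: str) -> dict[str, tuple[str, str]]:
    """Map read ID -> (sequence, quality) for complete 4-line fastq records.

    Blank lines are skipped and a truncated record at the end is ignored.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records: dict[str, tuple[str, str]] = {}
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed fastq record starting at {header!r}")
        records[header[1:]] = (seq, qual)
    return records


def differing_qualities(run_a: str, run_b: str) -> list[str]:
    """Return sorted IDs of reads present in both runs whose sequence or quality differ."""
    a, b = parse_fastq(run_a), parse_fastq(run_b)
    return sorted(rid for rid in a.keys() & b.keys() if a[rid] != b[rid])
```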