
**adamdeluca** · 12-07-2010, 06:04 AM

A stratum is a set of alignments with the same score, e.g. all perfect matches.

**mgogol** · 12-07-2010, 07:08 AM

In the -n alignment mode, an alignment's "stratum" is defined as the number of mismatches in the "seed" region, i.e. the leftmost L bases, where L is set with the -l option. In the -v alignment mode, an alignment's stratum is defined as the total number of mismatches in the entire alignment.
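A minimal Python sketch of the two definitions quoted above, assuming an ungapped alignment described only by the 0-based offsets of its mismatched bases, counted from the left end of the read. The function name `stratum` and the illustrative `seed_len=28` default are assumptions for this sketch, not Bowtie internals.

```python
def stratum(mismatch_offsets, mode, seed_len=28):
    """Return the stratum of a single alignment (illustrative sketch).

    mismatch_offsets: 0-based offsets of mismatched bases, from the left end of the read.
    mode: "n" counts only mismatches inside the seed (the leftmost seed_len bases, as set by -l);
          "v" counts every mismatch in the alignment.
    """
    if mode == "n":
        return sum(1 for off in mismatch_offsets if off < seed_len)
    if mode == "v":
        return len(mismatch_offsets)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical 36-bp alignment with mismatches at offsets 5 and 30.
mm = [5, 30]
print(stratum(mm, "n", seed_len=28))  # 1: only offset 5 lies inside the seed
print(stratum(mm, "v"))               # 2: both mismatches count
```

The same alignment can therefore land in different strata depending on which mode is active.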

--strata option

If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment "stratum", report only those alignments that fall into the best stratum.
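The filtering step that `--strata` describes can be sketched as "keep only the alignments whose stratum equals the minimum". This is a simplified illustration under assumed names (`Alignment`, `best_stratum_only`); it ignores how Bowtie itself interacts with `-k`, `--best`, and its search order.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    ref: str      # reference sequence name
    offset: int   # 0-based position on the reference
    stratum: int  # mismatch count under the active mode (-n seed or -v whole read)


def best_stratum_only(alignments):
    """Keep only the alignments in the lowest (best) stratum, mimicking --strata."""
    if not alignments:
        return []
    best = min(a.stratum for a in alignments)
    return [a for a in alignments if a.stratum == best]


hits = [Alignment("chr1", 100, 1), Alignment("chr2", 500, 0), Alignment("chr3", 42, 0)]
print([a.ref for a in best_stratum_only(hits)])  # ['chr2', 'chr3']
```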

**lh3** · 12-07-2010, 08:20 AM

As I understand it, a stratum is a set of alignments where mismatches occur at identical positions. For example, suppose there are two 1-mismatch hits, one with the mismatch at the 7th position and the other at the 9th position. The two hits are in two different strata.

**mgogol** · 12-07-2010, 08:46 AM

Originally posted by lh3:

As I understand it, a stratum is a set of alignments where mismatches occur at identical positions. For example, suppose there are two 1-mismatch hits, one with the mismatch at the 7th position and the other at the 9th position. The two hits are in two different strata.

See, I would interpret those as being in the same stratum, because they have the same number of mismatches.

**lh3** · 12-07-2010, 09:07 AM

Yes, you are right. Thanks for correcting me.

**mgogol** · 12-07-2010, 09:26 AM

Originally posted by lh3:

Yes, you are right. Thanks for correcting me.

It's an honor.

**xinwu** · 12-07-2010, 05:37 PM

Thanks to all of you. Does Bowtie output an alignment score anywhere? If Bowtie has an alignment score for each alignment, I would like to see it rather than use the --best --strata combination. In other words, how can I get all valid alignments (according to my parameters) together with their scores, so that I can do some comparisons?
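One way to approximate a per-alignment score is to count mismatches yourself. The sketch below assumes Bowtie 1's default (non-SAM) tab-separated output, whose 8th column is a comma-separated list of mismatch descriptors such as `7:A>G` and is empty for a perfect match; for example, output produced with `-a`, which asks Bowtie 1 to report all valid alignments. The function names are hypothetical, and the example line is made up.

```python
from collections import defaultdict


def count_mismatches(line):
    """Count mismatches in one line of Bowtie 1 default output (assumed 8-column layout)."""
    fields = line.rstrip("\n").split("\t")
    descriptors = fields[7] if len(fields) > 7 else ""  # empty column means a perfect match
    return len(descriptors.split(",")) if descriptors else 0


def mismatches_per_read(lines):
    """Group mismatch counts by read name (column 1), one entry per reported alignment."""
    per_read = defaultdict(list)
    for line in lines:
        if line.strip():
            per_read[line.split("\t", 1)[0]].append(count_mismatches(line))
    return dict(per_read)


# Made-up example: a 10-bp read with mismatches at read offsets 3 and 7.
example = "read1\t+\tchr1\t1000\tACGTACGTAC\tIIIIIIIIII\t0\t3:A>G,7:C>T\n"
print(count_mismatches(example))  # 2
```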

**hrbigelow** · 03-19-2011, 09:25 AM

AS:i tag for bowtie alignments

Hi,

I noticed that bowtie does not report the raw alignment score, or number of mismatching bases, for alignments. Would it be easy to include it in a future version?

I would actually prefer this over mapq, since it depends only on the characteristics of the alignment itself, not on the collection of other alignments for that read. From raw alignment score I would be able to derive a mapping quality calibrated for a particular genome, read length, and sequencing error characteristic.

Thanks,

Henry

Henry Bigelow
Computational Biologist
Amgen, Inc.

**lh3** · 03-19-2011, 09:52 AM

Bowtie gives the number of mismatches, like most other mappers.

With one alignment, you are probably computing a BLAST-like E-value rather than mapQ. The E-value measures whether the reported alignment is a random hit, whereas mapQ measures whether the reported position is correct. To compute mapQ, you have to know the alternative hits. For most NGS applications, mapQ is more useful than the E-value.
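To illustrate why mapQ needs the alternative hits, here is a minimal sketch of a posterior-based mapQ in the spirit of MAQ. It is not MAQ's or Bowtie's actual formula; it assumes a uniform per-base error rate `err`, a uniform prior over candidate placements, and a cap of 60. The name `mapq_from_hits` is hypothetical. With a single hit there is nothing to compare against, so the sketch can only return the cap.

```python
import math


def mapq_from_hits(mismatch_counts, read_len, err=0.01, cap=60):
    """Phred-scaled probability that the best hit is the wrong placement (sketch).

    mismatch_counts: mismatches for every alignment found for one read
                     (best hit plus alternatives).
    err: assumed uniform per-base error rate (illustrative, not a Bowtie value).
    cap: assumed ceiling returned when no competing hit adds any doubt.
    """
    if not mismatch_counts:
        raise ValueError("read has no alignments")
    # Log-likelihood of the read under each candidate placement.
    logl = [k * math.log(err) + (read_len - k) * math.log(1 - err) for k in mismatch_counts]
    best = max(logl)
    # Posterior of the best placement = 1 / sum of likelihood ratios against it.
    total = sum(math.exp(x - best) for x in logl)
    p_wrong = 1.0 - 1.0 / total
    if p_wrong <= 0.0:
        return cap
    return min(cap, int(round(-10 * math.log10(p_wrong))))


print(mapq_from_hits([0], 36))     # 60: unique hit, only the cap is possible
print(mapq_from_hits([0, 0], 36))  # 3: two equally good placements
print(mapq_from_hits([0, 1], 36))  # 20: runner-up has one extra mismatch
```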

**hrbigelow** · 03-19-2011, 10:19 AM

AS:i tag for bowtie alignments

Hi Heng,

Thanks for the ultra-fast reply. I should have read the manual more thoroughly!

Yes, that's true about BLAST E-values being a probability of random alignment. I am indeed interested in computing the probability of *correct* alignment. Just to give some context here: I'm interested in improving exon and isoform quantification from RNA-Seq data in the face of homology mismapping. I'm hoping it can be improved by better estimates of mapq that take into account the set of raw alignment scores for a given fragment.

Also, I don't quite understand what you meant by 'with one alignment, you are computing E-value instead of mapq'. Since I'm using a simulated data set, even if bowtie produces just one alignment for a given read, I can compute the probability of correct alignment given the alignment score (just by counting the number of correct and incorrect alignments produced for that score). It is merely the frequentist approach to this problem, nothing more elaborate. I'm not sure what sort of trouble I'll run into, but I'd be interested in your thoughts.
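The counting approach described above can be sketched as follows, assuming each simulated read's reported alignment has already been labeled correct or incorrect. Here the "score" could be the mismatch count; `correctness_by_score` is a hypothetical name.

```python
from collections import defaultdict


def correctness_by_score(records):
    """Empirical P(correct | score) from simulated reads.

    records: iterable of (score, is_correct) pairs, one per reported alignment;
             is_correct is known because the reads were simulated.
    Returns {score: (n_correct, n_total, fraction_correct)}, sorted by score.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [n_correct, n_total]
    for score, is_correct in records:
        counts[score][1] += 1
        if is_correct:
            counts[score][0] += 1
    return {s: (c, n, c / n) for s, (c, n) in sorted(counts.items())}


records = [(0, True), (0, True), (0, False), (2, True), (2, False)]
print(correctness_by_score(records)[2])  # (1, 2, 0.5)
```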

Henry

**lh3** · 03-19-2011, 10:25 AM

What if there are no mismatches at all? A big fraction of reads fall into this category.

**hrbigelow** · 03-19-2011, 10:55 AM

calibrating alignment accuracy from raw score distribution

For what I'm thinking, the case of zero mismatches isn't treated any differently than 1, 2, 3 or more mismatches.

What I'm thinking is to characterize the relationship between raw alignment score and the probability that an alignment is correct, by tallying a histogram of counts keyed by the tuple:

(top_score, 2nd_highest_score, given_score, is_correct)

top_score: the top alignment score achieved for a given read
2nd_highest_score: the second-highest alignment score achieved for the same read
given_score: the score achieved by a given *alignment*, which is associated with the top_score and 2nd_highest_score computed over all alignments for that read
is_correct: a boolean value telling whether the alignment is deemed 'correct' by some criterion

The tallying procedure would be as follows:

1. simulate a set of reads from the genome (and perhaps even some more reads that are 'contaminating' reads from some other genome)

2. align reads to genome with bowtie (or other aligner), producing, for each read, a set of 0, 1, or more alignments, with accompanying alignment scores (or mismatch scores)

3. for each simulated read, process the group of alignments as follows:
a. identify the top alignment score (may or may not be unique) and the second-highest alignment score (may not exist; if it doesn't, assign a unique default value)
b. for each alignment among the group, determine if it is a 'correct' alignment (say, if it has > 95% correctly placed bases, for example).
c. from this information, tally the appropriate tuple (top_score, 2nd_highest_score, given_score, is_correct).
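Steps 3a to 3c can be sketched in Python as below. Several choices here are assumptions, not part of the proposal: scores are higher-is-better (e.g. negated mismatch counts), `None` plays the role of the "unique default value" when no second score exists, a tie at the top makes the second-highest score equal the top, and `is_correct` handles ungapped alignments only and ignores strand. Both function names are hypothetical.

```python
from collections import Counter

NO_SECOND = None  # stand-in for "second-highest score does not exist" (step 3a)


def is_correct(aln_ref, aln_pos, true_ref, true_pos, read_len, min_frac=0.95):
    """Step 3b for ungapped alignments: more than min_frac of bases at their true position.
    Strand is ignored for brevity."""
    if aln_ref != true_ref:
        return False
    overlap = max(0, read_len - abs(aln_pos - true_pos))
    return overlap / read_len > min_frac


def tally(reads):
    """reads: iterable of per-read lists of (score, correct) pairs (higher score = better).
    Returns a Counter over (top_score, 2nd_highest_score, given_score, is_correct)."""
    hist = Counter()
    for alignments in reads:
        if not alignments:
            continue  # unaligned read: nothing to tally
        ranked = sorted((score for score, _ in alignments), reverse=True)
        top = ranked[0]
        second = ranked[1] if len(ranked) > 1 else NO_SECOND
        for score, correct in alignments:
            hist[(top, second, score, correct)] += 1
    return hist


reads = [
    [(0, True)],                # unique perfect hit
    [(0, True), (0, False)],    # two equally good hits (a repeat)
    [(-1, False), (-2, True)],  # true locus is the runner-up
]
print(tally(reads)[(0, None, 0, True)])  # 1
```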

As you point out, there are a large number of alignments with zero mismatches, so this histogram as I compute it (which I haven't yet) will be very skewed.

I am basically following the idea in maq here: the key quantity in judging whether an alignment is correct is how much better its raw score is than the runner-up's.
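Following that idea, a histogram of (top_score, 2nd_highest_score, given_score, is_correct) counts can be collapsed into an empirical, phred-scaled mapQ per score gap. This is a sketch under assumptions: only top-scoring alignments are scored, the gap is `top - second` (or `None` for reads with a single alignment), and a cap of 60 is used when no errors were observed. `mapq_by_gap` is a hypothetical name, and small counts will give noisy estimates.

```python
import math
from collections import Counter, defaultdict


def mapq_by_gap(hist, cap=60):
    """Empirical mapQ per score gap from (top, second, given, correct) tallies."""
    totals = defaultdict(lambda: [0, 0])  # gap -> [n_wrong, n_total]
    for (top, second, given, correct), n in hist.items():
        if given != top:
            continue  # only judge the alignments an aligner would report as best
        gap = None if second is None else top - second
        totals[gap][1] += n
        if not correct:
            totals[gap][0] += n
    result = {}
    for gap, (wrong, total) in totals.items():
        p_wrong = wrong / total
        result[gap] = cap if p_wrong == 0 else min(cap, round(-10 * math.log10(p_wrong)))
    return result


hist = Counter({
    (0, 0, 0, True): 1, (0, 0, 0, False): 1,   # tie at the top: gap 0
    (0, None, 0, True): 5,                     # single alignment per read
    (0, -2, 0, True): 9, (0, -2, 0, False): 1, # runner-up 2 points worse
})
print(mapq_by_gap(hist)[2])  # 10: 1 wrong out of 10
```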

But I'm also curious to see if there are any surprises in the distribution.

Henry

**lh3** · 03-19-2011, 07:51 PM

Thanks for the explanation. So you prefer to see an alternate alignment to get the 2nd highest score. Then your strategy is similar to mapQ computed by other programs and should work. Probably I misunderstood your original proposal. Sorry.

**carmeyeii** · 10-14-2012, 01:15 PM

Hi all,

So presumably Bowtie 1 DOES KNOW the alignment quality of a read, as it uses this information to report the top-scoring alignments for a read, but it just doesn't report that value. Am I right?

Thanks!


What does strata mean in Bowtie?
