  • What does strata mean in Bowtie?

    Hi All,
    Although I've read the Bowtie manual, I can't quite understand what the 'strata' reporting option means. I know -a will report all alignments and --best will report alignments in best-to-worst order.
    Could somebody explain strata in detail?
    Thanks.

  • #2
    A stratum is a set of alignments with the same score, e.g. all perfect matches.



    • #3
      In the -n alignment mode, an alignment's "stratum" is defined as the number of mismatches in the "seed" region, i.e. the leftmost L bases, where L is set with the -l option. In the -v alignment mode, an alignment's stratum is defined as the total number of mismatches in the entire alignment.
      --strata option
      If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment "stratum", report only those alignments that fall into the best stratum.
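      To make the two definitions concrete, here is a minimal Python sketch of how a stratum could be computed in each mode and how the --strata rule then keeps only the best one. It is not Bowtie's implementation: the Alignment record, the function names, and the use of 0-based mismatch positions counted from the 5' (seed) end are all hypothetical, and seed_len=28 is assumed to mirror Bowtie 1's default for -l.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    # Hypothetical record for illustration; Bowtie does not expose this structure.
    ref_offset: int
    mismatch_positions: List[int]  # 0-based positions in the read, from the 5' end


def stratum(aln: Alignment, mode: str, seed_len: int = 28) -> int:
    """Stratum as described for the -n and -v alignment modes."""
    if mode == "n":
        # -n mode: count only mismatches inside the seed (leftmost seed_len bases)
        return sum(1 for p in aln.mismatch_positions if p < seed_len)
    if mode == "v":
        # -v mode: count mismatches across the entire alignment
        return len(aln.mismatch_positions)
    raise ValueError(f"unknown mode: {mode}")


def best_stratum_only(alns: List[Alignment], mode: str,
                      seed_len: int = 28) -> List[Alignment]:
    """Keep only the alignments in the best (lowest) stratum."""
    if not alns:
        return []
    strata = [stratum(a, mode, seed_len) for a in alns]
    best = min(strata)
    return [a for a, s in zip(alns, strata) if s == best]
```

      For example, with hits carrying mismatches at [40], [3] and [], the -n strata (seed length 28) are 0, 1, 0, so the first and third hits are kept; in -v mode the strata are 1, 1, 0 and only the third survives.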



      • #4
        As I understand, a strata is a set of alignments where mismatches occur at identical positions. For example, suppose there are two 1-mismatch hits. One with the mismatch at 7th position and the other at the 9th position. The two hits are in two different strata.



        • #5
          Originally posted by lh3
          As I understand, a strata is a set of alignments where mismatches occur at identical positions. For example, suppose there are two 1-mismatch hits. One with the mismatch at 7th position and the other at the 9th position. The two hits are in two different strata.
          See, I would interpret those as being in the same stratum, because they have the same number of mismatches.



          • #6
            Yes, you are right. Thanks for correcting me.



            • #7
              Originally posted by lh3
              Yes, you are right. Thanks for correcting me.
              It's an honor.



              • #8
                Thanks to all of you. Does Bowtie output an alignment score anywhere? If Bowtie has an alignment score for each alignment, I would like to see it rather than use the --best --strata combination. In other words, how can I get all valid alignments (according to my parameters) with their scores, so that I can do some comparisons?



                • #9
                  AS:i tag for bowtie alignments

                  Hi,

                  I noticed that bowtie does not report the raw alignment score, or number of mismatching bases, for alignments. Would it be easy to include it in a future version?

                  I would actually prefer this over mapq, since it depends only on the characteristics of the alignment itself, not on the collection of other alignments for that read. From raw alignment score I would be able to derive a mapping quality calibrated for a particular genome, read length, and sequencing error characteristic.

                  Thanks,

                  Henry

                  Henry Bigelow
                  Computational Biologist
                  Amgen, Inc.



                  • #10
                    Bowtie gives the number of mismatches, like most other mappers.
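                    As a sketch of where those mismatches show up, the Python function below counts them in one line of Bowtie 1's default (non-SAM) output. It assumes the tab-separated default format in which the 8th column is a comma-separated list of mismatch descriptors such as 10:A>G and is empty when there are no mismatches; count_mismatches is a hypothetical helper, so check the column layout against the manual for your Bowtie version.

```python
def count_mismatches(line: str) -> int:
    """Count mismatches in one line of Bowtie 1 default output (assumed layout)."""
    fields = line.rstrip("\n").split("\t")
    # 8th column (index 7): comma-separated descriptors like '10:A>G';
    # empty, or missing entirely if a trailing tab was stripped, when there are none
    descriptors = fields[7] if len(fields) > 7 else ""
    return len([d for d in descriptors.split(",") if d])
```

                    An empty 8th column therefore counts as a perfect match (zero mismatches).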

                    With only one alignment, you are probably computing a BLAST-like E-value rather than mapQ. The E-value measures whether the reported alignment is a random hit, whereas mapQ measures whether the reported position is correct. To compute mapQ, you have to know the alternative hits. For most NGS applications, mapQ is more useful than the E-value.
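                    The point about alternative hits can be illustrated with a toy model. The sketch below is not the mapQ formula of Bowtie, maq, or any other mapper; it assumes each mismatch is a sequencing error with probability error_rate/3, a uniform prior over the reported hits, and that the true locus is among them. toy_mapq is a hypothetical name.

```python
import math
from typing import List


def toy_mapq(mismatch_counts: List[int], read_len: int,
             error_rate: float = 0.01, cap: int = 60) -> int:
    """Phred-scaled probability that the best hit is wrong, under a toy model."""
    if not mismatch_counts:
        raise ValueError("need at least one hit")

    def loglik(m: int) -> float:
        # each mismatch is an error (prob e/3); each match is error-free (prob 1 - e)
        return m * math.log(error_rate / 3) + (read_len - m) * math.log(1 - error_rate)

    logs = [loglik(m) for m in mismatch_counts]
    best = max(logs)
    # normalise relative to the best hit to avoid underflow; best hit has weight 1
    weights = [math.exp(x - best) for x in logs]
    p_wrong = 1.0 - 1.0 / sum(weights)
    if p_wrong <= 0:
        return cap  # no alternatives: nothing to compare against
    return min(cap, int(round(-10 * math.log10(p_wrong))))
```

                    With a single hit the toy model has nothing to compare against and returns the cap, which is why mapQ needs the alternative hits: toy_mapq([0], 36) gives 60, while two equally good hits, toy_mapq([1, 1], 36), give 3.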



                    • #11
                      AS:i tag for bowtie alignments

                      Hi Heng,

                      Thanks for the ultra-fast reply. I should have read the manual more thoroughly!

                      Yes, that's true about BLAST E-values being a probability of random alignment. I am indeed interested in computing the probability of *correct* alignment. Just to give some context: I'm interested in improving exon and isoform quantification from RNA-Seq data in the face of homology mismapping. I'm hoping it can be improved by better estimates of mapQ that take into account the set of raw alignment scores for a given fragment.

                      Also, I don't quite understand what you meant by 'with one alignment, you are computing an E-value instead of mapQ'. Since I'm using a simulated data set, even if Bowtie produces just one alignment for a given read, I can compute the probability of correct alignment given the alignment score (just by counting the number of correct and incorrect alignments produced for that score). It is merely the frequentist approach to this problem, nothing more elaborate. I'm not sure what sort of trouble I'll run into, but I'd be interested in your thoughts.
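                      The counting approach described here could look like this minimal Python sketch, assuming a simulated data set in which each alignment has already been labelled correct or incorrect; empirical_p_correct and the (score, is_correct) input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def empirical_p_correct(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Estimate P(correct | score) by counting (score, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for score, is_correct in records:
        totals[score] += 1
        if is_correct:
            correct[score] += 1
    # fraction of correct alignments observed at each score
    return {s: correct[s] / totals[s] for s in sorted(totals)}
```

                      Note that this uses only each alignment's own score, which is the limitation raised in the reply: it says nothing about how close the read's alternative hits were.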

                      Henry



                      • #12
                        What if there are no mismatches at all? A big fraction of reads fall into this category.



                        • #13
                          calibrating alignment accuracy from raw score distribution

                          In what I have in mind, the case of zero mismatches is treated no differently from 1, 2, 3 or more mismatches.

                          What I have in mind is to characterize the relationship between raw alignment score and the probability that an alignment is correct, by tallying a histogram of counts keyed by the tuple:

                          (top_score, 2nd_highest_score, given_score, is_correct)

                          top_score is the top alignment score achieved for a given read
                          2nd_highest_score: second-highest alignment score achieved for a given read
                          given_score: score achieved for given *alignment*, which would be associated with the above top_score and 2nd_highest_score for the other alignments for that read.
                          is_correct: boolean value telling whether the alignment is deemed 'correct' by some criteria.


                          The tallying procedure would be as follows:

                          1. simulate a set of reads from the genome (and perhaps even some more reads that are 'contaminating' reads from some other genome)

                          2. align reads to genome with bowtie (or other aligner), producing, for each read, a set of 0, 1, or more alignments, with accompanying alignment scores (or mismatch scores)

                          3. for each simulated read, process the group of alignments as follows:
                          a. identify the top alignment score (may or may not be unique) and the second-highest alignment score (may not exist; if it doesn't, assign a unique default value)
                          b. for each alignment among the group, determine if it is a 'correct' alignment (say, if it has > 95% correctly placed bases, for example).
                          c. from this information, tally the appropriate tuple (top_score, 2nd_highest_score, given_score, is_correct).
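                          Steps 3a to 3c could be sketched in Python as follows. Several details are assumptions rather than part of the proposal: scores are "higher is better" (for mismatch counts, negate them), the second-highest score is taken over alignments rather than distinct scores (so two tied top hits give second == top), None plays the role of the unique default value, and tally and deemed_correct are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

NO_SECOND = None  # stand-in for the "unique default value" when there is no runner-up


def deemed_correct(n_correct_bases: int, read_len: int,
                   threshold: float = 0.95) -> bool:
    """Example criterion from step 3b: more than 95% of bases correctly placed."""
    return n_correct_bases / read_len > threshold


def tally(reads: Dict[str, List[Tuple[int, bool]]]) -> Counter:
    """Count (top_score, 2nd_highest_score, given_score, is_correct) tuples.

    reads maps a read name to its alignments, each given as (score, is_correct).
    """
    hist = Counter()
    for alns in reads.values():
        if not alns:
            continue  # unaligned read: nothing to tally
        scores = sorted((s for s, _ in alns), reverse=True)
        top = scores[0]
        second = scores[1] if len(scores) > 1 else NO_SECOND
        for score, is_correct in alns:
            hist[(top, second, score, is_correct)] += 1
    return hist
```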

                          As you point out, there are a large number of alignments with zero mismatches, so this histogram, once I compute it (I haven't yet), will be very skewed.

                          I am basically following the idea in maq here: the key quantity in judging whether an alignment is correct is how much better its raw score is than the runner-up's.
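                          Under the same assumptions, the maq-style quantity (how far the top score is ahead of the runner-up) could be read off such a histogram. This Python sketch restricts itself to alignments that achieved the top score and groups them by the gap; it illustrates the idea described above, not maq's actual computation, and p_correct_by_gap is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def p_correct_by_gap(hist: Counter) -> Dict[Optional[int], float]:
    """P(correct | top_score - 2nd_highest_score) for top-scoring alignments.

    hist maps (top_score, second_score, given_score, is_correct) -> count,
    with second_score None when a read has only one alignment.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for (top, second, given, ok), n in hist.items():
        if given != top:
            continue  # only consider the alignment(s) that would be reported
        gap = None if second is None else top - second
        totals[gap] += n
        if ok:
            correct[gap] += n
    return {g: correct[g] / totals[g] for g in totals}
```

                          A gap of 0 (tied top hits) should come out near a coin flip for a two-way repeat, while large gaps should approach 1 if the idea holds.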

                          But I'm also curious to see if there are any surprises in the distribution.

                          Henry



                          • #14
                            Thanks for the explanation. So you prefer to see an alternate alignment to get the 2nd highest score. Then your strategy is similar to mapQ computed by other programs and should work. Probably I misunderstood your original proposal. Sorry.
                            Last edited by lh3; 03-20-2011, 07:16 AM.



                            • #15
                              Hi all,

                              So presumably Bowtie 1 DOES KNOW the alignment quality of a read, as it uses this information to report the top-scoring alignments for a read, but it just doesn't report that value. Am I right?

                              Thanks!
