Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bowtie2 -a

    I'm making the transition from bowtie1 to bowtie2 and I was trying to replicate these results in bowtie1:

    Code:
    bowtie -a -v 2 --best e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    
    -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
    -   gi|110640213|ref|NC_008253.1|   148810  10:A>G,13:C>G
    +   gi|110640213|ref|NC_008253.1|   1093035 2:T>G,15:A>T
    -   gi|110640213|ref|NC_008253.1|   905664  6:A>G,7:G>T
    -   gi|110640213|ref|NC_008253.1|   4930433 4:G>T,6:C>G
    Now in bowtie2 I only get the best alignment, even though I'm using -a:

    Code:
    bowtie2 -x e_coli -a --end-to-end -c ATGCATCATGCGCCAT --no-hd 2> /dev/null | cut  -f2-4,10,12-
    
    16  gi|110640213|ref|NC_008253.1|   2852853 ATGGCGCATGATGCAT    AS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:7T8    YT:Z:UU
    What am I doing wrong?

  • #2
    I think what's going on here is a mismatch issue and a difference in how the two bowtie versions deal with alignments. In your bowtie 1 example you're using -v 2 which allows up to 2 mismatches across the entire query sequence. bowtie2 is actually more similar to bowtie 1 in -n mode which specifically allows some number (up to 2) mismatches in the seed. The seed length is specified with -l. By default bowtie2 is using a seed of 22 bases and allowing 0 mismatches in that seed. I guess it's being generous by allowing you that one alignment even though that one has a mismatch. If you were to run bowtie 1 with a similar restriction (-n 0) you get no alignments at all. If you allow 1 (-n 1) then you get the same single alignment reported by bowtie2.

    Bowtie 1
    Code:
    bowtie -S -a -n 1 e_coli -c ATGCATCATGCGCCAT
    
    0	16	gi|110640213|ref|NC_008253.1|	2852853	255	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	XA:i:1	MD:Z:7T8	NM:i:1
    Bowtie 2
    Code:
    bowtie2 -a -N 1 -x e_coli -c ATGCATCATGCGCCAT
    
    0	16	gi|110640213|ref|NC_008253.1|	2852853	255	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-6	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:7T8	YT:Z:UU
    The other thing controlling bowtie2's output is the alignment score (the value at the AS attribute in the SAM output). The maximum score for --end-to-end mode is 0 therefore most scores will be some kind of negative value. bowtie2 only reports reads that have a score better than the minimum alignment score allowed. The default minimum alignment score for end-to-end mode is defined as -0.6 + -0.6*L where L is the read length. In your case you've supplied a read length of 16 bases which makse the minimum score -10.2. By default bowtie2 assigns a -6 point alignment penalty for a mismatch at a high-quality position (all positions in your exmaple would be high quality) so two mismatches is below the minimum alignment score threshold. So by modifying the minimum allowed alignment score we can get bowtie2 to report more alignments for this sequence. For this read length if we have the score defined as -0.8 + -0.8*L we have a min of -13.6 wihch would allow two mismatches to squeek through and, I think, no gaps. Gaps get a -11 penalty.

    Code:
    bowtie2 -a -N 1 -L 8 --score-min L,-0.8,-0.8 -x e_coli -c ATGCATCATGCGCCAT
    
    0	16	gi|110640213|ref|NC_008253.1|	2852853	4	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-6	XS:i:-12	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:7T8	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	905665	4	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:8G0A6	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	4930434	4	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:9C1G4	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	1093036	4	16M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:2T12A0	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	148811	4	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:2C2A10	YT:Z:UU
    So now bowtie2 returns the exact same 5 alignments that you got with bowtie1. Also if I run bowtie 1 with -n 2 -l 8 I get those same 5 alignments. Note that with bowtie2 if I don't change the seed length to 8 bases then only one alignment is reported regardless of the minimum score setting. I think that's because we're allowing 1 mismatch in the seed which is set to 22 by default. The fuzzy score-based thresholding only applys to bases past the seed.

    So understanding this - we could allow a little more room in the min score so that we could have 2 mismatches or 1 mismatch and 1 gap. If we do that we get even more alignments out of bowtie2 (39 alignments, actually):

    Code:
    bowtie2 -a -N 1 -L 8 --score-min L,-1,-1 -x e_coli -c ATGCATCATGCGCCAT
    
    0	16	gi|110640213|ref|NC_008253.1|	2852853	3	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-6	XS:i:-12	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:7T8	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	905665	3	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:8G0A6	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	4930434	3	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:9C1G4	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	1093036	3	16M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:2T12A0	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	148811	3	16M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-12	XS:i:-12	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:2C2A10	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	3768264	3	5M1D11M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:5^A10A0	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	4321479	3	5M1I10M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:6C8	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	4795428	3	6M1I9M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:12T2	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	1247402	3	7M1I8M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:11T3	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	3015503	3	4M3I9M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:3	NM:i:3	MD:Z:13	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	309318	3	9M1I6M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:2A12	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	667516	3	7M1I8M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:13T1	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	36731	3	11M1D5M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:6A4^A5	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	1192084	3	9M1I6M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:4T10	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	1575366	3	10M1D6M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-14	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:1	NM:i:2	MD:Z:5T4^C6	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	2158101	3	8M4I4M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	611487	3	10M2I4M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:8G5	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	2334712	3	4M4I8M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	1187105	3	4M4I8M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	329729	3	4M4I8M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	4913219	3	6M2I8M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:3G10	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	2448303	3	6M2I8M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:3A10	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	4413308	3	7M2D9M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:5C1^GG9	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	1392242	3	5M2I9M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:2T11	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	4701791	3	4M2I10M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:7T6	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	1152726	3	8M2I6M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:9C4	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	1721800	3	10M2I4M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:3C10	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	2042298	3	8M2I6M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:12C1	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	1153758	3	8M2I6M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:13G0	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	2827871	3	6M2I8M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:13G0	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	235206	3	4M2I10M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:11T2	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	1842918	3	4M4I8M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	570346	3	6M2I8M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:0G13	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	2126072	3	7M2I7M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:13A0	YT:Z:UU
    0	272	gi|110640213|ref|NC_008253.1|	191801	3	5M4I7M	*	0	0	ATGGCGCATGATGCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	2976960	3	6M4I6M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	920097	3	7M2D9M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:7^AT8G0	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	1522118	3	6M4I6M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:0	XO:i:1	XG:i:4	NM:i:4	MD:Z:12	YT:Z:UU
    0	256	gi|110640213|ref|NC_008253.1|	1932547	3	7M2D9M	*	0	0	ATGCATCATGCGCCAT	IIIIIIIIIIIIIIII	AS:i:-17	XS:i:-12	XN:i:0	XM:i:1	XO:i:1	XG:i:2	NM:i:3	MD:Z:7^CG2A6	YT:Z:UU
    Thanks for posting that question. It forced me to look at this issue - I learned a little more about how bowtie2 works just now. Hope this helps.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

    Comment


    • #3
      Wow, thank you for giving such a detailed answer. That's really useful!

      Comment


      • #4
        I didn't actually know the extent of the answer. It was useful for me to work through that. I guess when you're working with typical read lengths (50 to 100 bases) those settings aren't as important. At least it's important to know that bowtie2 is kind of like bowtie 1 in -n (and not -v) mode. On that subject, bowtie 1's method of doing what bowtie2 does with the minimum alignment score is it allows a maximum sum of base qualities at mismatches past the seed. So -e 40 would allow 1 quality 40 base mismatch or 2 quality 20 base mismatches, etc. -e is only available in -n mode. I pretty much always use the -n mode now because it allows you to enforce a strict mismatch rate for the start of your reads which tend to be of the best quality but allows some fuzzy mismatch logic for the rest of the read's bases.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Essential Discoveries and Tools in Epitranscriptomics
          by seqadmin




          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
          04-22-2024, 07:01 AM
        • seqadmin
          Current Approaches to Protein Sequencing
          by seqadmin


          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
          04-04-2024, 04:25 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Today, 08:47 AM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-11-2024, 12:08 PM
        0 responses
        60 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 10:19 PM
        0 responses
        57 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 09:21 AM
        0 responses
        53 views
        0 likes
        Last Post seqadmin  
        Working...
        X