  • Bowtie2 - Concordantly tags.

    Hi all,

    Bowtie2 output:
    I have a set of 35 paired sequences.
    25 (71.43%) aligned concordantly 0 times.
    1 (2.86%) aligned concordantly 1 time.
    9 (25.71%) aligned concordantly >1 times.

    When I look through the SAM output manually, at the lines with the 'CP' tag (YT:Z:CP), I can't find how to tell which paired-end alignments aligned concordantly exactly 1 time and which aligned >1 times. Where do I see that? I don't spot any differences.

    Thanks in advance,

  • #2
    This is a guess, but the XS tag should help. According to the manual it reports "... Alignment score for the best-scoring alignment found other than the alignment reported. ... Only present if the SAM record is for an aligned read and more than one alignment was found for the read." I would have to try some alignments to be certain. Failing that, I suspect that field #5 (MAPQ) would be lower for the '>1 times' alignments.
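A minimal Python sketch of how to pull out the two fields mentioned above, assuming a standard tab-separated SAM record. The helper name `mapq_and_xs` and the example record are made up for illustration; nothing here is taken from Bowtie2 itself.

```python
def mapq_and_xs(sam_line):
    """Return (MAPQ, XS) for one SAM alignment line; XS is None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])          # MAPQ is column 5 of a SAM record
    xs = None
    for tag in fields[11:]:        # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("XS:i:"):
            xs = int(tag[5:])
    return mapq, xs


# Illustrative record (hypothetical read name, not from a real run).
example = "\t".join([
    "read1", "99", "ref", "104", "42", "20M", "=", "193", "109",
    "GTCTCTGAGCGTCATCATCG", "IIIIIIIIIIIIIIIIIIII",
    "AS:i:0", "XN:i:0", "YT:Z:CP",
])
print(mapq_and_xs(example))
```

Printing both values side by side for every YT:Z:CP record makes it easy to see whether XS or MAPQ separates the two groups in a given dataset.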

    Comment


    • #3
      I tried looking at the XS tag, but all 10 pairs showed XS tags... I'll look at the MAPQ now.

      Comment


      • #4
        Originally posted by Coryza
        I tried looking at the XS tag, but all 10 pairs showed XS tags... I'll look at the MAPQ now.
        Can't figure it out. Does anyone else have any ideas?

        Comment


        • #5
          Why don't you post two of the '>1 times' reads plus the '1 time' reads? Maybe we can eyeball something. In the meantime I think I will set up a test to replicate your results. That will give me something to look at instead of guessing.

          Comment


          • #6
            In my test the MAPQs are definitely different. The first two read pairs are '>1 times' while the third is '1 time'. My little test does not, of course, guarantee that your results will show the same pattern, but it would be interesting to see the MAPQs of all of your results if you could post them.

            Code:
            1R1     99      test_seq        4       1       20M     =       214     230     ATATGCTATCGCGATACTAG    IIIIIIIIIIIIIIIIIIII    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YS:i:0  YT:Z:CP
            1R2     147     test_seq        214     1       20M     =       4       -230    ACGACTACTGCGTAATCTCG    IIIIIIIIIIIIIIIIIIII    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YS:i:0  YT:Z:CP
            2R1     99      test_seq        16      1       20M     =       18      22      GATACTAGACGACTACTGCG    IIIIIIIIIIIIIIIIIIII    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YS:i:0  YT:Z:CP
            2R2     147     test_seq        18      1       20M     =       16      -22     TACTAGACGACTACTGCGTA    IIIIIIIIIIIIIIIIIIII    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YS:i:0  YT:Z:CP
            3R1     99      test_seq        104     42      20M     =       193     109     GTCTCTGAGCGTCATCATCG    IIIIIIIIIIIIIIIIIIII    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YS:i:0  YT:Z:CP
            3R2     147     test_seq        193     42      20M     =       104     -109    GATATGCTATCGCGATACTA    IIIIIIIIIIIIIIIIIIII    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YS:i:0  YT:Z:CP

            Comment


            • #7
              1 (2.86%) aligned concordantly exactly 1 time
              9 (25.71%) aligned concordantly >1 times

              Code:
              R1a AS:i:53	XS:i:54	XN:i:0	XM:i:7	XO:i:0	XG:i:0	NM:i:7	MD:Z:5C1T1T1C15C3T1C10	YS:i:83	YT:Z:CP
              R1b AS:i:83	XS:i:57	XN:i:0	XM:i:5	XO:i:0	XG:i:0	NM:i:5	MD:Z:7C3A3C5C21C10	YS:i:53	YT:Z:CP
              R2a AS:i:98	XS:i:99	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:2T3T47	YS:i:170	YT:Z:CP
              R2b AS:i:170	XS:i:161	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:85	YS:i:98	YT:Z:CP
              R3a AS:i:59	XS:i:53	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:11G20	YS:i:59	YT:Z:CP
              R3b AS:i:59	XS:i:59	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:11G20	YS:i:59	YT:Z:CP
              R4a AS:i:56	XS:i:52	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:14G15A2	YS:i:66	YT:Z:CP
              R4b AS:i:66	XS:i:56	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:33	YS:i:56	YT:Z:CP
              R5a AS:i:73	XS:i:73	XN:i:0	XM:i:3	XO:i:0	XG:i:0	NM:i:3	MD:Z:29A7A3A2	YS:i:241	YT:Z:CP
              R5b AS:i:241	XS:i:241	XN:i:0	XM:i:5	XO:i:0	XG:i:0	NM:i:5	MD:Z:8A1A0A7A3A109	YS:i:73	YT:Z:CP
              R6a AS:i:68	XS:i:69	XN:i:0	XM:i:6	XO:i:0	XG:i:0	NM:i:6	MD:Z:6T4T5T0T5T0T23	YS:i:167	YT:Z:CP
              R6b AS:i:167	XS:i:209	XN:i:0	XM:i:9	XO:i:0	XG:i:0	NM:i:9	MD:Z:4T8T9T16T14T5T18T6T13T4	YS:i:68	YT:Z:CP
              R7a AS:i:66	XS:i:66	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:33	YS:i:106	YT:Z:CP
              R7b AS:i:106	XS:i:104	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:53	YS:i:66	YT:Z:CP
              R8a AS:i:69	XS:i:50	XN:i:0	XM:i:3	XO:i:1	XG:i:1	NM:i:4	MD:Z:5T19C4C15	YS:i:64	YT:Z:CP
              R8b AS:i:64	XS:i:59	XN:i:0	XM:i:6	XO:i:0	XG:i:0	NM:i:6	MD:Z:3C6C5C4C4C4C15	YS:i:69	YT:Z:CP
              R9a AS:i:67	XS:i:59	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:29A6	YS:i:61	YT:Z:CP
              R9b AS:i:61	XS:i:61	XN:i:0	XM:i:3	XO:i:0	XG:i:0	NM:i:3	MD:Z:22G2G5G6	YS:i:67	YT:Z:CP
              R10a AS:i:86	XS:i:86	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:43	YS:i:158	YT:Z:CP
              R10b AS:i:158	XS:i:76	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:79	YS:i:86	YT:Z:CP
              MAPQs:
              R1a 22
              R1b 22
              R2a 0
              R2b 0
              R3a 2
              R3b 2
              R4a 2
              R4b 2
              R5a 2
              R5b 2
              R6a 2
              R6b 2
              R7a 2
              R7b 2
              R8a 2
              R8b 2
              R9a 2
              R9b 2
              R10a 2
              R10b 2

              But in that case, what would be the MAPQ threshold to distinguish between >1 and ==1?
              Last edited by Coryza; 02-13-2014, 12:14 AM.

              Comment


              • #8
                I've successfully split the CPs from 227201 sequenced pairs into 'once' and '>1', and the counts agree with the Bowtie2 output.

                ... [Sequencing] mRNA Reference Sequences: 2000123
                ... [Sequencing] cDNA Sequences Sequenced: 454402
                ... [Sequencing] cDNA Pairs Sequenced: 227201 (100.0%)
                ... [Mapping] Pairs mapped unsuccessfully: 208967 (91.9745071545%)
                ... [Mapping] Pairs mapped discordantly: 1679 (0.738993226262%)
                ... [Mapping] Pairs mapped concordantly: 16555 (7.28649961928%)
                ... ... Pairs mapped concordantly once: 2959 (1.3023710283%)
                ... ... Pairs mapped concordantly >1: 13596 (5.98412859098%)

                MAPQ>21 = CP once
                MAPQ<=21 = CP >1
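The rule above can be sketched in Python as follows. The cutoff of 21 is only the empirical value found on this dataset in this thread, not a documented Bowtie2 constant, and the function name `classify_concordant` is hypothetical.

```python
def classify_concordant(mapq, threshold=21):
    """Label a concordant (YT:Z:CP) record using the empirical cutoff from this thread."""
    # Assumption: MAPQ > threshold means the pair aligned concordantly exactly once.
    return "once" if mapq > threshold else ">1"


# MAPQ values reported earlier in the thread for three of the pairs.
for name, mapq in [("R1a", 22), ("R2a", 0), ("R3a", 2)]:
    print(name, classify_concordant(mapq))
```

The threshold is left as a parameter because the right value may well differ between Bowtie2 versions, scoring settings, or datasets; it is worth checking the resulting counts against the alignment summary, as was done above.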

                Comment


                • #9
                  New question... Bowtie2 says that there are 2 sequences (mates from pairs that could not be aligned as a pair) that align only once. However, when I look it up manually, I find 3 sequences.

                  " 24 pairs aligned 0 times concordantly or discordantly; of these:
                  48 mates make up the pairs; of these:
                  44 (91.67%) aligned 0 times
                  2 (4.17%) aligned exactly 1 time - I manually find 3 sequences
                  2 (4.17%) aligned >1 times"

                  Code:
                  M01452:20:000000000-A6JAA:1:1113:29093:14236	153	BC095405	2846	22	99S34M18S	=	2846	0	[removed seq]	AS:i:58	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:2T3A27	YT:Z:UP
                  M01452:20:000000000-A6JAA:1:1113:26537:22338	81	BC011718	1561	22	99S51M	=	1586	0	[removed seq]	AS:i:53	XN:i:0	XM:i:9	XO:i:0	XG:i:0	NM:i:9	MD:Z:4A6A2A3C1C1A2T0T0T23	YT:Z:UP
                  M01452:20:000000000-A6JAA:1:1113:26537:22338	161	BC011718	1586	44	15S37M99S	=	1561	0	[removed seq]	AS:i:69	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:23T13	YT:Z:UP
                  Last edited by Coryza; 02-13-2014, 04:55 AM.
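The FLAG column (field 2) of the three records above may help when checking them by hand. Below is a small Python sketch that decodes a flag into the bit names defined in the SAM specification; the short labels and the function name `decode_flag` are my own, chosen for illustration.

```python
# Flag bits as defined in the SAM specification; the labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# The three FLAG values from the records above.
for flag in (153, 81, 161):
    print(flag, decode_flag(flag))
```

For these values, 153 has the mate-unmapped bit set, while 81 and 161 are the first and second mate of one pair with both mates mapped.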

                  Comment


                  • #10
                    I can see two sequences. The last two are from the same pair.

                    Comment


                    • #11
                      Originally posted by TiborNagy
                      I can see two sequences. The last two are from the same pair.
                      The last 2 were originally 1 pair indeed, but they weren't aligned as a pair, hence Bowtie2 tried to map them separately, right?

                      Comment


                      • #12
                        Perhaps the MAPQ threshold is higher for declaring "mapped only 1 time" for singletons? Presumably there were 4 mapping reads, so what was the MAPQ of the fourth?

                        It would be really nice if things like this were actually documented somewhere.

                        Comment


                        • #13
                          Here is the other read of the first sequence's pair: the R1 mate, which was not mapped.

                          Code:
                          M01452:20:000000000-A6JAA:1:1113:29093:14236	69	BC095405	2846	0	*	=	2846	0	[removed seq]	YT:Z:UP

                          Comment


                          • #14
                            That one wouldn't (or at least shouldn't) have been counted in any case since it's marked as unmapped. Just do a "samtools view -F 0x4 foo.bam" and post the 4 that result.
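For anyone without samtools at hand, here is a rough Python sketch of what that filter does on plain-text SAM: it drops every record whose FLAG has the 0x4 (read unmapped) bit set. The function name `drop_unmapped` and the abbreviated example records are hypothetical.

```python
def drop_unmapped(sam_lines):
    """Keep alignment records whose FLAG lacks the 0x4 (read unmapped) bit."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):        # header lines are not alignment records
            continue
        flag = int(line.split("\t")[1])  # FLAG is column 2
        if not flag & 0x4:
            kept.append(line)
    return kept


# Abbreviated, made-up records: only the name, FLAG and reference columns are shown.
records = [
    "@HD\tVN:1.0",
    "readA\t153\tBC095405",   # mapped, mate unmapped
    "readA\t69\tBC095405",    # unmapped mate
]
print(drop_unmapped(records))
```

This only mirrors the flag test; it does not read BAM, so a real BAM file would still need samtools or a BAM-aware library.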

                            Comment


                            • #15
                              Originally posted by dpryan
                              That one wouldn't (or at least shouldn't) have been counted in any case since it's marked as unmapped. Just do a "samtools view -F 0x4 foo.bam" and post the 4 that result.
                              That one isn't counted. It's just that, of the 3 above, 2 have been counted and 1* has not...
                              Last edited by Coryza; 02-13-2014, 06:59 AM.

                              Comment
