
  • Sam Flag 65 and 129 after BWA

    Hi,

    I am trying to run BWA on simulated paired-end test data (Illumina, 75 bp reads).

    Basically, I generated an artificial sample from a reference database, and I am trying to verify the BWA results.

    Unfortunately, I am not sure about the flags in the SAM file. I thought that reads mapped as a pair are displayed with the flag 2.

    But in my SAM file I only get the flags 65 and 129, which I read as:
    65: paired read and forward
    129: paired read and reverse

    But I thought that the flag for mapped (0x0002) would be set as well? So my question is whether something is wrong with the sample, and which flags I have to use to extract the mapped reads from paired-end and from single-end data.

    This is an output from sampe:
    @SQ SN:gi|157704448|ref|AC_000133.1| LN:219475005
    @PG ID:bwa PN:bwa VN:0.5.9-r16
    testSample_0_1 65 gi|157704448|ref|AC_000133.1| 1 37 75M = 151 150 ATTGACAAGGGGAGGGAAAAGAGGAACAGAAATTCTTTTCTAT$
    testSample_0_2 129 gi|157704448|ref|AC_000133.1| 151 37 75M = 1 -150 ATAACTTGGAAGCTTCCTTTAAAAGGAACATCAGGAGGTGATT$


    Greetings and many thanks,
    TOmoi

  • #2
    No, you are misreading the flags. 65 = 64 +1, which means it's the first read, and it's paired. 129 = 128 + 1, meaning it's the second read, and it's paired. Both are in the forward direction. That's why they aren't properly paired. You can see that for yourself if you blat them.

    The magic numbers for flags for properly paired reads are 83, 99, 147, and 163.
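
    The arithmetic above (65 = 64 + 1, 129 = 128 + 1) can be checked with a small sketch like the following. The bit values come from the SAM format's FLAG field; the short label strings and the `decode_flag` helper are just names made up for this illustration, not part of any tool.

```python
# Bit values of the SAM FLAG field; the label strings are illustrative names.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The two flags from the question, then the four "proper pair" flags.
    for flag in (65, 129, 83, 99, 147, 163):
        print(flag, decode_flag(flag))
```

    Note that neither 65 nor 129 has the 0x10 (reverse) bit set, while each of 83, 99, 147, and 163 includes 0x2 plus exactly one of the two strand bits.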



    • #3
      Thank you very much! This explains why it didn't work.

      Well, what does "the second read" mean? To be honest, I thought the second read meant the reverse one?!

      And do you also have magic numbers for mate reads? I want to extract reads where one strand is mapped and the other is not.

      Thanks in advance



      • #4
        You seem very confused.

        2 doesn't mean "mapped".
        It means "mapped in a proper pair". That means one forward, one reverse, with the distance between them being around the average insert size as compared to the other reads in the project.

        DNA fragments don't know which direction you are calling forward, and which you are calling reverse. The adaptors just go on whatever end they can. So you can't expect all of read one to run in the direction that we by convention call "forward".

        If you made your fastqs as it looks like you did, with those two reads both in the same direction, and made them look paired by putting one in each of two fastq files, then bwa did exactly what it's supposed to do. The reason it looks strange is that this would be a very strange pair of reads in a real experiment. Rev-comp one of them and try again, and you will get better results.

        I'm not sure how bwa handles mate reads, I've never tried it. It might fail to flag them as properly paired, because they point out, instead of in, like paired ends. I suppose revcomping both fastqs (and reversing the quality strings) might allow bwa to flag them as properly paired.
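
        A minimal sketch of the rev-comp step, assuming plain A/C/G/T/N bases: the sequence is reverse-complemented and the quality string is reversed so each score stays with its base. The helper name `revcomp_record` is made up for this example.

```python
# Translation table for complementing bases (upper and lower case, plus N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp_record(seq, qual):
    """Reverse-complement a read and reverse its quality string to match."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    return seq.translate(_COMPLEMENT)[::-1], qual[::-1]


if __name__ == "__main__":
    # Toy six-base read; the quality characters are arbitrary placeholders.
    print(revcomp_record("ATTGAC", "ABCDEF"))
```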

        If you want all the reads where one end mapped, and one end didn't, you want all the reads with a 4 or an 8, but not both. 4 means "Did not map". 8 means "mate didn't map". Samtools view can filter like that.
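
        The "4 or 8, but not both" rule is an exclusive-or on those two bits. Here is a rough sketch that applies it to SAM text lines, assuming tab-separated records with the flag in the second column; the function names are made up for this example. With samtools itself, the `-f` (require bits) and `-F` (exclude bits) options of `samtools view` should give each half of this set.

```python
UNMAPPED = 0x4        # this read did not map
MATE_UNMAPPED = 0x8   # this read's mate did not map


def one_end_mapped(flag):
    """True if exactly one of the 0x4 and 0x8 bits is set."""
    return bool(flag & UNMAPPED) != bool(flag & MATE_UNMAPPED)


def reads_with_one_end_mapped(sam_lines):
    """Yield SAM records where one end mapped and the other did not."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and one_end_mapped(int(fields[1])):
            yield line


if __name__ == "__main__":
    # Toy records with made-up read names; only the first two columns matter here.
    demo = ["@PG\tID:bwa", "r1\t73\tchr", "r1\t133\tchr", "r2\t65\tchr", "r3\t77\t*"]
    for record in reads_with_one_end_mapped(demo):
        print(record)
```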
