  • Samtools flagstat 0% properly paired

    Hi,

    I have used BWA 0.7.5a to map paired-end (PE) whole-genome sequencing (WGS) data to a reference, and samtools flagstat to check the resulting BAM file. My pipeline includes the usual sorting, fixing of malformed BAMs, and duplicate marking. I have run this pipeline many times before with no problems, but this one genome has presented something I haven't seen before. The flagstat results are as follows:

    36332416 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    32959970 + 0 mapped (90.72%:nan%)
    36332416 + 0 paired in sequencing
    18166208 + 0 read1
    18166208 + 0 read2
    354 + 0 properly paired (0.00%:nan%)
    32959970 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:nan%)
    417710 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I'm quite confused: nearly 91% of my reads map to the reference, but barely any are properly paired. QC analysis (using FastQC) did not show anything out of the ordinary, and the library prep gives the average fragment size we are used to seeing. Is this simply a case of the reads having no overlap, meaning we should re-run this particular genome, or can anyone suggest anything else I could try to get to the bottom of this?

    Thanks

  • #2
    "36332416 + 0 "

    There are no reads from the second FASTQ file in your BAM. Did you use single-end instead of paired-end mapping?



    • #3
      Originally posted by lindenb
      "36332416 + 0 "

      There are no reads from the second FASTQ file in your BAM. Did you use single-end instead of paired-end mapping?
      You're misreading that: the "+0" values are the counts for reads with flag 0x200 (QC fail) set.

      Edit: I should add that one likely cause is an insert size too large for the aligner to consider the reads properly paired. Have a look at some of them to see whether this is the case. Another possibility is that the paired reads became out of sync at some point (which is annoying as hell!).



      • #4
        Originally posted by dpryan
        You're misreading that, the "+0" are for reads with flag 0x200 set.
        I was talking about the first line:

        Code:
        36332416 + 0 in total (QC-passed reads + QC-failed reads)
        which is the same as:

        Code:
        36332416 + 0 paired in sequencing
        From the C code, the first line is the total number of reads (forward and reverse) that passed QC:
        Code:
        (...)
                printf("%lld + %lld in total (QC-passed reads + QC-failed reads)\n", s->n_reads[0], s->n_reads[1]);
                printf("%lld + %lld duplicates\n", s->n_dup[0], s->n_dup[1]);
                printf("%lld + %lld mapped (%.2f%%:%.2f%%)\n", s->n_mapped[0], s->n_mapped[1], (float)s->n_mapped[0] / s->n_reads[0] * 100.0, (float)s->n_mapped[1] / s->n_reads[1] * 100.0);
                printf("%lld + %lld paired in sequencing\n", s->n_pair_all[0], s->n_pair_all[1]);
           (...)



        • #5
          I think we're talking past each other.

          If you have
          Code:
          18166208 + 0 read1
          18166208 + 0 read2
          then a second FASTQ file was input, the aligner treated the data as paired-end, and those alignments exist in the BAM file (though I suppose the read1 FASTQ file could also have been specified as the read2 file, which might produce weird results like these).



          • #6
            Originally posted by lindenb
            "36332416 + 0 "

            There are no reads from the second FASTQ file in your BAM. Did you use single-end instead of paired-end mapping?
            Hi there, I definitely used PE mapping, and two FASTQ files were input.

            Edit: I should add that one likely cause of this is if the insert size is too big for the aligner to declare reads as having aligned properly paired. Have a look at some of them to see if this might be the case. Another possibility is that paired-reads became out of sync at some point (which is annoying as hell!).
            Thanks for this suggestion, I will look into this now.
