After aligning paired-end 100 bp reads to a reference genome, I am getting a very low properly paired percentage:
369208441 0 total (QC-passed reads + QC-failed reads)
8985531 0 secondary
289733341 0 mapped
78.47% N/A mapped %
360222910 0 paired in sequencing
180111455 0 read1
180111455 0 read2
1393338 0 properly paired
0.39% N/A properly paired %
280747810 0 with itself and mate mapped
0 0 singletons
0.00% N/A singletons %
39590468 0 with mate mapped to a different chr
0 0 with mate mapped to a different chr (mapQ>=5)
I followed the GATK Best Practices to align paired-end short-read data to a reference genome. I downloaded the data from NCBI SRA as FASTQ files using the SRA Toolkit's fastq-dump, converted the FASTQ files to unmapped BAM with Picard FastqToSam, and marked adapters with Picard MarkIlluminaAdapters. I then piped Picard SamToFastq into bwa mem and its output into Picard MergeBamAlignment. To get alignment statistics, I used samtools flagstat.
For several of my samples, the alignment went well (90% mapped, 80% properly paired). For a couple of samples, however, the properly paired percentage was well below 1%. How can a normal fraction of reads map (~78%) while only 0.39% are flagged as properly paired?
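To make the discrepancy concrete, the flagstat counts above can be recombined as in the rough sketch below. It is only arithmetic on the reported numbers. The one assumption, flagged in the code, is that every properly paired read is also counted under "with itself and mate mapped" and has its mate on the same chromosome. The dictionary keys are made-up labels, not samtools field names.

```python
# Counts copied from the flagstat output above (QC-passed column).
# The keys are illustrative labels, not samtools identifiers.
counts = {
    "total": 369208441,
    "secondary": 8985531,
    "mapped": 289733341,
    "paired": 360222910,
    "properly_paired": 1393338,
    "both_mapped": 280747810,
    "mate_diff_chr": 39590468,
}


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Reproduce the two percentages that flagstat reports.
mapped_pct = pct(counts["mapped"], counts["total"])
proper_pct = pct(counts["properly_paired"], counts["paired"])

# Mapped primary reads: mapped alignments minus secondary alignments.
primary_mapped = counts["mapped"] - counts["secondary"]

# ASSUMPTION: properly paired reads are a subset of "both mapped" and
# always have their mate on the same chromosome.
same_chr_not_proper = (
    counts["both_mapped"] - counts["mate_diff_chr"] - counts["properly_paired"]
)

print("mapped %:", mapped_pct)
print("properly paired %:", proper_pct)
print("primary reads mapped:", primary_mapped)
print("both mapped, same chr, not proper:", same_chr_not_proper)
print("  as % of paired reads:", pct(same_chr_not_proper, counts["paired"]))
```

Under that assumption, most paired reads have both mates mapped to the same chromosome yet are not flagged as proper. That would be consistent with a pair orientation or insert-size issue rather than unmapped mates, though the counts alone cannot distinguish between those.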
I have double-checked that my fastq files from fastq-dump have identical read counts, and that they are properly interleaved after Picard FastqToSam.
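Identical read counts do not by themselves show that record N in one file is the mate of record N in the other. A minimal sketch of a name-level check follows, assuming plain four-line FASTQ records and that mates share the first whitespace-delimited token of the header once an optional trailing /1 or /2 is removed. The function names and the file names in the usage comment are hypothetical, and the header convention may differ depending on the fastq-dump options used.

```python
import itertools


def read_names(handle):
    """Yield the read name of each 4-line FASTQ record in an open text handle."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only header lines carry the read name
        fields = line.split()
        name = fields[0].lstrip("@") if fields else ""
        # ASSUMPTION: mates differ at most by a trailing /1 or /2 suffix.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(handle1, handle2):
    """Return (record_index, name1, name2) for the first disagreement, else None.

    If one file ends early, the missing name is reported as None.
    """
    pairs = itertools.zip_longest(read_names(handle1), read_names(handle2))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return idx, name1, name2
    return None


# Usage sketch (hypothetical file names):
# with open("sample_1.fastq") as f1, open("sample_2.fastq") as f2:
#     print(first_mismatch(f1, f2))
```

A result of None means every record lines up by name. Any tuple gives the zero-based index of the first record where the two files disagree.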