I have to do some filtering of reads in BAM alignments based on mapping quality but also to remove some reads that overlap certain regions of the reference sequence. The filter steps are working fine but I'm having trouble with the downstream processing.
The input is paired-end reads but after one of those filtering steps, I often have only one single read from a pair left because the mate was filtered out. I would then like to modify the flags to correctly reflect the fact that this is a singleton without a mate.
I thought this is exactly what the fixmate command in samtools is supposed to do but it doesn't seem to do anything for me.
I name-sort the BAM files and then run
Code:
samtools fixmate in.namesorted.bam out.bam
Here is an example: the tail of the name-sorted BAM file before and after running it through fixmate. The last read in the file (read10512) is now a singleton that used to have a mate. Running fixmate has not changed its FLAG (147), which still marks it as a paired read.
name sorted BAM before fixmate
Code:
read9955 99 100001 1907 24 150M = 2207 450 CCATATGATGTTCCAGATTATGCATATCCATATGATGTTCCAGATTATGCATAAGGGCCCCGTTTTTCTTACTTATATATTTATACCAATTGATTGTATTTATAACTGTAAAAATGTGTATGTTGTGTGCATATTTTTTTTTGTGCATGC IFHDGFDHEIEFGFFHDGEIFGDFEFDDIIDGDHIDIEIEEEIIEIEDHDDIEIGEFIEIFGGGIDHIDGHDEDIDEHGIDEFGEFIEGIIIFFGHGHFGEEHFEDIIIHDEDEIGGFGHIIIGEEGGIDEDHEDFDEGGGEGDFHGFFE NM:i:0 AS:i:150
read9955 147 100001 2207 22 150M = 1907 -450 ATGGAACATAATATGTTTGAAACAATAAGACAAAATTATTATTATTATTATTATTTTTACTGTTATAATTATGTTGTCTCTTCAATGATTCATAAATAGTTGGACTTGATTTTTAAAATGTTTATAATATGATTAGCATAGTTAAATAAA HGFIIHGEFDEHDDIDEIIDIHDHHFHDEHHIFGEIEHDDGHFIHHEEIGIFHEEEFFFDFGEEIGDGDIDGGIEHGEHGGEEIEDIDGFIDEGDEIDEHFGEHEEGFHGDFDFEFDHGEEIHDGHDGGEEFIGFHHFHHHHHEFEIHGE NM:i:0 AS:i:150
read10095 99 100001 1891 24 150M = 2191 450 TCCAGATTATGCATATCCATATGATGTTCCAGATTATGCATATCCATATGATGTTCCAGATTATGCATAAGGGCCCCGTTTTTCTTACTTATATATTTATACCAATTGATTGTATTTATAACTGTAAAAATGTGTATGTTGTGTGCATAT IHIHDIIEGGHGDDEEEHHFHDIIHFDEEGFGFGDFHDGDDFHFGIHGGDEEIGHIIDFIFHHIGDDHFIIIFHGEDDIDGFFFDGIDEEHFFGDFGGDIDEDIHIGDDEFDEDDHDHHHFDFFDFGHFFFIGIFFIEGDGHHFHGDDGH NM:i:0 AS:i:150
read10095 147 100001 2191 22 150M = 1891 -450 AAAAATTTAGATATTTATGGAACATAATATGTTTGAAACAATAAGACAAAATTATTATTATTATTATTATTTTTACTGTTATAATTATGTTGTCTCTTCAATGATTCATAAATAGTTGGACTTGATTTTTAAAATGTTTATAATATGATT IGGHHIIHGEGFHFFHIHHHGGFEIFEDHHEIDGHIGFDDDGFIEGEDEGHHDIIEHEFHDIIDDHDHIEDHGEHFIIHDHHIEHEEHHHGFHGGFGGIIFDFGIIGDFHEDHGDDDFHFFFGDIDIIGFEFGHEEHFFHGEEEDDIIDF NM:i:0 AS:i:150
read10512 147 100001 2171 22 150M = 1871 -450 TATGAATTTATGACCATATTAAAAATTTAGATATTTATGGAACATAATATGTTTGAAACAATAAGACAAAATTATTATTATTATTATTATTTTTACTGTTATAATTATGTTGTCTCTTCAATGATTCATAAATAGTTGGACTTGATTTTT IDEGHFDIHIEEGGEDIDGGDHDHIDFDDHDHGGDGDEGDHEIHHDDEEGIEFFGGGIDHHHDFGEDEHGGIDHFFEGGIHFEGIIIGEFDFIFEEIHIHDGDHEEFEDEHIIDDEHDIIDDGDDDDHGFEDIGDEFFEGDEDIDDEIDG NM:i:0 AS:i:150
After fixmate
Code:
read9955 99 100001 1907 24 150M = 2207 450 CCATATGATGTTCCAGATTATGCATATCCATATGATGTTCCAGATTATGCATAAGGGCCCCGTTTTTCTTACTTATATATTTATACCAATTGATTGTATTTATAACTGTAAAAATGTGTATGTTGTGTGCATATTTTTTTTTGTGCATGC IFHDGFDHEIEFGFFHDGEIFGDFEFDDIIDGDHIDIEIEEEIIEIEDHDDIEIGEFIEIFGGGIDHIDGHDEDIDEHGIDEFGEFIEGIIIFFGHGHFGEEHFEDIIIHDEDEIGGFGHIIIGEEGGIDEDHEDFDEGGGEGDFHGFFE NM:i:0 AS:i:150 CT:Z:1F150M150T2R150M
read9955 147 100001 2207 22 150M = 1907 -450 ATGGAACATAATATGTTTGAAACAATAAGACAAAATTATTATTATTATTATTATTTTTACTGTTATAATTATGTTGTCTCTTCAATGATTCATAAATAGTTGGACTTGATTTTTAAAATGTTTATAATATGATTAGCATAGTTAAATAAA HGFIIHGEFDEHDDIDEIIDIHDHHFHDEHHIFGEIEHDDGHFIHHEEIGIFHEEEFFFDFGEEIGDGDIDGGIEHGEHGGEEIEDIDGFIDEGDEIDEHFGEHEEGFHGDFDFEFDHGEEIHDGHDGGEEFIGFHHFHHHHHEFEIHGE NM:i:0 AS:i:150
read10095 99 100001 1891 24 150M = 2191 450 TCCAGATTATGCATATCCATATGATGTTCCAGATTATGCATATCCATATGATGTTCCAGATTATGCATAAGGGCCCCGTTTTTCTTACTTATATATTTATACCAATTGATTGTATTTATAACTGTAAAAATGTGTATGTTGTGTGCATAT IHIHDIIEGGHGDDEEEHHFHDIIHFDEEGFGFGDFHDGDDFHFGIHGGDEEIGHIIDFIFHHIGDDHFIIIFHGEDDIDGFFFDGIDEEHFFGDFGGDIDEDIHIGDDEFDEDDHDHHHFDFFDFGHFFFIGIFFIEGDGHHFHGDDGH NM:i:0 AS:i:150 CT:Z:1F150M150T2R150M
read10095 147 100001 2191 22 150M = 1891 -450 AAAAATTTAGATATTTATGGAACATAATATGTTTGAAACAATAAGACAAAATTATTATTATTATTATTATTTTTACTGTTATAATTATGTTGTCTCTTCAATGATTCATAAATAGTTGGACTTGATTTTTAAAATGTTTATAATATGATT IGGHHIIHGEGFHFFHIHHHGGFEIFEDHHEIDGHIGFDDDGFIEGEDEGHHDIIEHEFHDIIDDHDHIEDHGEHFIIHDHHIEHEEHHHGFHGGFGGIIFDFGIIGDFHEDHGDDDFHFFFGDIDIIGFEFGHEEHFFHGEEEDDIIDF NM:i:0 AS:i:150
read10512 147 100001 2171 22 150M = 1871 -450 TATGAATTTATGACCATATTAAAAATTTAGATATTTATGGAACATAATATGTTTGAAACAATAAGACAAAATTATTATTATTATTATTATTTTTACTGTTATAATTATGTTGTCTCTTCAATGATTCATAAATAGTTGGACTTGATTTTT IDEGHFDIHIEEGGEDIDGGDHDHIDFDDHDHGGDGDEGDHEIHHDDEEGIEFFGGGIDHHHDFGEDEHGGIDHFFEGGIHFEGIIIGEFDFIFEEIHIHDGDHEEFEDEHIIDDEHDIIDDGDDDDHGFEDIGDEFFEGDEDIDDEIDG NM:i:0 AS:i:150
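For reference, the FLAG values in the records above can be decoded bit by bit. This is only a minimal Python sketch using the bit definitions from the SAM specification; the `decode_flag` helper and the `SAM_FLAGS` table are names made up for illustration and are not part of samtools.

```python
# SAM FLAG bits as defined in the SAM specification.
# The short names are illustrative labels, not official identifiers.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# The two FLAG values that occur in the example records.
print(147, decode_flag(147))
print(99, decode_flag(99))
```

Decoded this way, the 147 on read10512 still carries the paired, proper-pair and second-in-pair bits, both before and after fixmate.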
The manual for samtools is a bit short on details about what exactly this tool is supposed to do, other than to "fill in mate-related flags". I guess I'm not sure what that means. Can anybody shed some light on this or recommend a different way of doing this?
EDIT
======================
Just noticed that fixmate has actually made a change, but it only appears to have added an optional tag (CT:Z) in the last column of the SAM records for the first read of each intact pair. I had expected it to work on the FLAG field, but maybe that's not what it is meant to do at all(?)
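To make concrete what I mean by "modify the flags", here is a rough Python sketch of the change I am after, operating on name-sorted SAM text. It is not what fixmate does and not a samtools feature; the function names are hypothetical. It assumes the input is name-sorted, tab-separated SAM with no secondary or supplementary records, so a read name that occurs only once is a singleton. For those records it clears the pair-related FLAG bits and resets RNEXT, PNEXT and TLEN.

```python
from itertools import groupby

# FLAG bits that only make sense for a read that still has a mate:
# paired, proper pair, mate unmapped, mate reverse, first/second in pair.
PAIR_BITS = 0x1 | 0x2 | 0x8 | 0x20 | 0x40 | 0x80


def _group_key(line):
    # Header lines group by themselves; alignment lines group by QNAME.
    return line if line.startswith("@") else line.split("\t", 1)[0]


def _unpair(line):
    """Clear mate information on a single SAM alignment line."""
    fields = line.split("\t")
    fields[1] = str(int(fields[1]) & ~PAIR_BITS)  # FLAG
    fields[6] = "*"  # RNEXT
    fields[7] = "0"  # PNEXT
    fields[8] = "0"  # TLEN
    return "\t".join(fields)


def unpair_singletons(sam_lines):
    """Yield SAM lines, rewriting reads whose mate is missing.

    Assumption: input is name-sorted, so mates are adjacent, and there
    are no secondary or supplementary alignments.
    """
    lines = (line.rstrip("\n") for line in sam_lines)
    for key, group in groupby(lines, key=_group_key):
        records = list(group)
        if key.startswith("@") or len(records) != 1:
            # Headers and intact pairs pass through unchanged.
            yield from records
        else:
            yield _unpair(records[0])


# Toy record (made up, not from my data) to show the effect on FLAG 147.
toy = "readX\t147\tchr1\t100\t22\t4M\t=\t50\t-54\tACGT\tIIII\tNM:i:0"
print(next(unpair_singletons([toy])))
```

In principle this could sit between `samtools view -h` on the name-sorted BAM and `samtools view -b` to write a BAM again, but I have not tested that on real data, and any mate-related optional tags would still need separate handling.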