Hi there,
I'm using BFAST to align Solexa reads to a very small portion of a genome (~3 kb), and have been considering the best way to remove unmapped reads from the output, since these unnecessarily bulk up the output .sam file. I know that samtools view can filter an incoming .sam file using the -F option. However, I've read some documentation on the SAM flag format and must admit I find it pretty confusing. Within the flag field I know there are bits for both "the mate is unmapped" and "the query sequence itself is unmapped", but for non-paired-end Solexa reads, can either of these be used for removing unmapped reads? Furthermore, what integer or string should be passed to -F?
Alternatively, there is the option in samtools view to filter by mapping quality (MAPQ). Would setting the mapping quality threshold to, e.g., 1 remove all unmapped reads without affecting the filtered alignments from BFAST postprocess?
Alternatively again, dbamfilter within the DNAA package has the capacity to remove unmapped reads, but if samtools can do the job I'd like to minimise the number of apps employed.
What are your thoughts on the best strategy?
Aiden
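
For reference, a rough sketch of what the flag-based filter amounts to. In the SAM specification, bit 0x4 of the FLAG field means the read itself is unmapped and bit 0x8 means its mate is unmapped, so for single-end reads only 0x4 is relevant; `samtools view -F 4` skips any record with that bit set. The Python below imitates that test on a few made-up SAM lines. The function name `drop_unmapped` and the example records are hypothetical, and this is only an illustration of the bit check, not a replacement for samtools.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def drop_unmapped(sam_lines):
    """Yield header lines and records whose FLAG does not have bit 0x4 set."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header line: passed through unchanged in this sketch.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if not flag & UNMAPPED:
            yield line


# Made-up records: read2 is unmapped; read3 is mapped but has MAPQ 0.
example = [
    "@SQ\tSN:ref\tLN:3000",
    "read1\t0\tref\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read3\t16\tref\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

kept = list(drop_unmapped(example))
for line in kept:
    print(line)
```

The read3 record is there to show the difference from a MAPQ threshold: a cutoff of 1 would also discard any mapped read reported with MAPQ 0, which the flag test keeps. Whether BFAST postprocess ever emits such reads is something I have not checked.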