In the latest SAM format specification, the meanings of "read" and "fragment" seem to be reversed from what I am accustomed to.
My understanding is that a fragment is the result of breaking a piece of DNA or RNA (e.g., a chromosome, cDNA, or mRNA) into smaller pieces (e.g., by shearing or nebulization). It is a subsequence of the original DNA or RNA. And reads are subsequences of a fragment (e.g., for paired end reads, from sequencing both ends of the fragment).
However, in the latest SAM Format Specification (v 1.4-r962), April 17, 2011, if I am understanding the specs correctly, the meanings of "fragment" and "read" have been swapped. (The specs can be downloaded from http://samtools.sourceforge.net/ , under the "General Information" heading, under "SAM Spec v1.4".)
In that document, the definitions are (with my emphases):
- Template: A DNA/RNA sequence part of which is sequenced on a sequencing machine or assembled from raw sequences.
- Fragment: A contiguous (sub)sequence on a template which is sequenced or assembled. For sequencing data, fragments are indexed by the order in which they are sequenced. For fragments of an assembled sequence, they are indexed by the order of the leftmost coordinate on the assembled sequence.
- Read: A raw sequence that comes off a sequencing machine. A read may consist of multiple fragments.
And the bitwise FLAGs are:
- 0x1 template having multiple fragments in sequencing
- 0x2 each fragment properly aligned according to the aligner
- 0x4 fragment unmapped
- 0x8 next fragment in the template unmapped
- 0x10 SEQ being reverse complemented
- 0x20 SEQ of the next fragment in the template being reversed
- 0x40 the first fragment in the template
- 0x80 the last fragment in the template
- 0x100 secondary alignment
- 0x200 not passing quality controls
- 0x400 PCR or optical duplicate
Whereas in the samtools man page (http://samtools.sourceforge.net/samtools.shtml), the SAM format bitwise flags are:
- 0x0001 the read is paired in sequencing
- 0x0002 the read is mapped in a proper pair
- 0x0004 the query sequence itself is unmapped
- 0x0008 the mate is unmapped
- 0x0010 strand of the query (1 for reverse)
- 0x0020 strand of the mate
- 0x0040 the read is the first read in a pair
- 0x0080 the read is the second read in a pair
- 0x0100 the alignment is not primary
- 0x0200 the read fails platform/vendor quality checks
- 0x0400 the read is either a PCR or an optical duplicate
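To make the comparison concrete: the bit values in the two lists are the same, and only the wording attached to each bit differs. Here is a minimal Python sketch that decodes a FLAG value and prints both wordings side by side. The names `FLAG_BITS` and `decode_flag` are made up for illustration and are not part of samtools; the descriptions are copied from the two lists above.

```python
# Each entry: (bit value, wording in SAM spec v1.4, wording in the samtools man page).
# FLAG_BITS and decode_flag are hypothetical names for this sketch only.
FLAG_BITS = [
    (0x1,   "template having multiple fragments in sequencing", "the read is paired in sequencing"),
    (0x2,   "each fragment properly aligned according to the aligner", "the read is mapped in a proper pair"),
    (0x4,   "fragment unmapped", "the query sequence itself is unmapped"),
    (0x8,   "next fragment in the template unmapped", "the mate is unmapped"),
    (0x10,  "SEQ being reverse complemented", "strand of the query (1 for reverse)"),
    (0x20,  "SEQ of the next fragment in the template being reversed", "strand of the mate"),
    (0x40,  "the first fragment in the template", "the read is the first read in a pair"),
    (0x80,  "the last fragment in the template", "the read is the second read in a pair"),
    (0x100, "secondary alignment", "the alignment is not primary"),
    (0x200, "not passing quality controls", "the read fails platform/vendor quality checks"),
    (0x400, "PCR or optical duplicate", "the read is either a PCR or an optical duplicate"),
]


def decode_flag(flag):
    """Return (spec wording, man page wording) for every bit set in flag."""
    return [(spec, man) for bit, spec, man in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # 99 == 0x1 + 0x2 + 0x20 + 0x40
    for spec, man in decode_flag(99):
        print(f"{spec:<58} | {man}")
```

So a FLAG of 99 reads as "first fragment in the template" in the spec's vocabulary and as "first read in a pair" in the man page's vocabulary, for the same record.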
I am confused. As sequencing moves toward more than just two reads (paired ends) per piece of DNA/RNA, are the meanings of the terms "fragment" and "read" changing?