
**jkbonfield** · 06-30-2010, 12:52 AM

As you've observed, the SAM QNAME isn't what traditional assemblers and viewers refer to as a read name. Instead it's really the template name: in your example the QNAME is READ_A or READ_B. I think the automatic detection of /1 and /2 to trim a read name down to a QNAME is there simply because /1 and /2 was the MAQ convention. Ideally no one should rely on this convention now.
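To make the /1 and /2 trimming concrete, here is a minimal sketch of how a tool might turn a MAQ-style read name into a template name. The function name `split_read_name` and the regular expression are illustrative assumptions, not code from any particular aligner.

```python
import re

# Assumption: the MAQ-style convention is a trailing "/1" or "/2" on the read name.
_MAQ_SUFFIX = re.compile(r"^(?P<template>.+)/(?P<end>[12])$")


def split_read_name(read_name):
    """Return (template_name, end), where end is 1, 2, or None.

    Names without a trailing /1 or /2 are returned unchanged with end=None.
    """
    match = _MAQ_SUFFIX.match(read_name)
    if match is None:
        return read_name, None
    return match.group("template"), int(match.group("end"))


print(split_read_name("READ_A/1"))
print(split_read_name("READ_B/2"))
print(split_read_name("READ_A"))
```

Any other suffix scheme falls straight through this pattern untouched, which is one reason relying on the convention is fragile.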

A common prefix is prone to error too. When dealing with fastq there's a huge variety of options:

- 1 fastq record per template, with an implicit assumption that we chop it in half to get both reads (the original Illumina format, IIRC)

- 1 fastq file with fwd and rev reads alternating, so they always come in pairs.

- 2 fastq files with each end stored in its own file.

- 1 fastq file for single-ended data. This obviously causes confusion with the first two conventions.
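The four layouts above can be sketched as a single pairing routine that takes the layout as an explicit argument instead of guessing it. This is only an illustration: the function `pair_records`, the layout names, and the `(name, seq, qual)` tuple representation are all assumptions made for the example, and parsing the fastq files themselves is left out.

```python
def pair_records(records, layout, mates=None):
    """Group fastq records into (read1, read2) pairs for an explicit layout.

    records and mates are lists of (name, seq, qual) tuples.
    layout is one of "split", "interleaved", "two_files", "single"
    (hypothetical names for the four conventions listed above).
    """
    if layout == "single":
        # Single-ended data: no mate at all.
        return [(rec, None) for rec in records]

    if layout == "split":
        # One record per template; chop sequence and qualities in half.
        pairs = []
        for name, seq, qual in records:
            half = len(seq) // 2
            pairs.append(((name, seq[:half], qual[:half]),
                          (name, seq[half:], qual[half:])))
        return pairs

    if layout == "interleaved":
        # Forward and reverse reads alternate within one file.
        if len(records) % 2 != 0:
            raise ValueError("interleaved layout needs an even number of records")
        return [(records[i], records[i + 1]) for i in range(0, len(records), 2)]

    if layout == "two_files":
        # Each end is stored in its own file; records are matched by position.
        if mates is None or len(mates) != len(records):
            raise ValueError("two_files layout needs a mate list of equal length")
        return list(zip(records, mates))

    raise ValueError(f"unknown layout: {layout!r}")


reads = [("READ_A", "ACGTTGCA", "IIIIIIII")]
print(pair_records(reads, "split"))
print(pair_records(reads, "single"))
```

Note that the same input parses without complaint under both "split" and "single", which is exactly the ambiguity that makes auto-detection risky.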

Every aligner seems to want something marginally different, and fastq frankly is hopeless as a format for embedding such metadata, so it's tricky for tools to work out which read layout you use without either explicit command line options or trying to "be clever". In my experience the latter nearly always ends up shooting you in the foot sooner or later.

As for SAM, I'd like to see the true read suffix (or the entire name if a suffix is inappropriate) given a standard auxiliary key:value tag, so we can go from template names back to the original read names. With that, plus the ability to have more than 2 reads per QNAME, we gain the ability to use SAM for mixed assemblies with capillary finishing reads in them.
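As a rough sketch of what such a tag could look like, the snippet below builds a string-typed auxiliary field holding either the read suffix or the whole original name. The tag name `ZN` is purely hypothetical (it is not defined by the SAM specification and is picked here from the range left to end users), and `original_name_tag` and the capillary read name are likewise invented for illustration.

```python
def original_name_tag(template, read_name, tag="ZN"):
    """Format a SAM-style TAG:Z:VALUE field recording the original read name.

    If read_name is the template name plus a suffix, only the suffix is stored;
    otherwise the entire read name is stored.
    """
    if read_name.startswith(template) and len(read_name) > len(template):
        value = read_name[len(template):]
    else:
        value = read_name
    return f"{tag}:Z:{value}"


print(original_name_tag("READ_A", "READ_A/1"))
print(original_name_tag("READ_A", "capillary_finish_01"))
```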

James

**misko** · 06-30-2010, 10:14 AM

Thanks for the answer, this helps clarify

Paired read names / SAM qname format
