
**ulz_peter** · 03-22-2011, 10:52 PM

Do you know this article?

G-SQZ: compact encoding of genomic sequence and quality data - PubMed

http://www.ncbi.nlm.nih.gov/pubmed/20605925

Availability: available at no cost under a non-open-source license on request from the website; binaries can be downloaded directly at no cost. For-profit use: submit a request for a for-profit license from the website.
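For anyone who hasn't looked at this kind of encoder, here is a toy sketch of one idea in this space: treat each (base, quality) pair as a single symbol and give frequent pairs shorter Huffman codewords. It is only an illustration. The function names (`huffman_code`, `encode_read`, `decode_read`) are hypothetical, this is not G-SQZ's actual file format or implementation, and codes are kept as '0'/'1' strings for readability instead of packed bits.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free Huffman code (symbol -> '0'/'1' string) from symbol counts."""
    freq = Counter(symbols)
    tiebreak = count()  # unique counter so heap entries never compare dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # only one distinct symbol: give it a 1-bit code
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode_read(seq, qual, code):
    """Concatenate the codewords of each (base, quality) pair of one read."""
    return "".join(code[(b, q)] for b, q in zip(seq, qual))


def decode_read(bits, code):
    """Invert encode_read; works because no codeword is a prefix of another."""
    inverse = {c: sym for sym, c in code.items()}
    bases, quals, buf = [], [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            base, q = inverse[buf]
            bases.append(base)
            quals.append(q)
            buf = ""
    return "".join(bases), "".join(quals)


# Example: build one code from all reads, then encode and decode a read.
reads = [("ACGTN", "IIII#"), ("ACGGA", "IIIHI")]
code = huffman_code(p for s, q in reads for p in zip(s, q))
bits = encode_read(*reads[0], code)
print(len(bits), decode_read(bits, code) == reads[0])
```

Coding the pair jointly lets the code exploit correlations such as 'N' bases almost always carrying very low qualities.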

Regarding your questions (from my point of view):
1) Quality data is important, both for quality-aware alignment software and especially for SNP calling (no one would waste that much disk space on unnecessary information).

2) I am not sure about the information in the title line. The only thing that comes to mind is that, for paired-end sequencing, the first and second mates need to have the same name. I'm not sure about the rest of the title line.
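A small sketch of that mate-name check, assuming the two conventions seen most often: a trailing `/1` or `/2` on the read name, or mate information placed after the first whitespace in the title line (as in newer Illumina output). `mate_key` and `is_mate_pair` are hypothetical helper names, not part of any tool.

```python
def mate_key(title: str) -> str:
    """Return the part of a FASTQ title line that both mates are expected to share.

    Assumed conventions: an optional trailing '/1' or '/2' on the read name,
    and an optional comment (e.g. mate info) after the first whitespace.
    """
    name = title[1:] if title.startswith("@") else title
    parts = name.split(maxsplit=1)  # drop anything after the first whitespace
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def is_mate_pair(title1: str, title2: str) -> bool:
    """True if the two title lines name the same fragment."""
    return mate_key(title1) == mate_key(title2)


print(is_mate_pair("@HWUSI-EAS100R:6:73:941:1973#0/1",
                   "@HWUSI-EAS100R:6:73:941:1973#0/2"))
```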

3) I don't think the order plays an important role, but I'd appreciate any comments correcting me...

**gaffa** · 03-24-2011, 02:25 PM

3. Generally speaking, the order of sequences in a file carries no significance. However, in next-generation sequencing it is common for reads from a paired-end experiment to be stored in two separate FASTQ files, with the two reads of a pair found at the same position (the same record index) in the two corresponding files. In this situation, order is clearly crucial.
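To illustrate why order matters, here is a minimal sketch that reads two mate files in lockstep, pairing the i-th record of one with the i-th record of the other. It assumes plain 4-line records (no wrapped sequence lines) and the `/1`/`/2` or whitespace-comment naming conventions; the function names are made up for illustration. If one file has extra records or the names drift apart, it raises an error instead of silently mispairing reads.

```python
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (title, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        title = handle.readline().rstrip("\r\n")
        if not title:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if (not title.startswith("@") or not plus.startswith("+")
                or len(seq) != len(qual)):
            raise ValueError(f"malformed record starting with {title!r}")
        yield title, seq, qual


def base_name(title: str) -> str:
    """Read name without '@', trailing comment, or '/1' / '/2' mate suffix."""
    parts = title[1:].split(maxsplit=1)
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def paired_records(h1: TextIO, h2: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Pair the i-th record of one mate file with the i-th record of the other."""
    for rec1, rec2 in zip_longest(read_fastq(h1), read_fastq(h2)):
        if rec1 is None or rec2 is None:
            raise ValueError("mate files contain different numbers of records")
        if base_name(rec1[0]) != base_name(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]!r} vs {rec2[0]!r}")
        yield rec1, rec2
```

Typical use would be `with open(path1) as f1, open(path2) as f2:` followed by `for r1, r2 in paired_records(f1, f2): ...`, where `path1`/`path2` are your two mate files.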

**asb2718** · 03-24-2011, 08:18 PM

Thank you 'ulz_peter' and 'gaffa' for your answers. We appreciate your feedback. With respect to quality values, is there a particular threshold above which quality values always mean 'high quality'? If there is such a limit, what ASCII value corresponds to it?

**ulz_peter** · 03-24-2011, 11:48 PM

Regarding the FastQ read name: Illumina has its own way of naming the reads:
see here: http://en.wikipedia.org/wiki/FASTQ_format
Beyond that, I don't know of any general FASTQ naming convention, but overwriting the titles of Illumina data would cause a loss of information (AFAIK the coordinates can be used for optical duplicate detection).
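As an illustration, the older Illumina title line on that Wikipedia page, `@HWUSI-EAS100R:6:73:941:1973#0/1`, packs the instrument name, flowcell lane, tile, x and y coordinates of the cluster, index number and pair member into the name. The sketch below parses that style and flags clusters that sit close together on the same tile. The regex only covers this older style (not newer CASAVA 1.8+ headers), the helper names are hypothetical, and the 100-unit distance is an illustrative assumption rather than a standard value; a real duplicate check would also require identical sequences or alignment positions.

```python
import re
from typing import NamedTuple, Optional


class IlluminaName(NamedTuple):
    instrument: str
    lane: int
    tile: int
    x: int
    y: int
    index: str
    mate: Optional[int]


# Older Illumina style only, e.g. "@HWUSI-EAS100R:6:73:941:1973#0/1"
OLD_STYLE = re.compile(
    r"^@?(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/\s]+)(?:/(?P<mate>[12]))?"
)


def parse_old_illumina(title: str) -> IlluminaName:
    """Split an old-style Illumina title line into its fields."""
    m = OLD_STYLE.match(title)
    if m is None:
        raise ValueError(f"not an old-style Illumina read name: {title!r}")
    mate = m.group("mate")
    return IlluminaName(
        m.group("instrument"), int(m.group("lane")), int(m.group("tile")),
        int(m.group("x")), int(m.group("y")), m.group("index"),
        int(mate) if mate else None,
    )


def near_on_flowcell(a: IlluminaName, b: IlluminaName, max_dist: int = 100) -> bool:
    """True if two clusters share instrument, lane and tile and lie within
    max_dist units in both x and y (illustrative threshold). Optical duplicate
    candidates would additionally need identical sequences."""
    return (a.instrument == b.instrument and a.lane == b.lane and a.tile == b.tile
            and abs(a.x - b.x) <= max_dist and abs(a.y - b.y) <= max_dist)


print(parse_old_illumina("@HWUSI-EAS100R:6:73:941:1973#0/1"))
```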

Concerning the qualities: that depends on many factors, so I don't think there is a general threshold value. Moreover, there are different quality encoding standards, each using its own range of ASCII values...
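To make the encoding point concrete: a Phred score Q corresponds to an error probability p = 10^(-Q/10), and each standard stores Q as a character with a fixed ASCII offset (33 for Sanger and Illumina 1.8+, 64 for Illumina 1.3 to 1.7). So the same quality maps to different characters: Q30 (1 error in 1000) is `?` with offset 33 but `^` with offset 64, which is one reason no single ASCII threshold works across files. The sketch below assumes Phred-scaled scores (the old Solexa scale uses a different, odds-based formula and isn't handled), the function names are hypothetical, and `guess_offset` is only a heuristic that can misclassify offset-33 data whose qualities are uniformly high.

```python
from typing import Iterable, List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string: Q = ASCII code - offset."""
    return [ord(c) - offset for c in qual]


def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(p), hence p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def guess_offset(qual_lines: Iterable[str]) -> int:
    """Heuristic only. Characters below ';' (ASCII 59) never occur in
    offset-64 encodings, so their presence implies offset 33. Otherwise
    guess 64, which is wrong for offset-33 data with all qualities >= 26.
    Raises ValueError if no quality characters are given."""
    lowest = min(ord(c) for line in qual_lines for c in line)
    return 33 if lowest < 59 else 64


print(phred_scores("II?#"))             # [40, 40, 30, 2]
print(phred_scores("hh^B", offset=64))  # [40, 40, 30, 2]
print(error_probability(30))
```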

**vbholaiit** · 04-07-2011, 12:15 AM

Random access

Hi all, I have a question regarding the importance of random access to individual reads in a compressed FASTQ file. Is it important? If so, what are the possible applications in which random access could be used? I'd appreciate any help in this regard. Thank you in advance.
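One common way to make random access possible is to compress reads in independent blocks and keep an index, so a single read can be fetched by decompressing only its block (the blocked-gzip BGZF format used for BAM follows a broadly similar idea). Below is a minimal sketch with zlib, assuming exactly 4 lines per record and a fixed number of reads per block; `compress_blocks` and `get_read` are hypothetical names. Possible uses include looking up particular reads by position, pulling out the matching mate of a read, or handing separate blocks to parallel workers.

```python
import zlib
from typing import List, Tuple

LINES_PER_RECORD = 4  # assumes unwrapped 4-line FASTQ records


def compress_blocks(records: List[str], block_size: int = 1000) -> Tuple[List[bytes], int]:
    """Compress records (each a complete 4-line string ending in a newline)
    in independent zlib blocks of block_size reads each."""
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = "".join(records[start:start + block_size])
        blocks.append(zlib.compress(chunk.encode("ascii")))
    return blocks, block_size


def get_read(blocks: List[bytes], block_size: int, i: int) -> str:
    """Return read i, decompressing only the block that contains it."""
    if i < 0 or i // block_size >= len(blocks):
        raise IndexError(i)
    text = zlib.decompress(blocks[i // block_size]).decode("ascii")
    lines = text.splitlines(keepends=True)
    first = (i % block_size) * LINES_PER_RECORD
    record = "".join(lines[first:first + LINES_PER_RECORD])
    if not record:  # i falls past the end of a partially filled last block
        raise IndexError(i)
    return record


reads = [f"@read{k}\nACGT\n+\nIIII\n" for k in range(2500)]
blocks, size = compress_blocks(reads)
print(len(blocks), get_read(blocks, size, 1234) == reads[1234])
```

The block size is the main trade-off: larger blocks usually compress better, but each lookup has to decompress more data.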


fastq format questions
