**maubp** · 11-17-2010, 05:48 AM

It looks like SCARF, using FASTQ-style encoding for the quality scores.

It looks like you can just split each line on the colons. Something like this (in Python) should give you a FASTQ file:

Code:

#!/usr/bin/env python
"""
Simple script to generate FASTQ from colon separated SCARF input 
where the quality scores are ASCII encoded (FASTQ like).

Use at the command line, piping the input and output.
"""
import sys
for line in sys.stdin:
    name, seq, qual = line.rstrip("\n").rsplit(":",2)
    assert len(seq) == len(qual), line
    sys.stdout.write("@%s\n%s\n+\n%s\n" % (name, seq, qual))

The same idea would be equally trivial in Perl or your language of choice.
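To show why the script above splits from the right and stops after two splits, here is a minimal sketch of the same logic as a function, run on a made-up record. The function name `scarf_to_fastq` and the example read are hypothetical and only for illustration. It also strips a trailing carriage return, on the assumption that some files may have Windows line endings.

```python
import sys


def scarf_to_fastq(line):
    """Convert one colon-separated SCARF line into a four-line FASTQ record."""
    # Split from the right, at most twice, so that any colons inside the
    # read name (instrument:lane:tile:x:y) stay part of the name.
    name, seq, qual = line.rstrip("\r\n").rsplit(":", 2)
    if len(seq) != len(qual):
        raise ValueError("Sequence/quality length mismatch: %r" % line)
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)


# Hypothetical record: the read name itself contains four colons.
example = "HWI-EAS000:1:1:4:1001:ACGTN:abbaB\n"
sys.stdout.write(scarf_to_fastq(example))
```

One caveat: ":" is ASCII 58, which is a valid quality character if the scores use the Sanger offset of 33 (Phred 25). A colon inside the quality string would then break the split, although the length check would usually flag it. The older Solexa/Illumina offset of 64 never produces characters that low.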

**agc** · 11-18-2010, 04:34 AM

Thanks a lot! It seems to be working. Any idea why these files are in this format? They're supposed to be Illumina files.

**maubp** · 11-18-2010, 04:36 AM

SCARF is one of the many formats used by Solexa/Illumina; it gets confusing.

unrecognized .txt format

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News