**gaffa** · 01-11-2011, 02:23 PM

You could make a small script to chug through the file and add the machine id field (either the real one if you can acquire it, or else a made-up placeholder).

Regarding the "#<index>:<pair>" fields, some more info on the experiment might be needed. Is this single-end or paired-end, and how many data files are there? (Illumina paired-end data usually comes in paired files, with each read pair positioned on corresponding lines in the two files.) Any multiplexing?

**Airwalker810** · 01-12-2011, 06:54 AM

It is not paired-end, and I'm almost certain there is no multiplexing at all in the sample. A sample input would be a great help. Thanks for the assistance!

**gaffa** · 01-12-2011, 09:02 AM

If it's single-end and no multiplexing, then you have all the information you need and it should just be a matter of formatting the ID line to make your program happy. The program is expecting read ID lines to look like this:

@ILxx_1234:1:1:1103:6172#1/1
@ILxx_1234:1:1:1103:16929#7/1
@ILxx_1234:1:1:1103:13497#2/2

where the first field is the ID/name of the machine that performed the experiment, followed by the run number; the number after the "#" is the sample ID (if there are multiple samples); and the number after the "/" is the pair info for paired-end experiments (so it's either 1 or 2). If the program really wants a machine name, I guess you could just make up a phony one (ILmymachine_0001, or something more clever) for the first field. And since you have only a single sample, if the program really wants an index, I guess you could just add "#1" after the y-coordinate (removing the ":N" part; I'm not sure what it signifies). For the pair info, my guess is that you can just leave it out (i.e. simply skip the "/1" part) and the program will treat the data as single-end.
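To make the layout of those ID lines concrete, here is a rough Python sketch that splits one into its parts. The names `READ_ID` and `parse_read_id` are made up for illustration, and reading the four colon-separated numbers as lane, tile, x and y follows the older Illumina ID convention; that is an assumption about what this particular program expects, not something confirmed for it.

```python
import re

# Assumed layout: @<machine_run>:<lane>:<tile>:<x>:<y>#<index>/<pair>
# The "#<index>" and "/<pair>" parts are treated as optional.
READ_ID = re.compile(
    r"^@(?P<instrument>[^:]+)"                             # machine name plus run number
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"  # position on the flow cell
    r"(?:#(?P<index>\w+))?"                                # sample index, if present
    r"(?:/(?P<pair>[12]))?$"                               # pair member, if present
)


def parse_read_id(line):
    """Return the ID fields as a dict, or None if the line does not fit the layout."""
    match = READ_ID.match(line.strip())
    return match.groupdict() if match else None
```

Running `parse_read_id` over the ID lines of a file is a cheap way to check whether they already fit this layout before handing them to anything stricter.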

(NOTE: I don't know anything about CASAVA - as I understand things it is Illumina's own program that can do a bunch of stuff. It's not inconceivable that CASAVA itself can generate the correct ID lines from lower level files - but again I don't know much about the pre-fastq pipeline.)

If you know a little Perl or Python scripting, you should be able to make those changes to the ID lines so that CASAVA accepts them. However, this is just a quick-and-dirty practical fix; I don't know the underlying reason why your read ID lines look the way they do (maybe whoever generated the files does).
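A minimal Python sketch of such a script might look like the following. It assumes the current ID lines look like `@lane:tile:x:y:N` (a guess, since the actual input is not shown in this thread), that every record is exactly four lines, and it uses a made-up machine name and a constant index of 1. The function names and the file names in the usage note are hypothetical.

```python
MACHINE = "ILmymachine_0001"  # made-up placeholder for machine name and run number
INDEX = "1"                   # single sample, so a constant index


def fix_header(header, machine=MACHINE, index=INDEX):
    """Rewrite one read ID line; the input layout is an assumption."""
    fields = header.rstrip("\n").lstrip("@").split(":")
    # Assumed input: lane:tile:x:y, optionally followed by a non-numeric flag like "N".
    if len(fields) == 5 and not fields[-1].isdigit():
        fields = fields[:-1]
    if len(fields) != 4:
        raise ValueError(f"unexpected read ID: {header!r}")
    return "@" + machine + ":" + ":".join(fields) + "#" + index


def fix_fastq(lines):
    """Yield the lines of a 4-line-per-record FASTQ with rewritten ID lines."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Only the first line of each record is the ID; the rest pass through as-is.
        yield fix_header(line) if i % 4 == 0 else line


def rewrite_file(src_path, dst_path):
    """Read src_path and write the fixed records to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out in fix_fastq(src):
            dst.write(out + "\n")
```

Something like `rewrite_file("reads.fastq", "reads_fixed.fastq")` would then produce the new file. The pair suffix is left off on purpose, on the guess that the program then treats the reads as single-end; it is worth trying this on the first few records before running the whole file.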

**Airwalker810** · 01-12-2011, 09:20 AM

Thanks for the help; a little scripting should make things a bit easier. Yeah, I'm not sure what the deal with this data is. As I said, it was outsourced, and it came back looking like this mess. No idea why specific lines are missing from the data. My lab just procured a DeepSeq machine and I'm trying to force the data through that pipeline so that everything, past and future, works with the same analysis program.


Help with FastQ/CASAVA format problems
