**gaffa** · 01-11-2011, 02:23 PM

You could make a small script to chug through the file and add the machine id field (either the real one if you can acquire it, or else a made-up placeholder).

Regarding the "#<index>:<pair>" fields, some more info on the experiment might be needed. Is this single-end or paired-end, and how many data files are there? (Illumina paired-end data usually comes in paired files, with each read pair positioned on corresponding lines in the two files.) Any multiplexing?

**Airwalker810** · 01-12-2011, 06:54 AM

It is not paired-end, and I'm almost certain there is no multiplexing at all in the sample. A sample input would be a great help. Thanks for the assistance!

**gaffa** · 01-12-2011, 09:02 AM

If it's single-end and no multiplexing, then you have all the information you need and it should just be a matter of formatting the ID line to make your program happy. The program is expecting read ID lines to look like this:

@ILxx_1234:1:1:1103:6172#1/1
@ILxx_1234:1:1:1103:16929#7/1
@ILxx_1234:1:1:1103:13497#2/2

where the first field is the ID/name of the machine that performed the experiment followed by the run number, the number after the "#" is the sample ID (if there are multiple samples), and the number after the "/" is the pair info for paired-end experiments (so it's either 1 or 2). If the program really wants a machine name, I guess you could just make up a phony one for the first field (ILmymachine_0001 or something more clever). And since you have only a single sample, if the program really wants an index, I guess you could just add "#1" after the y-coordinate, removing the ":N" part (I'm not sure what it signifies). For the pair info, my guess is that you can just leave it out (i.e. simply skip the "/1" part) and the program will treat the data as single-end.
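To make that layout concrete, here is a minimal sketch of a checker for ID lines in the format shown above. The group names lane, tile and x are my own labels for the colon-separated numbers before the y-coordinate (the usual Illumina reading, but an assumption here), and `parse_read_id` is a made-up helper name, not anything CASAVA provides.

```python
import re

# Pattern for IDs like "@ILxx_1234:1:1:1103:6172#1/1".
# Assumption: the four numbers after the machine/run field are
# lane, tile, x and y; the "/<pair>" suffix is optional.
READ_ID = re.compile(
    r"@(?P<machine>[^:]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"#(?P<index>\w+)"
    r"(?:/(?P<pair>[12]))?"
)

def parse_read_id(line):
    """Return the ID fields as a dict, or None if the line does not match."""
    match = READ_ID.fullmatch(line.strip())
    return match.groupdict() if match else None
```

Running this over the first line of every record in the file would show quickly whether the IDs already match, and if not, which part is missing.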

(NOTE: I don't know anything about CASAVA - as I understand things it is Illumina's own program that can do a bunch of stuff. It's not inconceivable that CASAVA itself can generate the correct ID lines from lower level files - but again I don't know much about the pre-fastq pipeline.)

If you know a little Perl or Python scripting, you should be able to make those changes to the ID lines so that CASAVA accepts them. However, this is just a quick-and-dirty practical fix; I don't know the underlying reason why your read ID lines look the way they do (maybe whoever generated the files does).
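A rough Python sketch of such a script follows. It assumes the existing ID lines look like "@<lane>:<tile>:<x>:<y>:N", i.e. the expected format minus the machine field and with a trailing ":N"; that input layout is a guess, so adjust the parsing to whatever the file really contains. It also assumes plain four-line FASTQ records (no wrapped sequence lines). The function names and the default machine name are placeholders.

```python
def reformat_header(header, machine="ILmymachine_0001", index="1"):
    """Rewrite one ID line into '@<machine>:<lane>:<tile>:<x>:<y>#<index>'.

    Assumed input: '@<lane>:<tile>:<x>:<y>' plus optional trailing
    fields such as ':N', which are dropped. No '/<pair>' suffix is
    added, since the data is single-end.
    """
    header = header.strip()
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ ID line: {header!r}")
    fields = header[1:].split(":")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields: {header!r}")
    lane, tile, x, y = fields[:4]
    return f"@{machine}:{lane}:{tile}:{x}:{y}#{index}"

def reformat_fastq(lines, **kwargs):
    """Yield FASTQ lines with every 4th line (the ID line) rewritten.

    Sequence, '+' and quality lines pass through unchanged.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield reformat_header(line, **kwargs) + "\n"
        else:
            yield line
```

To use it as a filter, something like `sys.stdout.writelines(reformat_fastq(sys.stdin))` (after `import sys`) should do, with the input file redirected to stdin and the output redirected to a new file. If the "+" lines in the file repeat the read ID, they would need the same treatment.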

**Airwalker810** · 01-12-2011, 09:20 AM

Thanks for the help, that should make things a bit easier with a little scripting. Yeah, I'm not sure what the deal with this data is. As I said, it was outsourced, and it came back looking like this mess. No idea why specific fields are missing from the ID lines. My lab just procured a DeepSeq machine, and I'm trying to force the data through that pipeline so that everything from the past and the future works on the same analysis program.

Help with FastQ/CASAVA format problems
