**kentk** · 02-16-2012, 05:49 AM

Yes, I've done it in Python. Basically, the I1, I2, R1 and R2 fastq.gz files are related to each other positionally, line by line. That is, the first line of I1 corresponds to the same cluster as the first line of I2, R1 and R2.
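A minimal sketch of that positional correspondence, assuming four gzipped FASTQ files with four lines per record. The function names `fastq_records` and `clusters` are made up for illustration, and the sync check assumes the cluster ID is the header text before the first space:

```python
import gzip
from itertools import islice


def fastq_records(path):
    """Yield (header, sequence, plus, quality) tuples from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            yield tuple(record)


def clusters(i1_path, i2_path, r1_path, r2_path):
    """Walk the four files in lockstep; each step yields one cluster."""
    streams = [fastq_records(p) for p in (i1_path, i2_path, r1_path, r2_path)]
    for i1, i2, r1, r2 in zip(*streams):
        # Assumption: the cluster ID is the header text before the first space.
        ids = {rec[0].split(" ")[0] for rec in (i1, i2, r1, r2)}
        if len(ids) != 1:
            raise ValueError(f"files out of sync: {sorted(ids)}")
        yield i1, i2, r1, r2
```

Each item yielded by `clusters(...)` holds the two index reads and the two sequence reads for the same cluster, so the index reads can be used to decide where R1 and R2 should go.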

What I did was parse the fastq file read by read with a regexp and then write each read into its own demultiplexed fastq. The last part of the read header looks something like this:

1:N:0:1
The first number is the read/index number (1 or 2).
The last number is the classification, following the order you provided in your run sample sheet.
In this case, this means that this is read 1 and it comes from sample 1.

Sometimes you get something like this:
1:N:0:0
This means that CASAVA was not able to classify it because the raw read of one of the indexes is too vague, degenerate, or full of useless N's to be binned.

So if you read each header of the raw multiplexed fastq, you can classify each read and write it into separate files.
Hope this helps.
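A rough sketch of the header-based approach described above, not the original script. It assumes a gzipped FASTQ with four lines per record and headers ending in `read:filter:control:sample` as in the examples. The function name `demultiplex` and the output naming scheme are made up for illustration:

```python
import gzip
import re

# Matches the tail of a header such as "... 1:N:0:1":
# read number, filter flag, control number, sample number.
HEADER_TAIL = re.compile(r"\s(\d):([YN]):(\d+):(\d+)$")


def demultiplex(fastq_gz, out_prefix):
    """Split one multiplexed FASTQ into one file per (sample, read) pair.

    Returns a dict mapping (sample_number, read_number) to record counts.
    Sample number "0" collects the reads that could not be classified.
    """
    outputs = {}
    counts = {}
    try:
        with gzip.open(fastq_gz, "rt") as handle:
            while True:
                record = [handle.readline() for _ in range(4)]
                if not record[0]:
                    break  # end of file
                match = HEADER_TAIL.search(record[0].rstrip())
                if match is None:
                    raise ValueError(f"unexpected header: {record[0]!r}")
                read_no, sample_no = match.group(1), match.group(4)
                key = (sample_no, read_no)
                if key not in outputs:
                    # Hypothetical naming scheme: <prefix>_S<sample>_R<read>.fastq.gz
                    name = f"{out_prefix}_S{sample_no}_R{read_no}.fastq.gz"
                    outputs[key] = gzip.open(name, "wt")
                outputs[key].writelines(record)
                counts[key] = counts.get(key, 0) + 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

Keeping one open file handle per sample is fine for a handful of samples; a run with many samples may need a different strategy.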

**zherbert** · 02-16-2012, 08:26 AM

Another low-tech way to demultiplex is to point each indexed sample to a different Genome Folder on the MiSeq sample sheet and run MiSeq Reporter. This will trick MSR into demultiplexing for you.

**allo** · 02-16-2012, 08:47 AM

Solved!

Hi KentK and Zherbert,
Thank you very much for your replies. After a few emails with Illumina's customer support I got CASAVA to run. It turns out that CASAVA is very picky about the project and sample names on the sample sheet, and you cannot have any of these characters: ? ( ) [ ] / \ = +. < > : ; " ' , * ^ | &
So, once I had a bona fide CASAVA-style sample sheet, the program produced the expected demultiplexed individual fastq files. The sample sheet columns are:
FCID,Lane,Sample_ID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
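A small Python sketch for checking a sample sheet before handing it to CASAVA, assuming the column names shown above and that the project and sample names live in the `SampleProject` and `Sample_ID` columns. The function name `bad_names` is made up, and treating "." as forbidden is an assumption based on the "+." in the character list:

```python
import csv
import io

# Characters taken from the list in the post. Including "." is an
# assumption drawn from the "+." in that list.
FORBIDDEN = set('?()[]/\\=+.<>:;"\',*^|&')


def bad_names(sample_sheet_text):
    """Return (row_number, column, value) for each name with a forbidden character."""
    problems = []
    reader = csv.DictReader(io.StringIO(sample_sheet_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in ("Sample_ID", "SampleProject"):
            value = row.get(column) or ""
            if FORBIDDEN & set(value):
                problems.append((row_number, column, value))
    return problems
```

An empty list means no listed character was found in those two columns; it does not guarantee that CASAVA will accept the sheet.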
The flow cell ID can be obtained from the SAV. It is the number on top of every SAV graph and has the following format: 000000000-A0???
The advantage of using CASAVA is that I can control the mismatch policy. So I can require perfect index matches, or allow a single mismatch in either or both indexes.
Thanks!
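To illustrate what a mismatch policy means for a dual-index run, here is a toy Python sketch. It is not CASAVA's actual algorithm, only a Hamming-distance comparison under the assumption that every sample has one expected sequence per index. The names `mismatches`, `assign_sample` and `max_mismatches` are made up:

```python
def mismatches(observed, expected):
    """Count positions where two equal-length index sequences differ."""
    if len(observed) != len(expected):
        raise ValueError("index sequences must have the same length")
    return sum(a != b for a, b in zip(observed, expected))


def assign_sample(index1, index2, barcodes, max_mismatches=1):
    """Return the sample whose two indexes both match within the limit.

    barcodes maps sample name -> (expected_index1, expected_index2).
    Returns None when no sample, or more than one sample, qualifies.
    An N in the observed read simply counts as a mismatch here.
    """
    hits = [
        sample
        for sample, (expected1, expected2) in barcodes.items()
        if mismatches(index1, expected1) <= max_mismatches
        and mismatches(index2, expected2) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None
```

With `max_mismatches=0` only perfect index pairs are assigned; with `max_mismatches=1` each index may differ from its expected sequence at one position.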

**kentk** · 02-26-2012, 09:44 PM

Originally posted by zherbert:

Another low-tech way to demultiplex is to point each indexed sample to a different Genome Folder on the MiSeq sample sheet and run MiSeq Reporter. This will trick MSR into demultiplexing for you.

What do you mean by "point each indexed sample to a different folder"? Do you do this in IEM when creating a sample sheet before the run?

**zherbert** · 02-27-2012, 04:04 AM

Originally posted by kentk:

What do you mean by "point each indexed sample to a different folder"? Do you do this in IEM when creating a sample sheet before the run?

Yes, you can do this in IEM, but you can also edit the sample sheet later and rerun MSR. One way to set this up is by creating a few subdirectories in the Genomes location. I tested this by putting 12 copies of phiX in subdirectories named 1-12 within a Demultiplex directory:

Path/To/Genomes/Demultiplex/1
Path/To/Genomes/Demultiplex/2
Path/To/Genomes/Demultiplex/3

I would only recommend doing this for low-plexity runs. The output gets a bit messy (i.e. 8 very similarly named fastq files for each sample, output to the same location in a dual-index run), but it works well enough for small numbers of pooled samples.

Hope this helps.
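The directory layout above could be set up with a few lines of Python. This is only a sketch: the function name `make_demultiplex_genomes` and its arguments are made up, and it assumes a phiX genome folder already exists somewhere on disk to copy from:

```python
import shutil
from pathlib import Path


def make_demultiplex_genomes(genomes_root, phix_source, n_samples):
    """Copy the phiX genome folder into Demultiplex/1 .. Demultiplex/<n_samples>."""
    created = []
    demux_root = Path(genomes_root) / "Demultiplex"
    demux_root.mkdir(parents=True, exist_ok=True)
    for number in range(1, n_samples + 1):
        target = demux_root / str(number)
        if not target.exists():  # leave existing copies untouched
            shutil.copytree(phix_source, target)
        created.append(target)
    return created
```

Each sample on the sample sheet would then be pointed at its own numbered folder.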

**kentk** · 02-27-2012, 08:10 AM

Thanks zherbert. I'll try this out. Still a hack though. I wish MiSeq would just store those sequences instead.

How to Demultiplex a Nextera paired-end MiSeq run