SEQanswers

Old 02-12-2012, 05:04 PM   #1
allo
Member
 
Location: Davis, CA

Join Date: Jul 2009
Posts: 15
Question How to Demultiplex a Nextera paired-end MiSeq run

Has anybody been able to successfully demultiplex a Nextera paired-end MiSeq run?
The current MiSeq Reporter cannot demultiplex and produce individual fastq.gz files for each dual-indexed sample.
So I thought I'd give CASAVA a try, but I keep getting error after error. I faked the sample sheet to look like the examples in the CASAVA user guide, but now I get a “DemultiplexedBustardConfig.xml” error.
Anybody there with some advice for a frustrated Biologist?
Thank You,
Alfredo Lopez
Old 02-16-2012, 04:49 AM   #2
kentk
Member
 
Location: Philippines

Join Date: Dec 2011
Posts: 17
Default

Yes, I've done it with Python. Basically, the I1, I2, R1 and R2 fastq.gz files are related to each other positionally, record by record. That is, the first record of I1 corresponds to the same cluster as the first record in I2, in R1 and in R2.
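The positional relationship described above can be sketched in Python roughly as follows. This is only an illustration, not the script used for the run: the function names are made up, and the sanity check assumes that all four files share the same header text before the first space.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            # A FASTQ record is four lines: header, sequence, '+', quality.
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break
            header, sequence, _plus, quality = record
            yield header, sequence, quality


def iter_clusters(r1_path, r2_path, i1_path, i2_path):
    """Walk the four files in lockstep; the n-th record of each is one cluster.

    Note: zip() stops at the shortest file, so a truncated file is not detected.
    """
    streams = [read_fastq(p) for p in (r1_path, r2_path, i1_path, i2_path)]
    for r1, r2, i1, i2 in zip(*streams):
        # Assumption: the header text before the first space names the cluster.
        names = {record[0].split(" ")[0] for record in (r1, r2, i1, i2)}
        if len(names) != 1:
            raise ValueError(f"files are out of sync at {r1[0]}")
        # Return both reads plus the two raw index sequences.
        yield r1, r2, i1[1], i2[1]
```

Each item produced by iter_clusters holds the R1 record, the R2 record and the two index sequences for one cluster, so a read pair can be binned by whatever rule you choose.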

What I did was use a regexp to parse the fastq file read by read, then write each read into its own demultiplexed fastq. The information you need is in the last part of the read header, which looks something like this:

1:N:0:1
The first number is the read/index number (1 or 2).
The last number is the sample number, following the order of the samples in your run sample sheet.
In this case, it means that this is read 1 and that it comes from sample 1.

Sometimes you get something like this:
1:N:0:0
This means that CASAVA was not able to classify the read, because the raw read of one of the indexes was too ambiguous or degenerate (for example, full of N's) to bin it.

So if you read each header of the raw multiplexed fastq, you can classify each read and write it into separate files.
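A minimal sketch of the header-based approach, assuming headers end in a field such as 1:N:0:1 as shown above. The function names and the output file naming are hypothetical, and the two middle fields are matched but not interpreted.

```python
import gzip
import re
from itertools import islice

# Matches a trailing field such as "1:N:0:1" at the end of a header line.
# Group 1 is the read number, group 4 is the sample number.
HEADER_TAIL = re.compile(r"(\d):([YN]):(\d+):(\d+)$")


def sample_number(header):
    """Return the sample number from a header, or None if it does not match."""
    match = HEADER_TAIL.search(header.strip())
    return int(match.group(4)) if match else None


def demultiplex(fastq_gz, out_prefix):
    """Split one multiplexed FASTQ into one gzipped file per sample number."""
    outputs = {}
    counts = {}
    try:
        with gzip.open(fastq_gz, "rt") as handle:
            while True:
                record = list(islice(handle, 4))  # one four-line FASTQ record
                if len(record) < 4:
                    break
                sample = sample_number(record[0])
                if sample is None:
                    raise ValueError(f"unrecognised header: {record[0]!r}")
                if sample not in outputs:
                    # Sample 0 collects the reads that could not be classified.
                    name = f"{out_prefix}_sample{sample}.fastq.gz"
                    outputs[sample] = gzip.open(name, "wt")
                outputs[sample].writelines(record)
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

Calling demultiplex once on the R1 file and once on the R2 file, with different prefixes, keeps the pairs in the same order within each sample, because both files are split by the same rule.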
Hope this helps.
Old 02-16-2012, 07:26 AM   #3
zherbert
Junior Member
 
Location: boston

Join Date: Dec 2009
Posts: 4
Default

Another low-tech way to demultiplex is to point each indexed sample to a different Genome Folder on the MiSeq sample sheet and run MiSeq Reporter. This will trick MSR into demultiplexing for you.
Old 02-16-2012, 07:47 AM   #4
allo
Member
 
Location: Davis, CA

Join Date: Jul 2009
Posts: 15
Talking Solved!

Hi KentK and Zherbert,
Thank you very much for your replies. After a few emails with Illumina's customer support, I got CASAVA to run. It turns out that CASAVA is very picky about the project and sample names on the sample sheet, which cannot contain any of these characters: ? ( ) [ ] / \ = +. < > : ; " ' , * ^ | &
So, once I had a bona fide CASAVA-style sample sheet, the program produced the expected individual demultiplexed fastq files. The sample sheet columns were:
>FCID,Lane,Sample_ID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
The flow cell ID can be obtained from the SAV. It is the number on top of every SAV graph and has the following format: 000000000-A0???
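For anyone who wants to check names before running CASAVA, here is a rough Python sketch. The forbidden character set is copied from the list in this post (including the period), the column names come from the header line above, and the function name is made up for illustration.

```python
import csv
import io

# Characters copied from the list in this post; not checked against CASAVA itself.
FORBIDDEN = set('?()[]/\\=+.<>:;"\',*^|&')

# Columns that hold the sample and project names in the header shown above.
NAME_COLUMNS = ("Sample_ID", "SampleProject")


def find_bad_names(sample_sheet_text):
    """Return (line number, column, value, offending characters) for each bad name."""
    problems = []
    reader = csv.DictReader(io.StringIO(sample_sheet_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for column in NAME_COLUMNS:
            bad = sorted(set(row[column]) & FORBIDDEN)
            if bad:
                problems.append((line_number, column, row[column], bad))
    return problems
```

An empty list means none of the names contain a listed character. It does not guarantee that CASAVA will accept the sheet for other reasons.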
The advantage of using CASAVA is that I can control the mismatch policy. So I can require perfect index matches, or allow a single mismatch in either or both indexes.
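To illustrate what a mismatch policy means for dual indexes, here is a small Python sketch. It is not how CASAVA does it internally; the function names are hypothetical, and a read is only assigned when exactly one sample is within the allowed number of mismatches on both indexes.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index1, index2, samples, max_mismatches=0):
    """Return the matching sample name, or None if zero or several samples match.

    samples maps a sample name to its (expected index 1, expected index 2).
    max_mismatches is applied to each index separately.
    """
    hits = [
        name
        for name, (expected1, expected2) in samples.items()
        if hamming(index1, expected1) <= max_mismatches
        and hamming(index2, expected2) <= max_mismatches
    ]
    # An ambiguous read (more than one candidate) is left unassigned.
    return hits[0] if len(hits) == 1 else None
```

With max_mismatches=0 only perfect index pairs are kept. With max_mismatches=1 a read with one error in either index, or one in each, is still assigned.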
Thanks!
Old 02-26-2012, 08:44 PM   #5
kentk
Member
 
Location: Philippines

Join Date: Dec 2011
Posts: 17
Default

Quote:
Originally Posted by zherbert
Another low-tech way to demultiplex is to point each indexed sample to a different Genome Folder on the MiSeq sample sheet and run MiSeq Reporter. This will trick MSR into demultiplexing for you.
What do you mean by "point each indexed sample to a different folder"? Do you do this in IEM when creating a sample sheet before the run?
Old 02-27-2012, 03:04 AM   #6
zherbert
Junior Member
 
Location: boston

Join Date: Dec 2009
Posts: 4
Default

Quote:
Originally Posted by kentk
What do you mean by "point each indexed sample to a different folder"? Do you do this in IEM when creating a sample sheet before the run?
Yes, you can do this in IEM, but you can also edit the sample sheet later and rerun MSR. One way to set this up is to create a few subdirectories in the Genomes location. I tested this by putting 12 copies of phiX in subdirectories named 1-12 within a Demultiplex directory:

Path/To/Genomes/Demultiplex/1
Path/To/Genomes/Demultiplex/2
Path/To/Genomes/Demultiplex/3
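The directory layout above could be created with a few lines of Python, for example. This is a sketch only: the function name is made up, and it assumes that copying an existing phiX genome folder wholesale is enough for MSR to accept each copy.

```python
import shutil
from pathlib import Path


def make_genome_copies(reference_dir, demultiplex_dir, n_samples):
    """Copy reference_dir into demultiplex_dir/1 ... demultiplex_dir/n_samples."""
    reference_dir = Path(reference_dir)
    demultiplex_dir = Path(demultiplex_dir)
    demultiplex_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for number in range(1, n_samples + 1):
        target = demultiplex_dir / str(number)
        # Leave an existing copy alone so the function can be rerun safely.
        if not target.exists():
            shutil.copytree(reference_dir, target)
        created.append(target)
    return created
```

Each sample on the sample sheet would then point to a different numbered folder.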

I would only recommend doing this for low-plexity runs. The output gets a bit messy (i.e. 8 very similarly named fastq files for each sample, all output to the same location, in a dual-index run), but it works well enough for small numbers of pooled samples.

Hope this helps.
Old 02-27-2012, 07:10 AM   #7
kentk
Member
 
Location: Philippines

Join Date: Dec 2011
Posts: 17
Default

Thanks zherbert. I'll try this out. Still a hack though. I wish MiSeq would just store those sequences instead.

Tags
casava, demultiplex, miseq

All times are GMT -8.