SEQanswers

11-23-2011, 03:49 PM   #1
Heisman
Senior Member

Location: St. Louis

Join Date: Dec 2010
Posts: 535
Stop automatic demultiplexing on MiSeq

Hey guys,

Our lab does not have any sequencers, but we have access to a sequencing core. They recently acquired a couple of MiSeqs, and so far they are not able to give out the index read information (the machine demultiplexes the runs). As we don't have a MiSeq, I have no way of knowing exactly how it, or more importantly its data processing, works, and I have no idea whether there is a workaround. I know they plan to start addressing this next week, but I'm curious whether anybody here has any experience with this. I want the index reads along with the sequencing reads, not just the sequencing reads split into different files.
11-23-2011, 04:19 PM   #2
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Not sure why your core couldn't give you the index reads as well... I can give you some snippets of the FASTQs later, but they follow the Illumina conventions as far as I can tell (see the Wikipedia page on FASTQ).

We demultiplex after the fact on our own; the secondary analysis on the MiSeq is pretty broken for us currently. I'll see if I can dig up a script if you need one.
11-24-2011, 08:43 AM   #3
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

R1.fastq:
Code:
@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
sequencehere
+
+114=??0)):??@@@/,6;;=00&55-&)&&0&)&00(+((+8&&)(((+3(((+(+55007&))0(((++()&)&)&)&)&&&&(+((((((((4+((((+4+((+((((++4:((,((+(((+((((++(((+((&+((+((((+((+
R2.fastq:
Code:
@M00182:8:000000000-A0833:1:12:15161:29056 2:N:0:1
sequencehere
+
+8+==22+=@?;+A+<+CA4A+3<A?<<<BCCB@E3)11:?DFGHICHFDHIHGHGEH@FGHIHIEGGGHFEGBC?CDCBB9;ACC@A>CBBDDCDDEEACDDBDDDDDDD@CCDDDDDDDDCC>A@<BCC(9A@AACCDDDDDCC4>A@A
I1.fastq:
Code:
@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
CTTGTA
+
?@@DDD
Here are snippets of the two read files and the index read file. This is NOT using CASAVA but the MiSeq Reporter to generate the FASTQs (which are not demultiplexed).

http://en.wikipedia.org/wiki/FASTQ_format
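The headers above follow the Illumina 1.8+ layout described on that page: `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`. Because the R1, R2 and I1 records of one cluster share everything before the space, that part can be used to check that the three files stay in step. A minimal Python sketch (the names `ReadHeader` and `parse_header` are illustrative, not from any Illumina tool). Note that the last field in these snippets is `1`, which looks like a sample number rather than an index sequence, so the sketch just keeps it as text.

```python
from typing import NamedTuple


class ReadHeader(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int      # read number; the I1 snippet above also reports 1
    filtered: str  # 'Y' if the read was filtered out, 'N' otherwise
    control: int
    index: str     # index sequence or sample number, kept as text

    @property
    def cluster_id(self) -> str:
        """Part of the header shared by the R1, R2 and I1 records of one cluster."""
        return ":".join(str(v) for v in self[:7])


def parse_header(line: str) -> ReadHeader:
    """Split an '@name comment' FASTQ header into its Illumina 1.8+ fields."""
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header: {line!r}")
    name, comment = line[1:].rstrip("\n").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, filtered, control, index = comment.split(":")
    return ReadHeader(instrument, int(run), flowcell, int(lane), int(tile),
                      int(x), int(y), int(read), filtered, int(control), index)
```

For example, `parse_header` on the R1 and R2 headers above gives the same `cluster_id` but `read` values of 1 and 2.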
11-24-2011, 09:34 AM   #4
Heisman
Senior Member

Location: St. Louis

Join Date: Dec 2010
Posts: 535

Thanks, Eric, I will pass this information along. Perhaps they are using CASAVA and not the MiSeq Reporter. They only recently acquired the MiSeqs, so things are surely still being figured out.

Last edited by Heisman; 11-24-2011 at 10:09 AM.
02-07-2012, 04:47 PM   #5
NextGenSeb
Member

Location: Melbourne

Join Date: Jan 2012
Posts: 15

Quote:
Originally Posted by ECO
We demultiplex after the fact on our own; the secondary analysis on the MiSeq is pretty broken for us currently. I'll see if I can dig up a script if you need one.
Hi ECO,
It would be great if you could share the script you use to demultiplex the MiSeq FASTQs. We just got a MiSeq and I am not keen to use Illumina-provided software for any of the analysis downstream of FASTQ generation (bad experience with the GAIIx). Therefore I'll have to demultiplex myself, but as far as I know, scripts like the one in the fastx toolkit don't work.
Any help would be greatly appreciated.
Cheers
Seb
02-07-2012, 05:23 PM   #6
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Seb,

It's an internal script at my company written by a colleague; I'll see how he feels about sharing it. It's modularized inside a bunch of code that interfaces with our LIMS and instruments, so unless you're pretty competent at Python it will be difficult to use in its current form. I'll post back if I can share it.

I am still dumbfounded that this is a problem on Illumina machines after what... almost 7 years? I still can't access live run metrics except through some proprietary binary format... so many simple things have to be reinvented at every customer site.

</rant>
02-07-2012, 08:11 PM   #7
Heisman
Senior Member

Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by NextGenSeb
Hi ECO,
It would be great if you could share the script you use to demultiplex the MiSeq FASTQs. We just got a MiSeq and I am not keen to use Illumina-provided software for any of the analysis downstream of FASTQ generation (bad experience with the GAIIx). Therefore I'll have to demultiplex myself, but as far as I know, scripts like the one in the fastx toolkit don't work.
Any help would be greatly appreciated.
Cheers
Seb
Ironic, since I started this thread, but tell me what format you need the reads/indexes in and I can almost certainly give you a series of Linux commands that will convert the MiSeq output into that format.
02-08-2012, 03:14 PM   #8
NextGenSeb
Member

Location: Melbourne

Join Date: Jan 2012
Posts: 15

Thanks for the replies, guys.

@ECO:
My Python is not brilliant, but I am happy to work on that. I'd be interested in the LIMS integration in general anyway, as we use the same system and hope to tie it into our workflow, so any tips in that regard are highly appreciated as well. However, I understand the complications of internal politics, so please don't step on any toes.

As for Illumina, I have the feeling that they made things even more complicated going from the GAIIx to the MiSeq. I just complained to their tech specialist that the only halfway convenient way to view run quality data in real time (or after the run, for that matter) is now their Windows (!!) based SAV (Sequencing Analysis Viewer). Moreover, there is no way to generate an automated run quality report. Definitely room for improvement there.

@Heisman:
All I want is to split the FASTQ file that the MiSeq spits out into separate FASTQ files by index. After that I can take them through my pipeline. Rumour has it that the new MiSeq Reporter software version can provide FASTQ files split by index, although I'd rather not rely on that... So a series of commands would take me a long way.

Cheers
Seb
02-08-2012, 03:43 PM   #9
Heisman
Senior Member

Location: St. Louis

Join Date: Dec 2010
Posts: 535

Do you have a script that does this for data in some other format? If so, it would be easier to convert the MiSeq output to that format. If not, I have an idea that will work and can be put into a bash script: paste the read 1, read 2, and index reads together, along with their qualities, put a unique character next to the index, then grep each index out of the combined file and split the reads back into separate files. It would not take long to write, but it wouldn't have the features that nicer scripts have (i.e., allowing 1 bp mismatches).
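The paste-and-grep plan amounts to walking R1, R2 and I1 in lockstep and binning each read pair by its index. A minimal Python sketch of that, with exact matching only (the same limitation mentioned above); it assumes 4-line records, the three files in the same cluster order, and a barcode-to-sample dict you supply. The function names are made up for illustration.

```python
import io
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def split_by_index(r1: TextIO, r2: TextIO, i1: TextIO,
                   barcodes: Dict[str, str]) -> Dict[str, List[Tuple[Record, Record]]]:
    """Bin (R1, R2) pairs by the exact I1 sequence; unknown indexes go to 'undetermined'."""
    bins: Dict[str, List[Tuple[Record, Record]]] = defaultdict(list)
    for rec1, rec2, idx in zip_longest(fastq_records(r1), fastq_records(r2),
                                       fastq_records(i1)):
        if rec1 is None or rec2 is None or idx is None:
            raise ValueError("R1, R2 and I1 contain different numbers of records")
        # Same cluster order in all three files: the IDs before the space must agree.
        if not (rec1[0].split()[0] == rec2[0].split()[0] == idx[0].split()[0]):
            raise ValueError(f"files out of sync at {rec1[0]!r}")
        bins[barcodes.get(idx[1], "undetermined")].append((rec1, rec2))
    return dict(bins)


if __name__ == "__main__":
    r1 = io.StringIO("@a 1:N:0:1\nACGT\n+\nIIII\n@b 1:N:0:1\nTTTT\n+\nIIII\n")
    r2 = io.StringIO("@a 2:N:0:1\nGGGG\n+\nIIII\n@b 2:N:0:1\nCCCC\n+\nIIII\n")
    i1 = io.StringIO("@a 1:N:0:1\nCTTGTA\n+\n?@@DDD\n@b 1:N:0:1\nAAAAAA\n+\n?@@DDD\n")
    for sample, pairs in split_by_index(r1, r2, i1, {"CTTGTA": "sample1"}).items():
        print(sample, len(pairs))
```

For a real run you would write each bin to its own `<sample>_R1.fastq`/`<sample>_R2.fastq` pair as you go rather than holding everything in memory.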
02-12-2012, 08:34 PM   #10
NextGenSeb
Member

Location: Melbourne

Join Date: Jan 2012
Posts: 15

No, I don't have any script yet, and I think FASTQ is generally a good place to start. I also think handling base-calling errors is important, especially as the number of multiplexed samples rises... However, appending the index to the front of the read might be feasible, as one could then use existing scripts like the one in the fastx toolkit... Do you have something along these lines already?
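Base-calling errors in the index read are usually handled by allowing a small number of mismatches against the known barcodes and refusing reads that are ambiguous. A Python sketch of that rule, assuming all barcodes have the same length and treating N as a mismatch; it is not taken from the fastx toolkit or CASAVA, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ (N counts as a mismatch)."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {a!r} vs {b!r}")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Iterable[str]) -> int:
    """Smallest Hamming distance between any two barcodes (needs at least two)."""
    return min(hamming(a, b) for a, b in combinations(list(barcodes), 2))


def assign_barcode(index_read: str, barcodes: Dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Sample whose barcode is uniquely closest to index_read, or None when no
    barcode is within max_mismatches or the two closest barcodes tie."""
    if not barcodes:
        return None
    scored = sorted((hamming(index_read, bc), sample) for bc, sample in barcodes.items())
    best_dist, best_sample = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: equally close to two barcodes
    return best_sample
```

With k allowed mismatches, barcodes that are at least 2k+1 apart can never produce an ambiguous hit, so checking `min_pairwise_distance(barcodes) >= 3` is a quick sanity test before allowing one mismatch.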
02-13-2012, 05:51 AM   #11
Heisman
Senior Member

Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by NextGenSeb
No, I don't have any script yet, and I think FASTQ is generally a good place to start. I also think handling base-calling errors is important, especially as the number of multiplexed samples rises... However, appending the index to the front of the read might be feasible, as one could then use existing scripts like the one in the fastx toolkit... Do you have something along these lines already?
Try something like this (I'm sure expert Linux users have a better way, but this will work):

sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

paste [read_1] [indexes_changed_1] | sed 's,[tab_key_here (press "ctrl + v", then tab)],,' > read_1_done

Now you're good to go, with the index at the end of the read. Then tell the barcode splitter that the barcode is at the end of the read (and enter the complement/reverse-complement barcodes if that's what it needs to work... I'm not sure what is needed).
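The end result those sed/paste lines aim for (the index sequence tacked onto the read sequence and the index qualities onto the quality string, so each record stays valid FASTQ with equal-length sequence and quality) can also be written in Python, which avoids the tab and special-character juggling. A sketch, assuming 4-line records in the same cluster order in both files; the function names are hypothetical.

```python
import io
from typing import Iterator, List, TextIO


def _next_record(handle: TextIO) -> List[str]:
    """Read the next 4 lines of a FASTQ file ([] at end of file)."""
    lines = [handle.readline() for _ in range(4)]
    if not lines[0]:
        return []
    return [line.rstrip("\n") for line in lines]


def append_index(reads: TextIO, index: TextIO) -> Iterator[str]:
    """Yield the FASTQ lines of `reads` with each cluster's index sequence appended
    to the read sequence and its index qualities appended to the quality string."""
    while True:
        rec, idx = _next_record(reads), _next_record(index)
        if not rec and not idx:
            return
        if not rec or not idx:
            raise ValueError("read and index files have different numbers of records")
        header, seq, plus, qual = rec
        _, index_seq, _, index_qual = idx
        yield header
        yield seq + index_seq
        yield plus
        yield qual + index_qual


if __name__ == "__main__":
    reads = io.StringIO("@a 1:N:0:1\nACGT\n+\nIIII\n")
    index = io.StringIO("@a 1:N:0:1\nCTTGTA\n+\n?@@DDD\n")
    print("\n".join(append_index(reads, index)))
```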
03-15-2012, 07:25 AM   #12
smallcompany
Junior Member

Location: essex

Join Date: Oct 2011
Posts: 1

We have a very similar problem! We are getting to grips with a new MiSeq, and the data are output as FASTQ. We don't really have any way of combining the Read 1, Read 2 and Index read FASTQ files for complete analysis. The idea of pasting the index read to the end of R1 or R2 would be useful. I have very little experience with Linux; what does the first line of the script actually do?

sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

I've tried it with some sample reads, but got an error about not finding 'indexes'.

Is Galaxy any good for demultiplexing based on the indexes, or are there better scripts to use?
03-15-2012, 07:35 AM   #13
Heisman
Senior Member

Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by smallcompany
We have a very similar problem! We are getting to grips with a new MiSeq, and the data are output as FASTQ. We don't really have any way of combining the Read 1, Read 2 and Index read FASTQ files for complete analysis. The idea of pasting the index read to the end of R1 or R2 would be useful. I have very little experience with Linux; what does the first line of the script actually do?

sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

I've tried it with some sample reads, but got an error about not finding 'indexes'.

Is Galaxy any good for demultiplexing based on the indexes, or are there better scripts to use?
Dammit, "indexes" should not be there at all. Also, the "^" needs to go at the start of each line rather than the end, so that the index lines up with the sequence line instead of the header when you paste. Try:
Code:
sed -n '2,${p;n;}' [index_read_1] | sed 's,^,^,' | tr '^' '\n' > indexes_changed_1
What that does is: with the [index_read_1] input file, it first prints every other line, starting with the second line (i.e., the sequence and quality lines of each record). Then it adds a "^" character to the start of each line. That looks confusing, since "^" also means "start of line", but it works: in the pattern half of the s command "^" is the start-of-line anchor, while in the replacement half it is just a literal "^". Then the "tr" command replaces each "^" with a newline character, so every index sequence and quality string is preceded by an empty line and lines up with the sequence and quality lines of the read file when the two files are pasted together. Don't swap the "^" for ")": ")" is itself a valid quality character and shows up in quality strings, whereas "^" lies outside the range of characters Illumina 1.8+ uses for qualities.
04-05-2012, 11:50 AM   #14
seb567
Senior Member

Location: Québec, Canada

Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by Heisman
Hey guys,

Our lab does not have any sequencers, but we have access to a sequencing core. They recently acquired a couple of MiSeqs, and so far they are not able to give out the index read information (the machine demultiplexes the runs). As we don't have a MiSeq, I have no way of knowing exactly how it, or more importantly its data processing, works, and I have no idea whether there is a workaround. I know they plan to start addressing this next week, but I'm curious whether anybody here has any experience with this. I want the index reads along with the sequencing reads, not just the sequencing reads split into different files.
Hello,

If all the multiplexed data belong to your group, you can ask them for the whole run folder, which includes the .cif (intensity) files, .bcl (base call) files and .clocs (cluster location) files.

From these, you can generate non-demultiplexed FASTQ files with CASAVA and then demultiplex the files later.
You can also do the base calling with All-Your-Base.




For example, this will convert the run folder's base calls to FASTQ files (for dual indexes):

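Code:
# Site-specific settings: run-folder root, run name, and number of parallel make jobs
sequenceWorld=/rap/nne-790-ab/Instruments/Illumina_HiSeq_1000_Hellbound
run=111207_SNL131_0065_AC0947ACXX
NSLOTS=8

# Y* for all four reads (read 1, index 1, index 2, read 2): the index reads are
# written out as ordinary FASTQ reads instead of being used to split samples
configureBclToFastq.pl \
--input-dir $sequenceWorld/$run/Data/Intensities/BaseCalls \
--output-dir  $sequenceWorld/$run/Fastq-Sequences \
--use-bases-mask Y*,Y*,Y*,Y*

cd $sequenceWorld/$run/Fastq-Sequences

# Run the Makefile generated by configureBclToFastq.pl
make -j $NSLOTS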
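The `--use-bases-mask Y*,Y*,Y*,Y*` above is what keeps the index reads: every read is marked `Y`, so nothing is treated as an index. If you want to derive the mask from the run folder instead of typing it, here is a Python sketch. It assumes RunInfo.xml lists `<Read Number=... NumCycles=... IsIndexedRead=...>` elements; the `bases_mask` name is hypothetical, and the `demultiplex=True` variant (index reads as `I<cycles>`) is only an illustration, so check any mask against the CASAVA documentation for your version.

```python
import xml.etree.ElementTree as ET

# Illustrative RunInfo.xml fragment; cycle counts are made up.
SAMPLE_RUNINFO = """<RunInfo>
  <Run Id="example" Number="1">
    <Reads>
      <Read Number="1" NumCycles="101" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="4" NumCycles="101" IsIndexedRead="N" />
    </Reads>
  </Run>
</RunInfo>"""


def bases_mask(runinfo_xml: str, demultiplex: bool = False) -> str:
    """Build a --use-bases-mask value from the <Read> elements of RunInfo.xml.

    demultiplex=False: every read, index reads included, becomes 'Y*'.
    demultiplex=True: index reads become 'I<NumCycles>' instead.
    """
    root = ET.fromstring(runinfo_xml)
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("Number")))
    parts = []
    for read in reads:
        if demultiplex and read.get("IsIndexedRead") == "Y":
            parts.append(f"I{read.get('NumCycles')}")
        else:
            parts.append("Y*")
    return ",".join(parts)


if __name__ == "__main__":
    print(bases_mask(SAMPLE_RUNINFO))
```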



Then you can demultiplex the sequences.
We use FastDemultiplexer, which allows more mismatches than CASAVA 1.8.2.


Code:
FastDemultiplexer.py ../../SampleSheet-Nextera.csv \
Project_redacted/Sample_lane1 Demultiplexed > stat.txt

Sébastien Boisvert
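Whatever tool does the splitting needs the index-to-sample table from the sample sheet. A Python sketch for a CASAVA 1.8 style sheet (columns FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject, with dual indexes written as `INDEX1-INDEX2`); whether FastDemultiplexer expects exactly this layout is an assumption, and the function name is hypothetical.

```python
import csv
import io
from typing import Dict, Tuple


def barcodes_by_lane(sample_sheet: str) -> Dict[Tuple[str, str], str]:
    """Map (lane, index) -> SampleID from a CASAVA 1.8 style sample sheet."""
    mapping: Dict[Tuple[str, str], str] = {}
    for row in csv.DictReader(io.StringIO(sample_sheet)):
        # Normalise case so 'acgt' and 'ACGT' count as the same index
        key = (row["Lane"].strip(), row["Index"].strip().upper())
        if key in mapping:
            raise ValueError(f"index {key[1]} used twice in lane {key[0]}")
        mapping[key] = row["SampleID"].strip()
    return mapping
```

The resulting dict can feed whatever matching step you use, exact or mismatch-tolerant.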
05-12-2014, 09:40 AM   #15
creeves
Member

Location: East Bay

Join Date: Jul 2012
Posts: 26
Offline processing of MiSeq data

We have implemented the software to demultiplex and convert .bcl to FASTQ offline, which is necessary when the number of samples gets too high. Does anyone know how to toggle a MiSeq between doing all the processing during a typical run (a few small genomes) and generating only .bcl files during a highly multiplexed run? Beyond a certain number of samples, the MiSeq chokes after the index reads, and it takes hours after the run is complete before the FASTQ files show up in the run folder in MiSeqOutput. Often it fails to even transfer all the FASTQ files into the run folder in MiSeqOutput. Has anyone else encountered this problem?
05-12-2014, 11:10 AM   #16
pmiguel
Senior Member

Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by creeves
We have implemented the software to demultiplex and convert .bcl to FASTQ offline, which is necessary when the number of samples gets too high. Does anyone know how to toggle a MiSeq between doing all the processing during a typical run (a few small genomes) and generating only .bcl files during a highly multiplexed run? Beyond a certain number of samples, the MiSeq chokes after the index reads, and it takes hours after the run is complete before the FASTQ files show up in the run folder in MiSeqOutput. Often it fails to even transfer all the FASTQ files into the run folder in MiSeqOutput. Has anyone else encountered this problem?
How many indexes before this becomes an issue?

If you changed your sample sheet to list only some of the index pairs, maybe that would speed it up. (You would need the full sample sheet wherever you were running CASAVA off-site.)
--
Phillip
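Trimming the sheet the instrument sees, while the full sheet goes to the offline CASAVA run, is easy to script if you want to try this. A Python sketch, assuming the MiSeq sample sheet layout with a `[Data]` section whose first row holds column names including `Sample_ID`; the function name is hypothetical, and whether a shorter sheet actually speeds up the instrument is the open question above, not something this code checks.

```python
import csv
from typing import Iterable, List, Optional


def subset_sample_sheet(text: str, keep_ids: Iterable[str]) -> str:
    """Return a copy of a MiSeq-style sample sheet whose [Data] section keeps
    only the rows whose Sample_ID is in keep_ids; other sections are unchanged."""
    keep = set(keep_ids)
    out: List[str] = []
    in_data = False
    columns: Optional[List[str]] = None
    for line in text.splitlines():
        if line.startswith("["):
            # Section header, e.g. "[Data]" (spreadsheets often add trailing commas)
            in_data = line.strip().rstrip(",").lower() == "[data]"
            columns = None
        elif in_data and line.strip():
            fields = next(csv.reader([line]))
            if columns is None:
                columns = fields  # first row of [Data] holds the column names
            elif dict(zip(columns, fields)).get("Sample_ID") not in keep:
                continue  # drop samples we are not keeping
        out.append(line)
    return "\n".join(out) + "\n"
```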
05-17-2014, 01:19 PM   #17
rnaeye
Member

Location: East Coast

Join Date: May 2011
Posts: 79

Hi!
If you are fine getting the index reads as separate FASTQ files, MiSeq Reporter has an option to turn that on. With the relevant flag set, it writes FASTQ files for the index reads as well as for the reads. It's described in the MiSeq Reporter User Guide. They just need to add the following line to the appSettings section of the .exe.config file:

<add key="CreateFastqForIndexReads" value="1" />

Details are in the PDF manual.

There might also be an option that lets you stop demultiplexing on the MiSeq. Call Illumina; they are very helpful. The only way I know of is .bcl to FASTQ conversion using a script, which others have already commented on.
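The change above is a one-line edit in a text editor, but a core managing several instruments may prefer to script it. A Python sketch, assuming the standard .NET layout (`<configuration><appSettings><add key=... value=.../>`); the location of the MiSeq Reporter .exe.config file is left to the user guide, and the function name is hypothetical. Work on a backed-up copy, and note that ElementTree does not write back an XML declaration.

```python
import xml.etree.ElementTree as ET


def set_app_setting(config_xml: str, key: str, value: str) -> str:
    """Add or update <add key=... value=.../> under <appSettings> in a .NET
    .exe.config document and return the modified XML text."""
    # Keep XML comments, which ElementTree would otherwise drop (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == key:
            entry.set("value", value)  # key already present: update in place
            break
    else:
        ET.SubElement(settings, "add", key=key, value=value)
    return ET.tostring(root, encoding="unicode")
```

Usage: `set_app_setting(text, "CreateFastqForIndexReads", "1")`, then write the returned text back to the file.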
05-18-2014, 11:09 PM   #18
Vinz
Member

Location: Germany

Join Date: Dec 2010
Posts: 80

Quote:
Originally Posted by creeves
We have implemented the software to demultiplex and convert .bcl to FASTQ offline, which is necessary when the number of samples gets too high. Does anyone know how to toggle a MiSeq between doing all the processing during a typical run (a few small genomes) and generating only .bcl files during a highly multiplexed run? Beyond a certain number of samples, the MiSeq chokes after the index reads, and it takes hours after the run is complete before the FASTQ files show up in the run folder in MiSeqOutput. Often it fails to even transfer all the FASTQ files into the run folder in MiSeqOutput. Has anyone else encountered this problem?
There was a timeout in the MiSeq software for copying the FASTQ files to the output folder. For V3 runs with many samples, copying stopped once this time was exceeded. I think this was fixed in 2.4.1. Copying still fails sometimes, but most of the time it works in our hands (~1000 to 1500 samples/run).
07-06-2014, 06:28 PM   #19
drdna
Member

Location: Kentucky

Join Date: May 2012
Posts: 76

I don't see why anyone would want to do their own demultiplexing. The demultiplexed data that come off BaseSpace are perfectly good. What are you hoping to accomplish by doing it yourself - salvage the few tens of thousands of reads that can't be recognized?
07-07-2014, 03:15 AM   #20
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Quote:
Originally Posted by drdna
I don't see why anyone would want to do their own demultiplexing. The demultiplexed data that come off BaseSpace are perfectly good. What are you hoping to accomplish by doing it yourself - salvage the few tens of thousands of reads that can't be recognized?
There are regulatory requirements that prohibit the use of BaseSpace at some institutions (especially with human samples). When several sequencers are involved, having a common data pipeline is convenient.
All times are GMT -8.