#1
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 534
Hey guys,
Our lab does not have any sequencers, but we have access to a sequencing core. They recently acquired a couple of MiSeqs, and so far they have not been able to give out the index read information (the machine demultiplexes the runs). As we don't have a MiSeq, I have no way of knowing exactly how it or, more importantly, its data processing works, and I have no idea whether there is a workaround. I know they plan to address this starting next week, but I'm curious whether anybody here has experience with this. I want the index reads along with the sequencing reads, not just the sequencing reads split into different files.

#2
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358
Not sure why your core couldn't give you the index reads as well... I can give you some snippets of the FASTQs later, but they follow the Illumina conventions as far as I can tell (see the Wikipedia page on FASTQ).
We demultiplex after the fact on our own; the secondary analysis on the MiSeq is pretty broken for us currently. I'll see if I can dig up a script if you need one.

#3
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358
R1.fastq:
Code:
@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
sequencehere
+
+114=??0)):??@@@/,6;;=00&55-&)&&0&)&00(+((+8&&)(((+3(((+(+55007&))0(((++()&)&)&)&)&&&&(+((((((((4+((((+4+((+((((++4:((,((+(((+((((++(((+((&+((+((((+((+
Code:
@M00182:8:000000000-A0833:1:12:15161:29056 2:N:0:1
sequencehere
+
+8+==22+=@?;+A+<+CA4A+3<A?<<<BCCB@E3)11:?DFGHICHFDHIHGHGEH@FGHIHIEGGGHFEGBC?CDCBB9;ACC@A>CBBDDCDDEEACDDBDDDDDDD@CCDDDDDDDDCC>A@<BCC(9A@AACCDDDDDCC4>A@A
Code:
@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
CTTGTA
+
?@@DDD
http://en.wikipedia.org/wiki/FASTQ_format
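For anyone unfamiliar with the header lines above, here is a minimal Python sketch that splits one into its fields. It assumes the CASAVA 1.8-style layout documented on the Wikipedia FASTQ page (instrument, run, flowcell, lane, tile, x, y, then read number, filter flag, control bits and index/sample field). The names `IlluminaHeader` and `parse_header` are made up for illustration and are not part of any Illumina tool.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    """Fields of a CASAVA 1.8-style FASTQ header (hypothetical helper type)."""
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    is_filtered: bool
    control_bits: int
    index_field: str


def parse_header(line: str) -> IlluminaHeader:
    """Split '@inst:run:flowcell:lane:tile:x:y read:filtered:control:index'."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines must start with '@'")
    # The part before the space locates the cluster; the part after describes the read.
    location, description = line[1:].strip().split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read, filtered, control, index_field = description.split(":")
    return IlluminaHeader(
        instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y),
        int(read), filtered == "Y", int(control), index_field,
    )


header = parse_header("@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1")
print(header.instrument, header.lane, header.tile, header.read, header.index_field)
```

The part before the space is identical for read 1, read 2 and the index read of the same cluster, which is what makes it possible to match records across the three files.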

#4
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 534
Thanks, Eric, I will pass this information along. Perhaps they are using CASAVA and not the MiSeqReporter. They just acquired the MiSeqs recently so things are for sure still being figured out.
Last edited by Heisman; 11-24-2011 at 10:09 AM. |

#5
Member
Location: Melbourne
Join Date: Jan 2012
Posts: 15
It would be great if you could provide the script you use to demultiplex the MiSeq FASTQ. We just got a MiSeq and I am not keen to use Illumina-provided software for any of the analysis downstream of FASTQ generation (bad experience from the GAIIx). Therefore I'll have to demultiplex myself, but as far as I know scripts like the one in the fastx toolkit don't work. Any help would be greatly appreciated.
Cheers
Seb

#6
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358
Seb,
It's an internal script at my company written by a colleague; I'll see what his feelings are on sharing it. It's modularized inside a bunch of code that interfaces with our LIMS and instruments, so unless you're pretty competent at Python it will be difficult to use in its current form. I'll post back if I can share it.
I am still dumbfounded that this is a problem on Illumina machines after what... almost 7 years? I still can't access live run metrics outside of some proprietary binary format... so many simple things that have to be reinvented at every customer site. </rant>

#7
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 534

#8
Member
Location: Melbourne
Join Date: Jan 2012
Posts: 15
Thanks for the replies, guys.
@ECO: My Python is not brilliant, but I am happy to work on that. I would in general be interested in the LIMS integration anyway, as we use the same system and hope to tie it into the workflow. So any tips in that regard are highly appreciated as well. However, I understand the complications of internal politics, so please don't rub any noses.
As for Illumina, I have the feeling that they made things even more complicated from the GAIIx to the MiSeq. I just complained to their tech specialist that the only halfway convenient method to view the run quality data in real time (or after the run, in fact) is now their Windows (!!) based SAV. Moreover, there is no way to generate an automated run quality report. Definitely room for improvement there.
@Heisman: All I want is to generate separate fastq files based on index from the one that the MiSeq spits out. After that I can take it through my pipeline. Rumour has it that the new MiSeq Reporter software version is able to provide fastq files split by index, although I'd rather not rely on that... So a series of commands would take me a long way.
Cheers
Seb

#9
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 534
Do you have a script that does this for data in a different format? If so, it would be easier to convert the data to that format. If not, I have an idea that will work and can be put into a bash script: paste the read 1, read 2, and index reads together, along with the qualities, put a unique character next to the index, then grep that index out of the full file and split the reads into separate files. It would not take long to write, but it wouldn't have the functionality that nicer scripts have (e.g., allowing 1 bp mismatches).
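As a rough illustration of the idea above with mismatch tolerance added, here is a minimal Python sketch. It is not the internal script mentioned earlier in the thread. It assumes the read file and the index file list the same clusters in the same order, and every name in it (`read_fastq`, `assign_sample`, `demultiplex`, the sample names, the `ACAGTG` barcode) is hypothetical.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def hamming(a, b):
    """Count mismatching positions; unequal lengths never match."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_seq, barcodes, max_mismatches=1):
    """Return the single sample whose barcode is close enough, else None."""
    hits = [sample for sample, barcode in barcodes.items()
            if hamming(index_seq, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(read_handle, index_handle, barcodes, max_mismatches=1):
    """Group read records by the sample that their index read points to."""
    bins = {}
    for read, index in zip(read_fastq(read_handle), read_fastq(index_handle)):
        # The part of the header before the space identifies the cluster.
        if read[0].split()[0] != index[0].split()[0]:
            raise ValueError("read and index files are out of sync")
        sample = assign_sample(index[1], barcodes, max_mismatches) or "undetermined"
        bins.setdefault(sample, []).append(read)
    return bins


# Toy data: three clusters, two known barcodes (sample names are made up).
reads = ("@r1 1:N:0:1\nACGT\n+\nIIII\n"
         "@r2 1:N:0:1\nTTTT\n+\nIIII\n"
         "@r3 1:N:0:1\nGGCC\n+\nIIII\n")
indexes = ("@r1 1:N:0:1\nCTTGTA\n+\n?@@DDD\n"
           "@r2 1:N:0:1\nACAGTC\n+\n?@@DDD\n"
           "@r3 1:N:0:1\nGGGGGG\n+\n?@@DDD\n")
barcodes = {"sampleA": "CTTGTA", "sampleB": "ACAGTG"}

bins = demultiplex(io.StringIO(reads), io.StringIO(indexes), barcodes)
for sample, records in bins.items():
    print(sample, len(records))
```

Index reads that match no barcode, or that match more than one within the allowed distance, end up in an "undetermined" bin rather than being guessed at. With many samples it is worth checking that the barcodes differ from each other by more than twice the allowed mismatches.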

#10
Member
Location: Melbourne
Join Date: Jan 2012
Posts: 15
No, I don't have any script yet, and I think fastq is generally a good place to start. I also think the handling of base-calling errors is important, especially as the number of multiplexed samples rises... However, appending the index to the front of the read might be feasible, as one could then use existing scripts like the one in the fastx toolkit... Do you have something along these lines already?

#11
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 534
sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1
paste [read_1] [indexes_changed_1] | sed 's,[tab_key_here (press "ctrl + v", then tab)],,' > read_1_done
And now you're good to go with the index at the end of the read. Then specify that correctly with the barcode splitter (and enter the complement/reverse complement barcodes if necessary for it to work... not sure what is needed).

#12
Junior Member
Location: essex
Join Date: Oct 2011
Posts: 1
We have a very similar problem! We are getting to grips with a new MiSeq and its fastq data output. We don't really have any way of combining the Read1, Read2 and Index read fastq files for complete analysis. The idea of pasting the Index read onto the end of R1 or R2 would be useful. I have very little experience with Linux - what does the first line of the script actually do?
sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1
I've tried it with some sample reads, but got an error about not finding 'indexes'.
Is Galaxy any good for de-multiplexing based on the indexes, or are there better scripts to use?

#13
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 534
Code:
sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' | tr '^' '\n' > indexes_changed_1
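For readers who are not comfortable with sed, the stated goal of this step (the index bases and their qualities appended to the end of each read, so that a barcode splitter can then be pointed at the 3' end) can also be sketched in Python. This is an alternative sketch, not a translation of the command above. It assumes both files hold the same clusters in the same order, and `append_index_to_reads` is a hypothetical name.

```python
import io
from itertools import islice


def append_index_to_reads(read_handle, index_handle, out_handle):
    """Write each read with its index sequence and qualities appended."""
    written = 0
    while True:
        # A FASTQ record is four lines: header, sequence, '+', qualities.
        read = [line.rstrip("\n") for line in islice(read_handle, 4)]
        index = [line.rstrip("\n") for line in islice(index_handle, 4)]
        if len(read) < 4 or len(index) < 4:
            break
        out_handle.write(read[0] + "\n")             # header is kept unchanged
        out_handle.write(read[1] + index[1] + "\n")  # read bases + index bases
        out_handle.write(read[2] + "\n")             # '+' separator line
        out_handle.write(read[3] + index[3] + "\n")  # read quals + index quals
        written += 1
    return written


# Toy example using in-memory files; real use would pass open() handles.
read_file = io.StringIO("@r1 1:N:0:1\nACGTACGT\n+\nIIIIIIII\n")
index_file = io.StringIO("@r1 1:N:0:1\nCTTGTA\n+\n?@@DDD\n")
combined = io.StringIO()
count = append_index_to_reads(read_file, index_file, combined)
print(combined.getvalue(), end="")
```

Keeping the sequence and quality lines the same length matters, because most downstream tools reject records where they differ.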

#14
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260
If all the multiplexed data belong to your group, you can ask them for the whole run, which includes .cif (intensity) files, .bcl (base call) files and .clocs (cluster location) files. From these, you can generate non-demultiplexed fastq files with CASAVA and then demultiplex the files later. You can also do base-calling with All-Your-Base.
For example, this will convert the .cif/.bcl/.clocs files to fastq files (for dual indexes):
HTML Code:
sequenceWorld=/rap/nne-790-ab/Instruments/Illumina_HiSeq_1000_Hellbound
run=111207_SNL131_0065_AC0947ACXX
NSLOTS=8

configureBclToFastq.pl \
  --input-dir $sequenceWorld/$run/Data/Intensities/BaseCalls \
  --output-dir $sequenceWorld/$run/Fastq-Sequences \
  --use-bases-mask Y*,Y*,Y*,Y*

cd $sequenceWorld/$run/Fastq-Sequences
make -j $NSLOTS
Then, you can demultiplex sequences. We use FastDemultiplexer, which allows more mismatches than CASAVA 1.8.2.
HTML Code:
FastDemultiplexer.py ../../SampleSheet-Nextera.csv \
  Project_redacted/Sample_lane1 Demultiplexed > stat.txt
Sébastien Boisvert

#15
Member
Location: East Bay
Join Date: Jul 2012
Posts: 26
We have implemented the software to demux and convert bcl to fastq offline, which is necessary when the number of samples gets too high. Does anyone know how to toggle a MiSeq between doing all the processing during a typical run (a few small genomes) and generating only bcl files during a highly multiplexed run? Beyond a certain number of samples, the MiSeq chokes after the index reads, and it takes hours after the run is complete before the fastq files show up in the run folder in MiSeqOutput. Often it fails to even transfer all the fastq files into the run folder of MiSeqOutput. Has anyone else encountered this problem?

#16
Senior Member
Location: Purdue University, West Lafayette, Indiana
Join Date: Aug 2008
Posts: 2,315
If you changed your sample sheet to only show some of the index pairs, maybe that would speed it up. (You would need a full sample sheet for wherever you were running CASAVA off-site.)
--
Phillip

#17
Member
Location: East Cost
Join Date: May 2011
Posts: 79
Hi!
If you are fine with getting the index reads as a separate fastq file, the MiSeq has an option to turn that on. If you turn on the relevant flag, it will write fastq files for the index reads as well as for the sequencing reads. It's described in the MiseqReporterUserGuide. They just need to add the following line to appSettings in the .exe.config file:
<add key="CreateFastqForIndexReads" value="1" />
Details are written in the PDF manual. There might also be an option that lets you stop demultiplexing on the MiSeq. Call Illumina; they are very helpful. The only way I know of is .bcl to fastq conversion using a script, on which others have already commented.

#18
Member
Location: Germany
Join Date: Dec 2010
Posts: 80

#19
Member
Location: Kentucky
Join Date: May 2012
Posts: 76
I don't see why anyone would want to do their own demultiplexing. The demultiplexed data that come off BaseSpace are perfectly good. What are you hoping to accomplish by doing it yourself - salvage the few tens of thousands of reads that can't be recognized?

#20
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,091
There are regulatory requirements that prohibit the use of BaseSpace at some institutions (especially with human samples). Also, when several sequencers are involved, having a common data pipeline is convenient.