**ECO** · 11-23-2011, 05:19 PM

Not sure why your core couldn't give you the index reads as well... I can give you some snippets of the fastqs later, but they follow the Illumina conventions as far as I can tell (see the Wikipedia page on FASTQ).

We demultiplex after the fact on our own; the secondary analysis on the MiSeq is pretty broken for us currently. I'll see if I can dig up a script if you need one.

**ECO** · 11-24-2011, 09:43 AM

R1.fastq:

Code:

@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
sequencehere
+
+114=??0)):??@@@/,6;;=00&55-&)&&0&)&00(+((+8&&)(((+3(((+(+55007&))0(((++()&)&)&)&)&&&&(+((((((((4+((((+4+((+((((++4:((,((+(((+((((++(((+((&+((+((((+((+

R2.fastq:

Code:

@M00182:8:000000000-A0833:1:12:15161:29056 2:N:0:1
sequencehere
+
+8+==22+=@?;+A+<+CA4A+3<A?<<<BCCB@E3)11:?DFGHICHFDHIHGHGEH@FGHIHIEGGGHFEGBC?CDCBB9;ACC@A>CBBDDCDDEEACDDBDDDDDDD@CCDDDDDDDDCC>A@<BCC(9A@AACCDDDDDCC4>A@A

I1.fastq:

Code:

@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
CTTGTA
+
?@@DDD

Here are file snippets of the two read files and the index reads. This is NOT using CASAVA, but the MiSeqReporter to generate fastqs (which are not demultiplexed).
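For anyone who wants to pull the fields out of those header lines, here is a minimal Python sketch. It assumes the colon-separated layout described on the Wikipedia FASTQ page for recent Illumina software; the field names are my own labels, and the last field may hold a sample number (as in the snippets above) or a barcode sequence depending on the software that wrote the file.

```python
from typing import NamedTuple


class ReadHeader(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read_number: int
    is_filtered: bool    # True means the read did NOT pass the filter ("Y")
    control_number: int
    index_field: str     # sample number or barcode, depending on the software


def parse_header(line: str) -> ReadHeader:
    """Split one FASTQ header line into named fields (assumed layout)."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    location, _, description = line[1:].strip().partition(" ")
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read_number, filtered, control, index_field = description.split(":")
    return ReadHeader(instrument, int(run), flowcell, int(lane), int(tile),
                      int(x), int(y), int(read_number), filtered == "Y",
                      int(control), index_field)


header = parse_header("@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1")
print(header.flowcell_id, header.read_number, header.index_field)
```

The part before the space is identical in R1, R2 and I1 for the same cluster, which is what lets the three files be matched record by record.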

FASTQ format - Wikipedia

http://en.wikipedia.org/wiki/FASTQ_format

**Heisman** · 11-24-2011, 10:34 AM

Thanks, Eric, I will pass this information along. Perhaps they are using CASAVA and not the MiSeqReporter. They just acquired the MiSeqs recently so things are for sure still being figured out.

**NextGenSeb** · 02-07-2012, 05:47 PM

Originally posted by ECO

We demultiplex after the fact on our own; the secondary analysis on the MiSeq is pretty broken for us currently. I'll see if I can dig up a script if you need one.

Hi ECO,
would be great if you could provide the script you use to demultiplex the MiSeq FASTQ. We just got a MiSeq and I am not keen to use Illumina provided software for any of the analysis downstream of FASTQ generation (bad experience from the GAIIx). Therefore I'll have to demultiplex myself, but as far as I know scripts like the one in the fastx toolkit don't work.
Any help would be greatly appreciated.
Cheers
Seb

**ECO** · 02-07-2012, 06:23 PM

Seb,

It's an internal script at my company written by a colleague; I'll see what his feelings are on sharing it. It's modularized inside a bunch of code that interfaces with our LIMS and instruments, so unless you're pretty competent at Python it will be difficult to use in its current form. I'll post back if I can share it.

I am still dumbfounded that this is a problem on Illumina machines after what...almost 7 years? I still can't access live run metrics outside of some proprietary binary format...so many simple things that have to be reinvented at every customer site.

</rant>

**Heisman** · 02-07-2012, 09:11 PM

Originally posted by NextGenSeb

Hi ECO,
would be great if you could provide the script you use to demultiplex the MiSeq FASTQ. We just got a MiSeq and I am not keen to use Illumina provided software for any of the analysis downstream of FASTQ generation (bad experience from the GAIIx). Therefore I'll have to demultiplex myself, but as far as I know scripts like the one in the fastx toolkit don't work.
Any help would be greatly appreciated.
Cheers
Seb

Ironic, since I'm the one who made this thread, but tell me what format you need the reads/indexes in and I can almost certainly give you a series of Linux commands that will convert the MiSeq output to that format.

**NextGenSeb** · 02-08-2012, 04:14 PM

Thanks for the replies guys

@ECO:
My Python is not brilliant, but I am happy to work on that. I would in general be interested in the LIMS integration anyway, as we use the same system and hope to tie it into the workflow. So any tips in that regard are highly appreciated as well. However, I understand the complications of internal politics, so please don't rub any noses.

As for Illumina, I have the feeling that they made things even more complicated from the GAIIx to the MiSeq. I just complained to their tech specialist that the only halfway convenient method to view the run quality data in real time (or after the run, in fact) is now their Windows (!!) based SAV. Moreover, there is no way to generate an automated run quality report. Definitely room for improvement there.

@Heisman:
All I want is to generate separate fastq files based on index from the one that the MiSeq spits out. After that I can take it through my pipeline. Rumour has it that the new MiSeq reporter software version is able to provide fastq files split by index, although I'd rather not rely on that... So a series of commands would take me a long way.

Cheers
Seb

**Heisman** · 02-08-2012, 04:43 PM

Do you have a script to do this for the data if it were in a different format? If so, it would be easier to convert it to that format. If not, I have an idea in mind that will work and can be put into a bash script: paste the read 1, read 2, and index reads together, along with the qualities, put a unique character next to the index, then grep out that index from the full file and split the reads into separate files. It would not take long to write, but it wouldn't have the functionality that nicer scripts have (e.g., allowing 1 bp mismatches).
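For comparison, here is a minimal Python sketch of the same idea with mismatch tolerance. It assumes separate R1, R2 and index FASTQ files with records in the same order (as in ECO's snippets) and unwrapped 4-line records. The sample names, the barcode table and the output file naming are hypothetical, and this is not the internal script ECO mentioned.

```python
from typing import Dict, Iterator, Optional, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records (header, sequence, separator, quality)."""
    with open(path) as handle:
        while True:
            lines = [handle.readline().rstrip("\n") for _ in range(4)]
            if not lines[0]:
                return
            yield tuple(lines)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_seq: str, barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the single sample within max_mismatches, else None."""
    hits = [sample for sample, barcode in barcodes.items()
            if len(barcode) == len(index_seq)
            and hamming(barcode, index_seq) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None  # ambiguous hits are rejected


def demultiplex(r1_path: str, r2_path: str, index_path: str,
                barcodes: Dict[str, str], out_prefix: str,
                max_mismatches: int = 1) -> Dict[str, int]:
    """Write <out_prefix><sample>_R1/_R2.fastq; return pair counts per sample."""
    counts: Dict[str, int] = {}
    outputs = {}
    try:
        for r1, r2, idx in zip(read_fastq(r1_path), read_fastq(r2_path),
                               read_fastq(index_path)):
            # The part of the header before the space names the cluster.
            if not (r1[0].split()[0] == r2[0].split()[0] == idx[0].split()[0]):
                raise ValueError(f"files out of sync at {r1[0]}")
            sample = assign_sample(idx[1], barcodes, max_mismatches)
            sample = sample or "undetermined"
            if sample not in outputs:
                outputs[sample] = (open(f"{out_prefix}{sample}_R1.fastq", "w"),
                                   open(f"{out_prefix}{sample}_R2.fastq", "w"))
            for handle, record in zip(outputs[sample], (r1, r2)):
                handle.write("\n".join(record) + "\n")
            counts[sample] = counts.get(sample, 0) + 1
    finally:
        for pair in outputs.values():
            for handle in pair:
                handle.close()
    return counts


# Hypothetical usage; the barcode table would come from your sample sheet:
# demultiplex("R1.fastq", "R2.fastq", "I1.fastq",
#             {"sampleA": "CTTGTA", "sampleB": "GCCAAT"}, "demux_")
```

Index reads that match no barcode, or more than one, go to an "undetermined" pair of files. With many samples the number of open files grows accordingly, so treat this as a starting point only.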

**NextGenSeb** · 02-12-2012, 09:34 PM

No, I don't have any script yet, and I think fastq is generally a good place to start. Also, I think the handling of base-calling errors is important, especially as the number of multiplexed samples rises... However, appending the index to the front of the read might be feasible, as one could use other existing scripts like the one in the fastx toolkit from there... Do you have something along these lines already?

**Heisman** · 02-13-2012, 06:51 AM

Originally posted by NextGenSeb

No, I don't have any script yet, and I think fastq is generally a good place to start. Also, I think the handling of base-calling errors is important, especially as the number of multiplexed samples rises... However, appending the index to the front of the read might be feasible, as one could use other existing scripts like the one in the fastx toolkit from there... Do you have something along these lines already?

Try something like this (I'm sure expert linux users may have a better way, but this will work):

sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

paste [read_1] [indexes_changed_1] | sed 's,[tab_key_here (press "ctrl + v", then tab)],,' > read_1_done

And now you're good to go with the index at the end of the read. Then specify that correctly with the barcode splitter (and enter the complement/reverse complement barcodes if necessary for it to work... not sure what is needed).
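The same "index at the end of the read" result can be sketched in Python, which avoids the tab handling. This assumes the read and index files list the same clusters in the same order with unwrapped 4-line records; the function names are my own, and the read bases and qualities in the demo are made up (the index record is the one from ECO's I1 snippet).

```python
import itertools
from typing import Iterable, Iterator, List


def records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group an iterable of lines into 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = list(itertools.islice(stripped, 4))
        if len(record) < 4:
            return
        yield record


def append_index(read_lines: Iterable[str],
                 index_lines: Iterable[str]) -> Iterator[str]:
    """Yield read records with index bases and qualities appended."""
    for read, index in zip(records(read_lines), records(index_lines)):
        # The part of the header before the space names the cluster.
        if read[0].split()[0] != index[0].split()[0]:
            raise ValueError(f"read names differ: {read[0]} vs {index[0]}")
        yield read[0]               # header unchanged
        yield read[1] + index[1]    # bases: read followed by index
        yield read[2]               # separator line unchanged
        yield read[3] + index[3]    # qualities: read followed by index


read = ["@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1",
        "ACGTACGT", "+", "IIIIIIII"]       # made-up bases and qualities
index = ["@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1",
         "CTTGTA", "+", "?@@DDD"]
demo = list(append_index(read, index))
print("\n".join(demo))

# Hypothetical file usage:
# with open("R1.fastq") as r1, open("I1.fastq") as i1, \
#         open("R1_with_index.fastq", "w") as out:
#     out.writelines(line + "\n" for line in append_index(r1, i1))
```

With the barcode at the 3' end, the fastx barcode splitter would need to be told to match at the end of the sequence (its `--eol` option, as far as I recall) rather than the beginning.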

**smallcompany** · 03-15-2012, 07:25 AM

We have a very similar problem! We are getting to grips with a new MiSeq and the data outputs in fastq. We don't really have any way of combining the Read1, Read2 and Index read fastq files for complete analysis. The idea of pasting the Index read to the end of R1 or R2 would be useful. I have very little experience with Linux - what does the first line of script actually do?

sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

I've tried it with some sample reads, but got an error about not finding 'indexes'

Is galaxy any good for de-multiplexing based on the indexes or are there better scripts to use?

**Heisman** · 03-15-2012, 07:35 AM

Originally posted by smallcompany

We have a very similar problem! We are getting to grips with a new MiSeq and the data outputs in fastq. We don't really have any way of combining the Read1, Read2 and Index read fastq files for complete analysis. The idea of pasting the Index read to the end of R1 or R2 would be useful. I have very little experience with Linux - what does the first line of script actually do?

sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

I've tried it with some sample reads, but got an error about not finding 'indexes'

Is galaxy any good for de-multiplexing based on the indexes or are there better scripts to use?

Dammit, "indexes" should not be there at all. Try:

Code:

sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' | tr '^' '\n' > indexes_changed_1

What that does: taking the [index_read_1] input file, it first prints every other line, starting with the second line. Then it adds a "^" character to the end of each line. That by itself was probably very confusing, and I'm not even sure it works, since "^" normally denotes the start of a line; if it works, great, but if not, replace the "^" in each instance with a ")". Finally, the "tr" command replaces the "^" (or ")") with a newline character.
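To make the three stages concrete, here is a small Python sketch that mimics each one on the I1 record posted earlier in the thread. The function names are just labels for the stages, not anything from sed or tr.

```python
from typing import List


def every_other_line(lines: List[str]) -> List[str]:
    """Mimic sed -n '2,${p;n;}': keep lines 2, 4, 6, ... (1-based)."""
    return lines[1::2]


def append_marker(lines: List[str], marker: str = "^") -> List[str]:
    """Mimic sed 's,$,^,': add the marker to the end of every line."""
    return [line + marker for line in lines]


def marker_to_newline(lines: List[str], marker: str = "^") -> List[str]:
    """Mimic tr on the marker: every marker character becomes a newline."""
    text = "".join(line + "\n" for line in lines)
    return text.replace(marker, "\n").splitlines()


index_fastq = [
    "@M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1",
    "CTTGTA",
    "+",
    "?@@DDD",
]
result = marker_to_newline(append_marker(every_other_line(index_fastq)))
print(result)
```

A few things this suggests, offered as notes rather than a verdict. In the replacement half of a sed `s` command, "^" is an ordinary character, so the original form should be fine. Swapping in ")" looks riskier, because ")" is a valid quality character (it appears in the R1 qualities above) and tr would split those lines. Also, each index line comes out followed by a blank line, not preceded by one, so a line-by-line paste against the read file would put the index bases next to the header line; it is worth checking the first couple of records before running this on a whole file.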

**seb567** · 04-05-2012, 11:50 AM

Originally posted by Heisman

Hey guys,

Our lab does not have any sequencers but we have access to a sequencing core. They recently acquired a couple of MiSeqs and thus far they are not able to give out the index read information (the machine demultiplexes the runs). As we don't have the MiSeq I have no way of knowing how exactly it or more importantly the data processing works, and I have no idea if there is a work around. I know they plan to try to address this starting next week but I'm curious if anybody here has any experience with this? I want the indexing reads along with the sequencing reads, not just the sequencing reads in different files.

Hello,

If all the multiplexed data belong to your group, you can ask them for the whole run, which includes .cif (intensity) files, .bcl (base call) files and .clocs (probably a summary of the intensities?) files.

From these, you can generate not-demultiplexed fastq files with CASAVA and then demultiplex the files later.
You can also do base-calling with All-Your-Base.

For example, this will convert .cif files/.bcl files/.clocs files to fastq files (for dual indexes):

Code:

sequenceWorld=/rap/nne-790-ab/Instruments/Illumina_HiSeq_1000_Hellbound
run=111207_SNL131_0065_AC0947ACXX
NSLOTS=8


configureBclToFastq.pl \
--input-dir $sequenceWorld/$run/Data/Intensities/BaseCalls \
--output-dir  $sequenceWorld/$run/Fastq-Sequences \
--use-bases-mask Y*,Y*,Y*,Y*

cd $sequenceWorld/$run/Fastq-Sequences

make -j $NSLOTS

Then, you can demultiplex sequences.
We use FastDemultiplexer, which allows more mismatches than CASAVA 1.8.2.

Code:

FastDemultiplexer.py ../../SampleSheet-Nextera.csv \
Project_redacted/Sample_lane1 Demultiplexed > stat.txt

Sébastien Boisvert

**creeves** · 05-12-2014, 09:40 AM

Offline processing of MiSeq data

We have implemented the software to demux and convert bcl to fastq offline, which is necessary when the number of samples gets too high. Does anyone know how to toggle a MiSeq between doing all the processing during a typical run (a few small genomes) and generating only bcl files during a highly multiplexed run? Beyond a certain number of samples, the MiSeq chokes after the index reads, and it takes hours after the run is complete before the fastq files show up in the run folder in MiSeqOutput. Often it fails to even transfer all the fastq files into the run folder of MiSeqOutput. Has anyone else encountered this problem?


Stop automatic demultiplexing on MiSeq
