**Bukowski** · 08-10-2012, 10:21 AM

I think you will find there is a reference to the barcode/sample at the end of the read name for each read. That might help.
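As a rough illustration of where that reference sits, here is a minimal Python sketch that splits a CASAVA 1.8-style header into its parts. The function name `parse_illumina_header` is made up for this example, and it assumes the part after the space has the four colon-separated fields read number, filter flag, control number, and sample number or index sequence.

```python
def parse_illumina_header(header):
    """Split a CASAVA 1.8-style FASTQ header line into named fields.

    Assumes the form '@<read name> <read>:<filtered>:<control>:<sample or index>'.
    """
    name, _, comment = header.strip().lstrip("@").partition(" ")
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError(f"unexpected header comment: {comment!r}")
    read, filtered, control, sample = fields
    return {
        "name": name,                # instrument:run:flowcell:lane:tile:x:y
        "read": int(read),           # 1 or 2 for the member of the pair
        "filtered": filtered == "Y", # Y means the read failed the filter
        "control": control,          # control bits, 0 when none are set
        "sample": sample,            # sample number or index sequence
    }


info = parse_illumina_header("@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0")
print(info["sample"])
```

If the last field is 0 for every read, as in the headers quoted further down this thread, it will not distinguish samples and the index file has to be used instead.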

**celzinga** · 08-10-2012, 10:31 AM

This thread may help:

How to Demultiplex a Nextera paired-end MiSeq run - SEQanswers

http://seqanswers.com/forums/showthread.php?t=17620

**celzinga** · 08-10-2012, 10:42 AM

also it looks like picard can do this:

http://picard.sourceforge.net/command-line-overview.shtml#ExtractIlluminaBarcodes

**Cirno** · 08-10-2012, 03:40 PM

Originally posted by celzinga:

also it looks like picard can do this:
http://picard.sourceforge.net/comman...luminaBarcodes

Um. I don't see how that tool has anything to do with this problem. I don't need to extract the barcodes at all. I have three fastq files. The first fastq already contains the barcodes, e.g.:

Code:

@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
AACCGAGA
+
?AAAAAAB
@M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
AACCGAGA
+
???A?@@B
@M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
AACCGAGA
+
A?AAAAAA
@M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
AAACATCA
+
AAAAABBB

Then there are the two files for the paired ends, e.g.:

Code:

@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
NCGGGCACGACCATCACCATCATCATACGACGAACCAACGGGCATTATTCTGGTCGTTCGTCCTGATTGCGACGTTCATGGTCGTCGAAGTCATCGGCGGATTATGGACGAACAGTTTTGCGCTCTTGTCGGACGCCGGGCATATGCTTAG
+
#5<???AADDEEEDDDGGGGGGIIIIIIIIHHHHHHIIHHHHHHIIIIIIIIHIIHHHIHHHHHIIIIHHHHHHHHFHHHHHHGGFGGGGGGGGGGEGGG'.8:C*CCCD4A''*1CE*0:8'4C.:*:?)''.'.'.''2'**0*1:?:1
@M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
NCATACGTACCACCGATGACACCACCGACAAGCGGAACCATCTTCCCAAGATTAACGACCCCCGTATTCCCGAACTTCGTCAATAAGCGGAATCCGACTTTCTGATTGATTTTTTTGATGGTCGATCCAGGAATCTTCTTAATCATATTGA
+
#5<???BBDDDDDEDDFEFFFFIIIHHHHHHHIHHEHHIHIIIIIIIIIIIIIIIIHHHHHHHHDCFHHFHHHEHFDFH?DF;DFFDFEE=EFFA?A@BAEEFFEEEF=ABA?:8>DACAECEDD8A8*?*0:CCA0*::C*:ACA*:E:*
@M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
NTCCGCGTGACGGCGATGCCAGAGCGACGGGCCGCCTCGACGTTCGAGCCGACGTAATAAAACTCACGTCCTGTCTTCGAATACGTCAAAAACAGATGCGCCCCGGCGAAGAACAGAAGCATCAAGATGGCGACGAACGGGACAGGTCCGT
+
#5<???@@DDDDDDDDEEEFFFHHIHHHHHHHHHHHHHHHHHHHHEFHHHHEFFEFFEFFEEFFFFFFFFEEFFFFFFFEFFEFFFEE8A:CEEFEFEFDEADD?DDD'8>8?C:?E:*?:CAE0?::**:2'8;>2>').?8A))1*0'*
@M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
NATCGGAAGAGCACACGTCTGAACTCCAGTCACAAACATCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAGACAGAACGAGACAAAAGAAGCACAAATCCGTAATCGATGAGACTTAATGCGAGATCATGACACCATTGTAA
+
#5<???AAEDEDDDDDGGGGGGIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIHHHIIIIIIIIHHIIIIHHHHHD4)42**,,,,,,***3*,4,,,*4,,,3,0****)0*))*)0.************)).'0*1******)*******

and the corresponding mates of all of those.

I do not want three files as they are. I know which barcodes go with which hashes.

RUN1_I1.fastq
RUN1_R1.fastq
RUN1_R2.fastq

Need to be converted into...

RUN1_R1_AACCGAGA.fastq
RUN1_R2_AACCGAGA.fastq
RUN1_R1_AAACATCA.fastq
RUN1_R2_AAACATCA.fastq

etc etc.

Personally I am beyond flabbergasted that the output of this damnable thing is not the same as the HiSeq. I just want the fastqs sorted by barcode; it does nothing for me, the user, to have the barcode/hash pairs in a separate file.
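For what it's worth, the conversion described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the three files hold the same reads in the same order, every record is exactly four lines, and barcodes are matched exactly with no mismatch tolerance. The names `read_fastq` and `demultiplex` are made up for this example and do not come from any existing tool.

```python
import contextlib
import itertools


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = tuple(handle.readline().rstrip("\n") for _ in range(4))
        if not record[0]:  # empty header line means end of file
            return
        yield record


def demultiplex(prefix):
    """Split <prefix>_R1.fastq and <prefix>_R2.fastq by the barcodes in <prefix>_I1.fastq.

    Writes <prefix>_R1_<barcode>.fastq and <prefix>_R2_<barcode>.fastq and
    returns a dict of read-pair counts per barcode.
    """
    counts = {}
    with contextlib.ExitStack() as stack:
        i1 = stack.enter_context(open(f"{prefix}_I1.fastq"))
        r1 = stack.enter_context(open(f"{prefix}_R1.fastq"))
        r2 = stack.enter_context(open(f"{prefix}_R2.fastq"))
        outputs = {}  # barcode -> (R1 handle, R2 handle), all closed by the ExitStack
        triples = itertools.zip_longest(read_fastq(i1), read_fastq(r1), read_fastq(r2))
        for idx, fwd, rev in triples:
            if idx is None or fwd is None or rev is None:
                raise ValueError("input files contain different numbers of records")
            name = idx[0].split()[0]
            if fwd[0].split()[0] != name or rev[0].split()[0] != name:
                raise ValueError(f"read names out of sync at {name}")
            barcode = idx[1]  # the index read's sequence is the barcode
            if barcode not in outputs:
                outputs[barcode] = (
                    stack.enter_context(open(f"{prefix}_R1_{barcode}.fastq", "w")),
                    stack.enter_context(open(f"{prefix}_R2_{barcode}.fastq", "w")),
                )
                counts[barcode] = 0
            out1, out2 = outputs[barcode]
            out1.write("\n".join(fwd) + "\n")
            out2.write("\n".join(rev) + "\n")
            counts[barcode] += 1
    return counts
```

Calling `demultiplex("RUN1")` would read the three RUN1 files from the current directory. One caveat: every distinct index sequence gets its own pair of open files, so barcodes containing sequencing errors can produce a large number of small files and may hit the operating system's open-file limit.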

**GenoMax** · 08-13-2012, 03:53 AM

Did you get this run at a core facility? I am not sure why that facility did not do the de-multiplexing for you. It should be trivial for them to do this since they would have access to the raw data folder and CASAVA pipeline.

**geertvandeweyer** · 08-13-2012, 06:44 AM

Hi,

I've attached my approach to demultiplexing the MiSeq files. Note that it uses the MiSeq-assigned sample idx to name the output files, NOT the barcode. This means you get all reads for the sample, including those with a mismatch in the barcode. It outputs three files per sample: forward reads, reverse reads, and interlaced reads. We use the interlaced reads in Galaxy to start batch workflows.

For files:
RUN1_I1.fastq
RUN1_R1.fastq
RUN1_R2.fastq

Run as:
perl demultiplex_MiSeq.pl RUN1

Output will be in 'output/' folder. It will also create a file containing all barcodes used per sample, and print the read count per sample.
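To illustrate the third output type, here is a minimal Python sketch of interleaving, assuming "interlaced" means a single file in which each forward record is followed directly by its reverse mate. It is not taken from the attached Perl script; the function name `interleave` and the file names in the usage line are hypothetical.

```python
def interleave(r1_path, r2_path, out_path):
    """Write R1 and R2 records alternately into one FASTQ; return the pair count."""
    pairs = 0
    with open(r1_path) as r1, open(r2_path) as r2, open(out_path, "w") as out:
        while True:
            # Read one 4-line record from each file, normalising line endings.
            fwd = [r1.readline().rstrip("\n") for _ in range(4)]
            rev = [r2.readline().rstrip("\n") for _ in range(4)]
            if not fwd[0] and not rev[0]:
                break  # both files ended together
            if not fwd[0] or not rev[0]:
                raise ValueError("R1 and R2 contain different numbers of records")
            if fwd[0].split()[0] != rev[0].split()[0]:
                raise ValueError(f"mates out of sync at {fwd[0]}")
            out.write("\n".join(fwd) + "\n")
            out.write("\n".join(rev) + "\n")
            pairs += 1
    return pairs
```

A call such as `interleave("sample_R1.fastq", "sample_R2.fastq", "sample_interleaved.fastq")` would produce one file that keeps each pair together, which is convenient for tools that accept a single paired input.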

Attached Files

demultiplex_MiSeq.pl (2.8 KB, 175 views)

**JackieBadger** · 08-13-2012, 05:04 PM

http://code.google.com/p/ea-utils/

or

Galaxy

https://main.g2.bx.psu.edu/root

Look under NGS Toolbox Beta, NGS: QC and manipulation

Barcode splitter and other FASTQ manipulations

**swNGS** · 08-16-2012, 01:51 PM

What is an interlaced read?

Help with De-Multiplexing MiSeq Data