SEQanswers

Old 04-25-2012, 05:13 PM   #1
kga1978
Senior Member
 
Location: Boston, MA

Join Date: Nov 2010
Posts: 100
Demultiplexing MiSeq run - problems

Hi All,

I just got a MiSeq run back from our core and I am trying to demultiplex everything. However, I have received no instructions as to how I do this, so I am a bit in the dark. The run was 151bp paired-end and the first eight bases make up the barcode.

I have been doing the following:

1. Extract Illumina barcodes using the following command:
Code:
java -Xmx2g -jar /usr/local/bin/picard/ExtractIlluminaBarcodes.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 BARCODE_FILE=~/Desktop/extract.txt READ_STRUCTURE=8B143T METRICS_FILE=metrics NUM_PROCESSORS=6
The 'extract.txt' file containing the barcodes looks like this:
Code:
barcode_sequence_1	barcode_sequence_2	barcode_sequence_3	barcode_sequence_4	barcode_sequence_5	barcode_sequence_6	barcode_sequence_7	barcode_sequence_8	barcode_sequence_9	barcode_sequence_10
AGGTAAGG	TGCTCGAC	GCCTAGCC	TTGAGCCT	TATCCAGG	TGCTGCTG	GACCTAAC	CTACCAGG	CTGCGGAT	GTAACATC
The output from this creates 12 s_1_00xx_barcode.txt files as expected - e.g.:
Code:
.CTGATTT        N       aggtaagg        6       9
.TCCTCCT        N               8       0
.CCTGCTT        N       aggtaagg        6       9
CCCACACC        N       aggtaagg        7       9
CTCGCTCC        N               8       0
CCTCAGCC        N       aggtaagg        7       9
TGCCAGTA        N       aggtaagg        6       9
CTTTTGTT        N       aggtaagg        7       9
CGACTCCT        N       aggtaagg        7       9
GTTACGCG        N       aggtaagg        7       9
However, this doesn't look right: all 12 files look very similar to this, with only the first barcode (aggtaagg) displayed in the third column. I'm wondering whether my READ_STRUCTURE=8B143T is incorrect or whether the barcode file isn't being read correctly?
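A note on the barcode file layout, offered as a guess rather than a confirmed diagnosis: as far as I understand the Picard documentation, BARCODE_FILE expects one barcode per row, with the columns barcode_sequence_1, barcode_name and library_name. Columns barcode_sequence_2 and up are meant for additional index reads of the same sample, so a single row holding ten barcodes may be read as one sample, which would fit with only aggtaagg showing up. A minimal Python sketch that writes the one-row-per-barcode layout, using barcodes and sample names from this post (the function name is hypothetical):

```python
import csv
import io

# Barcodes and sample names come from the post. The column layout is an
# assumption based on the Picard documentation, not confirmed in this thread.
SAMPLES = [
    ("AGGTAAGG", "Control"),
    ("TGCTCGAC", "LASV_90"),
    ("GCCTAGCC", "LASV_241"),
]


def barcode_file_text(samples, library="MiSeq_Test"):
    """Return tab-delimited text with a header and one barcode per row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["barcode_sequence_1", "barcode_name", "library_name"])
    for barcode, name in samples:
        writer.writerow([barcode, name, library])
    return out.getvalue()


if __name__ == "__main__":
    # Redirect this output to a file and pass that file as BARCODE_FILE.
    print(barcode_file_text(SAMPLES), end="")
```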

2. Alternatively I have given all the barcodes in the command:

Code:
java -Xmx2g -jar /usr/local/bin/picard/ExtractIlluminaBarcodes.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 BARCODE=AGGTAAGG BARCODE=TGCTCGAC BARCODE=GCCTAGCC BARCODE=TTGAGCCT BARCODE=TATCCAGG BARCODE=TGCTGCTG BARCODE=GACCTAAC BARCODE=CTACCAGG BARCODE=CTGCGGAT BARCODE=GTAACATC READ_STRUCTURE=8B143T METRICS_FILE=metrics NUM_PROCESSORS=6
But now the output looks different:
Code:
.TCCTCCT        N       ttgagcct        3       4
.CCTGCTT        N       tgctgctg        2       4
CCCACACC        N       gcctagcc        4       4
CTCGCTCC        N       tgctcgac        5       5
CCTCAGCC        N       gcctagcc        3       5
TGCCAGTA        N       tgctcgac        4       4
CTTTTGTT        N       ctgcggat        4       6
CGACTCCT        N       ttgagcct        5       5
GTTACGCG        N       gtaacatc        4       5
CCCCCATT        N       ctaccagg        4       5
CTAACACG        N       ctaccagg        2       3
This also doesn't look correct.
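One way to judge output like this is to tally the second column. As I read it (an assumption, not something stated in this thread), the five tab-separated columns are: the bases read at the barcode position, a Y/N flag for whether a barcode was assigned, the closest barcode, the mismatches to it, and the mismatches to the second-best barcode. Every row shown above has N, so no read was actually assigned. A small Python sketch that counts this per file (the function name is hypothetical):

```python
from collections import Counter


def tally_matches(lines):
    """Count Y/N flags and assigned barcodes in *_barcode.txt lines.

    Assumes tab-delimited lines with the Y/N flag in column 2 and the
    closest barcode in column 3.
    """
    flags = Counter()
    assigned = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        flags[fields[1]] += 1
        if fields[1] == "Y":
            assigned[fields[2].upper()] += 1
    return flags, assigned


if __name__ == "__main__":
    example = [
        ".TCCTCCT\tN\tttgagcct\t3\t4",
        "CCCACACC\tN\tgcctagcc\t4\t4",
    ]
    flags, assigned = tally_matches(example)
    print(dict(flags), dict(assigned))
```

If nearly every read comes out as N with several mismatches, the bases being compared are probably not the barcode at all, which points at READ_STRUCTURE rather than the barcode list.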

When I run the IlluminaBasecallsToSam command:

Code:
java -Xmx2g -jar /usr/local/bin/picard/IlluminaBasecallsToSam.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 RUN_BARCODE=120423_MiSeq SEQUENCING_CENTER=Broad BARCODE_PARAMS=~/Desktop/barcodes.txt READ_STRUCTURE=8B143T NUM_PROCESSORS=6 ADAPTERS_TO_CHECK=PAIRED_END
I get a bunch of errors. For number 1) above I get the following:
Code:
Picard version: 1.67(1190)
INFO	2012-04-25 21:09:47	IlluminaBasecallsToSam	READ STRUCTURE IS 8B143T
INFO	2012-04-25 21:09:47	IlluminaBasecallsToSam	Creating 6 TileProcessors.
Before explicit GC, Runtime.totalMemory()=85000192
After explicit GC, Runtime.totalMemory()=85000192
ERROR	2012-04-25 21:09:47	IlluminaBasecallsToSam	Exception in TileProcessor
net.sf.picard.PicardException: Barcode encountered in that was not specified in BARCODE_PARAMS: null
	at net.sf.picard.illumina.IlluminaBasecallsToSam$TileProcessor.processTile(IlluminaBasecallsToSam.java:639)
	at net.sf.picard.illumina.IlluminaBasecallsToSam$TileProcessor.run(IlluminaBasecallsToSam.java:592)
	at java.lang.Thread.run(Thread.java:680)
For number 2) above I get this:
Code:
Exception in thread "main" net.sf.picard.PicardException: Could not find a format with available files for the following data types: Position, Barcodes, BaseCalls, QualityScores, PF
	at net.sf.picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:134)
	at net.sf.picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:152)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
	at net.sf.picard.illumina.IlluminaBasecallsToSam.main(IlluminaBasecallsToSam.java:377)
My barcode.txt input for the IlluminaBasecallsToSam.jar command looks like this:
Code:
BARCODE	OUTPUT	SAMPLE_ALIAS	LIBRARY_NAME
AGGTAAGG	Control.bam	Control	MiSeq_Test
TGCTCGAC	LASV_90.bam	LASV_90	MiSeq_Test
GCCTAGCC	LASV_241.bam	LASV_241	MiSeq_Test
TTGAGCCT	LASV_245.bam	LASV_245	MiSeq_Test
TATCCAGG	LASV_254.bam	LASV_254	MiSeq_Test
TGCTGCTG	LASV_263.bam	LASV_263	MiSeq_Test
GACCTAAC	LASV_289.bam	LASV_289	MiSeq_Test
CTACCAGG	LASV_291.bam	LASV_291	MiSeq_Test
CTGCGGAT	LASV_295.bam	LASV_295	MiSeq_Test
GTAACATC	LASV_309.bam	LASV_309	MiSeq_Test
Any idea what is going on here? Any help would be MUCH appreciated - I'm lost!
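For reference, a minimal Python sketch of generating a BARCODE_PARAMS file in this layout, including the extra 'N' row that the follow-up post below reports was needed. My understanding, which is an assumption here, is that the row with BARCODE=N names the output for reads that match no barcode. The function name and the 'unmatched' names are hypothetical:

```python
def barcode_params_text(samples, library, unmatched_output="unmatched.bam"):
    """Return tab-delimited BARCODE_PARAMS text with a trailing 'N' row.

    samples is a list of (barcode, sample_alias) pairs. The 'N' row and
    its output name are assumptions for illustration.
    """
    rows = [["BARCODE", "OUTPUT", "SAMPLE_ALIAS", "LIBRARY_NAME"]]
    for barcode, alias in samples:
        rows.append([barcode, f"{alias}.bam", alias, library])
    # Catch-all row for reads whose barcode matches none of the samples.
    rows.append(["N", unmatched_output, "unmatched", library])
    return "\n".join("\t".join(row) for row in rows) + "\n"


if __name__ == "__main__":
    samples = [("AGGTAAGG", "Control"), ("TGCTCGAC", "LASV_90")]
    print(barcode_params_text(samples, "MiSeq_Test"), end="")
```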
Old 04-25-2012, 07:18 PM   #2
kga1978
Senior Member
 
Location: Boston, MA

Join Date: Nov 2010
Posts: 100

Alright, got it - the first couple of errors were indeed caused by the READ_STRUCTURE parameter. Since this was a 151bp PE run, the correct format was 151T8B151T. The second problem ('Barcode encountered in that was not specified...') was caused by me not adding an 'N' barcode to my barcode.txt. Once I had done all that, everything worked beautifully!

Thanks Picard for being so awesome - once I get you to work!
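To spell out the READ_STRUCTURE fix: as I understand it, the string lists the sequencing cycles in the order they were run, with T for template bases and B for barcode bases. 8B143T only accounts for 151 cycles and puts the barcode at the start of read 1, whereas 151T8B151T describes read 1, then an 8-base index read, then read 2, for 310 cycles in total. A small Python sketch for checking a string against the run (the function names are hypothetical):

```python
import re


def parse_read_structure(structure):
    """Split e.g. '151T8B151T' into [(151, 'T'), (8, 'B'), (151, 'T')]."""
    segments = re.findall(r"(\d+)([A-Z])", structure)
    # Reject strings with leftover characters that the pattern skipped.
    if "".join(n + kind for n, kind in segments) != structure:
        raise ValueError(f"cannot parse read structure: {structure!r}")
    return [(int(n), kind) for n, kind in segments]


def total_cycles(structure):
    """Sum the cycle counts of all segments in the read structure."""
    return sum(length for length, _ in parse_read_structure(structure))


if __name__ == "__main__":
    for rs in ("8B143T", "151T8B151T"):
        print(rs, parse_read_structure(rs), total_cycles(rs))
```

The total should equal the number of cycles the instrument actually ran, index read included.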
Old 04-28-2012, 03:04 AM   #3
swNGS
Member
 
Location: SW UK

Join Date: Nov 2011
Posts: 83

Assuming that your core just gives you the FASTQs in the same form as we get them off the machine, you will have 3 files: R1, R2 and the index file (I've assumed that you're running paired-end).
You can ignore the index file (one barcode read per read in the R1 and R2 files) and demultiplex based on the number at the end of each read's header, which corresponds to the order in which the samples were listed in the sample run sheet on the MiSeq.
If that makes sense, I've got a script that demultiplexes based on this, provided your MiSeq is creating FASTQs in the same way as ours does. I'll post it when I get a mo.
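This is not the script mentioned above, just a minimal Python sketch of the idea. It assumes headers of the form '@instrument:run:flowcell:lane:tile:x:y read:filtered:control:sample_number', so the sample number is whatever follows the last colon. The example header and the function name are made up for illustration:

```python
from collections import defaultdict


def demultiplex_by_sample_number(lines):
    """Group 4-line FASTQ records by the last ':'-separated header field.

    Returns a dict mapping sample number (as a string) to a list of
    (header, sequence, quality) tuples. Everything is held in memory, so
    this is only suitable as an illustration or for small files.
    """
    records = defaultdict(list)
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        sample = header.rsplit(":", 1)[-1]
        records[sample].append((header, seq, qual))
    return dict(records)


if __name__ == "__main__":
    fastq = [
        "@M00001:1:000000000-A0ABC:1:1101:15589:1332 1:N:0:2",
        "ACGTACGT",
        "+",
        "IIIIIIII",
    ]
    for sample, recs in demultiplex_by_sample_number(fastq).items():
        print(sample, len(recs))
```

For paired-end data the same grouping would be applied to R1 and R2 separately, since both files carry the sample number in their headers under this assumption.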
Old 05-15-2012, 08:14 AM   #4
mnkyboy
Member
 
Location: Seattle, WA

Join Date: Mar 2009
Posts: 87

I was wondering if you were still going to post that script? I was looking for an easy solution for this as well.
Old 07-21-2016, 10:48 AM   #5
drea11
Junior Member
 
Location: Canada

Join Date: Aug 2014
Posts: 9

I'm attempting to extract barcodes from a set of Illumina MiSeq paired-end sequences and came across this post, which seems to have the best description I have seen of the Picard tools. I have a couple of questions regarding what is posted above, and about my own issues with running the command.

1) Is the metrics_file, listed as required by the function, something I supply as input? If so, what is included in this file, and is there a sample with the appropriate format that I could use as a guide?
2) I'm unclear on the read structure string. The post above describes a similar experimental setup to mine, but I'm not sure how this leads to 151T8B151T as listed. I have both forward and reverse barcodes for my sequences, so would it not make more sense to have something like 8B151T151T8B?
3) The post also recommends including an N barcode in the barcode_file. How does one include this, if indeed it is required?

I am currently running this command:
java -jar ~/Desktop/software/picard-tools-1.119/ExtractIlluminaBarcodes.jar BASECALLS_DIR=Data/Intensities/BaseCalls LANE=1 BARCODE_FILE=barcode.txt READ_STRUCTURE=151T8B151T METRICS_FILE=metrics NUM_PROCESSORS=4

barcode.txt file is as below:
barcode_sequence_1 barcode_sequence_2 barcode_sequence_3 barcode_sequence_4 barcode_sequence_5 barcode_sequence_6 barcode_sequence_7 barcode_sequence_8 barcode_sequence_9 barcode_sequence_10 barcode_sequence_11 barcode_sequence_12 barcode_sequence_13 barcode_sequence_14 barcode_sequence_15 barcode_sequence_16 barcode_sequence_17
TACGCTGC ATGCGCAG TAGCGCTC ACTGAGCG CCTAAGAC CGATCAGT TCCTGAGC ATCTCAGG ACTGCATA AAGGAGTA CTAAGCCT CGTCTAAT TCTCTCCG CTCTCTAT TATCCTCT GTAAGGAG

The error I'm getting is:
Exception in thread "main" picard.PicardException: Could not find a format with available files for the following data types: BaseCalls, PF
at picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:172)
at picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:127)
at picard.illumina.ExtractIlluminaBarcodes.customCommandLineValidation(ExtractIlluminaBarcodes.java:332)
at picard.cmdline.CommandLineProgram.parseArgs(CommandLineProgram.java:242)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:128)
at picard.illumina.ExtractIlluminaBarcodes.main(ExtractIlluminaBarcodes.java:357)

java version 1.8.0_91

Appreciate any help!
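One thing that may be worth checking for the "Could not find a format with available files" error is whether BASECALLS_DIR actually contains per-cycle base call and filter files. BASECALLS_DIR=Data/Intensities/BaseCalls is a relative path, so it depends on the directory the command is run from. The layout below (L001/C<cycle>.1/*.bcl plus L001/*.filter) is my assumption about a typical MiSeq run folder, and the function name is hypothetical. A minimal Python sketch:

```python
from pathlib import Path


def summarize_basecalls_dir(basecalls_dir, lane=1):
    """Count .bcl and .filter files under an assumed MiSeq layout.

    Assumed layout (not confirmed in this thread):
      <basecalls_dir>/L00<lane>/C<cycle>.1/*.bcl   (possibly gzipped)
      <basecalls_dir>/L00<lane>/*.filter
    """
    lane_dir = Path(basecalls_dir) / f"L{lane:03d}"
    if not lane_dir.is_dir():
        return {"lane_dir_exists": False, "bcl_files": 0, "filter_files": 0}
    bcl_files = list(lane_dir.glob("C*.1/*.bcl*"))
    filter_files = list(lane_dir.glob("*.filter"))
    return {
        "lane_dir_exists": True,
        "bcl_files": len(bcl_files),
        "filter_files": len(filter_files),
    }


if __name__ == "__main__":
    print(summarize_basecalls_dir("Data/Intensities/BaseCalls", lane=1))
```

If the counts are zero, the run folder probably only holds FASTQ output or the path does not resolve from the current directory, and in either case Picard would have nothing to read base calls from.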