SEQanswers


kga1978 04-25-2012 06:13 PM

Demultiplexing MiSeq run - problems
 
Hi All,

I just got a MiSeq run back from our core and I am trying to demultiplex everything. However, I have received no instructions as to how I do this, so I am a bit in the dark. The run was 151bp paired-end and the first eight bases make up the barcode.

I have been doing the following:

1. Extract Illumina barcodes using the following command:
Code:

java -Xmx2g -jar /usr/local/bin/picard/ExtractIlluminaBarcodes.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 BARCODE_FILE=~/Desktop/extract.txt READ_STRUCTURE=8B143T METRICS_FILE=metrics NUM_PROCESSORS=6
The 'extract.txt' file containing the barcodes looks like this:
Code:

barcode_sequence_1        barcode_sequence_2        barcode_sequence_3        barcode_sequence_4        barcode_sequence_5        barcode_sequence_6        barcode_sequence_7        barcode_sequence_8        barcode_sequence_9        barcode_sequence_10
AGGTAAGG        TGCTCGAC        GCCTAGCC        TTGAGCCT        TATCCAGG        TGCTGCTG        GACCTAAC        CTACCAGG        CTGCGGAT        GTAACATC

This creates 12 s_1_00xx_barcode.txt files, as expected, e.g.:
Code:

.CTGATTT        N      aggtaagg        6      9
.TCCTCCT        N              8      0
.CCTGCTT        N      aggtaagg        6      9
CCCACACC        N      aggtaagg        7      9
CTCGCTCC        N              8      0
CCTCAGCC        N      aggtaagg        7      9
TGCCAGTA        N      aggtaagg        6      9
CTTTTGTT        N      aggtaagg        7      9
CGACTCCT        N      aggtaagg        7      9
GTTACGCG        N      aggtaagg        7      9

However, this doesn't look right: all 12 files look very similar to this, with only the first barcode (aggtaagg) displayed in the third column. I'm wondering whether my READ_STRUCTURE=8B143T is incorrect or whether the barcode file isn't being read correctly?

2. Alternatively I have given all the barcodes in the command:

Code:

java -Xmx2g -jar /usr/local/bin/picard/ExtractIlluminaBarcodes.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 BARCODE=AGGTAAGG BARCODE=TGCTCGAC BARCODE=GCCTAGCC BARCODE=TTGAGCCT BARCODE=TATCCAGG BARCODE=TGCTGCTG BARCODE=GACCTAAC BARCODE=CTACCAGG BARCODE=CTGCGGAT BARCODE=GTAACATC READ_STRUCTURE=8B143T METRICS_FILE=metrics NUM_PROCESSORS=6
But now the output looks different:
Code:

.TCCTCCT        N      ttgagcct        3      4
.CCTGCTT        N      tgctgctg        2      4
CCCACACC        N      gcctagcc        4      4
CTCGCTCC        N      tgctcgac        5      5
CCTCAGCC        N      gcctagcc        3      5
TGCCAGTA        N      tgctcgac        4      4
CTTTTGTT        N      ctgcggat        4      6
CGACTCCT        N      ttgagcct        5      5
GTTACGCG        N      gtaacatc        4      5
CCCCCATT        N      ctaccagg        4      5
CTAACACG        N      ctaccagg        2      3

This also doesn't look correct.
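
To compare the two outputs above across all 12 files, it can help to tally the second and third columns rather than eyeball them. The following is a minimal sketch, not part of Picard. It assumes the s_1_00xx_barcode.txt files are tab-separated with the observed barcode read first, then a Y/N match flag, then the best-matching barcode (possibly empty); the function name `tally_barcode_lines` is made up for illustration.

```python
from collections import Counter


def tally_barcode_lines(lines):
    """Count (match_flag, matched_barcode) pairs from *_barcode.txt lines.

    Assumed layout (tab-separated): observed barcode read, Y/N match flag,
    best-matching barcode (may be empty), then further numeric columns.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        flag, matched = fields[1], fields[2]
        counts[(flag, matched or "<none>")] += 1
    return counts


if __name__ == "__main__":
    # A few rows in the shape shown in the post, with tabs restored.
    sample = [
        ".CTGATTT\tN\taggtaagg\t6\t9",
        ".TCCTCCT\tN\t\t8\t0",
        "CCCACACC\tN\taggtaagg\t7\t9",
    ]
    for (flag, barcode), n in tally_barcode_lines(sample).most_common():
        print(flag, barcode, n)
```

If nearly every row carries the flag N, as in both outputs above, the barcodes are probably being read from the wrong cycles, which points at READ_STRUCTURE rather than at the barcode list.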

When I run the IlluminaBasecallsToSam command:

Code:

java -Xmx2g -jar /usr/local/bin/picard/IlluminaBasecallsToSam.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 RUN_BARCODE=120423_MiSeq SEQUENCING_CENTER=Broad BARCODE_PARAMS=~/Desktop/barcodes.txt READ_STRUCTURE=8B143T NUM_PROCESSORS=6 ADAPTERS_TO_CHECK=PAIRED_END
I get a bunch of errors. For number 1) above I get the following:
Code:

Picard version: 1.67(1190)
INFO        2012-04-25 21:09:47        IlluminaBasecallsToSam        READ STRUCTURE IS 8B143T
INFO        2012-04-25 21:09:47        IlluminaBasecallsToSam        Creating 6 TileProcessors.
Before explicit GC, Runtime.totalMemory()=85000192
After explicit GC, Runtime.totalMemory()=85000192
ERROR        2012-04-25 21:09:47        IlluminaBasecallsToSam        Exception in TileProcessor
net.sf.picard.PicardException: Barcode encountered in that was not specified in BARCODE_PARAMS: null
        at net.sf.picard.illumina.IlluminaBasecallsToSam$TileProcessor.processTile(IlluminaBasecallsToSam.java:639)
        at net.sf.picard.illumina.IlluminaBasecallsToSam$TileProcessor.run(IlluminaBasecallsToSam.java:592)
        at java.lang.Thread.run(Thread.java:680)

For number 2) above I get this:
Code:

Exception in thread "main" net.sf.picard.PicardException: Could not find a format with available files for the following data types: Position, Barcodes, BaseCalls, QualityScores, PF
        at net.sf.picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:134)
        at net.sf.picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:152)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
        at net.sf.picard.illumina.IlluminaBasecallsToSam.main(IlluminaBasecallsToSam.java:377)

My barcode.txt input for the IlluminaBasecallsToSam.jar command looks like this:
Code:

BARCODE        OUTPUT        SAMPLE_ALIAS        LIBRARY_NAME
AGGTAAGG        Control.bam        Control        MiSeq_Test
TGCTCGAC        LASV_90.bam        LASV_90        MiSeq_Test
GCCTAGCC        LASV_241.bam        LASV_241        MiSeq_Test
TTGAGCCT        LASV_245.bam        LASV_245        MiSeq_Test
TATCCAGG        LASV_254.bam        LASV_254        MiSeq_Test
TGCTGCTG        LASV_263.bam        LASV_263        MiSeq_Test
GACCTAAC        LASV_289.bam        LASV_289        MiSeq_Test
CTACCAGG        LASV_291.bam        LASV_291        MiSeq_Test
CTGCGGAT        LASV_295.bam        LASV_295        MiSeq_Test
GTAACATC        LASV_309.bam        LASV_309        MiSeq_Test

Any idea what is going on here? Any help would be MUCH appreciated - I'm lost!

kga1978 04-25-2012 08:18 PM

Alright, got it - the first couple of errors were indeed caused by the READ_STRUCTURE parameter. Since this was a 151bp PE run, the correct format was 151T8B151T. The second problem ('Barcode encountered in that was not specified...') was caused by me not adding an 'N' barcode to my barcode.txt. Once I had done all that, everything worked beautifully!
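
For anyone puzzling over why 151T8B151T works and 8B143T does not: a READ_STRUCTURE string is a series of number/letter pairs in cycle order, where T marks template cycles and B marks barcode cycles. The sketch below only parses such a string and adds up the cycles; it is not Picard code, and the helper names are made up. It shows that 8B143T describes a single 151-cycle read, while 151T8B151T describes read 1, a separate 8-cycle index read, and read 2.

```python
import re


def parse_read_structure(read_structure):
    """Split a string such as '151T8B151T' into (cycles, type) segments."""
    segments = re.findall(r"(\d+)([A-Z])", read_structure)
    # Reject input with leftover characters that the pattern skipped.
    if not segments or "".join(n + t for n, t in segments) != read_structure:
        raise ValueError(f"unparseable read structure: {read_structure!r}")
    return [(int(n), t) for n, t in segments]


def total_cycles(read_structure):
    """Total number of sequencing cycles the read structure accounts for."""
    return sum(n for n, _ in parse_read_structure(read_structure))


if __name__ == "__main__":
    for rs in ("8B143T", "151T8B151T"):
        print(rs, parse_read_structure(rs), total_cycles(rs))
```

A mismatch between the cycle total and what the run actually produced (here 151 + 8 + 151 = 310) seems a likely reason for the odd barcode matches in the first attempts.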

Thanks Picard for being so awesome - once I get you to work!

swNGS 04-28-2012 04:04 AM

I'm assuming your core gives you the FASTQs in the same form as we get them off the machine, in which case you will have 3 files: R1, R2 and the index file (I've assumed that you're running paired-end).
You can ignore the index file (one barcode read per read in the R1 and R2 files) and demultiplex based on the number at the end of each read's header, which corresponds to the order in which the samples were listed in the sample run sheet on the MiSeq.
If that makes sense, I've got a script that demultiplexes based on this, provided your MiSeq creates FASTQs the same way ours does. I'll post it when I get a mo :)
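
The script mentioned above was not posted in this thread, so what follows is only a rough sketch of the idea and not swNGS's actual code. It assumes each FASTQ header ends in a colon-separated sample number (e.g. `... 1:N:0:3` for sample 3) and that records are exactly four lines long; the instrument names in the example headers are invented.

```python
from collections import defaultdict


def sample_number(header):
    """Return the last colon-separated field of a FASTQ header line."""
    return header.strip().rsplit(":", 1)[-1]


def demultiplex(fastq_lines):
    """Group 4-line FASTQ records by the sample number in each header."""
    by_sample = defaultdict(list)
    lines = [line.rstrip("\n") for line in fastq_lines]
    # Step through complete 4-line records; a trailing partial record is ignored.
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        by_sample[sample_number(record[0])].append(record)
    return dict(by_sample)


if __name__ == "__main__":
    reads = [
        "@M00000:1:000000000-XXXXX:1:1101:100:200 1:N:0:3",
        "ACGT", "+", "IIII",
        "@M00000:1:000000000-XXXXX:1:1101:101:201 1:N:0:1",
        "TTGA", "+", "IIII",
    ]
    for sample, records in sorted(demultiplex(reads).items()):
        print(sample, len(records))
```

In practice you would run this over R1 and R2 separately and write each group to its own file; holding everything in memory as done here is only reasonable for small inputs.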

mnkyboy 05-15-2012 09:14 AM

I was wondering if you were still going to post that script? I was looking for an easy solution for this as well.

drea11 07-21-2016 11:48 AM

I'm attempting to extract barcodes from a set of Illumina MiSeq paired-end sequences and came across this post, which seems to have the best description I have seen of the Picard tools. I have a couple of questions regarding what is posted above, and about my own issues with running the script.

1) Is the metrics_file, listed as required by the function, something I have to provide as input? If so, what is included in this file, and is there a sample with the appropriate format that I could use as a guide?
2) I'm unclear on the read structure string. The post above describes an experimental setup similar to mine, but I'm not sure how this leads to 151T8B151T as listed. I have both forward and reverse barcodes for my sequences, so would it not make more sense to have something like 8B151T151T8B?
3) The post also recommends use of an N barcode in the barcode_file. How does one include this, if indeed it is required?
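
On question 3, going by the earlier reply in this thread, the 'N' entry went into the barcode.txt passed to IlluminaBasecallsToSam (the file with the BARCODE, OUTPUT, SAMPLE_ALIAS and LIBRARY_NAME columns), not into the ExtractIlluminaBarcodes barcode file. A minimal sketch of building such a file with one extra row whose BARCODE is N follows; the 'Unmatched' name and the helper function are assumptions for illustration, and the column names are copied from the file shown earlier.

```python
import csv
import io

COLUMNS = ["BARCODE", "OUTPUT", "SAMPLE_ALIAS", "LIBRARY_NAME"]


def barcode_params_text(samples, library, unmatched_name="Unmatched"):
    """Build tab-separated barcode params text with a trailing 'N' row.

    samples: list of (barcode, sample_alias) pairs.
    unmatched_name: hypothetical alias for reads matching no barcode.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for barcode, alias in samples:
        writer.writerow([barcode, f"{alias}.bam", alias, library])
    # Extra row so reads without a barcode match have somewhere to go.
    writer.writerow(["N", f"{unmatched_name}.bam", unmatched_name, library])
    return out.getvalue()


if __name__ == "__main__":
    print(barcode_params_text(
        [("AGGTAAGG", "Control"), ("TGCTCGAC", "LASV_90")],
        library="MiSeq_Test",
    ))
```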

I am currently running this command:
java -jar ~/Desktop/software/picard-tools-1.119/ExtractIlluminaBarcodes.jar BASECALLS_DIR=Data/Intensities/BaseCalls LANE=1 BARCODE_FILE=barcode.txt READ_STRUCTURE=151T8B151T METRICS_FILE=metrics NUM_PROCESSORS=4

barcode.txt file is as below:
barcode_sequence_1 barcode_sequence_2 barcode_sequence_3 barcode_sequence_4 barcode_sequence_5 barcode_sequence_6barcode_sequence_7 barcode_sequence_8 barcode_sequence_9 barcode_sequence_10 barcode_sequence_11 barcode_sequence_12barcode_sequence_13 barcode_sequence_14 barcode_sequence_15 barcode_sequence_16 barcode_sequence_17
TACGCTGC ATGCGCAG TAGCGCTC ACTGAGCG CCTAAGAC CGATCAGT TCCTGAGC ATCTCAGG ACTGCATA AAGGAGTA CTAAGCCT CGTCTAAT TCTCTCCG CTCTCTAT TATCCTCT GTAAGGAG

The error I'm getting is:
Exception in thread "main" picard.PicardException: Could not find a format with available files for the following data types: BaseCalls, PF
at picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:172)
at picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:127)
at picard.illumina.ExtractIlluminaBarcodes.customCommandLineValidation(ExtractIlluminaBarcodes.java:332)
at picard.cmdline.CommandLineProgram.parseArgs(CommandLineProgram.java:242)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:128)
at picard.illumina.ExtractIlluminaBarcodes.main(ExtractIlluminaBarcodes.java:357)

java version 1.8.0_91

Appreciate any help!

