  • Demultiplexing MiSeq run - problems

    Hi All,

    I just got a MiSeq run back from our core and I am trying to demultiplex everything. However, I have received no instructions as to how I do this, so I am a bit in the dark. The run was 151bp paired-end and the first eight bases make up the barcode.

    I have been doing the following:

    1. Extract Illumina barcodes using the following command:
    Code:
    java -Xmx2g -jar /usr/local/bin/picard/ExtractIlluminaBarcodes.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 BARCODE_FILE=~/Desktop/extract.txt READ_STRUCTURE=8B143T METRICS_FILE=metrics NUM_PROCESSORS=6
    The 'extract.txt' file containing the barcodes looks like this:
    Code:
    barcode_sequence_1	barcode_sequence_2	barcode_sequence_3	barcode_sequence_4	barcode_sequence_5	barcode_sequence_6	barcode_sequence_7	barcode_sequence_8	barcode_sequence_9	barcode_sequence_10
    AGGTAAGG	TGCTCGAC	GCCTAGCC	TTGAGCCT	TATCCAGG	TGCTGCTG	GACCTAAC	CTACCAGG	CTGCGGAT	GTAACATC
    The output from this creates 12 s_1_00xx_barcode.txt files as expected - e.g.:
    Code:
    .CTGATTT        N       aggtaagg        6       9
    .TCCTCCT        N               8       0
    .CCTGCTT        N       aggtaagg        6       9
    CCCACACC        N       aggtaagg        7       9
    CTCGCTCC        N               8       0
    CCTCAGCC        N       aggtaagg        7       9
    TGCCAGTA        N       aggtaagg        6       9
    CTTTTGTT        N       aggtaagg        7       9
    CGACTCCT        N       aggtaagg        7       9
    GTTACGCG        N       aggtaagg        7       9
    However, this doesn't look right: all 12 files look very similar to this, with only the first barcode (aggtaagg) ever appearing in the third column. I'm wondering whether my READ_STRUCTURE=8B143T is incorrect, or whether the barcode file isn't being read correctly?

    2. Alternatively I have given all the barcodes in the command:

    Code:
    java -Xmx2g -jar /usr/local/bin/picard/ExtractIlluminaBarcodes.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 BARCODE=AGGTAAGG BARCODE=TGCTCGAC BARCODE=GCCTAGCC BARCODE=TTGAGCCT BARCODE=TATCCAGG BARCODE=TGCTGCTG BARCODE=GACCTAAC BARCODE=CTACCAGG BARCODE=CTGCGGAT BARCODE=GTAACATC READ_STRUCTURE=8B143T METRICS_FILE=metrics NUM_PROCESSORS=6
    But now the output looks different:
    Code:
    .TCCTCCT        N       ttgagcct        3       4
    .CCTGCTT        N       tgctgctg        2       4
    CCCACACC        N       gcctagcc        4       4
    CTCGCTCC        N       tgctcgac        5       5
    CCTCAGCC        N       gcctagcc        3       5
    TGCCAGTA        N       tgctcgac        4       4
    CTTTTGTT        N       ctgcggat        4       6
    CGACTCCT        N       ttgagcct        5       5
    GTTACGCG        N       gtaacatc        4       5
    CCCCCATT        N       ctaccagg        4       5
    CTAACACG        N       ctaccagg        2       3
    This also doesn't look correct.

    When I run the IlluminaBasecallsToSam command:

    Code:
    java -Xmx2g -jar /usr/local/bin/picard/IlluminaBasecallsToSam.jar BASECALLS_DIR=/run/Data/Intensities/BaseCalls LANE=1 RUN_BARCODE=120423_MiSeq SEQUENCING_CENTER=Broad BARCODE_PARAMS=~/Desktop/barcodes.txt READ_STRUCTURE=8B143T NUM_PROCESSORS=6 ADAPTERS_TO_CHECK=PAIRED_END
    I get a bunch of errors. For number 1) above I get the following:
    Code:
    Picard version: 1.67(1190)
    INFO	2012-04-25 21:09:47	IlluminaBasecallsToSam	READ STRUCTURE IS 8B143T
    INFO	2012-04-25 21:09:47	IlluminaBasecallsToSam	Creating 6 TileProcessors.
    Before explicit GC, Runtime.totalMemory()=85000192
    After explicit GC, Runtime.totalMemory()=85000192
    ERROR	2012-04-25 21:09:47	IlluminaBasecallsToSam	Exception in TileProcessor
    net.sf.picard.PicardException: Barcode encountered in that was not specified in BARCODE_PARAMS: null
    	at net.sf.picard.illumina.IlluminaBasecallsToSam$TileProcessor.processTile(IlluminaBasecallsToSam.java:639)
    	at net.sf.picard.illumina.IlluminaBasecallsToSam$TileProcessor.run(IlluminaBasecallsToSam.java:592)
    	at java.lang.Thread.run(Thread.java:680)
    For number 2) above I get this:
    Code:
    Exception in thread "main" net.sf.picard.PicardException: Could not find a format with available files for the following data types: Position, Barcodes, BaseCalls, QualityScores, PF
    	at net.sf.picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:134)
    	at net.sf.picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:152)
    	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    	at net.sf.picard.illumina.IlluminaBasecallsToSam.main(IlluminaBasecallsToSam.java:377)
    My barcode.txt input for the IlluminaBasecallsToSam.jar command looks like this:
    Code:
    BARCODE	OUTPUT	SAMPLE_ALIAS	LIBRARY_NAME
    AGGTAAGG	Control.bam	Control	MiSeq_Test
    TGCTCGAC	LASV_90.bam	LASV_90	MiSeq_Test
    GCCTAGCC	LASV_241.bam	LASV_241	MiSeq_Test
    TTGAGCCT	LASV_245.bam	LASV_245	MiSeq_Test
    TATCCAGG	LASV_254.bam	LASV_254	MiSeq_Test
    TGCTGCTG	LASV_263.bam	LASV_263	MiSeq_Test
    GACCTAAC	LASV_289.bam	LASV_289	MiSeq_Test
    CTACCAGG	LASV_291.bam	LASV_291	MiSeq_Test
    CTGCGGAT	LASV_295.bam	LASV_295	MiSeq_Test
    GTAACATC	LASV_309.bam	LASV_309	MiSeq_Test
    Any idea what is going on here? Any help would be MUCH appreciated - I'm lost!

  • #2
    Alright, got it - the first couple of errors were indeed caused by the READ_STRUCTURE parameter. Since this was a 151bp paired-end run, the correct format was 151T8B151T. The second problem ('Barcode encountered in that was not specified...') was caused by my not adding an 'N' barcode to my barcode.txt. Once I had fixed both, everything worked beautifully!
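
The difference between the two read structures is easy to check by hand. Below is a minimal sketch in Python (not part of Picard) that splits a read-structure string into segments and sums the cycles. It assumes the operators are T (template), B (sample barcode) and S (skip), listed in the order the cycles were sequenced; the function names are made up for illustration.

```python
import re

# Operators assumed here: T = template, B = sample barcode, S = skip.
VALID_OPS = {"T", "B", "S"}


def parse_read_structure(structure):
    """Split a string such as '151T8B151T' into (cycles, operator) pairs."""
    segments = re.findall(r"(\d+)([A-Z])", structure)
    # If the matched pieces do not rebuild the input, the string is malformed.
    if not segments or "".join(n + op for n, op in segments) != structure:
        raise ValueError(f"malformed read structure: {structure!r}")
    for _, op in segments:
        if op not in VALID_OPS:
            raise ValueError(f"unknown operator {op!r} in {structure!r}")
    return [(int(n), op) for n, op in segments]


def total_cycles(structure):
    """Total number of sequencing cycles the read structure describes."""
    return sum(n for n, _ in parse_read_structure(structure))


if __name__ == "__main__":
    for s in ("8B143T", "151T8B151T"):
        print(s, parse_read_structure(s), total_cycles(s))
```

8B143T describes a single 151-cycle read that starts with the barcode, whereas 151T8B151T describes 151 + 8 + 151 = 310 cycles, with the barcode read sitting between the two template reads.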

    Thanks Picard for being so awesome - once I get you to work!
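
For anyone wondering what the 'N' row looks like: as far as I understand the Picard documentation, a row whose BARCODE is N in the BARCODE_PARAMS file names the output for reads that match none of the listed barcodes. Here is a rough Python sketch that builds such a file using the same four columns as the barcode.txt shown in the first post. The unmatched output name and alias ("Unmatched.bam", "Unmatched") are placeholders I picked, not values from the original run.

```python
import csv
import io


def build_barcode_params(samples, library, unmatched_output="Unmatched.bam"):
    """Return BARCODE_PARAMS text for (barcode, sample_alias) pairs.

    A final row with BARCODE 'N' is appended for reads that match no barcode.
    The unmatched output name and alias are illustrative placeholders.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["BARCODE", "OUTPUT", "SAMPLE_ALIAS", "LIBRARY_NAME"])
    for barcode, alias in samples:
        writer.writerow([barcode, f"{alias}.bam", alias, library])
    writer.writerow(["N", unmatched_output, "Unmatched", library])
    return out.getvalue()


if __name__ == "__main__":
    samples = [("AGGTAAGG", "Control"), ("TGCTCGAC", "LASV_90")]
    print(build_barcode_params(samples, "MiSeq_Test"), end="")
```

Write the returned text to barcodes.txt and pass that path as BARCODE_PARAMS.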

    • #3
      Assuming your core gives you the FASTQs in the same form as we get them off the machine, you will have three files: R1, R2 and the index file (I've assumed you are running paired-end).
      You can ignore the index file (it holds one barcode read for each read in the R1 and R2 files) and demultiplex based on the number at the end of each read's header, which corresponds to the order in which the samples were listed in the sample sheet for the MiSeq run.
      If that makes sense, and your MiSeq is creating FASTQs the same way ours does, I've got a script that demultiplexes based on this. I'll post it when I get a moment.
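
The script mentioned above was never posted in this thread, so what follows is only a minimal Python sketch of the approach described, not the poster's code. It assumes four-line FASTQ records and a header whose last colon-separated field is the sample number (for example a header ending in `1:N:0:3` for sample 3); check a few of your own headers before relying on it. The function names are made up for illustration.

```python
from collections import defaultdict


def sample_number(header):
    """Return the last colon-separated field of a FASTQ header line."""
    return header.rstrip().rsplit(":", 1)[-1]


def demultiplex(lines):
    """Group four-line FASTQ records by the sample number in their header.

    `lines` is any iterable of text lines, such as an open file handle.
    Returns a dict mapping sample number (as a string) to a list of
    (header, sequence, separator, quality) tuples.
    """
    bins = defaultdict(list)
    it = iter(lines)
    # zip over the same iterator four times reads one record per step.
    for raw in zip(it, it, it, it):
        record = tuple(field.rstrip("\n") for field in raw)
        bins[sample_number(record[0])].append(record)
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    demo = [
        "@M00001:1:000000000-A1B2C:1:1101:15000:1300 1:N:0:3\n",
        "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@M00001:1:000000000-A1B2C:1:1101:15010:1310 1:N:0:1\n",
        "TTGACCAA\n", "+\n", "IIIIIIII\n",
    ]
    for sample, records in sorted(demultiplex(demo).items()):
        print(sample, len(records))
```

For paired-end data, run it on R1 and R2 separately; both files list the reads in the same order, so the bins stay in sync.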

      • #4
        I was wondering if you were still going to post that script? I was looking for an easy solution for this as well.

        • #5
          I'm attempting to extract barcodes from a set of Illumina MiSeq paired-end sequences and came across this post, which has the best description I have seen of the Picard tools. I have a couple of questions about what is posted above, and about my own issues running the command.

          1) Is the metrics_file, which is listed as a required argument, something I supply as input? If so, what should it contain, and is there a sample in the appropriate format that I could use as a guide?
          2) I'm unclear on the read structure string. The post above describes an experimental setup similar to mine, but I'm not sure how it leads to 151T8B151T. I have both forward and reverse barcodes for my sequences, so would it not make more sense to use something like 8B151T151T8B?
          3) The post also recommends adding an N barcode to the barcode_file. How does one include this, if it is indeed required?

          I am currently running this command:
          java -jar ~/Desktop/software/picard-tools-1.119/ExtractIlluminaBarcodes.jar BASECALLS_DIR=Data/Intensities/BaseCalls LANE=1 BARCODE_FILE=barcode.txt READ_STRUCTURE=151T8B151T METRICS_FILE=metrics NUM_PROCESSORS=4

          barcode.txt file is as below:
          barcode_sequence_1 barcode_sequence_2 barcode_sequence_3 barcode_sequence_4 barcode_sequence_5 barcode_sequence_6 barcode_sequence_7 barcode_sequence_8 barcode_sequence_9 barcode_sequence_10 barcode_sequence_11 barcode_sequence_12 barcode_sequence_13 barcode_sequence_14 barcode_sequence_15 barcode_sequence_16 barcode_sequence_17
          TACGCTGC ATGCGCAG TAGCGCTC ACTGAGCG CCTAAGAC CGATCAGT TCCTGAGC ATCTCAGG ACTGCATA AAGGAGTA CTAAGCCT CGTCTAAT TCTCTCCG CTCTCTAT TATCCTCT GTAAGGAG

          The error I'm getting is:
          Exception in thread "main" picard.PicardException: Could not find a format with available files for the following data types: BaseCalls, PF
          at picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:172)
          at picard.illumina.parser.IlluminaDataProviderFactory.<init>(IlluminaDataProviderFactory.java:127)
          at picard.illumina.ExtractIlluminaBarcodes.customCommandLineValidation(ExtractIlluminaBarcodes.java:332)
          at picard.cmdline.CommandLineProgram.parseArgs(CommandLineProgram.java:242)
          at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:128)
          at picard.illumina.ExtractIlluminaBarcodes.main(ExtractIlluminaBarcodes.java:357)

          java version 1.8.0_91

          Appreciate any help!
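
One thing worth checking on the barcode file: as far as I understand the Picard documentation, BARCODE_FILE is tab-delimited with one barcode per row, and a column named barcode_sequence_2 is meant for the second index of a dual-indexed read, not for a second sample. A file with every barcode on a single row would then be read as one multi-part barcode, which may be why only the first barcode ever matched in the first post. Below is a minimal Python sketch that turns a single line of barcodes into the one-per-row layout. The barcode_name values are placeholders I made up, and it handles single-index barcodes only.

```python
def tall_barcode_file(barcode_line, name_prefix="sample_"):
    """Turn one whitespace-separated line of barcodes into one row each.

    The 'barcode_sequence_1' and 'barcode_name' column headers follow my
    reading of the Picard docs; the generated names are placeholders.
    """
    barcodes = barcode_line.split()
    if not barcodes:
        raise ValueError("no barcodes given")
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("barcodes must be unique")
    if len({len(bc) for bc in barcodes}) != 1:
        raise ValueError("barcodes must all be the same length")
    rows = ["barcode_sequence_1\tbarcode_name"]
    for i, bc in enumerate(barcodes, start=1):
        rows.append(f"{bc}\t{name_prefix}{i:02d}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # First three barcodes from the barcode.txt shown above.
    print(tall_barcode_file("TACGCTGC ATGCGCAG TAGCGCTC"), end="")
```

This only addresses the file layout. The "Could not find a format with available files" error is raised while Picard looks for the run's data files, so it is worth confirming separately that BASECALLS_DIR points at the actual BaseCalls folder of the run.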
