  • error in running casava

    I am working with MiSeq BCL files and want to convert them to FASTQ files using bcl2fastq.

    Below is a sample of my sample sheet:

    [Header]
    IEMFileVersion 4
    Investigator Name XXXXX
    Experiment Name XXXX_plate01_1pool
    Date xx/xx/xxxx
    Workflow GenerateFASTQ
    Application FASTQ Only
    Assay TruSeq LT
    Description Test
    Chemistry Default

    Reads
    250
    250

    [Settings]
    ReverseComplement 0
    Adapter TTTTTTTTTTTTTTT
    AdapterRead2 AAAAAAAAAAAA

    [Data]
    FCID Lane SampleID Sample_Ref index Description Control Recipe Operator Sample_Project
    073388Sm XXXXXXXX
    073389Sm XXXXXXXX
    073390Sm XXXXXXXX
    073391Sm XXXXXXXX
    073392Sm XXXXXXXX
    073393Sm XXXXXXXX
    073394Sm XXXXXXXX
    cont1 XXXXXXXX


    bash-4.1$ /home/Downloads/CASAVA/bin/configureBclToFastq.pl --output-dir /home/Projects/Data/unaligned --input-dir /home/Projects/Data/Intensities/BaseCalls --fastq-cluster-count 0 --sample-sheet /home/Projects/Data/Intensities/BaseCalls/SampleSheet.csv --tiles s_1_* --force --mismatches 1 --ignore-missing-bcl --ignore-missing-stats --use-bases-mask n
    could not find ParserDetails.ini in /home/Downloads/localperl/lib/site_perl/5.20.2/XML/SAX
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: Basecalling software: RTA
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: version: 1.18 (build 54)
    [2015-02-24 17:18:02] [configureBclToFastq.pl] WARNING: Couldn't find run info in /home/Projects/Data/Intensities/BaseCalls/../../../RunInfo.xml
    [2015-02-24 17:18:02] [configureBclToFastq.pl] WARNING: Couldn't find RunInfo.xml for /home/Projects/Data/Intensities/BaseCalls
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: Original use-bases mask: n
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: Guessed use-bases mask: n
    ERROR: Wrong number of fields in sample sheet (expected: 10, got 8: IEMFileVersion,4,,,,,,)
    at /home/Downloads/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Demultiplex.pm line 531

    I am running CASAVA for the first time, so any help will be appreciated.

    Thank you

  • #2
    You can use a simplified samplesheet like the example here: http://seqanswers.com/forums/showpos...4&postcount=14

    See the entire thread for additional information.



    • #3

      Hi

      I have seen this thread, but it is still not clear to me. By "simplified", do you mean removing the run information at the top and keeping only the values from the FCID column onwards?

      Also, in the FCID column I have IDs and then cont1, cont2 and so on. Would that be the reason for the inconsistent flowcell ID?

      Thank you



      • #4
        You can manually create a Samplesheet.csv file (you can name the file anything, but it has to be in comma-separated value (CSV) format) that looks exactly like the example I linked above.

        That example contains the minimum information you need to convert BCL files to fastq when de-multiplexing your samples. You will need to grab the last part of the flowcell ID from the folder name (e.g. 000000000-ADB2U).
        Last edited by GenoMax; 02-25-2015, 04:46 AM.
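
A minimal sketch of pulling the flowcell ID out of the run folder name. It assumes the folder name has four underscore-separated fields (date, instrument, run number, flowcell); the function name, the instrument name and the run number in the example are hypothetical, and only the flowcell part is taken from this thread.

```python
def flowcell_id_from_run_folder(folder_name: str) -> str:
    """Return the last underscore-separated field of a run folder name.

    Assumes the folder is named <date>_<instrument>_<run number>_<flowcell>.
    """
    # Accept a full path as well as a bare folder name.
    base_name = folder_name.rstrip("/").split("/")[-1]
    fields = base_name.split("_")
    if len(fields) < 4:
        raise ValueError(f"unexpected run folder name: {folder_name!r}")
    return fields[-1]


# Hypothetical MiSeq run folder; instrument name and run number are made up.
print(flowcell_id_from_run_folder("150224_M00000_0001_000000000-ADB2U"))
```

The returned value is what would go into the FCID column of the sample sheet.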



        • #5
          My FCID column looks like this:
          073388Sm
          073389Sm
          073390Sm
          073391Sm
          073392Sm
          073393Sm
          073394Sm
          cont1

          If this is not correct, where should I look for it?

          Thanks



          • #6
            Those must be your sample IDs. The Samplesheet.csv file in the raw data folder does not contain the flowcell ID.

            Did you get the complete raw data folder from your sequence provider? It should have a date stamp as the starting name (http://support.illumina.com/help/Seq...FileNaming.htm).



            • #7
              This is how I have it now:

              [Header]
              IEMFileVersion 4
              Investigator Name
              Experiment Name
              Date 0/00/2015
              Workflow GenerateFASTQ
              Application FASTQ Only
              Assay TruSeq LT
              Description Test
              Chemistry Default

              [Reads]
              250
              250

              [Settings]
              ReverseComplement 0
              Adapter
              AdapterRead2

              [Data]
              FCID Lane Sample_ID SampleRef index Description Control Recipe Operator SampleProject
              000000000-ADBFK 070008Sm xxxxxxxx
              000000000-ADBFK 070009Sm xxxxxxxx
              000000000-ADBFK 070010Sm xxxxxxxx
              000000000-ADBFK 070011Sm xxxxxxxx
              000000000-ADBFK 070012Sm xxxxxxxx
              000000000-ADBFK 070013Sm xxxxxxxx
              000000000-ADBFK 070014Sm xxxxxxxx
              000000000-ADBFK cont1 xxxxxxxx
              000000000-ADBFK 070016Sm xxxxxxxx
              000000000-ADBFK 070017Sm xxxxxxxx

              I am still getting this error:

              /home/CASAVA/bin/configureBclToFastq.pl --output-dir /home/unaligned --input-dir /home/DevelopmentRun1/Data/Intensities/BaseCalls --fastq-cluster-count 0 --sample-sheet /home/DevelopmentRun1/SampleSheet1.csv --tiles s_1_* --force --mismatches 1 --ignore-missing-bcl --ignore-missing-stats --use-bases-mask n
              could not find ParserDetails.ini in /home/localperl/lib/site_perl/5.20.2/XML/SAX
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: Basecalling software: RTA
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: version: 1.18 (build 54)
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: Original use-bases mask: n
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: Guessed use-bases mask: n,IIIIIIIn,yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
              ERROR: FlowCell ID is inconsistent across Sample Sheet lines. Expected: 'Casava::Demultiplex::SampleSheet::Csv=HASH(0x2f16e80)->flowCellId()', got Investigator Name
              at /home/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Demultiplex.pm line 531

              Thank you for the kind help



              • #8
                Is this a 1D or 2D barcode run?



                • #9
                  Your samplesheet (if this is a 1D run) needs to look like this:

                  Code:
                  FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,SampleProject
                  000000000-ADBFK,1,073388Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  000000000-ADBFK,1,073389Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  000000000-ADBFK,1,073390Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  000000000-ADBFK,1,073391Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  and so on
                  The sample sheet you are using is the format needed when MiSeq Reporter does the analysis.
                  Last edited by GenoMax; 02-25-2015, 09:09 AM.
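
A sketch of generating a sheet in the layout above and checking that every row has 10 fields, which is the count the first error in this thread complained about. The column names and filler values are copied from the post; the function names and the index sequences are placeholders, not part of CASAVA.

```python
import csv
import io

# Column names copied from the header line in the post above.
COLUMNS = ["FCID", "Lane", "Sample_ID", "SampleRef", "index",
           "Description", "Control", "Recipe", "Operator", "SampleProject"]


def build_sample_sheet(flowcell_id, lane, samples):
    """Return CSV text with one 10-field row per (sample_id, index) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for sample_id, index in samples:
        # Same filler values as the rows in the post; SampleProject left empty.
        writer.writerow([flowcell_id, lane, sample_id, "no_ref", index,
                         "NA", "N", "NA", "NA", ""])
    return buffer.getvalue()


def rows_with_wrong_field_count(csv_text, expected=len(COLUMNS)):
    """Return (line_number, field_count) for rows that are not `expected` wide."""
    reader = csv.reader(io.StringIO(csv_text))
    return [(n, len(row)) for n, row in enumerate(reader, start=1)
            if len(row) != expected]


# Placeholder index sequences; replace them with the real tag for each sample.
new_sheet = build_sample_sheet("000000000-ADBFK", 1,
                               [("073388Sm", "AAAAAAAA"), ("073389Sm", "CCCCCCCC")])
print(new_sheet)
print(rows_with_wrong_field_count(new_sheet))
# A header line from the MiSeq Reporter style sheet is reported as a bad row.
print(rows_with_wrong_field_count("IEMFileVersion,4,,,,,,\n"))
```

Running the check on the original MiSeq Reporter sheet flags its `[Header]` lines, which is consistent with the "expected: 10, got 8" message.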



                  • #10
                    It's a 2D run, and now my sample sheet looks like this:

                    FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,SampleProject
                    000000000-ADBFK,1,070008Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070009Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070000Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070011Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070012Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070013Sm,,CGCCCGCC-AAAAAAAA,,N,,,

                    /home/CASAVA/bin/configureBclToFastq.pl --output-dir /home/DevelopmentRun1/unaligned --input-dir /home/DevelopmentRun1/Data/Intensities/BaseCalls --fastq-cluster-count 0 --sample-sheet /home/DevelopmentRun1/SampleSheet1.csv --tiles s_1_* --force --mismatches 1 --ignore-missing-bcl --ignore-missing-stats --use-bases-mask n
                    could not find ParserDetails.ini in /home/Downloads/localperl/lib/site_perl/5.20.2/XML/SAX
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: Basecalling software: RTA
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: version: 1.18 (build 54)
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: Original use-bases mask: n
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: Guessed use-bases mask: n,IIIIIIIn,yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] ERROR: barcode ACGCATGGATGACTGG for lane 1 has length 16: expected barcode lenth (including delimiters) is 7
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] BACKTRACE: at /home/Downloads/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Demultiplex.pm line 553
                    Casava::Demultiplex::loadSampleSheet('Casava::Demultiplex=HASH(0x20f1498)') called at /home/Downloads/CASAVA/bin/configureBclToFastq.pl line 427
                    Died at /home/Downloads/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Common/Log.pm line 310

                    I have also tried changing --use-bases-mask to different I settings, but none of them seem to work.



                    • #11
                      Use this base mask (if you have 8 bp tags).
                      Code:
                      --use-bases-mask Y*,I8,I8,Y*
                      Or you could omit that option completely, and bcl2fastq will guess the correct values from the RunInfo.xml file.
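
To make the mask notation concrete, here is a small sketch that expands a mask into one character per cycle. It assumes each comma-separated part describes one read, that a letter (Y, N or I) may be followed by a repeat count or by `*` meaning "the rest of the read", and that the run has 250-cycle reads with two 8-cycle index reads. The function names are made up for illustration, and the real tool may be more lenient than this check.

```python
import re

TOKEN = re.compile(r"([YNIyni])(\d+|\*)?")


def expand_read_mask(part, length):
    """Expand one comma-separated mask part to one character per cycle."""
    if not re.fullmatch(r"(?:[YNIyni](?:\d+|\*)?)+", part):
        raise ValueError(f"cannot parse mask part {part!r}")
    tokens = TOKEN.findall(part)
    if sum(count == "*" for _, count in tokens) > 1:
        raise ValueError(f"more than one '*' in {part!r}")
    # Cycles claimed by explicit counts; a '*' absorbs whatever is left.
    fixed = sum(int(count or 1) for _, count in tokens if count != "*")
    cycles = "".join(
        letter * (length - fixed if count == "*" else int(count or 1))
        for letter, count in tokens
    )
    if len(cycles) != length:
        raise ValueError(f"{part!r} covers {len(cycles)} cycles, read has {length}")
    return cycles


def expand_bases_mask(mask, read_lengths):
    """Expand a full mask, one part per read."""
    parts = mask.split(",")
    if len(parts) != len(read_lengths):
        raise ValueError("need one mask part per read")
    return [expand_read_mask(p, n) for p, n in zip(parts, read_lengths)]


# Assumed read lengths: 250-cycle reads with two 8-cycle index reads.
expanded = expand_bases_mask("Y*,I8,I8,Y*", [250, 8, 8, 250])
print([len(part) for part in expanded])   # [250, 8, 8, 250]
print(expanded[1])                        # IIIIIIII
print(expand_read_mask("I7n", 8))         # IIIIIIIn
```

The last line reproduces the `IIIIIIIn` part of the guessed mask in the log: only 7 index cycles are used, which matches the expected barcode length of 7 in the error.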

                      I hope you have a separate tag for each sample, since giving identical tags to all samples will not separate them.
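
A quick way to check for shared tags before running the conversion, sketched with the standard `csv` module. The column names come from the sample sheet in this thread; the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def duplicated_indexes(csv_text):
    """Map each (lane, index) used by more than one sample to its Sample_IDs."""
    samples_by_key = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        samples_by_key[(row["Lane"], row["index"])].append(row["Sample_ID"])
    return {key: ids for key, ids in samples_by_key.items() if len(ids) > 1}


# Two rows copied from the sample sheet posted above.
example_sheet = """FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,SampleProject
000000000-ADBFK,1,070008Sm,,CGCCCGCC-AAAAAAAA,,N,,,
000000000-ADBFK,1,070009Sm,,CGCCCGCC-AAAAAAAA,,N,,,
"""
print(duplicated_indexes(example_sheet))
```

An empty dictionary means every sample in a lane has its own tag.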



                      • #12
                        It worked, but I don't see a Demultiplex_Stats file or DemultiplexedBustardSummary.xml. It looks like everything is going into Undetermined_indices.



                        • #13
                          I have it working now. Thank you for all the guidance.



                          • #14
                            Great. All of your output files should be in the "Unaligned/Basecall_Stats*" and "Unaligned/Project_*" directories. The undetermined reads go into the "Undetermined_indices" directory.
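
A small sketch for checking where the reads ended up. The folder names are the ones mentioned above; the function name is hypothetical, and it is an assumption that the FASTQ files are gzip-compressed (`*.fastq.gz`) and sit somewhere below those folders.

```python
from pathlib import Path


def list_fastq_outputs(unaligned_dir):
    """Group FASTQ file names under the output directory by top-level folder."""
    root = Path(unaligned_dir)
    outputs = {}
    for top in sorted(root.iterdir()):
        is_project = top.name.startswith("Project_")
        is_undetermined = top.name == "Undetermined_indices"
        if top.is_dir() and (is_project or is_undetermined):
            # Assumption: reads are stored as gzip-compressed FASTQ files.
            outputs[top.name] = sorted(p.name for p in top.rglob("*.fastq.gz"))
    return outputs
```

Calling `list_fastq_outputs()` on the `--output-dir` path used in the conversion command returns a dictionary keyed by folder name. If only `Undetermined_indices` has files, the tags in the sample sheet did not match the reads.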



                            • #15
                              Yes, that's where they are. Thank you!
