
  • #16
    Originally posted by skruglyak
    We are planning a minor release of CASAVA in October that is primarily intended to handle an improvement to the number of supported index sequences. In the same release, we plan to change the default behavior and omit reads that do not pass filter from the FASTQ files. In general, we do not recommend the use of non-PF reads. Users who want to retain the non-PF reads will be able to do so by adding the following parameter to configureBclToFastq.pl:

    --with-failed-reads

    A read is classified as non-PF when more than one cycle in the first 25 cycles has a poor ratio (<0.6) of the brightest intensity to the sum of the brightest and second brightest.
    Our variant calling software ignores non-PF reads, but there are many alternative methods that use all data, disregarding the non-PF flag. The inclusion of non-PF reads increases time to align, increases the data footprint, increases the measured error rate, and can lead to variant calling errors. As a result, we have decided to exclude such reads as the default behavior. As a consequence of being excluded from the FASTQ files, the reads will also be excluded from all downstream processing and output, including BAM files – archival and standard.
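
As a rough illustration of the non-PF rule described above (not Illumina's implementation), the check could be sketched in Python as follows. The function names, the representation of each cycle as a list of channel intensities, and the handling of all-zero cycles are assumptions made for the example. Only the 0.6 threshold, the 25-cycle window, and the "more than one poor cycle" rule come from the post.

```python
from typing import Sequence

CHASTITY_THRESHOLD = 0.6  # from the post: a ratio below 0.6 is "poor"
CYCLES_CHECKED = 25       # from the post: only the first 25 cycles count
MAX_POOR_CYCLES = 1       # from the post: more than one poor cycle => non-PF


def chastity(intensities: Sequence[float]) -> float:
    """Brightest / (brightest + second brightest) for one cycle.

    Assumes at least two non-negative channel intensities per cycle.
    """
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    # Assumption: a cycle with no signal at all is treated as poor.
    return top / total if total > 0 else 0.0


def passes_filter(cycles: Sequence[Sequence[float]]) -> bool:
    """Return True if the read would be PF under the rule quoted above."""
    poor = sum(
        1 for cycle in cycles[:CYCLES_CHECKED]
        if chastity(cycle) < CHASTITY_THRESHOLD
    )
    return poor <= MAX_POOR_CYCLES
```

Under this sketch, a read with a single ambiguous cycle in the first 25 still passes, while a second ambiguous cycle in that window makes it non-PF. Cycles after the 25th do not affect the result.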

    Please let me know if you have questions or concerns.

    Thank you,
    Semyon
    Semyon,

    I really appreciate that Illumina has been so responsive to customer feedback with regard to refinement of the CASAVA pipeline and I really hate to keep coming up with more things to tweak/change, but...

    I just ran my first data set through the new 1.8.2 pipeline and truly appreciate the PF-only default and the --fastq-cluster-count 0 option. However, I noticed what I consider a bug in some of the summary files produced by CASAVA. Some summary files (e.g. Flowcell_demux_summary.xml) report the number of PF clusters/bases for both the raw and PF counts. Other files (e.g. BustardSummary.xml) appear to correctly report raw and PF.

    Thanks again.



    • #17
      A single bam file as alignment output

      Dear Semyon,

      Is there a way to get alignments in a single file per sample in bam format as alignment output?

      As far as I know we need an additional "configurebuild --targets sort bam " step to achieve it right now.

      Thanks



      • #18
        Originally posted by kmcarr
        Semyon,

        I really appreciate that Illumina has been so responsive to customer feedback with regard to refinement of the CASAVA pipeline and I really hate to keep coming up with more things to tweak/change, but...

        I just ran my first data set through the new 1.8.2 pipeline and truly appreciate the PF-only default and the --fastq-cluster-count 0 option. However, I noticed what I consider a bug in some of the summary files produced by CASAVA. Some summary files (e.g. Flowcell_demux_summary.xml) report the number of PF clusters/bases for both the raw and PF counts. Other files (e.g. BustardSummary.xml) appear to correctly report raw and PF.

        Thanks again.
        Sorry for the late reply. I somehow missed the notification for the post. You are correct. The stats are computed after the FASTQ file is made, which leads to the issue that you observe. Have you tried using SAV (Sequencing Analysis Viewer)? It reports a lot of valuable statistics created by RTA, including %PF.

        Thanks for your feedback.

        Semyon



        • #19
          Originally posted by selen
          Dear Semyon,

          Is there a way to get alignments in a single file per sample in bam format as alignment output?

          As far as I know we need an additional "configurebuild --targets sort bam " step to achieve it right now.

          Thanks
          Hi selen,

          You are correct to use configureBuild to generate the single BAM file. I spoke with a member of my team and he provided the following example.

          Thanks,
          Semyon

          $CASAVA_PATH/bin/configureBuild.pl \
          --outDir ./outdir \
          --inSampleDir /path/to/eland_alignment/Sample_exampleSample \
          --samtoolsRefFile genome.fa \
          --targets sort bam \
          --sortKeepAllReads



          • #20
            Originally posted by skruglyak
            Sorry for the late reply. I somehow missed the notification for the post. You are correct. The stats are computed after the FASTQ file is made, which leads to the issue that you observe. Have you tried using SAV (Sequencing Analysis Viewer)? It reports a lot of valuable statistics created by RTA, including %PF.

            Thanks for your feedback.

            Semyon
            Semyon,

            Yes, it's true SAV presents some of that data, but I need the data in a format that I can parse to generate reports. This means the .xml files produced by CASAVA. The files produced by CASAVA really should properly report the number of Raw and PF clusters generated regardless of what is output to the FASTQ files.



            • #21
              Has anyone figured out how to get the %PF per lane/barcode in the Demultiplex_Stats.htm file? Prior to version 1.8.2 this file listed the %PF for each sample; now it just lists 100% for all samples.

              From reading above it seems like you might need to use:

              --with-failed-reads

              Then filter out the reads that fail PF separately to get the stats to show.

              My argument would be that if the field is in the QC file, it might as well show the real result, as listing 100% across the board is not very useful.
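
One way to get a %PF figure along the lines suggested above is to run with --with-failed-reads and count the filter flag in the FASTQ headers yourself. The sketch below assumes the CASAVA 1.8 header layout, where the part after the space reads `<read>:<is filtered>:<control number>:<index>` and `Y` marks a read that failed the filter. It also assumes plain four-line FASTQ records. The function names are made up for the example and are not part of CASAVA.

```python
from collections import Counter
from typing import Iterable


def count_pf(lines: Iterable[str]) -> Counter:
    """Count PF and failed reads from the lines of a CASAVA 1.8 FASTQ file."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only the first line of each 4-line record is a header
        # Header looks like:
        # @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<Y|N>:<control>:<index>
        comment = line.rstrip().split(" ", 1)[1]
        is_filtered = comment.split(":")[1]
        counts["failed" if is_filtered == "Y" else "pf"] += 1
    return counts


def percent_pf(counts: Counter) -> float:
    """Percentage of reads that passed the filter (0.0 if there are no reads)."""
    total = counts["pf"] + counts["failed"]
    return 100.0 * counts["pf"] / total if total else 0.0
```

For a gzipped file, the lines can be supplied by opening it with `gzip.open(path, "rt")` and passing the file handle to `count_pf`. Running this per sample FASTQ would give a per-barcode %PF that could then be dropped into a custom report.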
