  • Default Change in CASAVA / BCL->FASTQ

    We are planning a minor release of CASAVA in October that is primarily intended to increase the number of supported index sequences. In the same release, we plan to change the default behavior and omit reads that do not pass filter (non-PF reads) from the FASTQ files. In general, we do not recommend the use of non-PF reads. Users who want to retain the non-PF reads will be able to do so by passing the following parameter to configureBclToFastq.pl:

    --with-failed-reads

    A read is classified as non-PF when more than one cycle in the first 25 cycles has a poor ratio (<0.6) of the brightest intensity to the sum of the brightest and second brightest.
    Our variant calling software ignores non-PF reads, but there are many alternate methods that use all data, disregarding the non-PF flag. The inclusion of non-PF reads increases time to align, increases the data footprint, increases the measured error rate, and can lead to variant calling errors. As a result we have decided to exclude such reads as the default behavior. As a consequence of being excluded from the FASTQ files, the reads will also be excluded from all downstream processing and output including BAM files – archival and standard.
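
The pass-filter rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in the post, not Illumina's implementation: the function names, the per-cycle tuple layout (one intensity per base channel), and the assumption that intensities are non-negative are all hypothetical.

```python
from typing import Sequence

# Values taken from the post; the constant names are illustrative.
RATIO_THRESHOLD = 0.6   # a cycle is "poor" when its ratio is below this
CHECKED_CYCLES = 25     # only the first 25 cycles are examined
MAX_POOR_CYCLES = 1     # more than one poor cycle makes the read non-PF


def intensity_ratio(intensities: Sequence[float]) -> float:
    """Brightest intensity divided by the sum of the two brightest.

    Assumes at least two non-negative channel intensities for the cycle.
    """
    brightest, second = sorted(intensities, reverse=True)[:2]
    total = brightest + second
    # A cycle with no signal at all is treated as poor (assumption).
    return brightest / total if total > 0 else 0.0


def passes_filter(cycles: Sequence[Sequence[float]]) -> bool:
    """Return True if at most one of the first 25 cycles has a poor ratio."""
    poor = sum(
        1
        for cycle in cycles[:CHECKED_CYCLES]
        if intensity_ratio(cycle) < RATIO_THRESHOLD
    )
    return poor <= MAX_POOR_CYCLES
```

For example, a read whose first two cycles each have two equally bright channels (ratio 0.5) would be classified as non-PF, while a single such cycle would still pass.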

    Please let me know if you have questions or concerns.

    Thank you,
    Semyon

  • #2
    Thank you.


    • #3
      This saves us from having to add steps post standard CASAVA processing. Thanks.


      • #4
        Thanks for changing this - this will help us a lot.

        If there's any chance you could add a switch to ELAND to make it write out a single BAM file containing just the alignments (the equivalent of the old sorted files) when it is run as part of a pipeline, then we'd have back in 1.8 all of the functionality we had in previous versions, with the benefit of smaller, more standard output files.


        • #5
          What are the alternate methods that use all data, disregarding the non-PF flag?


          • #6
            Originally posted by msincan
            What are the alternate methods that use all data, disregarding the non-PF flag?
            A common aligner such as BWA will not recognize the filter flag in our FASTQ file. As a result, the BAM bitwise flag that reflects "not passing quality controls" will not be set. Any variant caller (samtools or GATK) will end up using all of the data.
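
For anyone who needs to drop these reads before running an aligner that ignores the filter flag, a minimal sketch follows. It assumes the CASAVA 1.8 FASTQ header layout, which is not spelled out in this thread: a space-separated second field of the form `<read>:<is filtered>:<control number>:<index>`, where `Y` in the second position marks a filtered (non-PF) read. The function names are illustrative, and the reader assumes strict four-line records with no blank lines.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator, qualities)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ text into four-line records (assumes no blank lines)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping one iterator with itself four times yields consecutive 4-tuples.
    yield from zip(stripped, stripped, stripped, stripped)


def is_filtered(header: str) -> bool:
    """True if the header marks the read as filtered (non-PF).

    Assumed layout: '@<id fields> <read>:<Y|N>:<control number>:<index>'.
    Headers without that second field are treated as passing filter.
    """
    parts = header.split(" ", 1)
    if len(parts) < 2:
        return False
    fields = parts[1].split(":")
    return len(fields) >= 2 and fields[1] == "Y"


def drop_failed_reads(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only the records whose header does not carry the 'Y' flag."""
    return (record for record in records if not is_filtered(record[0]))
```

A typical use would be to open the FASTQ file, pass the file object to `read_fastq`, and write the records returned by `drop_failed_reads` to a new file. For paired-end data, both files of the pair must be filtered consistently so that mates stay in sync.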

            Thanks,

            Semyon


            • #7
              Hi Semyon

              Could you clarify if the unaligned reads in SAM/BAM format from Illumina CASAVA 1.8.x have FLAG bit 0x200 (not passing quality controls) set according to your non-PF QC?

              Thanks,

              Peter


              • #8
                Thank you for this change!

                Is this patch released yet, or is there a more specific ETA?


                • #9
                  Originally posted by maubp
                  Hi Semyon

                  Could you clarify if the unaligned reads in SAM/BAM format from Illumina CASAVA 1.8.x have FLAG bit 0x200 (not passing quality controls) set according to your non-PF QC?

                  Thanks,

                  Peter
                  Hi Peter,

                  There is a distinction between unaligned reads and non-PF reads.
                  In CASAVA 1.8, all non-PF reads in the BAM output have the "not passing quality controls" flag bit set (0x200). Note that this setting is independent of alignment: unaligned reads are indicated with the conventional "segment unmapped" flag bit (0x004). Starting in 1.8.2, the default behavior will be to exclude non-PF reads entirely, as explained earlier in this thread.
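
The independence of the two flag bits can be illustrated with a short sketch. The bit values (0x200 and 0x4) are the ones named above; the function names are illustrative, and the filter assumes plain-text SAM lines in which the FLAG is the second tab-separated column.

```python
from typing import Dict, Iterable, Iterator

FLAG_UNMAPPED = 0x4    # "segment unmapped"
FLAG_QC_FAIL = 0x200   # "not passing quality controls"


def describe_flag(flag: int) -> Dict[str, bool]:
    """Report the two bits discussed above; each is tested independently."""
    return {
        "unmapped": bool(flag & FLAG_UNMAPPED),
        "qc_fail": bool(flag & FLAG_QC_FAIL),
    }


def drop_qc_fail(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and every alignment line without the 0x200 bit."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no FLAG field; pass them through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])
        if not flag & FLAG_QC_FAIL:
            yield line
```

A read with FLAG 0x204 is both unmapped and non-PF, while a read with FLAG 0x200 is non-PF but aligned. On the command line, `samtools view -F 0x200` excludes the same set of reads.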

                  Thanks,
                  Semyon


                  • #10
                    Thanks.

                    Apologies if I was unclear - I was trying to distinguish raw reads in SAM/BAM (all reads unaligned) from a finished assembly/mapping in SAM/BAM (where most of the reads are aligned).


                    • #11
                      Originally posted by maubp
                      Thanks.

                      Apologies if I was unclear - I was trying to distinguish raw reads in SAM/BAM (all reads unaligned) from a finished assembly/mapping in SAM/BAM (where most of the reads are aligned).

                      Sorry if I still misunderstand... CASAVA produces FASTQ files (not unaligned BAMs). The only BAM files produced are post alignment.


                      • #12
                        Oh. I was under the (wrong?) impression that Illumina was looking at producing unaligned SAM/BAM as an output alternative. This idea is attractive because it has explicit standards for things like QC flags, and other things like read pairings - rather than the current pain where the precise encoding of this meta information into the FASTQ free text seems to change far too often.

                        Perhaps I'd misheard the news that Illumina was doing post-alignment output as SAM/BAM.

                        Update: See this blog post and this thread for more about unaligned SAM/BAM as an alternative to FASTQ.
                        Last edited by maubp; 10-21-2011, 04:51 AM.


                        • #13
                          Originally posted by mcrusch
                          Thank you for this change!

                          Is this patch released yet, or is there a more specific ETA?

                          1.8.2 is available starting today.

                          Thanks,

                          Semyon


                          • #14
                            Originally posted by skruglyak
                            1.8.2 is available starting today.

                            Thanks,

                            Semyon
                            Semyon,

                            Thanks for letting us know.

                            I do have one nit to pick, however. Starting with CASAVA 1.8, the download tarball contains a massive validation data set of 1.5 GB (>90% of the uncompressed size). Would it be possible to separate the code from the sample data for the folks who don't want to spend a couple of hours downloading the software? It's particularly irksome since I've now downloaded exactly the same data set three times (with 1.8.0, 1.8.1 and now 1.8.2).

                            Thanks.


                            • #15
                              Originally posted by kmcarr
                              Would it be possible to separate the code from the sample data
                              Yes - if you could change this it would be great! Very few people will bother running the validation data, and those who want to will be happy to download it separately. We got awful transfer rates from illumina.com (presumably pulling data over a transatlantic link), and downloading the last CASAVA update took the best part of a day.
