  • Undetermined.fastq file

    Hello,

    This may be a very elementary question but since what I have found thus far on the internet has not entirely clarified this for me, I figured I'd ask here.

    When a sequencing experiment is run on an Illumina platform, there are always *_Undetermined.fastq.gz files after demultiplexing. I am lost as to why exactly some reads end up in there, and what the purpose of this file is. I've read that one may sometimes use this file to look at index frequencies or for other troubleshooting, but again, I am not entirely clear on this. Is this file strictly for troubleshooting (i.e. the reads in it will never be used in any downstream analysis)?

    Thanks in advance for any help on this.

  • #2
    I think it's just the reads where it has not been possible to demultiplex on the barcode with sufficient accuracy. There's always some data here even if your sample sheet is set up properly.

    • #3
      That's correct. Undetermined is also where PhiX reads are supposed to end up if it was spiked in.

      • #4
        There are some special circumstances when I deliberately want the reads to go into the "undetermined" file (when using CASAVA or bcl2fastq to demultiplex). This preserves the tags in the read IDs. We have built a demultiplexer for QIIME that can use this undetermined file to produce sample files in the QIIME format. (Sending all reads to the "undetermined" file is achieved by including a dummy tag sequence like YYYY in the sample sheet.)

        • #5
          Yes, people use it for a variety of purposes, but its actual intended purpose is simply as a catch-all for any read that couldn't be assigned to a sample for any reason: poor quality, incorrect indexes specified in the sample sheet, missing index sequences (e.g. PhiX reads, which have no index), sequencing errors in the index read, etc.
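A quick way to see which of those reasons applies is to tally the index sequences recorded in the Undetermined file. The following is a minimal sketch, assuming the reads carry CASAVA 1.8+ / bcl2fastq style headers where the index is the text after the last colon, and that each FASTQ record spans exactly four lines. The function names and the example file name are hypothetical, not part of any Illumina tool.

```python
import gzip
from collections import Counter
from typing import Iterable


def index_from_header(header: str) -> str:
    """Return the index field of a CASAVA 1.8+ style read header.

    Assumed header shape: '@<instrument>:...:<x>:<y> 1:N:0:ACGTACGT',
    i.e. the index sequence is whatever follows the last colon.
    """
    return header.rstrip().rsplit(":", 1)[-1]


def count_indexes(lines: Iterable[str]) -> Counter:
    """Tally index sequences; the header is every 4th line of a FASTQ."""
    counts = Counter()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:
            counts[index_from_header(line)] += 1
    return counts


def count_indexes_in_file(path: str) -> Counter:
    """Same tally for a gzipped FASTQ file (path supplied by the caller)."""
    with gzip.open(path, "rt") as handle:
        return count_indexes(handle)


# Small in-memory example: three reads, two distinct index sequences.
example = [
    "@r1 1:N:0:ACGTACGT\n", "AAAA\n", "+\n", "IIII\n",
    "@r2 1:N:0:ACGTACGT\n", "CCCC\n", "+\n", "IIII\n",
    "@r3 1:N:0:NNNNNNNN\n", "GGGG\n", "+\n", "IIII\n",
]
example_counts = count_indexes(example)
for index, n in example_counts.most_common():
    print(index, n)
```

For a real run one would call something like `count_indexes_in_file("Undetermined_S0_L001_R1_001.fastq.gz")` (file name is only an example) and look at the most common entries. A real sample index showing up in large numbers suggests a sample sheet mistake, while a long tail of rare sequences is more consistent with index read errors.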

          • #6
            Undetermined sequence

            Hi,

            Can anyone please explain how the MiSeq (300bp paired-end sequencing) determines undetermined sequences? I am studying the microbiome of plant tissue collected from a fruit plant in the environment. The sequencing provider mentioned that I got about 20GB of undetermined reads, but when I did OTU analysis and BLAST, I was able to differentiate different species and different undetermined OTUs (it is OK for me to expect some undetermined OTUs from an environmental sample). I got a bit confused: are undetermined OTUs different from what the MiSeq puts in the undetermined folder?

            Thank you
            Vanga

            • #7
              The MiSeq does not determine anything. It is a sequencing platform; all it does is produce sequences - it is up to the user to determine what they are.

              Illumina sequencing platforms support multiplexing, in which multiple libraries are sequenced together. They have different indexes (or bar codes) which indicate the library they came from. During demultiplexing, the reads are split into different libraries based on the bar code (typically, 8bp sequences within the adapters of the molecule being sequenced). If the bar code sequence is low quality, the read will be sent to the "undetermined" bin, meaning that it is not clear which library it came from. It may be possible for the user to BLAST the undetermined bin and decide with high confidence which organism it came from, in situations where the multiplexed organisms are very different. But, I don't recommend that, as it will increase noise. Instead, if you are getting a large volume in your undetermined bin, you should complain to Illumina (or whoever provides your adapters) about wasted sequence due to the low quality of the index reads, or insufficient length and edit distance of indexes to distinguish between libraries.
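The assignment step described above can be sketched as follows. This is an illustration only, not the algorithm used by bcl2fastq or CASAVA: it assumes a simple Hamming-distance rule with a configurable mismatch limit and ignores base quality scores. The function names and the sample indexes are made up for the example.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_indexes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index matches within max_mismatches.

    Returns None (the "undetermined" bin) when no sample matches, or
    when more than one sample matches within the limit.
    """
    hits = [sample for sample, index in sample_indexes.items()
            if len(index) == len(index_read)
            and hamming(index_read, index) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet with two 8bp indexes.
sheet = {"sampleA": "ACGTACGT", "sampleB": "TTGCATGC"}

print(assign_sample("ACGTACGT", sheet))  # exact match
print(assign_sample("ACGTACGA", sheet))  # one mismatch, still assignable
print(assign_sample("NNNNNNNN", sheet))  # unreadable index, undetermined
```

The sketch also shows why index design matters: if two sample indexes are within a couple of mismatches of each other, a read with a single error can match both and has to be discarded.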
              Last edited by Brian Bushnell; 03-01-2017, 10:06 PM.

              • #8
                Thank you

                In this case, though about 2.0GB was sent into an undetermined bin, I still have obtained about 900 OTUs with good length (350 to 380bp), and sequence depth (about 50,000 reads per sample). BTW what is a good sequence depth? Is there any rough figure to judge the sequence depth, or is it highly variable based on the sample?

                • #9
                  "50,000 reads per sample" is not a depth. A depth would be something like "300x", which would be the result of, for example, sequencing 10 million 2x150bp pairs (3Gbp) for a 10Mbp organism.
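The arithmetic behind that figure, as a small sketch (the function name is just for illustration): average depth is the total number of sequenced bases divided by the genome size.

```python
def coverage_depth(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size (in bases)."""
    total_bases = read_pairs * 2 * read_length  # two reads per pair
    return total_bases / genome_size


# 10 million 2x150bp pairs = 3 Gbp, spread over a 10 Mbp genome.
depth = coverage_depth(read_pairs=10_000_000, read_length=150,
                       genome_size=10_000_000)
print(f"{depth:.0f}x")  # 300x
```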

                  It would be helpful if you could clarify your experiment and goal. Also, I suggest you repost the question in a new thread as it is unrelated to the current thread. By that, I mean, take some time to think about the optimal phrasing of the question, and then create a new thread explaining everything you know about the situation, and what you want to accomplish.

                  • #10
                    Originally posted by bvanga
                    In this case, though about 2.0GB was sent into an undetermined bin, I still have obtained about 900 OTUs with good length (350 to 380bp), and sequence depth (about 50,000 reads per sample). BTW what is a good sequence depth? Is there any rough figure to judge the sequence depth, or is it highly variable based on the sample?

                    Using reads from the "undetermined" pool (if they ended up there after allowing for 1 or more errors in the tag reads) is questionable. There are always some reads that can't be explained by the observed "tags" in multiplex sequencing. Even if you were able to obtain OTUs from them, you can't be sure which of your samples they belong to.
