
  • #16
    Oh - that's an intentional protection from overwriting files. Just delete the output file first or add the "overwrite" flag.



    • #17
      High contaminants

      Thanks.

      Input is being processed as unpaired

      Input: 385043 reads 10781204 bases.
      Contaminants: 341911 reads (88.80%) 9573508 bases (88.80%)
      Result: 43132 reads (11.20%) 1207696 bases (11.20%)

      What is the definition of contaminants? The rate looks very high.



      • #18
        I only need 30 nt of each sequence, but the MiSeq read 32 nt during sequencing, so many sequences have NN at the last 2 positions. Could this be related to the high contaminant rate?



        • #19
          Are you using bbduk.sh? That's the only one that prints anything about contaminants. Can you show your specific command line?

          Anyway, if you tried filtering out adapters and you got a result like that, it means you have almost no product and mostly adapter sequence.



          • #20
            Yes, bbduk.sh.

            Input is being processed as unpaired

            Input: 385043 reads 10781204 bases.
            Contaminants: 341911 reads (88.80%) 9573508 bases (88.80%)
            Result: 43132 reads (11.20%) 1207696 bases (11.20%)



            • #21
              Please give me the exact command line (what you typed before you hit enter).



              • #22
                k=16 shows a higher contaminant rate than k=26.

                zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ bbduk.sh -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_26.txt k=26 fbm
                java -ea -Xmx1g -cp /home/zheng/Desktop/bbmap/current/ jgi.BBDukF -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_26.txt k=26 fbm
                Executing jgi.BBDukF [-Xmx1g, in=probe48mix25fg_S7_L001_R2_001.fastq, ref=ngs13template.fasta, stats=probe48mix25fg_S7_L001_R2_001_26.txt, k=26, fbm]

                No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
                Initial:
                Memory: free=237m, used=14m

                Added 13 kmers; time: 0.023 seconds.
                Memory: free=228m, used=23m

                Input is being processed as unpaired

                Input: 159642 reads 4469976 bases.
                Contaminants: 130724 reads (81.89%) 3660272 bases (81.89%)
                Result: 28918 reads (18.11%) 809704 bases (18.11%)

                Time: 0.197 seconds.
                Reads Processed: 159k 811.47k reads/sec
                Bases Processed: 4469k 22.72m bases/sec
                zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ ^C
                zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ bduk.sh -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_16.txt k=16 fbm
                bduk.sh: command not found
                zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ bbduk.sh -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_16.txt k=16 fbm
                java -ea -Xmx1g -cp /home/zheng/Desktop/bbmap/current/ jgi.BBDukF -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_16.txt k=16 fbm
                Executing jgi.BBDukF [-Xmx1g, in=probe48mix25fg_S7_L001_R2_001.fastq, ref=ngs13template.fasta, stats=probe48mix25fg_S7_L001_R2_001_16.txt, k=16, fbm]

                No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
                Initial:
                Memory: free=237m, used=14m

                Added 143 kmers; time: 0.028 seconds.
                Memory: free=228m, used=23m

                Input is being processed as unpaired

                Input: 159642 reads 4469976 bases.
                Contaminants: 151727 reads (95.04%) 4248356 bases (95.04%)
                Result: 7915 reads (4.96%) 221620 bases (4.96%)



                • #23
                  So... that's telling you that you are getting matches between the stuff in your input file (probe48mix25fg_S7_L001_R2_001.fastq) and your reference file (ngs13template.fasta). And a shorter kmer will always find more matches in the presence of error.

                  probe48mix25fg_S7_L001_R2_001_26.txt will contain a list of which reference sequences were seen, and how many times they were seen.
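To illustrate why a shorter kmer finds more matches when reads contain errors, here is a minimal Python sketch. It is not BBDuk's implementation: it only does exact, forward-strand kmer lookups, and the reference and read sequences are made up for the example. The read carries a single miscall near its 3' end, similar to the trailing bad bases mentioned earlier in the thread.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_matches(read, ref_kmers, k):
    """True if the read shares at least one exact k-mer with the reference."""
    return any(km in ref_kmers for km in kmers(read, k))


# Hypothetical 26 bp reference (not taken from the thread).
reference = "ACGTTGCAAGCTTGACCGATAGGCTA"

# Same sequence with one substitution at index 24 (T -> G), near the 3' end.
read = reference[:24] + "G" + reference[25:]

for k in (26, 16):
    print(f"k={k}: match={read_matches(read, kmers(reference, k), k)}")
```

With k=26 the only kmer in the read spans the error, so nothing matches. With k=16 the kmers that start at the beginning of the read end before the error, so the read still matches.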



                  • #24
                    "And a shorter kmer will always find more matches in the presence of error."

                    Here k=16 shows fewer matching sequences than k=26:

                    for k=16
                    Input: 159642 reads 4469976 bases.
                    Contaminants: 151727 reads (95.04%) 4248356 bases (95.04%)
                    Result: 7915 reads (4.96%) 221620 bases (4.96%)

                    for k=26
                    Input: 159642 reads 4469976 bases.
                    Contaminants: 130724 reads (81.89%) 3660272 bases (81.89%)
                    Result: 28918 reads (18.11%) 809704 bases (18.11%)



                    • #25
                      In this case, the output is misleading... BBDuk assumes that the ref file is a file of contaminants because that's what I originally designed it for. So "Contaminants" actually means "Things that match the reference". I may change the wording eventually.

                      In other words, 95.04% of the reads matched the reference for K=16 and 81.89% did for K=26.
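As a small sketch of that reading, the Python below parses the three summary lines posted above for k=16 and reports them as "matched the reference" and "did not match". The regular expression and the `read_counts` helper are assumptions made for this example, based only on the output format shown in this thread.

```python
import re

# Summary lines copied from the k=16 run posted above.
summary = """\
Input: 159642 reads 4469976 bases.
Contaminants: 151727 reads (95.04%) 4248356 bases (95.04%)
Result: 7915 reads (4.96%) 221620 bases (4.96%)
"""


def read_counts(text):
    """Map each label (Input, Contaminants, Result) to its read count."""
    pairs = re.findall(r"^(\w+):\s+(\d+) reads", text, re.M)
    return {label: int(n) for label, n in pairs}


counts = read_counts(summary)

# "Contaminants" are the reads that matched the reference; "Result" did not.
matched_pct = 100 * counts["Contaminants"] / counts["Input"]
unmatched_pct = 100 * counts["Result"] / counts["Input"]

print(f"Matched the reference: {matched_pct:.2f}%")
print(f"Did not match:         {unmatched_pct:.2f}%")
```

The two categories add up to the input read count, so every read is either kept in "Result" or counted as a match.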



                      • #26
                        Great, thanks.

                        Zheng



                        • #27
                          Is there a size limit for the reference sequences? It does not work when I add a 20 bp reference sequence.



                          • #28
                            The size limit is the same as kmer length. So, if k=30, it will not work with anything less than a 30bp reference.
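The arithmetic behind that limit can be sketched in a few lines of Python: a sequence of length L contains L - k + 1 kmers, and none at all when L is shorter than k. The `kmer_count` helper is made up for this example. The last two lines rest on an assumption: the thread never states what is in ngs13template.fasta, but 13 distinct sequences of 26 bp each would be consistent with the "Added 13 kmers" (k=26) and "Added 143 kmers" (k=16) lines in the logs above.

```python
def kmer_count(seq_len, k):
    """Number of k-mers in a sequence of length seq_len (zero if shorter than k)."""
    return max(0, seq_len - k + 1)


# A 20 bp reference contributes nothing when k=30; a 30 bp one contributes one kmer.
print(kmer_count(20, 30))
print(kmer_count(30, 30))

# Assumed reference layout (not confirmed in the thread): 13 sequences of 26 bp.
print(13 * kmer_count(26, 26))
print(13 * kmer_count(26, 16))
```

Under that assumption, lowering k until it is no larger than the shortest reference sequence would let a 20 bp reference contribute kmers again.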



                            • #29
                              Thanks.

                              How do you separate unambiguousReads and ambiguousReads in bbmap.sh?



                              • #30
                                Ambiguously mapped reads get an "XT:A:R" tag in the sam output, while unambiguously mapped reads get "XT:A:U".

                                You can also forbid ambiguously-mapping reads using the flag "ambig=toss", which will consider them unmapped.
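One way to act on those tags is to split the sam output after mapping. The Python below is a minimal sketch, not part of BBMap: the `classify` and `split_sam` helpers and the three example records are made up for illustration, and the only thing taken from this thread is the meaning of the XT:A:R and XT:A:U tags.

```python
def classify(sam_line):
    """Return 'header', 'ambiguous', 'unambiguous', or 'untagged' for one SAM line."""
    if sam_line.startswith("@"):
        return "header"
    # The first 11 tab-separated columns are mandatory; optional tags follow.
    optional = sam_line.rstrip("\n").split("\t")[11:]
    if "XT:A:R" in optional:
        return "ambiguous"
    if "XT:A:U" in optional:
        return "unambiguous"
    return "untagged"


def split_sam(lines):
    """Group SAM lines by class, keeping the original order within each group."""
    groups = {"header": [], "ambiguous": [], "unambiguous": [], "untagged": []}
    for line in lines:
        groups[classify(line)].append(line)
    return groups


# Made-up records for illustration only.
records = [
    "@HD\tVN:1.4\tSO:unsorted",
    "r1\t0\tref1\t1\t40\t30M\t*\t0\t0\t" + "A" * 30 + "\t" + "I" * 30 + "\tXT:A:U",
    "r2\t0\tref1\t5\t3\t30M\t*\t0\t0\t" + "C" * 30 + "\t" + "I" * 30 + "\tXT:A:R",
]

groups = split_sam(records)
print(len(groups["unambiguous"]), "unambiguous,", len(groups["ambiguous"]), "ambiguous")
```

To write two valid sam files, the header lines would be copied into both outputs ahead of the reads from each group.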
