
  • #16
    I would caution against blindly filtering your metagenomic data against any database of contaminants. Ideally, you would have negative controls run alongside your samples that could be checked for the presence of these contaminants instead....



    • #17
      I won't say either one of you is right or wrong, and negative controls are always a good idea. But the reason I put together the bacterial contaminant file is that JGI cannot distinguish real sample content from the contaminants in that file. With sufficient amplification (as with single cells), some wells may have high levels of a contaminant that is absent from other wells, since it only takes one particle. Some of them, like Pseudomonas, are present in reagents. Others, like E. coli, are present on human skin and often make their way into the libraries.

      I have seen dozens of posters that incorrectly claim Pseudomonas or various other common contaminants are endemic to some environment, when it is more likely an artifact of poor quality control. So I encourage you to be very cautious.

      *Edit. For reference, JGI no longer sequences anything on that list.
      Last edited by Brian Bushnell; 07-17-2017, 08:58 AM.



      • #18
        RE: filtering soil metagenomics data

        Thank you, fanli and Brian! I guess one possibility is not to filter initially, but then check the final assembly against the contaminant files just to find out if some of the species that I am detecting are known contaminants?



        • #19
          That's certainly a good possibility. If you have zero or trivial amounts of common contaminants, don't bother filtering. If you have a lot... then, try to figure out whether you have an exact strain match, which greatly boosts the likelihood of it originating in a reagent.



          • #20
            Thanks a lot, Brian! I'll give it a try.



            • #21
              Hi Brian,

              This might be a silly question, but do we need to index the reference every time for every sample?

              Thanks!
              Last edited by yang zhang; 08-27-2017, 11:27 AM.



              • #22
                Originally posted by yang zhang
                Hi Brian,

                This might be a silly question, but do we need to index the reference every time for every sample?

                Thanks!
                No. You can create an index of the reference up front by running
                Code:
                 bbmap.sh ref=ref.fa
                That will create a "ref" directory containing all the index files. Don't worry about the contents of the folder; they are arranged the way BBMap requires.

                In future, when you want to use this index, replace "ref=" with "path=/path_to_directory_containing_ref_folder" in your command line.
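The index-once, reuse-many pattern described above can be sketched roughly as follows. This is a minimal sketch, not an official recipe: the function names, the BBMAP override variable, and the file names in the usage note are hypothetical placeholders, while ref=, path=, in= and out= are standard BBMap options.

```shell
#!/usr/bin/env bash
# Sketch: build a BBMap index once, then reuse it for every sample.
# BBMAP is a hypothetical override so the wrapper can point at any install.
BBMAP="${BBMAP:-bbmap.sh}"

# Build the index under index_dir/ref unless it already exists.
build_index_once() {
    local ref_fasta="$1" index_dir="$2"
    if [ -d "$index_dir/ref" ]; then
        echo "index already present in $index_dir/ref" >&2
        return 0
    fi
    "$BBMAP" ref="$ref_fasta" path="$index_dir"
}

# Map one sample against the prebuilt index.
# path= must be the directory that CONTAINS the "ref" folder.
map_sample() {
    local index_dir="$1" reads="$2" out="$3"
    "$BBMAP" path="$index_dir" in="$reads" out="$out"
}
```

For example, `build_index_once ref.fa /data/bbmap_index` would be run once, followed by `map_sample /data/bbmap_index sampleA.fq.gz sampleA.sam` for each sample (all paths here are made up).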



                • #23
                  I see. Thank you for the clarification, GenoMax!



                  • #24
                    Hi Brian,
                    I downloaded the files above and have successfully indexed the cat and dog file. I tried the command:

                    bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits qtrim=rl trimq=10 untrim Xmx23g in=cleanAR1.fastq.gz outu=clean2AR1.fastq.gz outm=catAR1.fq

                    and got the following error:
                    Exception in thread "main" java.lang.NumberFormatException: null
                    at java.lang.Integer.parseInt(Integer.java:542)
                    at java.lang.Integer.parseInt(Integer.java:615)
                    at align2.AbstractMapper.parse(AbstractMapper.java:449)
                    at align2.AbstractMapper.<init>(AbstractMapper.java:54)
                    at align2.BBMap.<init>(BBMap.java:42)
                    at align2.BBMap.main(BBMap.java:30)

                    The same happened when I tried the dog masked file. I was able to successfully remove the human contamination using the masked file you provided and the commands above. Is this command not applicable to the cat, dog, and mouse files you provided? Is there an extra step I am missing? I am not versed in Java, so I don't know how to interpret the error.

                    coyk
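One plausible reading of this stack trace, offered as an assumption rather than a confirmed diagnosis: Integer.parseInt receiving null inside AbstractMapper.parse is what you would expect when a numeric option is passed without a value. In the command above, minhits appears bare, whereas the removehuman.sh log later in this thread shows minhits=2. The small check below is a hypothetical helper (not part of BBTools), and its key list is illustrative only; it just flags arguments that name a numeric key but carry no =value.

```shell
#!/usr/bin/env bash
# Sketch: report BBMap-style arguments that look like numeric keys given
# without a value. The key list is an illustrative assumption, not BBMap's
# full set of integer-valued options.
find_bare_numeric_keys() {
    local arg key
    for arg in "$@"; do
        for key in minhits maxindel bw trimq; do
            if [ "$arg" = "$key" ]; then
                echo "$arg"
            fi
        done
    done
}
```

Run against the arguments of the failing command, this helper prints only `minhits`, which would suggest retrying with an explicit value such as minhits=2.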



                    • #25
                      Hi Brian,

                      Just reviving an old thread here. I have been testing out a lot of different methods to clean human reads and I really love BBMap because it's such a well thought-out program. However, when I try to clean human reads with the settings you have specified, I routinely get a ton of reads remaining - upwards of 70% (so only 30% are cleaned). I have tried to adjust the various parameters, but the only thing that seems to make a difference for depletion is the 'minid' setting. Setting that at 0.50 (which is *very* low) depletes around 95% of reads. As a comparison, a default run with bwa mem depletes 100%.

                      Any idea how I might get BBMap to more accurately deplete human reads?



                      • #26
                        Have you tried using "bbsplit.sh" with the human genome to see if that works better? If you are only interested in the non-human data, then I would use the non-masked genome and risk losing a few additional reads.



                        • #27
                          I have not - however, bbsplit doesn't really seem to be the right tool for removing human reads?



                          • #28
                            "bbsplit.sh" is a general purpose tool that will bin reads into any number of bins (depending on the reference sequences provided, you can provide as many as you want). In this case you would provide human_genome.fa (and any other reference you want to use). If you only use human then reads not mapping to human genome will be collected in other bin.

                            Comment
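As a rough illustration of the binning described in #28, the sketch below wraps a single-reference bbsplit.sh call. The function name, the BBSPLIT override variable, and the output naming are hypothetical; in=, ref=, basename= (with % replaced by the reference name) and outu= are options I understand bbsplit.sh to accept, but check them against the help text of your installed version.

```shell
#!/usr/bin/env bash
# Sketch: bin reads against the human genome with BBSplit and keep the
# unmatched reads as the "clean" set.
# BBSPLIT is a hypothetical override for the bbsplit.sh location.
BBSPLIT="${BBSPLIT:-bbsplit.sh}"

split_human() {
    local reads="$1" human_ref="$2" prefix="$3"
    # basename= writes one file per reference (% becomes the reference name);
    # outu= collects reads that matched none of the references.
    "$BBSPLIT" in="$reads" ref="$human_ref" \
        basename="${prefix}_%.fq.gz" \
        outu="${prefix}_unmatched.fq.gz"
}
```

With only human_genome.fa supplied as a reference, the file given to outu= is the one that would hold the non-human reads.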


                            • #29
                              Hello Brian,

                              I am trying to use removehuman.sh on MSU HPCC.

                              The inputs (*_filtered.fastq.gz files) are R1 and R2 files that were PhiX-filtered with BBDuk as follows:

                              [leejooy5@dev-intel18 filtered_reads]$ bbduk.sh -Xmx10g in1=NFW_R1_trimmed.fastq.gz in2=NFW_R2_trimmed.fastq.gz out1=NFW_R1_filtered.fastq.gz out2=NFW_R2_filtered.fastq.gz ref=/opt/software/BBMap/37.93-foss-2018a/resources/phix174_ill.ref.fa.gz k=31 hdist=1 stats=GR25_stats.txt threadS=8

                              As shown below, the error message "Exception in thread "main" java.lang.RuntimeException: Can't find file /global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt" popped up when I tried to run "removehuman.sh". I tried without additional parameters such as -Xmx and threads, but the same error occurred. I also tried to find the file "/global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt", but I couldn't. Could you tell me what mistake I made, or let me know where I can find a solution? Thank you for your time and consideration.

                              Cheers,
                              Joo-Young

                              =======================
                              [leejooy5@dev-intel18 filtered_reads]$ removehuman.sh -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8

                              removehuman.sh -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8
                              java -Djava.library.path=/opt/software/BBMap/37.93-foss-2018a/jni/ -ea -Xmx10g -cp /opt/software/BBMap/37.93-foss-2018a/current/ align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=/global/projectb/sandbox/gaag/bbtools/hg19 pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount usejni ztd=2 kfilter=25 maxsites=1 k=14 -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8
                              Executing align2.BBMap [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/global/projectb/sandbox/gaag/bbtools/hg19, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, usejni, ztd=2, kfilter=25, maxsites=1, k=14, -Xmx10g, in1=NFW_R1_filtered.fastq.gz, in2=NFW_R2_filtered.fastq.gz, out1=NFW_R1_clean.fastq.gz, out2=NFW_R2_clean.fastq.gz, threads=8]
                              Version 37.93 [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/global/projectb/sandbox/gaag/bbtools/hg19, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, usejni, ztd=2, kfilter=25, maxsites=1, k=14, -Xmx10g, in1=NFW_R1_filtered.fastq.gz, in2=NFW_R2_filtered.fastq.gz, out1=NFW_R1_clean.fastq.gz, out2=NFW_R2_clean.fastq.gz, threads=8]

                              Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.650
                              Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.900
                              Set threads to 8
                              Retaining first best site only for ambiguous mappings.
                              Exception in thread "main" java.lang.RuntimeException: Can't find file /global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt
                              at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:906)
                              at fileIO.ReadWrite.getInputStream(ReadWrite.java:871)
                              at fileIO.TextFile.open(TextFile.java:227)
                              at fileIO.TextFile.<init>(TextFile.java:71)
                              at dna.Data.setGenome2(Data.java:822)
                              at dna.Data.setGenome(Data.java:768)
                              at align2.BBMap.loadIndex(BBMap.java:313)
                              at align2.BBMap.main(BBMap.java:32)



                              • #30
                                @jylee: "/global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt" appears to refer to a location on JGI servers (unless that path is your own). You will need to download the hg19 reference sequence and provide it yourself. You can either pre-index the genome with BBMap and then use path=, or use the ref= option to point to the location of the genome multi-FASTA file.
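A minimal sketch of the path= route suggested above, assuming you have already built your own hg19 index. The mapping parameters are copied from the removehuman.sh log in #29; the function name, the BBMAP override variable, and the output naming are hypothetical, and outu1=/outu2= (unmapped pairs) is my choice for capturing the non-human reads rather than something stated in the thread.

```shell
#!/usr/bin/env bash
# Sketch: run the human-removal mapping against a locally built hg19 index
# instead of the JGI-internal default path.
# BBMAP is a hypothetical override for the bbmap.sh location.
BBMAP="${BBMAP:-bbmap.sh}"

remove_human_pairs() {
    local index_dir="$1" r1="$2" r2="$3" prefix="$4"
    # The error in #29 was a missing <path>/ref/genome/1/summary.txt,
    # so check for that file before starting the run.
    if [ ! -f "$index_dir/ref/genome/1/summary.txt" ]; then
        echo "no BBMap index under $index_dir; build one first with ref=" >&2
        return 1
    fi
    "$BBMAP" minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 \
        qtrim=r trimq=10 untrim \
        path="$index_dir" \
        in1="$r1" in2="$r2" \
        outu1="${prefix}_R1_clean.fastq.gz" outu2="${prefix}_R2_clean.fastq.gz"
}
```

The index itself would be built once beforehand, for example with `bbmap.sh ref=hg19.fa path=/my/index_dir`, where both the FASTA name and the directory are placeholders.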
