Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    Hi Shini,

    I will certainly do this if I get a chance. Right now I'm working on 3 new pieces of software, which have priority... but I have more time for it than I did in January, so hopefully I'll be able to get to it sometime soon.

    -Brian

    Comment

    • Shini Sunagawa
      Junior Member
      • Jan 2016
      • 8

      Hi Brian,

      This sounds really great!

      Just FYI, I ran another large clustering job in less then 5 h, which required >1 month (!) using alternative tools, with similar output. As I said, it is only that dedupe does not provide the containment information... .

      Best wishes,
      Shini

      Comment

      • yaanlpc
        Junior Member
        • Feb 2012
        • 3

        hey, Brian, I am not able to enable the multiple hits feature with bbmap. I actually want to study the indels. I constructed a ref with 3 templates, one original and two with ins and del for 2 and 3 bp, respectively. however, the output indicates that read can only have one hit (the original), even I set maxindel=10 maxsites=10.

        my command:

        bbmap.sh ref=../input/demo.genome.fa in=../input/query.fastq outm=test.sam threads=1 maxindel=10 maxsites=10
        what's the way out?

        ref seq:
        >chr1_ins
        atcggatCCAAAATCTAGCCGGTCCTCTCACAATAGCAAATTGGGCTTGCAAAATggCCCAAATGCTCTCTCCACATCTTTTCTAGCCGCCGCATGAGCATTGTGGAAGTttgac(insertion gg from >chr1)
        >chr1
        atcggatCCAAAATCTAGCCGGTCCTCTCACAATAGCAAATTGGGCTTGCAAAATCCCAAATGCTCTCTCCACATCTTTTCTAGCCGCCGCATGAGCATTGTGGAAGTttgac
        >chr2_del
        atcggatCCAAAATCTAGCCGGTCCTCTCACAATAGCAAATTGGGCTTGCAAAATAAATGCTCTCTCCACATCTTTTCTAGCCGCCGCATGAGCATTGTGGAAGTttgac(deletion ccc from >chr1)


        input:
        @HWI-ST822:1690U49ACXX:8:1101:2952:2236 1:N:0:CAGATC
        CCAAAATCTAGCCGGTCCTCTCACAATAGCAAATTGGGCTTGCAAAATCCCAAATGCTCTCTCCACATCTTTTCTAGCCGCCGCATGAGCATTGTGGAAGT
        +
        @C@FFDFFHHFFDIJDDEHHGDEHIIIBC>FEIADGAHHGHADHEDGIIHGDEGGEGGGGGIGIIEHEECD@CD>?CCED@@;B;9<>:@@CC:@C84:>>

        outputonly one hit)
        HWI-ST822:1690U49ACXX:8:1101:2952:2236 1:N:0:CAGATC 0 chr1 8 38 101= * 0 0 CCAAAATCTAGCCGGTCCTCTCACAATAGCAAATTGGGCTTGCAAAATCCCAAATGCTCTCTCCACATCTTTTCTAGC
        CGCCGCATGAGCATTGTGGAAGT @C@FFDFFHHFFDIJDDEHHGDEHIIIBC>FEIADGAHHGHADHEDGIIHGDEGGEGGGGGIGIIEHEECD@CD>?CCED@@;B;9<>:@@CC:@C84:>> NM:i:0 AM:i:38
        Last edited by yaanlpc; 10-04-2016, 03:20 PM.

        Comment

        • sebl
          Member
          • Mar 2014
          • 26

          coverage stats

          Hi,

          I would like to calculate the average coverage of some genomes. I used:

          Code:
          bbmap.sh in1=reads_R1.fq in2=reads_R2.fq ref=scaffolds.fasta nodisk covstats=stats.txt covhist=histogram.txt
          I get nice output files with coverage etc. for each scaffold, but the average for the whole genome is shown only in the terminal. How could I output it to a file, and preferably in a way that I can get tens of genomes coverages summarized in the same file?

          Thanks!

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            Originally posted by sebl View Post
            Hi,

            I would like to calculate the average coverage of some genomes. I used:

            Code:
            bbmap.sh in1=reads_R1.fq in2=reads_R2.fq ref=scaffolds.fasta nodisk covstats=stats.txt covhist=histogram.txt
            I get nice output files with coverage etc. for each scaffold, but the average for the whole genome is shown only in the terminal. How could I output it to a file, and preferably in a way that I can get tens of genomes coverages summarized in the same file?

            Thanks!
            Capture the standard out/error streams to a file (would depend on your shell but examples for bash are here) and you will have that information in a file.

            Comment

            • Carol_B
              Junior Member
              • Oct 2016
              • 1

              Originally posted by Brian Bushnell View Post
              You can filter proper pairs only with the "po" flag; unpaired reads will then be classified as unmapped. You can use the 'outm' stream to just get mapped reads. You can also filter with samtools after the fact.

              From these, you could filter only the pairs where at least one read is uniquely aligned (mapq>3 or some higher threshold for greater confidence), but this can't be done in BBMap currently; you could, however, do this with a custom script or with samtools.
              Hi Brian,

              I'm also struggling with the selection of unique alignments. I had previously used the option ambiguous=toss, but for some pair-end reads I only get the forward to be considered aligned. I saw your suggestions of removing the "ambiguos=toss" flag and later filter the alignment based on their mapq value. Nevertheless, I am curious on how to select a correct threshold for my filtering. I am using local=t alignment, is the maximum mapq score different when you use local as compared to global alignment? why do you suggest 4 as a good threshold?

              Thanks in advance !!!

              Carol

              Comment

              • darthsequencer
                Member
                • Feb 2012
                • 35

                Hi Brian,
                Is there a way to set a desired coverage for randomreads.sh rather than the total number of reads?

                On that note - I'm trying to simulate a metagenome with bbmap tools. Knowing BBMap there's probably a way to do that (especially since JGI has probably needed it too). So can BBmap simulate a metagenome (in terms of abundance profile) from a bunch of reference genomes?

                Comment

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  Originally posted by darthsequencer View Post
                  Hi Brian,
                  Is there a way to set a desired coverage for randomreads.sh rather than the total number of reads?

                  On that note - I'm trying to simulate a metagenome with bbmap tools. Knowing BBMap there's probably a way to do that (especially since JGI has probably needed it too). So can BBmap simulate a metagenome (in terms of abundance profile) from a bunch of reference genomes?
                  Unfortunately, no to both. And you are correct, simulating metagenomes is a need we have at the JGI so I probably should have developed one by now!

                  I'll look into making a wrapper for the purpose; it shouldn't take very long.

                  Comment

                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    Originally posted by Carol_B View Post
                    Hi Brian,

                    I'm also struggling with the selection of unique alignments. I had previously used the option ambiguous=toss, but for some pair-end reads I only get the forward to be considered aligned. I saw your suggestions of removing the "ambiguos=toss" flag and later filter the alignment based on their mapq value. Nevertheless, I am curious on how to select a correct threshold for my filtering. I am using local=t alignment, is the maximum mapq score different when you use local as compared to global alignment? why do you suggest 4 as a good threshold?

                    Thanks in advance !!!

                    Carol
                    Hi Carol,

                    The definition of the mapq is the Phred-scaled (decibel) probability that the mapping is incorrect; q=10 means a 10% chance of being incorrect, q=20 means 1%, q=30 means 0.1%, etc. q=3 indicates 50% - so q=3 or below indicate that the read is at least 50% likely to have come from somewhere else, while q=4 or above means the mapping is probably correct. Therefore, reads that map equally well to multiple locations should be assigned a mapping score of 3 or less, and a mapping score of 4 or more indicates the read cannot map anywhere else as well as the indicated position.

                    The local alignments will give different scores than global alignments because mismatches toward the ends of the reads are ignored. In practice, though, there will not be much difference in most cases; mainly just near the ends of contigs, or when dealing with reads that are chimeric/low-quality/not-adapter-trimmed, etc.

                    Comment

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      Originally posted by Brian Bushnell View Post
                      Unfortunately, no to both. And you are correct, simulating metagenomes is a need we have at the JGI so I probably should have developed one by now!

                      I'll look into making a wrapper for the purpose; it shouldn't take very long.
                      Starting with a bunch of bacterial genomes in a file and using randomreads.sh should produce something acceptable, correct? I have used it in the past with a defined set of genomes to simulate a mixed dataset.

                      Comment

                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        You can simulate a mixed dataset just fine that way, but unfortunately metagenomes tend to have an exponential distribution of coverage. So to really simulate a metagenome, you need to mimic that, which would mean (for now) running randomreads on each genome individually with an exponential-random coverage target, then combining all the outputs together.

                        I did write a program "mutate.sh" that I use in simulating metagenomes, which takes a genome fasta and produces a mutant version of it at a specified error rate; I use it to do things like verify that error-correction is not erasing strain differences in metagenomes.

                        Anyway, I think it will be pretty easy to modify randomreads for a metagenome mode, so I'll try to do that today. And a coverage (rather than #reads) threshold is pretty easy as well.

                        Comment

                        • darthsequencer
                          Member
                          • Feb 2012
                          • 35

                          That would be awesome Brian! Most of the tools out there are bit wonky, slow, or have fallen to the wayside.

                          More of a nuts and bolts question but how do you handle fragmented reference genomes in a simulated metagenome? In the sense that one reference but is in multiple fasta files but will get assigned different coverages because most programs think each file is a different genome.

                          Comment

                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            I just uploaded BBMap v36.59 which adds "coverage" and "metagenome" flags.

                            coverage=X will automatically set "reads" to a level that will give X average coverage (decimal point is allowed).

                            metagenome will assign each scaffold a random exponential variable, which decides the probability that a read be generated from that scaffold. So, if you concatenate together 20 bacterial genomes, you can run randomreads and get a metagenomic-like distribution. It could also be used for RNA-seq when using a transcriptome reference.

                            The coverage is decided on a per-reference-sequence level, so if a bacterial assembly has more than one contig, you may want to glue them together first with fuse.sh before concatenating them with the other references.

                            Comment

                            • darthsequencer
                              Member
                              • Feb 2012
                              • 35

                              Great - Thanks a bunch!

                              Comment

                              • ShellfishGene
                                Member
                                • Mar 2009
                                • 14

                                qtrim and maq in BBMap

                                Hi Brian,

                                I'm using BBMap to get reads that do not align to a genome. I want the output quality trimmed, and use
                                Code:
                                bbmap.sh in=raw_reads.fastq.gz outu=stdout.sam qin=auto qout=33 qtrim=lr maq=30 mintrimlength=30
                                The BBDuk docs state that the minimum average quality is checked after clipping. I assume this is the same in BBMap. However with this command I have output such as this:
                                Code:
                                1600:7:1:1413:2208      653     *       0       0       *       *       0       0       AAATGAAGAGAAAGATTACAGGGAAGTAAC  ##############################
                                The read should clearly be removed, I would think? Or does that work only for the mapped reads?
                                Also, is there an option to drop all reads that are clipped to below a certain length?

                                Btw, the option "minaveragequality" is not recognized, only the short version "maq". Not sure if that's intentional.

                                Cheers,

                                Till

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Pathogen Surveillance with Advanced Genomic Tools
                                  by seqadmin




                                  The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
                                  03-24-2025, 11:48 AM
                                • seqadmin
                                  New Genomics Tools and Methods Shared at AGBT 2025
                                  by seqadmin


                                  This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                  The Headliner
                                  The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                  03-03-2025, 01:39 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Today, 12:59 PM
                                0 responses
                                6 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 10:17 AM
                                0 responses
                                8 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-20-2025, 05:03 AM
                                0 responses
                                49 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-19-2025, 07:27 AM
                                0 responses
                                60 views
                                0 reactions
                                Last Post seqadmin  
                                Working...