Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hi Brian,

    I'm using BBtools to call indels from small RNA virus genomes using both bbmap and callvariants. It's doing a great job of calling these. We are interested in the possibility of also calling inversion events from this data. I know this is somewhat common in transcriptomics data. Do these programs have this functionality?

    To be more specific, is it possible to identify reads that are aligning to both the sense and antisense of the given reference?

    Thanks in advance!

    James

    Comment


    • bbmap local + idfilter

      Brian et al., I am using the 'local' option with bbmap because I sometimes have a couple bases of adapter remaining (after bbduk mink=6) and my understanding is that 'local' will trim the ends to achieve a better score.

      My question: does the idfilter option work on the local or global alignment? I have reads with 100% ID after trimming ends but they are not passing the idfilter. I'm curious whether there is another way to filter based on the local alignment percent id?

      thanks!
      MCMC

      Comment


      • mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment


        • Originally posted by sdriscoll View Post
          mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.
          yes I know the scores may be different... but I'm simply wondering whether the idfilter and subfilter post-filtering options work on the soft-clipped alignments or the non-clipped alignments. I want to be able to filter with, say, 2 MM or 98% ID *after* the soft clipping.

          Thanks,
          MCMC

          Comment


          • Bug in call variants with relation to sam files created from minimap2

            Hello, I've been using different parts of the BBMap suite and mixing them with other softwares for WGS of bacterial samples (primarily for the purpose of automated QC). I'm a fan of callvariants.sh as it works quickly and doesn't require me to sort .sam files before hand and it's simple to clearfilters to return all values. Unfortunately when using callvariants.sh with .sam files created from minimap2 (https://github.com/lh3/minimap2) using illumina nextseq reads. I receive the error seen below. I was curious if this was a minor bug that could be corrected?

            I'm running callvariants.sh from bbmap v37.90 obtained from conda (https://anaconda.org/bioconda/bbmap). Appreciate any assistance given.

            -Kim

            Error:
            java -ea -Xmx48779m -Xms48779m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ var2.CallVariants in=aln.sam ref=contigs.fasta vcf=minimap.vcf ploidy=1 clearfilters
            Executing var2.CallVariants [in=aln.sam, ref=contigs.fasta, vcf=minimap.vcf, ploidy=1, clearfilters]

            Loading reference.
            Time: 0.304 seconds.
            Processing input files.
            Exception in thread "Thread-3" Exception in thread "Thread-4" Exception in thread "Thread-5" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
            java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
            java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)

            Comment


            • You can try reformatting your alignment file using "reformat.sh in=you.bam out=new.bam sam=1.4" to see if that helps.

              Comment


              • Appreciate the reply but unfortunately it leads to the same error:

                java -ea -Xmx200m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ jgi.ReformatReads in=minimap.sam out=new.bam sam=1.4
                Executing jgi.ReformatReads [in=minimap.sam, out=new.bam, sam=1.4]

                Input is being processed as unpaired
                java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
                java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
                at stream.SamLine.toShortMatch(SamLine.java:1200)
                at stream.SamLine.toRead(SamLine.java:1972)
                at stream.SamLine.toRead(SamLine.java:1832)
                at stream.SamReadInputStream.toReadList(SamReadInputStream.java:118)
                at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
                at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
                at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
                at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)

                Comment


                • It was worth a try.

                  Based on the `TODO` tag I see perhaps this is something @Brian intends to work on. Unfortunately he has not been participating in forums of late so there is no guarantee as to when this question may be addressed. You could try creating a ticket at BBMap SF repository for this question.

                  Comment


                  • Thanks for the info didn't realise that was the site for tickets so appreciate it, I'll create a ticket there.

                    Comment


                    • You can try the reformat.sh command per @GenoMax but with 'sam=1.3' flag. That worked for me with a different variant caller (FreeBayes).

                      Comment


                      • Thanks for the input but same issue using sam=1.3, I think Geno's comment on something he intended to do is on the mark.

                        Comment


                        • Hi Brian - What are the right flags to set if I want all of the top scoring mappings for paired end reads?

                          Thanks

                          Comment


                          • Have you tried ambig=all?

                            Comment


                            • Originally posted by GenoMax View Post
                              Have you tried ambig=all?
                              Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.

                              Comment


                              • Originally posted by darthsequencer View Post
                                Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.
                                You might try the align2.BBMapPacBioSkimmer mapper with ambig=all and expectedsites= some high number. Not exactly what you were asking but I get dozens of hits with that.
                                Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM
                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 06:37 PM
                                0 responses
                                8 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 06:07 PM
                                0 responses
                                8 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-22-2024, 10:03 AM
                                0 responses
                                49 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-21-2024, 07:32 AM
                                0 responses
                                66 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X