Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Can you post an example of a read that is not getting trimmed with the command you posted?

    Also, "ktrim=rl" with BBDuk won't work; it only allows "ktrim=r" or "ktrim=l". I've changed it now so in the next release it will print an error message if you use other values.

    Comment


    • Brian,

      Thanks for the response. My issue was exactly that. It seems to be working fine now by only using one at a time. The other issue was a hamming distance thing that I also figured out.

      Will you be incorporating the soft-clipped base trimming in this next update?

      Thanks again.

      Comment


      • Originally posted by Mchicken View Post
        Thanks a lot for your helpful replies. It is now working perfectly . One more question to the "xstag". I see that if the gene is on the + strand, R2 maps on+ and R1 maps on -. If I use "xstag=ss" I get for spliced R1 and R2 reads XS=+. So I think this is the correct library type. Am I right?
        I think so. I've always found Tophat's description of those terms to be very confusing, so I have trouble remember for more than a few days which is the correct one to use and when.

        Originally posted by jweger1988
        Will you be incorporating the soft-clipped base trimming in this next update?
        Yes, I have added that as of v37.23 which I just released. The flag is "trimclip", for example:

        bbduk.sh in=mapped.sam out=trimmed.sam trimclip ordered

        You can also output them as fastq if you want.

        Comment


        • Hi Brian,

          I have a couple of questions.

          I've been comparing different alignment programs and then some different variant calling software. For some reason when I use a bam file produced with bbmap and try to call variants with lofreq I get an error message, but not with bam files produced with other programs (bowtie, mosaic, etc...). Even after sorting the bam I get the same message. Any thoughts on why this might be?

          Also, I'm wondering if it's possible to use dedupe with the input as either paired fastq files (i.e. in1 and in2) or a bam file?

          Thanks,
          James

          Comment


          • Not sure if this is applicable but BBMap produces SAM v.1.4 CIGAR strings by default. If your variant calling program expects v.1.3 format then you can either add sam=1.3 option when you align (or reformat.sh in=your.bam out=new.bam sam=1.3).

            Dedupe can be used with PE files but only if reads are interleaved. You may want to use clumpify instead of dedupe.

            Comment


            • Genomax,

              Awesome, changing CIGAR fixed the issue. Thanks.

              About Clumpify vs Dedupe, that's a good idea. I remember Brian saying somewhere that clumpify is not quite as good as Dedupe? What would be your recommended settings?

              Thanks

              Comment


              • If you wanted to use dedupe with PE files you could do it this way
                Code:
                reformat.sh in1=R1.fq in2=R2.fq out=stdout.fq | dedupe.sh in=stdin.fq out=stdout.fq | reformat.sh in=stdin.fq out1=new1.fq out2=new2.fq
                Clumpify is more flexible compared to dedupe. You can select what happens to the duplicates. You may be interested in this option:
                Code:
                dedupe=t optical=f
                All duplicates are detected, whether optical or not.  All copies except one are removed for each duplicate.

                Comment


                • minid and idfilter

                  Hi Brian,

                  Thank you for making the deterministic flag. Results in that regard look great.

                  I am confused by how minid and idfilter are working, however. I will email you a small dataset that recreates the 'unexpected' behavior I am encountering if that is helpful.

                  My expectation is that this command should only allow alignments where BOTH the forward and reverse read, independently align with at least 75% pairwise identity to the reference.... ie overlap in the reads is NOT taken into account and a contig pairwise identity to the reference is NOT what the setting is referring to. Note: I usually output 4 fastqs for mapped/unmapped pairs but have changed to sam output and specified the idtag flag to give more info.
                  Code:
                  bbsplit.sh\
                   -Xmx23g\
                   averagepairdist=200\
                   deterministic=t\
                   k=8\
                   minid=0.75\
                   idfilter=0.75\
                   maxindel=20\
                   nzo=f\
                   po=t\
                   ambiguous2=split\
                   ref=test_ref\
                   idtag=t\
                   in=Input_1P.fastq\
                   in2=Input_2P.fastq\
                   outm=Mapped.sam\
                   outu=Unmapped.sam\
                  Here is the head of Mapped.sam... how are these mapping? YI:f:73.10, YI:f:64.14 etc...
                  Code:
                  @HD     VN:1.4  SO:unsorted
                  @SQ     SN:006_Koxy_tonB_PC     LN:317
                  @SQ     SN:022_NDM_PC   LN:267
                  @PG     ID:BBMap        PN:BBMap        VN:37.22        CL:java -Djava.library.path=/install/bbmap/jni/ -ea -Xmx23g align2.BBSplitter ow=t fastareadlen=500 minhits=1 minratio=0.56 maxindel=20 qtrim=rl untrim=t trimq=6 -Xmx23g averagepairdist=200 deterministic=t k=8 minid=0.75 idfilter=0.75 maxindel=20 nzo=f po=t ambiguous2=split ref=test_ref idtag=t in=Input_1P.fastq in2=Input_2P.fastq outm=Mapped.sam outu=Unmapped.sam
                  gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 83      006_Koxy_tonB_PC        209     18      4=2X2=37I100=   =       6       -311    ATCAAGCTGTTTGCCGGGAATAACAACCAGCGCGGGGCGGCGTTTGACGTTACCGGGGCGCTGGACGATAACGATCGCGTGGCGGCGCGCTTAAGCGGCATGACCCGCTATGCAGACTCGCAGTTTGATACCTTAAAAGAGCAGC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:39 AM:i:18 YI:f:73.10
                  gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 163     006_Koxy_tonB_PC        6       9       87=1X2=47I1X1=1X3=2X    =       209     311     AATGATGGGCGATACCAACTCGCACAGCTCGCTGGTGGTCGATCCGTGGTTCCTGGAAAATATCGAAGTGGTGCGCGGCCCGGCCTCAGTGCTGTACGGCCGCTCTTCGCCCGGCGGCATCGTCGCCCTCACCTCGCGTAAACCC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:52 AM:i:9  YI:f:64.14
                  gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 83      006_Koxy_tonB_PC        209     18      4=2X2=37I100=   =       6       -311    ATCAAGCTGTTTGCCGGGAATAACAACCAGCGCGGGGCGGCGTTTGACGTTACCGGGGCGCTGGACGATAACGATCGCGTGGCGGCGCGCTTAAGCGGCATGACCCGCTATGCAGACTCGCAGTTTGATACCTTAAAAGAGCAGC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:39 AM:i:18 YI:f:73.10
                  gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 163     006_Koxy_tonB_PC        6       9       87=1X2=47I1X1=1X3=2X    =       209     311     AATGATGGGCGATACCAACTCGCACAGCTCGCTGGTGGTCGATCCGTGGTTCCTGGAAAATATCGAAGTGGTGCGCGGCCCGGCCTCAGTGCTGTACGGCCGCTCTTCGCCCGGCGGCATCGTCGCCCTCACCTCGCGTAAACCC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:52 AM:i:9  YI:f:64.14
                  gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 83      006_Koxy_tonB_PC        209     18      4=2X2=37I100=   =       6       -311    ATCAAGCTGTTTGCCGGGAATAACAACCAGCGCGGGGCGGCGTTTGACGTTACCGGGGCGCTGGACGATAACGATCGCGTGGCGGCGCGCTTAAGCGGCATGACCCGCTATGCAGACTCGCAGTTTGATACCTTAAAAGAGCAGC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:39 AM:i:18 YI:f:73.10
                  gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 163     006_Koxy_tonB_PC        6       9       87=1X2=47I1X1=1X3=2X    =       209     311     AATGATGGGCGATACCAACTCGCACAGCTCGCTGGTGGTCGATCCGTGGTTCCTGGAAAATATCGAAGTGGTGCGCGGCCCGGCCTCAGTGCTGTACGGCCGCTCTTCGCCCGGCGGCATCGTCGCCCTCACCTCGCGTAAACCC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:52 AM:i:9  YI:f:64.14
                  Thanks, Kate
                  Last edited by GenoMax; 05-17-2017, 03:39 PM.

                  Comment


                  • Originally posted by darthsequencer View Post
                    Great! Yeah I'd like to know the duplicate membership for containments too.
                    Hi Brian - I think "renamecluster=t" in dedupe.sh will do what I want with regards to tracking which contigs are greater than "minidentity".

                    I run dedupe.sh like this: dedupe.sh in=file.fa out=file.fa mindentity=99

                    Will adding "renameclusters=t" change what dedupe is doing besides renaming the sequences to what cluster they belong?

                    Comment


                    • No, it doesn't change the behavior otherwise. But for clustering you may want to disable duplicate removal (or you won't get accurate cluster sizes) with "am=f ac=f" and you need to add some clustering flags, "fo c pc".

                      So the command might look like:

                      dedupe.sh in=file.fa out=file.fa mindentity=99 am=f ac=f fo c pc

                      Comment


                      • Originally posted by Brian Bushnell View Post
                        No, it doesn't change the behavior otherwise. But for clustering you may want to disable duplicate removal (or you won't get accurate cluster sizes) with "am=f ac=f" and you need to add some clustering flags, "fo c pc".

                        So the command might look like:

                        dedupe.sh in=file.fa out=file.fa mindentity=99 am=f ac=f fo c pc
                        Hi I gave that a shot and dedupe throws a bunch of errors and produces about double the clusters than dedupe does without clustering.

                        Here's a copy of log: https://www.dropbox.com/s/t8aqj7fgf2...er.fa.log?dl=0

                        Comment


                        • Hi Brian,

                          Is there a tool in the suite that will take a .gff or other annotation file and annotate a .vcf from callvariants.sh? By annotation I mean adding the result of the variant (i.e. AA change). I've tried to do it with a bunch of other tools but I'm having trouble.

                          I'm sure it's not currently supported, but is there any possibility that one could callvariants from a .gbk or .gff reference as opposed to a fasta to get the coding changes directly?

                          Thanks in advance.

                          Comment


                          • @jweger1988, what tools have you tried for VCF annotation and what problems have you encountered?

                            Comment


                            • @HESmith, thanks for the response.

                              I've tried Annovar, SnpEff and a few others. Do you have any experience with these or other recommendations? It's totally possibly I'm just messing this up.

                              I am working with a virus that is not in any of the database, I tried to build my own file in the database and was getting an error message about the file not having the correct chromosomes set (with SnpEff).

                              Any clues?

                              Comment


                              • Not trying to be a jerk, but have you tried variant calling with the test data that are included with the software? Success would eliminate installation and command errors as the source of your problem.

                                Also, a general rule for online troubleshooting of software problems is 1) to provide the exact command syntax that your used, and 2) report the exact error message that you received.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM
                                • seqadmin
                                  The Impact of AI in Genomic Medicine
                                  by seqadmin



                                  Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                  02-26-2024, 02:07 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 03-14-2024, 06:13 AM
                                0 responses
                                33 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-08-2024, 08:03 AM
                                0 responses
                                72 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-07-2024, 08:13 AM
                                0 responses
                                81 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-06-2024, 09:51 AM
                                0 responses
                                68 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X