![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Bowtie, an ultrafast, memory-efficient, open source short read aligner | Ben Langmead | Bioinformatics | 514 | 03-13-2020 04:57 AM |
Introducing BBMap, a new short-read aligner for DNA and RNA | Brian Bushnell | Bioinformatics | 24 | 07-07-2014 10:37 AM |
Miso's open source | joyce kang | Bioinformatics | 1 | 01-25-2012 07:25 AM |
Targeted resequencing - open source | stanford_genome_tech | Genomic Resequencing | 3 | 09-27-2011 04:27 PM |
EKOPath 4 going open source | dnusol | Bioinformatics | 0 | 06-15-2011 02:10 AM |
![]() |
|
Thread Tools |
![]() |
#601 |
Member
Location: california Join Date: Feb 2012
Posts: 35
|
![]() |
![]() |
![]() |
![]() |
#602 |
Registered Vendor
Location: Eugene, OR Join Date: May 2013
Posts: 521
|
![]()
You might try the align2.BBMapPacBioSkimmer mapper with ambig=all and expectedsites= some high number. Not exactly what you were asking but I get dozens of hits with that.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com |
![]() |
![]() |
![]() |
#603 |
Registered Vendor
Location: Eugene, OR Join Date: May 2013
Posts: 521
|
![]()
I have a question about the bbmap summary output. Here is an example:
Code:
Reads Used: 1061591 (151284618 bases) Mapping: 15.250 seconds. Reads/sec: 69611.57 kBases/sec: 9920.17 Read 1 data: pct reads num reads pct bases num bases mapped: 76.2432% 809391 77.6086% 117409909 unambiguous: 75.3885% 800318 76.8084% 116199316 ambiguous: 0.8547% 9073 0.8002% 1210593 low-Q discards: 0.0000% 0 0.0000% 0 perfect best site: 11.4049% 121073 9.7282% 14717263 semiperfect site: 58.9222% 625513 60.0540% 90852423 Match Rate: NA NA 99.1429% 116413381 Error Rate: 20.6634% 183594 0.2527% 296692 Sub Rate: 19.9279% 177059 0.2369% 278137 Del Rate: 0.6244% 5548 0.0084% 9819 Ins Rate: 0.6302% 5599 0.0074% 8736 N Rate: 72.0298% 639985 0.6044% 709655
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com |
![]() |
![]() |
![]() |
#604 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
This means 72 percent of the reads mapped with an "N" symbol in the match string, an internal data structure similar to a cigar string. The "N" symbol denotes either an N in the read or an N in the reference, which can include sequences that go off the end of scaffolds (the ends are internally padded with Ns). Additionally, degenerate IUPAC codes are counted as Ns.
|
![]() |
![]() |
![]() |
#605 | |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]() Quote:
Prior to that you could probably add MD tags with "samtools calmd". |
|
![]() |
![]() |
![]() |
#606 |
Registered Vendor
Location: Eugene, OR Join Date: May 2013
Posts: 521
|
![]()
The reference in this case is a list of RAD loci sequences, each at 150 bp. The reads are 150 bp and should match one of the RAD loci. Since there are no N (or degenerate nucleotides) in the reference, and only a small number of reads with an N, it must have to do with the padding. A trimmed read would not have N padding, would it? This sample was degraded DNA, so plenty of reads were trimmed to less than 150 nucleotides. I doubt that 72% of the loci from this sample had an insertion that made the read extend past the reference (generated from consensus of many samples). Odd, but I guess it is not critical for me to fully figure this out.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com |
![]() |
![]() |
![]() |
#607 |
Junior Member
Location: Vienna Join Date: Mar 2018
Posts: 2
|
![]()
Hello Brian,
Thanks for the lovely tool. I've been using msa.sh with cutprimers.sh (as described here) to extract sequences between pairs of primers. However, I got some unexpectedly large sequences as output, and when I took a look at the SAM alignments output by msa.sh, none of them are mapping to the reverse strand. For example, with the following template sequence… >template…it should be possible to use the following commands to extract the sequence between the given primers: msa.sh in=template.fa primer=GGGGCCTAGACATTGCCCTCCA out=fwd.samThis outputs the following (primer alignments shown in uppercase): GGGGCCTAGACATTGCCCTCCAgagagagcacccaacaccctccaggcttgaccggccagggtgtccccttGACACCCGTCAAGCCTThe sequence GACACCCGTCAAGCCT is the best match for the reverse primer on the forward strand, but not the best match overall. That is on the reverse strand, which would result in the following (primer alignments again in uppercase): GGGGCCTAGACATTGCCCTCCAgagagagcacccaacaccctccAGGCTTGACCGGCCAGGGTGTCFor the latest version of BBMap (BBMap_37.95.tar.gz), it looks like this issue might be fixed by changing line 183 of file 'bbmap/current/jgi/FindPrimers.java' to: score2=ss.score=ss.quickScore; Last edited by tabwalsh; 03-29-2018 at 07:31 AM. |
![]() |
![]() |
![]() |
#608 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#609 |
Junior Member
Location: Vienna Join Date: Mar 2018
Posts: 2
|
![]()
Thanks, I've submitted a ticket here: https://sourceforge.net/p/bbmap/tickets/7/
|
![]() |
![]() |
![]() |
#610 |
Member
Location: Philadelphia Join Date: Jan 2012
Posts: 58
|
![]()
Greetings,
First time bbmap user and Im having some odd compatibility issues with certain tools using a bbmap generated bam. I do not think I did anything particularly odd in my bam workflow: Left trim and right trim using bbduk | bbmerge | bbmap | samtools addreplacerg From this I was able to successfully make variant calls and generate some metrics using GATK/Picard. However bedtools was reporting zero coverage at all intervals and I am unable to view reads restricting to regions using samtools. Code:
[bwubb@node063 GRCh37]$ samtools view 4721.ready.bam 1 [main_samview] region "1" specifies an unknown reference name. Continue anyway. [bwubb@node063 GRCh37]$ samtools view -H 4721.ready.bam | head -5 @HD VN:1.4 SO:coordinate @SQ SN:1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 LN:249250621 @SQ SN:2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 LN:243199373 @SQ SN:3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 LN:198022430 @SQ SN:4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 LN:191154276 Code:
>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 -bwubb |
![]() |
![]() |
![]() |
#611 |
Senior Member
Location: US Join Date: Dec 2010
Posts: 452
|
![]()
Convenient way to avoid re-indexing references?
When working with big genomes one would really like to avoid re-indexing reference sequences. BBmap is always looking for the reference indices in the working directory and not in the directory where the reference resides. Is there any way to point BBMap to the index? When using "path=" it seems to automatically re-index i this location. Thanks in advance. |
![]() |
![]() |
![]() |
#612 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
|
![]() Quote:
You can use the option "trd=t" to truncate the fasta header names after the first space in the name. There is an option for reformat.sh that can do this after the fact for aligned data. I can look this up later if you don't find it. |
|
![]() |
![]() |
![]() |
#613 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
|
![]() Quote:
Code:
bbmap.sh ref=your_genome.fa When you want to do the alignments instead of "ref=" you pass "path=/path_to_dir_containing_ref_dir". That should use the pre-made index. Last edited by GenoMax; 04-07-2018 at 10:37 AM. |
|
![]() |
![]() |
![]() |
#614 |
Junior Member
Location: Austin Join Date: May 2014
Posts: 4
|
![]()
Hi Brian, can BBduk deal with trimming adapter sequence only but retain other bases? Forexample:
SEQUENCEADAPTERSEQUENCE; trimming adapter leaves: SEQUENCESEQUENCE? Thank you! Kevin |
![]() |
![]() |
![]() |
#615 |
Member
Location: West Virginia Join Date: Jun 2011
Posts: 14
|
![]()
Hello,
I've been trying to get any basic command to work on my windows 7 machine. I've installed bbmap in the C:\Users\me directory. My .fastq.gz files are in the bbmap folder. When I type the command C:\Users\me>java -ea -Xmx1g -cp \bbmap\current\jgi.BBDukF in1=inputR1.fastq.gz in2=inputR2.fastq.gz out1=test1.fastq out2=test2.fastq ktrim=r mink=8 k=15 I get "could not find or load main class in1=inputR1.fastq.gz" I've messed with the type of slash, putting the infiles in the current directory, and taking away parts of the directory names. I even truncated the argument after -cp to just jgi.BBDukF, which I know is wrong, and it still wants the main class of in1. I don't know a thing about java and when I read instructions about paths, classpaths, path environments, etc, I get confused quickly. When I type "echo %CLASSPATH% into the command prompt, I get CLASSPATH back so that's not set. I tried using Git Bash but when I typed "bbduk.sh" I got "command not found" so I'm stuck, but I would really like to use this tool on my windows machine. Anyone willing to give me very basic instructions on the exact commands that get bbduk to run on a windows machine? |
![]() |
![]() |
![]() |
#616 |
Member
Location: Bethesda, MD Join Date: Oct 2010
Posts: 47
|
![]()
Hello,
Any suggestions as to why filterbyname is not pulling out reads that I know are there. Help?? Jen |
![]() |
![]() |
![]() |
#617 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
|
![]() |
![]() |
![]() |
![]() |
#618 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
|
![]() Quote:
Code:
java -ea -Xmx200m -cp /path/to/bbmap/current/ jgi.BBDukF in=reads.fq out=processed.fq |
|
![]() |
![]() |
![]() |
#619 |
Senior Member
Location: oceania Join Date: Feb 2014
Posts: 115
|
![]()
Hi,
I don't quite understand the SAM output of bbmap.sh here is my line: bbmap.sh ref=ref.fasta ambig=all nodisk=t nzo=t mappedonly=t outu=unmapped.fasta scafstats=stats.map in=reads.fastq out=mapped.sam I usually just use the stats file to get read counts per feature using a ref with features as a multifasta, but I need to annotate from a .gff3 file using a draft genome ref and the only way I can see to do that is from the sam (to get the mapping coordinate. However, the sam seems to have low quality mappings too (MAPQ=3).. What is the threshold actually used for mapping quality? I can see that in the options, and how can I get the sam to only contain the mapped reads above that threshold? Thanks, S. |
![]() |
![]() |
![]() |
#620 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
|
![]()
Since you are using ambig=all you are going to get multi-mappers with their MAPQ being set to a small value (3). You could post filter your sam file or set ambig=toss (or ambig=random/best to get only one alignment).
|
![]() |
![]() |
![]() |
Tags |
bbmap, metagenomics, rna-seq aligners, short read alignment |
Thread Tools | |
|
|