SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Bowtie, an ultrafast, memory-efficient, open source short read aligner Ben Langmead Bioinformatics 513 05-14-2015 02:29 PM
Introducing BBMap, a new short-read aligner for DNA and RNA Brian Bushnell Bioinformatics 24 07-07-2014 09:37 AM
Miso's open source joyce kang Bioinformatics 1 01-25-2012 06:25 AM
Targeted resequencing - open source stanford_genome_tech Genomic Resequencing 3 09-27-2011 03:27 PM
EKOPath 4 going open source dnusol Bioinformatics 0 06-15-2011 01:10 AM

Reply
 
Thread Tools
Old 08-02-2017, 09:47 AM   #521
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Oh, yep, for some reason I capped it at 8 threads. I wonder why? I'll eliminate that cap in the next release, which will probably be sometime today.
Brian Bushnell is offline   Reply With Quote
Old 08-02-2017, 09:52 AM   #522
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,495
Default

Quote:
Originally Posted by Brian Bushnell View Post
Oh, yep, for some reason I capped it at 8 threads. I wonder why? I'll eliminate that cap in the next release, which will probably be sometime today.
How about tying the number to the number of threads specified for BBMap? That way we know that many threads are available.

Last edited by GenoMax; 08-02-2017 at 10:00 AM.
GenoMax is offline   Reply With Quote
Old 08-02-2017, 10:08 AM   #523
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by GenoMax View Post
How about tying the number to the number of threads specified for BBMap? That way we know that many threads are available.
It is tied to the number of threads defined for BBMap, just for some reason I capped it at a max of 8 even if the main process was allowed to use more; probably to conserve memory. I've increased it to a max of 64.
Brian Bushnell is offline   Reply With Quote
Old 08-09-2017, 10:23 AM   #524
darthsequencer
Member
 
Location: california

Join Date: Feb 2012
Posts: 29
Default

Quote:
Originally Posted by Brian Bushnell View Post
It is tied to the number of threads defined for BBMap, just for some reason I capped it at a max of 8 even if the main process was allowed to use more; probably to conserve memory. I've increased it to a max of 64.
Thanks- that helps a lot!
darthsequencer is offline   Reply With Quote
Old 08-09-2017, 10:29 AM   #525
darthsequencer
Member
 
Location: california

Join Date: Feb 2012
Posts: 29
Default bbmap fast macro?

Hi Brian,
I have a lot of reference sequences I'm mapping to (~11 million) and want to eek out as much as speed as possible.

I'm mostly looking for close matches - ex. I set minid to 0.97. Will setting fast still find matches like that? Any other thoughts on what I can set to get more speed?

Thanks a bunch!
darthsequencer is offline   Reply With Quote
Old 08-09-2017, 10:39 AM   #526
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,495
Default

Quote:
Originally Posted by darthsequencer View Post
Hi Brian,
I have a lot of reference sequences I'm mapping to (~11 million)
How long are the query sequences?
GenoMax is offline   Reply With Quote
Old 08-09-2017, 10:47 AM   #527
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

To maximize speed when you are not looking for low-identity matches, "fast" (plus your identity threshold) is generally adequate. You can also speed it up by reducing "maxindel" (fast sets it to 80). Quality-trimming and adapter-trimming generally increase alignment speed.

With a large reference you may be able to increase speed with "k=14" instead of the default "k=13" - this increases the time to load the reference and memory usage, but increases mapping speed (so whether the process becomes faster or slower depends on how long it takes to load the reference compared to how much data you have to map). Also, turning off mate rescue (rescue=f) or reducing rescuedist (fast defaults to rescuedist=800) can also increase the speed slightly. Note that all of these options reduce sensitivity (aside from trimming which increases it), but at 97% identity you only need very low sensitivity anyway.

Last edited by Brian Bushnell; 08-09-2017 at 11:01 AM.
Brian Bushnell is offline   Reply With Quote
Old 08-09-2017, 02:04 PM   #528
darthsequencer
Member
 
Location: california

Join Date: Feb 2012
Posts: 29
Default

Quote:
Originally Posted by GenoMax View Post
How long are the query sequences?
They range between 50bp single end to 2 x 250bp
darthsequencer is offline   Reply With Quote
Old 08-09-2017, 02:24 PM   #529
darthsequencer
Member
 
Location: california

Join Date: Feb 2012
Posts: 29
Default

Quote:
Originally Posted by Brian Bushnell View Post
To maximize speed when you are not looking for low-identity matches, "fast" (plus your identity threshold) is generally adequate. You can also speed it up by reducing "maxindel" (fast sets it to 80). Quality-trimming and adapter-trimming generally increase alignment speed.

With a large reference you may be able to increase speed with "k=14" instead of the default "k=13" - this increases the time to load the reference and memory usage, but increases mapping speed (so whether the process becomes faster or slower depends on how long it takes to load the reference compared to how much data you have to map). Also, turning off mate rescue (rescue=f) or reducing rescuedist (fast defaults to rescuedist=800) can also increase the speed slightly. Note that all of these options reduce sensitivity (aside from trimming which increases it), but at 97% identity you only need very low sensitivity anyway.
Thanks that's helpful. On the note of loading references - is there a way to use wildcards with the input and output of bbwrap?
darthsequencer is offline   Reply With Quote
Old 08-09-2017, 02:42 PM   #530
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by darthsequencer View Post
Thanks that's helpful. On the note of loading references - is there a way to use wildcards with the input and output of bbwrap?
Ah, sorry, but nope. You can, however, do something like...

Code:
ls *.fastq.gz > ls.txt
...then replace '\n' with ',' using a text editor (like Notepad++). I'm sure there's a simpler sed/awk solution, too.
Brian Bushnell is offline   Reply With Quote
Old 08-15-2017, 07:18 AM   #531
lucila
Junior Member
 
Location: Argentina

Join Date: Mar 2017
Posts: 2
Default

Hi Brian,
thank you so much for this useful tool. I would like to ask you a question. Does BBMap generate a list of transcripts as a result of the mapping of the reads? What I mean is if this tool generates a fasta file of reconstructed transcripts based on the reference genome and the reads used for mapping.
I need this because the genome that I am using is not annotated (I mean, I do not have a gff) and I want to generate a reference transcriptome using my reads and the genome.

Thank you again!
Best,
Lucila.
lucila is offline   Reply With Quote
Old 08-15-2017, 12:20 PM   #532
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by lucila View Post
Hi Brian,
thank you so much for this useful tool. I would like to ask you a question. Does BBMap generate a list of transcripts as a result of the mapping of the reads? What I mean is if this tool generates a fasta file of reconstructed transcripts based on the reference genome and the reads used for mapping.
I need this because the genome that I am using is not annotated (I mean, I do not have a gff) and I want to generate a reference transcriptome using my reads and the genome.
Hi Lucila,

No, it does not, but I have been considering adding gff output, at least for exon boundaries, if not full transcripts. For generating a reference transcriptome, you can do either mapping or assembly; so, assemblers like Trinity are also an option.

I have not tried it, but you may also want to examine this:
https://github.com/enormandeau/gawn
Brian Bushnell is offline   Reply With Quote
Old 08-16-2017, 10:35 AM   #533
lucila
Junior Member
 
Location: Argentina

Join Date: Mar 2017
Posts: 2
Default

Quote:
Originally Posted by Brian Bushnell View Post
Hi Lucila,

No, it does not, but I have been considering adding gff output, at least for exon boundaries, if not full transcripts. For generating a reference transcriptome, you can do either mapping or assembly; so, assemblers like Trinity are also an option.

I have not tried it, but you may also want to examine this:
https://github.com/enormandeau/gawn
Thank you Brian for the information!!
Cheers,
Lucila.
lucila is offline   Reply With Quote
Old 08-17-2017, 10:55 PM   #534
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 35
Default

Hi Brian,

I'm having an issue I'm wondering you can help with.

I am using bbmap to align to a small viral genome. The resulting bam or sam file give me an error something like : Segmentation fault (core dumped) - every time I run the program lofreq viterbi, which is used before variant calling with lofreq. When I generate a bam with the same reference file and fastq files with bowtie2 I do not get the same error.

I have tried using reformat.sh to convert to cigar 1.3 but this doesn't seem to work. The fasta file has only one sequence and there are no spaces in the name.

Here is an example of the error with the command -
Code:
lofreq viterbi -f ZIKV_PRVABC59_CDS.fa -o bbmapv.bam bbmap.bam
Segmentation fault (core dumped)
It runs OK with bowtie produced bam.

Here is the output from the command head with either bowtie or bbmap produced sam file.

bbmap
Code:
@HD     VN:1.4  SO:unsorted
@SQ     SN:ZIKV_PRVABC59_CDS    LN:10272
@PG     ID:BBMap        PN:BBMap        VN:37.47        CL:java -Djava.library.path=/home/weger/bb
p/jni/ -ea -Xmx216859m align2.BBMap build=1 overwrite=true fastareadlen=500 in1=1_ctff_R1.fq.gz in
1_ctff_R2.fq.gz out=1_bbmap.sam ref=ZIKV_PRVABC59_CDS.fa maq=25
NS500697:37:HH5V7BGXY:2:23303:13708:4313 1:N:0:TAAGGCGA+ATATAGAG        77      *       0       0
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT   AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEAE/EEEA/AEEEE/EA
NS500697:37:HH5V7BGXY:2:23303:13708:4313 1:N:0:TAAGGCGA+ATATAGAG        141     *       0       0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA      AAAE<6<<<66<<<<<6<<6666666666666E66666666E66E666A6666A66AA
E6EAA6EAAAEAEAEAEEAEEAEEEEEAE6EAEAAEAAAEEEAEAEAAEEEEAEAEEAEEEEEEEEAEEEEEEEE
NS500697:37:HH5V7BGXY:2:21207:1798:4356 1:N:0:TAAGGCGA+ATAGAGAC 77      *       0       0       *
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT  AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEAEEE//EEEEE//E
NS500697:37:HH5V7BGXY:2:21207:1798:4356 1:N:0:TAAGGCGA+ATAGAGAC 141     *       0       0       *
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 6666666666<<<<<<666666666666666666666666
NS500697:45:HJGYTAFXX:2:11101:23195:4485 1:N:0:TAAGGCGA+ATAGAGAG        77      *       0       0
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTT  AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAE<EE<A/EA//E
NS500697:45:HJGYTAFXX:2:11101:23195:4485 1:N:0:TAAGGCGA+ATAGAGAG        141     *       0       0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA A6A6<<<6<<E<<<<<6<<6<6<<<<<6666<<<<6666E66E6666666EE6666E66EEEE666
6AAAAAAAEA6EAAEAAE6E6EAE6EAEEAEAAEAEAEEEAEEEAAEEAEEEEEEAEEEEAEAE
NS500697:37:HH5V7BGXY:2:22101:16007:4646 1:N:0:TAAGGCGA+ATAGAGAG        77      *       0       0
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT      AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEE/AE/EEA//A
bowtie2
Code:
@HD     VN:1.0  SO:unsorted
@SQ     SN:ZIKV_PRVABC59_CDS    LN:10272
@PG     ID:bowtie2      PN:bowtie2      VN:2.2.9        CL:"/apps/bowtie2-2.2.9/bowtie2-align-s --
apper basic-0 -x /data/ebel_lab/Weger/indexed_fasta_bowtie/ZIKV__PRVABC59_CDS -S 1_bowtie.sam -1 /
p/31806.inpipe1 -2 /tmp/31806.inpipe2"
NS500697:37:HH5V7BGXY:4:13502:9896:14279        77      *       0       0       *       *       0
AGCCTAAGTATGTCAATACAACAAATACTTACTGTTTCATTTCTAGTAATGAAAAAAAAAAAAAAAA     AAAAAEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EE        YT:Z:UP
NS500697:37:HH5V7BGXY:4:13502:9896:14279        141     *       0       0       *       *       0
CAGAGACATACCCCTCCTTACAAATAAAAATGTTCTTTATAGATGTAAATTTATTTTACAAAAATGTTTCAAAATGACCAGATGAAAATCATCCTTAT
CAGAAAGACTTGTTTTTTTTTTTTTAAT    AAAAAEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEAEEE/AEEEEEEEEEEEEEEEE/<E   YT:Z:UP
NS500697:37:HH5V7BGXY:4:12605:4827:14803        77      *       0       0       *       *       0
AGCCTAAGTATGTCAATACAACAAATACTTACTGTTTCATTTCTAGTAATGAAAAAAAAAAAAACAA     AAAAAEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEAEE        YT:Z:UP
NS500697:37:HH5V7BGXY:4:12605:4827:14803        141     *       0       0       *       *       0
CAGAGACATACCCCTCCTTACAAATAAAAATGTTCTTTATAGATGTAAATTTATTTTACAAAAATGTTTCAAAATGACCAGATGAAAATCATCCTTAT
CAGAAAGACTTGTTTTTTTTTTTTTAATT   AAAAAEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEA<AEEEEEEEEEEEEEEEEEEEEEEE<<<E  YT:Z:UP
NS500697:37:HH5V7BGXY:4:22607:4886:15556        77      *       0       0       *       *       0
AGCCTAAGTATGTCAATACAACAAATACTTACTGTTTCATTTCTAGTAATGAAAAAAAAAAAAAAAA     AAAAAEEEEEEEEEAEEAEEEEEEEE
E/EEEAE/AAEEEEEEAAEEEEEAA6EEE6EEEEEEEE/E        YT:Z:UP
NS500697:37:HH5V7BGXY:4:22607:4886:15556        141     *       0       0       *       *       0
AGAGACATACCCCTCCTTACAAATAAAAATGTTCTTTATAGATGTAAATTTATTTTACAAAAATGTTTCAAAATGACCAGATGAAAATCATCCTTATG
AGAAAGACTTGTTTTTTTTTTTTTTAATTA  AAAAEEEEEEA/EAEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEAEEAEEEEEEEAEEEEEEAEAAE/AEAAEEEEE6E/EEEAEEAEEEEA//<EA YT:Z:UP
NS500697:37:HH5V7BGXY:4:12507:9984:15827        77      *       0       0       *       *       0
AGCCTAAGTATGTCAATACAACAAATACTTACTGTTTCATTTCTAGTAATGAAAAAAAAAAAAAAAAGTCTTTAT     AAAAAEAEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6E6EEE/EEEEE/EE/E6EAE/E        YT:Z:UP
Any ideas you might have would be greatly appreciated.

Thanks in advance.

Last edited by GenoMax; 08-18-2017 at 03:23 AM. Reason: Added [CODE] tags
jweger1988 is offline   Reply With Quote
Old 08-18-2017, 03:30 AM   #535
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,495
Default

I wonder if viterbi program is having trouble with (NS500697:37:HH5V7BGXY:2:23303:13708:4313 1:N:0:TAAGGCGA+ATATAGAG ) the tag sequence which BBMap preserves but Bowtie does not.

Can you "reformat" the bbmap alignment using "trd=t" flag and see if that helps?
GenoMax is offline   Reply With Quote
Old 08-18-2017, 04:12 AM   #536
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 35
Default

So I tried

reformat.sh in=1_bbmap.sam out=1_bbmap_reform1.bam trd=t sam=1.3

and just

reformat.sh in=1_bbmap.sam out=1_bbmap_reform1.bam trd=t

I then tried running lofreq viterbi either with sorting or without and same error.

Segmentation fault (core dumped)

I did notice that bbmap bams have the N and bowtie does not.

Any more thoughts?

Thanks for the quick reply.
jweger1988 is offline   Reply With Quote
Old 08-18-2017, 04:54 AM   #537
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,495
Default

I can't think of anything now. Let us see if Brian has any other suggestions.

lofreq viterbi seems to be for re-alignment. I wonder if you can do without that with BBMap generated files.
GenoMax is offline   Reply With Quote
Old 08-18-2017, 05:01 AM   #538
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 35
Default

Quote:
Originally Posted by GenoMax View Post
I can't think of anything now. Let us see if Brian has any other suggestions.

lofreq viterbi seems to be for re-alignment. I wonder if you can do without that with BBMap generated files.
It seems to end up calling weird indels when viterbi is not run that are clearly not real. Thanks for the help. I'll wait for Brian and see what he thinks.
jweger1988 is offline   Reply With Quote
Old 08-23-2017, 01:05 PM   #539
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 35
Default

Hi Brian,

I'm trying to call some kmers with kmercountexact.sh and am running into some troubles.

I am merging paired end reads with bbmerge then mapping to a small viral reference file with bbmap. I am then using reformat.sh to trim to a smaller read length and then calling/counting kmers with kmercountexact.

The problem I am having is that after mapping I'm getting reads that are both in the forward and reverse sense. So when calling kmers I get them from both ends of the sequence I'm interested in. Is there any way to reverse complement just the reads that are in the reverse complement in relation to the reference sequence?

Thanks in advance.
jweger1988 is offline   Reply With Quote
Old 08-23-2017, 02:53 PM   #540
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

After mapping, you can split the sam file like this:

Code:
splitsam.sh mapped.sam plus.sam minus.sam unmapped.sam
reformat.sh in=plus.sam out=plus.fq
reformat.sh in=minus.sam out=minus.fq rcomp
cat plus.fq minus.fq > all.fq
As for your previous question (sorry, somehow I did not notice it) - I think Genomax may be right that the issue is the read names, but it's hard to say. Also, it's hard to see what's going on here since none of the reads are mapped...

Incidentally, what do you mean by this?

Quote:
I did notice that bbmap bams have the N and bowtie does not.
Do you mean N in the read name, or in the cigar string, or in the read sequence?

Also, I'd be interested to see how the output of callvariants.sh (with appropriate sensitivity parameters) compares to lofreq, if you want to give it a try.

Last edited by Brian Bushnell; 08-23-2017 at 03:06 PM.
Brian Bushnell is offline   Reply With Quote
Reply

Tags
bbmap, metagenomics, rna-seq aligners, short read alignment

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:47 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO