01-06-2017, 06:28 AM   #408
JVGen
Member
 
Location: East Coast

Join Date: Jul 2016
Posts: 37

Quote:
Originally Posted by Brian Bushnell
How many files do you have after BBMap, and what are they named?

The file name is: "1-JL08-P1-A3 assembled to HXB2 Nested Amplified Region extraction". Within the file, there are thousands of reads with the naming convention that I shared in the previous post.


I doubt it - BBTools should be able to handle reads named like that.



For the default window=50 entropyk=5, reads must be at least 50bp long to be processed by the entropy filter (you can reduce that by making the window smaller). And entropy=0.01 will remove any sequence that is a single mononucleotide run, as long as it's at least 50bp long. Note that if there are some errors, so that it is no longer a pure mononucleotide, you'd need a higher value for entropy. Something like "AAAAAAAAAAGGGGGGGGGGGGGGGG" would also need a higher value (50 A's and 50 G's appears to need entropy=0.21). Don't set it too high, though, or you'll lose the low-complexity parts of your genome.
I think I might try something like window=15 and entropy=0.01. That should get rid of mononucleotide strings that are at least 15bp long. The HIV genome generally doesn't have anything like that, so it should work.
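
To get a feel for how the window and cutoff interact, here is a rough Python sketch of a windowed k-mer entropy check. This is not BBTools code: the function names (window_entropy, mean_entropy, passes), the normalization, and the averaging over windows are all my own assumptions, so the numbers will not match what BBDuk reports (for example, I would not expect it to reproduce the 0.21 figure above). It only illustrates the general idea that a pure mononucleotide run scores 0 and that reads shorter than the window are never evaluated.

```python
import math
from collections import Counter


def window_entropy(seq, k=5):
    """Normalized Shannon entropy (0..1) of the k-mers in one window.

    Assumption: normalize by the largest entropy the window could have,
    i.e. log2 of min(number of k-mers, 4**k). BBDuk may differ.
    """
    n = len(seq) - k + 1
    if n <= 1:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    # log2(n / c) is >= 0, so a single repeated k-mer gives exactly 0.0
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return h / math.log2(min(n, 4 ** k))


def mean_entropy(seq, window=50, k=5):
    """Average entropy over all sliding windows, or None if seq is too short."""
    if len(seq) < window:
        return None  # shorter than the window: not evaluated at all
    values = [window_entropy(seq[i:i + window], k)
              for i in range(len(seq) - window + 1)]
    return sum(values) / len(values)


def passes(seq, cutoff=0.01, window=50, k=5):
    """True if the read would be kept under this simplified model."""
    e = mean_entropy(seq, window, k)
    return e is None or e >= cutoff


if __name__ == "__main__":
    print(passes("A" * 60))             # pure mononucleotide, long enough
    print(passes("A" * 20))             # too short for window=50
    print(passes("A" * 20, window=15))  # smaller window now evaluates it
```

Under this model, a 20bp poly-A read slips through with window=50 simply because it is never scored, which is why shrinking the window matters for short reads.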


***Update: Brian, I contacted Geneious, and they seem to be aware of the problem. They gave me a macro/workflow that extracts the reads from the BBMap'd contig file, and the reads are now feeding into Tadpole without a problem. Thanks for your help on this, you're getting all the gold stars!

Last edited by JVGen; 01-06-2017 at 07:51 AM.