SEQanswers

09-17-2014, 01:30 AM   #1
maglund
Junior Member
 
Location: Sweden

Join Date: Sep 2014
Posts: 2
RepeatMasker - How to make searches for low-complexity regions less stringent?

I am using RepeatMasker only to find low-complexity regions of DNA. With the default settings, "100 bp stretch of DNA is masked when it is >87% AT or >89% GC, a 30 bp stretch has to contain 29 A/T (or GC) nucleotides." What can I do to loosen these criteria and play around with the settings?

Perhaps there is a better program for what I want to accomplish? I have a list of 60,000 rather short sequences (each about 600 bases long).

Thanks
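
To make the criterion quoted above concrete, here is a minimal Python sketch of a sliding-window composition filter with adjustable thresholds. It is not RepeatMasker's implementation, and it does not show any RepeatMasker option; the function name `composition_mask` and its parameters are hypothetical, with defaults borrowed from the quoted 100 bp rule.

```python
def composition_mask(seq, window=100, at_max=0.87, gc_max=0.89):
    """Return one boolean per base: True if the base lies in a flagged window.

    A window is flagged when its A/T fraction exceeds at_max or its G/C
    fraction exceeds gc_max. Lowering either threshold flags more sequence.
    (Hypothetical helper, not part of RepeatMasker.)
    """
    seq = seq.upper()
    n = len(seq)
    flagged = [False] * n
    if n < window:
        return flagged
    # Base counts for the first window.
    at = sum(1 for c in seq[:window] if c in "AT")
    gc = sum(1 for c in seq[:window] if c in "GC")
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one base: drop the base that left, add the one that entered.
            out_c, in_c = seq[start - 1], seq[start + window - 1]
            at += (in_c in "AT") - (out_c in "AT")
            gc += (in_c in "GC") - (out_c in "GC")
        if at / window > at_max or gc / window > gc_max:
            for i in range(start, start + window):
                flagged[i] = True
    return flagged
```

Calling it with, say, `at_max=0.80` instead of `0.87` would be one way to experiment with a looser cutoff on your own sequences.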
09-17-2014, 09:58 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I made a tool called BBMask, available in the BBTools package.

Usage:
bbmask.sh -Xmx6g in=file.fa out=masked.fa window=80 entropy=0.75 ke=5

That will mask areas with entropy below 0.75 (on a scale of 0-1), using a window size of 80 and a kmer length of 5 for the entropy calculation. Those are the default settings, but you can customize them. A higher entropy threshold will mask more sequence. It's extremely fast, so you can play around with the settings (mainly entropy) until it masks the amount you want. It reports how much it masked.
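
As a rough illustration of what windowed kmer-entropy masking means, here is a minimal Python sketch. It is not BBMask's code: the normalization to a 0-1 scale (dividing by the largest entropy a window of that size can reach) is an assumption, and the function names are hypothetical. The parameter names mirror `window`, `entropy` and `ke` from the command above.

```python
import math
from collections import Counter


def window_entropy(window_seq, k=5):
    """Shannon entropy of the kmers in one window, scaled to the range 0-1.

    Assumption: the scale is obtained by dividing by log2 of the maximum
    number of distinct kmers the window could contain.
    """
    kmers = [window_seq[i:i + k] for i in range(len(window_seq) - k + 1)]
    total = len(kmers)
    if total <= 1:
        return 0.0
    counts = Counter(kmers)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    max_h = math.log2(min(4 ** k, total))
    return h / max_h


def mask_low_entropy(seq, window=80, entropy=0.75, k=5):
    """Replace with N every base covered by a window whose entropy is below the cutoff."""
    upper = seq.upper()
    n = len(seq)
    low = [False] * n
    # Simple O(n * window) scan; a real tool would update counts incrementally.
    for start in range(max(n - window + 1, 1)):
        if window_entropy(upper[start:start + window], k) < entropy:
            for i in range(start, min(start + window, n)):
                low[i] = True
    return "".join("N" if low[i] else seq[i] for i in range(n))
```

In this sketch, raising `entropy` toward 1 masks more of the sequence, which matches the behaviour described above.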
09-18-2014, 02:34 AM   #3
maglund
Junior Member
 
Location: Sweden

Join Date: Sep 2014
Posts: 2

Your program looks like it might do the trick. It is quick, and the parameters are easy to change. But I would like an output file that gives me the percent masked per input sequence.

I have tried the covstats and scafstats outputs, but I get "unknown parameter". There is an example below. What I have tried is changing the file format. In the example I have written .fa, but I have also tried other formats, or simply left the extension off. What am I doing wrong?

Thank you for your kind help

magnus@magnus-MacBookPro:~/Downloads/bbmap$ bash bbmask.sh -Xmx6g in=/home/magnus/Downloads/Testar_med_farre.fa out=/home/magnus/Documents/BBMask/masked8.fa covstats=/home/magnus/Documents/BBMask/covstats.fa window=20 entropy=0.95 ke=5 overwrite
bbmask.sh: line 87: module: command not found
bbmask.sh: line 88: module: command not found
java -ea -Xmx6g -cp /home/magnus/Downloads/bbmap/current/ jgi.BBMask -Xmx6g in=/home/magnus/Downloads/Testar_med_farre.fa out=/home/magnus/Documents/BBMask/masked8.fa covstats=/home/magnus/Documents/BBMask/covstats.fa window=20 entropy=0.95 ke=5 overwrite
Executing jgi.BBMask [-Xmx6g, in=/home/magnus/Downloads/Testar_med_farre.fa, out=/home/magnus/Documents/BBMask/masked8.fa, covstats=/home/magnus/Documents/BBMask/covstats.fa, window=20, entropy=0.95, ke=5, overwrite]

Unknown parameter covstats=/home/magnus/Documents/BBMask/covstats.fa
Exception in thread "main" java.lang.AssertionError: Unknown parameter covstats=/home/magnus/Documents/BBMask/covstats.fa
at jgi.BBMask.<init>(BBMask.java:216)
at jgi.BBMask.main(BBMask.java:45)
09-18-2014, 09:31 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Oh... let me clarify. The "readme.txt" file is for BBMap. BBMask's instructions are in its shell script; you can print them by running the shell script (bbmask.sh) with no arguments. So, covstats and scafstats are just for BBMap. The percent masked will be printed to the screen. So, the command should be this:

bash bbmask.sh -Xmx6g in=/home/magnus/Downloads/Testar_med_farre.fa out=/home/magnus/Documents/BBMask/masked8.fa window=20 entropy=0.95 ke=5 overwrite

A complete run looks like this:

Code:
bash bbmask.sh in=Panicum_hallii.fasta out=masked.fasta
java -ea -Xmx46673m -cp /usr/common/jgi/utilities/bbtools/prod-v33.42/lib/BBTools.jar jgi.BBMask in=Panicum_hallii.fasta out=masked.fasta
Executing jgi.BBMask [in=Panicum_hallii.fasta, out=masked.fasta]

Loading input
Loading Time:                   2.920 seconds.

Masking low-entropy (to disable, set 'mle=f')
Low Complexity Masking Time:    2.703 seconds.
Ref Bases:                 556945529    206.08m bases/sec
Low Complexity Bases:         899687

Converting masked bases to N
Done Masking
Conversion Time:                1.784 seconds.

Writing output
Writing Time:                   1.171 seconds.

Total Bases Masked:           899687/556945529  0.162%
Total Time:                     8.611 seconds.

That all gets printed to stderr, so if you want to log it in a file, add 2>log.txt at the end, like this:

bash bbmask.sh -Xmx6g in=/home/magnus/Downloads/Testar_med_farre.fa out=/home/magnus/Documents/BBMask/masked8.fa window=20 entropy=0.95 ke=5 overwrite 2>log.txt
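
The summary above reports totals only. For a per-sequence percentage, one option is to count the N bases in each record of the masked output, since the run above converts masked bases to N. Here is a minimal Python sketch; the function name is hypothetical, and it assumes that the input contained no N bases of its own (any that were already there would be counted as masked).

```python
def percent_masked_per_sequence(fasta_path):
    """Yield (name, length, masked_bases, percent_masked) for each FASTA record.

    Masked bases are counted as N or n characters. Assumption: every N in
    the file was introduced by masking.
    """
    name, length, masked = None, 0, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    pct = 100.0 * masked / length if length else 0.0
                    yield name, length, masked, pct
                name, length, masked = line[1:], 0, 0
            elif name is not None:
                length += len(line)
                masked += line.count("N") + line.count("n")
    # Emit the final record.
    if name is not None:
        pct = 100.0 * masked / length if length else 0.0
        yield name, length, masked, pct


# Example, using the output path from the command above:
# for name, length, masked, pct in percent_masked_per_sequence("masked8.fa"):
#     print(f"{name}\t{length}\t{masked}\t{pct:.2f}")
```

The commented loop would print one tab-separated line per sequence, which can be redirected to a file.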
Tags
low complexity, repeatmasker
