SEQanswers

Old 06-09-2010, 05:07 PM   #1
Zimbobo
Member
 
Location: US

Join Date: Mar 2010
Posts: 25
RepeatMasker & RepeatScout

Hello there,

I was wondering whether anybody on this list knows how to run RepeatScout (1.0.5) and RepeatMasker (3.2.8).

Basically I have a new genome, and want to use RepeatScout to make a
library for RepeatMasker.

Here is what I do:

build_lmer_table -sequence genome.fa -freq genome.fq
RepeatScout -sequence genome.fa -output repeats.fa -freq genome.fq
filter-stage-1.prl repeats.fa &> repeats.fa.filter_1
RepeatMasker genome.fa -e abblast -lib repeats.fa.filter_1

Now:
Am I using the correct file for -lib?
RepeatMasker is still complaining that it cannot find Libraries/RepeatMasker.lib
and Libraries/RepeatMaskerLib.embl.

Thanks a lot in advance for any help.
Old 06-10-2010, 08:40 AM   #2
darked89
Member
 
Location: Barcelona, Spain

Join Date: Jun 2009
Posts: 36

Here is a recipe for installing and running RepeatScout:

http://openwetware.org/wiki/Wikiomic...ng#RepeatScout

Hope it helps,

Darek
Old 06-10-2010, 09:38 AM   #3
Zimbobo
Member
 
Location: US

Join Date: Mar 2010
Posts: 25

I actually followed those instructions.

RepeatMasker is still complaining about missing libraries (i.e. Libraries/RepeatMasker.lib etc.) and advises me to get something from www.girinst.org.

The whole point of running RepeatScout, for me, is to build my own library. Is there a flag that tells RepeatMasker not to look for those libraries, or is there a reason RepeatMasker must have them?
Old 08-09-2010, 07:30 PM   #4
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199

Hi Zimbobo,

Can I ask how you edited the Perl script filter-stage-1.prl so that it points to the path where TRF is installed?
Which line of filter-stage-1.prl needs to be edited to set the TRF path?
My server keeps showing the message below:
"No such file or directory at ./filter-stage-1.prl line 110"
Thanks a lot for sharing and for your guidance.
Old 10-18-2011, 10:57 AM   #5
hkuntal
Junior Member
 
Location: jaipur (india)

Join Date: Oct 2011
Posts: 1
RepeatScout

Hello everyone,
I am trying to work with RepeatScout, but every time I run filter-stage-1.prl, the filtered library it generates is empty (no data). Any solution?
Old 11-11-2011, 01:42 AM   #6
sameoldmike
Junior Member
 
Location: Denmark

Join Date: Oct 2011
Posts: 1

The same thing happened to me: the filtered output file is empty after running for a very long time. It was run on a repeat-rich genome.

Could it be that I don't have nseg and TRF properly installed? There is no output about those two programs that I can see...
Old 04-20-2012, 01:24 AM   #7
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215

I don't know whether this is still a current problem, but I had it too and was able to solve it on my system (Ubuntu 11, 64-bit).
The libraries RepeatMasker is looking for are not the downloaded ones, but the BLAST databases that should have been created by rmblast. rmblast itself looks for a libpcre.so.0 file, which it could not find on my system. This file is known to cause problems with some programs because the symlinks are not always created correctly during updates.
Therefore I just created symlinks manually in my /lib/ and /lib32/ folders to the actual file (just type "sudo ln -s /lib/libpcre.so.3 /lib/libpcre.so.0" and "sudo ln -s /lib32/libpcre.so.3 /lib32/libpcre.so.0"), and afterwards everything worked fine for me.

@edge: you don't need to change anything in the .prl file, but you need to rename the trf404-linux64 (or other) executable to simply trf.
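
Before rerunning the first filter, it may be worth confirming that the helper programs are actually reachable. This is only a minimal sketch: it assumes, as the renaming advice above suggests, that filter-stage-1.prl looks the tools up on PATH under the names trf and nseg. The function name missing_tools is made up for this example.

```shell
#!/bin/sh
# Print the name of every listed tool that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumption: the filter script invokes the helpers as "trf" and "nseg".
missing=$(missing_tools trf nseg)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing >&2
fi
```

If either name is reported, renaming (or symlinking) the downloaded executable and adding its directory to PATH is the first thing to try.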

Last edited by WhatsOEver; 04-20-2012 at 01:27 AM.
Old 04-20-2012, 01:38 AM   #8
gprakhar
Member
 
Location: India

Join Date: Aug 2010
Posts: 78

Hello,

This is not a direct answer to your question, but there is a tool from the RepeatMasker group.
It's called RepeatModeler, and it integrates RepeatScout, RECON and TRF.
It creates a de novo repeat library and then annotates the sequences.

--
pg
Old 04-20-2012, 01:43 AM   #9
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215

That's true and it works fine, but RepeatModeler also uses RepeatMasker and eventually the rmblast package, so you might have to face the same problems as described before.
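
A quick way to see whether a binary is affected by the missing-library problem is to ask the dynamic linker directly. The sketch below assumes a Linux system with ldd available and that the RMBlast search binary is installed on PATH as rmblastn; the function name missing_libs is hypothetical.

```shell
#!/bin/sh
# List shared libraries that the dynamic linker cannot resolve for a binary.
# Prints nothing when every library resolves (or when ldd is unavailable).
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Assumption: the RMBlast search program is on PATH under the name "rmblastn".
bin=$(command -v rmblastn || true)
if [ -n "$bin" ]; then
    missing_libs "$bin"
fi
```

A line such as libpcre.so.0 in the output would point to the symlink issue described earlier in this thread; empty output means the libraries resolve and the problem lies elsewhere.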
Old 04-20-2012, 01:59 AM   #10
rahularjun86
Member
 
Location: Frankfurt(M), Germany

Join Date: Jan 2011
Posts: 58

Dear all,
I ran RepeatScout successfully. The commands I used:
Code:
## RepeatScout run
# step 1: build the l-mer frequency table (l = 14)
build_lmer_table -l 14 -sequence Final_assembly.fasta -freq Final_assembly.freq
# step 2: build the repeat library from the frequency table
RepeatScout -sequence Final_assembly.fasta -output Final_assembly_repeats.fasta -freq Final_assembly.freq -l 14
# step 3: first filter (low-complexity and tandem repeats)
cat Final_assembly_repeats.fasta | filter-stage-1.prl > Final_assembly_repeats_filtered_stg1.fasta
# step 4: run in the foreground, because step 5 reads the resulting Final_assembly.fasta.out
RepeatMasker -pa 20 -s -lib Final_assembly_repeats_filtered_stg1.fasta Final_assembly.fasta
# step 5: second filter (keep repeats found at least --thresh times)
cat Final_assembly_repeats_filtered_stg1.fasta | filter-stage-2.prl --cat=Final_assembly.fasta.out --thresh=3 > Final_assembly_repeats_filtered_stg2_thresh3.fasta
# step 6: final masking run with the filtered library
RepeatMasker -pa 20 -s -lib Final_assembly_repeats_filtered_stg2_thresh3.fasta Final_assembly.fasta &
__________________
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany
Old 09-22-2012, 06:52 AM   #11
tnguyen
Junior Member
 
Location: Australia

Join Date: Sep 2011
Posts: 6

Hi Rahul,

How large was your genome? How much memory was needed for your run? I received this error message at the start of Step 2:

"Could not allocate space for sequence"

Last edited by tnguyen; 09-22-2012 at 07:01 AM.
Old 09-22-2012, 06:53 AM   #12
tnguyen
Junior Member
 
Location: Australia

Join Date: Sep 2011
Posts: 6

Sorry the full error message was:

"Could not allocate space for sequence"

Last edited by tnguyen; 09-22-2012 at 07:02 AM.
Old 09-29-2012, 12:16 PM   #13
rahularjun86
Member
 
Location: Frankfurt(M), Germany

Join Date: Jan 2011
Posts: 58

Hi tnguyen,
Sorry for the late reply. One genome was ~20 Mb and the other was in the Gb range. I ran it on the cluster and didn't check how much memory it used.
Best wishes,
Rahul
__________________
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany
Old 09-29-2012, 02:32 PM   #14
tnguyen
Junior Member
 
Location: Australia

Join Date: Sep 2011
Posts: 6

Thank you Rahul,
My genome size is ~1.7 Gb. Any idea how to make RepeatScout work for a large genome?
TN
Old 09-29-2012, 08:11 PM   #15
mike.t
Member
 
Location: Spain

Join Date: Mar 2010
Posts: 36

You probably don't need to use the whole genome for RepeatScout. Just use a few chromosomes or supercontigs. If repeats are distributed across all the chromosomes in the genome, scanning just a few of them with RepeatScout should be enough to find them and create consensus sequences that you can input to RepeatMasker. Then, mask the whole genome with RepeatMasker.
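
One way to cut a subset out of a large assembly is to copy whole FASTA records until a base budget is reached. The following is a minimal sketch, not a tested recipe for any particular genome: the function name fasta_head_bases and the file names in the usage comment are made up, and taking the first records in file order is an arbitrary choice that may not be representative if the assembly is sorted by size.

```shell
#!/bin/sh
# Print whole FASTA records from file $1 until at least $2 bases have been emitted.
# Records are kept intact: once the budget is reached, no further record is started.
fasta_head_bases() {
    awk -v limit="$2" '
        /^>/ { if (total >= limit) exit; print; next }   # start a new record only while under budget
        { print; total += length($0) }                   # sequence line: emit and count its bases
    ' "$1"
}

# Hypothetical usage (file names are placeholders):
#   fasta_head_bases genome.fa 200000000 > genome.subset.fa
```

The subset is then used for build_lmer_table and RepeatScout only; the resulting library is still run against the full genome with RepeatMasker, as described above.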
Old 09-29-2012, 08:22 PM   #16
tnguyen
Junior Member
 
Location: Australia

Join Date: Sep 2011
Posts: 6

Thank you Mike. I will try what you suggest. Sounds like a good idea. I will let you know if it works.
Thanks,
TN
Old 09-30-2012, 03:31 PM   #17
DFJ111
Member
 
Location: Auckland

Join Date: Aug 2012
Posts: 20

Here's an example of a run I did successfully. I never got RepeatModeler working, and the installation of the standalone BLAST program RMBlast was a bit tricky. Make sure TRF and nseg are working too, for the first filtering stage below. As I understand it, RepeatModeler is basically just a wrapper for the programs below anyway.

RepeatScout run, using yourgenome.fasta:

Code:
./build_lmer_table -l 14 -sequence yourgenome.fasta -freq your_freq_table
Build a frequency table of all 14-mers in the genome.

Code:
./RepeatScout -sequence yourgenome.fasta -output your_repeats.fasta -freq your_freq_table -l 14
Greedily extend 14-mer repeats until they diverge (see http://bix.ucsd.edu/repeatscout/repeatscout-ismb.ppt for a good explanation of this).

Code:
cat your_repeats.fasta | ./filter-stage-1.prl > your_repeats_filtered1.fasta
Filter out low-complexity and tandem repeats.

Code:
./RepeatMasker -s -lib your_repeats_filtered1.fasta yourgenome.fasta
Generate a masked genome using the (non-low-complexity, non-tandem) repeats.

Code:
cat your_repeats_filtered1.fasta | ./filter-stage-2.prl --cat yourgenome.fasta.out --thresh 10 > your_repeats_filtered2.fasta
Filter out all (non-low-complexity, non-tandem) repeats that occur fewer than 10 times in the genome.

Code:
./RepeatMasker -pa 4 -s -lib your_repeats_filtered2.fasta -nolow -norna -no_is -gff yourgenome.fasta
Produce a .gff file (among other files) of all non-low-complexity, non-tandem, non-rRNA repeats.

Obviously you might need to modify parameters here and there to fit your requirements. The naming of the features in the resulting .gff file is a bit uninformative too.
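
To get a feel for a sensible --thresh value, it can help to look at how many copies of each library sequence RepeatMasker actually annotated. The sketch below only tallies names from the .out file and is not how filter-stage-2.prl is implemented. It assumes the usual whitespace-separated .out layout, in which data lines start with a numeric score and the 10th column holds the name of the matching repeat; the function name count_repeat_hits is made up.

```shell
#!/bin/sh
# Count how many annotated copies each library sequence has in a RepeatMasker .out file.
# Header and blank lines are skipped because their first field is not a number.
count_repeat_hits() {
    awk '$1 ~ /^[0-9]+$/ { n[$10]++ } END { for (r in n) print n[r], r }' "$1" |
        sort -k1,1nr -k2,2
}

# Hypothetical usage (file name is a placeholder):
#   count_repeat_hits yourgenome.fasta.out | head
```

The output is one "count name" pair per line, most frequent first, so the tail of the list shows which entries a given threshold would drop.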
Old 09-30-2012, 03:34 PM   #18
DFJ111
Member
 
Location: Auckland

Join Date: Aug 2012
Posts: 20

By the way Zimbobo, if you're doing de novo repeat element predictions you won't need existing repeat element libraries at all. You generate them yourself.
Old 10-02-2012, 12:20 AM   #19
tnguyen
Junior Member
 
Location: Australia

Join Date: Sep 2011
Posts: 6

Hi DFJ111 and mike.t,

I followed the suggestions from you both, and the repeat library was built successfully.

When I ran the first filter, the results said:

14184 deleted. 14185 saved. 111 skipped for length.

but the output file (contigs65fullQC2.filtered.fa.gt1k.fa.repeatscout.filter1) was empty.
Code:
cat /group/aquaculture/mussels/sequencing/MUSSEL1/repeatscout/contigs65fullQC2.filtered.fa.gt1k.fa.repeatscout | ./filter-stage-1.prl > contigs65fullQC2.filtered.fa.gt1k.fa.repeatscout.filter1
Do you have any idea why?

Thanks,
TN
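
When a stage produces an empty file, comparing record counts before and after that stage at least narrows down where the sequences are lost. This is a minimal sketch and does not diagnose the cause; the function name count_fasta_records and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Print the number of FASTA records (header lines starting with ">") in a file.
# Prints 0 for an empty or missing file.
count_fasta_records() {
    if [ -s "$1" ]; then
        grep -c '^>' "$1" || true   # grep exits non-zero when the count is 0
    else
        echo 0
    fi
}

# Hypothetical usage (file names are placeholders):
#   count_fasta_records your_repeats.fasta
#   count_fasta_records your_repeats_filtered1.fasta
```

If the input count is non-zero and the filtered count is 0, the next things to check are whether trf and nseg run on their own and whether the redirection really captures the script's standard output.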
Old 10-02-2012, 12:57 AM   #20
mike.t
Member
 
Location: Spain

Join Date: Mar 2010
Posts: 36

I haven't run RepeatScout in a while so I'm afraid I can't help you. You may want to try another de novo repeat finding program. Try piler or RepeatModeler. piler usually works pretty well on fungi, although I am using the REPET pipeline these days.

All times are GMT -8.