#1
Junior Member
Location: France Join Date: Oct 2012
Posts: 1
Hi,

I want to use RepeatModeler. I created my database without error, but when I launch the RepeatModeler script, I get this error in the output file:

    nohup: ignoring input
    RepeatModeler Version open-1.0.5
    ================================
    Search Engine = ncbi
    Database = PM ..
      - Sequences = 12940
      - Bases = 93195127
    Using temporary directory = /home/chris/ReapeatModeler/PM/RM_15527.TueOct161418482012

    RepeatModeler Round # 1
    ========================
    Searching for Repeats
     -- Sampling from the database...
     BeGINNING...
       - Gathering up to 40000000 bp
    RepeatModeler::sampleFromDB() Could not obtain sequence ncbi ( entry = 1-0-5, start = 1 end = 46138 ) from the database!

Do you have any idea what is going wrong?

Thanks in advance,
Chris

#2
Member
Location: US Join Date: Feb 2012
Posts: 19
I got the same problem and cannot figure it out.

#3
Junior Member
Location: Madison, WI Join Date: Jan 2011
Posts: 6
Any luck with figuring out your problem? I'm similarly lost.

#4
Member
Location: China Join Date: Sep 2011
Posts: 14
Quote:
lyn

#5
Junior Member
Location: Japan Join Date: Nov 2012
Posts: 2
I also got the same problem using the NCBI (rmblast) search engine.

After the fixes below, RepeatModeler started to run. I don't know whether other problems remain, and it is still possible that my FASTA input was incorrect. At least the randomly selected genomic DNA sequences used for the repeat statistics were generated. The main problem was the way the RepeatModeler Perl script calls "blastdbcmd". (The line numbers below might be inaccurate because I modified the file.)

Line 281: Modification ( "%t %l" -> "%g %l" )
    `$RepModelConfig::NCBIDBCMD_PRGM -db $genomeDB -entry all -outfmt "%g %l"`
# In my environment, the outfmt %t output nothing, so I used %g instead.

Line 1779: Modification ( $start - 1 -> $start )
    my $openCoordStart = $start;
# In my database, the start coordinate often came out as zero (0), but blastdbcmd does not accept zero in the -range option. So I deleted "- 1" from the script.

Line 1780: Insertion
    $seq = `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $seqID -range $openCoordStart-$openCoordEnd`;
# It seems that the program does not accept the input unless our rmblast database is registered with gi| tags. So I ignored the " if ( $seqID =~ /gi\|(\d+)/ ) { ..." statement and inserted another call.

Line 1783: Modification ( -range $openCoordEnd-$openCoordStart -> -range $openCoordStart-$openCoordEnd )
    `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $seqID -range $openCoordStart-$openCoordEnd`;
# The correct format of the coordinates for the "-range" option of blastdbcmd is "Start"-"End". However, the order was reversed in the script.
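
To make the two coordinate fixes concrete, here is a minimal sketch, not the actual RepeatModeler code. It assumes only what the blastdbcmd help text states: -range takes 1-based offsets written as start-stop. The helper name blastdbcmd_range_args and the example values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RepeatModeler): build the argument list
# for a blastdbcmd range query. blastdbcmd ranges are 1-based and written
# as start-stop, so a start of 0 or a reversed pair is rejected up front.
sub blastdbcmd_range_args {
    my ( $db, $entry, $start, $end ) = @_;
    die "start must be >= 1 (got $start)\n" if $start < 1;
    die "range must be start-stop with start <= stop (got $start-$end)\n"
        if $start > $end;
    return ( '-db', $db, '-entry', $entry, '-range', "$start-$end" );
}

# Example values only; the database name and entry ID are placeholders.
my @args = blastdbcmd_range_args( 'PM', 'gi|1', 1, 46138 );
print join( ' ', 'blastdbcmd', @args ), "\n";
```

Failing early on a zero start or a reversed range gives a clearer message than the empty string that the backtick call otherwise returns.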

#6
Junior Member
Location: Madison, WI Join Date: Jan 2011
Posts: 6
I think you might be my new hero, Tando. Thanks!

I will point out that my version of the script (1.0.5) is slightly different. For me, these changes got the program to work:

line 281: change ( %t --> %g )

line 1775: remove "- 1" ( $start - 1 --> $start )

line 1776: remove the if condition. It seems that when I use BuildDatabase, the seqID takes the form gi|1:3333 (as opposed to gi|1). I just removed the statement, so my $seqID is a full gi|1:3333 and not just a number. If this becomes a problem, then I should just redefine $seqID.

line 1778: my script was
    $seq = `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $1 -range $openCoordStart-$openCoordEnd`;
I changed ( -entry $1 --> -entry $seqID ). $1 is defined as the seqID earlier in the script, but that value doesn't get passed to the subroutine sampleFromDB(). Rather, it uses some other definition of $1, and it ended up using "1.0.5" (the script version number) as the entry number. My Perl skills are pretty weak and I couldn't determine exactly what was happening here, but your version makes more sense and seems to work.

Thanks again. I, for one, appreciate it!

#7
Junior Member
Location: Japan Join Date: Nov 2012
Posts: 2
$1, $2, $3 ... are special variables that receive the 1st, 2nd, 3rd ... captured groups of a regular expression match, respectively.

The script assumes that $1 receives the sequence ID from the regular expression match at line 1780 ( if ( $seqID =~ /gi\|(\d+)/ ... ). However, when nothing matches on this line (no gi| tag), $1 is not updated (and $seq in the following if statement is never set), so $1 still holds the previously matched string, the script version "1-0-5". This makes the script abort at the next "die if ($seq eq "") ..." line and print the "1-0-5" message.
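
The stale-capture behaviour can be reproduced in a few lines. This is a standalone sketch, not code from RepeatModeler; the version string and the untagged sequence ID are made-up stand-ins.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the values discussed above.
my $version = 'open-1-0-5';
my $seqID   = 'seq42';        # an ID without a gi| tag

# An earlier, successful match leaves "1-0-5" in $1.
$version =~ /open-(\d+-\d+-\d+)/;

# This match fails, and a failed match does NOT reset $1.
my $matched = ( $seqID =~ /gi\|(\d+)/ ) ? 1 : 0;
my $stale   = $1;             # still holds the capture from the version match

# Safer pattern: read $1 only when the match succeeded, otherwise
# fall back to the full sequence ID.
my $entry = ( $seqID =~ /gi\|(\d+)/ ) ? $1 : $seqID;

print "matched=$matched stale=$stale entry=$entry\n";
```

Running it prints matched=0 stale=1-0-5 entry=seq42, which matches the "entry = 1-0-5" seen in the error message.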

#8
Junior Member
Location: germany Join Date: Oct 2010
Posts: 2
Hi guys,

I got the same problem with RepeatModeler_1.0.5. I changed the script as you proposed, except for lines 1776 and 1778:

    if ( $seqID =~ /gi\|(\d+)/ ) {
        $seq = `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $1 -range $openCoordStart-$openCoordEnd`;
    }

Tando, you said that you ignored the first line and inserted another one. What exactly should these lines look like? I would be very grateful for some help.

Thanks in advance!

#9
Member
Location: Barcelona Join Date: Dec 2012
Posts: 19
Hi guys!

I am trying to install RepeatModeler, but when I give it the RepeatMasker path it returns: "RepeatMasker is too old. Must be open-4.0.0 or later. Install a newer version of RepeatMasker and re-run configure."

So I re-installed the latest version of RepeatMasker (Latest Released Version: 1/10/2013: RepeatMasker-open-4-0-0.tar.gz) and tried again with RepeatModeler, but it keeps saying the same thing, even though that is the version it is asking for. It may be because of the directory name: it doesn't include the version number (open-4.0.0), because the archive unpacks to just "RepeatMasker". But that may not be the cause. Any ideas?

Thanks in advance,
Nuria

#10
Junior Member
Location: Belgium Join Date: Jan 2013
Posts: 1
Quote:
In the configure script, at line 214, '$version <= 400' should be '$version < 400'.

Stephane
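
A small sketch of why the one-character change matters. How configure actually derives $version is not shown in this thread, so the version_number helper below is a hypothetical reconstruction that maps "open-4.0.0" to 400; only the two comparisons come from the post above.

```perl
use strict;
use warnings;

# Hypothetical reconstruction: turn "open-4.0.0" or "open-4-0-0" into 400.
# The real configure script may compute its number differently.
sub version_number {
    my ($label) = @_;
    my @digits = $label =~ /(\d+)[.-](\d+)[.-](\d+)/
        or die "unrecognised version label: $label\n";
    return $digits[0] * 100 + $digits[1] * 10 + $digits[2];
}

# "Too old" has to mean strictly below 400. With <=, the minimum
# supported release, open-4.0.0 itself, is rejected as well.
sub too_old_buggy { return version_number( $_[0] ) <= 400 }
sub too_old_fixed { return version_number( $_[0] ) < 400 }

for my $label ( 'open-3.3.0', 'open-4.0.0' ) {
    printf "%s buggy=%d fixed=%d\n", $label,
        too_old_buggy($label) ? 1 : 0,
        too_old_fixed($label) ? 1 : 0;
}
```

With the original comparison both labels are flagged as too old; with the corrected one only open-3.3.0 is.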

#11
Member
Location: Barcelona Join Date: Dec 2012
Posts: 19
Thank you very much for your help, Stephane.

I changed it and it worked!!

Nuria

#12
Junior Member
Location: Chicago Join Date: Mar 2011
Posts: 3
Hello all,

I am able to successfully run RepeatModeler (1-0-7) and it returns several hundred repeat models in my genome. However, all of these models are a result of RECON; nothing is returned by RepeatScout. RepeatScout is called during RepeatModeler round 1, but at the end it says "NOTE: RepeatScout did not return any models." RepeatScout is not called again by RepeatModeler. However, when I run RepeatScout directly on my genome, it returns several hundred repeat models.

Has anybody successfully gotten RepeatScout to return repeat models within RepeatModeler? I don't understand why this would happen, since RepeatScout works when I run it outside of RepeatModeler. Any ideas?

Thanks,
Ben

#13
Junior Member
Location: Australia Join Date: Sep 2009
Posts: 2
Hi Everyone,

I have exactly the same problem as Ben described above: no models are returned by RepeatScout within a RepeatModeler run, yet an independent RepeatScout run returns many repeat models. Also, is there any option to make RepeatModeler run faster (e.g. parallel processing like that of RepeatMasker)?

Cheers.

#14
Member
Location: Washington Join Date: Sep 2012
Posts: 10
Hi Ben,

RepeatScout is a great program for finding highly conserved repetitive elements. As a consequence, we run RepeatScout first (and for only one round) to find and remove the young elements before moving on to RECON. RepeatScout will often find tandem repeats and low-complexity sequences in its return set. These are filtered out in RepeatModeler. You may want to check that your hand-run result set isn't entirely simple/low-complexity sequence by running nseg/trf on it.

Another consideration is your choice of lmer size for RepeatScout. To compare the results from both programs fairly, you need to use the same lmer size and the same sample (from the input) sequence.

I rarely check seqanswers, so please feel free to contact us through our website if you have further questions ( www.repeatmasker.org ).

-R

#15
Junior Member
Location: Chicago Join Date: Mar 2011
Posts: 3
Thank you for your input, Robert. My problem turned out to be with RepeatScout, not RepeatModeler. Lines 26 and 27 of the RepeatScout script "filter-stage-1.prl" are:

    my $TRF_COMMAND = $ENV{'TRF_COMMAND'} || "trf";
    my $NSEG_COMMAND = $ENV{'NSEG_COMMAND'} || "nseg";

I changed this to:

    my $TRF_COMMAND = "trf";
    my $NSEG_COMMAND = "nseg";

Note that both "trf" and "nseg" are executables in my path. I don't know Perl, so I don't fully understand what is going on, but I think that RepeatScout was failing to find Tandem Repeats Finder (TRF) and, without anything back from TRF, it determined that everything was a tandem repeat and filtered it all out. However, this must have something to do with calling TRF from within RepeatModeler, as RepeatScout returned models for me when I used it independently, so something funny appears to be happening with paths.

Regardless, the RepeatModeler pipeline is now fully functional for me and recovers repeat models from RepeatScout as well as RECON.
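
For readers unfamiliar with the `$ENV{...} || "trf"` idiom, here is a minimal sketch of how it resolves. It does not establish what went wrong in the run described above; it only shows one way the two forms can differ. The resolve_command helper and the path /old/location/trf are hypothetical.

```perl
use strict;
use warnings;

# Mirrors the idiom from filter-stage-1.prl: use the environment value if
# it is true (set and non-empty), otherwise fall back to the bare name.
# The environment is passed in as a hash reference so the behaviour can
# be shown without touching the real %ENV.
sub resolve_command {
    my ( $env, $name, $default ) = @_;
    return $env->{$name} || $default;
}

# Variable not set: the fallback "trf" is used and looked up via PATH.
my $unset = resolve_command( {}, 'TRF_COMMAND', 'trf' );

# Variable set to a made-up stale path: it wins over the fallback,
# even if a working "trf" is available on PATH.
my $stale = resolve_command( { TRF_COMMAND => '/old/location/trf' },
    'TRF_COMMAND', 'trf' );

print "unset -> $unset\n";
print "stale -> $stale\n";

# What the current shell would resolve to:
print "now   -> ", resolve_command( \%ENV, 'TRF_COMMAND', 'trf' ), "\n";
```

If the last line prints anything other than trf, the variable is set in the calling environment, and hard-coding the name as done above would bypass it.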

#16
Member
Location: Washington Join Date: Sep 2012
Posts: 10
Hi Ben,

That's an interesting find. Alke's filter-stage-1.prl script should be better at reporting when it cannot find a dependent program. I added the following code to the script:

    use File::Which;
    unless ( which( $TRF_COMMAND ) && which( $NSEG_COMMAND ) ) {
        die "ERROR: RepeatScout script filter-stage-1.prl cannot find 'trf' or 'nseg' programs in the user's path!\n";
    }

This should at least produce an error message indicating that something went wrong. I don't see why you would need to get rid of the "$ENV{'TRF_COMMAND'} ||" portion, as that is simply a conditional statement which allows you to set environment variables that point to the programs' location. Perhaps you have these environment variables set, and set incorrectly? Perhaps when you ran the program you hadn't updated the shell's path in memory (using the rehash command)?

In any case, I am glad you got it working. I will push this change to filter-stage-1.prl out in the next release of RepeatScout.
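
File::Which is a CPAN module rather than part of core Perl, so on a machine where it is not installed the same check could be approximated with core modules only. The sketch below is an assumption-laden stand-in for Unix-like systems, not the code that ships with RepeatScout; find_in_path is a hypothetical name.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical core-only stand-in for File::Which::which on Unix-like
# systems: return the first executable regular file for $prog, or nothing.
sub find_in_path {
    my ($prog) = @_;

    # A name that already contains a slash is checked as given;
    # a bare name is tried in every PATH directory in order.
    my @candidates = $prog =~ m{/}
        ? ($prog)
        : map { File::Spec->catfile( $_, $prog ) } File::Spec->path;

    for my $candidate (@candidates) {
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report each missing program by name instead of a combined message.
my @missing = grep { !defined find_in_path($_) } qw(trf nseg);
print @missing
    ? "missing from PATH: @missing\n"
    : "trf and nseg were both found\n";
```

Listing the missing programs individually makes it easier to tell whether trf, nseg, or both could not be located.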