SEQanswers

Old 10-16-2012, 06:13 AM   #1
chrisbioinfo
Junior Member
 
Location: France

Join Date: Oct 2012
Posts: 1
Default RepeatModeler

Hi,

I want to use RepeatModeler.
I created my database without error, but when I launch the RepeatModeler script, I get this error in the output file:


nohup: ignoring input
RepeatModeler Version open-1.0.5
================================
Search Engine = ncbi
Database = PM ..
- Sequences = 12940
- Bases = 93195127
Using temporary directory = /home/chris/ReapeatModeler/PM/RM_15527.TueOct161418482012


RepeatModeler Round # 1
========================
Searching for Repeats
-- Sampling from the database...
BeGINNING...
- Gathering up to 40000000 bp

RepeatModeler::sampleFromDB() Could not obtain sequence ncbi ( entry = 1-0-5, start = 1 end = 46138 ) from the database!





Do you have any idea?
Thanks in advance

Chris
Old 11-02-2012, 03:45 PM   #2
mkdir
Member
 
Location: US

Join Date: Feb 2012
Posts: 19
Default

I have the same problem and cannot figure it out.
Old 11-27-2012, 09:09 AM   #3
themwg
Junior Member
 
Location: Madison, WI

Join Date: Jan 2011
Posts: 6
Default RepeatModeler

Any luck with figuring out your problem? I'm similarly lost.
Old 11-27-2012, 08:57 PM   #4
Lyn Hsiong
Member
 
Location: China

Join Date: Sep 2011
Posts: 14
Default

Quote:
Originally Posted by chrisbioinfo View Post
Hi, I think you could try changing the engine with "-engine abblast". I don't know why, but it worked for me when I had a similar problem.
lyn
Old 11-30-2012, 02:37 AM   #5
tando
Junior Member
 
Location: Japan

Join Date: Nov 2012
Posts: 2
Default

I also got the same problem using the NCBI rpsblast engine.
After the several fixes below, RepeatModeler started to run, though I don't know whether other problems remain, and it is still possible that my FASTA input was incorrect. At least, randomly selected genomic DNA sequences were generated for the statistical calculation of repeats.
Anyway, the main problem was the call to "blastdbcmd" from the RepeatModeler Perl script.

(The line numbers below might be inaccurate because I modified the file.)

Line 281: Modification
`$RepModelConfig::NCBIDBCMD_PRGM -db $genomeDB -entry all -outfmt "%g %l"`
( "%t %l" -> "%g %l" )
#In my environment, the outfmt %t output nothing, so I used %g instead.

Line 1779: Modification
my $openCoordStart = $start
( $start - 1 -> $start )
#In my database, $start was often zero (0), but the blastdbcmd program doesn't accept zero in the -range option. So I deleted "- 1" in the script.

Line 1780: Insertion
$seq = `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $seqID -range $openCoordStart-$openCoordEnd`;
#It seems that the program does not accept the input unless the sequences in our rmblast database are registered with gi| tags. So I ignored the " if ( $seqID =~ /gi\|(\d+)/ ) { ..." statement and inserted another input line.

Line 1783: Modification
`$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $seqID -range $openCoordStart-$openCoordEnd`;
( -range $openCoordEnd-$openCoordStart -> -range $openCoordStart-$openCoordEnd )
#The correct format of the coordinate values for the "-range" option of blastdbcmd is "Start"-"End". However, the order was reversed in the script.
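
The two -range constraints described above (start must not be zero, and the order must be start-end) can be checked before blastdbcmd is ever called. The following is only a minimal sketch: range_arg and blastdbcmd_args are hypothetical helper names, not functions from the RepeatModeler script, and nothing is executed; the code just assembles the argument list.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RepeatModeler): build the value for
# blastdbcmd's -range option, which expects 1-based "start-end" coordinates.
sub range_arg {
    my ( $start, $end ) = @_;
    die "range start must be >= 1 (got $start)\n" if $start < 1;
    die "range end ($end) is before start ($start)\n" if $end < $start;
    return "$start-$end";
}

# Hypothetical helper: assemble the argument list without running anything.
sub blastdbcmd_args {
    my ( $db, $entry, $start, $end ) = @_;
    return (
        'blastdbcmd', '-db', $db, '-entry', $entry,
        '-range', range_arg( $start, $end )
    );
}

# The database name and entry here are placeholders for illustration.
print join( ' ', blastdbcmd_args( 'PM', 'gi|1', 1, 46138 ) ), "\n";
```

Failing early with a clear message is easier to debug than an empty $seq coming back from the backticks.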
Old 11-30-2012, 03:07 PM   #6
themwg
Junior Member
 
Location: Madison, WI

Join Date: Jan 2011
Posts: 6
Default

I think you might be my new hero Tando, thanks!

I will point out that my version of the script (1.0.5) is slightly different.
For me these changes got the program to work:

line 281: change ( %t --> %g )

line 1775: remove -1; ($start - 1 --> $start)

line 1776: remove the if condition
It seems that when I use BuildDatabase the seqID takes the form gi|1:3333 (as opposed to gi|1). I just removed the statement, so my $seqID is a full gi|1:3333 and not just a number. If this becomes a problem then I should just redefine $seqID.

line 1778: my script was
$seq = `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $1 -range $openCoordStart-$openCoordEnd`;
I changed ( -entry $1 --> -entry $seqID ).
$1 is defined as the seqID earlier in the script, but that value doesn't get passed to the sampleFromDB() subroutine. Rather, it uses some other definition of $1, and it ended up using "1.0.5" (the script version number) as the entry number. My Perl skills are pretty weak and I couldn't determine exactly what was happening here, but your version makes more sense and seems to work.

Thanks again, I for one, appreciate it!
Old 11-30-2012, 07:23 PM   #7
tando
Junior Member
 
Location: Japan

Join Date: Nov 2012
Posts: 2
Default

$1, $2, $3 ... are special variables that receive the 1st, 2nd, 3rd ... captured groups of a regular expression match, respectively.

The script assumes that $1 receives the sequence ID when the regular expression match at line 1780 ( if ( $seqID =~ /gi\|(\d+)/...) is performed.

However, if nothing matches on this line (there is no gi| tag), $1 (and $seq in the following if statement) is not updated, and unfortunately $1 still holds the text captured by an earlier match, the script version "1-0-5".

This causes the script to abort at the following "die if ($seq eq "") ..." line and print the "1-0-5" message.
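
The stale $1 behaviour can be reproduced in a few lines. This is a minimal sketch with made-up values ($version and $seqID are stand-ins, not the variables RepeatModeler actually uses): a failed match leaves $1 holding the capture from the last successful match, so $1 should only be read when the match itself succeeded.

```perl
use strict;
use warnings;

# Stand-in values for illustration only.
my $version = "open-1.0.5";
my $seqID   = "seq42";        # a sequence ID without a gi| tag

# An earlier, successful match sets $1 to "1.0.5".
$version =~ /open-(\d+\.\d+\.\d+)/ or die "unexpected version string\n";

# This match fails, so $1 is NOT reset; it still holds "1.0.5".
$seqID =~ /gi\|(\d+)/;
my $entry_unsafe = $1;

# Safer: use $1 only if the match succeeded, else fall back to $seqID.
my $entry_safe = ( $seqID =~ /gi\|(\d+)/ ) ? $1 : $seqID;

print "unsafe entry: $entry_unsafe\n";    # prints "unsafe entry: 1.0.5"
print "safe entry:   $entry_safe\n";      # prints "safe entry:   seq42"
```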
Old 12-14-2012, 06:13 AM   #8
jaZt
Junior Member
 
Location: germany

Join Date: Oct 2010
Posts: 2
Default

Hi guys,

I got the same problem with RepeatModeler_1.0.5

I changed the script like you proposed, except the Lines 1776 and 1778:
if ( $seqID =~ /gi\|(\d+)/ ) {
$seq =
`$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $1 -range $openCoordStart-$openCoordEnd`;
}


Tando, you said that you ignored the first line and inserted another one.

What exactly should these lines look like, then?

I would be very grateful for some help.
Thanks in advance!
Old 01-29-2013, 03:02 AM   #9
HeyIamNuria
Member
 
Location: Barcelona

Join Date: Dec 2012
Posts: 19
Default

Hi guys!
I am trying to install RepeatModeler, but when I give it the RepeatMasker path it returns:
“RepeatMasker is too old. Must be open-4.0.0 or later. Install a newer version of RepeatMasker and re-run configure.”

So I re-installed the latest version of RepeatMasker (Latest Released Version: 1/10/2013: RepeatMasker-open-4-0-0.tar.gz) and tried again with RepeatModeler, but it keeps saying the same thing, even though that is the version it is asking for.

It may be because of the name of the file. My directory doesn't have the version number (open-4.0.0) in its name; when unpacked, it is called just RepeatMasker. But that may not be the cause.

Any ideas?
Thanks in advance

Nuria
Old 01-31-2013, 08:51 AM   #10
stephrom
Junior Member
 
Location: Belgium

Join Date: Jan 2013
Posts: 1
Default

Quote:
Originally Posted by HeyIamNuria View Post
Hi Nuria,

In the configure script, modify line 214:
'$version <= 400' should be '$version < 400'
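
Why the one-character change matters can be shown with a small sketch. How the configure script actually derives $version is an assumption here: the hypothetical version_number below simply turns "open-4-0-0" or "open-4.0.0" into 400 by combining the three digits.

```perl
use strict;
use warnings;

# Hypothetical conversion (an assumption, not the configure script's code):
# "open-4-0-0" or "open-4.0.0" becomes 400.
sub version_number {
    my ($string) = @_;
    my ( $major, $minor, $patch ) = $string =~ /(\d+)[.-](\d+)[.-](\d+)/
      or die "cannot parse version '$string'\n";
    return $major * 100 + $minor * 10 + $patch;
}

# The original test rejects 4.0.0 itself; the corrected one accepts it.
sub too_old_buggy { return version_number( $_[0] ) <= 400 }
sub too_old_fixed { return version_number( $_[0] ) < 400 }

for my $v ( 'open-3-3-0', 'open-4-0-0' ) {
    printf "%s: buggy=%s fixed=%s\n", $v,
      ( too_old_buggy($v) ? 'too old' : 'ok' ),
      ( too_old_fixed($v) ? 'too old' : 'ok' );
}
```

With '<=', version 4.0.0 is reported as too old even though the message asks for "open-4.0.0 or later"; with '<', only versions below 4.0.0 are rejected.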

Stephane
Old 02-01-2013, 02:05 AM   #11
HeyIamNuria
Member
 
Location: Barcelona

Join Date: Dec 2012
Posts: 19
Default Thank you

Thank you very much for your help Stephane

I changed it and it worked!!

Nuria
Old 04-15-2013, 09:08 AM   #12
antben
Junior Member
 
Location: Chicago

Join Date: Mar 2011
Posts: 3
Default RepeatScout fails in RepeatModeler

Hello all,

I am able to successfully run RepeatModeler (1-0-7) and it returns several hundred repeat models in my genome. However, all of these models are a result of RECON; nothing is returned by RepeatScout. RepeatScout is called during RepeatModeler round 1 but at the end it says "NOTE: RepeatScout did not return any models." RepeatScout is not called again by RepeatModeler. However, when I run RepeatScout directly on my genome it returns several hundred repeat models.

Has anybody successfully gotten RepeatScout to return repeat models within RepeatModeler? I don't understand why this would happen since RepeatScout works when I run it outside of RepeatModeler.

Any ideas?

Thanks,
Ben
Old 05-16-2013, 04:56 PM   #13
abaten
Junior Member
 
Location: Australia

Join Date: Sep 2009
Posts: 2
Default

Hi Everyone,
I have exactly the same problem as Ben described above: no models are returned by 'RepeatScout' within a 'RepeatModeler' run, but an independent 'RepeatScout' run returns many repeat models.
Also, is there any option to make 'RepeatModeler' run faster (e.g. parallel processing like that of RepeatMasker)?

Cheers.
Old 06-12-2013, 10:03 AM   #14
rhubley
Member
 
Location: Washington

Join Date: Sep 2012
Posts: 10
Default

Hi Ben,

RepeatScout is a great program for finding highly conserved repetitive elements. As a consequence we run RepeatScout first ( and only one round ) in order to find and remove the young elements before moving on to RECON. RepeatScout will often find tandem repeats and low complexity sequences in its return set. These are filtered out in RepeatModeler. You may want to check that your hand-run result set isn't completely simple/low complexity by running nseg/trf on it. Another consideration is your choice of lmer size for RepeatScout. To fairly compare the results from both programs you need to use the same lmer size and the same sample ( from the input ) sequence. I rarely check seqanswers so please feel free to contact us through our website if you have further questions ( www.repeatmasker.org ).

-R
Old 06-18-2013, 06:29 PM   #15
antben
Junior Member
 
Location: Chicago

Join Date: Mar 2011
Posts: 3
Default

Thank you for your input, Robert. My problem turned out to be with RepeatScout, not RepeatModeler. Lines 26 and 27 of the RepeatScout script "filter-stage-1.prl" are:

my $TRF_COMMAND = $ENV{'TRF_COMMAND'} || "trf";
my $NSEG_COMMAND = $ENV{'NSEG_COMMAND'} || "nseg";

I changed this to:

my $TRF_COMMAND = "trf";
my $NSEG_COMMAND = "nseg";

Note that both "trf" and "nseg" are executables in my path.

I don't know Perl so I don't fully understand what is going on, but I think that RepeatScout was failing to find Tandem Repeats Finder (TRF) and, getting nothing back from TRF, it decided that everything was a tandem repeat and filtered it all out. However, this must have something to do with calling TRF from within RepeatModeler, as RepeatScout returned models for me when I used it independently, so something funny appears to be happening with paths. Regardless, the RepeatModeler pipeline is now fully functional for me and recovers repeat models from RepeatScout as well as RECON.
Old 06-19-2013, 11:11 AM   #16
rhubley
Member
 
Location: Washington

Join Date: Sep 2012
Posts: 10
Default

Hi Ben,

That's an interesting find. Alke's filter-stage-1.prl script should be better at reporting when it cannot find a dependent program. I added the following code to the script:

use File::Which;
unless ( which( $TRF_COMMAND ) && which( $NSEG_COMMAND ) )
{
die "ERROR: RepeatScout script filter-stage-1.prl cannot find 'trf' or 'nseg' programs in the user's path!\n";
}

This should at least produce an error message indicating that something went wrong. I don't see why you would need to get rid of the "$ENV{'TRF_COMMAND'} ||" portion, as that is simply a conditional expression which allows you to set environment variables that point to the programs' locations. Perhaps you have these environment variables set, and set incorrectly? Perhaps when you ran the program you hadn't updated the shell's path in memory ( using the rehash command )? In any case, I am glad you got it working. I will push this change to filter-stage-1.prl out in the next release of RepeatScout.
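
To see which command the "$ENV{...} ||" fallback ends up choosing, and whether that command can actually be found, something like the following can be run in the same shell that launches RepeatModeler. It is only a sketch using core modules (File::Spec rather than File::Which); resolve_command and find_in_path are hypothetical helper names, not part of RepeatScout.

```perl
use strict;
use warnings;
use File::Spec;

# Same rule as in filter-stage-1.prl: a non-empty environment variable
# wins, otherwise the bare program name is used.
sub resolve_command {
    my ( $env_name, $default ) = @_;
    return $ENV{$env_name} || $default;
}

# Hypothetical helper: return the full path of an executable, or undef.
sub find_in_path {
    my ($cmd) = @_;

    # A name that already contains a slash is checked directly.
    return ( -f $cmd && -x _ ) ? $cmd : undef if $cmd =~ m{/};

    for my $dir ( File::Spec->path() ) {
        my $candidate = File::Spec->catfile( $dir, $cmd );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

for my $spec ( [ 'TRF_COMMAND', 'trf' ], [ 'NSEG_COMMAND', 'nseg' ] ) {
    my $cmd  = resolve_command(@$spec);
    my $path = find_in_path($cmd);
    print defined $path
      ? "$spec->[0]: using '$cmd' at $path\n"
      : "$spec->[0]: WARNING, cannot find '$cmd'\n";
}
```

If TRF_COMMAND or NSEG_COMMAND is set to a stale or mistyped path, the warning branch fires even though "trf" and "nseg" are in the path, which would be consistent with the behaviour described above.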