SEQanswers

Old 10-24-2012, 08:53 AM   #1
sunhh
Member
 
Location: Ithaca, NY

Join Date: Jun 2012
Posts: 18
How to construct a combined library for RepeatMasker

Thanks for your attention.
I am constructing a repeat library for a genome of ~970 Mb.
First, I used RepeatModeler to generate a de novo repeat consensus library (libA.fas).
At the same time, I used ltr_struc and ltr_finder to generate an LTR sequence library (libB.fas).
Then I concatenated (cat) libA.fas, libB.fas, the RepBase library and another library from MIPS into one file (LIB.fas).
But I got a weird result.
When I used LIB.fas as the input for the "-lib" option of RepeatMasker, 24.45 % of the genome was masked.
When I used libA.fas (the RepeatModeler output) as the input library, 47.78 % was masked.

Can anyone tell me why the smaller library masks a larger fraction of the genome?
Some parameters differ between the two runs, but I cannot tell which one could cause such a large difference.

Thanks a lot!

My RepeatMasker commands are:
For libA.fas:
RepeatMasker -pa 10 genome.fa -no_is -nolow -norna -lib libA.fas

For LIB.fas:
RepeatMasker -lib database/LIB.fas -xsmall -no_is -nolow -pa 10 -frag 4000000 -a -gff genome.fa >Rmask_genome.out
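
The concatenation step described in this post could be sketched roughly as below. This is only an illustration, not the command actually used: the function name combine_libs is made up, and so are any file names for the RepBase and MIPS libraries. Besides joining the files, it reports how many sequences each input contributes and lists header IDs that occur more than once, which may be worth a look when libraries from several sources are merged.

```shell
# Concatenate FASTA repeat libraries into one file (hypothetical helper).
# Usage: combine_libs OUT.fas IN1.fas IN2.fas ...
combine_libs() {
    out=$1
    shift
    : > "$out"                          # start from an empty output file
    for lib in "$@"; do
        n=$(grep -c '^>' "$lib")        # number of FASTA records in this input
        echo "$lib: $n sequences" >&2
        cat "$lib" >> "$out"
        # if the input lacks a trailing newline, add one so that the next
        # header does not get glued to the last sequence line
        if [ -n "$(tail -c 1 "$lib")" ]; then
            echo >> "$out"
        fi
    done
    # print every header ID (first word of the header) that appears more than once
    grep '^>' "$out" | awk '{ print $1 }' | sort | uniq -d
}
```

With the files from this post it might be called as: combine_libs LIB.fas libA.fas libB.fas repbase.fas mips.fas (the last two names are placeholders).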
Old 10-25-2012, 05:34 AM   #2
sunhh

Well, the reason has now been found.
I did two more test runs that differ only in the "-frag" parameter.
The run without "-frag 4000000" masked 45.60 % of the genome, close to the expected value.

So in the future I will use the "-frag" option carefully!

PS: I did not check the script to see what causes this effect. I cannot find a reason in the help document, which describes "-frag" only as a maximum limit: "Maximum sequence length masked without fragmenting".
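
For runs that use -xsmall (repeats returned in lower case), the masked percentage can be cross-checked directly from the masked FASTA file. The sketch below is a minimal illustration under that assumption; the function name masked_percent is made up. It would not apply to a run without -xsmall, where repeats are replaced by N rather than lower-cased.

```shell
# Percentage of lower-case (soft-masked) bases in a FASTA file.
# Assumes the masking run used -xsmall; hypothetical helper.
masked_percent() {
    LC_ALL=C awk '
        /^>/ { next }                        # skip header lines
        {
            total  += length($0)             # count bases before modifying the line
            masked += gsub(/[a-z]/, "")      # gsub returns the number of lower-case letters
        }
        END {
            if (total > 0) printf "%.2f\n", 100 * masked / total
            else print "0.00"
        }
    ' "$1"
}
```

Example call, assuming RepeatMasker wrote its masked sequence to genome.fa.masked: masked_percent genome.fa.masked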
Old 10-25-2012, 05:50 AM   #3
sunhh

But one question remains: why does setting "-frag 4000000" make no difference with a library as small as 940 KB?
I might check this in the future.
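
One way to look into this would be a small test matrix in which every run uses identical options and only the library and the presence of "-frag" change. The sketch below only prints the commands unless RUN=1 is set. The function name run_matrix and the output directory names are invented for illustration; the RepeatMasker options are the ones already used in this thread, plus -dir to keep the outputs of the runs apart.

```shell
# Dry-run helper: prints one RepeatMasker command per library / -frag combination.
# Set RUN=1 to execute the commands instead of printing them.
# Usage: run_matrix GENOME.fa LIB1.fas [LIB2.fas ...]
run_matrix() {
    genome=$1
    shift
    for lib in "$@"; do
        for frag in default 4000000; do
            # invented naming scheme for the per-run output directory
            outdir="rm_$(basename "$lib" .fas)_frag_$frag"
            opts="-pa 10 -no_is -nolow -xsmall"
            # add -frag only for the non-default case
            if [ "$frag" != default ]; then
                opts="$opts -frag $frag"
            fi
            cmd="RepeatMasker $opts -lib $lib -dir $outdir $genome"
            if [ "${RUN:-0}" = 1 ]; then
                # relies on word splitting, so paths must not contain spaces
                mkdir -p "$outdir" && $cmd
            else
                echo "$cmd"
            fi
        done
    done
}
```

For the two libraries in this thread: run_matrix genome.fa libA.fas database/LIB.fas prints four commands, two per library.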
Old 02-04-2014, 09:59 PM   #4
amitbik
Member
 
Location: Italy

Join Date: May 2013
Posts: 50

Hi sunhh,

I am having some problems with RepeatModeler and ltr_finder. Can you explain how you constructed the libraries with RepeatModeler, ltr_struct and ltr_finder? ltr_finder has been running for the last 3 days, but the output file size is not increasing. Please guide me.

Thanks.
Old 02-05-2014, 12:43 PM   #5
sunhh

Hi amitbik,

Could you show what problems you ran into? I simply followed the instructions for RepeatModeler and ltr_finder, and they work.
I didn't use ltr_struct.

There is one small problem in RepeatModeler: you need to correct the path for RECON in one of its files. Also, after I changed the -num_threads parameter of blastn from 4 to 30, the run time dropped by half.
I cannot access my computing server right now; maybe I can post more details later.
Old 02-05-2014, 07:40 PM   #6
amitbik

Thank you, sunhh, for your reply.

I have installed RepeatModeler, but when I build the database it shows an error:

./BuildDatabase -name test test.fa

RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
BEGIN failed--compilation aborted at ./BuildDatabase line 146.

One more thing: the RepModelConfig.pm file is empty.

For ltr_finder, I am using this command and getting output like this:

ltr_finder -p 30 -w -C file.fa > ltr.fa

Output:

Predict protein Domains 0.000 second
>Sequence: Contig2 Len:9055
No LTR Retrotransposons Found


Do I have to give my assembly file directly to RepeatModeler and ltr_finder, or do I have to do some filtering first?

Last edited by amitbik; 02-05-2014 at 09:21 PM.
Old 02-05-2014, 09:28 PM   #7
sunhh

Hi,
To build the database, I think you might need to add "-engine ncbi" to the command if your alignment engine is BLAST, as mine is.

The "line 146" error should come from the same RepModelConfig.pm problem.
That file should not be empty. I advise you to download the package again and reinstall it.
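
The message "did not return a true value" is what Perl reports when a module loaded with use or require does not end in a true value, which is always the case for an empty file. A quick check along these lines might help before and after re-running the configuration. It is only a sketch: the function name check_perl_module is made up, and a real RepModelConfig.pm may depend on other modules, so a load failure here does not necessarily point to the same problem.

```shell
# Report whether a Perl module file is empty and, if perl is available,
# whether it can be loaded with require (hypothetical helper).
check_perl_module() {
    f=$1
    if [ ! -s "$f" ]; then
        echo "EMPTY: $f"
        return 1
    fi
    if ! command -v perl >/dev/null 2>&1; then
        echo "NOT CHECKED: $f is non-empty, but perl was not found"
        return 0
    fi
    # require needs an explicit path; otherwise it searches @INC
    case $f in
        /*) path=$f ;;
        *)  path=./$f ;;
    esac
    if perl -e 'require $ARGV[0]' "$path" >/dev/null 2>&1; then
        echo "OK: $f loads"
    else
        echo "FAILS TO LOAD: $f"
        return 1
    fi
}
```

Example call from the RepeatModeler directory: check_perl_module RepModelConfig.pm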
Old 02-05-2014, 09:48 PM   #8
sunhh

As for ltr_finder, I used a command like this:
ltr_finder -w 0 -s ref_tRNAs.fa -a /path/to/ps_scan in_genome.fa 1>in_genome.fa.ltrF 2>in_genome.fa.ltrF.err

It looks different from yours, especially the "-w 0" parameter. I am not sure what "-C" means.

Best
Old 02-05-2014, 11:33 PM   #9
amitbik

Before I configured RepeatModeler, the RepModelConfig.pm file was not empty. After I configured RepeatModeler and the database, the file became empty, and when I start building the database it shows the error.
Old 02-05-2014, 11:35 PM   #10
sunhh

Please redo the RepeatModeler configuration, and record everything this time.
Old 02-05-2014, 11:52 PM   #11
amitbik

I left out the 0 in my command by mistake, and "-C" is for deleting highly repetitive regions.
Your command mentions three files: in_genome.fa, in_genome.fa.ltrF and in_genome.fa.ltrF.err.
What are these files?
Old 02-06-2014, 12:26 AM   #12
sunhh

Only in_genome.fa is an input file; the other two are output files.
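
To spell out how the two output files come about: in the ltr_finder command earlier in the thread, "1>" sends standard output to in_genome.fa.ltrF and "2>" sends standard error to in_genome.fa.ltrF.err. The names are chosen by the shell redirection, not by ltr_finder itself. The sketch below shows the same pattern with a stand-in function (fake_tool and demo_redirect are made up for illustration), so it runs without ltr_finder installed.

```shell
# Stand-in for a program that writes results to stdout and messages to stderr.
fake_tool() {
    echo "result line"              # standard output (file descriptor 1)
    echo "progress message" >&2     # standard error  (file descriptor 2)
}

# Usage: demo_redirect RESULT_FILE MESSAGE_FILE
demo_redirect() {
    # 1> captures the results, 2> captures the messages
    fake_tool 1>"$1" 2>"$2"
}
```

After demo_redirect out.txt err.txt, out.txt holds the result line and err.txt holds the progress message. A plain ">" is the same as "1>", so with only "> some_file" the results go to that one file and messages still appear on the terminal.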
Old 02-06-2014, 02:32 AM   #13
amitbik

Thanks, sunhh, for your help.

RepeatModeler is working now, and I can build the database. This time I ran RepeatModeler from a different path and changed the paths for RECON, RepeatScout, etc., and it works.
Old 02-09-2014, 07:53 PM   #14
amitbik

Hi sunhh,

I have a problem with ltr_finder. I am using this command:

ltr_finder -w 0 -s trna.fa -a ./ps_scan/ uni.fa > uni_ltr.txt

It ran for around 16 hours, and the two files uni.fa.ltrf and uni.fa.ltrf.err are empty. It also showed an error: "cannot find resonable bandwith: continue anyway".

Can you tell me why this error occurred and why the two files are empty?

Thank you...
Old 02-10-2014, 07:00 PM   #15
amitbik

Can anyone help me find the cause of this error?
Old 02-11-2014, 02:27 AM   #16
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

How large is your input file? I would recommend running the command on a small subset rather than trying to get the program working on a full genome. That error is very strange and makes me think that something went wrong with your local machine or network rather than LTR_Finder.
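
A small subset like the one suggested here could be cut from the assembly with a few lines of awk. This is just a sketch; the function name fasta_head is made up, and taking the first N records is only one of several reasonable ways to pick a test set.

```shell
# Print the first N records of a FASTA file (hypothetical helper).
# Usage: fasta_head N input.fa > subset.fa
fasta_head() {
    awk -v n="$1" '
        /^>/ { seen++ }        # count header lines
        seen > n { exit }      # stop at the header of record N+1
        { print }
    ' "$2"
}
```

For example: fasta_head 1000 uni.fa > subset.fa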
Old 02-11-2014, 03:08 AM   #17
amitbik

Thanks for your response, SES.

My input file is 318 MB. I tried what you suggested: this time I took 1000 sequences and ran this command:

ltr_finder -w 0 -s trna.fa -a ./ps_scan/ t.fa > t.txt

No separate output file was created, but t.txt contains a result like this for every sequence:

Predict protein Domains 0.180 second
>Sequence: CONTIG6613 Len:1479
No LTR Retrotransposons Found
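
The contigs shown in the outputs in this thread are short (1479 bp here, 9055 bp earlier), while full-length LTR retrotransposons are typically several kilobases long. A table of contig lengths might therefore help judge whether the input contains sequences long enough to hold a complete element. This is offered as a check to try, not as a confirmed cause of the result; the function name fasta_lengths is made up.

```shell
# Print "name<TAB>length" for each FASTA record, longest first (hypothetical helper).
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush the previous record
            name = substr($1, 2)                  # first word of the header, without ">"
            len = 0
            next
        }
        { len += length($0) }                     # sequence may span several lines
        END { if (name != "") print name "\t" len }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr
}
```

For example: fasta_lengths t.fa | head lists the ten longest sequences in the test file.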

Last edited by amitbik; 02-11-2014 at 03:35 AM.
Old 02-12-2014, 07:59 PM   #18
amitbik

Can anyone tell me why I am getting a result like this? Please reply.