  • How to construct a combined library for RepeatMasker

    Thanks for your attention.
    I am constructing a repeat library for a genome of ~970 Mb.
    First, I used RepeatModeler to generate a de novo repeat consensus library (libA.fas).
    In parallel, I used ltr_struc and ltr_finder to generate an LTR sequence library (libB.fas).
    Then I concatenated libA.fas, libB.fas, the RepBase library and another library from MIPS into one file (LIB.fas).
    But I got a weird result.
    When I used LIB.fas as the input for the "-lib" option of RepeatMasker, 24.45 % of the genome was masked.
    When I used libA.fas (the RepeatModeler output) as the input library instead, 47.78 % was masked.

    Can anyone tell me why the smaller library masks a larger fraction of the genome?
    Some parameters differ between the two runs, but I cannot tell which one could cause such a large difference.

    Thanks a lot!

    My RepeatMasker commands:
    For libA.fas:
    RepeatMasker -pa 10 genome.fa -no_is -nolow -norna -lib libA.fas

    For LIB.fas:
    RepeatMasker -lib database/LIB.fas -xsmall -no_is -nolow -pa 10 -frag 4000000 -a -gff genome.fa >Rmask_genome.out
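    When comparing runs like these, it helps to read the masked percentage directly from each run's summary. A minimal shell sketch, assuming RepeatMasker's `.tbl` summary contains a line of the form `bases masked: N bp ( P %)`; `masked_pct` and `compare_runs` are hypothetical helper names, not part of RepeatMasker:

```shell
# Sketch: extract the "bases masked" percentage from a RepeatMasker .tbl
# summary. ASSUMPTION: the summary has a line shaped like
#   bases masked:  123456 bp ( 12.34 %)
masked_pct() {
  # $1: path to a .tbl file; prints the percentage without the "%"
  awk '/bases masked:/ {
         # first number that is directly followed by an optional space and "%"
         if (match($0, /[0-9.]+ *%/)) {
           pct = substr($0, RSTART, RLENGTH)
           sub(/ *%$/, "", pct)
           print pct
           exit
         }
       }' "$1"
}

# Print "<tbl file> <percent masked>" for each .tbl given as an argument.
compare_runs() {
  for tbl in "$@"; do
    printf '%s %s\n' "$tbl" "$(masked_pct "$tbl")"
  done
}
```

    For example, after two runs that each write to their own directory via `-dir`, something like `compare_runs runA/genome.fa.tbl runB/genome.fa.tbl` (file names assumed) prints one percentage per run. The two commands above differ in several options at once (`-norna` in the first; `-xsmall`, `-frag`, `-a` and `-gff` in the second), so changing one option at a time makes the comparison easier to interpret.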

  • #2
    Well, I have now found the reason.
    I ran two more test runs that differed only in the "-frag" parameter.
    The run without "-frag 4000000" gave 45.60 % masked, close to the expected value.

    So in the future I will use the "-frag" option carefully!

    PS: I did not check the script to see why this happens, and the help document gives no reason either: "-frag" is only described as a maximum limit, "Maximum sequence length masked without fragmenting".
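    Since "-frag" is described only as the maximum sequence length masked without fragmenting, one quick check is which sequences in the assembly are longer than the value passed, i.e. which ones a run with "-frag 4000000" would split. A minimal awk-in-shell sketch; `long_seqs` is a hypothetical helper name:

```shell
# Sketch: list sequences in a FASTA file that are longer than a given limit
# (e.g. the value given to RepeatMasker's -frag option).
# Prints "<name>\t<length>" for each sequence longer than the limit.
long_seqs() {
  # $1: length limit in bp, $2: FASTA file
  awk -v limit="$1" '
    function report() { if (name != "" && len > limit + 0) printf "%s\t%d\n", name, len }
    /^>/ { report(); name = substr($1, 2); len = 0; next }   # new record: flush previous one
         { gsub(/[ \t\r]/, ""); len += length($0) }          # sequence line: add its length
    END  { report() }                                        # flush the last record
  ' "$2"
}
```

    For example, `long_seqs 4000000 genome.fa | wc -l` counts those sequences. This only shows which sequences a given "-frag" value affects; it does not by itself explain why the masked fraction dropped.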

    • #3
      But one question remains: why does setting "-frag 4000000" make no difference with a library as small as 940 KB?
      I might look into it in the future.
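      To compare the libraries from #1 and #3 on the same footing, a sketch that reports the number of entries and total bases per FASTA library, plus a check for IDs that appear more than once across concatenated inputs (duplicates can arise when libraries from several sources are merged; whether they matter to RepeatMasker is not established in this thread). `fasta_summary` and `dup_ids` are hypothetical helper names:

```shell
# Sketch: number of sequences and total bases in a FASTA library.
fasta_summary() {
  # $1: FASTA file; prints "<file> <n_sequences> <total_bp>"
  awk -v f="$1" '
    /^>/ { n++; next }                           # header line: count an entry
         { gsub(/[ \t\r]/, ""); bp += length($0) }  # sequence line: count bases
    END  { printf "%s %d %d\n", f, n, bp }
  ' "$1"
}

# Sketch: print IDs (first word of each header, without ">") that occur more
# than once across all FASTA files given as arguments.
dup_ids() {
  awk '/^>/ { print substr($1, 2) }' "$@" | sort | uniq -d
}
```

      For example, `fasta_summary libA.fas` and `fasta_summary LIB.fas` give the two sizes side by side, and `dup_ids libA.fas libB.fas` lists any shared IDs before the files are concatenated.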

      • #4
        Hi sunhh,

        I have some problems with RepeatModeler and ltr_finder. Can you explain how you constructed the library with RepeatModeler, ltr_struct and ltr_finder? ltr_finder has been running for the last 3 days, but the output file size is not increasing. Please guide me...

        Thanks...

        • #5
          In reply to amitbik (post #4):
          Hi amitbik,

          Could you describe the problems you ran into? I simply followed the instructions for RepeatModeler and ltr_finder, and they worked.
          I didn't use ltr_struct.

          There is one small problem in RepeatModeler: you need to correct the path to RECON in one of its files. Also, after I changed the -num_threads parameter of blastn from 4 to 30, the run time dropped by half.
          I cannot access my computing server right now; maybe I can post more details later.
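          sunhh mentions having to correct the RECON path in one of RepeatModeler's files, and #13 reports that fixing the RECON and RepeatScout paths solved the problem. Before editing such a path, it can help to confirm that the directory actually holds the expected executables. A minimal sketch; `check_tools_in` is a hypothetical helper, and the program names passed to it are up to you:

```shell
# Sketch: check that each named program exists as an executable file in a
# given directory. Returns non-zero if any is missing.
check_tools_in() {
  # $1: directory you intend to point the configuration at; rest: program names
  dir=$1; shift
  missing=0
  for tool in "$@"; do
    if [ -x "$dir/$tool" ] && [ ! -d "$dir/$tool" ]; then
      printf 'ok      %s/%s\n' "$dir" "$tool"
    else
      printf 'MISSING %s/%s\n' "$dir" "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

          For example, `check_tools_in /opt/RECON/bin eledef eleredef`, where both the directory and the RECON program names are assumptions to be replaced with what your installation actually contains.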

          • #6
            Thank you, sunhh, for your reply.

            Actually, I have installed RepeatModeler, but when I build the database it shows an error:

            ./BuildDatabase -name test test.fa

            RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
            BEGIN failed--compilation aborted at ./BuildDatabase line 146.

            One more thing: the RepModelConfig.pm file is empty.

            For ltr_finder, I am running this command and getting output like this:

            ltr_finder -p 30 -w -C file.fa > ltr.fa

            Output:

            Predict protein Domains 0.000 second
            >Sequence: Contig2 Len:9055
            No LTR Retrotransposons Found

            Do I have to give my assembly file directly to RepeatModeler and ltr_finder, or do I need to do some filtering first?
            Last edited by amitbik; 02-05-2014, 10:21 PM.
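            Based only on the output format shown above (`>Sequence: <name> Len:<n>` per sequence, followed by `No LTR Retrotransposons Found` when nothing is reported), a sketch that tallies how many sequences were processed and how many had no hits, which helps tell "nothing found anywhere" from "nothing found in one contig". The format is assumed from this single excerpt, and `ltr_tally` is a hypothetical name:

```shell
# Sketch: tally ltr_finder results, assuming the output format shown above.
ltr_tally() {
  # $1: ltr_finder output file; prints "<sequences> <sequences_without_hits>"
  awk '
    /^>Sequence:/                   { seqs++ }   # one header per processed sequence
    /No LTR Retrotransposons Found/ { none++ }   # sequence with no reported LTR hits
    END { printf "%d %d\n", seqs, none }
  ' "$1"
}
```

            For example, `ltr_tally ltr.fa` prints two numbers; if they are equal, no sequence in the input produced a hit.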

            • #7
              In reply to amitbik (post #6):
              Hi,
              For building the database, I think you might need to add "-engine ncbi" to the command, if you use BLAST as the alignment engine like I do.

              The "line 146" error should come from the same RepModelConfig.pm problem.
              That file should not be empty. I advise you to re-download the package and install it again.
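              The error message is consistent with the empty file: Perl's `require` fails with "did not return a true value" when a module's last evaluated statement is false, and an empty file gives exactly that (modules conventionally end with `1;`). A heuristic shell sketch that flags an empty or `1;`-less config module before running BuildDatabase; `check_pm` is a hypothetical helper and does not actually parse Perl:

```shell
# Sketch: heuristic sanity check for a Perl config module such as
# RepModelConfig.pm. Flags an empty file, or one with no line starting "1;".
check_pm() {
  # $1: path to the .pm file
  if [ ! -s "$1" ]; then
    echo "EMPTY: $1"
    return 1
  fi
  if ! grep -q '^[[:space:]]*1;' "$1"; then
    echo "WARNING: no line starting with '1;' in $1"
    return 1
  fi
  echo "looks ok: $1"
}
```

              For example, `check_pm RepModelConfig.pm`, run from the RepeatModeler install directory (location assumed).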

              • #8
                In reply to amitbik (post #6):
                And for ltr_finder, I used a command like this:
                ltr_finder -w 0 -s ref_tRNAs.fa -a /path/to/ps_scan in_genome.fa 1>in_genome.fa.ltrF 2>in_genome.fa.ltrF.err

                It looks different from yours, especially the "-w 0" parameter. I am not sure what "-C" means.

                Best

                • #9
                  In reply to sunhh (post #7):
                  Before I configured RepeatModeler, the RepModelConfig.pm file was not empty; after I configured RepeatModeler and the database, it became empty. When I start building the database, it shows the error.

                  • #10
                    In reply to amitbik (post #9):
                    Please redo the RepeatModeler configuration, and record every step this time.
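                    One way to record everything is to capture the whole configuration session in a log file. A minimal sketch, assuming a POSIX shell with `tee`; `run_logged` is a hypothetical helper. Programs that buffer their output when not writing to a terminal may show prompts late; the standard `script` utility is an alternative for fully interactive sessions:

```shell
# Sketch: run a command and append everything it prints (stdout and stderr),
# plus a timestamped header and the exit status, to a log file while still
# showing it on screen.
run_logged() {
  # $1: log file; rest: the command and its arguments
  log=$1; shift
  {
    echo "# $(date) :: $*"
    "$@" 2>&1
    echo "# exit status: $?"
  } | tee -a "$log"
}
```

                    For example, `run_logged configure.log perl ./configure`, assuming your RepeatModeler version is configured that way.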

                    • #11
                      In reply to sunhh (post #8):
                      By mistake I left the 0 out of my command, and "-C" is for deleting highly repetitive regions.
                      Can you tell me about the 3 files in your command: in_genome.fa, in_genome.fa.ltrF and in_genome.fa.ltrF.err?
                      What are these files?

                      • #12
                        In reply to amitbik (post #11):
                        Only in_genome.fa is an input file, and the rest are output files.
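                        In the command from #8, the two extra names come from shell redirection rather than from ltr_finder itself: `1>in_genome.fa.ltrF` sends the program's standard output (the results) to one file, and `2>in_genome.fa.ltrF.err` sends its standard error (typically diagnostic messages) to another. A self-contained sketch of the same pattern, with a hypothetical stand-in `fake_tool` in place of ltr_finder:

```shell
# Hypothetical stand-in for a real tool: one line to each output stream.
fake_tool() {
  echo "result line"            # standard output (fd 1)
  echo "progress message" >&2   # standard error (fd 2)
}

# Same redirection pattern as "1>in_genome.fa.ltrF 2>in_genome.fa.ltrF.err".
run_split() {
  # $1: output prefix; writes <prefix>.ltrF (stdout) and <prefix>.ltrF.err (stderr)
  fake_tool 1>"$1.ltrF" 2>"$1.ltrF.err"
}
```

                        Running `run_split in_genome.fa` leaves the results in in_genome.fa.ltrF and the messages in in_genome.fa.ltrF.err; without the `2>` part, the messages would simply appear on the terminal.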

                        • #13
                          Thanks, sunhh, for your help.

                          My RepeatModeler is working now, and I can build the database. This time I ran RepeatModeler from a different path and changed the paths to RECON, RepeatScout, etc., and now it works.

                          • #14
                            Hi sunhh,

                            I have a problem with ltr_finder. I am using this command:

                            ltr_finder -w 0 -s trna.fa -a ./ps_scan/ uni.fa > uni_ltr.txt

                            It ran for around 16 hours, and the two files uni.fa.ltrf and uni.fa.ltrf.err are empty. It also showed the message "cannot find resonable bandwith: continue anyway".

                            Can you tell me why this error appeared and why the two files are empty?

                            Thank you...
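                            Going only by the command as posted, standard output is redirected to uni_ltr.txt and standard error is not redirected, so the shell would not create uni.fa.ltrf or uni.fa.ltrf.err from this command; in #8 those names came from the `1>` and `2>` redirects. Whatever the cause, a quick check of which expected output files exist and whether they contain anything might look like this (`report_outputs` is a hypothetical helper):

```shell
# Sketch: for each file, report whether it is absent, empty, or non-empty
# (with its size in bytes).
report_outputs() {
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      printf 'absent %s\n' "$f"
    elif [ -s "$f" ]; then
      printf 'nonempty %s (%s bytes)\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
    else
      printf 'empty %s\n' "$f"
    fi
  done
}
```

                            For example, `report_outputs uni_ltr.txt uni.fa.ltrf uni.fa.ltrf.err` after the run.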

                            • #15
                              Can anyone help me find the cause of this error?
