  • RepeatMasker & RepeatScout

    Hello there,

    I was wondering whether anybody on this list knows how to run RepeatScout (1.0.5) and RepeatMasker (3.2.8).

    Basically I have a new genome, and want to use RepeatScout to make a
    library for RepeatMasker.

    Here is what I do:

    build_lmer_table -sequence genome.fa -freq genome.fq
    RepeatScout -sequence genome.fa -output repeats.fa -freq genome.fq
    filter-stage-1.prl repeats.fa &> repeats.fa.filter_1
    RepeatMasker genome.fa -e abblast -lib repeats.fa.filter_1

    Now:
    Am I using the correct file for -lib?
    RepeatMasker still complains that it cannot find Libraries/RepeatMasker.lib
    and Libraries/RepeatMaskerLib.embl.

    Thanks a lot in advance for any help.

  • #2
    Here is a recipe for installing and running RepeatScout:



    Hope it helps,

    Darek


    • #3
      I actually followed those instructions.

      RepeatMasker is still complaining about missing libraries (i.e. Libraries/RepeatMasker.lib etc.) and advises getting something from www.girinst.org.

      The whole point of running RepeatScout, for me, is to build my own library. Is there a flag that tells RepeatMasker not to look for those libraries, or is there a reason RepeatMasker must have them?


      • #4
        Hi Zimbobo,

        Can I ask how you edited the Perl script filter-stage-1.prl so that it points to the TRF path we installed?
        Which line of filter-stage-1.prl do we need to edit to set the TRF path?
        My server keeps showing the message below:
        "No such file or directory at ./filter-stage-1.prl line 110"
        Thanks a lot for sharing and for the guidance.


        • #5
          RepeatScout

          Hello everyone,
          I am trying to work with RepeatScout, but every time I run filter-stage-1.prl the filtered library it generates is empty (no data). Any solution?


          • #6
            The same thing happened to me: the filtered output file is empty after running for a very long time. It was run on a repeat-rich genome.

            Could it be that I don't have nseg and TRF properly installed? I can't see any output about those two programs.
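
For the empty-library reports in #5 and #6, two quick checks may help narrow things down: whether trf and nseg can be found on PATH at all, and how many sequences a FASTA file actually contains. The sketch below is a small helper, not part of RepeatScout. The function names missing_tools and count_fasta_records are made up for illustration, and it assumes filter-stage-1.prl calls the two programs by their plain names.

```shell
# Print the name of every listed program that is NOT found on PATH.
# (Hypothetical helper; prints nothing when all programs are found.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Count FASTA records (header lines starting with '>') in a file.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_fasta_records() {
    grep -c '^>' "$1" || true
}
```

For example, `missing_tools trf nseg` prints nothing when both are found, and `count_fasta_records repeats.fa` shows whether RepeatScout itself produced any sequences before filtering. A count of 0 for the unfiltered file would point at the RepeatScout step rather than at the filter.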


            • #7
              I don't know if it's still a current problem, but I had it too and was able to solve it on my system (Ubuntu 11, 64-bit).
              The libs RepeatMasker is looking for are not the downloaded ones, but the BLAST databases that should have been created by rmblast. rmblast itself looks for a libpcre.so.0 file, which it could not find on my system. This file is known to cause problems with some programs because the symlinks are not always created correctly during updates.
              Therefore I just created symlinks manually in my /lib/ and /lib32/ folders to the actual file (so just type "sudo ln -s /lib/libpcre.so.3 /lib/libpcre.so.0" and "sudo ln -s /lib32/libpcre.so.3 /lib32/libpcre.so.0"), and afterwards everything worked fine for me.

              @edge: you don't need to change anything in the .prl file, but you need to rename the trf404-linux64 (or whichever version you have) executable to simply trf.
              Last edited by WhatsOEver; 04-20-2012, 01:27 AM.
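
As an alternative to renaming the downloaded binary, a symlink named trf can point to it. This is a minimal sketch, assuming filter-stage-1.prl finds TRF by looking up the name trf on PATH, which is what the rename advice above suggests. The function name link_as_trf and the bin directory are illustrative and not part of any of the packages.

```shell
# Expose an existing executable under the name "trf" inside a bin directory
# by creating a symlink, leaving the original file untouched.
link_as_trf() {
    src=$1      # path to the downloaded TRF binary, e.g. ./trf404-linux64
    bindir=$2   # directory that is, or will be put, on PATH
    [ -x "$src" ] || { echo "not executable: $src" >&2; return 1; }
    mkdir -p "$bindir" || return 1
    # Resolve to an absolute path so the link works from any directory.
    abs_src=$(cd "$(dirname "$src")" && pwd)/$(basename "$src")
    ln -sf "$abs_src" "$bindir/trf"
}
```

Something like `link_as_trf ./trf404-linux64 "$HOME/bin"` followed by `export PATH="$HOME/bin:$PATH"` would then make `trf` callable without touching the original file name.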


              • #8
                Hello,

                This is not a direct answer to your question, but there is a tool from the RepeatMasker group.
                It's called RepeatModeler; it integrates RepeatScout, RECON and TRF.
                It creates a de novo repeat library and then annotates the sequences.
                RepeatModeler

                --
                pg


                • #9
                  That's true and it works fine, but RepeatModeler also uses RepeatMasker and ultimately the rmblast package, so you might face the same problems as described before.


                  • #10
                    Dear All,
                    I ran RepeatScout successfully. The commands I used:
                    Code:
                    ## RepeatScout run
                    # step 1: build the l-mer frequency table
                    build_lmer_table -l 14 -sequence Final_assembly.fasta -freq Final_assembly.freq
                    # step 2: find repeat families
                    RepeatScout -sequence Final_assembly.fasta -output Final_assembly_repeats.fasta -freq Final_assembly.freq -l 14
                    # step 3: first filter stage
                    cat Final_assembly_repeats.fasta | filter-stage-1.prl > Final_assembly_repeats_filtered_stg1.fasta
                    # step 4: mask with the stage-1 library (let this finish: step 5 reads its Final_assembly.fasta.out)
                    RepeatMasker -pa 20 -s -lib Final_assembly_repeats_filtered_stg1.fasta Final_assembly.fasta &
                    # step 5: second filter stage, using the RepeatMasker output from step 4
                    cat Final_assembly_repeats_filtered_stg1.fasta | filter-stage-2.prl --cat=Final_assembly.fasta.out --thresh=3 > Final_assembly_repeats_filtered_stg2_thresh3.fasta
                    # step 6: mask again with the stage-2 library
                    RepeatMasker -pa 20 -s -lib Final_assembly_repeats_filtered_stg2_thresh3.fasta Final_assembly.fasta &
                    Rahul Sharma,
                    Ph.D
                    Frankfurt am Main, Germany
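
The six steps in #10 can also be wrapped in one function that runs them strictly in order and stops at the first failure. The sketch below only reuses the program names and flags from that post. The function name repeatscout_pipeline and the way output names are derived from the genome file name are assumptions for illustration. It runs RepeatMasker in the foreground instead of with `&`, so that the .out file exists before filter-stage-2.prl reads it.

```shell
# Run the six steps from post #10, one after another, for one genome.
# Program names and flags are copied from that post; nothing is added.
repeatscout_pipeline() {
    genome=$1          # e.g. Final_assembly.fasta
    lmer=${2:-14}      # l-mer length, 14 in the post
    threads=${3:-20}   # RepeatMasker -pa value, 20 in the post
    base=${genome%.*}  # Final_assembly.fasta -> Final_assembly

    build_lmer_table -l "$lmer" -sequence "$genome" -freq "$base.freq" || return 1
    RepeatScout -sequence "$genome" -output "${base}_repeats.fasta" \
        -freq "$base.freq" -l "$lmer" || return 1
    filter-stage-1.prl < "${base}_repeats.fasta" \
        > "${base}_repeats_filtered_stg1.fasta" || return 1
    # Foreground run: the next step needs the .out file this one writes.
    RepeatMasker -pa "$threads" -s \
        -lib "${base}_repeats_filtered_stg1.fasta" "$genome" || return 1
    filter-stage-2.prl --cat="$genome.out" --thresh=3 \
        < "${base}_repeats_filtered_stg1.fasta" \
        > "${base}_repeats_filtered_stg2_thresh3.fasta" || return 1
    RepeatMasker -pa "$threads" -s \
        -lib "${base}_repeats_filtered_stg2_thresh3.fasta" "$genome"
}
```

Calling `repeatscout_pipeline Final_assembly.fasta 14 20` should reproduce the file names used in the post, provided all five programs are on PATH.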


                    • #11
                      Hi Rahul,

                      How large was your genome? How much memory was needed for your run? I received this error message at the start of Step 2:

                      "Could not allocate space for sequence"
                      Last edited by tnguyen; 09-22-2012, 07:01 AM.


                      • #12
                        Sorry, the full error message was:

                        "Could not allocate space for sequence"
                        Last edited by tnguyen; 09-22-2012, 07:02 AM.


                        • #13
                          Hi tnguyen,
                          Sorry for the late reply. One genome was ~20 Mb and the other was in the Gb range. I ran it on the cluster and didn't check how much memory it used.
                          Best wishes,
                          Rahul
                          Rahul Sharma,
                          Ph.D
                          Frankfurt am Main, Germany


                          • #14
                            Thank you Rahul,
                            My genome is ~1.7 Gb. Any idea how to make RepeatScout work for a large genome?
                            TN


                            • #15
                              You probably don't need to use the whole genome for RepeatScout. Just use a few chromosomes or supercontigs. If repeats are distributed across all the chromosomes in the genome, scanning just a few of them with RepeatScout should be enough to find them and create consensus sequences that you can give to RepeatMasker. Then mask the whole genome with RepeatMasker.
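
One way to pull out "a few chromosomes or supercontigs" is to take the longest records of the FASTA file. This is only a sketch: the function names fasta_lengths and fasta_longest are made up, and picking the longest sequences is an assumption about what makes a useful subset, not something RepeatScout requires. It treats the first word after ">" as the sequence name.

```shell
# Print "name<TAB>length" for every record in a FASTA file.
fasta_lengths() {
    awk '
        /^>/ { if (name != "") print name "\t" len
               name = substr($1, 2); len = 0; next }
             { len += length($0) }
        END  { if (name != "") print name "\t" len }
    ' "$1"
}

# Write the n longest records of a FASTA file to stdout,
# keeping them in the order in which they appear in the file.
fasta_longest() {
    fasta=$1
    n=$2
    ids=$(fasta_lengths "$fasta" | sort -t "$(printf '\t')" -k2,2nr |
          head -n "$n" | cut -f1)
    printf '%s\n' "$ids" | awk '
        NR == FNR { keep[$1] = 1; next }   # first input (stdin): wanted names
        /^>/      { show = (substr($1, 2) in keep) }
        show                               # print lines of wanted records
    ' - "$fasta"
}
```

For example, `fasta_longest genome.fa 5 > subset.fa` writes the five longest sequences to subset.fa, which could then be given to build_lmer_table and RepeatScout in place of the full genome. Summing the second column of `fasta_lengths subset.fa` gives the subset size in bases.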
