  • RepeatExplorer

    Hi everyone, I'm new to this forum!
    I'd love to have some guidelines from you!

    I've started using RepeatExplorer to find highly repeated sequences in Illumina MiSeq data. We don't have reference genomes, because we are working on parasites whose genomes have not been sequenced yet (only the mitochondrial genome has).
    We need the repeats to develop a diagnostic kit, so we don't really care about assembling the sequence.

    After running for hours and hours, RepeatExplorer outputs clusters and graphs (so many graphs), and I'm a bit confused.
    Has anyone had the same experience?

    Thank you very much!

  • #2
    Hello Marina_P,

    I just started using RepeatExplorer as well. Yes, there are many graphs. I actually came to the forum to see if there was any discussion about the program that would help me understand some of the data details. Sadly, I found no threads other than yours.

    How is it going? Are you able to identify most of your clusters/graphs as elements? And how do you deal with clusters without identification?

    I have just started with RepeatExplorer, so maybe with two of us on the forum we can bounce things off each other?

    Cheers


    • #3
      Hey htetre,

      The post was idle for about a month, so, as you can see, I didn't check it for almost 2 months!

      So? Did you figure everything out?
      I picked the clusters that looked more homogeneous to me, but not the first good ones, because when I BLASTed them, the sequences included were mitochondrial ones, something I want to avoid.
      What are you looking for?

      Best,
      Marina


      • #4
        RepeatExplorer workshop

        Hi, you might be interested in a practical course on using RepeatExplorer and interpreting its results:


        Jiri


        • #5
          Thanks for your response, jimacas!
          I came across this announcement as well.
          Unfortunately, I'm in the US now, so I don't think I will be able to make it. It seems like a great opportunity to figure out what you're looking for or to interpret your results, though.
          Thank you again!

          Have a nice day !
          Marina


          • #6
            Hi Marina,

            It is a pity you cannot make it to the course. You can at least have a look at some of our presentations from the previous workshop; I made them available here: http://w3lamc.umbr.cas.cz/repeatexplorer/?page_id=125

            Best, Jiri


            • #7
              This is extremely helpful, Jiri, thank you so much for that!

              I hope I can be as helpful to you in the future.

              :-) I'll go through your work and will come back if I have any questions!

              Thanks a million!

              All the best,
              Marina


              • #8
                I am posting a solution to a different problem here for potential Google searchers, since it took me some time to track down.

                I was using the command-line RepeatExplorer with the --sq_rename parameter
                when the following error occurred:
                Code:
                Calculating graph layouts
                2014-03-21 09:56
                
                 reading .cls file
                original cluster CL 1 was above threshold!, sample of graph is used
                original cluster CL 2 was above threshold!, sample of graph is used
                Error in { : task 1 failed - "line 1 did not have 3 elements"
                Calls: %dopar% -> <Anonymous>
                Execution halted
                exit status:1
                This error is ultimately caused by a '#' character in the read names, as is found in some Illumina reads, e.g. >XXX2XX4ACXX:1:1101:1441:2408#CAAGGAGCA/1

                More specifically, in
                repeatexplorer/umbr_programs/seqclust/programs/clusters2graphs.R
                the command
                gd=read.table(file=ncolfile,sep='\t',header=F,as.is=T,col.names=c(1,2,'weight'))
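                fails if there are '#' characters in the read names, because read.table treats '#' as the start of a comment by default (comment.char = "#") and ignores the rest of the line.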

                My solution was just to remove '#' from the read names.
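
For anyone who wants to script that fix, here is a minimal sketch. It assumes the reads are in FASTA format and that dropping '#' does not make two read names identical. The function name and the file names in the usage comment are placeholders, not part of RepeatExplorer.

```shell
# Strip every '#' from FASTA header lines only; sequence lines pass through unchanged.
# Hypothetical helper: reads FASTA on stdin, writes the cleaned FASTA to stdout.
strip_hash_from_headers() {
    sed '/^>/ s/#//g'
}

# Usage (placeholder file names):
#   strip_hash_from_headers < reads.fasta > reads.nohash.fasta
```

Restricting the substitution to lines starting with '>' keeps the sequence data untouched, which matters if the same approach is later adapted to formats where '#' can appear elsewhere.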


                • #9
                  Dear dsenalik,

                  What were you trying to do with the command line?

                  Something with the graphs and the repeat layouts?

                  Thanks for that. I'm sure a lot of people have run into the same struggle.

                  :-)

                  M.


                  • #10
                    Dear Marina_P,
                    I have about 30 genotypes I want to analyze, and it is easier to run them on my own server than on the Galaxy server, which I also don't want to overload! Well, easier only once I had everything installed properly; there were a number of dependencies to install or configure.
                    It might help someone else, so here are my installation notes.

                    My plan is to see if all genotypes have a particular repeat cluster of interest.
                    To do this, I have put sequences from that cluster from an initial analysis into a custom RepeatMasker database, and I hope to see if a corresponding cluster shows up annotated in the other genotypes. It will take some time to run all of these...


                    • #11
                      Dear all,

                      I've been working with RepeatExplorer for the last month and I would be interested in getting a FASTA file with all the singlet reads. It reports the number of singlet reads that aren't in any cluster, but I don't know how to get these reads (or even whether it's possible). Does anyone know if that is possible, or how I can get them?

                      Thanks in advance!


                        • #13
                          The file that will list all reads in all clusters is
                          Code:
                          MyREoutputdir/seqClust/clustering/hitsort_PID90_LCOV55.cls
                          This file lists all reads in all clusters, even those too small for the summary HTML output. The numbers of clusters and of reads will match those in the summary graph at the top of the HTML output.
                          The format is a FASTA-style header line with the cluster number and the number of reads, followed by a single long line listing all reads in that cluster,
                          e.g.
                          Code:
                          ...
                          >CL13980 3
                          I01405774f I01340829r I01263003f
                          >CL13981 3
                          I01149129r I01499415r I01202179f
                          ...
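
Because each cluster is a header line followed by one line of read IDs, the file is easy to summarize with awk. The following is a rough sketch that assumes exactly the two-line layout shown above. The function name is made up for illustration, and clusters with very many reads produce very long lines, which awk implementations may handle differently.

```shell
# Print "cluster_id <TAB> declared_count <TAB> ids_found" for every cluster in a .cls file.
# $1 = path to the .cls file (hypothetical helper, not part of RepeatExplorer).
cls_cluster_sizes() {
    awk '
        /^>/ { id = substr($1, 2); declared = $2; next }   # header line, e.g. >CL13980 3
        NF   { printf "%s\t%s\t%d\n", id, declared, NF }    # following line: the read IDs
    ' "$1"
}

# Usage (path as in the post above):
#   cls_cluster_sizes MyREoutputdir/seqClust/clustering/hitsort_PID90_LCOV55.cls | head
```

Comparing the second and third columns is a quick sanity check that the file was parsed the way the format description suggests.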
                          Now, doing what you want would take some programming or clever shell scripting: exclude any read whose ID is in this file, and what is left are the unclustered reads.

                          One way that might work:

                          1. Make a file with the list of IDs to exclude
                          Code:
                          grep -v ">" MyREoutputdir/seqClust/clustering/hitsort_PID90_LCOV55.cls | tr " " "\n" > Myexcludelist.txt
                          2. Find the renamed input sequences, which are in FASTA format in
                          Code:
                          MyREoutputdir/seqClust/sequences/seqClust
                          3. You could then use biopieces to exclude these reads
                          Code:
                          read_fasta -i MyREoutputdir/seqClust/sequences/seqClust | grab -i -E Myexcludelist.txt | write_fasta -xo Mysinglecopy.fasta
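
If biopieces is not installed, the same filtering can be sketched with plain awk. This is only an illustration under two assumptions: the exclude list has one read ID per line, and each ID matches the first word after ">" in the renamed FASTA file. The function name is hypothetical.

```shell
# Keep only FASTA records whose ID (first word after ">") is NOT in the exclude list.
# $1 = exclude list (one read ID per line), $2 = FASTA file; result goes to stdout.
fasta_exclude_ids() {
    awk -v list="$1" '
        BEGIN {
            # Load the IDs to drop; blank lines in the list are ignored.
            while ((getline id < list) > 0) if (id != "") skip[id] = 1
            close(list)
        }
        /^>/ { keep = !(substr($1, 2) in skip) }   # each header decides its whole record
        keep                                       # print header and sequence lines if kept
    ' "$2"
}

# Usage (file names as in the steps above):
#   fasta_exclude_ids Myexcludelist.txt MyREoutputdir/seqClust/sequences/seqClust > Mysinglecopy.fasta
```

Loading the list in a BEGIN block rather than passing two input files avoids a surprise when the exclude list happens to be empty.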


                          • #14
                            Thanks very much dsenalik!

                            You have solved my problem!


                            • #15
                              Telomeres not clustered by RepeatExplorer

                              (I am posting this here for lack of a better place, just for information.)

                              I discovered that I had reads that consisted entirely of the classic Arabidopsis telomere repeat, i.e.
                              AGGGTTT
                              But despite adequate abundance, none of these reads show up in any clusters. However, a smaller number of reads that are two thirds this motif did get clustered.
                              It is probably some aspect of the clustering process that can't handle a 7-nucleotide repeat motif.
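
To check how many such reads a data set contains, something like the sketch below could be used. It is not how RepeatExplorer works internally. It simply counts reads with several tandem copies of the motif on either strand, and it assumes each read's sequence sits on a single line of the FASTA file. The function name and the default of three copies are arbitrary choices for illustration.

```shell
# Count reads containing at least n tandem copies of AGGGTTT or of its
# reverse complement AAACCCT. $1 = FASTA file, $2 = n (defaults to 3).
count_telomere_reads() {
    n=${2:-3}
    grep -v '^>' "$1" | grep -cE "(AGGGTTT){${n},}|(AAACCCT){${n},}"
}

# Usage (placeholder file name):
#   count_telomere_reads reads.fasta 5
```

Because the pattern is unanchored, a tandem array is found whatever phase of the repeat the read starts in, as long as the read holds n full copies.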
