**htetre** · 08-14-2013, 01:36 PM

Hello Marina_P,

I just started using RepeatExplorer as well, and yes, there are many graphs. I actually came to the forum to see if there was any discussion of the program that would help me understand some of the details in the output. Sadly, I found no threads other than yours.

How is it going? Are you able to identify most of your clusters/graphs as elements? And for clusters without identification, how do you deal with them?

I have just started with RepeatExplorer, so maybe with two of us on the forum we can bounce things off each other?

Cheers

**Marina_P** · 10-06-2013, 12:02 PM

Hey htetre,

The thread was idle for about a month, so I didn't check it, as you can see, for almost 2 months!

So? Did you figure everything out?
I picked the clusters that looked most homogeneous to me, but not the first good ones, because when I BLASTed them the sequences included mitochondrial ones, something I want to avoid.
What are you looking for?

Best,
Marina

**jimacas** · 03-13-2014, 01:23 AM

RepeatExplorer workshop

Hi, you might be interested in a practical course on using RepeatExplorer and interpreting its results:

Please note the change of address!

http://w3lamc.umbr.cas.cz/repeatexplorer/?page_id=14

Jiri

**Marina_P** · 03-13-2014, 04:45 AM

Thanks, jimacas, for your response!
I came across this announcement as well.
Unfortunately, I'm in the US now, so I don't think I will be able to make it. It seems like a great opportunity to figure out what you're looking for or to interpret your results, though.
Thank you again!

Have a nice day !
Marina

**jimacas** · 03-13-2014, 05:53 AM

Hi Marina,

It is a pity you cannot make it to the course. You can at least have a look at some of our presentations from the previous workshop; I made them available here: http://w3lamc.umbr.cas.cz/repeatexplorer/?page_id=125

Best, Jiri

**Marina_P** · 03-13-2014, 08:46 AM

This is extremely helpful Jiri, thank you so much for that !

I hope I can be as helpful for you in the future.

:-) I'll go through your work and will come back if I have any questions !

Thanks a million !

All the best,
Marina

**dsenalik** · 03-21-2014, 12:32 PM

I am posting a solution for a different problem here for potential Google searchers, since it took me some time to track down.

When running the command-line RepeatExplorer with the --sq_rename parameter, the following error occurred:

Code:

Calculating graph layouts
2014-03-21 09:56

 reading .cls file
original cluster CL 1 was above threshold!, sample of graph is used
original cluster CL 2 was above threshold!, sample of graph is used
Error in { : task 1 failed - "line 1 did not have 3 elements"
Calls: %dopar% -> <Anonymous>
Execution halted
exit status:1

This error is ultimately caused by a '#' character in the read names, as is found in some Illumina reads, e.g. >XXX2XX4ACXX:1:1101:1441:2408#CAAGGAGCA/1

More specifically, in
repeatexplorer/umbr_programs/seqclust/programs/clusters2graphs.R
the command
gd=read.table(file=ncolfile,sep='\t',header=F,as.is=T,col.names=c(1,2,'weight'))
fails if there are '#' characters in the read names, because read.table treats '#' as the start of a comment by default and drops the rest of the line.

My solution was just to remove '#' from the read names.
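
For anyone who wants to script that cleanup, here is a minimal sketch in Python, assuming the reads are in plain FASTA and that '#' only needs to go from the header lines. The function name `strip_hash_from_headers` and the `replacement` argument are made up for illustration; they are not part of RepeatExplorer.

```python
def strip_hash_from_headers(lines, replacement=""):
    """Yield FASTA lines with '#' removed (or replaced) in header lines only.

    Hypothetical helper, not part of RepeatExplorer. Sequence lines are
    passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield line.replace("#", replacement)
        else:
            yield line


# Toy input using the read name from the post above.
example = [
    ">XXX2XX4ACXX:1:1101:1441:2408#CAAGGAGCA/1\n",
    "ACGTACGT\n",
]
cleaned = list(strip_hash_from_headers(example))
print("".join(cleaned), end="")
```

With real data you would pass an open file handle instead of a list and write the result to a new file. Passing `replacement="_"` keeps the barcode visually separated. An untested alternative would be to add `comment.char=""` to the read.table call, which is a standard read.table argument, but I have not checked whether anything else in the pipeline trips over '#'.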

**Marina_P** · 03-21-2014, 03:08 PM

Dear dsenalik,

What were you trying to do with the command line?

Something with the graphs and the repeat layouts?

Thanks for that, I'm sure a lot of people have run into the same struggle.

:-)

M.

**dsenalik** · 03-22-2014, 07:38 AM

Dear Marina_P,
I have about 30 genotypes I want to analyze, and it is easier to run them on my own server than on the Galaxy server, which I also don't want to overload! Well, easier only once everything is installed properly; there were a number of dependencies to install or configure.
It might help someone else, so here are my installation notes.

My plan is to see if all genotypes have a particular repeat cluster of interest.
To do this, I have put sequences from that cluster from an initial analysis into a custom RepeatMasker database, and I hope to see if a corresponding cluster shows up annotated in the other genotypes. It will take some time to run all of these...

**AleixArnau** · 05-06-2014, 07:37 AM

Dear all,

I've been working with RepeatExplorer for the last month and I would like to get a FASTA file with all the singlet reads. The program reports the number of singlet reads that aren't in any cluster, but I don't know how to get the reads themselves, or even whether it's possible. Does anyone know if that is possible, and how I can get them?

Thanks in advance!

**dsenalik** · 05-06-2014, 09:47 AM

The file that will list all reads in all clusters is

Code:

MyREoutputdir/seqClust/clustering/hitsort_PID90_LCOV55.cls

This file lists all reads in all clusters, even those too small for the summary HTML output. The numbers of clusters and of reads will match those in the summary graph at the top of the HTML output.
The format is a FASTA-style header line with the cluster number and the number of reads, followed by a single long line listing all reads in that cluster,
e.g.

Code:

...
>CL13980 3
I01405774f I01340829r I01263003f
>CL13981 3
I01149129r I01499415r I01202179f
...
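
A rough sketch of reading that format in Python, assuming exactly the layout shown above (a `>` header with cluster name and read count, then whitespace-separated read IDs). `parse_cls` is a hypothetical helper name, not something shipped with RepeatExplorer.

```python
def parse_cls(lines):
    """Return {cluster_name: [read_id, ...]} from .cls-style lines.

    Hypothetical helper. Raises ValueError if the number of IDs does not
    match the count declared in the header line.
    """
    clusters = {}
    declared = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0]
            declared[name] = int(fields[1])
            clusters[name] = []
        elif name is not None:
            clusters[name].extend(line.split())
    for cl, reads in clusters.items():
        if len(reads) != declared[cl]:
            raise ValueError(f"{cl}: expected {declared[cl]} reads, found {len(reads)}")
    return clusters


# The two clusters from the excerpt above.
example = [
    ">CL13980 3\n",
    "I01405774f I01340829r I01263003f\n",
    ">CL13981 3\n",
    "I01149129r I01499415r I01202179f\n",
]
clusters = parse_cls(example)
print({name: len(reads) for name, reads in clusters.items()})
```

For a real run you would pass the open hitsort .cls file instead of the list.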

Now, doing what you want would take some programming or clever shell scripting: exclude every read whose ID is in this file, and what is left are the unclustered reads.

One way that might work:

1. Make a file with list of IDs to exclude

Code:

grep -v ">" MyREoutputdir/seqClust/clustering/hitsort_PID90_LCOV55.cls | tr " " "\n" > Myexcludelist.txt

The renamed input sequences in FASTA format can be found at

Code:

MyREoutputdir/seqClust/sequences/seqClust

You could then use biopieces to exclude these reads

Code:

read_fasta -i MyREoutputdir/seqClust/sequences/seqClust | grab -i -E Myexcludelist.txt | write_fasta -xo Mysinglecopy.fasta
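
If biopieces is not installed, the same exclusion can be sketched in plain Python. This assumes the .cls layout shown earlier and that the read ID is the first whitespace-delimited token of each FASTA header. The function names are hypothetical, and the toy data below stands in for the real files.

```python
def clustered_ids(cls_lines):
    """Collect every read ID listed on the non-header lines of a .cls file."""
    ids = set()
    for line in cls_lines:
        if not line.startswith(">"):
            ids.update(line.split())
    return ids


def unclustered_records(fasta_lines, exclude):
    """Yield the FASTA lines of records whose ID is not in `exclude`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            keep = read_id not in exclude
        if keep:
            yield line


# Toy stand-ins for the .cls file and the renamed FASTA file.
cls_example = [">CL1 2\n", "I001f I002r\n"]
fasta_example = [
    ">I001f\n", "ACGT\n",
    ">I003f\n", "GGCC\n", "TTAA\n",
    ">I002r\n", "TTTT\n",
]
singlets = list(unclustered_records(fasta_example, clustered_ids(cls_example)))
print("".join(singlets), end="")
```

On real data, open the two files named above and write the yielded lines to the output FASTA. The set of clustered IDs is held in memory, which should be fine for typical read counts but is worth keeping in mind.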

**AleixArnau** · 05-07-2014, 02:46 AM

Thanks very much dsenalik!

You have solved my problem!

**dsenalik** · 06-16-2014, 10:48 AM

Telomeres not clustered by RepeatExplorer

(I am posting this here for lack of a better place, just for information.)

I discovered that I had reads that were entirely the classic Arabidopsis telomere repeat, i.e.
AGGGTTT
But despite adequate abundance, none of these reads show up in any clusters. However, a smaller number of reads that are two thirds this motif did get clustered.
It is probably some aspect of the clustering process that can't handle a 7-nucleotide repeat motif.
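
To find such reads in the input, so their count can be compared with what ends up in clusters, something like the following Python sketch could be used. It only tests whether a read is a perfect tandem array of the motif (any rotation, either strand); it says nothing about why the clustering skips them. `is_pure_tandem` is a made-up name.

```python
def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_pure_tandem(read, motif="AGGGTTT"):
    """True if `read` consists only of perfect tandem copies of `motif`.

    Accepts any rotation of the motif and its reverse complement, and
    allows the read to end on a partial copy.
    """
    read = read.upper()
    k = len(motif)
    if len(read) < k:
        return False
    for unit in (motif, revcomp(motif)):
        doubled = unit + unit
        rotations = {doubled[i:i + k] for i in range(k)}
        # First k bases must be a rotation; the rest must repeat with period k.
        if read[:k] in rotations and all(
            read[i] == read[i - k] for i in range(k, len(read))
        ):
            return True
    return False


reads = ["AGGGTTT" * 3, "TTTAGGG" * 2 + "TTT", "CCCTAAA" * 2, "AGGGTTTAGGCTTT"]
flags = [is_pure_tandem(r) for r in reads]
print(flags)
```

Real reads usually carry a sequencing error or two, so a stricter-than-needed test like this will undercount; allowing a few mismatches would be the obvious next step.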
