SEQanswers

Giles 03-22-2010 07:15 PM

repeat sequences/large files in galaxy
 
Does anyone know a good way to analyze repeat sequences, i.e., the reads that don't align uniquely with ELAND in the Illumina pipeline? I think the non-unique sequences in my dataset have some interesting biological aspects, and I would like to learn about them.
Here's what I have in mind: upload the export.txt file to Galaxy, group and count the most common sequence tags in it, and then search the most common tags with BLAT or BLAST to see what they are (e.g., satellite, LINE, SINE, etc.).

My other problem is that I can't upload the export.txt file to Galaxy. I assume it needs to be compressed. Does anyone know whether there is an upper limit on file size, or the best way to compress it? Any other suggestions for dealing with the repeat sequences are welcome too.
Thanks, Keith.
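One way to do the grouping and counting step without Galaxy is a short Python script. This is a minimal sketch, not an established tool: the export.txt column positions (SEQ_COL, CHROM_COL) and the rule for spotting non-unique reads (is_not_unique) are assumptions to check against your pipeline version's export-format documentation, and all function names are hypothetical.

```python
"""Sketch: count the most common non-unique reads in an Illumina export.txt
and write them as FASTA for a BLAT/BLAST search.

ASSUMPTIONS (verify against your pipeline's export format documentation):
  * tab-delimited, one read per line;
  * 0-based column 8 holds the read sequence;
  * 0-based column 10 holds the match chromosome, or "NM" (no match),
    "QC" (failed quality checks), or a colon-separated hit-count code
    such as "2:14:30" for reads that hit several locations.
"""
import collections
import gzip
from typing import Iterable

SEQ_COL = 8     # assumed index of the read sequence
CHROM_COL = 10  # assumed index of the match chromosome / status code


def is_not_unique(chrom_field: str) -> bool:
    # Hypothetical rule: keep no-match and multi-hit reads, drop QC failures.
    return chrom_field == "NM" or ":" in chrom_field


def count_tags(lines: Iterable[str]) -> collections.Counter:
    """Count how often each non-unique read sequence occurs."""
    counts: collections.Counter = collections.Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= CHROM_COL:
            continue  # skip malformed or truncated rows
        if is_not_unique(fields[CHROM_COL]):
            counts[fields[SEQ_COL]] += 1
    return counts


def top_tags_fasta(counts: collections.Counter, n: int) -> str:
    """Format the n most frequent tags as FASTA; rank and count go in the header."""
    records = []
    for rank, (seq, count) in enumerate(counts.most_common(n), start=1):
        records.append(f">tag{rank}_count{count}\n{seq}\n")
    return "".join(records)


def open_export(path: str):
    """Open export.txt as text, transparently reading a gzip-compressed copy."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)
```

For example, passing a handle from open_export(...) to count_tags and then calling top_tags_fasta(counts, 50) gives a FASTA file of the 50 most frequent tags that can be pasted into BLAT or BLAST. Reading the file line by line keeps memory proportional to the number of distinct tags rather than the file size, and the .gz branch lets the export stay compressed on disk; whether a particular Galaxy server accepts compressed uploads, and its size limit, is something to check on that server.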

stoker 04-04-2011 03:43 AM

This may be useful:

Pokrzywa R, Polanski A. BWtrs: a tool for searching for tandem repeats in DNA sequences based on the Burrows-Wheeler transform. Genomics. 2010 Nov;96(5):316-21. Epub 2010 Aug 13.

gntc 06-27-2011 12:08 PM

Identifying repeat elements
 
Were you ever able to find a fast, efficient method for analyzing large amounts of data? I have 15 million sequences, and I would like to know what percentage of them are satellite DNA, LINEs, SINEs, etc. I have been trying to use RepeatMasker, but it is unbearably slow.
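One way to cut the RepeatMasker workload is to collapse identical reads first, so each distinct sequence is masked only once, and then weight each hit by the number of reads it stands for. This is a minimal sketch under stated assumptions: it assumes the usual RepeatMasker .out layout (whitespace-separated, score in column 1, query name in column 5, "class/family" in column 11), collapse_reads and class_fractions are hypothetical helpers, and counting only the best-scoring hit per sequence is a simplifying choice.

```python
"""Sketch: estimate the percentage of reads in each repeat class
(Satellite, LINE, SINE, ...) while masking only distinct sequences.

ASSUMED .out layout (check your RepeatMasker version): column 1 is the
Smith-Waterman score, column 5 the query name, column 11 "class/family".
"""
import collections
from typing import Dict, Iterable, Tuple


def collapse_reads(seqs: Iterable[str]) -> Tuple[str, Dict[str, int]]:
    """Return FASTA text of distinct sequences plus a name -> read-count map."""
    counts = collections.Counter(seqs)
    fasta_records = []
    weights: Dict[str, int] = {}
    for i, (seq, n) in enumerate(counts.items(), start=1):
        name = f"u{i}"  # hypothetical naming scheme for collapsed reads
        weights[name] = n
        fasta_records.append(f">{name}\n{seq}\n")
    return "".join(fasta_records), weights


def class_fractions(out_lines: Iterable[str],
                    weights: Dict[str, int]) -> Dict[str, float]:
    """Percent of all reads whose best-scoring hit falls in each top-level class."""
    best: Dict[str, Tuple[int, str]] = {}
    for line in out_lines:
        cols = line.split()
        if len(cols) < 11 or not cols[0].isdigit():
            continue  # header, blank, or malformed line
        score, query, repeat_class = int(cols[0]), cols[4], cols[10]
        top_class = repeat_class.split("/")[0]  # "LINE/L1" -> "LINE"
        if query not in best or score > best[query][0]:
            best[query] = (score, top_class)
    total = sum(weights.values())
    if total == 0:
        return {}
    reads_per_class: Dict[str, int] = collections.defaultdict(int)
    for query, (_, top_class) in best.items():
        reads_per_class[top_class] += weights.get(query, 0)
    return {c: 100.0 * n / total for c, n in reads_per_class.items()}
```

In use, the FASTA text from collapse_reads would be written to a file, RepeatMasker run on that file, and the lines of the resulting .out file passed to class_fractions together with the weights. Reads with no repeat hit are not assigned to any class, so the percentages sum to less than 100; the remainder is the unmasked fraction. How much time collapsing saves depends on how many exact duplicates the library contains.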


All times are GMT -8.