03-22-2010, 06:15 PM   #1
Location: Birmingham, Al

Join Date: Feb 2010
Posts: 39
Repeat sequences/large files in Galaxy

Does anyone know a good way to analyze the repeat sequences, i.e., those that don't align using ELAND in the Illumina pipeline? I think there are some interesting biological aspects to the sequences that are not unique in my dataset, and I would like to learn about them.
Here is what I have in mind: I'd like to upload the export.txt file to Galaxy, group and count the most common sequence tags in the export file, and then BLAT/BLAST the most common tags to see what they are (e.g., satellite, LINE, SINE, etc.). My other problem is that I am unable to upload the export.txt file to Galaxy. I assume it must be compressed. Does anyone know of an upper limit on file size, or the best way to compress the file? Any other suggestions for dealing with the repeat sequences?
Thanks, Keith.
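
The group-and-count step described above can also be done locally before anything is uploaded, which sidesteps the file-size problem. The following is only a minimal sketch, not a tested pipeline. It assumes export.txt is tab-delimited with the read sequence in the 9th field (check this against your own file), and the names `count_tags`, `write_fasta`, and `SEQ_COL` are made up for illustration. It reads plain or gzip-compressed input and writes the most frequent tags to a small FASTA file that can be pasted into BLAT/BLAST.

```python
import gzip
from collections import Counter

# Assumption: the read sequence is the 9th tab-separated field (index 8).
# Verify this against a few lines of your own export.txt.
SEQ_COL = 8


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_tags(path, seq_col=SEQ_COL):
    """Count how often each read sequence occurs in the file."""
    counts = Counter()
    with open_text(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip short or malformed lines rather than failing on them.
            if len(fields) > seq_col and fields[seq_col]:
                counts[fields[seq_col]] += 1
    return counts


def write_fasta(counts, out_path, top_n=100):
    """Write the top_n most common tags as FASTA, with the count in the header."""
    with open(out_path, "w") as out:
        for rank, (seq, n) in enumerate(counts.most_common(top_n), start=1):
            out.write(f">tag{rank}_count{n}\n{seq}\n")
```

For example, `write_fasta(count_tags("export.txt.gz"), "top_tags.fa")` would leave a file small enough to upload anywhere. To restrict the count to reads that did not align, filter on whichever field of your export file holds the match information before counting. Note that the counter keeps every distinct sequence in memory, which may be a problem for very large or very diverse files.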
Posted by Giles
04-04-2011, 02:43 AM   #2
Location: Poland

Join Date: Oct 2010
Posts: 17

This may be useful:

Pokrzywa R, Polanski A. BWtrs: A tool for searching for tandem repeats in DNA sequences based on the Burrows-Wheeler transform. Genomics. 2010 Nov;96(5):316-21. Epub 2010 Aug 13.
Tomasz Stokowy
Posted by stoker
06-27-2011, 11:08 AM   #3
Location: Phoenix, AZ

Join Date: Feb 2011
Posts: 17
Identifying repeat elements

Were you ever able to find a fast, efficient method of analyzing large amounts of data? I have 15 million sequences and I would like to know what percentage of them are satellite DNA, LINEs, SINEs, etc. I have been trying to use RepeatMasker, but it is unbearably slow.
Posted by gntc
