1. I want to download the DNA sequences for the mouse mm9 RepeatMasker track from the UCSC Genome Browser. When I tried the Table Browser, the download reached 167 MB after ~10 minutes and then stopped, having completed only chromosome 1 and part of chromosome 2. The file I downloaded ended with:
My internet connection is very fast, but the download was pretty slow, so I assume it was being limited on UCSC's end. Should I be able to download this from the Table Browser, or is there a better way?
2. Here is what one sequence looks like:
The header includes the repeat name (L1_Mur2), but I would also like the repeat Family and Class. You get those if you download the RepeatMasker track itself, but not if you download the actual sequences like I'm trying to do. I'm pretty sure I could use Perl to add the correct Family and Class info to each sequence, but if there is some way to get the sequences with this information already included, it would save a bit of time.
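For anyone who already has the Table Browser FASTA, the header rewrite described above could be sketched roughly like this (in Python rather than Perl). It assumes the RepeatMasker table dump uses the usual UCSC rmsk column order (genoName, genoStart, genoEnd, strand, repName, repClass, repFamily at 0-based positions 5, 6, 7, 9, 10, 11, 12) and that the `range=` in the header is 1-based while genoStart is 0-based. The function names `build_lookup` and `annotate_header` are made up for this example.

```python
import re

# Assumed 0-based column positions in a UCSC rmsk table dump.
CHROM, START, END, STRAND = 5, 6, 7, 9
REP_CLASS, REP_FAMILY = 11, 12

# Matches headers like:
# >mm9_rmsk_L1_Mur2 range=chr1:3000002-3000156 5'pad=0 3'pad=0 strand=- ...
HEADER_RE = re.compile(r"^>(\S+) range=(\w+):(\d+)-(\d+) .*strand=([+-])")


def build_lookup(rmsk_lines):
    """Map (chrom, start, end, strand) -> (repClass, repFamily)."""
    lookup = {}
    for line in rmsk_lines:
        row = line.rstrip("\n").split("\t")
        if len(row) <= REP_FAMILY or line.startswith("#"):
            continue  # skip blank, short, or comment lines
        key = (row[CHROM], int(row[START]), int(row[END]), row[STRAND])
        lookup[key] = (row[REP_CLASS], row[REP_FAMILY])
    return lookup


def annotate_header(header, lookup):
    """Append |repClass|repFamily to the sequence name if the range is known."""
    match = HEADER_RE.match(header)
    if match is None:
        return header  # not a header in the expected format; leave it alone
    name, chrom, start, end, strand = match.groups()
    # Assumption: header range is 1-based, table start is 0-based.
    key = (chrom, int(start) - 1, int(end), strand)
    if key not in lookup:
        return header
    rep_class, rep_family = lookup[key]
    return f">{name}|{rep_class}|{rep_family}" + header[len(name) + 1:]
```

To use it, build the lookup once from the table file, then pass every line starting with `>` through `annotate_header` and write all other lines out unchanged.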
[Edit] Problem solved using bedtools getfasta with the genome FASTA file and a BED file for each type of repeat.
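A rough sketch of that kind of workflow, in case it helps someone else. This is not the exact commands I ran. It converts rows of the RepeatMasker table into BED6 lines whose name field carries repName, repClass and repFamily, then builds a `bedtools getfasta` command. It assumes the usual UCSC rmsk column order (noted in the comments), and the helper names and file paths are made up for the example. Instead of one BED file per repeat type, it puts Class and Family into the name so they end up in the FASTA header.

```python
# Assumed 0-based column positions in a UCSC rmsk table dump:
# bin, swScore, milliDiv, milliDel, milliIns, genoName, genoStart, genoEnd,
# genoLeft, strand, repName, repClass, repFamily, repStart, repEnd, repLeft, id
CHROM, START, END, STRAND = 5, 6, 7, 9
REP_NAME, REP_CLASS, REP_FAMILY = 10, 11, 12


def rmsk_to_bed(rmsk_lines):
    """Yield BED6 lines named repName|repClass|repFamily."""
    for line in rmsk_lines:
        row = line.rstrip("\n").split("\t")
        if len(row) <= REP_FAMILY or line.startswith("#"):
            continue  # skip blank, short, or comment lines
        name = "|".join((row[REP_NAME], row[REP_CLASS], row[REP_FAMILY]))
        # genoStart is already 0-based, as BED expects; score is set to 0.
        yield "\t".join((row[CHROM], row[START], row[END], name, "0", row[STRAND]))


def getfasta_command(genome_fa, bed_path, out_fa):
    """Build the bedtools call: -name puts the BED name in the header,
    -s reverse-complements features on the minus strand."""
    return ["bedtools", "getfasta", "-fi", genome_fa, "-bed", bed_path,
            "-name", "-s", "-fo", out_fa]
```

Write the output of `rmsk_to_bed` to a file, then run the command list with `subprocess.run(cmd, check=True)`. The exact header that `-name` produces differs between bedtools versions (newer ones may append the coordinates), so check the output of the version you have.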
Code:
procedures have exceeded timeout: 1200 seconds, function has ended.
Code:
>mm9_rmsk_L1_Mur2 range=chr1:3000002-3000156 5'pad=0 3'pad=0 strand=- repeatMasking=none
AAATGTTAAATCTAAAAAAATCCTAACAAGAAACAGCCAGGAAATCTGGG
ACACTATGAAAAGACCAAACCTAAGAAAAATAGGAATAAAAGAAGGACAA
AAGTTTCAGCTGAAACACCCAGAAAACATATTAAACTAAATCATAGAAAA
GAATT