Hi,
I'm trying to extract all the putative introns given a reference sequence and a set of ESTs. I found GeneSeqer (http://brendelgroup.org/bioinformatics2go/GeneSeqer.php), which looks like a powerful tool for this and also takes different splice-site models into account; great! But I don't know what to do with the output file, which looks somewhat like GenBank output and can probably be loaded into genome browsers. An excerpt is shown in the code block further down.
In my case, I just want to extract the intron regions from the whole genome (e.g. as a FASTA file) and build an "introme" dataset, but I'm not sure how to do that from this format. Any ideas? Thanks
Code:
EST sequence 3 +strand 263 n (File: gi-9785390+)

    1 GAGAGACCTC TGGCTCTGTA TCGCTCGCTG CTCTTCCTCC CACAGATCGA AAACCATGAA
   61 TCCTGAGTAC GACTATCTTT TCAAGCTCCT GCTTATCGGG GATTCTGGCG TAGGCAAGTC
  121 TTGTCTTCTT TTGAGATTCT CTGATGATTC TTATGTAGAA AGTTACATTA GCACTATTGG
  181 AGTCGATTTT AAAATTAGGA CTGTGGAACA AGATGGCAAA ACAATTAAGC TCCAAATTTG
  241 GGACACTGCT GGTCAAGAAC GGT

Predicted gene structure (within gDNA segment 52894 to 50989):

 Exon 1 52231 52163 ( 69 n); cDNA 1 69 ( 69 n); score: 0.986
  Intron 1 52162 52056 ( 107 n); Pd: 0.902 (s: 1.00), Pa: 0.886 (s: 1.00)
 Exon 2 52055 51983 ( 73 n); cDNA 70 142 ( 73 n); score: 1.000
  Intron 2 51982 51897 ( 86 n); Pd: 0.978 (s: 1.00), Pa: 0.990 (s: 1.00)
 Exon 3 51896 51849 ( 48 n); cDNA 143 190 ( 48 n); score: 1.000
  Intron 3 51848 51759 ( 90 n); Pd: 0.637 (s: 1.00), Pa: 0.963 (s: 1.00)
 Exon 4 51758 51711 ( 48 n); cDNA 191 238 ( 48 n); score: 1.000
  Intron 4 51710 51624 ( 87 n); Pd: 0.977 (s: 1.00), Pa: 0.986 (s: 0)
 Exon 5 51623 51599 ( 25 n); cDNA 239 263 ( 25 n); score: 1.000

MATCH SQ;L89959- gi-9785390+ 0.993 263 1.000 C
PGS_SQ;L89959-_gi-9785390+ (52231 52163,52055 51983,51896 51849,51758 51711,51623 51599)

Alignment (genomic DNA sequence = upper lines):

GAGAGATCTC TGGCTCTGTA TCGCTCGCTG CTCTTCCTCC CACAGATCGA AAACCATGAA 52172
|||||| ||| |||||||||| |||||||||| |||||||||| |||||||||| ||||||||||
GAGAGACCTC TGGCTCTGTA TCGCTCGCTG CTCTTCCTCC CACAGATCGA AAACCATGAA 60

TCCTGAGTAG TAAGTTCCTT TCTCCATCGA CACATACTTG GGTCGAAATT ACCTCTGTTA 52112
|||||||||
TCCTGAGTA. .......... .......... .......... .......... .......... 69
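
For what it's worth, the "Intron" lines in the output above already carry the gDNA coordinates, so one possible route is to pull those out with a regular expression and slice the genome yourself. The following is only a rough sketch, not an official GeneSeqer parser: the function names (`parse_introns`, `intron_sequence`, `introns_to_fasta`) are made up for illustration, the genome is assumed to be already loaded as a plain string, and the coordinates are assumed to be 1-based and inclusive, with "from" greater than "to" meaning the intron lies on the reverse strand (as the sample, with its gDNA segment 52894 to 50989, suggests).

```python
import re

# Matches report lines such as:
#   Intron 1 52162 52056 ( 107 n); Pd: 0.902 (s: 1.00), Pa: 0.886 (s: 1.00)
INTRON_RE = re.compile(r"Intron\s+(\d+)\s+(\d+)\s+(\d+)\s+\(\s*(\d+)\s+n\)")

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_introns(report_text):
    """Return (intron_number, start, end) tuples exactly as printed in the report."""
    return [
        (int(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in INTRON_RE.finditer(report_text)
    ]


def intron_sequence(genome, start, end):
    """Slice a 1-based inclusive interval from the genome string.

    Assumption: start > end means the feature is on the reverse strand,
    so the reverse complement of the interval is returned.
    """
    if start <= end:
        return genome[start - 1:end]
    return genome[end - 1:start].translate(_COMPLEMENT)[::-1]


def introns_to_fasta(report_text, genome, prefix="intron"):
    """Build FASTA text with one record per intron line found in the report."""
    records = []
    for number, start, end in parse_introns(report_text):
        seq = intron_sequence(genome, start, end)
        records.append(f">{prefix}_{number}_{start}_{end}\n{seq}")
    return "\n".join(records)
```

With many ESTs the same intron will likely be reported more than once, so it would probably make sense to collect the `(start, end)` pairs in a set before writing the FASTA file, and perhaps to filter on the Pd/Pa scores first.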