I need my reference genome (S. cerevisiae strain S288C) in a specific format for an analysis, but I can't find anything matching what I need online.
The data I have is a FASTA file in this format:
>I [org=Saccharomyces cerevisiae] [strain=S288C] [chromosome=I]
CCACACCACACCCACACACCCAC...etc
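Each header line carries the chromosome name in a `[chromosome=I]` tag, and the sequence lines that follow belong to that chromosome. Here is a minimal Python sketch for reading such a file into a dictionary. It assumes plain, uncompressed FASTA, and it falls back to the first word of the header when the tag is missing. The names `read_fasta` and `CHROM_TAG` are hypothetical, and the example records are a toy excerpt, not a real file.

```python
import re
from typing import Dict, Iterable, List, Optional

# Matches the "[chromosome=...]" tag in headers such as
# ">I [org=Saccharomyces cerevisiae] [strain=S288C] [chromosome=I]"
CHROM_TAG = re.compile(r"\[chromosome=([^\]]+)\]")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {chromosome name: sequence} from FASTA-formatted lines."""
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            match = CHROM_TAG.search(line)
            # Assumption: without a chromosome tag, use the first header word.
            name = match.group(1) if match else (line[1:].split() or ["unknown"])[0]
            records[name] = []
        elif name is not None:
            # Sequence may be wrapped over many lines; upper-case for consistency.
            records[name].append(line.upper())
    return {chrom: "".join(parts) for chrom, parts in records.items()}


# Toy excerpt (first 20 bases of the sequence above, wrapped at 10 columns).
example = [
    ">I [org=Saccharomyces cerevisiae] [strain=S288C] [chromosome=I]",
    "CCACACCACA",
    "CCCACACACC",
]
print(read_fasta(example))  # {'I': 'CCACACCACACCCACACACC'}
```

With a real file, pass an open file handle instead of the list, e.g. `read_fasta(open(path))`.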
But I need it as a table with one row per base, like this:
chr pos seq
1 1 C
1 2 C
1 3 A
1 4 C
etc.
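Once each chromosome's sequence is in memory, building the table only requires enumerating the bases with a 1-based position. The sketch below assumes the sequences are already in a dict keyed by chromosome name (e.g. `{"I": "CCAC..."}`). It also assumes, based on the example above where chromosome I becomes `1`, that Roman-numeral names should be converted to integers. Names that are not Roman numerals, such as a mitochondrial record, are passed through unchanged. The function names are hypothetical, and the tab delimiter is an assumption.

```python
import csv
import sys
from typing import Dict, Iterator, TextIO, Tuple, Union

# S. cerevisiae nuclear chromosomes run I..XVI, so I, V and X are enough.
ROMAN = {"I": 1, "V": 5, "X": 10}


def roman_to_int(name: str) -> Union[int, str]:
    """Convert a Roman-numeral chromosome name (e.g. 'XVI') to an int.

    Names that are not Roman numerals are returned unchanged.
    """
    if not name or any(ch not in ROMAN for ch in name):
        return name
    total = 0
    for i, ch in enumerate(name):
        value = ROMAN[ch]
        # Subtractive notation: a smaller numeral before a larger one (IV, IX).
        if i + 1 < len(name) and value < ROMAN[name[i + 1]]:
            total -= value
        else:
            total += value
    return total


def table_rows(genome: Dict[str, str]) -> Iterator[Tuple[Union[int, str], int, str]]:
    """Yield (chr, pos, base) with 1-based positions, one tuple per base."""
    for chrom, sequence in genome.items():
        label = roman_to_int(chrom)
        for pos, base in enumerate(sequence, start=1):
            yield label, pos, base


def write_table(genome: Dict[str, str], out: TextIO) -> None:
    """Write a tab-separated 'chr pos seq' table to an open text stream."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chr", "pos", "seq"])
    writer.writerows(table_rows(genome))


if __name__ == "__main__":
    # Reproduces the first rows of the desired table.
    write_table({"I": "CCAC"}, sys.stdout)
```

Using a generator means rows are streamed to the output rather than held in memory. That matters here because the ~12 Mb yeast genome yields roughly 12 million rows. The input dict could come from any FASTA reader, for example Biopython's `SeqIO.parse`.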
I'm hoping one of the following is true; please let me know if you can help. Thanks!
1. The data already exists somewhere online
2. There's a program that can make this file for me from my reference sequence (possibly pysamstats?)
3. Someone can give me hints for writing my own R script to make the file