Hi Everyone,
I'm very new to this field and am currently working with existing data. My ultimate goal is to map sequence reads to a gene to make some nice visuals.
I know I need to get the reads into SAM format and then into BAM format in order to give them to a visualization program (like IGV). My problem is that I can't figure out what format the data is currently in.
From what I can tell, it is in a format SIMILAR to a SOAPalign output, but not quite. The file extension is .txt. Any hints would be greatly appreciated.
Here is a link to the GEO page where the data lives:
Here is the file header and first line:
#Conf {confSoapsM = Just "fp_rich1_80610_chr_soap", confReadsM = Just "fp_rich1_80610_rrna_unalign", confOutputM = Just "fp_rich1_80610_chr", confVectorSequ = Chunk "AAAAAAAAAAAAAAAAAAAA" Empty, confFastaM = Just "/home/rawdata/Sequences/saccharomyces_cerevisiae.fa", confMinMatch = 32}
#F 19:43:59 z
AAAAAAAAAAAAAAAAAATCTGAAAAAAAAAAAAAA 1 35 chrII 145916 145937 22 0
Here is the description in the README:
Alignment files (chr_best):
1: tag sequence
2: tag count
3: alignment score = # of matches, versus reference plus poly-A tail; # mismatches = length($1) - $3.
4: reference sequence name
5: reference sequence start position
6: reference sequence end position
7: maximum reference alignment length
8: length ambiguity in match, due to poly-A tailing; minimum reference alignment length is $7 - $8
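To make the README description concrete, here is a minimal Python sketch of how the eight columns might be read into records. It is not the authors' tool. It assumes the fields are whitespace-separated and that lines starting with `#` are header lines, as in the excerpt above. The names `TagAlignment` and `parse_alignments` are made up for illustration. The two derived values follow the README formulas for columns 3 and 8.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class TagAlignment:
    """One data line, with fields named after the README columns 1-8."""
    sequence: str        # 1: tag sequence
    count: int           # 2: tag count
    score: int           # 3: alignment score (# of matches)
    ref_name: str        # 4: reference sequence name
    ref_start: int       # 5: reference sequence start position
    ref_end: int         # 6: reference sequence end position
    max_ref_len: int     # 7: maximum reference alignment length
    len_ambiguity: int   # 8: length ambiguity due to poly-A tailing

    @property
    def mismatches(self) -> int:
        # README: # mismatches = length($1) - $3
        return len(self.sequence) - self.score

    @property
    def min_ref_len(self) -> int:
        # README: minimum reference alignment length is $7 - $8
        return self.max_ref_len - self.len_ambiguity


def parse_alignments(lines: Iterable[str]) -> Iterator[TagAlignment]:
    """Yield one record per data line, skipping blank and '#' header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # assumption: whitespace-delimited columns
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
        seq, count, score, name, start, end, max_len, ambiguity = fields
        yield TagAlignment(seq, int(count), int(score), name,
                           int(start), int(end), int(max_len), int(ambiguity))


if __name__ == "__main__":
    sample = [
        "#F 19:43:59 z",
        "AAAAAAAAAAAAAAAAAATCTGAAAAAAAAAAAAAA 1 35 chrII 145916 145937 22 0",
    ]
    for rec in parse_alignments(sample):
        print(rec.ref_name, rec.ref_start, rec.ref_end,
              rec.mismatches, rec.min_ref_len)
```

In the sample line, end - start + 1 = 145937 - 145916 + 1 = 22, which equals column 7, so the coordinates look 1-based and inclusive. That is a guess from a single line, not something the README states. The format also has no strand column, so strand would have to come from somewhere else before writing SAM.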