Hello!
I am having trouble with the format of the NRSF dataset downloaded from here:
The files are plain text, laid out like this:
Uniq files: all tags have a single alignment in the genome
Column 1: tag sequence
Column 2: alignment score
Column 3: number of hits in the genome, 1 = unique hit
Column 4: Chr position
Column 5: Chr direction (strand)
Column 6: matched genome sequence
Column 7: next best possible alignment score
example:
head -5 GSM327023_chipFC1592_uniq_hg17.txt
GCAGAGTAACCCGCCCCACCCCACC 10406 1 chr6:156964520 F GCAGAGTAACTCTCCCCACCCCACC 9359
Now I want to run Useq on it. Since I have to run the program many times, I would like to run it with a single command (the ChIPSeq application), which supports the ELAND format. So I tried to convert my files to ELAND.
I was wondering if someone could help me here. Apparently my conversion was not correct: I don't seem to have produced the right number of columns.
So my question is: what does the ELAND format read by Useq look like?
Here is a line of the error message:
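As a rough sketch of that conversion, the Python below maps one line of the 7-column "uniq" file onto the column layout commonly documented for Illumina's eland_result.txt: read name, sequence, match code (U0/U1/U2), counts of exact / 1-error / 2-error matches, genome file, position, strand, an N-handling field, then one substitution field per mismatch. Several things here are assumptions, not confirmed Useq behaviour: that Useq's reader follows this layout, that the tag and matched genome sequence are given in the same orientation (unverified for R-strand hits), that positions can be passed through unchanged, and that `..` is an acceptable placeholder for column 10. The read names `read_<n>` and the functions `nrsf_to_eland` / `convert_file` are hypothetical.

```python
from typing import Optional

# Assumed placeholder for ELAND column 10 (how Ns in the read were handled).
# Verify this against a real ELAND file before relying on it.
N_FIELD = ".."


def nrsf_to_eland(line: str, read_id: str) -> Optional[str]:
    """Convert one line of the 7-column NRSF 'uniq' file to an ELAND-result-style line.

    Returns None when the tag cannot be written as a unique ELAND match
    (more than one genomic hit, unequal lengths, or more than 2 mismatches).
    """
    tag, _score, hits, chr_pos, strand, genome_seq, _next_best = line.split()
    if int(hits) != 1 or len(tag) != len(genome_seq):
        return None
    chrom, pos = chr_pos.split(":")  # e.g. "chr6:156964520" -> "chr6", "156964520"
    # Assumes tag and matched genome sequence are in the same orientation.
    # Each substitution is written as <1-based position><genome base>.
    subs = [f"{i + 1}{g}" for i, (t, g) in enumerate(zip(tag, genome_seq)) if t != g]
    if len(subs) > 2:
        return None  # ELAND U codes only cover 0, 1 or 2 errors
    counts = ["0", "0", "0"]  # exact, 1-error, 2-error match counts
    counts[len(subs)] = "1"
    fields = [f">{read_id}", tag, f"U{len(subs)}", *counts,
              f"{chrom}.fa", pos, strand, N_FIELD, *subs]
    return "\t".join(fields)


def convert_file(src_path: str, dst_path: str) -> int:
    """Convert a whole NRSF file; returns the number of ELAND lines written."""
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for n, line in enumerate(src, start=1):
            if not line.strip():
                continue  # skip blank lines
            eland = nrsf_to_eland(line, f"read_{n}")
            if eland is not None:
                dst.write(eland + "\n")
                written += 1
    return written
```

For example, `convert_file("GSM327023_chipFC1592_uniq_hg17.txt", "GSM327023_eland.txt")` would write one tab-separated line per convertible tag. The example tag shown above differs from its matched genome sequence at positions 11 and 13, so it would come out as a 12-column U2 line. Tags with more than two mismatches are skipped, because the ELAND U codes only go up to two errors.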
Error: line does not contain enough columns -> GCGCCGAGCATTCCGGCCTGAGGAG CTTCCCAGGCCGGAATGCTCGGCGC U1 1 0 0 chr2.fa 55756496 R .
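One way to see what might be wrong with a line like this is to check it against that same Illumina eland_result.txt layout. The sketch below assumes Useq expects this layout (not confirmed), and the function name `check_eland_line` is hypothetical. For the line above it flags three things: column 1 holds a sequence instead of a read name; a U1 line would normally carry an 11th substitution field; and U1 means the best hit has one mismatch, so the exact-match count would be expected to be 0 rather than 1.

```python
from typing import List

# Minimum column counts implied by the Illumina eland_result.txt layout
# for unique matches (assumption: Useq's ELAND reader follows this layout).
MIN_COLUMNS = {"U0": 10, "U1": 11, "U2": 12}


def check_eland_line(line: str) -> List[str]:
    """Return a list of problems found in one ELAND-result-style line."""
    fields = line.split()
    if len(fields) < 3:
        return ["fewer than 3 columns; no match code found"]
    problems = []
    name = fields[0].lstrip(">").upper()
    if set(name) <= set("ACGTN"):
        problems.append("column 1 looks like a sequence, expected a read name")
    code = fields[2]
    if code in MIN_COLUMNS:
        need = MIN_COLUMNS[code]
        if len(fields) < need:
            problems.append(f"{code} line has {len(fields)} columns, expected at least {need}")
        errors = int(code[1])
        # A unique k-error best match implies no matches with fewer than k errors.
        if any(count != "0" for count in fields[3:3 + errors]):
            problems.append(f"{code} but a lower-error match count is non-zero")
    return problems
```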
If anyone could help here, that would be great! Since this NRSF dataset is widely used, I might not be the only one having this trouble...
Thanks a lot!
dani