**vivek_** · 08-24-2012, 12:01 PM

This could be done from the Unix command line, but it would be helpful if you could post a few lines of the file enclosed in code tags, like

Code:

paste here

to get a precise idea of the file format.

**Etherella** · 08-30-2012, 02:59 AM

Code:

SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68048647-68172163_36129	3979	+	1	1
SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68048647-68172163_36130	3979	+	1	1
SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	uc008dkh.1	4033	+	1	1
SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68046720-68172163_36128	3943	+	1	1
SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68046720-68172163_36127	3943	+	1	1
SCS_0004:2:1:1054:5070#0/1	TTTCTCTGTCTTGTCCNCCTAGTTTCCCTCCTGTAGGCAC	aaaaaaaaaaaaaaa]EaaaW]]]Yaa\a`[aa]Pa^]VT	2	30	30	chr2:40378133-40378584_42654	395	-	1	1
SCS_0004:2:1:1054:5070#0/1	TTTCTCTGTCTTGTCCNCCTAGTTTCCCTCCTGTAGGCAC	aaaaaaaaaaaaaaa]EaaaW]]]Yaa\a`[aa]Pa^]VT	2	30	30	chr1:8926487-8927380_99	16	+	1	1

Something like this. By the way, I don't have Unix installed, only Mac OS and Windows, but as far as I understand Mac OS is a Unix-based system, right?

**dpryan** · 08-30-2012, 10:23 AM

The easiest way would be to write a small script (in Python, Perl, whatever) to read that in and spit out the same data (sans alignment information) in FASTQ format. Column 1 is the read ID, column 2 is the sequence, and column 3 is the quality string. If you have Python installed on your Mac, then the following would probably work (change INPUT_FILENAME to the name of the file you got from GEO and SOME_OUTPUT_FILE to whatever you want the output to be called):

Code:

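#!/usr/bin/env python
# Convert the tab-delimited GEO alignment file to FASTQ.
# Column 1 = read ID, column 2 = sequence, column 3 = quality string.
# A read that aligns to several loci is listed on several consecutive
# lines, so only the first line of each run of identical IDs is written.

with open("INPUT_FILENAME") as infile, open("SOME_OUTPUT_FILE", "w") as output:
    last = ""
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        # plain split on tabs: csv's quote handling could misread '"' in a quality string
        fields = line.rstrip("\n").split("\t")
        if fields[0] != last:
            # FASTQ headers start with "@"; ">" is the FASTA header character
            output.write("@%s\n%s\n+\n%s\n" % (fields[0], fields[1], fields[2]))
            last = fields[0]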

Something like that would probably work.
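A variant of the same idea, sketched under a few assumptions: the input is tab-delimited with the read ID, sequence and quality string in the first three columns, as described above, and the function names and command-line arguments are hypothetical, not taken from any existing tool. It remembers every read ID in a set, so duplicates are dropped even if the lines for one read are not adjacent, and it can write FASTA (sequence only) instead of FASTQ. The quality characters in the sample above run from `E` to `a`, which looks like Phred+64 (Illumina 1.3+) encoding rather than Phred+33; `phred64_to_phred33` is an optional helper for tools that expect Phred+33, so check the encoding of your own file before using it.

```python
import sys


def unique_reads(lines):
    """Yield (read_id, sequence, quality) once per read ID.

    Assumes column 1 is the read ID, column 2 the sequence and column 3
    the quality string; the alignment columns after that are ignored.
    A set of seen IDs drops repeats even when they are not adjacent.
    """
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        read_id, seq, qual = line.rstrip("\n").split("\t")[:3]
        if read_id not in seen:
            seen.add(read_id)
            yield read_id, seq, qual


def phred64_to_phred33(qual):
    """Re-encode a Phred+64 quality string as Phred+33 (subtract 31 per char)."""
    return "".join(chr(ord(c) - 31) for c in qual)


def to_fastq(read_id, seq, qual):
    """Format one FASTQ record (the header line starts with '@')."""
    return "@%s\n%s\n+\n%s\n" % (read_id, seq, qual)


def to_fasta(read_id, seq, qual=None):
    """Format one FASTA record; the quality string is accepted but ignored."""
    return ">%s\n%s\n" % (read_id, seq)


def convert(in_path, out_path, fmt="fastq", shift_quality=False):
    """Write each unique read from in_path to out_path as FASTQ or FASTA."""
    with open(in_path) as infile, open(out_path, "w") as out:
        for read_id, seq, qual in unique_reads(infile):
            if fmt == "fasta":
                out.write(to_fasta(read_id, seq))
            else:
                if shift_quality:
                    qual = phred64_to_phred33(qual)
                out.write(to_fastq(read_id, seq, qual))


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python geo_convert.py INPUT OUTPUT [fastq|fasta]
    convert(sys.argv[1], sys.argv[2], sys.argv[3] if len(sys.argv) > 3 else "fastq")
```

The set trades memory for robustness: it grows with the number of distinct reads, whereas the `last`-ID check in the script above uses constant memory but relies on each read's lines being grouped together, as they are in the sample.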

**Etherella** · 09-03-2012, 04:16 AM

Thanks for the reply. I managed to get it working through Galaxy.

Converting GEO database TXT format to fasta