**maasha** · 11-15-2010, 05:48 AM

My guess is that it's a plain tab-separated table with sequence and copy count.

You can use Biopieces (www.biopieces.org) to convert this table into something useful - like FASTA format:

Code:

read_tab -i test.tab -k SEQ,COUNT | add_ident -k SEQ_NAME | merge_vals -k SEQ_NAME,COUNT | write_fasta -x
>ID00000000_1
AAAACTGGTTCCAGAAGTTGAGAC
>ID00000001_1
AAAACTGGTTCTGGCAGGTAG
>ID00000002_5
AAAACTGGTTGGGCTTAAAACTGC
>ID00000003_2
AAAACTGGTTGTAAACGGAGGAGC
>ID00000004_2
AAAACTGGTTTTAGATGGATAGAA
>ID00000005_1
AAAACTGGTTTTGCACTATTGGGC
>ID00000006_1
AAAACTGTAAAACAGGTGGTT
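
If Biopieces is not installed, the same conversion can be sketched in plain Python. This is only an illustration, assuming each row holds a sequence and an integer count separated by whitespace; the function name `table_to_fasta` and the header layout (copied from the output above) are my own choices, not part of Biopieces.

```python
def table_to_fasta(lines):
    """Yield FASTA lines from 'SEQ<whitespace>COUNT' rows.

    Each record is named ID<8-digit index>_<count>, imitating the
    output shown above. Rows without exactly two fields are skipped.
    """
    index = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed row
        seq, count = fields[0], int(fields[1])
        yield f">ID{index:08d}_{count}"
        yield seq
        index += 1


# Small in-memory example; with a real file, pass an open file handle instead.
rows = [
    "AAAACTGGTTCCAGAAGTTGAGAC\t1",
    "AAAACTGGTTCTGGCAGGTAG\t1",
    "AAAACTGGTTGGGCTTAAAACTGC\t5",
]
fasta = list(table_to_fasta(rows))
print("\n".join(fasta))
```

Keeping the count in the header means each distinct sequence is mapped only once, and the copy number can be recovered from the read name afterwards.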

You can also use Biopieces for mapping with Bowtie, BWA, BLAST, etc.

Cheers,

Martin

**fkrueger** · 11-15-2010, 05:52 AM

These reads seem to have been processed with an adapter-trimming program, which would explain the varying sequence lengths.

I assume
"AAAACTGGTTGGGCTTAAAACTGC 5"

needs to be interpreted as
the sequence: "AAAACTGGTTGGGCTTAAAACTGC" was present exactly "5" times.

What we have done with formats like this is transform it to FastA format like this:

>1
AAAACTGGTTGGGCTTAAAACTGC
>2
AAAACTGGTTGGGCTTAAAACTGC
>3
AAAACTGGTTGGGCTTAAAACTGC
>4
AAAACTGGTTGGGCTTAAAACTGC
>5
AAAACTGGTTGGGCTTAAAACTGC

(to reflect the quantitative aspect)

and then map it to a genome using Bowtie or something similar.
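
A rough Python sketch of that expansion, in case it helps. It assumes the table has already been parsed into (sequence, count) pairs; the name `expand_counts` and the running read number used as the FASTA header are just illustrative choices.

```python
def expand_counts(rows):
    """Yield FASTA lines, writing each sequence once per observed copy.

    `rows` is an iterable of (sequence, count) pairs. Reads are numbered
    consecutively across the whole input, starting at 1.
    """
    read_number = 0
    for seq, count in rows:
        for _ in range(count):
            read_number += 1
            yield f">{read_number}"
            yield seq


expanded = list(expand_counts([("AAAACTGGTTGGGCTTAAAACTGC", 5)]))
print("\n".join(expanded))
```

The expanded file grows with the total read count rather than the number of distinct sequences, so for deep libraries it can be much larger than the original table.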

Good luck!

edit: doh I was late!

**vebaev** · 11-15-2010, 05:53 AM

Thanks!

I will manage to convert it to FASTA. I was wondering whether Bowtie can then take the short reads from that FASTA file for mapping, because every example I have seen uses FASTQ as input.

**fkrueger** · 11-15-2010, 06:14 AM

yes, just specify

bowtie -f <ebwt> sequence_file.fa > output.txt

This is taken from the Bowtie manual:

-f The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale.
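
For anyone wondering what a Phred score of 40 means: under the standard Phred definition, Q = -10 * log10(P), where P is the probability that a base call is wrong. A small sketch of the conversion follows; the function names are mine, not anything from Bowtie.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score to the base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert a base-call error probability back to a Phred quality score."""
    return -10 * math.log10(p)


p40 = phred_to_error_probability(40)
print(f"Q40 corresponds to an error probability of {p40:g}")
```

So with FASTA input every base is treated as very reliable, roughly one expected error in 10,000 calls.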

**Torst** · 11-15-2010, 10:56 PM

Originally posted by vebaev

AAAACTGGTTCCAGAAGTTGAGAC 1
AAAACTGGTTCTGGCAGGTAG 1
AAAACTGGTTGGGCTTAAAACTGC 5
AAAACTGGTTGTAAACGGAGGAGC 2
AAAACTGGTTTTAGATGGATAGAA 2
AAAACTGGTTTTGCACTATTGGGC 1
AAAACTGTAAAACAGGTGGTT 1

It looks like someone has SORTED the reads, and COUNTED their frequency of occurrence.

% grep -v '^>' reads.fasta | sort | uniq -c > vebaev.out
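
The same sort-and-count idea can be sketched in Python. Like the `grep` pipeline, this assumes each read sits on a single line of the FASTA file; the function name `count_reads` is made up for illustration. Note that `uniq -c` prints the count before the sequence, whereas this sketch returns (sequence, count) pairs in the column order of the quoted table.

```python
from collections import Counter


def count_reads(fasta_lines):
    """Return sorted (sequence, count) pairs for single-line FASTA records.

    Header lines (starting with '>') and blank lines are ignored.
    """
    counts = Counter(
        line.strip()
        for line in fasta_lines
        if line.strip() and not line.startswith(">")
    )
    return sorted(counts.items())


example = [
    ">r1", "AAAACTGGTTGTAAACGGAGGAGC",
    ">r2", "AAAACTGGTTCCAGAAGTTGAGAC",
    ">r3", "AAAACTGGTTGTAAACGGAGGAGC",
]
table = count_reads(example)
for seq, count in table:
    print(f"{seq}\t{count}")
```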

what is this format?