
**maasha** · 11-15-2010, 05:48 AM

My guess is that it's a plain tab-separated table with sequence and copy count.

You can use Biopieces (www.biopieces.org) to convert this table into something useful - like FASTA format:

Code:

read_tab -i test.tab -k SEQ,COUNT | add_ident -k SEQ_NAME | merge_vals -k SEQ_NAME,COUNT | write_fasta -x
>ID00000000_1
AAAACTGGTTCCAGAAGTTGAGAC
>ID00000001_1
AAAACTGGTTCTGGCAGGTAG
>ID00000002_5
AAAACTGGTTGGGCTTAAAACTGC
>ID00000003_2
AAAACTGGTTGTAAACGGAGGAGC
>ID00000004_2
AAAACTGGTTTTAGATGGATAGAA
>ID00000005_1
AAAACTGGTTTTGCACTATTGGGC
>ID00000006_1
AAAACTGTAAAACAGGTGGTT
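
If Biopieces is not installed, roughly the same conversion can be sketched in plain Python. This is not how Biopieces does it internally: the function name `table_to_fasta` is hypothetical, and the `ID` prefix with an 8-digit index is only an assumption chosen to mimic the headers above.

```python
def table_to_fasta(lines, prefix="ID"):
    """Yield FASTA lines from 'sequence count' rows.

    Each header is <prefix><8-digit row index>_<count>, mimicking the
    output shown above (the naming scheme is an assumption).
    """
    index = 0
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        seq, count = fields
        yield f">{prefix}{index:08d}_{count}"
        yield seq
        index += 1


# Usage: pass any iterable of lines, e.g. an open file handle.
rows = ["AAAACTGGTTCCAGAAGTTGAGAC\t1", "AAAACTGGTTCTGGCAGGTAG\t1"]
print("\n".join(table_to_fasta(rows)))
```

Keeping the copy count in the header keeps the file small, but the mapper then sees each distinct sequence only once.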

You can also use Biopieces for mapping with Bowtie, BWA, BLAST, etc.

Cheers,

Martin

**fkrueger** · 11-15-2010, 05:52 AM

This data seems to have been processed with an adapter-trimming program, which would explain the varying sequence lengths.

I assume
"AAAACTGGTTGGGCTTAAAACTGC 5"

needs to be interpreted as
the sequence: "AAAACTGGTTGGGCTTAAAACTGC" was present exactly "5" times.

What we have done with formats like this is transform it to FastA format like this:

>1
AAAACTGGTTGGGCTTAAAACTGC
>2
AAAACTGGTTGGGCTTAAAACTGC
>3
AAAACTGGTTGGGCTTAAAACTGC
>4
AAAACTGGTTGGGCTTAAAACTGC
>5
AAAACTGGTTGGGCTTAAAACTGC

(to reflect the quantitative aspect)

and then map it to a genome using Bowtie or something similar.
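
A minimal Python sketch of that expansion, assuming each input row is a sequence followed by an integer count separated by whitespace. The function name `expand_counts` is hypothetical, and the running-number headers simply follow the example above.

```python
def expand_counts(lines):
    """Yield FASTA lines, writing each sequence 'count' times.

    Headers are a running read number (>1, >2, ...) across the whole file.
    """
    read_number = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        seq, count = fields[0], int(fields[1])
        for _ in range(count):
            read_number += 1
            yield f">{read_number}"
            yield seq


# Usage: one row with a count of 5 becomes five FASTA records.
print("\n".join(expand_counts(["AAAACTGGTTGGGCTTAAAACTGC 5"])))
```

The expanded file grows with the total read count, so for deeply sequenced libraries it can become much larger than the original table.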

Good luck!

edit: doh I was late!

**vebaev** · 11-15-2010, 05:53 AM

Thanks!

I will manage to convert it to FASTA. I was wondering whether Bowtie can then take the short reads from this FASTA file for mapping, because every time I have seen it used the input was FASTQ.

**fkrueger** · 11-15-2010, 06:14 AM

yes, just specify

bowtie -f sequence_file.fa > output.txt

This is taken from the Bowtie manual:

-f The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale
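
With `-f` no conversion is needed, but to illustrate what "quality 40" means in FASTQ terms, here is a small Python sketch that writes each FASTA record as FASTQ with a constant quality string. It assumes Phred+33 encoding (where quality 40 is the character `I`) and one sequence line per record; the function name `fasta_to_fastq` is hypothetical.

```python
def fasta_to_fastq(fasta_lines, phred=40, offset=33):
    """Yield FASTQ lines with a constant quality value for every base.

    Assumes single-line FASTA records and Phred+33 encoding by default.
    """
    quality_char = chr(phred + offset)  # 40 + 33 = 73, i.e. 'I'
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]  # remember the header for the next sequence
        else:
            yield f"@{name}"
            yield line
            yield "+"
            yield quality_char * len(line)


print("\n".join(fasta_to_fastq([">1", "AAAACTGTAAAACAGGTGGTT"])))
```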

**Torst** · 11-15-2010, 10:56 PM

Originally posted by vebaev

AAAACTGGTTCCAGAAGTTGAGAC 1
AAAACTGGTTCTGGCAGGTAG 1
AAAACTGGTTGGGCTTAAAACTGC 5
AAAACTGGTTGTAAACGGAGGAGC 2
AAAACTGGTTTTAGATGGATAGAA 2
AAAACTGGTTTTGCACTATTGGGC 1
AAAACTGTAAAACAGGTGGTT 1

It looks like someone has SORTED the reads, and COUNTED their frequency of occurrence.

% grep -v '^>' reads.fasta | sort | uniq -c > vebaev.out
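
Note that `uniq -c` prints the count before the sequence, whereas the quoted table has the sequence first. A rough Python equivalent of the pipeline that writes the columns in the quoted order is sketched below. It assumes one sequence line per FASTA record (as the `grep` pipeline does), and the function name `collapse_reads` is hypothetical.

```python
from collections import Counter


def collapse_reads(fasta_lines):
    """Return sorted 'sequence<TAB>count' rows from single-line FASTA input."""
    counts = Counter(
        line.strip()
        for line in fasta_lines
        if line.strip() and not line.startswith(">")  # drop headers and blanks
    )
    # Sort by sequence so the output order resembles the quoted table.
    return [f"{seq}\t{n}" for seq, n in sorted(counts.items())]


reads = [">r1", "AAAACTGGTTGTAAACGGAGGAGC", ">r2", "AAAACTGGTTCTGGCAGGTAG",
         ">r3", "AAAACTGGTTGTAAACGGAGGAGC"]
print("\n".join(collapse_reads(reads)))
```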


what is this format?
