**Brian Bushnell** · 01-22-2015, 02:36 PM

I don't know what that format is. I suggest you strip out the first column only (sequence), reformat it as fasta, and remap it to produce a proper sam file with things like cigar strings and flags.
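As a rough illustration of the "strip out the first column and reformat it as fasta" step, here is a minimal Python sketch. It assumes the file is whitespace- or tab-delimited with the read sequence in the first column, and that rows whose first field is not plain ACGTN (for example a header row) can be skipped. The function name `first_column_to_fasta` and the numeric read names are made up for this example; adjust them to the actual file.

```python
def first_column_to_fasta(lines):
    """Turn the first column of each row into a numbered FASTA record.

    Assumption: columns are separated by whitespace and column 1 is the
    read sequence. Rows that are empty or whose first field contains
    characters other than A, C, G, T, N are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        seq = fields[0]
        if not set(seq.upper()) <= set("ACGTN"):
            continue  # header or comment row, not a sequence
        # Reads are simply named 1, 2, 3, ... in order of appearance.
        records.append(f">{len(records) + 1}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names):
    #   python to_fasta.py < processed.txt > reads.fasta
    sys.stdout.write(first_column_to_fasta(sys.stdin))
```

The output can then be passed to a mapper as the reads file.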

**MaximusPrime** · 01-22-2015, 02:44 PM

Thanks Brian. Sadly, I'm a novice -- I can strip out the sequence data and reformat it, but I wouldn't know where to go from there.

I've been reading up on the various programs/tools available, though. Do you have any beginner's guides for NGS data processing that you're a particular fan of?

**Brian Bushnell** · 01-22-2015, 02:57 PM

Unfortunately, no, but you can use BBMap to get the sequence into a mapped, sorted, indexed bam file, which is what IGV needs:

bbmap.sh nodisk ref=gene.fasta in=reads.fasta out=mapped.sam bs=sort.sh
sh sort.sh

The first command will map and create a sam file and a shell script. The second command will run the shell script, which uses samtools to transform the sam file to a sorted, indexed bam file. I added that option because I use IGV a lot.

The BBTools package contains a lot of NGS data processing tools, but unfortunately there's no beginner's guide - I should write one.
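For readers who want to see roughly what the sam-to-sorted-indexed-bam conversion involves, here is a hedged Python sketch that builds the equivalent samtools commands. This is not the script BBMap generates; it assumes a samtools 1.x installation (where `samtools sort -o` accepts sam input), and the helper name and the `_sorted.bam` naming are chosen only for this example.

```python
from pathlib import Path


def sam_to_sorted_bam_commands(sam_path):
    """Return samtools command lines that sort and index a sam file.

    Assumption: samtools 1.x syntax. Older 0.1.x releases use a
    different 'sort' syntax, so check 'samtools sort' help first.
    """
    sam = Path(sam_path)
    bam = sam.with_name(sam.stem + "_sorted.bam")
    return [
        # Sort by coordinate and write bam in one step.
        ["samtools", "sort", "-o", str(bam), str(sam)],
        # Create the .bai index next to the sorted bam, as IGV expects.
        ["samtools", "index", str(bam)],
    ]


if __name__ == "__main__":
    import subprocess

    for cmd in sam_to_sorted_bam_commands("mapped.sam"):
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)  # requires samtools on PATH
```

IGV would then be pointed at the sorted bam file, with the index file kept in the same directory.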

FYI, a correctly formatted fasta file will look like this:

>1
ACGTTTCG
TTTGGGGGGG
>2
AAATTT

...etc. It needs to alternate between headers, which start with ">", and sequence, which can span multiple lines, but doesn't have to.
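To make the alternation rule concrete, here is a small Python sketch that reads fasta text and joins multi-line sequences under each header. It is only an illustration of the layout described above, not a full parser; the function name `read_fasta` is made up for this example, and treating a header with no sequence as an error is an assumption.

```python
def read_fasta(lines):
    """Parse fasta lines into a list of (header, sequence) tuples.

    Headers start with '>'; every following non-header line is
    appended to the current record's sequence. Blank lines are ignored.
    """
    records = []
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                if not chunks:
                    raise ValueError(f"header '{header}' has no sequence")
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence found before the first header")
            chunks.append(line)
    # Flush the final record.
    if header is not None:
        if not chunks:
            raise ValueError(f"header '{header}' has no sequence")
        records.append((header, "".join(chunks)))
    return records


example = [">1", "ACGTTTCG", "TTTGGGGGGG", ">2", "AAATTT"]
for name, seq in read_fasta(example):
    print(name, len(seq))
```

Running it on a converted file is a quick way to confirm that every header is followed by sequence before handing the file to a mapper.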

**GenoMax** · 01-22-2015, 03:35 PM

@MaximusPrime: You appear to have downloaded the wrong files. What you have appears to be some kind of processed data that is provided on the page for the samples (e.g. http://www.ncbi.nlm.nih.gov/geo/quer...?acc=GSM346111).

The best solution here is to use the sratoolkit to download the fastq files directly. Here is an example of how to do this: http://seqanswers.com/forums/showpos...36&postcount=7

You can find sratoolkit binaries here: http://www.ncbi.nlm.nih.gov/Traces/s...?view=software

**MaximusPrime** · 01-22-2015, 04:15 PM

Fantastic, thank you.

I'm sure I'll be back if I run into any trouble.

Identifying the format of mysterious files
