
**Brian Bushnell** · 01-22-2015, 02:36 PM

I don't know what that format is. I suggest you strip out the first column only (sequence), reformat it as fasta, and remap it to produce a proper sam file with things like cigar strings and flags.
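A minimal sketch of that first step, assuming the file is whitespace-delimited with the sequence in column one and no header row. The thread never identifies the actual format, so this layout is a guess, and `processed.txt` and `reads.fasta` are hypothetical file names:

```shell
# Hypothetical sample input: sequence in column 1, other fields after it.
printf 'ACGTTTCG\t12\t+\nAAATTT\t3\t-\n' > processed.txt

# Keep only column 1 and give each sequence a running numeric FASTA header.
# Blank lines are skipped so the numbering has no gaps.
awk 'NF > 0 { n++; print ">" n; print $1 }' processed.txt > reads.fasta

cat reads.fasta
```

If the real file has a header row or uses a different delimiter, the `awk` condition and field separator would need adjusting.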

**MaximusPrime** · 01-22-2015, 02:44 PM

Thanks Brian. Sadly, I'm a novice -- I can strip out the sequence data and reformat it, but I wouldn't know where to go from there.

I've been reading up on the various programs/tools available, though. Do you have any beginner's guides for NGS data processing that you're a particular fan of?

**Brian Bushnell** · 01-22-2015, 02:57 PM

Unfortunately, no, but you can use BBMap to get the sequence into a mapped, sorted, indexed bam file, which is what IGV needs:

bbmap.sh nodisk ref=gene.fasta in=reads.fasta out=mapped.sam bs=sort.sh
sh sort.sh

The first command maps the reads and writes both a sam file and a shell script. The second command runs that shell script, which uses samtools to convert the sam file into a sorted, indexed bam file. I added that option because I use IGV a lot.
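For readers who want to see what the conversion step amounts to, here is a rough hand-written equivalent. It is a sketch, not the contents of the generated script, which may differ. It assumes samtools 1.3 or later is on the PATH, and the function name `sam_to_sorted_bam` and the `_sorted.bam` suffix are made up for illustration:

```shell
# Sketch only: assumes samtools 1.3+; the script BBMap writes may differ.
sam_to_sorted_bam() {
    sam="$1"
    bam="${sam%.sam}_sorted.bam"          # hypothetical output naming
    # Sort by coordinate and write bam, then build the .bai index IGV needs.
    samtools sort -o "$bam" "$sam" && samtools index "$bam"
    printf '%s\n' "$bam"
}

# Run only when samtools and the input file are actually present.
if command -v samtools >/dev/null 2>&1 && [ -f mapped.sam ]; then
    sam_to_sorted_bam mapped.sam
fi
```

IGV would then load the sorted bam, with the index file sitting next to it in the same directory.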

The BBTools package contains a lot of NGS data processing tools, but unfortunately there's no beginner's guide - I should write one.

FYI, a correctly formatted fasta file will look like this:

>1
ACGTTTCG
TTTGGGGGGG
>2
AAATTT

...etc. It needs to alternate between headers, which start with ">", and sequence, which can span multiple lines, but doesn't have to.
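The rule above can be turned into a quick sanity check before mapping. This is a minimal sketch that only tests the header/sequence alternation described here (it does not validate the sequence characters); `check_fasta` and `example.fasta` are hypothetical names:

```shell
# Sample file copied from the example above.
printf '>1\nACGTTTCG\nTTTGGGGGGG\n>2\nAAATTT\n' > example.fasta

# Exit status 0 if every header is followed by at least one sequence line
# and no sequence appears before the first header; 1 otherwise.
check_fasta() {
    awk '
        /^>/ { if (headers && !seen_seq) bad = 1; headers++; seen_seq = 0; next }
        NF   { if (!headers) bad = 1; seen_seq = 1 }
        END  { if (!headers || !seen_seq) bad = 1; exit bad + 0 }
    ' "$1"
}

check_fasta example.fasta && echo "looks like FASTA"
```

A file that starts with sequence, or has a header with nothing after it, makes the function return a non-zero status.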

**GenoMax** · 01-22-2015, 03:35 PM

@MaximusPrime: You appear to have downloaded the wrong files. What you have looks like some kind of processed data that is provided on the GEO page for each sample (e.g. http://www.ncbi.nlm.nih.gov/geo/quer...?acc=GSM346111).

The best solution here is to use the sratoolkit to download the fastq files directly. Here is an example of how to do this: http://seqanswers.com/forums/showpos...36&postcount=7

You can find sratoolkit binaries here: http://www.ncbi.nlm.nih.gov/Traces/s...?view=software

**MaximusPrime** · 01-22-2015, 04:15 PM

Fantastic, thank you.

I'm sure I'll be back if I run into any trouble.

Identifying the format of mysterious files