
**nilshomer** · 02-26-2010, 02:05 PM

Originally posted by veena:

Hi,

I downloaded a bam file from NCBI and am unable to index it. Here is what I've done:
samtools index file.bam

Error message:
[bam_header_read] EOF marker is absent.

I haven't found anything in the thread archives about this error with a BAM file, so I'm pretty sure I'm doing something fundamentally wrong. I'm a samtools and BAM file newbie, so any help would be much appreciated!

Thanks!

You're fine, as it is just a warning. Newer versions of samtools and Picard append an EOF marker when writing a BAM file. Earlier BAMs did not have one.
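For anyone who wants to check a file directly, here is a minimal sketch, assuming the marker is the 28-byte empty BGZF block given in the SAM/BAM specification. The function name `has_bgzf_eof` is made up for illustration and is not part of samtools or Picard.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines as
# the end-of-file marker (assumption: the file follows that specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them to the marker.
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof("file.bam")` returning False would match the warning above: the file may be an older BAM written without the marker, although a truncated download would look the same to this check.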

Nils

**veena** · 03-03-2010, 02:12 PM

Thanks Nils!
Another newbie question: I'm trying to get the subset of reads from publicly available unmapped data that align to my sequence of interest.

I'm told that a read's sequence should be available in a BAM file. But isn't a BAM file by definition an alignment file (as in an aligned-to-something file) to begin with? Can I run another alignment program (say BLAST) on a pre-existing BAM file with a completely different query? Very confused and would appreciate any help!

**veena** · 03-03-2010, 02:15 PM

Also, any help on how to run BLAST on a BAM file would be much appreciated!

**nilshomer** · 03-03-2010, 02:20 PM


The SAM format has support for reads that are not aligned. For example, if one end of a paired end read does not map, it can be flagged as unmapped and given the co-ordinate of the other end. I would study the SAM spec carefully. By filtering on the FLAG field, you can pull out reads that are unmapped (assuming that the aligner was kind enough to include unmapped reads).
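A minimal sketch of that FLAG filter, assuming the input is SAM text (for example the output of `samtools view`) and that bit 0x4 of the FLAG field marks an unmapped read, as the SAM spec defines it. The function name `unmapped_records` is hypothetical.

```python
UNMAPPED = 0x4  # FLAG bit from the SAM spec: this read is unmapped


def unmapped_records(sam_lines):
    """Yield SAM alignment lines whose FLAG has the 'unmapped' bit set.

    Hypothetical helper; `sam_lines` is any iterable of SAM text lines.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record (11 mandatory fields)
        if int(fields[1]) & UNMAPPED:
            yield line
```

Passing `sys.stdin` as `sam_lines` and piping in `samtools view file.bam` would be one way to use it. The same filter is available in samtools itself as `samtools view -f 4 file.bam`.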

To run BLAST on a BAM file, you would have to convert the BAM file into whatever format (FASTA?) BLAST requires. This can be done with a quick script or by bugging your local bioinformatician.

**veena** · 03-03-2010, 07:13 PM

Thanks so much again, Nils! The scary thought is that I'm the "local bioinformatician", and I've googled my fingers silly trying to figure out how to get a FASTA (that's all I really need!) from the publicly available .bam file. Nobody else around me cares to work with .bam files (yet). Is it best to convert from BAM to SAM and then format the read name and sequence into FASTA? Or is there a better way?

**nilshomer** · 03-03-2010, 07:27 PM


Look at Picard's SamToFastq.jar. That will get you to FASTQ, and from there it is smooth sailing to FASTA. Alternatively, you can use one of the many APIs (Perl, Python, C, Java, etc.) to read SAM/BAM natively. I have personally used all of them successfully.
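For the FASTQ-to-FASTA step, a minimal sketch, assuming plain four-line FASTQ records (header, sequence, `+` line, qualities) with no line wrapping. The function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines (header, then sequence) from four-line FASTQ records.

    Hypothetical helper; assumes each record is exactly four lines.
    """
    lines = iter(fastq_lines)
    for header in lines:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of record: " + header)
        seq = next(lines, None)
        plus = next(lines, None)
        qual = next(lines, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        # Keep the read name, drop the '+' line and the qualities.
        yield ">" + header[1:]
        yield seq.rstrip("\n")
```

Writing the result out is then a matter of printing each yielded line, for example `for line in fastq_to_fasta(open("reads.fastq")): print(line)`, where `reads.fastq` stands in for whatever file SamToFastq produced.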

**veena** · 03-03-2010, 08:21 PM

Thanks Nils, I'll give it a try!

**krobison** · 03-03-2010, 08:25 PM

Using Picard's tool is probably better, but it's worth studying the line below as an example of a very quick-and-dirty SAM-to-FASTA generator:

Code:

samtools view myalign.bam | perl -n -e 'unless (/^\@/) { @f=split(/\t/); print ">$f[0]|$f[1] $f[2]:$f[3]\n$f[9]\n"; }'

I used the flag field to disambiguate the two ends of a read

(any bugs were clearly deliberate attempts to educate the student! :-)
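The same idea as a Python sketch, offered as one possible variant rather than a drop-in replacement for the one-liner above. It assumes SAM text on input, uses FLAG bits 0x40 and 0x80 to tag the two ends of a pair, and reverse-complements reads flagged 0x10, since SAM stores reverse-strand reads reverse-complemented. The name `sam_to_fasta` and the `/1` and `/2` suffixes are choices made for illustration.

```python
# Translation table for complementing DNA; other characters pass through.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fasta(sam_lines):
    """Yield one FASTA record (as a string) per SAM alignment line.

    Hypothetical helper; `sam_lines` is any iterable of SAM text lines.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # incomplete record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if seq == "*":
            continue  # no sequence stored for this record
        if flag & 0x10:
            # Reverse-strand alignment: restore the original read orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
        if flag & 0x40:
            name += "/1"  # first read of the pair
        elif flag & 0x80:
            name += "/2"  # second read of the pair
        yield ">" + name + "\n" + seq + "\n"
```

Note that a read with secondary or supplementary alignments would appear more than once in the output; filtering those is left out of this sketch.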

**veena** · 03-04-2010, 04:52 AM

That's what I get for not reading the manual well enough.

Thanks, krobison! And disclaimer duly noted!


Error indexing BAM file using samtools
