
**qtrinh** · 03-25-2009, 11:58 AM

Hi Heng,
I did do "samtools sort" first. Here is what I did:

samtools import homo_sapiens.fasta.fai S1.sam S1.bam

samtools sort S1.bam S1_sorted

samtools pileup -f homo_sapiens.fasta -c S1_sorted.bam
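In the import step above, `samtools import` uses the `.fai` file only as a list of reference names and lengths. Those are the first two tab-separated columns; the remaining columns hold the byte offset and line layout used for random access. Here is a minimal Python sketch for checking what the `.fai` provides, assuming a well-formed index. The function name `read_fai_lengths` is hypothetical.

```python
def read_fai_lengths(path):
    """Return {reference name: length} from a samtools .fai index.

    Assumes the standard tab-separated layout: name, length, byte offset,
    bases per line, bytes per line. Only the first two columns are used.
    """
    lengths = {}
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            name, length = line.rstrip("\n").split("\t")[:2]
            lengths[name] = int(length)
    return lengths
```

For example, `read_fai_lengths("homo_sapiens.fasta.fai")` lists the sequences that the BAM header will describe.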

Q

**thondeboer** · 04-08-2009, 09:49 AM

Is there a running list somewhere that shows which programs produce (or take as input) SAM/BAM format? It would help us make a final decision on the format in which to produce the results of the mapping we do here at Complete Genomics.

The negative gap structure will probably be handled with special tags (GS, GQ and GC), so software that wants to make optimal use of all the data should use those tags, but most software should work "out of the box" if it supports SAM/BAM...
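To illustrate how software can pick up extra tags like these, here is a minimal Python sketch that splits the optional `TAG:TYPE:VALUE` fields of a SAM line (everything after the 11 mandatory columns) into a dictionary. It converts only the `i` (integer) and `f` (float) types and leaves everything else as a string. It does not interpret GS, GQ or GC, whose exact definitions are not given here. The function name `parse_optional_fields` is hypothetical.

```python
def parse_optional_fields(sam_line):
    """Map optional-field tags of one SAM alignment line to their values.

    Columns 12 onward have the form TAG:TYPE:VALUE. Integer ('i') and
    float ('f') values are converted; other types stay as strings.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # maxsplit=2 keeps any ':' inside string values intact
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags
```

A tool that does not recognize a tag can simply ignore its key, which is why SAM-aware software should keep working when new tags appear.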

**apfejes** · 04-08-2009, 10:23 AM

On the subject of tools that use SAMtools, I'm very interested in adding support to my project, which is based in Java. I'm aware that there exist Java-based tools for reading and writing this format, but I'm unable to find any documentation for the software. Has anyone come across any information on how to use the Java SAM tools code?

**lh3** · 04-09-2009, 03:28 AM

To thondeboer: currently BWA natively generates alignments in the SAM format. BFAST also generates SAM. We also provide converters for SOAP, Bowtie, Export, novoalign and even BLAST. However, most of these converters are incomplete: sometimes they cannot convert all of the information because of a lack of documentation, especially for short indels. As far as I know, every aligner generates its own format. SAM is probably the first effort to unify the alignment format, in particular for alignments of the new sequencing data.

To apfejes: I am not able to comment much on the Java implementation. I know the I/O part is complete and actually does more nice things than the C version of samtools. You may send an email to the mailing list to ask for documentation.

**TylerBackman** · 04-15-2009, 03:29 PM

This is an excellent and very exciting idea. With a standard alignment format (SAM) and a standard raw read format (Phred/Sanger fastq), we can drastically reduce the time most of us spend writing our own file format parsers and converters, and eliminate a common source of error in data analysis (incorrect parsing).

It's great to see the bioinformatics community coming together in this way.

To anyone developing alignment tools: Please include support for this format in future versions of your software!

**gcrdb** · 04-17-2009, 02:05 PM

Can SAMtools convert SAM back to MAQ?

Hi lh3,
I am glad that SAMtools can do maq2sam, but would it be easy to do sam2maq?
The reason I ask is that I want to use MAQ to call SNPs and indels. BWA cannot do that (yet).

thanks
g

**lh3** · 04-18-2009, 01:07 AM

SAMtools has a SNP caller based on the same code as MAQ. See this page for more information: http://samtools.sourceforge.net/cns0.shtml. What is missing in SAMtools is a SNP filtration script like "maq.pl SNPfilter", but for now it is easy to write your own.

SAMtools' indel caller uses a different algorithm. It outperforms MAQ.
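As an example of the kind of filter meant here, this is a minimal Python sketch that assumes tab-separated `samtools pileup -c` output. In that output, columns 3 to 8 are reference base, consensus base, consensus quality, SNP quality, mapping quality and read depth, and indel calls appear on extra lines with `*` as the reference base. The thresholds and the name `filter_snps` are hypothetical. They are illustrative defaults, not the criteria used by "maq.pl SNPfilter".

```python
def filter_snps(lines, min_snp_qual=20, min_depth=3, max_depth=100):
    """Yield consensus-pileup lines that look like confident SNPs.

    Assumes `samtools pileup -c` columns: chrom, pos, ref base, consensus
    base, consensus quality, SNP quality, mapping quality, read depth, ...
    Thresholds are illustrative only.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a consensus line
        ref, cons = cols[2].upper(), cols[3].upper()
        if ref == "*":
            continue  # indel-call line, not a SNP
        if cons == ref or cons == "N":
            continue  # no variant called at this site
        snp_qual, depth = int(cols[5]), int(cols[7])
        # drop low-quality calls and sites with too little or too much coverage
        if snp_qual >= min_snp_qual and min_depth <= depth <= max_depth:
            yield line
```

The maximum-depth cut reflects a common heuristic: unusually high coverage often indicates repeats or collapsed duplications rather than real variants.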

**gcrdb** · 04-20-2009, 10:45 AM

lh3,
Actually, I tried SAMtools before, but somehow "pileup" is not outputting anything, so I am thinking of going back to MAQ.
Did I run the program correctly? (I used the example files that come with the samtools package.)
examples> ../samtools pileup -f ex1.fa ex1.sam
--> return nothing
examples> ../samtools pileup ex1.sam
--> return nothing

thanks again!
g

**lh3** · 04-20-2009, 11:03 AM

It should be either

../samtools pileup -t ex1.fa.fai ex1.sam

or

../samtools pileup ex1.bam

I have added a Makefile.

Note that there is a companion format called BAM which is the binary representation of SAM. Most of samtools commands work on BAM only. I know having two formats is a bit confusing, but this is necessary for faster parsing.
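To make "binary representation" concrete, here is a minimal Python sketch that reads only the header of a BAM file, assuming the layout in the SAM/BAM specification. BAM is BGZF-compressed, and BGZF is a series of ordinary gzip members, so Python's `gzip` module can decompress it. The uncompressed stream begins with the magic bytes `BAM` plus `0x01`, then the SAM header text, then the reference names and lengths as little-endian integers. The sketch does not decode alignment records, and the name `read_bam_header` is hypothetical.

```python
import gzip
import struct


def read_bam_header(path):
    """Return (header_text, [(ref_name, ref_length), ...]) from a BAM file.

    BGZF blocks are gzip members, so gzip.open can stream them. The
    uncompressed data starts with the magic bytes 'BAM' plus 0x01.
    """
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", fh.read(4))  # length of header text
        text = fh.read(l_text).decode("utf-8").rstrip("\x00")
        (n_ref,) = struct.unpack("<i", fh.read(4))  # number of references
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            name = fh.read(l_name).rstrip(b"\x00").decode("utf-8")
            (l_ref,) = struct.unpack("<i", fh.read(4))
            refs.append((name, l_ref))
    return text, refs
```

Fixed-width binary integers like these are much faster to parse than tab-separated text, which is the motivation for BAM.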

**gcrdb** · 04-20-2009, 03:46 PM

lh3,
Thanks for quick response, pileup is working now!
Here are some questions about the pileup format from my first look at the output:
(1) What is a "*" in the read bases? It is not documented in "http://samtools.sourceforge.net/pileup.shtml".
(2) Is it okay for the same position to appear twice in the pileup? (chr1 1949878 occurs twice in my first pileup output.)
thanks,
Below is the piece of pileup output where I found the problem:
chr1 1949878 A A 142 0 60 55 C$....,,,...,,.C..,.,,..,...,,,.,.,.+1C,,,,..,.,........,^F
,^], &5,2IIII5I<II+%5=I+II(II8@CII3*I0I+IIII,I$@IIAAIIDI@I*5
chr1 1949878 * */+C 38 38 * +C 13 4 30 8
chr1 1949879 A A 150 0 60 54 ....,G,...,,.-1G..-1G.,.,,..,...,,,.,.,.,,,,..,.,........,,
, II*IIII;I.II&(&1I3II%9III:II7&I&@&IIII5I"6IIIIG.I33I$6
chr1 1949879 * -G/* 481 481 -G * 6 9 30 9
chr1 1949880 G G 25 0 60 55 .$A$..,A,..A,,*.*A,A,,A.,A..,,,.,A,.,,,,.+1A.,.,........A,,
^], +(,.III)8-II8%D.I0II#,I@5III.$I+I,IIII$I2EIIIIIIIIIE&?/
chr1 1949880 * */+A 350 350 * +A 24 3 19 9
chr1 1949881 A A 162 0 60 53 ..,,,...,,....,.,,..,...,,,.,.,.,,,,..,.,........G,,, 6DI
II3I$II81D)I%II+.III'III$I2I+IIII+II<II46IIAHI.2IB

**lh3** · 04-21-2009, 11:37 AM

You are invoking pileup with "-c", so you should also read this page:

Consensus/Indel Calling

http://samtools.sourceforge.net/cns0.shtml

A read base "*" means a deletion. The second line at "chr1 1949878" shows an indel call. Strictly speaking, this is not part of the pileup.
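A parser of `pileup -c` output therefore has to expect two lines for some positions. Here is a minimal Python sketch that separates the two kinds of line. It assumes tab-separated output in which indel-call lines carry `*` in the third (reference base) column, as in the example above. The name `split_pileup_lines` is hypothetical.

```python
def split_pileup_lines(lines):
    """Separate per-base pileup lines from indel-call lines.

    Assumes tab-separated `samtools pileup -c` output, where an indel call
    is an extra line for the same position with '*' in the third column.
    Returns two lists of column lists: (base_lines, indel_lines).
    """
    base_lines, indel_lines = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        if cols[2] == "*":
            indel_lines.append(cols)
        else:
            base_lines.append(cols)
    return base_lines, indel_lines
```

Keeping the indel calls in their own list avoids counting a position twice when tallying per-base statistics.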

**nilshomer** · 04-23-2009, 11:43 PM

I have a working patch for viewing ABI SOLiD color space data in the samtools (http://samtools.sourceforge.net/) text viewer. For example, here are some of the features when viewing output from BFAST (in SAM format, which includes the "CS" and "CQ" tags):

- option to display colors instead of nucleotides.
- option to color bases/colors by color, analogous to coloring bases by the base itself.
- option to color bases/colors based on color quality.
- the "." (dot) option, when displaying color space, shows only those colors that were corrected during alignment (i.e. the color errors).
- option to remove all insertions in the current display (in some regions, spurious insertions can cause a headache when viewing that region).
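For readers new to color space: with SOLiD 2-base encoding, each color equals the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). Here is a minimal Python sketch of decoding a "CS"-style string. It assumes the value is the primer base followed by color digits, with "." marking a missing color. This is not code from the patch, and the name `decode_colors` is hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def decode_colors(cs):
    """Translate a color-space string (primer base + color digits) to bases.

    Each color is the XOR of the 2-bit codes of two adjacent bases, so the
    next base is the previous code XOR the color. The primer base is not
    returned. Decoding is sequential, so after a missing color ('.')
    every remaining base is reported as 'N'.
    """
    current = BASE_CODE[cs[0].upper()]
    bases = []
    known = True
    for color in cs[1:]:
        if known and color in "0123":
            current ^= int(color)
            bases.append(CODE_BASE[current])
        else:
            known = False
            bases.append("N")
    return "".join(bases)
```

For example, `decode_colors("T0123")` gives `"TGAT"`. The sequential dependency is why a single color error changes every downstream base, and why viewing the raw colors is useful.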

PM me and I can supply you with a source version.

**emucaki** · 04-27-2009, 01:12 PM

Hi, I'm a novice geneticist interested in using the 1000 Genomes Project data available at NCBI, and I can't quite figure out how to obtain sequence information from the BAM file; the SAMtools website is of little help. Does anyone know a good place to get information for this kind of work?

(Off-topic: does anyone know why the 1000 Genomes Project site has a login but no registration option?)

**lh3** · 04-28-2009, 11:58 AM

The first thing you may want to try is:

samtools view -h aln.bam | less -S
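To go from that text view to sequences in a script, here is a minimal Python sketch that reads SAM lines (for example piped from `samtools view`). It skips header lines starting with "@" and takes the mandatory columns QNAME (1), RNAME (3), POS (4) and SEQ (10). Note that SAM stores SEQ on the forward reference strand, so reads aligned to the reverse strand appear reverse-complemented relative to the original read. The name `iter_reads` is hypothetical.

```python
def iter_reads(sam_lines):
    """Yield (read name, reference, 1-based position, sequence) per alignment.

    Works on SAM text such as the output of `samtools view -h aln.bam`.
    Header lines (starting with '@') and blank lines are skipped.
    """
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[2], int(cols[3]), cols[9]
```

In a script, `iter_reads(sys.stdin)` can consume `samtools view aln.bam | python script.py` directly.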

**mhc** · 05-01-2009, 08:51 AM

Understanding samtools pileup output

Hi,

I'm having trouble parsing the samtools pileup output. In the example below, at position 60, I have 108 reads. As I understand it, 8 reads terminate (since there are 8 '$'s), and there are 2 new reads (marked by the '^') on the next line.
So the next line (position 61) should have 108-8+2=102 reads.
Instead, it has 99.
What am I missing here?
This is the 40th line of input, with 40 bp reads, and it is the first instance where '$' appears. Other lines seem to work out fine.

seq1 60 a 108 .$,$,$,$,$g$....ggt*.G,g,,.,,+2tt,+3agcG.,+4atgc,+4ttgcg.c,c.$.,,,..$,+4aggat+6ccgttt,..,tt,.,,..,.
+7CTGCCTG,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,^].^],^], CBB=ABBA>BBCBB7BB<BBBBCBBBBBCBB@BB@BBCB:CBBAACC>ABCBBBBBBBBBCC9BBABB@B
B<BBB7CBBBBABBBBBCBBBBB@BB;BBBC@CBCBCB
seq1 61 g 99 A$.$.$.$t$c$t$,..,,a,A,,$..,.a,,.,+4gcag,,.,,$,A.,,+7ctgtttg,t$A,a..,.,.,.,,.,,..,.,..,.,.,,.,,..,,
.,.,.,.,.,,.,.,.,.,,.,,^].^]t BCAABB@B1BB<BBBBBBBBBBBBBACBB@BBABBBBBBBBBCBBCCBBBBBBBBCCBBB6BBBBBBBBBBBABCBBBBBBBBBBBBCAC?BCCBCBB@
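Counting these markers by hand is error-prone, partly because the character after '^' is a mapping quality and can itself be '$', '+' or a digit. Here is a minimal Python sketch that tallies one read-bases column. It assumes well-formed input: '^' followed by one quality character, '$' for a read end, '+N'/'-N' followed by exactly N indel bases, and every other character counted as one read at the position. The line breaks introduced by the forum would need to be removed first. The name `scan_read_bases` is hypothetical.

```python
def scan_read_bases(bases):
    """Count (depth, read starts, read ends) in a pileup read-bases string.

    Assumed markup: '^' starts a read and is followed by one mapping-quality
    character, '$' ends a read, '+N'/'-N' is followed by N inserted/deleted
    bases, and every other character ('.', ',', a base letter or '*') is one
    read covering the position.
    """
    depth = starts = ends = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            starts += 1
            i += 2  # skip '^' and the mapping-quality character
        elif c == "$":
            ends += 1
            i += 1
        elif c in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])  # skip the length and the indel bases
        else:
            depth += 1
            i += 1
    return depth, starts, ends
```

Comparing `depth` with the depth column on each line is a quick consistency check. The sketch counts a read on the line where its '^' appears and on the line where its '$' appears.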

