SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
SOAP alignment format convert to SAM/BAM KevinLam Bioinformatics 31 01-10-2018 08:05 PM
SAM/BAM format to wiggle format pinki999 Bioinformatics 19 08-12-2015 12:35 AM
SAM to CUFFLINKS SAM format repinementer Bioinformatics 4 03-15-2012 08:53 AM
Looking process to convert gff3 format into ace format or sam format andylai Bioinformatics 1 05-17-2011 02:09 AM
anyone help me on bowtie format -> sam format! tninja Bioinformatics 2 04-25-2010 09:33 PM

Reply
 
Thread Tools
Old 04-21-2009, 11:37 AM   #41
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

You are invoking pileup with "-c" and you should also read this page:

http://samtools.sourceforge.net/cns0.shtml

A read base "*" means a deletion. The second line at "chr1 1949878" shows indel call. In principle this is not part of pileup.
lh3 is offline   Reply With Quote
Old 04-23-2009, 11:43 PM   #42
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

I have a working patch to view ABI SOLiD color space using samtools (http://samtools.sourceforge.net/) text viewer. For example, some of the features using output from BFAST (in SAM format), which includes the "CS" and "CQ" tags, are:

- option to display colors instead of nucleotides.
- option to color bases/colors based on color. This is similar if you want to color bases based on the given base.
- option to color bases/colors based on color quality.
- the "." (dot) option when displaying color space will only show those colors that were corrected during alignment (i.e. the color errors).
- option to remove all insertions in the current display (in some regions, spurious insertions can cause a headache when viewing that region).

PM me and I can supply you with a source version.
nilshomer is offline   Reply With Quote
Old 04-27-2009, 01:12 PM   #43
emucaki
Member
 
Location: .

Join Date: Apr 2009
Posts: 12
Default

Hi, I'm a novice geneticist who is interested in using the 1000 Genome project data available on NCBI and I can't quite figure how to obtain sequence information from the BAM file, SAMTools' website is little help. I am wondering if anyone knows a good place to get information for this kind of work.

(Offtopic, anyone know why the 1000 Genome project has a log-in but no register option?)
emucaki is offline   Reply With Quote
Old 04-28-2009, 11:58 AM   #44
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

The first thing you may want to try is:

samtools view -h aln.bam | less -S
lh3 is offline   Reply With Quote
Old 05-01-2009, 08:51 AM   #45
mhc
Junior Member
 
Location: Boston

Join Date: Jun 2008
Posts: 2
Default Understanding samtools pileup output

Hi,

I'm having trouble trying to parse the samtools output. In the example below, at position 60, I have 108 reads. As I understand it, 8 reads terminate (since there are 8 '$'s), and there are 2 new reads (marked by the '^') on the next line.
So the next line - line 61 - should have 108-8+2=102 reads.
Instead, it has 99.
What am I missing here?
This is the 40th line of input with 40bp reads, and this is the first instance where '$' appears. Other lines seem to work out fine.


seq1 60 a 108 .$,$,$,$,$g$....ggt*.G,g,,.,,+2tt,+3agcG.,+4atgc,+4ttgcg.c,c.$.,,,..$,+4aggat+6ccgttt,..,tt,.,,..,.
+7CTGCCTG,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,^].^],^], CBB=ABBA>BBCBB7BB<BBBBCBBBBBCBB@BB@BBCB:CBBAACC>ABCBBBBBBBBBCC9BBABB@B
B<BBB7CBBBBABBBBBCBBBBB@BB;BBBC@CBCBCB
seq1 61 g 99 A$.$.$.$t$c$t$,..,,a,A,,$..,.a,,.,+4gcag,,.,,$,A.,,+7ctgtttg,t$A,a..,.,.,.,,.,,..,.,..,.,.,,.,,..,,
.,.,.,.,.,,.,.,.,.,,.,,^].^]t BCAABB@B1BB<BBBBBBBBBBBBBACBB@BBABBBBBBBBBCBBCCBBBBBBBBCCBBB6BBBBBBBBBBBABCBBBBBBBBBBBBCAC?BCCBCBB@

Last edited by mhc; 05-01-2009 at 10:30 AM.
mhc is offline   Reply With Quote
Old 05-04-2009, 02:12 AM   #46
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I was wondering if I could get some help on the SAM specification.

I get the error:

Quote:
samtools: bam_lpileup.c:115: tview_func: Assertion `l == tv->n_pre' failed
When I page the viewer to my reads containing indels. A sample of my data is

Quote:
suis_indels_30749_30917_a0b7e 147 Streptococcus_suis 30782 150 60M7D15M * 0 0 CAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCA YX[QRQZETFQLZT[FQMX]_``\SZ```_`^``_[``^XZ```^``__``X]T[```_^`_````````````` NM:i:7 MD:Z:60M7D15M
suis_indels_30862_31024_a3384 131 Streptococcus_suis 30801 150 41M7D34M * 0 0 AGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTT `````H^_^^_```Z]_``````]``^]``_]```]]X````]_]^````]Z_URYT]]\\]_X``_^][U^``\ NM:i:7 MD:Z:41M7D34M
suis_indels_30874_31073_a388e 131 Streptococcus_suis 30813 150 29M7D46M * 0 0 CGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAA ```^`\[`````````````_X][U]RIST\`XDDLIHGK]J\XMFVER^]`````[POHLGTU]^_Z``]]]KV NM:i:7 MD:Z:29M7D46M
suis_indels_30890_31082_a9685 131 Streptococcus_suis 30829 150 13M7D62M * 0 0 TAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAAATTGAATGAAGCTGGT `]__`ZOY^``ZUV``]TTY`_``````^LU```_`_``XV[````^\`\\^UU[`XHOPKITNEUNIRRNDWW` NM:i:7 MD:Z:13M7D62M
suis_indels_16281_16438_b9354 131 Streptococcus_suis 16224 150 6M1D69M * 0 0 AAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAG`[``````_`_I_]O[__`__Y__VFMY\````^Z```^Z\U``_``````]S`\Y^[_]^```^]`_\Y^ZQXH NM:i:2 MD:Z:6M1D69M
suis_indels_2007371_2007505_bf820 147 Streptococcus_suis 2007364 150 71M3D4M * 0 0 GTTCTGACTTTTATTCACATCTTGTGGAAAACTTTTCTTAACAGTGTGGATTTTAAAAATTATCTGTGGAATTTT MRT^\_]RZ__`]]YP[^`ZT`T]VH``^]^TS]]``]^^````````_W_`_\\```_````````````\^`^ NM:i:3 MD:Z:71M3D4M
suis_indels_30783_30934_c09ed 147 Streptococcus_suis 30799 150 43M7D32M * 0 0 TAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCT U]PRKMJTRKIK___`Y`\]`\^```ZXVYMZX]`_```Z``_``WWQ]_````_]``]````````````Z_W` NM:i:7 MD:Z:43M7D32M
suis_indels_16171_16358_c4122 147 Streptococcus_suis 16227 150 3M1D72M * 0 0 TGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAGAGGNFMU``_`_`NTPV]^__^]YYQLX^ZK[^[[[^^``]RUX``Z]Y\Z````````^``_`]U\`^_```]```` NM:i:1 MD:Z:3M1D72M
suis_indels_16164_16353_ce374 147 Streptococcus_suis 16222 150 8M1D67M * 0 0 CCAAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTG[]]^[]\X[^`\ZZX`^_^\_``````\`_JL[\R]X```_`````_``_Y[]`````[\````Z```````^_H NM:i:2 MD:Z:8M1D67M
suis_indels_30842_31012_d14fb 131 Streptococcus_suis 30781 150 61M7D14M * 0 0 CCAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTC ```_`_`````]`````_```^_VUW^```_^_ZXZT^`````__]YNXTEKV`_YXNRW\\REHDDLSGMFURD NM:i:7 MD:Z:61M7D14M
suis_indels_30721_30932_d9806 147 Streptococcus_suis 30797 150 45M7D30M * 0 0 TATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCAT PPJ[TFRJ``````TVU`__`^^`_]_`````GFPZTWY\VX`_````__``````^`````````````````` NM:i:7 MD:Z:45M7D30M
suis_indels_30785_30945_e17b9 147 Streptococcus_suis 30810 150 32M7D43M * 0 0 CGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGA TFWOOKDUNQZ`^RU^^`STTSVW^_``````^`_V``_``````````\[[`XR_```_W\]O_``__`_T^`` NM:i:7 MD:Z:32M7D43M
suis_indels_30737_30928_e6336 147 Streptococcus_suis 30793 150 49M7D26M * 0 0 GGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATT LMRUPOTQPY[V]]^``]TS\^`\^`UXX^S^```_^Z``\\``_```^Y^Q]```````V\]^``^``K^```` NM:i:7 MD:Z:49M7D26M
And I'm using novoalign format where I've modified the novo2sam.pl script to produce a 'correct' CIGAR format. Most samtools operation work except for the tview
zee is offline   Reply With Quote
Old 05-04-2009, 09:00 AM   #47
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

I have also had a lot of problems when viewing locations with many insertions and deletions. I have been trying to find the bug in the source code, but so far no luck.

Nils
nilshomer is offline   Reply With Quote
Old 05-04-2009, 10:19 AM   #48
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.

Last edited by lh3; 05-04-2009 at 10:22 AM.
lh3 is offline   Reply With Quote
Old 05-04-2009, 10:37 AM   #49
gcrdb
Junior Member
 
Location: usa

Join Date: Jan 2009
Posts: 9
Default

Hi,
The bwa aln is fast (few hours),
but somehow bwa samse(or sampe) is slow.
This program have been running for almost a week:
bwa samse -n 255 mouse_genome.fa data.sai data.fq >data.sam

My data.fq has about 80 million reads. Is that to much reads for samse(pe)? Should I split them and merge?
gcrdb is offline   Reply With Quote
Old 05-04-2009, 11:00 AM   #50
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Bwa and samtools are two separated packages. I will start a new thread about questions related to bwa.

I answer your question here. I guess you are using reads shorter than 30bp. BWT based algorithms are generally slow for enumerating multiple hits and very short reads are more likely to have many hits. Therefore, BWA's 'sampe' and 'samse -n' are inefficient for these short reads. Probably non-BWT algorithms are better for such alignment. If you really want to align very short reads, you may consider use smaller -n with samse and smaller -o (say 200) with sampe. Using smaller -o will make pairing less effective, though.

The memory of bwa is almost independent of the number of input reads and the speed is linear in that. You can separate input into smaller batches.

Last edited by lh3; 05-04-2009 at 11:03 AM.
lh3 is offline   Reply With Quote
Old 05-07-2009, 11:58 PM   #51
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Quote:
Originally Posted by lh3 View Post
Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.
I have checked out the 'latest' version from SVN:

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.3-16 (r267)


This seems to be a later version, would it contain the fix?
zee is offline   Reply With Quote
Old 05-13-2009, 08:09 AM   #52
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146
Default Feature request

I find it cumbersome that samtools import *requires* an external index file to tell it what reference sequences are present. This is even true if there is a header in the .sam file.

For my own sanity I knocked up this simple, albeit slow, perl script:

Code:
#!/usr/bin/perl -w

# Reads a sam file and generates an index suitable for "samtools import".

use strict;

#--- Build index
my %ref; #key=name, value=length
while (<>) {
    my @F=split(" ", $_);
    my $rname = $F[2];

    next if $rname eq "*";
    my $end = $F[3] + length($F[9]);

    $ref{$rname} = $end if (!exists($ref{$rname}) || $ref{$rname} < $end);
}

#--- Output input to stdout
foreach (sort keys %ref) {
    print "$_\t$ref{$_}\n";
}
It generates a file suitable for passing as the 1st argument after "samtools import". Admittedly it doesn't know the true ending of the reference sequence, but for de-novo assemblies the reference is just the consensus anyway so it's faked.

Note that the above is an estimate. It doesn't consider the fact that the sequences may have CIGAR string possibly altering the mapped coordinates of the reference, but it's a start at least.
jkbonfield is offline   Reply With Quote
Old 05-13-2009, 09:49 AM   #53
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

The latest SVN parses the header @SQ lines. You need to use the "view" command to do this.
lh3 is offline   Reply With Quote
Old 05-18-2009, 01:52 PM   #54
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

On behalf of the Java API developers: the Java APIs and command-line tools are formally released here (also linked from http://samtools.sourceforge.net):

http://picard.sourceforge.net/

The Java implementation is more flexible than SAMtools-C on sorting/viewing/merging/rmdup. Developers may also find Java APIs are easier to use than the C APIs.

By the way, Lincoln Stein has implemented initial Perl bindings for the SAM/BAM format, based on the C APIs. This Perl module will become part of BioPerl as well as GBrowse.
lh3 is offline   Reply With Quote
Old 05-27-2009, 06:00 PM   #55
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Is there a way to convert a SAM consensus output (using -c option for pileup) to the old maq-style .cns consensus?

I have some maq-based pipelines I would like to use on my BWA results.
zee is offline   Reply With Quote
Old 05-30-2009, 09:16 AM   #56
pparg
Member
 
Location: NY

Join Date: Aug 2008
Posts: 19
Default

I read through the latest SAM Format Specification on optional fields. But I still don’t understand some cases in the .sam file I got. For instances:
XT:A:R NM:i:0 X0:i:30406 XM:i:0 XO:i:0 XG:i:0
XT:A:R NM:i:1 X0:i:2 X1:i:1 XM:i:1 XO:i:0 XG:i:0 MD:Z:3G32
I just can’t figure out what do those X? fileds mean? I know these are are reserved for local use and re not be formally defined. But what is the point here? What are the types A, i, Z etc for?
Any hint/ideas would be appreciated.
pparg is offline   Reply With Quote
Old 05-30-2009, 11:11 AM   #57
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

X? tags are for aligner specific information. You need to read the documentation of the program that generates these tags.
lh3 is offline   Reply With Quote
Old 06-02-2009, 12:55 PM   #58
jess
Junior Member
 
Location: USA

Join Date: May 2009
Posts: 7
Default

Quote:
Originally Posted by zee View Post
Is there a way to convert a SAM consensus output (using -c option for pileup) to the old maq-style .cns consensus?

I have some maq-based pipelines I would like to use on my BWA results.
Yes, I am looking for the same. Heng li cud u help out plz?
jess is offline   Reply With Quote
Old 06-02-2009, 11:10 PM   #59
corthay
Member
 
Location: japan

Join Date: Oct 2008
Posts: 25
Default

I would like to generate bam file from blast result of EST.

Firstly, I made sam file from blast result using blsat2sam.pl in samtools packages.
Secondly, I have tried to convert from sam file into bam file using "samtools import" but I've got following error.

Parse error at 1: CIGAR and sequence length are inconsistent.

seq1 16 chr20 18289430 255 23H426M760H * 0 0 * * AS:i:844 EV:Z:0.0

Could you give me any idea please ?
corthay is offline   Reply With Quote
Old 06-03-2009, 12:43 AM   #60
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

@corthay

You can convert with "blast2sam.pl -s" to save the sequence in SAM. Currently, samtools cannot parse SAM without sequence, although the specification allows this.

@jess and zee

Sorry. I am afraid that you cannot easily convert pileup output to maq-cns. You can call SNPs/indels directly from samtools and in SVN samtools.pl now filters SNPs/indels in the maq-like way. .cns has been replaced with glf, although they contain different information.
lh3 is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:17 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO