SEQanswers

Old 12-23-2008, 06:44 AM   #1
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default SAM: a generic alignment format

For NGS data analysis, an aligner tends to be successful when it comes with utilities for comprehensive downstream analyses such as reference-based assembly, SNP/indel calling and alignment viewing. Eland/GAPipeline, SOAP and Maq are examples. Unfortunately, these downstream analyses are non-trivial to implement, and implementing them separately for each aligner would waste time and human resources. Ideally we want to separate alignment from the downstream analyses. To achieve this, we need a generic alignment format that works for all aligners. NovoAlign and Bowtie can output the Maq alignment format to take advantage of Maq's downstream data processing. However, the Maq format does not really suit this goal: it supports neither longer reads nor alignments with more than one indel, and it is too specific to Maq. To solve this problem, the 1000 Genomes Project committee decided to develop a generic alignment format, and the first version of the specification and implementation is now out.

The new alignment format, SAM (Sequence Alignment/Map), is the collaborative result of several major genome centres. It eliminates the major defects of the Maq format while retaining its advantages. We also migrated and improved various downstream data processing tools implemented in Maq/Maqview, such as indexing, pileup, the viewer and the consensus caller. For more information, please check the website:

http://samtools.sourceforge.net

I hope samtools will help aligner developers promote their own software: once a program can generate alignments in SAM format, Maq-like downstream analysis becomes available immediately.
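
For readers who have not seen the format yet: a SAM alignment line is tab-delimited text with eleven mandatory fields, optionally followed by tags. The following is only a minimal Python sketch of reading one such line. The field names follow the early specification (MRNM/MPOS/ISIZE for the mate fields), and the record, the class name SamRecord and the function name parse_sam_line are made up for illustration; the specification on the website remains the authority.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields plus any optional tags (illustrative type)."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string, e.g. 5M1I4M
    mrnm: str    # mate reference name
    mpos: int    # mate position
    isize: int   # inferred insert size
    seq: str     # read sequence
    qual: str    # base qualities
    tags: Tuple[str, ...]


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line on tabs and convert the numeric fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("expected at least 11 tab-separated fields, got %d" % len(f))
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], tuple(f[11:]))


# A made-up record: a 10 bp read with one inserted base.
example = "read1\t0\tchr1\t100\t60\t5M1I4M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, rec.cigar)
```

Because the CIGAR string can hold any number of operations, a single record can describe an alignment with several indels, which is one of the Maq format limitations mentioned above.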
Old 12-23-2008, 02:50 PM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Thanks Heng.
It looks like this will be very useful and will make it easy to try the various new tools that are coming out.

Is it possible to have a workflow like MAQ's easyrun that walks a user through a use case for SAM/BAM?
Old 12-23-2008, 07:36 PM   #3
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,352
Default

Hey lh3,

Thanks for posting this here. I'm going to sticky it in the Bioinformatics forum for a while to make sure everyone sees it!
Old 12-29-2008, 08:48 AM   #4
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28
Default

The documentation notes that "Only MAQ->SAM converter is implemented." However, I could not find anywhere that referenced this conversion utility. Is there software to perform this conversion?
Old 12-31-2008, 06:01 AM   #5
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

To lparsons:

After you compile samtools with "make", you will find "maq2sam-short" and "maq2sam-long" in the "misc/" directory. There is also a script "export2sam.pl" that converts Illumina's export to SAM. I have not thoroughly tested this script on all export files, though.
Old 01-05-2009, 04:29 PM   #6
corthay
Member
 
Location: japan

Join Date: Oct 2008
Posts: 25
Default

I downloaded samtools-0.1.1 but could not find the "wgsim" or "wgsim_eval.pl" programs that are mentioned in the bwa-0.3.0 documentation.
How can I get these programs?
Old 01-06-2009, 01:01 AM   #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

To corthay:

You are quick. I am planning a new bwa release as I realized that I could improve it a little without much work (PS: the new version is released now). Wgsim, wgsim_eval.pl and converters for soap and bowtie are available from SVN only:

svn co https://samtools.svn.sourceforge.net...s/dev/samtools samtools

Last edited by lh3; 01-06-2009 at 07:34 AM.
Old 01-20-2009, 10:57 PM   #8
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
Default indelpe vs samtools indels

Hi Heng Li.
Could you comment on how the indel detection works in SAM pileups vs MAQ indelpe? I am seeing many more indels in my SAM pileup generated from a MAQ alignment (as compared to the output from indelpe). Is there a good filtering strategy for these?

Thanks,

Ryan
Old 01-21-2009, 01:13 AM   #9
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

I am planning to release samtools-0.1.2, which fixes some bugs in the old version and adds new features. For now you can check out the source code from SVN; it should be quite close to 0.1.2.

The new version comes with a Bayesian indel caller, although it is just a prototype at present. The strength of the samtools caller is that it makes use of reads mapped without indels. Using this information helps to reduce false negatives. In addition, the new caller reports the genotype rather than just saying there is an indel. You cannot easily tell from maq's indelpe whether an indel is heterozygous or homozygous. With the new caller, the filters could be: a) the indel score; b) two indels should not be too close to each other.
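
The two filters can be sketched in a few lines of Python. This is only an illustration, not what samtools does internally: the (chrom, pos, score) tuples, the function name filter_indels and both thresholds are assumptions, as is the choice to apply the score filter first and then drop every member of a close pair.

```python
def filter_indels(calls, min_score=50, min_distance=10):
    """Keep indel calls that pass both filters.

    calls: iterable of (chrom, pos, score) tuples (hypothetical layout).
    min_score, min_distance: illustrative thresholds, not recommended values.
    """
    # Filter a) drop calls whose indel score is below the threshold.
    passed = sorted(c for c in calls if c[2] >= min_score)

    kept = []
    for i, (chrom, pos, score) in enumerate(passed):
        # Filter b) look at the neighbours on the same chromosome.
        near_prev = (i > 0 and passed[i - 1][0] == chrom
                     and pos - passed[i - 1][1] < min_distance)
        near_next = (i + 1 < len(passed) and passed[i + 1][0] == chrom
                     and passed[i + 1][1] - pos < min_distance)
        if not (near_prev or near_next):
            kept.append((chrom, pos, score))
    return kept


calls = [
    ("chr1", 100, 80),   # 5 bp from the next call: dropped
    ("chr1", 105, 90),   # 5 bp from the previous call: dropped
    ("chr1", 500, 70),   # isolated and above the score threshold: kept
    ("chr1", 900, 10),   # low score: dropped
    ("chr2", 102, 60),   # different chromosome, so not "close" to chr1:100
]
print(filter_indels(calls))
```

Whether a close pair should lose both calls or only the lower-scoring one is a judgment call; the sketch takes the conservative route.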
Old 01-24-2009, 06:22 PM   #10
kon104
Junior Member
 
Location: Princeton, NJ

Join Date: Dec 2008
Posts: 2
Default

What's the difference between maq2sam-short and -long?

Also, short seems to segfault on 64-bit versions of Red Hat and Ubuntu... Am I missing something?
Old 01-26-2009, 04:47 PM   #11
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

maq2sam-short is for the .map files generated by maq-0.6.x, while maq2sam-long is for files generated by maq-0.7.x. Sorry for the confusion; one of the aims of SAM is to avoid such confusion in future.
Old 03-05-2009, 05:46 PM   #12
webbrewer
Junior Member
 
Location: Eugene, OR

Join Date: Aug 2008
Posts: 8
Default samtools index seg fault

I am using the most current version of samtools from svn.
I successfully ran the "samtools import" command on my .sam file from bwa.
When I then run "samtools index" on the .bam file, it seg faults.
Let me know if you need more information to determine what is causing this.

Last edited by webbrewer; 03-05-2009 at 08:28 PM.
Old 03-05-2009, 08:17 PM   #13
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
Default samtools import

samtools import is for making a .bam file from a .sam file. Why are you attempting to run this command on a .bam file?
Old 03-05-2009, 08:27 PM   #14
webbrewer
Junior Member
 
Location: Eugene, OR

Join Date: Aug 2008
Posts: 8
Default

Quote:
Originally Posted by myrna
samtools import is for making a .bam file from a .sam file. Why are you attempting to run this command on a .bam file?
Oops. I meant to say that "samtools index" seg faults.
Old 03-05-2009, 08:34 PM   #15
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
Default samtools index

Have you tried samtools view foo.bam?

If you get the sam alignments back, then all should be well. I believe you get a warning if the .bam file is unsorted, but perhaps you should try this if you haven't already:

samtools sort foo.bam bar.sort
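
For reference, the import, sort, index order discussed in this thread can be written down as a small Python helper that only builds the command lines and prints them. It assumes the samtools 0.1.x command forms used here ("import" taking a ref_list, a SAM file and an output BAM; "sort" taking an output prefix and writing prefix.bam); the file names and the function name build_commands are placeholders, and the usage text of your own samtools version should be checked before running anything.

```python
import shlex


def build_commands(ref_list, sam_path, prefix):
    """Return the three argv lists for import -> sort -> index (0.1.x style)."""
    unsorted_bam = prefix + ".unsorted.bam"
    sorted_prefix = prefix + ".sorted"
    sorted_bam = sorted_prefix + ".bam"   # assumed: sort appends ".bam" to the prefix
    return [
        ["samtools", "import", ref_list, sam_path, unsorted_bam],
        ["samtools", "sort", unsorted_bam, sorted_prefix],
        ["samtools", "index", sorted_bam],
    ]


# Print the commands instead of running them, so they can be inspected first.
for argv in build_commands("ref_list.txt", "aln.sam", "aln"):
    print(" ".join(shlex.quote(a) for a in argv))
```

Each list can be handed to subprocess.run(argv, check=True) once the paths are right; indexing is deliberately last because it needs the sorted file.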
Old 03-06-2009, 10:56 AM   #16
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Have you sorted the alignment? Indexing only works for sorted alignment. Also remember to use the latest bwa. The old version may generate some funny alignments, though this happens very rarely.
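
A quick way to check the sorting requirement on the text output of "samtools view" is sketched below in Python. It is a simplification and an assumption about what "sorted" needs to mean here: it tests that each reference name forms one contiguous block and that positions never decrease within a block, and it does not compare the block order against the reference list. The function name is_coordinate_sorted and the sample lines are made up.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if the alignment lines look sorted by reference and position."""
    seen_refs = set()
    last_ref, last_pos = None, -1
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, pos = fields[2], int(fields[3])
        if ref != last_ref:
            if ref in seen_refs:
                return False      # this reference appeared earlier: blocks are split
            seen_refs.add(ref)
            last_ref, last_pos = ref, pos
        elif pos < last_pos:
            return False          # position went backwards within a reference
        else:
            last_pos = pos
    return True


sorted_lines = ["r1\t0\tchr1\t10", "r2\t0\tchr1\t25", "r3\t0\tchr2\t5"]
unsorted_lines = ["r1\t0\tchr1\t25", "r2\t0\tchr1\t10"]
print(is_coordinate_sorted(sorted_lines), is_coordinate_sorted(unsorted_lines))
```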
Old 03-06-2009, 11:50 AM   #17
webbrewer
Junior Member
 
Location: Eugene, OR

Join Date: Aug 2008
Posts: 8
Default

Quote:
Originally Posted by lh3
Have you sorted the alignment? Indexing only works for sorted alignment. Also remember to use the latest bwa. The old version may generate some funny alignments, though this happens very rarely.
I hadn't sorted it before. I have now run "samtools sort", then "samtools index" on the sorted output. The result was the same: a seg fault.
I am using bwa version 0.4.5. Is there a newer svn version?
samtools index works without issue on converted MAQ alignments.
Old 03-10-2009, 03:12 PM   #18
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28
Default

I imported an ELAND alignment and was able to convert into SAM, then to BAM, then sort it. However, at the indexing step I too ran into a segmentation fault. I'm using the 0.1.2 version from the download page, not from SVN. Any suggestions?
Old 03-10-2009, 03:20 PM   #19
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Have you sorted the alignment first? Indexing in 0.1.2 has a bug, but should not cause segfault. Thanks.
Old 03-10-2009, 03:25 PM   #20
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28
Default

Yes, I sorted it just fine. In fact the indexing step will complain that the file isn't sorted.

One issue could be that I just realized that the ref_list file I gave during the import didn't have the reference size in it. I assume this means the length of the reference sequence? I'll have to give that a try when I first import the file (convert to bam).
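
A ref_list with missing sizes is easy to catch before importing. The Python sketch below assumes the file is tab-delimited with the reference name in the first column and the sequence length in the second; that layout is my reading of the documentation rather than something confirmed in this thread, and the function name check_ref_list is made up.

```python
def check_ref_list(lines):
    """Return (line_number, text) for every line lacking a positive integer length."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        fields = text.split("\t")
        try:
            length_ok = len(fields) >= 2 and int(fields[1]) > 0
        except ValueError:
            length_ok = False     # second column is not a number
        if not length_ok:
            problems.append((number, text))
    return problems


sample = ["chr1\t1000\n", "chr2\n", "chr3\tabc\n"]
for number, text in check_ref_list(sample):
    print("line %d has no usable length: %r" % (number, text))
```

With a real file this would be called as check_ref_list(open(path)), where path is wherever the ref_list lives.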