SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
SAM(tools) and BLAST fungs Bioinformatics 6 12-04-2018 04:01 AM
installition of SAM tools package sflong RNA Sequencing 12 03-07-2016 01:02 PM
SAM tools question jsun529 Bioinformatics 1 05-31-2011 07:13 AM
Index of all tools supporting SAM? krobison Wiki Discussion 5 03-01-2011 03:04 AM
SAM tools output question sowmyai Bioinformatics 5 07-11-2010 05:38 PM

Reply
 
Thread Tools
Old 09-02-2009, 09:55 AM   #1
hemang
Junior Member
 
Location: Maryland

Join Date: Aug 2009
Posts: 3
Default SAM tools

Hi All,

I am using BWA to align ChIP-seq data. I want to use MACS for identifying transcript factor binding sites. However, how to convert BWA output (.SAM) format to MACS input (.BED) formal?

Thanks for your help,

Hemang
hemang is offline   Reply With Quote
Old 09-02-2009, 06:16 PM   #2
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119
Default

Hi,
SAM format contains all you need to make a BED entry for each aligned read. Below is a Python script that works for me. You made need to customize depending on your needs. Specifically, you might want to stuff more info into the "name" and "score" fields. Here I made the following decisions:

a. (line 40) Only create an entry for reads that were aligned (logical methinks).
b. (lines 50 and 51) Make the BED score field (#5) be the edit distance of the alignment.
c. the BED lines conform to UCSC 0-based, half-open rules
d. I chose to ignore if these are paired-end reads. each end is treated separately. if you want to change this, you'd need to make more decisions based on the SAM flag.

If you copy the code into a text editor, save it, then "chmod +x" it, you should be able to do the following to create a BED file from your data.

> sam2bed.py -s file.sam > file.bed

where file.sam is your SAM input file.

I hope this helps.
Best, Aaron

Code:
#!/usr/bin/env python
# encoding: utf-8
"""
sam2bed.py

Created by Aaron Quinlan on 2009-08-27.
Copyright (c) 2009 Aaron R. Quinlan. All rights reserved.
"""

import sys
import getopt
import re

help_message = '''
	USAGE: sam2bed -s <sam>
'''


class Usage(Exception):
	def __init__(self, msg):
		self.msg = msg


def processSAM(file):
	"""
		Load a SAM file and convert each line to BED format.
	"""		
	f = open(file,'r')
	for line in f.readlines():
		samLine = splitLine(line.strip())
		makeBED(samLine)
	f.close()	
	
					
def makeBED(samFields):
	
	samFlag = int(samFields[1])
	
	# Only create a BED entry if the read was aligned
	if (not (samFlag & 0x0004)):
		
		chrom = samFields[2]
		start = str(int(samFields[3])-1)
		end = str(int(samFields[3]) + len(samFields[9]) - 1)
		name = samFields[0]	
		strand = getStrand(samFlag)

		# Let's use the edit distance as the BED score.
		# Clearly alternatives exist.
		editPattern = re.compile('NM\:i\:(\d+)')
		editDistance = editPattern.findall(samFields[12])

		# Write out the BED entry
		print chrom + "\t" + start + "\t" + end + "\t" + name + "\t" + editDistance[0] + "\t" + strand
		
		
def splitLine(line, delim="\t"):
	splitline = line.split(delim)
	return splitline		


def getStrand(samFlag):
	strand = "+"
	if (samFlag & (0x10)):	# minus strand if true.
		strand = "-"		
	return strand
	
		
def main(argv=None):
	if argv is None:
		argv = sys.argv
	try:
		try:
			opts, args = getopt.getopt(argv[1:], "hs:", ["help", "sam"])
		except getopt.error, msg:
			raise Usage(help_message)
	
		# option processing
		samFile = ""
		for option, value in opts:
			if option in ("-h", "--help"):
				raise Usage(help_message)
			if option in ("-s", "--sam"):
				samFile = value

		try:
		   f = open(samFile, 'r')
		except IOError, msg:
			raise Usage(help_message)
				
	except Usage, err:
		print >> sys.stderr, sys.argv[0].split("/")[-1] + ": " + str(err.msg)
		return 2	

	# make a BED file of the SAM file.
	processSAM(samFile)

if __name__ == "__main__":
	sys.exit(main())
quinlana is offline   Reply With Quote
Old 09-02-2009, 11:34 PM   #3
dawe
Senior Member
 
Location: 4530'25.22"N / 915'53.00"E

Join Date: Apr 2009
Posts: 258
Default

Quote:
Originally Posted by hemang View Post
Hi All,

I am using BWA to align ChIP-seq data. I want to use MACS for identifying transcript factor binding sites. However, how to convert BWA output (.SAM) format to MACS input (.BED) formal?
Hi,
I've written a patch to add SAM/BAM support to MACS. I've submitted to MACS developers, hopefully it will be included in the next release. If you are interested in the patch I can send it to you.
dawe is offline   Reply With Quote
Old 09-03-2009, 11:35 AM   #4
hemang
Junior Member
 
Location: Maryland

Join Date: Aug 2009
Posts: 3
Default

Quote:
Originally Posted by dawe View Post
Hi,
I've written a patch to add SAM/BAM support to MACS. I've submitted to MACS developers, hopefully it will be included in the next release. If you are interested in the patch I can send it to you.
Thank you so much!
Can you please send me the MACS patch (hsmart06@gmail.com)? I really appreciate your help.
Hemang
hemang is offline   Reply With Quote
Old 09-04-2009, 06:13 AM   #5
hemang
Junior Member
 
Location: Maryland

Join Date: Aug 2009
Posts: 3
Default

Quote:
Originally Posted by quinlana View Post
Hi,
SAM format contains all you need to make a BED entry for each aligned read. Below is a Python script that works for me. You made need to customize depending on your needs. Specifically, you might want to stuff more info into the "name" and "score" fields. Here I made the following decisions:

a. (line 40) Only create an entry for reads that were aligned (logical methinks).
b. (lines 50 and 51) Make the BED score field (#5) be the edit distance of the alignment.
c. the BED lines conform to UCSC 0-based, half-open rules
d. I chose to ignore if these are paired-end reads. each end is treated separately. if you want to change this, you'd need to make more decisions based on the SAM flag.

If you copy the code into a text editor, save it, then "chmod +x" it, you should be able to do the following to create a BED file from your data.

> sam2bed.py -s file.sam > file.bed

where file.sam is your SAM input file.

I hope this helps.
Best, Aaron

Code:
#!/usr/bin/env python
# encoding: utf-8
"""
sam2bed.py

Created by Aaron Quinlan on 2009-08-27.
Copyright (c) 2009 Aaron R. Quinlan. All rights reserved.
"""

import sys
import getopt
import re

help_message = '''
	USAGE: sam2bed -s <sam>
'''


class Usage(Exception):
	def __init__(self, msg):
		self.msg = msg


def processSAM(file):
	"""
		Load a SAM file and convert each line to BED format.
	"""		
	f = open(file,'r')
	for line in f.readlines():
		samLine = splitLine(line.strip())
		makeBED(samLine)
	f.close()	
	
					
def makeBED(samFields):
	
	samFlag = int(samFields[1])
	
	# Only create a BED entry if the read was aligned
	if (not (samFlag & 0x0004)):
		
		chrom = samFields[2]
		start = str(int(samFields[3])-1)
		end = str(int(samFields[3]) + len(samFields[9]) - 1)
		name = samFields[0]	
		strand = getStrand(samFlag)

		# Let's use the edit distance as the BED score.
		# Clearly alternatives exist.
		editPattern = re.compile('NM\:i\:(\d+)')
		editDistance = editPattern.findall(samFields[12])

		# Write out the BED entry
		print chrom + "\t" + start + "\t" + end + "\t" + name + "\t" + editDistance[0] + "\t" + strand
		
		
def splitLine(line, delim="\t"):
	splitline = line.split(delim)
	return splitline		


def getStrand(samFlag):
	strand = "+"
	if (samFlag & (0x10)):	# minus strand if true.
		strand = "-"		
	return strand
	
		
def main(argv=None):
	if argv is None:
		argv = sys.argv
	try:
		try:
			opts, args = getopt.getopt(argv[1:], "hs:", ["help", "sam"])
		except getopt.error, msg:
			raise Usage(help_message)
	
		# option processing
		samFile = ""
		for option, value in opts:
			if option in ("-h", "--help"):
				raise Usage(help_message)
			if option in ("-s", "--sam"):
				samFile = value

		try:
		   f = open(samFile, 'r')
		except IOError, msg:
			raise Usage(help_message)
				
	except Usage, err:
		print >> sys.stderr, sys.argv[0].split("/")[-1] + ": " + str(err.msg)
		return 2	

	# make a BED file of the SAM file.
	processSAM(samFile)

if __name__ == "__main__":
	sys.exit(main())
Dear Aaron,

Thank you so much for the python script. It worked fine with MACS.

Best regards,

Hemang
hemang is offline   Reply With Quote
Old 09-09-2009, 04:53 PM   #6
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119
Default

No problem. Others asked for the changes below, so I posted an improved version here:

http://people.virginia.edu/~arq5x/utilities.html

This version allows one to optionally create BED entries for just concordant paired-end reads or just discordant paired-end reads. I imagine that you're dealing with single-end reads...

The BED output is written to standard out for use with other UNIX utilities (e.g. sort, uniq, awk, etc.) as well as BEDTools.

Code:
samToBed -s <sam> -t <alignment type>
	
OPTIONS:
	-s	The SAM file to be converted to BED
	-t	What types of alignments should be reported?
			"all"	all aligned reads will be reported (Default)
			"con"	only concordant pairs will be reported
			"dis"	only discordant pairs will be reported
Take care,
Aaron
quinlana is offline   Reply With Quote
Old 02-02-2010, 12:05 PM   #7
sorrychen
Junior Member
 
Location: LA

Join Date: Aug 2009
Posts: 9
Default

Excuse me, I am looking for a tool to covert sam format to bed pileup format. It is a format with four fields in each row
(1)Refname (2)Location Start (3)LocationEnd (4)Coverage
like http://dw408k-1.cmb.usc.edu:8080/chr...445_884542.bed
while each row shows the coverage of a segment.
Does anyone know tools that can get the format from sam or sam pileup?
sorrychen is offline   Reply With Quote
Old 02-02-2010, 12:11 PM   #8
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119
Default

BEDTools has a tool named coverageBed that will compute the coverage of one bed file "A" (e.g. aligned reads) versus another bed file "B" (e.g. the widowns for which you want to create coverage. If your reads are in BAM format, you can use another utility called bamToBed to do this as follows:

Code:
$ bamToBed -abam reads.bam | coverageBed -a stdin -b windows.bed > windows.cov.bed
The first four columns of windows.cov.bed will be in the BEDGRAPH format you requested.

Aaron
quinlana is offline   Reply With Quote
Old 02-02-2010, 10:22 PM   #9
dawe
Senior Member
 
Location: 4530'25.22"N / 915'53.00"E

Join Date: Apr 2009
Posts: 258
Default

About BEDtools and coverage: do you think it would be feasible (and correct) to pipeline bamToBed, slopeBed (to extend reads to fragment size) and mergeBed (to merge and count) to produce a wig like data format? Do you think it would be better to use coverageBed with a window file which spans all the genome in 10 nt intervals?

thanks

d
dawe is offline   Reply With Quote
Old 02-05-2010, 10:32 AM   #10
genbio64
Member
 
Location: New York

Join Date: Dec 2009
Posts: 42
Default

When I attempt to run the samToBed.py conversion script I keep receiving this error:
File "samtoBed.py", line 138, in <module>
sys.exit (main())
File "samtoBed.py", line 135, in main
processSAM(samFile, aType)
File "samtoBed.py", line 47, in processSam
makeBed (samLine, alignType)
File "samtoBed.py", line 53, in makeBed
samFlag = int(samFields[1])
ValueError: invalid literal for int() with base 10: 'VN:1.0'

I believe it has something to do with the format of my .SAm file which I created using Bowtie and the -S flag. Here is a portion of that file:

@SQ SN:gi|224589823|ref|NC_000024.9| LN:59373566
@SQ SN:gi|17981852|ref|NC_001807.4| LN:16571
@PG ID:Bowtie VN:0.12.1 CL:"./bowtie -S h_sapiens_37_asm test.fq test.sam"
Noname 16 gi|224589811|ref|NC_000002.11| 20633060 255 36M * 0 0 CTCTCTTGACTCCTTCCTGGCCTTCTGGACNAGGTG ###############################;@BBA XA:i:1 MD:Z:30C5 NM:i:1
Noname 16 gi|224589821|ref|NC_000009.11| 92065173 255 36M * 0 0 AAGGGCAAGACCAAGGATGCAGGAGAGAGANCCACT 887676878676688888888888866888#=?75B XA:i:1 MD:Z:30A5 NM:i:1
Noname 0 gi|224589808|ref|NC_000017.10| 2656051 255 36M * 0 0 GATTACAGGTGTGAGCCCCTGTGCCTGGCCTTTTTT @BB@>;5566:7:745:66:445361,,.5588778 XA:i:0 MD:Z:36 NM:i:0
Noname 0 gi|224589820|ref|NC_000008.10| 23005941 255 36M * 0 0 CATTCTGTACCTGATTTACAAACTGTACATAGCAGG BBB@B741<96::737:9:9:9999;82/5985523 XA:i:0 MD:Z:36 NM:i:0
Noname 0 gi|224589801|ref|NC_000010.10| 1060158 255 36M * 0 0 AAGCTAATATTTCTTTTTATCCTTCTCAGGTTCAGA ?@@@:729;4779282+2=>:7799722210855## XA:i:0 MD:Z:36 NM:i:0
Noname 0 gi|224589806|ref|NC_000015.9| 28699955 255 36M * 0 0 GCTGAAGTCCGGCCGGTTCGCGCCCCGGCGGCCGAC ;################################### XA:i:2 MD:Z:18G1G13C0T0 NM:i:4
Noname 0 gi|224589817|ref|NC_000005.9| 156279071 255 36M * 0 0 TTAGGAACTCTTTCTGTCACTCTTCTGTCATTTGCT B@<;<525798<7369699:6975998(88699872 XA:i:0 MD:Z:36 NM:i:0

If anyone has any insight it would be greatly appreciated.

Cheers,
Genbio64
genbio64 is offline   Reply With Quote
Old 03-31-2010, 06:12 AM   #11
strob
Member
 
Location: Belgium

Join Date: Nov 2008
Posts: 79
Default

I receive the same error message when running SAMtoBED, downloaded from this site : http://people.virginia.edu/~arq5x/utilities.html

I also created this SAM file from an Bowtie run (-S) (version Bowtie 0.12.3)

Traceback (most recent call last):
File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 138, in ?
sys.exit(main())
File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 135, in main
processSAM(samFile, aType)
File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 47, in processSAM
makeBED(samLine, alignType)
File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 53, in makeBED
samFlag = int(samFields[1])
ValueError: invalid literal for int(): VN:1.0


Can somebody help?
strob is offline   Reply With Quote
Old 03-31-2010, 07:17 AM   #12
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119
Default

That script (written by me) is incorrect. I've taken the link down to prevent future confusion. You could use the bamToBed utility in BEDTools or use the Vancouver Short Read Package for this. Alternatively, you could use the example C program on the SAMTools wiki.

Apologies for the confusion---I should have put that link to death months ago.
Aaron
quinlana is offline   Reply With Quote
Old 11-29-2010, 06:01 AM   #13
maximilianh
Member
 
Location: UK

Join Date: Oct 2009
Posts: 15
Default

You can do this by combining samtools and bedtools:

samtools view <SAMFILENAME.sam> -Sb | bamToBed -i stdin > BEFILENAME.bed
maximilianh is offline   Reply With Quote
Old 11-15-2011, 06:57 AM   #14
yaten2020
Junior Member
 
Location: INDIA

Join Date: Aug 2011
Posts: 7
Default

bamToBED gives all reads with "chr-start-end-read-ID-score" fields from a BAM file . HOMER HOMER can calculate tag density per base in whole genome in a bedGraph file.
yaten2020 is offline   Reply With Quote
Old 11-17-2011, 04:28 AM   #15
pd
Member
 
Location: Leuven, Belgium

Join Date: Jan 2010
Posts: 17
Default conert only a part of sam file to bam?

HI

Is there any option by which i can convert only apart of sam file to bam because sam file is having some problem with CIGAR length.
pd is offline   Reply With Quote
Old 01-10-2012, 01:39 AM   #16
merilius
Junior Member
 
Location: Switzerland

Join Date: Jan 2009
Posts: 9
Default

Quote:
Originally Posted by strob View Post
I receive the same error message when running SAMtoBED, downloaded from this site : http://people.virginia.edu/~arq5x/utilities.html
Looks like the link is dead:
404. Not Found.
merilius is offline   Reply With Quote
Old 01-10-2012, 05:52 AM   #17
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836
Default

you can try BAMtoBED from the bedtools package:

http://code.google.com/p/bedtools/
gringer is offline   Reply With Quote
Old 01-10-2012, 06:27 AM   #18
merilius
Junior Member
 
Location: Switzerland

Join Date: Jan 2009
Posts: 9
Default

Quote:
Originally Posted by gringer View Post
you can try BAMtoBED from the bedtools package:

http://code.google.com/p/bedtools/
Thanks. I already realized that and it works fine for me.
merilius is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:27 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO