Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • hemang
    Junior Member
    • Aug 2009
    • 3

    SAM tools

    Hi All,

    I am using BWA to align ChIP-seq data. I want to use MACS for identifying transcript factor binding sites. However, how to convert BWA output (.SAM) format to MACS input (.BED) formal?

    Thanks for your help,

    Hemang
  • quinlana
    Senior Member
    • Sep 2008
    • 119

    #2
    Hi,
    SAM format contains all you need to make a BED entry for each aligned read. Below is a Python script that works for me. You made need to customize depending on your needs. Specifically, you might want to stuff more info into the "name" and "score" fields. Here I made the following decisions:

    a. (line 40) Only create an entry for reads that were aligned (logical methinks).
    b. (lines 50 and 51) Make the BED score field (#5) be the edit distance of the alignment.
    c. the BED lines conform to UCSC 0-based, half-open rules
    d. I chose to ignore if these are paired-end reads. each end is treated separately. if you want to change this, you'd need to make more decisions based on the SAM flag.

    If you copy the code into a text editor, save it, then "chmod +x" it, you should be able to do the following to create a BED file from your data.

    > sam2bed.py -s file.sam > file.bed

    where file.sam is your SAM input file.

    I hope this helps.
    Best, Aaron

    Code:
    #!/usr/bin/env python
    # encoding: utf-8
    """
    sam2bed.py
    
    Created by Aaron Quinlan on 2009-08-27.
    Copyright (c) 2009 Aaron R. Quinlan. All rights reserved.
    """
    
    import sys
    import getopt
    import re
    
    help_message = '''
    	USAGE: sam2bed -s <sam>
    '''
    
    
    class Usage(Exception):
    	def __init__(self, msg):
    		self.msg = msg
    
    
    def processSAM(file):
    	"""
    		Load a SAM file and convert each line to BED format.
    	"""		
    	f = open(file,'r')
    	for line in f.readlines():
    		samLine = splitLine(line.strip())
    		makeBED(samLine)
    	f.close()	
    	
    					
    def makeBED(samFields):
    	
    	samFlag = int(samFields[1])
    	
    	# Only create a BED entry if the read was aligned
    	if (not (samFlag & 0x0004)):
    		
    		chrom = samFields[2]
    		start = str(int(samFields[3])-1)
    		end = str(int(samFields[3]) + len(samFields[9]) - 1)
    		name = samFields[0]	
    		strand = getStrand(samFlag)
    
    		# Let's use the edit distance as the BED score.
    		# Clearly alternatives exist.
    		editPattern = re.compile('NM\:i\:(\d+)')
    		editDistance = editPattern.findall(samFields[12])
    
    		# Write out the BED entry
    		print chrom + "\t" + start + "\t" + end + "\t" + name + "\t" + editDistance[0] + "\t" + strand
    		
    		
    def splitLine(line, delim="\t"):
    	splitline = line.split(delim)
    	return splitline		
    
    
    def getStrand(samFlag):
    	strand = "+"
    	if (samFlag & (0x10)):	# minus strand if true.
    		strand = "-"		
    	return strand
    	
    		
    def main(argv=None):
    	if argv is None:
    		argv = sys.argv
    	try:
    		try:
    			opts, args = getopt.getopt(argv[1:], "hs:", ["help", "sam"])
    		except getopt.error, msg:
    			raise Usage(help_message)
    	
    		# option processing
    		samFile = ""
    		for option, value in opts:
    			if option in ("-h", "--help"):
    				raise Usage(help_message)
    			if option in ("-s", "--sam"):
    				samFile = value
    
    		try:
    		   f = open(samFile, 'r')
    		except IOError, msg:
    			raise Usage(help_message)
    				
    	except Usage, err:
    		print >> sys.stderr, sys.argv[0].split("/")[-1] + ": " + str(err.msg)
    		return 2	
    
    	# make a BED file of the SAM file.
    	processSAM(samFile)
    
    if __name__ == "__main__":
    	sys.exit(main())

    Comment

    • dawe
      Senior Member
      • Apr 2009
      • 258

      #3
      Originally posted by hemang View Post
      Hi All,

      I am using BWA to align ChIP-seq data. I want to use MACS for identifying transcript factor binding sites. However, how to convert BWA output (.SAM) format to MACS input (.BED) formal?
      Hi,
      I've written a patch to add SAM/BAM support to MACS. I've submitted to MACS developers, hopefully it will be included in the next release. If you are interested in the patch I can send it to you.

      Comment

      • hemang
        Junior Member
        • Aug 2009
        • 3

        #4
        Originally posted by dawe View Post
        Hi,
        I've written a patch to add SAM/BAM support to MACS. I've submitted to MACS developers, hopefully it will be included in the next release. If you are interested in the patch I can send it to you.
        Thank you so much!
        Can you please send me the MACS patch ([email protected])? I really appreciate your help.
        Hemang

        Comment

        • hemang
          Junior Member
          • Aug 2009
          • 3

          #5
          Originally posted by quinlana View Post
          Hi,
          SAM format contains all you need to make a BED entry for each aligned read. Below is a Python script that works for me. You made need to customize depending on your needs. Specifically, you might want to stuff more info into the "name" and "score" fields. Here I made the following decisions:

          a. (line 40) Only create an entry for reads that were aligned (logical methinks).
          b. (lines 50 and 51) Make the BED score field (#5) be the edit distance of the alignment.
          c. the BED lines conform to UCSC 0-based, half-open rules
          d. I chose to ignore if these are paired-end reads. each end is treated separately. if you want to change this, you'd need to make more decisions based on the SAM flag.

          If you copy the code into a text editor, save it, then "chmod +x" it, you should be able to do the following to create a BED file from your data.

          > sam2bed.py -s file.sam > file.bed

          where file.sam is your SAM input file.

          I hope this helps.
          Best, Aaron

          Code:
          #!/usr/bin/env python
          # encoding: utf-8
          """
          sam2bed.py
          
          Created by Aaron Quinlan on 2009-08-27.
          Copyright (c) 2009 Aaron R. Quinlan. All rights reserved.
          """
          
          import sys
          import getopt
          import re
          
          help_message = '''
          	USAGE: sam2bed -s <sam>
          '''
          
          
          class Usage(Exception):
          	def __init__(self, msg):
          		self.msg = msg
          
          
          def processSAM(file):
          	"""
          		Load a SAM file and convert each line to BED format.
          	"""		
          	f = open(file,'r')
          	for line in f.readlines():
          		samLine = splitLine(line.strip())
          		makeBED(samLine)
          	f.close()	
          	
          					
          def makeBED(samFields):
          	
          	samFlag = int(samFields[1])
          	
          	# Only create a BED entry if the read was aligned
          	if (not (samFlag & 0x0004)):
          		
          		chrom = samFields[2]
          		start = str(int(samFields[3])-1)
          		end = str(int(samFields[3]) + len(samFields[9]) - 1)
          		name = samFields[0]	
          		strand = getStrand(samFlag)
          
          		# Let's use the edit distance as the BED score.
          		# Clearly alternatives exist.
          		editPattern = re.compile('NM\:i\:(\d+)')
          		editDistance = editPattern.findall(samFields[12])
          
          		# Write out the BED entry
          		print chrom + "\t" + start + "\t" + end + "\t" + name + "\t" + editDistance[0] + "\t" + strand
          		
          		
          def splitLine(line, delim="\t"):
          	splitline = line.split(delim)
          	return splitline		
          
          
          def getStrand(samFlag):
          	strand = "+"
          	if (samFlag & (0x10)):	# minus strand if true.
          		strand = "-"		
          	return strand
          	
          		
          def main(argv=None):
          	if argv is None:
          		argv = sys.argv
          	try:
          		try:
          			opts, args = getopt.getopt(argv[1:], "hs:", ["help", "sam"])
          		except getopt.error, msg:
          			raise Usage(help_message)
          	
          		# option processing
          		samFile = ""
          		for option, value in opts:
          			if option in ("-h", "--help"):
          				raise Usage(help_message)
          			if option in ("-s", "--sam"):
          				samFile = value
          
          		try:
          		   f = open(samFile, 'r')
          		except IOError, msg:
          			raise Usage(help_message)
          				
          	except Usage, err:
          		print >> sys.stderr, sys.argv[0].split("/")[-1] + ": " + str(err.msg)
          		return 2	
          
          	# make a BED file of the SAM file.
          	processSAM(samFile)
          
          if __name__ == "__main__":
          	sys.exit(main())
          Dear Aaron,

          Thank you so much for the python script. It worked fine with MACS.

          Best regards,

          Hemang

          Comment

          • quinlana
            Senior Member
            • Sep 2008
            • 119

            #6
            No problem. Others asked for the changes below, so I posted an improved version here:



            This version allows one to optionally create BED entries for just concordant paired-end reads or just discordant paired-end reads. I imagine that you're dealing with single-end reads...

            The BED output is written to standard out for use with other UNIX utilities (e.g. sort, uniq, awk, etc.) as well as BEDTools.

            Code:
            samToBed -s <sam> -t <alignment type>
            	
            OPTIONS:
            	-s	The SAM file to be converted to BED
            	-t	What types of alignments should be reported?
            			"all"	all aligned reads will be reported (Default)
            			"con"	only concordant pairs will be reported
            			"dis"	only discordant pairs will be reported
            Take care,
            Aaron

            Comment

            • sorrychen
              Junior Member
              • Aug 2009
              • 9

              #7
              Excuse me, I am looking for a tool to covert sam format to bed pileup format. It is a format with four fields in each row
              (1)Refname (2)Location Start (3)LocationEnd (4)Coverage
              like http://dw408k-1.cmb.usc.edu:8080/chr...445_884542.bed
              while each row shows the coverage of a segment.
              Does anyone know tools that can get the format from sam or sam pileup?

              Comment

              • quinlana
                Senior Member
                • Sep 2008
                • 119

                #8
                BEDTools has a tool named coverageBed that will compute the coverage of one bed file "A" (e.g. aligned reads) versus another bed file "B" (e.g. the widowns for which you want to create coverage. If your reads are in BAM format, you can use another utility called bamToBed to do this as follows:

                Code:
                $ bamToBed -abam reads.bam | coverageBed -a stdin -b windows.bed > windows.cov.bed
                The first four columns of windows.cov.bed will be in the BEDGRAPH format you requested.

                Aaron

                Comment

                • dawe
                  Senior Member
                  • Apr 2009
                  • 258

                  #9
                  About BEDtools and coverage: do you think it would be feasible (and correct) to pipeline bamToBed, slopeBed (to extend reads to fragment size) and mergeBed (to merge and count) to produce a wig like data format? Do you think it would be better to use coverageBed with a window file which spans all the genome in 10 nt intervals?

                  thanks

                  d

                  Comment

                  • genbio64
                    Member
                    • Dec 2009
                    • 42

                    #10
                    When I attempt to run the samToBed.py conversion script I keep receiving this error:
                    File "samtoBed.py", line 138, in <module>
                    sys.exit (main())
                    File "samtoBed.py", line 135, in main
                    processSAM(samFile, aType)
                    File "samtoBed.py", line 47, in processSam
                    makeBed (samLine, alignType)
                    File "samtoBed.py", line 53, in makeBed
                    samFlag = int(samFields[1])
                    ValueError: invalid literal for int() with base 10: 'VN:1.0'

                    I believe it has something to do with the format of my .SAm file which I created using Bowtie and the -S flag. Here is a portion of that file:

                    @SQ SN:gi|224589823|ref|NC_000024.9| LN:59373566
                    @SQ SN:gi|17981852|ref|NC_001807.4| LN:16571
                    @PG ID:Bowtie VN:0.12.1 CL:"./bowtie -S h_sapiens_37_asm test.fq test.sam"
                    Noname 16 gi|224589811|ref|NC_000002.11| 20633060 255 36M * 0 0 CTCTCTTGACTCCTTCCTGGCCTTCTGGACNAGGTG ###############################;@BBA XA:i:1 MD:Z:30C5 NM:i:1
                    Noname 16 gi|224589821|ref|NC_000009.11| 92065173 255 36M * 0 0 AAGGGCAAGACCAAGGATGCAGGAGAGAGANCCACT 887676878676688888888888866888#=?75B XA:i:1 MD:Z:30A5 NM:i:1
                    Noname 0 gi|224589808|ref|NC_000017.10| 2656051 255 36M * 0 0 GATTACAGGTGTGAGCCCCTGTGCCTGGCCTTTTTT @BB@>;5566:7:745:66:445361,,.5588778 XA:i:0 MD:Z:36 NM:i:0
                    Noname 0 gi|224589820|ref|NC_000008.10| 23005941 255 36M * 0 0 CATTCTGTACCTGATTTACAAACTGTACATAGCAGG BBB@B741<96::737:9:9:9999;82/5985523 XA:i:0 MD:Z:36 NM:i:0
                    Noname 0 gi|224589801|ref|NC_000010.10| 1060158 255 36M * 0 0 AAGCTAATATTTCTTTTTATCCTTCTCAGGTTCAGA ?@@@:729;4779282+2=>:7799722210855## XA:i:0 MD:Z:36 NM:i:0
                    Noname 0 gi|224589806|ref|NC_000015.9| 28699955 255 36M * 0 0 GCTGAAGTCCGGCCGGTTCGCGCCCCGGCGGCCGAC ;################################### XA:i:2 MD:Z:18G1G13C0T0 NM:i:4
                    Noname 0 gi|224589817|ref|NC_000005.9| 156279071 255 36M * 0 0 TTAGGAACTCTTTCTGTCACTCTTCTGTCATTTGCT B@<;<525798<7369699:6975998(88699872 XA:i:0 MD:Z:36 NM:i:0

                    If anyone has any insight it would be greatly appreciated.

                    Cheers,
                    Genbio64

                    Comment

                    • strob
                      Member
                      • Nov 2008
                      • 84

                      #11
                      I receive the same error message when running SAMtoBED, downloaded from this site : http://people.virginia.edu/~arq5x/utilities.html

                      I also created this SAM file from an Bowtie run (-S) (version Bowtie 0.12.3)

                      Traceback (most recent call last):
                      File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 138, in ?
                      sys.exit(main())
                      File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 135, in main
                      processSAM(samFile, aType)
                      File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 47, in processSAM
                      makeBED(samLine, alignType)
                      File "/tools/bioinfo/app/galaxy/prod/tools/bioim/samToBed.py", line 53, in makeBED
                      samFlag = int(samFields[1])
                      ValueError: invalid literal for int(): VN:1.0


                      Can somebody help?

                      Comment

                      • quinlana
                        Senior Member
                        • Sep 2008
                        • 119

                        #12
                        That script (written by me) is incorrect. I've taken the link down to prevent future confusion. You could use the bamToBed utility in BEDTools or use the Vancouver Short Read Package for this. Alternatively, you could use the example C program on the SAMTools wiki.

                        Apologies for the confusion---I should have put that link to death months ago.
                        Aaron

                        Comment

                        • maximilianh
                          Member
                          • Oct 2009
                          • 15

                          #13
                          You can do this by combining samtools and bedtools:

                          samtools view <SAMFILENAME.sam> -Sb | bamToBed -i stdin > BEFILENAME.bed

                          Comment

                          • yaten2020
                            Junior Member
                            • Aug 2011
                            • 7

                            #14
                            bamToBED gives all reads with "chr-start-end-read-ID-score" fields from a BAM file . HOMER HOMER can calculate tag density per base in whole genome in a bedGraph file.

                            Comment

                            • pd
                              Member
                              • Jan 2010
                              • 17

                              #15
                              conert only a part of sam file to bam?

                              HI

                              Is there any option by which i can convert only apart of sam file to bam because sam file is having some problem with CIGAR length.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                Yesterday, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM
                              • SEQadmin2
                                Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                by SEQadmin2

                                Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                05-06-2026, 09:04 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 12:03 PM
                              0 responses
                              19 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, Yesterday, 11:40 AM
                              0 responses
                              14 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 05-28-2026, 11:40 AM
                              0 responses
                              29 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 05-26-2026, 10:12 AM
                              0 responses
                              31 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...