SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Convert 1000-Genomes-proje BAM to FASTA (aligned to reference, grouped by chromosome) ce.log Bioinformatics 17 01-14-2014 12:35 AM
Convert merged BAM back to per lane BAM or FASTQ file danielsbrewer Bioinformatics 6 10-03-2013 08:29 AM
Scripture genome/chromosome.fa file EBER RNA Sequencing 1 07-24-2012 10:57 AM
MiSeq pipeline, per-genome bam vs per chromosome bam & vcf swNGS Bioinformatics 1 03-29-2012 06:00 AM
Changing header of BAM Hiroki Illumina/Solexa 6 03-23-2012 01:58 AM

Reply
 
Thread Tools
Old 08-14-2012, 06:02 AM   #1
ddaneels
Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 18
Default changing chromosome notation in .BAM file

Hi everyone,

I have .bam files in which the chromosomes are notated as '1', '2', '3', ... 'X', 'Y'.
However, for further analyses I need a .bam file in which the chromosomes are notated as 'chr1', 'chr2', 'chr3', ... 'chrX'. Does someone know a way to do this?

I thought I might need the substitute (s) command ...
ddaneels is offline   Reply With Quote
Old 08-14-2012, 06:05 AM   #2
adaptivegenome
Super Moderator
 
Location: US

Join Date: Nov 2009
Posts: 437
Default

I think it might be easiest to change the reference fasta file you are using to match the BAM?
adaptivegenome is offline   Reply With Quote
Old 08-14-2012, 06:15 AM   #3
ddaneels
Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 18
Default

This is of course one option ... I'm remapping my data for the moment, but since I have 5 .bam files from exome data, it takes quite some time. So I was looking for a faster option
ddaneels is offline   Reply With Quote
Old 08-14-2012, 06:19 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,446
Default

Look at the samtools reheader command. Presumably you would just "samtools view -H file.bam > header.sam", edit the header, and then use that with reheader.
dpryan is offline   Reply With Quote
Old 08-14-2012, 06:38 AM   #5
ddaneels
Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 18
Default

That would be a way to change the header, but if I'm not mistaking the chromosome numbers are also in the actual read-part of the file. So how to change them there?
ddaneels is offline   Reply With Quote
Old 08-14-2012, 06:46 AM   #6
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110
Default

You are mistaken. The reads only contain a number that tells which entry from the header to pick.
ffinkernagel is offline   Reply With Quote
Old 08-14-2012, 07:13 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,446
Default

Quote:
Originally Posted by ddaneels View Post
That would be a way to change the header, but if I'm not mistaking the chromosome numbers are also in the actual read-part of the file. So how to change them there?
As was mentioned by ffinkernagel, this is incorrect. I should note that you should avoid swapping the order of chromosomes or any other major edits. Just adding or removing "chr" won't break anything, but changing the order of things in the header or removing chromosomes could cause issues.
dpryan is offline   Reply With Quote
Old 08-14-2012, 07:41 AM   #8
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

Quote:
Originally Posted by ffinkernagel View Post
You are mistaken. The reads only contain a number that tells which entry from the header to pick.
In SAM spec v 1.4 document, column 3 RNAME is of String type, to hold Reference sequence NAME of the alignment.

Last edited by xied75; 08-14-2012 at 09:14 AM. Reason: My mistake.
xied75 is offline   Reply With Quote
Old 08-14-2012, 07:44 AM   #9
ddaneels
Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 18
Default

Thanks for the info.

Everything will work fine then. I was confused with the "1" in the 7th column of the read-part in the .bam file.

Example:

HWI-ST571_103:4:1302:9610:62449 99 1 604269 254 100M = 60324 152 GGAA...

I thought that the highlighted 1 also had to be changed to chr1.
ddaneels is offline   Reply With Quote
Old 08-14-2012, 08:23 AM   #10
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

1, This Big Huge Black 1, is column 3, not 7. It is not a number, but a string.
2, You don't need to change this 1 to chr1, is because programs like BWA and Samtools will read both format; but it doesn't mean this is a number ref to the header lines. Your understanding is more correct.
xied75 is offline   Reply With Quote
Old 08-14-2012, 08:49 AM   #11
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,539
Default

Quote:
Originally Posted by ffinkernagel View Post
You are mistaken. The reads only contain a number that tells which entry from the header to pick.
Or to be more precise, in BAM the reads just store an integer to say which reference it mapped to (referencing a table at the start of the BAM file, which is separate to any embedded SAM header), but in SAM the reads store the reference sequence's name.
maubp is offline   Reply With Quote
Old 08-14-2012, 08:53 AM   #12
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,539
Default

Quote:
Originally Posted by dpryan View Post
Look at the samtools reheader command. Presumably you would just "samtools view -H file.bam > header.sam", edit the header, and then use that with reheader.
I don't think that would work. Using 'samtools reheader' would only edit the embedded SAM header embedded in a BAM file, it would not IIRC update the separate BAM specific header table containing the list of references (their names and references).

You could turn the BAM file into SAM (e.g. with samtools view -h), do the replacement (e.g. with sed), and then optionally convert back to BAM (again with samtools view). That can be done as one line by piping the output from one tool to the next.
maubp is offline   Reply With Quote
Old 08-14-2012, 09:42 AM   #13
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,446
Default

Quote:
Originally Posted by maubp View Post
I don't think that would work. Using 'samtools reheader' would only edit the embedded SAM header embedded in a BAM file, it would not IIRC update the separate BAM specific header table containing the list of references (their names and references).

You could turn the BAM file into SAM (e.g. with samtools view -h), do the replacement (e.g. with sed), and then optionally convert back to BAM (again with samtools view). That can be done as one line by piping the output from one tool to the next.
Actually, bam_reheader runs the full bam_header_write using only the new header, so it seems it does both (I haven't bothered looking into the source of bam_header_write, I should note). I decided to run a quick test, since I can't say I've ever actually run the reheader command. For that, I took the header of a sorted alignment (written to a file called header.sam), and changed "chr1" to "chr100".
Code:
samtools view accepted_hits.bam | head -n 2
HWI-ST143:530:C102UACXX:5:1101:3568:162900	272	chr1	3005607	0	51M	*	0	0	CATAAATTCATTTTTTAATAGCTGAGTAGTATTCCATTGTGTAAATGTACC	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:=	CP:i:105668244	HI:i:0
HWI-ST143:530:C102UACXX:5:1308:5464:137163	272	chr1	3006556	0	51M	*	0	0	TTAGCTCCCTTGTCAAAGATCAGGTGACCATAGGTGTGTGGATTCATCTCT	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:chr15	CP:i:16439024	HI:i:0
Code:
samtools reheader header.sam accepted_hits.bam | samtools view - | head -n 2
HWI-ST143:530:C102UACXX:5:1101:3568:162900	272	chr100	3005607	0	51M	*	0	0	CATAAATTCATTTTTTAATAGCTGAGTAGTATTCCATTGTGTAAATGTACC	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:=	CP:i:105668244	HI:i:0
HWI-ST143:530:C102UACXX:5:1308:5464:137163	272	chr100	3006556	0	51M	*	0	0	TTAGCTCCCTTGTCAAAGATCAGGTGACCATAGGTGTGTGGATTCATCTCT	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:chr15	CP:i:16439024	HI:i:0
So, it seems to work.
dpryan is offline   Reply With Quote
Old 08-14-2012, 10:21 AM   #14
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,539
Default

OK - I may have been worrying over nothing then.
maubp is offline   Reply With Quote
Old 08-16-2012, 02:40 AM   #15
ddaneels
Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 18
Default

my header.sam file looks OK. "chr" has been added.

But when I run the samtools reheader command, nothing changes in the original .bam file...

Code:
samtools reheader header.sam sample1.bam | samtools view -H sample1.bam

Sample1.bam is my original file, so I was hoping that the header in sample1.bam would have changed, but it didn't.

I'm a programming newbie ... so maybe there's a mistake in my code?
ddaneels is offline   Reply With Quote
Old 08-16-2012, 02:44 AM   #16
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,446
Default

Quote:
Originally Posted by ddaneels View Post
my header.sam file looks OK. "chr" has been added.

But when I run the samtools reheader command, nothing changes in the original .bam file...
The reheader command doesn't change the original file (I'm not surprised you thought it would, the name lends itself to that belief and I don't think the documentation mentions this). You'll need to output things to a file rather than to samtools view. Try something like:
Code:
samtools reheader header.sam sample1.bam > sample1.fixed.bam
dpryan is offline   Reply With Quote
Old 09-20-2013, 11:36 AM   #17
mra
Junior Member
 
Location: USA

Join Date: Feb 2012
Posts: 7
Default BAM to BAM plus "chr"

The chromosome names can be changed on the fly. This command takes an input BAM file with chr names '1, 2, 3,..,X, Y, MT' and output a BAM file with names 'chr1, chr2, chr3,..., chrX, chrY, chrM' on both read and header lines. The output BAM doesn't include any non-canonical chromosome that could be present in the input BAM (such as JG*, Un*, etc).

Code:
samtools view -h INPUT.bam | awk 'BEGIN{FS=OFS="\t"} (/^@/ && [email protected]/){print $0} $2~/^SN:[1-9]|^SN:X|^SN:Y|^SN:MT/{print $0}  $3~/^[1-9]|X|Y|MT/{$3="chr"$3; print $0} ' | sed 's/SN:/SN:chr/g' | sed 's/chrMT/chrM/g' | samtools view -bS - > OUTPUT.bam
mra is offline   Reply With Quote
Old 06-16-2014, 09:14 AM   #18
neurongs
Junior Member
 
Location: Alicante, Spain

Join Date: Mar 2012
Posts: 7
Default

thank you mra for this useful tool!
neurongs is offline   Reply With Quote
Old 01-22-2015, 02:01 PM   #19
petervangalen
Junior Member
 
Location: Boston, US

Join Date: Jan 2015
Posts: 3
Default

The following line works well for me. It will replace the names of chromosome 1 to 22, X, Y and MT (mitochondrial DNA). It will create a new file with the name test_chr.bam instead of test.bam

Code:
use Samtools
samtools view -H test.bam | sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | samtools reheader - test.bam > test_chr.bam
If you want to do this for a bunch of bam files at once, use a loop. The variable $filename is your file without the extension (assuming there is no period except the one in ".bam").

Code:
for file in *.bam; do filename=`echo $file | cut -d "." -f 1`; samtools view -H $file | sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | samtools reheader - $file > ${filename}_chr.bam; done

Last edited by petervangalen; 01-22-2015 at 03:03 PM.
petervangalen is offline   Reply With Quote
Old 01-06-2017, 02:29 AM   #20
a_l_25
Junior Member
 
Location: norway

Join Date: Jan 2017
Posts: 1
Default stack in the output part

how to put the filename variable in different path than the original path where bam files located. Hence , keeping the variable $filename as part of the every new bam file's name.
a_l_25 is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:30 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO