  • ddaneels
    Member
    • Mar 2012
    • 20

    changing chromosome notation in .BAM file

    Hi everyone,

    I have .bam files in which the chromosomes are notated as '1', '2', '3', ... 'X', 'Y'.
    However, for further analyses I need a .bam file in which the chromosomes are notated as 'chr1', 'chr2', 'chr3', ... 'chrX'. Does anyone know a way to do this?

    I thought I might need the substitute (s) command ...
  • adaptivegenome
    Super Moderator
    • Nov 2009
    • 436

    #2
    I think it might be easiest to change the reference fasta file you are using to match the BAM?


    • ddaneels
      Member
      • Mar 2012
      • 20

      #3
      This is of course one option ... I'm remapping my data at the moment, but since I have 5 .bam files from exome data, it takes quite some time. So I was looking for a faster option.


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Look at the samtools reheader command. Presumably you would just "samtools view -H file.bam > header.sam", edit the header, and then use that with reheader.
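The "edit the header" step could be scripted rather than done by hand. The following is a minimal sketch, not a tested recipe for any particular dataset: `add_chr_to_header` is a hypothetical helper name, and it assumes a tab-separated SAM header in which only the `SN:` field of `@SQ` lines needs the prefix. Note that a name such as `MT` would become `chrMT`, which may not match a reference that uses `chrM`.

```shell
# Hypothetical helper: prefix every @SQ reference name with "chr".
# Reads a SAM header on stdin and writes the edited header to stdout.
# Names that already start with "chr" are left alone.
add_chr_to_header() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 == "@SQ" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^SN:/ && $i !~ /^SN:chr/)
                     sub(/^SN:/, "SN:chr", $i)
         }
         { print }'
}
```

With samtools installed, this would slot in as `samtools view -H file.bam | add_chr_to_header > header.sam`, followed by `samtools reheader header.sam file.bam > file.chr.bam`.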


        • ddaneels
          Member
          • Mar 2012
          • 20

          #5
          That would be a way to change the header, but if I'm not mistaken the chromosome names are also in the actual read part of the file. So how would I change them there?


          • ffinkernagel
            Senior Member
            • Oct 2009
            • 110

            #6
            You are mistaken. The reads only contain a number that tells which entry from the header to pick.


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Originally posted by ddaneels:
              That would be a way to change the header, but if I'm not mistaken the chromosome names are also in the actual read part of the file. So how would I change them there?
              As was mentioned by ffinkernagel, this is incorrect. I should note that you should avoid swapping the order of chromosomes or any other major edits. Just adding or removing "chr" won't break anything, but changing the order of things in the header or removing chromosomes could cause issues.


              • xied75
                Senior Member
                • Feb 2012
                • 129

                #8
                Originally posted by ffinkernagel:
                You are mistaken. The reads only contain a number that tells which entry from the header to pick.
                In SAM spec v 1.4 document, column 3 RNAME is of String type, to hold Reference sequence NAME of the alignment.
                Last edited by xied75; 08-14-2012, 08:14 AM. Reason: My mistake.


                • ddaneels
                  Member
                  • Mar 2012
                  • 20

                  #9
                  Thanks for the info.

                  Everything will work fine then. I was confused by the "1" in the 7th column of the read part of the .bam file.

                  Example:

                  HWI-ST571_103:4:1302:9610:62449 99 1 604269 254 100M = 60324 152 GGAA...

                  I thought that the highlighted 1 also had to be changed to chr1.


                  • xied75
                    Senior Member
                    • Feb 2012
                    • 129

                    #10
                    1. The highlighted 1 is in column 3, not column 7. It is not a number, but a string.
                    2. You don't need to change this 1 to chr1 because programs like BWA and Samtools will read both formats; that doesn't mean it is a numeric reference to the header lines. Your understanding is more correct.


                    • maubp
                      Peter (Biopython etc)
                      • Jul 2009
                      • 1544

                      #11
                      Originally posted by ffinkernagel:
                      You are mistaken. The reads only contain a number that tells which entry from the header to pick.
                      Or to be more precise, in BAM the reads just store an integer to say which reference it mapped to (referencing a table at the start of the BAM file, which is separate to any embedded SAM header), but in SAM the reads store the reference sequence's name.
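To illustrate the lookup described above, here is a small sketch. It does not read BAM; it only mimics the idea by treating the `@SQ` lines of a SAM header as the reference table. `ref_name_for_id` is a hypothetical helper name, and the index is assumed to be 0-based, as BAM reference IDs are.

```shell
# Hypothetical helper: print the reference name stored at a given 0-based
# position in the @SQ list of a SAM header read from stdin.
# Prints nothing if the index is out of range (e.g. -1 for unmapped reads).
ref_name_for_id() {
    awk -v id="$1" 'BEGIN { FS = "\t"; n = 0 }
        $1 == "@SQ" {
            if (n == id)
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) { print substr($i, 4); exit }
            n++
        }'
}
```

Because a record stores only the index, renaming the entry at that position changes the name reported for every read that points to it, while reordering or deleting entries would make the same index point at a different sequence.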


                      • maubp
                        Peter (Biopython etc)
                        • Jul 2009
                        • 1544

                        #12
                        Originally posted by dpryan:
                        Look at the samtools reheader command. Presumably you would just "samtools view -H file.bam > header.sam", edit the header, and then use that with reheader.
                        I don't think that would work. Using 'samtools reheader' would only edit the SAM header embedded in a BAM file; it would not, IIRC, update the separate BAM-specific header table containing the list of references (their names and lengths).

                        You could turn the BAM file into SAM (e.g. with samtools view -h), do the replacement (e.g. with sed), and then optionally convert back to BAM (again with samtools view). That can be done as one line by piping the output from one tool to the next.
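The replacement step of that pipeline might look like the sketch below. It uses awk rather than sed so that only the relevant tab-separated fields are touched: the `SN:` field of `@SQ` header lines, and columns 3 (RNAME) and 7 (RNEXT) of alignment records. `add_chr_to_sam` is a hypothetical helper name, the input is assumed not to carry the prefix already, and reference names inside optional tags (for example `CC:Z:` or `SA:Z:`) are not rewritten here.

```shell
# Hypothetical helper: add a "chr" prefix to reference names in a SAM stream.
# Reads SAM (header plus records) on stdin and writes SAM to stdout.
add_chr_to_sam() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ {
             # Header line: only the SN: field of @SQ lines is renamed.
             if ($1 == "@SQ")
                 for (i = 2; i <= NF; i++)
                     if ($i ~ /^SN:/) sub(/^SN:/, "SN:chr", $i)
             print
             next
         }
         {
             # Alignment record: "*" means no reference, "=" means same as RNAME.
             if ($3 != "*") $3 = "chr" $3
             if ($7 != "*" && $7 != "=") $7 = "chr" $7
             print
         }'
}
```

As a one-liner this would be used as `samtools view -h in.bam | add_chr_to_sam | samtools view -b -o out.bam -`; older samtools releases also need `-S` on the last command to read SAM input.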


                        • dpryan
                          Devon Ryan
                          • Jul 2011
                          • 3478

                          #13
                          Originally posted by maubp:
                          I don't think that would work. Using 'samtools reheader' would only edit the SAM header embedded in a BAM file; it would not, IIRC, update the separate BAM-specific header table containing the list of references (their names and lengths).

                          You could turn the BAM file into SAM (e.g. with samtools view -h), do the replacement (e.g. with sed), and then optionally convert back to BAM (again with samtools view). That can be done as one line by piping the output from one tool to the next.
                          Actually, bam_reheader runs the full bam_header_write using only the new header, so it seems it does both (I haven't bothered looking into the source of bam_header_write, I should note). I decided to run a quick test, since I can't say I've ever actually run the reheader command. For that, I took the header of a sorted alignment (written to a file called header.sam), and changed "chr1" to "chr100".
                          Code:
                          samtools view accepted_hits.bam | head -n 2
                          HWI-ST143:530:C102UACXX:5:1101:3568:162900	272	chr1	3005607	0	51M	*	0	0	CATAAATTCATTTTTTAATAGCTGAGTAGTATTCCATTGTGTAAATGTACC	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:=	CP:i:105668244	HI:i:0
                          HWI-ST143:530:C102UACXX:5:1308:5464:137163	272	chr1	3006556	0	51M	*	0	0	TTAGCTCCCTTGTCAAAGATCAGGTGACCATAGGTGTGTGGATTCATCTCT	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:chr15	CP:i:16439024	HI:i:0
                          Code:
                          samtools reheader header.sam accepted_hits.bam | samtools view - | head -n 2
                          HWI-ST143:530:C102UACXX:5:1101:3568:162900	272	chr100	3005607	0	51M	*	0	0	CATAAATTCATTTTTTAATAGCTGAGTAGTATTCCATTGTGTAAATGTACC	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:=	CP:i:105668244	HI:i:0
                          HWI-ST143:530:C102UACXX:5:1308:5464:137163	272	chr100	3006556	0	51M	*	0	0	TTAGCTCCCTTGTCAAAGATCAGGTGACCATAGGTGTGTGGATTCATCTCT	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:chr15	CP:i:16439024	HI:i:0
                          So, it seems to work.


                          • maubp
                            Peter (Biopython etc)
                            • Jul 2009
                            • 1544

                            #14
                            OK - I may have been worrying over nothing then.


                            • ddaneels
                              Member
                              • Mar 2012
                              • 20

                              #15
                              My header.sam file looks OK: "chr" has been added.

                              But when I run the samtools reheader command, nothing changes in the original .bam file...

                              Code:
                              samtools reheader header.sam sample1.bam | samtools view -H sample1.bam

                              sample1.bam is my original file, so I was hoping that the header in sample1.bam would have changed, but it didn't.

                              I'm a programming newbie ... so maybe there's a mistake in my code?
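A likely explanation, offered as a sketch rather than a confirmed diagnosis: `samtools reheader` writes the reheadered BAM to standard output and leaves its input file untouched, and in the command above the second `samtools view -H sample1.bam` opens the original file by name instead of reading the piped data. Redirecting the output to a new file and inspecting that file would look like this; `reheader_copy` and the output file name are hypothetical.

```shell
# Hypothetical wrapper: write a reheadered copy of a BAM, then print the
# header of the copy. The input BAM itself is never modified.
# Arguments: new header (SAM text), input BAM, output BAM.
reheader_copy() {
    header=$1 in_bam=$2 out_bam=$3
    samtools reheader "$header" "$in_bam" > "$out_bam" &&
        samtools view -H "$out_bam"
}
```

For example, `reheader_copy header.sam sample1.bam sample1.chr.bam`. To check the result without writing a file, the pipe can be kept as long as the second command reads standard input: `samtools reheader header.sam sample1.bam | samtools view -H -`.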
