![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
sam to bam conversion error, no @SQ lines in the header, missing header? | efoss | Bioinformatics | 17 | 12-03-2015 05:28 AM |
How to add a suffix to fastq file | Wallysb01 | Bioinformatics | 9 | 08-27-2014 10:52 AM |
.SAM to .BAM with SAM file header @PG | emilyjia2000 | Bioinformatics | 13 | 06-14-2011 01:21 PM |
BWA: specifying SAM/BAM file header fields before read alignment? | nora | Bioinformatics | 3 | 12-04-2010 10:11 PM |
Add a new line after the last file recording | Wei-HD | Bioinformatics | 2 | 04-15-2010 01:13 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Philadelphia Join Date: Feb 2012
Posts: 19
|
![]()
I have the following problem. I am trying to analyze bam files using GATK. The bam files I have do not have a header so this causes an error with GATK. I can add the header manually but I do not know what the read groups in the alignment section of the SAM file are to include in the header.
Is there any way to extract fields from the alignment section of the SAM file? I could then extract the RGIDs from the alignments and get the unique values so I could include them in the header. Are there any other solutions to this problem that I am not thinking of? I can always go brute force and add RG lines to the header until I get no errors. |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Boston Join Date: Nov 2009
Posts: 224
|
![]()
If your RG tags are all in the same column, just do:
Code:
cut -f# file.sam | sort | uniq If they aren't in the same column, you could do this: Code:
tr '\t' '\n' < file.sam | grep RG: | sort | uniq |
![]() |
![]() |
![]() |
#3 |
Member
Location: Philadelphia Join Date: Feb 2012
Posts: 19
|
![]()
I solved this problem on my own. I had a small number of RG tags in each BAM file. I used the ValidateSamFile tool from Picard to get the different RG tags in the alignment section. Then I manually wrote the header files and added the new header to the existing bam file using the samtools reheader tool. To make sure the header is correctly written, I validated the file again. Then I indexed the new bam file.
|
![]() |
![]() |
![]() |
#4 |
Member
Location: Maryland Join Date: Apr 2010
Posts: 31
|
![]()
I think you will want to look at picard tools (http://picard.sourceforge.net/comman...laceReadGroups) to fix this problem.
Below I've listed an example of how I've used picard to fix headers in my bam files. You'll want to look at the picard website for additional details. java -jar -Xmx10g /path/to/picard-tools-1.54/AddOrReplaceReadGroups.jar INPUT=sample.bam OUTPUT=sample_with_header.bam SORT_ORDER=coordinate RGID=sample RGLB=sample RGPL=illumina RGSM=sample RGPU=name CREATE_INDEX=true |
![]() |
![]() |
![]() |
#5 |
Member
Location: Maryland Join Date: Apr 2010
Posts: 31
|
![]()
Also, if you want to loop through many bam files which are in the same directory you could use this command:
for i in *bam do java -jar -Xmx10g /path/to/picard-tools-1.54/AddOrReplaceReadGroups.jar INPUT=$i OUTPUT=/path/to/a/different/directory/$i SORT_ORDER=coordinate RGID=$i RGLB=$i RGPL=illumina RGSM=$i RGPU=name CREATE_INDEX=true done |
![]() |
![]() |
![]() |
#6 |
Member
Location: Philadelphia Join Date: Feb 2012
Posts: 19
|
![]()
Interesting. I tried using the AddOrReplaceReadGroups function in picard and it gave me an error saying it could not find the RGID in the header. Perhaps I did not include all the arguments I needed.
The call I used was java -jar AddOrReplaceReadGroups.jar INPUT=input OUTPUT=output SORT_ORDER=coordinate RGID=X10010 RGLB=LIB-X10010 RGPL=solid RGPU=slide-X10010 RGSM=X10010 Maybe the create_index argument is needed? |
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: Boston Join Date: Nov 2009
Posts: 224
|
![]()
Can you share your header and the results of the command I suggested?
|
![]() |
![]() |
![]() |
#8 | |
Member
Location: italy Join Date: Jun 2011
Posts: 48
|
![]() Quote:
this is the report of i got when i try to validate so what are the RG tags here and how can i write it manually? bilbo@ubuntu:/media/ java -jar ValidateSamFile.jar I=outone.bam O=Validate_report.txt ERROR: Read groups is empty ERROR: Record 105, Read name HWI-ST365_0157:7:1206:13997:190087#GCGGGC, Mate negative strand flag does not match read negative strand flag of mate ERROR: Record 212, Read name HWI-ST365_0157:7:2203:9846:109491#GCGGTC, Mate negative strand flag does not match read negative strand flag of mate ERROR: Record 349, Read name HWI-ST365_0157:7:1103:15462:80840#GCGGTC, Mate negative strand flag does not match read negative strand flag of mate ERROR: Record 748, Read name HWI-ST365_0157:7:2104:13313:62994#GCGGTC, Mate negative strand flag does not match read negative strand flag of mate ERROR: Record 807, Read name HWI-ST365_0157:7:2106:17824:99461#GCGGCC, Mate negative strand flag does not match read negative strand flag of mate ERROR: Record 996, Read name HWI-ST365_0157:7:1102:4610:63216#GCGGTC, Mate negative strand flag does not match read negative strand flag of mate ERROR: Record 998, Read name HWI-ST365_0157:7:2101:14161:147407#GCGGTC, Mate negative strand flag does not match read negative strand flag of mate ERROR: Record 1000, Read name HWI-ST365_0157:7:2104:11372:153164#GCGGTC, Mate negative strand flag does not match read negative strand flag of |
|
![]() |
![]() |
![]() |
#9 |
Junior Member
Location: London Join Date: Nov 2012
Posts: 6
|
![]()
HI
I hope I can get an answer to one question I have. I am sure it's a real newb question but I'm a real newb When you use Picard AddOrReplaceReadGroups I don't really understand where you get the inputs for RGID, RGLB, RGPL, RGPU & RGSM Can anyone help em with this? Thanks |
![]() |
![]() |
![]() |
#10 | |
Senior Member
Location: Boston Join Date: Nov 2009
Posts: 224
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|