![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Convert merged BAM back to per lane BAM or FASTQ file | danielsbrewer | Bioinformatics | 6 | 10-03-2013 08:29 AM |
Picard's SortSam error with a merged bam file | tomato2 | Bioinformatics | 1 | 11-01-2011 10:18 PM |
GATK pileup with merged BAM files | yxl | Bioinformatics | 0 | 04-22-2011 08:07 PM |
GATK calling Merged Bam files | jayce_ocean | Bioinformatics | 3 | 03-16-2011 01:15 AM |
The best to process sample with 2 lanes of data? | foxyg | Bioinformatics | 2 | 01-06-2011 03:25 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Auckland, NZ Join Date: Dec 2010
Posts: 6
|
![]()
I know there are a few threads on this topic but I can't seem to find the answer to my particular problem!
I am merging BAM files from 8 lanes of sequencing, they are all the same sample, I want to end up with one file with all my sample data in it to use GATK downstream, which requires the RG tags. So how to I generate the RG tags at samtools merge when all my files are the same sample? The example in the documentation : perl -e 'print "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt samtools merge -rh rg.txt - ga.bam 454.bam | samtools rmdup - - | samtools rmdup -s - aln.bam tags alignments from each input file separately, where I have lane1_sorted.bam, lane2_sorted.bam... etc I'm sure its something quite simple, any tips would be appreciated! Thanks. |
![]() |
![]() |
![]() |
#2 |
Junior Member
Location: Auckland, NZ Join Date: Dec 2010
Posts: 6
|
![]()
Just found the answer to my own question... it was right there under my nose in the documentation of course! I will post here in case anyone else encounters the same thing
![]() A concrete example may be instructive. Support I have a trio of samples: MOM, DAD, and KID. Each has two DNA libraries prepared, one with 400 bp inserts and another with 200 bp inserts. Each of these libraries is run on two lanes of an illumina hiseq, requiring 3 x 2 x 2 = 12 lanes of data. When the data come off the sequencer, I would create 12 bam files, with the following @RG fields in the header: Dad's data: @RG ID:FLOWCELL1.LANE1 PL:illumina LB:LIB-DAD-1 SM ![]() @RG ID:FLOWCELL1.LANE2 PL:illumina LB:LIB-DAD-1 SM ![]() @RG ID:FLOWCELL1.LANE3 PL:illumina LB:LIB-DAD-2 SM ![]() @RG ID:FLOWCELL1.LANE4 PL:illumina LB:LIB-DAD-2 SM ![]() Mom's data: @RG ID:FLOWCELL1.LANE5 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200 @RG ID:FLOWCELL1.LANE6 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200 @RG ID:FLOWCELL1.LANE7 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400 @RG ID:FLOWCELL1.LANE8 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400 Kid's data: @RG ID:FLOWCELL2.LANE1 PL:illumina LB:LIB-KID-1 SM:KID PI:200 @RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200 @RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400 @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400 Note the hierarchical relationship between read groups (unique for each lane) to libraries (sequenced on two lanes) and samples (across four lanes, two lanes for each library) |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: Auckland, NZ Join Date: Dec 2010
Posts: 6
|
![]()
Ok I thought I had fixed the problem but its still not working!
When running a GATK tool I still get a read group error ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/586/586_merged.bam} is malformed: Read HWI-ST240R_178:5:1206:13820:25121 lacks read group information; Please associate all reads with read groups I thought I had attached RG info to all reads with the following commands: perl -e 'print "@RG\tID:586_1_sorted\tSM:586\tLB:586_1\tPL:Illumina\n@RG\tID:586_2_sorted\tSM:586\tLB:586_2\tPL:Illumina\n@RG\tID:586_3_sorted\tSM:586\tLB:586_3\tPL:Illumina\n@RG\tID:586_4_sorted\tSM:586\tLB:586_4\tPL:Illumina\n@RG\tID:586_5_sorted\tSM:586\tLB:586_5\tPL:Illumina\n@RG\tID:586_6_sorted\tSM:586\tLB:586_6\tPL:Illumina\n@RG\tID:586_7_sorted\tSM:586\tLB:586_7\tPL:Illumina\n@RG\tID:586_8_sorted\tSM:586\tLB:586_8\tPL:Illumina\n"' >rg.txt samtools merge -rh rg.txt 586_merged.bam 586_1_sorted.bam 586_2_sorted.bam 586_3_sorted.bam 586_4_sorted.bam 586_5_sorted.bam 586_6_sorted.bam 586_7_sorted.bam 586_8_sorted.bam So why I am getting this error and what can I do to get GATK tools to run on my file! |
![]() |
![]() |
![]() |
#4 |
Member
Location: California Join Date: Sep 2008
Posts: 45
|
![]()
This works great for GATK downstream analysis
http://picard.sourceforge.net/comman...laceReadGroups |
![]() |
![]() |
![]() |
#5 | |
Member
Location: italy Join Date: Jun 2011
Posts: 48
|
![]() Quote:
Hello @cowgirl please how did you solve this problem, because i have the same issue now please can you share it. i remain grateful here is the error i got hello every one please i got this error while running this command elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem ERROR ------------------------------------------------------------------------------------------ how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks thanks |
|
![]() |
![]() |
![]() |
#6 |
Member
Location: italy Join Date: Jun 2011
Posts: 48
|
![]()
hello every one please i got this error while running this command
elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem ERROR ------------------------------------------------------------------------------------------ how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks thanks |
![]() |
![]() |
![]() |
#7 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]() |
![]() |
![]() |
![]() |
#8 |
Member
Location: italy Join Date: Jun 2011
Posts: 48
|
![]()
this is what i am doing now and i am applying stringency to "Lenient hope it works. my main aim is to merge my 3 libraries(3bam files into 1bam) together after adding the read group individually is this possible? if yes please how can i do
|
![]() |
![]() |
![]() |
#9 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
Sure, that's easily possible. Replace the headers with picard and then merge them with samtools or picard (MergeSamFiles, which is a somewhat poorly named command since it also says it can handle BAM files).
|
![]() |
![]() |
![]() |
#10 |
Member
Location: italy Join Date: Jun 2011
Posts: 48
|
![]()
Hi all i was able to add the readgroups fine. but now i am running the somaticindeldetector and i am having this warning below
please what does the warning mean? here is the command line i used. elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R V.fasta -T SomaticIndelDetector -o indelswt1vmut4.vcf -verbose indels.txt -I:normal wty1.bam -I:tumor mut4.bam WARNING: a count of 10000 reads reached in a window chr1:20955379-20955578 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:20955680-20955879 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21076423-21076622 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21077327-21077526 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21077618-21077817 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21077909-21078108 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21078412-21078611 in tumor sample. The whole window will be dropped. INFO 12:35:18,002 TraversalEngine - chr1:21174588 9.45e+06 4.5 m 28.6 s 4.4% 103.4 m 98.9 m thanks a lot |
![]() |
![]() |
![]() |
#11 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
Have a look at your alignments in that region in IGV or some other genome browser. I expect you'll find the depth goes crazy high. Think about the things that could cause that and how they might affect your Indel calls.
|
![]() |
![]() |
![]() |
#12 | |
Member
Location: italy Join Date: Jun 2011
Posts: 48
|
![]() Quote:
Ok, yes i had a look at the alignment but i am confused because i do not really understand to be honest what the effect of this warning will be on the output. please if you can shed more light or explain more because i think i am a novice on this one. thanks a lot and kindly help |
|
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: NikoNarita.jp Join Date: Jul 2013
Posts: 142
|
![]()
Hi there
I got the same error, can someone explain me please. Thank you |
![]() |
![]() |
![]() |
#14 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]() |
![]() |
![]() |
![]() |
#15 |
Senior Member
Location: NikoNarita.jp Join Date: Jul 2013
Posts: 142
|
![]()
Thank you for your kind and fast reply. I got errror:
WARNING: a count of 10000 reads reached in a window chr1:20955379-20955578 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:20955680-20955879 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21076423-21076622 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21077327-21077526 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21077618-21077817 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21077909-21078108 in tumor sample. The whole window will be dropped. WARNING: a count of 10000 reads reached in a window chr1:21078412-21078611 in tumor sample. The whole window will be dropped ![]() |
![]() |
![]() |
![]() |
#16 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
Ah, that can be am indicator of a few things, the most likely of which is that there's just a high level of PCR duplication in those regions. You can most easily check for this by opening the affected files in IGV or another viewer and just seeing if a disproportionate number of reads (or pairs) have the same start/stop bounds.
|
![]() |
![]() |
![]() |
#17 | |
Senior Member
Location: USA Join Date: Sep 2012
Posts: 130
|
![]()
Try increasing the --maxNumberOfReads argument for SomaticIndelDetector. That may be resolve your problem.
Quote:
|
|
![]() |
![]() |
![]() |
#18 |
Senior Member
Location: NikoNarita.jp Join Date: Jul 2013
Posts: 142
|
![]()
Thank you dpryan and id0,
Oh..However, I have used picard but still PCR duplicates ? I will try use id0's recommendation. There are two more problems I have after this: 1. I have converted .vcf output of SomaticIndelDetector using convert2annovar.pl. However, I am confused with MuTect .wig.txt out to convert so that I can use it as a In-put file for Annovar ? 2. MuTect can run without control samples using only tumor . SomaticIndelDetector does not work without control sample.bam. It asks minimum 2 samples. My case samples are from cell lines, there is no control or normal. How can I use SomaticIndelDetector ? I studied few articles and googled but can not understand. Can somebody explain me please ? Thank you in Advance. |
![]() |
![]() |
![]() |
Thread Tools | |
|
|