Hi there!
I recently obtained .bam files processed by CIDRSeqSuite. One of the .bam files is 15.2 GB (it is whole-exome sequencing data). I performed the following steps:
1. Used Picard SamToFastq to convert the .bam to FastQ files.
2. Aligned the reads with bwa aln and bwa sampe.
3. Merged the results with MergeSamFiles and then sorted the output using samtools.
4. Used GATK for realignment.
5. Used Picard for FixMateInformation and MarkDuplicates.
My question: the resulting .bam file is only ~6.8 GB. I checked the mapping statistics using samtools flagstat, and the QC-passed read counts for the two .bam files are identical.
Can anyone suggest a reason for such a large size difference between the two .bam files? For your information: the .bam file comprises 4 read groups, and after converting the .bam to FastQ, each FastQ file is ~4 GB.
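In case it helps anyone reproduce the comparison, the flagstat check described above could be scripted roughly as follows. This is only a sketch: it assumes the plain-text output of `samtools flagstat` (lines of the form `N + M label`), and the function names `parse_flagstat` and `diff_flagstat` are made up for illustration.

```python
import re

# One flagstat line looks like "90 + 0 mapped (90.00% : N/A)".
# The trailing "(x% : y)" percentage group is dropped so that labels
# are comparable between files; other parentheses stay in the label.
_LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \([^()]* : [^()]*\))?$")


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} from flagstat text output."""
    counts = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            counts[match.group(3)] = (int(match.group(1)), int(match.group(2)))
    return counts


def diff_flagstat(a, b):
    """Return {label: (value_in_a, value_in_b)} for labels whose counts differ."""
    labels = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in labels if a.get(k) != b.get(k)}
```

To use it, save each report with `samtools flagstat file.bam > file.flagstat.txt` (the file names here are placeholders), read both text files, and pass the parsed dictionaries to `diff_flagstat`. An empty result means every category matches, not just the total read count.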
I am very new to analyzing whole-exome sequencing data, and I would greatly appreciate it if anyone could help me figure this out.
Thanks.