I successfully ran the GATK realignment and recalibration steps with the basic parameters given in their wiki, and finally ran the UnifiedGenotyper. When it completed, it gave the following output:
INFO 13:19:37,771 UnifiedGenotyper - Visited bases 3095693983
INFO 13:19:37,771 UnifiedGenotyper - Callable bases 2861341976
INFO 13:19:37,771 UnifiedGenotyper - Confidently called bases 64255022
INFO 13:19:37,771 UnifiedGenotyper - % callable bases of all loci 92.430
INFO 13:19:37,772 UnifiedGenotyper - % confidently called bases of all loci 2.076
INFO 13:19:37,772 UnifiedGenotyper - % confidently called bases of callable loci 2.246
INFO 13:19:37,772 UnifiedGenotyper - Actual calls made 50516
INFO 13:19:37,772 TraversalEngine - Total runtime 13126.08 secs, 218.77 min, 3.65 hours
INFO 13:19:37,772 TraversalEngine - 91411584 reads were filtered out during traversal out of 128292232 total (71.25%)
INFO 13:19:37,773 TraversalEngine - -> 41037655 reads (31.99% of total) failing DuplicateReadFilter
INFO 13:19:37,773 TraversalEngine - -> 1612483 reads (1.26% of total) failing BadMateFilter
INFO 13:19:37,773 TraversalEngine - -> 48761446 reads (38.01% of total) failing FailsVendorQualityCheckReadFilter
It looks like 71% of the reads were filtered out, mostly by FailsVendorQualityCheckReadFilter (38%) and DuplicateReadFilter (32%). Any recommendations on improving these statistics? Thanks.
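In case it helps to double-check the breakdown, here is a minimal Python sketch that re-tallies the filter lines from the log above. The function name `filter_breakdown` and the regular expressions are my own assumptions based on the format of this particular log; they are not part of GATK.

```python
import re

# Log excerpt copied from the run above (only the TraversalEngine filter lines).
LOG = """\
INFO 13:19:37,772 TraversalEngine - 91411584 reads were filtered out during traversal out of 128292232 total (71.25%)
INFO 13:19:37,773 TraversalEngine - -> 41037655 reads (31.99% of total) failing DuplicateReadFilter
INFO 13:19:37,773 TraversalEngine - -> 1612483 reads (1.26% of total) failing BadMateFilter
INFO 13:19:37,773 TraversalEngine - -> 48761446 reads (38.01% of total) failing FailsVendorQualityCheckReadFilter
"""


def filter_breakdown(log_text):
    """Return (total_reads, filtered_reads, {filter_name: read_count}).

    Hypothetical helper: the patterns assume the log layout shown above.
    """
    summary = re.search(
        r"(\d+) reads were filtered out during traversal out of (\d+) total",
        log_text,
    )
    if summary is None:
        raise ValueError("no filter summary line found")
    filtered, total = int(summary.group(1)), int(summary.group(2))

    # One entry per "-> N reads (P% of total) failing FilterName" line.
    per_filter = {
        name: int(count)
        for count, name in re.findall(
            r"-> (\d+) reads \([\d.]+% of total\) failing (\w+)", log_text
        )
    }
    return total, filtered, per_filter


total, filtered, per_filter = filter_breakdown(LOG)

# Largest filter first, with the percentage recomputed from the raw counts.
for name, count in sorted(per_filter.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count} reads ({100 * count / total:.2f}% of total)")

kept = total - filtered
print(f"reads kept: {kept} ({100 * kept / total:.2f}% of total)")
```

The three per-filter counts add up exactly to the 91411584 filtered reads, so each read is counted against only one filter in this summary, and the vendor quality check and duplicate filters account for nearly all of the loss.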