I am new to bioinformatics and have just started working on two datasets in an attempt to identify allele-specific expression. I am currently working on Galaxy until I get access to a new workstation or server.
So far I have two BAM files, mapped with TopHat, that I am using to call SNPs. I sorted both files with samtools and tried to run mpileup locally, but got a segmentation fault. When I run mpileup on Galaxy using the unsorted files, I get output files of approximately 100GB per BAM file. When I look at the data, there seems to be no chromosome information (it reads chrM).
To troubleshoot, I took a small part of one of the BAM files, ran mpileup on it locally, and filtered the output. The resulting data appeared to be what I wanted (e.g. chr1 64179067 A A 75 75 25 23 ., II).
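For reference, the kind of post-filter I mean can be sketched roughly as below. This is only an illustration, not the exact filter I ran: it assumes the 10-column layout of the example line above, with mapping quality in column 7 and coverage in column 8, and the function name `filter_pileup` and both thresholds are hypothetical placeholders.

```python
from typing import Iterable, Iterator


def filter_pileup(lines: Iterable[str],
                  min_depth: int = 10,
                  min_mapq: int = 20) -> Iterator[str]:
    """Yield pileup lines that pass simple depth and mapping-quality cutoffs.

    Assumed column layout (an assumption based on the example line):
    chrom, pos, ref, consensus, consensus qual, SNP qual,
    mapping qual, coverage, read bases, base quals.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            # Not a complete 10-column record, skip it.
            continue
        try:
            mapq = int(fields[6])   # column 7: mapping quality (assumed)
            depth = int(fields[7])  # column 8: coverage (assumed)
        except ValueError:
            # Non-numeric values where numbers are expected, skip the line.
            continue
        if depth >= min_depth and mapq >= min_mapq:
            yield line.rstrip("\n")
```

Applied to an open pileup file, something like `for kept in filter_pileup(handle): print(kept)` would stream the file line by line, so the full 100GB output never has to fit in memory.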
Is a 100GB output file normal? Are there steps that I am missing as far as filtering the BAM prior to mpileup?
Note the starting BAM files are 6GB and 5GB.