Hi all,
I have just completed BWA alignment of paired-end reads using Galaxy. I then filtered the resulting SAM file and converted it to BAM.
At this point, I downloaded the BAM file to my local cluster and tried to validate it with ValidateSam. I get the following errors for every read:
ERROR: Read groups is empty
ERROR: Record 1, Read name F1DQQN1:46:C1BJMACXX:4:1103:20260:28106, Mate Alignment start (10043) must be <= reference sequence length (0) on reference 1
ERROR: Record 1, Read name F1DQQN1:46:C1BJMACXX:4:1103:20260:28106, Alignment start (10002) must be <= reference sequence length (0) on reference 1
Ignoring SAM validation error: ERROR: Read name F1DQQN1:46:C1BJMACXX:4:1103:20260:28106, CIGAR M operator maps off end of reference
ERROR: Record 1, Read name F1DQQN1:46:C1BJMACXX:4:1103:20260:28106, CIGAR M operator maps off end of reference
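Since every message reports a reference sequence length of 0, my guess (not confirmed) is that the @SQ lines in the header lost their LN (length) tags somewhere along the way, possibly during the filtering step. Below is a minimal Python sketch I could use to check this. It assumes the header has first been dumped as plain text with `samtools view -H`. The function name `check_sq_lengths` is a hypothetical helper of my own, not part of Picard, samtools or Galaxy.

```python
def check_sq_lengths(header_text):
    """Report @SQ header lines whose LN (sequence length) tag is missing or zero.

    header_text: the SAM header as plain text (assumed to come from `samtools view -H`).
    Returns a list of human-readable problem strings; empty if every @SQ line looks fine.
    """
    problems = []
    sq_count = 0
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        sq_count += 1
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        tags = {}
        for field in line.split("\t")[1:]:
            if ":" in field:
                key, value = field.split(":", 1)
                tags[key] = value
        name = tags.get("SN", "<no SN>")
        length = tags.get("LN")
        if length is None:
            problems.append(f"{name}: missing LN")
        elif not length.isdigit() or int(length) == 0:
            problems.append(f"{name}: bad LN '{length}'")
    if sq_count == 0:
        # No sequence dictionary at all would also leave every reference length unknown.
        problems.append("no @SQ lines in header")
    return problems
```

For example, I could save the `samtools view -H` output for the BAM to a text file, read it into a string and pass it to `check_sq_lengths`. An empty list would suggest the header lengths are fine and the problem lies elsewhere.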
I can easily fix the read groups, but can anyone tell me what the other errors mean and how to fix them?
I have used this pipeline successfully in the past without any problems. The data were generated on a HiSeq 2000 (100 bp paired-end reads).
Any help would be greatly appreciated.