Hi,
I am new to the forum and just making my first forays into next-gen sequencing pipelines. We are currently using a proprietary platform for alignment/variant calling, so I've been tasked with finding a suitable open-source replacement, if possible.
I have started off with bwa and am looking to use GATK for the variant calling part. My workflow so far looks like this:
- Reads - Illumina MiSeq paired-end reads (~600k reads), adapter sequences removed with cutadapt
- Reference - sequences downloaded from NCBI, indexed using bwa
- Alignment of reads to reference - using bwa mem
- Conversion of SAM to BAM using samtools
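The steps above correspond roughly to the command lines below. This is a minimal Python sketch that only *builds* the commands and runs nothing. The file names, the adapter placeholder, the use of cutadapt in paired-end mode (`-A`/`-p`, available in reasonably recent cutadapt releases) and the optional minimum-length cut-off (cutadapt's `-m`) are all illustrative assumptions, not the actual settings from my setup. `shlex.join` needs Python 3.8 or later.

```python
import shlex


def build_pipeline(reference, read1, read2, adapter, min_length=None):
    """Return (argv, stdout_path) pairs for each workflow step; nothing is executed.

    File names here are hypothetical placeholders.
    """
    trimmed1, trimmed2 = "trimmed_R1.fastq", "trimmed_R2.fastq"

    # Step 1: adapter trimming on both mates.
    cutadapt = ["cutadapt", "-a", adapter, "-A", adapter]
    if min_length is not None:
        # -m drops a pair when either trimmed read is shorter than min_length
        # (cutadapt's default pair filter is "any").
        cutadapt += ["-m", str(min_length)]
    cutadapt += ["-o", trimmed1, "-p", trimmed2, read1, read2]

    return [
        (cutadapt, None),
        # Step 2: index the reference (only needed once per reference).
        (["bwa", "index", reference], None),
        # Step 3: align the pairs; bwa mem writes SAM to standard output.
        (["bwa", "mem", reference, trimmed1, trimmed2], "alignment.sam"),
        # Step 4: SAM -> BAM (samtools 0.1.x also needs -S to read SAM input).
        (["samtools", "view", "-b", "-o", "alignment.bam", "alignment.sam"], None),
    ]


def as_shell(steps):
    """Render each step as a shell command line, adding '>' redirects."""
    lines = []
    for argv, stdout_path in steps:
        line = shlex.join(argv)
        if stdout_path is not None:
            line += " > " + shlex.quote(stdout_path)
        lines.append(line)
    return lines


if __name__ == "__main__":
    steps = build_pipeline("reference.fasta", "reads_R1.fastq", "reads_R2.fastq",
                           adapter="ADAPTER_SEQUENCE")
    for line in as_shell(steps):
        print(line)
```

Keeping the commands as argument lists, rather than pasted strings, makes it easy to hand them to `subprocess.run` later without any quoting surprises.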
Now here's where I'm coming a bit unstuck. Before I can do anything in GATK, it seems I need to reorder the BAM file so that its contigs match the order in the reference file. GATK recommends Picard's ReorderSam for this. Here's my command line (having first created the sequence dictionary successfully):
```
java -jar ReorderSam.jar INPUT=alignment.bam OUTPUT=alignment_reorder.bam REFERENCE=/path/to/humanGenome/reference.fasta
```
I get the following error:
```
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 27388, Read name [read name], Zero-length read without FZ, CS or CQ tag
```
So... I dutifully go back to my FASTQ to look for [read name], and it's not zero length! It *is* very short, though (8 base pairs...). Before the program stops, it writes a partial output file, and if I run:
```
samtools view -H alignment_reorder.bam | head -1
```
I get:
```
[bam_header_read] EOF marker is absent. The input is probably truncated.
@HD VN:1.4 SO:coordinate
```
...so it's sort of right?! I think this mega-short read is tripping it up, but I'm not sure where to go from here. I could probably discard reads shorter than a certain length, but I would prefer to just ignore them if possible. If anyone has seen something similar and can shed some light, I'd be very grateful indeed. Apologies for the insanely long post; I hope it makes sense.
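If discarding short reads turns out to be the way to go, note that with paired-end data bwa mem matches mates by their position in the two FASTQ files, so both mates have to be kept or dropped together. Below is a minimal Python sketch of such a filter. It assumes strict 4-line FASTQ records (no wrapped sequence lines, no trailing blank lines), and the minimum length is an arbitrary value chosen by the caller, not a recommended threshold.

```python
import io
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples from 4-line FASTQ."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) < 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield tuple(lines)


def filter_pairs(handle1, handle2, min_length):
    """Yield mate pairs in which BOTH reads have at least min_length bases.

    Dropping a pair as a whole keeps the two outputs in the same order,
    which bwa mem relies on to match mates.
    """
    for rec1, rec2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if rec1 is None or rec2 is None:
            raise ValueError("mate files contain different numbers of records")
        if len(rec1[1]) >= min_length and len(rec2[1]) >= min_length:
            yield rec1, rec2


def format_record(record):
    """Turn a parsed record back into FASTQ text."""
    return "\n".join(record) + "\n"


if __name__ == "__main__":
    # Toy data: pair "a" is long enough, pair "b" has an 8 bp mate,
    # and pair "c" has a mate trimmed down to zero length.
    r1 = io.StringIO("@a/1\nACGTACGTAC\n+\nIIIIIIIIII\n"
                     "@b/1\nACGTACGT\n+\nIIIIIIII\n"
                     "@c/1\n\n+\n\n")
    r2 = io.StringIO("@a/2\nTTTTTTTTTT\n+\nIIIIIIIIII\n"
                     "@b/2\nGGGGGGGGGGGG\n+\nIIIIIIIIIIII\n"
                     "@c/2\nAAAAAAAAAA\n+\nIIIIIIIIII\n")
    # Only pair "a" survives a 10 bp cut-off.
    for mate1, mate2 in filter_pairs(r1, r2, min_length=10):
        print(format_record(mate1) + format_record(mate2), end="")
```

For real files, the same generators work on handles from `open("reads_R1.fastq")` and `open("reads_R2.fastq")` (hypothetical names), writing each surviving mate to its own output file.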
TIA.