There are already many posts about this problem, but after two days of searching I still have not resolved it, so I am going to ask. I have 17 samples from two subspecies that I eventually want to analyze in vcftools for some basic population genetics. From what I can tell, this requires all of them to be in one file, with some sort of distinguishing header for the different subspecies. So I am trying to add headers to the .sam files and then merge them to make this possible.
I have tried using sampe:
./bwa sampe -r "@RG\tID:3\tSM:3\tPL:illumina" reference.txt in.sai in.sai in.fastq in.fastq > header.sam
This does add the header, but only as a single @RG line in the file. Adding it with samtools merge -rh and a text file does not seem to work at all.
Using picard to merge the files seems to preserve the headers, again as one line:
java -Xmx2g -jar MergeSamFiles.jar
but when the result is finalized as a VCF, all distinguishing tags are gone.
Perhaps I am mistaken about the input that vcftools requires? How else is it supposed to distinguish between populations?
Are these headers not enough to tag the reads?
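To make the question concrete, here is a minimal sketch of the kind of header text file meant above, assuming one @RG line per sample in the same form as the sampe command. The sample IDs (1, 2, 3) and the file name rg.txt are placeholders for illustration, not my actual 17 samples.

```shell
# Hypothetical sample IDs; the real list would hold all 17 samples.
samples="1 2 3"

# Start with an empty header file (rg.txt is a placeholder name).
: > rg.txt

# Write one tab-separated @RG line per sample, matching the -r string used with sampe.
for s in $samples; do
    printf '@RG\tID:%s\tSM:%s\tPL:illumina\n' "$s" "$s" >> rg.txt
done

# Show the resulting header lines.
cat rg.txt
```

Each line carries its own ID and SM value, so in principle every sample should stay distinguishable after merging.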