**dpryan** · 01-07-2014, 02:37 PM

If you already have the alignments, just use Picard's AddOrReplaceReadGroups. Adding read groups should alter every alignment, so if you run "samtools view sample1.bam | head" before and after adding read groups, you should see a difference. We'd have to see an example from the VCF file to tell whether things are going amiss there.
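
A rough sketch of how that could be scripted for several BAM files, assuming the standalone AddOrReplaceReadGroups.jar from Picard is in the current directory. The file names are hypothetical, and using the sample name for RGLB and RGPU is only a placeholder choice. The loop writes the commands to a file instead of running them, so the arguments can be checked first.

```shell
# Hypothetical inputs; in practice this would be your own BAM files.
bams="sample1.bam sample2.bam"

# Write one Picard command per BAM to a file for review.
# RGLB and RGPU are placeholders here (sample name reused).
for bam in $bams; do
    s=$(basename "$bam" .bam)
    echo java -jar AddOrReplaceReadGroups.jar \
        INPUT="$bam" OUTPUT="${s}.rg.bam" \
        RGID="$s" RGSM="$s" RGLB="$s" RGPL=illumina RGPU="$s"
done > add_rg_cmds.txt

cat add_rg_cmds.txt
```

Once the commands look right, they can be run with "sh add_rg_cmds.txt". Afterwards each alignment should carry an RG:Z: tag, which is what the before/after "samtools view" comparison would show.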

**rskr** · 01-07-2014, 06:40 PM

Originally posted by sasignor:

This is a problem for which there are many posts already. Yet having spent two days googling my problems, and having them unresolved, I am going to ask. I have 17 samples from two subspecies that I want to eventually analyze in vcftools for some basic population genetics. From what I can tell, this requires them to be all in one file with some sort of distinguishing header for the different subspecies. So I am trying to add headers to the .sam files and then merge them such that this will be possible.

I have tried using sampe:

./bwa sampe -r "@RG\tID:3\tSM:3\tPL:illumina" reference.txt in.sai in.sai in.fastq in.fastq > header.sam

And this does add the header, but just as one line in the file @RG. Adding it using samtools merge -rh and a text file does not seem to work at all.

Using picard to merge the files seems to maintain the headers, again as one line.

java -Xmx2g -jar MergeSamFiles.jar

but when it is finalized as a vcf all distinguishing tags are gone.

Perhaps I am mistaken as to the input that vcf requires? How else is it supposed to tell between populations?

Are these headers not enough to tag the reads?

I use the -h option in samtools merge to specify a separate header file. To build the header file, I print the header from one of the BAM files and then append the @RG lines for all of the samples to it.

Something like:

samtools view -H file1.bam > header.txt
cat header.txt rg.txt > rg_header.txt
samtools merge -h rg_header.txt out.bam file*.bam

Where rg.txt has the @RG tags.
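
A minimal sketch of how rg.txt could be generated, assuming one read group per sample. The sample names and the PL value are placeholders, not taken from this thread. Each @RG line is tab-separated and needs a unique ID. The SM field is the one that callers such as samtools mpileup or GATK typically use to name the sample columns in the VCF.

```shell
# Hypothetical sample names; replace with your own.
samples="sample1 sample2 sample3"

# Start from an empty file, then write one tab-separated
# @RG line per sample (ID and SM both set to the sample name).
: > rg.txt
for s in $samples; do
    printf '@RG\tID:%s\tSM:%s\tPL:illumina\n' "$s" "$s" >> rg.txt
done

cat rg.txt
```

Note that a header file only describes the read groups. Each read still needs an RG tag whose value matches one of these IDs, for example from bwa's -r option, Picard, or samtools merge -r (which, per its documentation, infers the tag value from the input file names).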

It is a weird operation, IMO. If there is a simpler way to do it, I would like to know. I can post more details if it isn't clear how to do it this way.

