12-05-2018, 10:30 AM   #1
Location: Mountain Grove, MO, USA

Join Date: Apr 2014
Posts: 29
SAM header: difference between using samtools and awk to filter?

I have filtered SAM files with samtools view -F 4 to keep only the aligned reads.
The output is a file in SAM format. The header is lost, but I can still convert the file to BAM and then use Picard's AddOrReplaceReadGroups to add a header.

This time I have a list of read names in a text file, and I used awk to find the matching rows in the SAM file. Of course the header is lost, but now I cannot convert the file to BAM to add the header back, so I cannot index the file and visualize the alignments.
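
For reference, the name-based filtering described above can be sketched so that the header lines are carried over with the matching rows. This is only an illustration, not the exact command I ran: filter_sam_by_name and the file names in the usage comment are hypothetical, and it assumes one read name per line in the text file and a SAM file whose header lines start with "@".

```shell
# Hypothetical helper (not part of samtools): keep SAM header lines plus
# the alignment rows whose read name (column 1, QNAME) is in a name list.
filter_sam_by_name() {
    # $1: text file with one read name per line
    # $2: SAM file to filter
    # First file (NR == FNR): store each read name as an array key.
    # Second file: print header lines ("@...") and rows whose QNAME was stored.
    awk 'NR == FNR { keep[$1] = 1; next }
         /^@/ || ($1 in keep)' "$1" "$2"
}

# Example call (file names are placeholders):
# filter_sam_by_name names.txt input.sam > filtered.sam
```

The output contains the original header lines followed by the selected alignment rows, in the same order as the input file. The name list must not be empty, otherwise the NR == FNR test also matches the SAM file and nothing is printed.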

I can open both files in a spreadsheet, and they look and behave exactly the same.
Is there some kind of identifier that is lost when awk writes the new file? Or will I have to take the matching FASTQ reads and redo the alignment?

I am trying to find regions in a virus that have matching sequences in a plant genome. So I aligned all reads to the virus genome and kept the aligned reads, then aligned those virus reads to the plant genome and filtered again. Now I need to see where these reads align in the virus, and since they are already contained in the first alignment file, I thought I could just separate them out...
sfh838t