SA tag in SAM file for chimeric reads

jazz

#1

10-27-2016, 03:51 PM

Hi,
I am using BWA-MEM for split-read alignment of single-end genomic DNA-seq reads from Illumina. I know that BWA uses the SA tag to mark chimeric reads. When I manually BLAST individual reads that carry the SA tag, I can verify that they are indeed chimeras. However, I could not find details about the SA tag itself. What information is encoded in the SA field? Here is an example of a chimeric read that maps to two separate genomic locations within the same contig (scf7180000067989):

HWI-ST387:139:C03WJABXX:5:2108:15315:193815 16 scf7180000067989 85156 60 60M41S * 0 0 TTGAAGTCAAGAAAGTGGTAAAGAGAGATTAATAGGGGTATCTCAGCTACAACAAATATTATATTAAATTAAATGGTTAATCTTGCTTTGCTCACCATAAA * NM:i:2 MD:Z:31G1C26 AS:i:50 XS:i:0 SA:Z:scf7180000067989,85273,-,54S47M,60,1;

HWI-ST387:139:C03WJABXX:5:2108:15315:193815 272 scf7180000067989 85273 60 54H47M * 0 0 AATATTATATTAAATTAAATGGTTAATCTTGCTTTGCTCACCATAAA * NM:i:1 MD:Z:11T35 AS:i:42 XS:i:22 SA:Z:scf7180000067989,85156,-,60M41S,60,2;

I am expecting a lot of genome rearrangements in the sample, so ultimately I want to isolate the reads that map to variant locations and identify the regions of microhomology, which could help pinpoint the breakpoints. I am new to bioinformatics, so any input would be great.

Thanks in advance!
dcameron

#2

10-30-2016, 08:21 PM

The SA tag is defined in the SAM tags specification document, which was split out from the main SAM spec at the end of last year.
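As a quick illustration of what that document describes: the SA value is a semicolon-delimited list with one entry per other segment of the chimeric alignment, and each entry holds six comma-separated fields (reference name, 1-based position, strand, CIGAR, mapping quality, NM edit distance). Below is a minimal Python sketch for pulling those fields apart. The names `SAEntry` and `parse_sa_tag` are made up for this example, and it assumes reference names contain no commas or semicolons.

```python
from typing import List, NamedTuple


class SAEntry(NamedTuple):
    """One entry of an SA tag (hypothetical helper type for this sketch)."""
    rname: str   # reference/contig name
    pos: int     # 1-based leftmost mapping position
    strand: str  # '+' or '-'
    cigar: str   # CIGAR of the other segment
    mapq: int    # mapping quality
    nm: int      # edit distance (NM)


def parse_sa_tag(field: str) -> List[SAEntry]:
    """Parse 'SA:Z:rname,pos,strand,CIGAR,mapQ,NM;...' into a list of entries."""
    # Accept either the full field or just the value after 'SA:Z:'.
    value = field[5:] if field.startswith("SA:Z:") else field
    entries = []
    for item in value.split(";"):
        if not item:  # skip the empty string after the trailing ';'
            continue
        rname, pos, strand, cigar, mapq, nm = item.split(",")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


# SA tag taken from the first record in the original post.
sa = parse_sa_tag("SA:Z:scf7180000067989,85273,-,54S47M,60,1;")
print(sa[0])
```

For the first record in the question, this says the other part of the read aligns to scf7180000067989 at position 85273 on the reverse strand with CIGAR 54S47M, MAPQ 60 and one mismatch, which matches the second record posted.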

Note that bwa can write split alignments in which the chimeric segments overlap (e.g., alignments of 60M40S and 50S50M for the same read).
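To see how much two segments of the same read overlap, you can convert each CIGAR into the interval of the original read it covers and intersect the intervals. The sketch below is only a rough illustration, not how any particular caller does it: `read_interval` and `overlap` are hypothetical names, and it assumes that for a reverse-strand alignment the clipping at the right-hand end of the CIGAR corresponds to the start of the read as sequenced. The overlapping bases are a candidate region for microhomology, nothing more.

```python
import re
from typing import Tuple

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_interval(cigar: str, strand: str) -> Tuple[int, int]:
    """Half-open [start, end) interval of the original read covered by an alignment."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]

    # Sum the soft/hard clipping at each end of the CIGAR.
    left = 0
    for n, op in ops:
        if op not in "SH":
            break
        left += n
    right = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        right += n

    # Read bases consumed by the aligned part (M, I, =, X consume the query).
    aligned = sum(n for n, op in ops if op in "MI=X")

    # On the reverse strand the CIGAR is written for the reverse-complemented
    # read, so the right-hand clip sits at the start of the original read.
    start = right if strand == "-" else left
    return start, start + aligned


def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of read bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# The example above (both segments taken as forward strand).
print(overlap(read_interval("60M40S", "+"), read_interval("50S50M", "+")))

# The read from the original post (both segments on the reverse strand).
print(overlap(read_interval("60M41S", "-"), read_interval("54S47M", "-")))
```

With the two CIGARs in the example above the shared stretch is 10 bases; for the read in the original post (60M41S and 54S47M, both reverse strand) it is 6 bases.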

>I am expecting a lot of genome rearrangements in the sample, so ultimately I want to isolate these reads that map to variant locations and identify the regions of microhomology, which could help identify the breakpoint. I am new to Bioinformatics so any inputs would be great.

I would recommend using one of the many (50+ at my last count) structural variant callers that are designed to identify breakpoints. In terms of bwa SA tags, LUMPY is a caller that combines the bwa split-read alignments with read-pair and coverage information for breakpoint calling. If you're not happy to let the caller perform its own split-read analysis, then I would recommend GRIDSS (disclaimer: my caller) due to its lower false discovery rate compared to other callers. Other callers that perform decently in my benchmarking results are manta, CREST and DELLY (with Pindel quite good if your focus is sensitivity).

Last edited by dcameron; 10-30-2016, 08:33 PM.