SEQanswers

Old 03-22-2017, 08:34 AM   #1
JacquesT
Junior Member
 
Location: France

Join Date: Mar 2017
Posts: 4
How can I dissect .sam files in a text editor?

Hello everyone!

I'm new to bioinformatics.
I have some questions about reading (eye-balling) a .sam file.

For example:

@SQ SN:HPV11REF LN:7931
@SQ SN:HPV16REF LN:7846
@SQ SN:HPV18REF LN:7857
@SQ SN:HPV31REF LN:7906
@SQ SN:HPV33REF LN:7909
@SQ SN:HPV35REF LN:7879
@SQ SN:HPV39REF LN:7833
@SQ SN:HPV45REF LN:7858
@SQ SN:HPV51REF LN:7808
@SQ SN:HPV52REF LN:7942
@SQ SN:HPV56REF LN:7845
@SQ SN:HPV58REF LN:7824
@SQ SN:HPV59REF LN:7896
@SQ SN:HPV6REF LN:7996
@SQ SN:HPV1REF LN:7816
@SQ SN:HPV2REF LN:7860
@SQ SN:HPV3REF LN:7820
@SQ SN:HPV4REF LN:7353
@SQ SN:HPV5REF LN:7746
@SQ SN:HPV7REF LN:8027
@SQ SN:HPV8REF LN:7654
@SQ SN:HPV9REF LN:7434
@SQ SN:HPV10REF LN:7919
@SQ SN:HPV34REF LN:7723
@SQ SN:HPV40REF LN:7909
@SQ SN:HPV42REF LN:7917
@SQ SN:HPV43REF LN:7975
@SQ SN:HPV44REF LN:7833
@SQ SN:HPV53REF LN:7859
@SQ SN:HPV54REF LN:7759
@SQ SN:HPV61REF LN:7989
@SQ SN:HPV68REF LN:7822
@SQ SN:HPV69REF LN:7700
@SQ SN:HPV70REF LN:7905
@SQ SN:HPV72REF LN:7989
@SQ SN:HPV73REF LN:7700
@SQ SN:HPV80REF LN:7427


{BWA instruction}


MSQ-M1307R:269:000000000-D24BN:1:1101:15163:1383 (QNAME)
99 (FLAG)
HPV56REF (RNAME)
6262 (Position of the leftmost base)
60 (Mapping quality, Phred)
151M (CIGAR)
= (Mate Reference sequence NaMe (`=' if same as RNAME) )
6268 (1-based Mate POSition)
157 ( inferred Template LENgth (insert size))

ACATTGTACAATCCACCTGTAAATATCCTGACTATTTAAAAATGTCTGCAGATGCCTATGGTGATTCTATGTGGTTTTACTTACGCAGGGAACAATTATTTGCCAGACATTATTTTAATAGGGCTGGTAAAGTTGGGGAAACAATACCTGC

BCCCCFFFFFFFGGGGGGGGGGHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHGHGHHHHHHGGGGGGGHGHHHHHHHHHHHHGHGHHHHHHHHHHHHHHHHHGGFHGHHHHHHGGGGHHHHHHHHHHH

NM:i:0 (OPTional fields in the format "TAG:VTYPE:VALUE")

MD:Z:151

AS:i:151

XS:i:0


For this first read in the SAM file, I pressed "Enter" at every tab character, so that each field sits on its own line and is easier to follow.
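For anyone who would rather not split the line by hand, the same tab-by-tab breakdown can be sketched in Python. This is only a minimal illustration: the function name `parse_sam_record` is made up for this example, and SEQ and QUAL are replaced by `*` (which the SAM format allows when they are not stored) purely to keep the example short. Header lines starting with `@` are assumed to have been skipped already.

```python
# Names of the 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_record(line):
    """Split one alignment line (not a header line) into named fields.

    Hypothetical helper for illustration; not part of samtools or BWA.
    """
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))

    # These five mandatory columns hold integers.
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])

    # Everything after column 11 is an optional TAG:TYPE:VALUE field.
    tags = {}
    for item in columns[11:]:
        tag, vtype, value = item.split(":", 2)
        tags[tag] = int(value) if vtype == "i" else value
    record["TAGS"] = tags
    return record


# The read from the post above; SEQ and QUAL shortened to "*" for brevity.
example_line = "\t".join([
    "MSQ-M1307R:269:000000000-D24BN:1:1101:15163:1383",
    "99", "HPV56REF", "6262", "60", "151M", "=", "6268", "157",
    "*", "*",
    "NM:i:0", "MD:Z:151", "AS:i:151", "XS:i:0",
])

record = parse_sam_record(example_line)
for name in SAM_FIELDS:
    print(f"{name:6} {record[name]}")
print("TAGS  ", record["TAGS"])
```

Each printed line then corresponds to one of the annotated lines listed above.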

Now, my question is about the following (copied) line (that you can find above):
= (Mate Reference sequence NaMe (`=' if same as RNAME) )

Does this mean that if the value were not '=' but, say, 'gene X', then 'gene X' would be contiguous to 'HPV56REF' (the RNAME)?

Thank you so much for your valuable help!

Jacques T
Old 03-22-2017, 08:53 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,490

Have you checked out the SAM format specification?
Old 03-22-2017, 09:40 AM   #3
JacquesT
Junior Member
 
Location: France

Join Date: Mar 2017
Posts: 4

Thanks GenoMax!!!

No, I hadn't looked at that PDF.

Still, tell me if I am wrong:
In "Ref. name of the mate/next read", does "next read" mean a read spanning two genes when that field is not "="?
Old 03-22-2017, 11:45 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 659

The mate reference sequence name will not be '=' if the mate maps to a different contig or chromosome (another of the sequences listed on the @SQ lines at the start of the SAM file).

Occasionally you get read pairs where the 2 reads of the pair map to different chromosomes.
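As a rough illustration of that rule, here is a small Python sketch. The function names are hypothetical, and the second call uses a made-up mate reference (HPV16REF, taken from the @SQ list in the first post) just to show the non-'=' case; it is not a result from this data set. The handling of '*' (mate information unavailable) follows the SAM specification.

```python
def mate_reference(rname, rnext):
    """Return the reference name the mate maps to, or None if unknown.

    rname: value of column 3 (RNAME) for this read
    rnext: value of column 7 (mate reference name) for this read
    """
    if rnext == "=":
        # Shorthand: the mate maps to the same reference as this read.
        return rname
    if rnext == "*":
        # No information about the mate is available.
        return None
    # Otherwise the column holds the mate's reference name explicitly.
    return rnext


def mates_on_different_references(rname, rnext):
    """True when both reads are placed, but on different references."""
    mate = mate_reference(rname, rnext)
    return mate is not None and mate != rname


# The read from the first post: the mate is on the same reference.
print(mate_reference("HPV56REF", "="))

# Hypothetical case: the mate maps to another @SQ sequence.
print(mates_on_different_references("HPV56REF", "HPV16REF"))
```

So a value other than '=' only says which reference the mate was aligned to; it says nothing about the two references being adjacent.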
Old 03-23-2017, 02:16 AM   #5
JacquesT
Junior Member
 
Location: France

Join Date: Mar 2017
Posts: 4

OK. It's clearer now. Thanks Mastal

Just to check that I have understood, one basic question: each read and its mate come from the same sequence, except that one is forward and the other is reverse, right?
Old 03-23-2017, 07:34 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 659

Yes, each read and its mate are from the same fragment, starting from different ends of the fragment.
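The FLAG column records this pairing and orientation. Below is a minimal Python sketch that decodes a FLAG value using the bit meanings from the SAM specification; `decode_flag` is a hypothetical helper name, and the value 147 in the second call is only an assumption about what the mate's flag would typically look like, since the mate's record is not shown in this thread.

```python
# Bit values and meanings as listed in the SAM specification.
FLAG_BITS = [
    (0x1, "read is paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the meanings of all bits that are set in a SAM FLAG."""
    return [meaning for bit, meaning in FLAG_BITS if flag & bit]


# FLAG of the read in the first post.
for meaning in decode_flag(99):
    print("99 :", meaning)

# Assumed FLAG for its mate (not shown in the thread).
for meaning in decode_flag(147):
    print("147:", meaning)
```

For 99 the set bits are 1 + 2 + 32 + 64: the read is paired, mapped in a proper pair, its mate is on the reverse strand, and it is the first read of the pair.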

Tags
bioinformatics, cigar, flag, sam, samtools

All times are GMT -8.