  • How can I dissect .sam files in a text editor?

    Hello everyone!

    I'm new to bioinformatics.
    I have some questions about reading (eye-balling) a .sam file.

    For example:

    @SQ SN:HPV11REF LN:7931
    @SQ SN:HPV16REF LN:7846
    @SQ SN:HPV18REF LN:7857
    @SQ SN:HPV31REF LN:7906
    @SQ SN:HPV33REF LN:7909
    @SQ SN:HPV35REF LN:7879
    @SQ SN:HPV39REF LN:7833
    @SQ SN:HPV45REF LN:7858
    @SQ SN:HPV51REF LN:7808
    @SQ SN:HPV52REF LN:7942
    @SQ SN:HPV56REF LN:7845
    @SQ SN:HPV58REF LN:7824
    @SQ SN:HPV59REF LN:7896
    @SQ SN:HPV6REF LN:7996
    @SQ SN:HPV1REF LN:7816
    @SQ SN:HPV2REF LN:7860
    @SQ SN:HPV3REF LN:7820
    @SQ SN:HPV4REF LN:7353
    @SQ SN:HPV5REF LN:7746
    @SQ SN:HPV7REF LN:8027
    @SQ SN:HPV8REF LN:7654
    @SQ SN:HPV9REF LN:7434
    @SQ SN:HPV10REF LN:7919
    @SQ SN:HPV34REF LN:7723
    @SQ SN:HPV40REF LN:7909
    @SQ SN:HPV42REF LN:7917
    @SQ SN:HPV43REF LN:7975
    @SQ SN:HPV44REF LN:7833
    @SQ SN:HPV53REF LN:7859
    @SQ SN:HPV54REF LN:7759
    @SQ SN:HPV61REF LN:7989
    @SQ SN:HPV68REF LN:7822
    @SQ SN:HPV69REF LN:7700
    @SQ SN:HPV70REF LN:7905
    @SQ SN:HPV72REF LN:7989
    @SQ SN:HPV73REF LN:7700
    @SQ SN:HPV80REF LN:7427


    {BWA instruction}


    MSQ-M1307R:269:000000000-D24BN:1:1101:15163:1383 (QNAME)
    99 (FLAG)
    HPV56REF (RNAME)
    6262 (1-based leftmost mapping POSition)
    60 (MAPping Quality, Phred-scaled)
    151M (CIGAR)
    = (Mate Reference sequence NaMe (`=' if same as RNAME) )
    6268 (1-based Mate POSition)
    157 (inferred Template LENgth (insert size))

    ACATTGTACAATCCACCTGTAAATATCCTGACTATTTAAAAATGTCTGCAGATGCCTATGGTGATTCTATGTGGTTTTACTTACGCAGGGAACAATTATTTGCCAGACATTATTTTAATAGGGCTGGTAAAGTTGGGGAAACAATACCTGC

    BCCCCFFFFFFFGGGGGGGGGGHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHGHGHHHHHHGGGGGGGHGHHHHHHHHHHHHGHGHHHHHHHHHHHHHHHHHGGFHGHHHHHHGGGGHHHHHHHHHHH

    NM:i:0 (OPTional fields in the format "TAG:VTYPE:VALUE")

    MD:Z:151

    AS:i:151

    XS:i:0


    For this first read of the SAM file, I replaced each tab with a line break so that each field is easier to understand.
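Splitting each line on tabs, as done by hand above, can also be scripted. The following is a minimal Python sketch, assuming a single tab-separated alignment line (no header lines). The column names are the 11 mandatory fields of the SAM specification; the helper names `split_sam_line` and `print_fields` are hypothetical. To keep the example short, SEQ and QUAL are replaced with `*` (the SAM placeholder for "not stored"), and only integer (`i`) tags are converted to numbers; other tag types are kept as strings.

```python
# The 11 mandatory SAM columns, in order (from the SAM specification).
MANDATORY_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]

# The read from the post; SEQ and QUAL shortened to '*' for brevity.
EXAMPLE_LINE = "\t".join([
    "MSQ-M1307R:269:000000000-D24BN:1:1101:15163:1383",
    "99", "HPV56REF", "6262", "60", "151M", "=", "6268", "157", "*", "*",
    "NM:i:0", "MD:Z:151", "AS:i:151", "XS:i:0",
])


def split_sam_line(line):
    """Split one SAM alignment line into mandatory fields and optional tags."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(MANDATORY_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 tab-separated columns")
    fields = dict(zip(MANDATORY_FIELDS, columns[:11]))
    tags = {}
    for column in columns[11:]:
        # Optional fields look like TAG:TYPE:VALUE, e.g. NM:i:0
        tag, tag_type, value = column.split(":", 2)
        tags[tag] = int(value) if tag_type == "i" else value
    return fields, tags


def print_fields(line):
    """Print one labelled field per line, like replacing each tab with Enter."""
    fields, tags = split_sam_line(line)
    for name, value in fields.items():
        print(f"{name}\t{value}")
    for tag, value in tags.items():
        print(f"{tag}\t{value}")


if __name__ == "__main__":
    print_fields(EXAMPLE_LINE)
```

Splitting on tabs (not on every whitespace character) follows the SAM convention that columns are tab-delimited, so a QNAME containing colons is kept intact.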

    My question is about this line, copied from above:
    = (Mate Reference sequence NaMe (`=' if same as RNAME) )

    Does this mean that if the field were not '=' but 'gene X', then 'gene X' would be contiguous to 'HPV56REF' (RNAME)?

    Thank you so much for your valuable help!

    Jacques T

  • #2
    Have you checked out the SAM format specification?


    • #3
      Thanks, GenoMax!

      No, I hadn't looked at that PDF.

      Still, tell me if I'm wrong: in "Ref. name of the mate/next read", does "next read" mean a read spanning two genes when the mate reference name (RNEXT) is not "="?


      • #4
        The mate reference sequence will not be '=' if the mate maps to a different contig or chromosome (one of the sequences listed in the @SQ lines at the start of the SAM file).

        Occasionally you get read pairs in which the two reads map to different chromosomes.
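A minimal Python sketch of this rule, assuming tab-separated header and alignment lines (SAM header fields are tab-delimited, even though the post above shows them separated by spaces). It reads the @SQ names and lengths and resolves RNEXT: `=` means the mate is on the same reference as RNAME, `*` means mate information is unavailable, and any other value names a different @SQ sequence. The function names and the HPV56REF/HPV16REF pair used below are hypothetical examples.

```python
# Two @SQ lines taken from the header in the first post (tab-separated).
HEADER = [
    "@SQ\tSN:HPV16REF\tLN:7846",
    "@SQ\tSN:HPV56REF\tLN:7845",
]


def read_sq_lengths(header_lines):
    """Map each @SQ sequence name (SN) to its length (LN)."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each header field after the record type looks like TAG:VALUE.
        tags = dict(item.split(":", 1) for item in line.rstrip("\n").split("\t")[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def mate_reference(rname, rnext):
    """Return the reference sequence the mate maps to, or None if unavailable."""
    if rnext == "=":
        return rname  # '=' is shorthand for "same as RNAME"
    if rnext == "*":
        return None   # no information about the mate's reference
    return rnext      # the name of another @SQ sequence


def is_cross_reference_pair(rname, rnext):
    """True when the read and its mate map to different @SQ sequences."""
    mate_ref = mate_reference(rname, rnext)
    return mate_ref is not None and mate_ref != rname


if __name__ == "__main__":
    print(read_sq_lengths(HEADER))
    print(is_cross_reference_pair("HPV56REF", "="))         # mate on the same reference
    print(is_cross_reference_pair("HPV56REF", "HPV16REF"))  # hypothetical cross-reference pair
```

Comparing the resolved name with RNAME also covers files where RNEXT spells out the same reference name instead of using `=`.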


        • #5
          OK, it's clearer now. Thanks, Mastal.

          Just to be sure I understood, I have a basic question: a read and its mate come from the same sequence, except that one is forward and the other is reverse, right?


          • #6
            Yes, each read and its mate are from the same fragment, starting from different ends of the fragment.
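The FLAG and position fields of the read in the first post illustrate this answer. Below is a minimal Python sketch, assuming the FLAG bit meanings from the SAM specification: FLAG 99 = 1 + 2 + 32 + 64, so the read is paired, properly paired, first in pair and on the forward strand (bit 0x10 not set), while its mate is reverse complemented (bit 0x20 set). The mate's CIGAR is not shown in the thread, so `151M` for the mate is an assumption; with it, the span from the read's start (6262) to the mate's end (6268 + 151 - 1 = 6418) is 157 bases, matching TLEN. The helper names are hypothetical, and aligners may compute TLEN slightly differently in edge cases such as clipped reads.

```python
import re

# FLAG bits as listed in the SAM specification.
FLAG_BITS = {
    0x1: "paired (template has multiple segments)",
    0x2: "proper pair (each segment properly aligned)",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse complemented",
    0x20: "mate reverse complemented",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """List the meaning of every bit set in a FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


def reference_length(cigar):
    """Number of reference bases covered by a CIGAR string."""
    # M, D, N, = and X consume the reference; I, S, H and P do not.
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "MDN=X")


def template_span(pos, cigar, mate_pos, mate_cigar):
    """Leftmost start to rightmost end of a read pair on the same reference."""
    start = min(pos, mate_pos)
    end = max(pos + reference_length(cigar) - 1,
              mate_pos + reference_length(mate_cigar) - 1)
    return end - start + 1


if __name__ == "__main__":
    print(decode_flag(99))
    # The mate's CIGAR "151M" is an assumption; it is not shown in the thread.
    print(template_span(6262, "151M", 6268, "151M"))
```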