  • Aligned to UCSC genome using Bowtie2, how do I interpret QNAME in SAM file?

    I get identifiers like M00830:112:000000000-A6EGB:1:1101:13084:1783.

  • #2
    There's nothing really to interpret. That's just the name the sequencer gave to the read (it has nothing to do with alignment). If you really must know, usually the first part is the machine ID and the last part denotes which lane was used and where on the flow-cell the read was seen. That's the case for data from Illumina machines at least.

    Late addition: BTW, the FASTQ page on Wikipedia describes the Illumina read name formats.
    Last edited by dpryan; 12-11-2013, 08:10 AM. Reason: Mention and link to the wikipedia page
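As a rough illustration of the point above, the read name from the question can be split on colons. This is a minimal sketch that assumes the seven-field layout described in the "Illumina sequence identifiers" section of the Wikipedia FASTQ page (instrument, run number, flowcell ID, lane, tile, x, y); the class and function names are made up for this example, and older or differently formatted read names will not fit it.

```python
from typing import NamedTuple


class IlluminaReadName(NamedTuple):
    """Fields of a seven-part Illumina read name (assumed layout)."""
    instrument: str   # machine ID
    run_number: int   # run count on that instrument
    flowcell_id: str  # flowcell identifier
    lane: int         # flowcell lane
    tile: int         # tile within the lane
    x: int            # x coordinate of the cluster on the tile
    y: int            # y coordinate of the cluster on the tile


def parse_illumina_qname(qname: str) -> IlluminaReadName:
    """Split a colon-separated read name into its seven fields."""
    parts = qname.split(":")
    if len(parts) != 7:
        raise ValueError(
            f"expected 7 colon-separated fields, got {len(parts)}: {qname!r}"
        )
    instrument, run, flowcell, lane, tile, x, y = parts
    return IlluminaReadName(
        instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y)
    )


# The identifier quoted in the original question.
read = parse_illumina_qname("M00830:112:000000000-A6EGB:1:1101:13084:1783")
print(read)
```

None of these fields says anything about where the read aligned; that information is in the other SAM columns.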



    • #3
      Here are a couple of pages with SAM format details.




      What you have posted above looks like an Illumina sequence ID.

      Edit: Did not see Devon's message when I posted this. See the "Illumina sequence identifiers" section for details: http://en.wikipedia.org/wiki/FASTQ_format
      Last edited by GenoMax; 12-11-2013, 08:07 AM.



      • #4
        So then why aren't there any gene-specific identifiers with this genome alignment, when all the other SAM examples I see output from Bowtie 2 have these identifiers?



        • #5
          You need to post the whole line for us to know what you're talking about, not just the QNAME field.



          • #6
            Originally posted by dpryan
            You need to post the whole line for us to know what you're talking about, not just the QNAME field.
            Here are my first 3 reads.

            M00830:112:000000000-A6EGB:1:1101:16729:1705 16 chr2 156079200 22 21S29M * 0 0 AGACGTGTGCTCTTCCGATCTACACAGGGCTTGAGCAGTTGCGAACACGT B/B/B0B1A0000000AA212D110BAA1113BBA1FA11>1DFC1A1>1 AS:i:53 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:2T26 YT:Z:UU

            M00830:112:000000000-A6EGB:1:1101:18463:1733 0 chr17 39846570 36 35M15S * 0 0 TGCGTGCATTTATCAGATCAAAACCAACCCGGTGAAATCGGAAGCGCCCA AAAAA1>1BFFBEG331BB1111A00000000A001AAB///////A/A/ AS:i:70 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:35 YT:Z:UU

            M00830:112:000000000-A6EGB:1:1101:16633:1749 4 * 0 0 * * 0 0 CGTGCATTCATCAGATCAAAACCGACCCGGTGAGATCGGAAGAGCACACT >AAAA1BDFBFFBBBGC11111A00A0A00A0/01DB/////00B1B0A0 YT:Z:UU



            • #7
              Right, so the first two reads map and the third doesn't. The original read in question isn't included among those you listed.
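To show how that conclusion falls out of the records, here is a small sketch that reads the FLAG column (the second field) of the three reads posted above. It relies only on the SAM specification's flag bits 0x4 (read unmapped) and 0x10 (read aligned to the reverse strand). The function name is made up for illustration, and only the first six columns of each record are reproduced to keep the example short.

```python
def describe_sam_line(line: str) -> dict:
    """Summarise mapping status from the first six SAM columns."""
    # Real SAM files are tab-separated; split() also handles the
    # space-separated form in which the records were pasted.
    qname, flag, rname, pos, mapq, cigar = line.split()[:6]
    flag = int(flag)
    unmapped = bool(flag & 0x4)  # bit 0x4: read is unmapped
    return {
        "qname": qname,
        "mapped": not unmapped,
        # bit 0x10: read aligned to the reverse strand
        "strand": None if unmapped else ("-" if flag & 0x10 else "+"),
        "chrom": None if unmapped else rname,
        "pos": None if unmapped else int(pos),
        "mapq": int(mapq),
        "cigar": cigar,
    }


# First six columns of the three records posted in the thread.
records = [
    "M00830:112:000000000-A6EGB:1:1101:16729:1705 16 chr2 156079200 22 21S29M",
    "M00830:112:000000000-A6EGB:1:1101:18463:1733 0 chr17 39846570 36 35M15S",
    "M00830:112:000000000-A6EGB:1:1101:16633:1749 4 * 0 0 *",
]

results = [describe_sam_line(r) for r in records]
for r in results:
    print(r["qname"], "mapped" if r["mapped"] else "unmapped",
          r["chrom"], r["pos"], r["strand"])
```

FLAG 16 is a reverse-strand alignment, FLAG 0 is a forward-strand alignment, and FLAG 4 marks the third read as unmapped, which is why its RNAME and CIGAR are "*".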



              • #8
                Originally posted by dpryan
                Right, so the first two reads map and the third doesn't. The original read in question isn't included among those you listed.
                Right, so how would I go about determining which genes the first two reads correspond to? In theory, these should all be exonic.



                • #9
                  Why don't you tell us what your biological goal is? I'm guessing that this is RNAseq data and you eventually want counts per gene for downstream statistics. In that case, just use htseq-count or featureCounts (from Subread). Actually, htseq-count will even annotate the reads for you if you really want (normally you'd just do that to debug a problem).
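For intuition only, the basic idea behind assigning a read to a gene is an interval overlap between the aligned span of the read and the annotated features. The sketch below is not how htseq-count or featureCounts is implemented (both also deal with strandedness, ambiguous overlaps and more). The gene names and coordinates in `GENES` are entirely hypothetical placeholders; in practice they would come from a GTF/GFF annotation for the same genome build.

```python
import re
from typing import Dict, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def reference_span(pos: int, cigar: str) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) covered on the reference."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                 if op in REF_CONSUMING)
    return pos, pos + length - 1


# HYPOTHETICAL annotation: names and coordinates are invented for this
# example and do not correspond to real genes.
GENES: Dict[str, Tuple[str, int, int]] = {
    "GENE_A": ("chr2", 156079000, 156080000),
    "GENE_B": ("chr17", 39840000, 39846000),
}


def overlapping_genes(chrom: str, pos: int, cigar: str,
                      genes: Dict[str, Tuple[str, int, int]] = GENES) -> List[str]:
    """List the genes whose interval overlaps the read's aligned span."""
    start, end = reference_span(pos, cigar)
    return [name for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start]


# The two mapped reads from the thread (RNAME, POS, CIGAR).
print(overlapping_genes("chr2", 156079200, "21S29M"))
print(overlapping_genes("chr17", 39846570, "35M15S"))
```

Soft-clipped bases (the `S` operations) do not consume reference positions, so `21S29M` starting at 156079200 covers only 29 reference bases.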



                  • #10
                    Just to add to Devon's post: featureCounts can also output detailed assignment results for each read when the -R option is specified, although this read-level output includes only the read names (the other fields in the SAM/BAM files are omitted).



                    • #11
                      Originally posted by shi
                      Just add to Devon's post: featureCounts can also output detailed assignment results for each read when -R option is specified, although it only includes read names in this read-level output (other fields in SAM/BAM files are omitted).
                      One of these days I really should fully familiarize myself with featureCounts.
