  • human chromosome index

    Hi,

    I have alignment output that gives, for each read, a human chromosome index and a location. I'm trying to convert these records to the corresponding gene IDs.

    The problem is that the chromosome index in the output ranges from 1 to 25, but the reference table I downloaded from the Ensembl website
    lists:

    "7" "17" "9" "6" "20" "5"
    "14" "3" "2" "4" "22" "16"
    "15" "18" "1" "12" "Y" "X"
    "19" "11" "8" "10" "c6_QBL" "NT_113958"
    "NT_113871" "13" "NT_113935" "21" "NT_113930" "NT_113888"
    "NT_113924" "c6_COX" "NT_113932" "NT_113898" "NT_113954" "MT"
    "NT_113926" "NT_113933" "NT_113880" "NT_113886" "NT_113925" "NT_113936"
    "NT_113951" "NT_113965" "NT_113944" "NT_113923" "NT_113931" "NT_113870"
    "NT_113899" "NT_113901" "NT_113956" "NT_113934" "NT_113915" "NT_113964"
    "c5_H2" "NT_113946" "NT_113957" "NT_113916" "NT_113929" "NT_113874"
    "NT_113890" "NT_113949" "NT_113884" "NT_113878" "NT_113917" "NT_113906"
    "NT_113960" "NT_113911" "NT_113963" "NT_113872" "NT_113881" "NT_113912"
    "NT_113910" "NT_113903" "NT_113953" "NT_113937" "NT_113889" "NT_113909"
    "NT_113927" "NT_113902" "NT_113885" "NT_113961" "NT_113962" "NT_113908"
    "NT_113943" "NT_113966" "NT_113939"

    Does anyone know which index should match which name? Thanks

  • #2
    The 1-22, X, Y and MT entries are straightforward: our favorite chromosomes that we learned in school.

    The other stuff is one of two things:

    The NT_ entries are chunks of genome that the jigsaw puzzle folks piecing together the genome reference can't quite place. Typically the best they can do is know that a chunk is part of a specific chromosome, but where on the chromosome ... they don't know. Type the NT_ strings into Entrez at NCBI to find out more. A typical name for an NT_ contig is "Homo sapiens chromosome 4 unlocalized genomic contig, GRCh37.p2 reference primary assembly".

    The c* entries are alternative segments of a chromosome, i.e. replacement parts of the jigsaw puzzle: both pieces fit a slot, but to be consistent many folks just use the pieces from the original box (chr1-22, X, Y, M). You can typically ignore these.


    • #3
      Thank you very much!

      1:22 - 1:22
      23 - X
      24 - Y
      25 - MT

      Is this what you meant?


      • #4
        Not exactly.
        Note that there is no 23,24,25 in your data.
        There are X, Y and MT entries for the X chromosome, the Y chromosome and the mitochondrial genome. Many folks come up with their own indexing, using 23, 24 and 25 for chrX, chrY and chrM, in other situations. There is no standard.
        Last edited by Richard Finney; 02-15-2011, 03:37 PM.
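
Since there is no standard, any translation from a numeric index to a chromosome name has to state its convention explicitly. A minimal sketch, assuming the 23 = X, 24 = Y, 25 = MT convention proposed in #3 (the function name `index_to_name` and the `extras` parameter are hypothetical, and the order of the extras should be confirmed against the aligner's documentation):

```python
def index_to_name(index, extras=("X", "Y", "MT")):
    """Translate a 1-based chromosome index into a chromosome name.

    Indices 1-22 map to the autosomes; indices 23 and up map to `extras`
    in order. The default order is an assumed convention, not a standard.
    """
    if 1 <= index <= 22:
        return str(index)
    offset = index - 23
    if 0 <= offset < len(extras):
        return extras[offset]
    raise ValueError(f"chromosome index out of range: {index}")


print(index_to_name(7))
print(index_to_name(23))
print(index_to_name(25))
```

Keeping the extras as a parameter makes it easy to try a different order (for example `("MT", "X", "Y")`) if the first guess produces implausible results.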


        • #5
          Thanks again. It's really helpful.


          • #6
            You don't say where you got the output that contains the indices, but you should probably look at its documentation to see how to do the translation.

            In at least one aligner we tried, we found that the output used these kinds of indices but that the chromosomes were arranged alphabetically, i.e.:

            1
            10
            11
            12
            ....
            2
            3
            4
            ..
            MT
            X
            Y

            You'll probably find out pretty quickly if you're getting it wrong as you'll start to see locations off the end of the chromosome, but I'd certainly be happier if I had a definitive answer rather than guessing.
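
Both points can be sketched in Python: building the alphabetical ordering, and counting locations that fall off the end of a chromosome under each candidate mapping. Everything here is illustrative. The helper `off_end_count`, the 1-based indexing, and especially the chromosome lengths are assumptions; the lengths are toy placeholders, not real values, and should be replaced with the lengths of the reference assembly actually used.

```python
names = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]

# Candidate 1: natural order (1..22, X, Y, MT), 1-based index.
numeric_map = {i: name for i, name in enumerate(names, start=1)}

# Candidate 2: plain string sort, which gives 1, 10, 11, ..., 2, 20, ..., MT, X, Y.
alpha_map = {i: name for i, name in enumerate(sorted(names), start=1)}


def off_end_count(reads, index_map, lengths):
    """Count reads whose position lies beyond the end of the mapped chromosome."""
    return sum(1 for index, pos in reads if pos > lengths[index_map[index]])


# Toy placeholder lengths (NOT real chromosome sizes).
lengths = {name: 1000 for name in names}
lengths["10"] = 500

# Toy reads as (chromosome index, position) pairs.
reads = [(2, 800), (1, 100)]

for label, index_map in (("numeric", numeric_map), ("alphabetical", alpha_map)):
    print(label, off_end_count(reads, index_map, lengths))
```

A mapping that yields many off-the-end reads on real data is probably wrong; a count of zero is consistent with the mapping but does not prove it, so the aligner's documentation is still the better source.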


            • #7
              My output was generated by the AB WT Pipeline, in .max format.

              I tried 1:22 - 1:22, 23 - X, 24 - Y, 25 - MT and got 755 reads annotated with gene IDs from "X", 57 from "Y", and 27 from "MT". Does this result look suspicious?

              Appreciate!
