  • extracting predicted gene from scaffold: end position precedes start position

    I am trying to extract sequences for a list of predicted genes from genomic scaffolds. The list of predicted genes, with scaffold IDs, start and end positions, and other info, comes from published supplementary data. My script to extract the sequences doesn't work because for some genes the start position is a larger number than the end position (fourth-to-last and third-to-last columns below). Here is an example (numbers have been changed from the original):
    geneID Gene_family Class ScaffoldID start_position end_position Number_of_exons Annotation_status
    CSP1 cs Protein candidate gi|294506227|gb|GL650210.1| 61498 52100 2 intact
    CSP10 cs Protein candidate gi|294507212|gb|GL649715.1| 293074 297989 2 intact
    CSP2 cs Protein candidate gi|294507210|gb|GL650017.1| 234944 236074 2 intact
    CSP3 cs Protein candidate gi|294507295|gb|GL649612.1| 323100 323743 2 intact
    CSP4 cs Protein candidate gi|294506227|gb|GL650210.1| 41911 40888 2 intact
    CSP5 cs Protein candidate gi|294507205|gb|GL649712.1| 274408 272617 2 intact
    I am new to working with annotated genomes. Does it make sense that some "starts" come after the "ends"? Is this because the ORF for this gene is on the opposite strand of the scaffold? If so, and if I want to obtain that sequence, what's the best way to get it? Should I extract the sequence in the scaffold between the two numbers and then take the reverse complement?

    Thanks for any pointers.
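A minimal Python sketch of the approach described in the question: take the region between the two coordinates and, when start > end, return its reverse complement. It assumes the coordinates are 1-based and inclusive (common for GenBank/GFF-style annotations, but not confirmed for this supplementary table) and that the scaffold sequence is already loaded as a plain string. The function names `extract_gene` and `reverse_complement` are hypothetical, not from any library.

```python
# Complement table for DNA, covering lowercase (soft-masked) bases and N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gene(scaffold_seq: str, start: int, end: int) -> str:
    """Extract the region between start and end from a scaffold sequence.

    Assumes 1-based, inclusive coordinates (an assumption, not verified for
    this dataset). If start > end, the gene is treated as lying on the minus
    strand and the reverse complement of the region is returned.
    """
    low, high = min(start, end), max(start, end)
    region = scaffold_seq[low - 1:high]  # 1-based inclusive -> Python slice
    if start > end:
        return reverse_complement(region)
    return region


# Toy example on a made-up 10 bp "scaffold".
toy = "AACCGGTTAC"
print(extract_gene(toy, 3, 6))  # plus strand: positions 3..6
print(extract_gene(toy, 8, 4))  # minus strand: reverse complement of 4..8
```

Note that these genes are listed with 2 exons, so the span between start and end is the genomic region including the intron; getting the coding sequence itself would need the individual exon coordinates, which this table does not provide.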

  • #2
    Some genes are transcribed from the opposite strand of the DNA, which results in reversed coordinates. You can add an additional column (i.e. strand), putting '+' where start_position < end_position and '-' where start_position > end_position.
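One way to add that strand column is sketched below in Python, assuming the table is whitespace-separated as pasted above. Because the Class value ("Protein candidate") contains a space, the sketch indexes the start and end fields from the end of each row. The helper names are hypothetical, and treating start == end as '+' is an arbitrary choice for that ambiguous case.

```python
from typing import Tuple


def strand_and_bounds(row: str) -> Tuple[int, int, str]:
    """Return (low, high, strand) for one row of the gene table.

    start_position and end_position are the 4th- and 3rd-to-last fields;
    indexing from the end avoids the space inside "Protein candidate".
    """
    fields = row.split()
    start, end = int(fields[-4]), int(fields[-3])
    # '+' when start <= end (equal coordinates treated as '+' by assumption).
    strand = "+" if start <= end else "-"
    return min(start, end), max(start, end), strand


def add_strand_column(row: str) -> str:
    """Append the strand as an extra tab-separated column."""
    _, _, strand = strand_and_bounds(row)
    return f"{row.rstrip()}\t{strand}"


rows = [
    "CSP1 cs Protein candidate gi|294506227|gb|GL650210.1| 61498 52100 2 intact",
    "CSP10 cs Protein candidate gi|294507212|gb|GL649715.1| 293074 297989 2 intact",
]
for row in rows:
    print(add_strand_column(row))
```

Keeping the normalized (low, high) pair alongside the strand means every row can be extracted the same way, with the reverse complement applied only to '-' rows.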
