  • gene annotation in sequenced cancer genome

    Hi all,

    I am working with ChIP-seq data from a small-cell lung cancer sample. The genome sequence of this cancer sample is available in this paper: "A small-cell lung cancer genome with complex signatures of tobacco exposure". We are trying to build a sample-specific reference genome from the whole-genome sequence provided in the paper and map our ChIP-seq data back to that reference.
    My problem is that gene annotation files such as GTF files will differ a lot between this sample-specific reference and the common reference genome (hg19), because of the somatic variation (insertions, deletions, rearrangements) in this cancer genome. The transcript coordinates will shift substantially. Is there software or a program that solves this problem?
    Ideally, given the hg19 GTF file and the information for all somatic variations as listed in this link, it would output a gene annotation file for the sample-specific cancer reference genome.
    If not, could someone give me a clue about a clever way to do this?

    Thanks a lot!

  • #2
    If this is human (I'm assuming it's not one of these fellows: http://www.holytaco.com/25-smoking-monkeys/ ), you can get the sequences for the human genes from GenBank/NCBI/Entrez and BLAT them against your custom(?) genome. This is fairly easy to set up if you know how to script and process a big data set. It is still a big job in terms of the horsepower needed, but it is easily parallelizable if you have access to many computers. You may need to fine-tune the BLAT alignment parameters and filter the results for good hits. The results will give you the coordinates for the genes.
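
As a rough illustration of the "filter the results for good hits" step, here is a minimal Python sketch. It assumes the BLAT output is in the standard tab-separated PSL format (21 columns) and that each query is named after its gene. The `Hit` type, the `best_hits` function, and the 0.9 coverage cutoff are hypothetical names and values chosen for this example, not part of BLAT; a real run would probably need a tuned cutoff and some handling of ties and paralogs.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    gene: str        # query name (PSL column qName)
    chrom: str       # target sequence name (PSL column tName)
    start: int       # 1-based inclusive start on the target, as used in GTF
    end: int         # 1-based inclusive end on the target
    strand: str
    matches: int     # matches + repMatches
    coverage: float  # matched bases divided by query length


def best_hits(psl_lines: Iterable[str], min_coverage: float = 0.9) -> Dict[str, Hit]:
    """Keep, for each query, the hit with the most matching bases.

    Hits covering less than `min_coverage` of the query are dropped.
    The threshold is an assumption for illustration only.
    """
    best: Dict[str, Hit] = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional PSL header and anything that is not a data row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0]) + int(fields[2])
        q_size = int(fields[10])
        coverage = matches / q_size if q_size else 0.0
        if coverage < min_coverage:
            continue
        # PSL coordinates are 0-based, half-open; convert the start to 1-based.
        hit = Hit(
            gene=fields[9],
            chrom=fields[13],
            start=int(fields[15]) + 1,
            end=int(fields[16]),
            strand=fields[8],
            matches=matches,
            coverage=coverage,
        )
        current = best.get(hit.gene)
        if current is None or hit.matches > current.matches:
            best[hit.gene] = hit
    return best
```

You would call it with an open PSL file, e.g. `best_hits(open("genes_vs_custom.psl"))` (the file name is made up). This only gives one overall span per gene; for exon-level coordinates you would also need to read the block columns (blockSizes, tStarts) of each PSL row.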
