  • Go from list of genes to all exon coordinates?

    Hey all,

    I want to use eArray to create a custom capture set of baits for a few hundred genes. I'm ignorant in non-wetlab stuff, and looking at the website it appears that I cannot just upload a list of genes; rather I have to upload a list of the exon coordinates within the genes that I would like to design baits for. What would be the easiest way for me to go from a list of genes to a list of these exon coordinates? Thanks a lot for any help.

  • #2
    If I remember correctly, you can use accession numbers instead of gene names, separated by a |.
    Getting exon positions from a list of gene names is possible with, for example, Ensembl BioMart.


    • #3
      Originally posted by doc.ramses
      If I remember correctly, you can use accession numbers instead of gene names, separated by a |.
      Getting exon positions from a list of gene names is possible with, for example, Ensembl BioMart.
      Getting accession numbers wouldn't be too bad, but would that select just the exons rather than the entire gene? I have a hard time believing there is no fairly straightforward way to do this. Thanks for the tip on Ensembl; I will look at that.


      • #4
        Originally posted by Heisman
        Getting accession numbers wouldn't be too bad, but would that select just the exons rather than the entire gene?
        If you use the "exon finder" it will do exactly this. My advice is to ask an Agilent representative to do the design for you, as eArray is indeed not very handy.


        • #5
          Originally posted by doc.ramses
          If you use the "exon finder" it will do exactly this. My advice is to ask an Agilent representative to do the design for you, as eArray is indeed not very handy.
          Ok, I think I have it figured out, but I'll definitely email them and see if they are willing to design it, as that would obviously be the easiest (we will be placing a big order, so hopefully they'll be more amenable). Thanks a lot!


          • #6
            They will definitely do it. They will also take a more detailed look at GC content, etc. And if you're placing a big order, let them do the job to earn the money.


            • #7
              Here is a general procedure you can follow if you want to try it yourself.

              1. http://genome.ucsc.edu/cgi-bin/hgTables
              2. group - "Genes and Gene Prediction Tracks", track - "UCSC Genes", table - knownGene
                 (or use the refGene table if you prefer RefSeq genes)
              3. paste in your list of gene identifiers
              4. output as a bed file
              5. restrict to just coding exons
              6. save the file

              7. use bedtools to merge overlapping regions, pad as you see fit, etc.
              8. load the track back into the UCSC Genome Browser to spot-check the regions
              9. convert into a format eArray likes
              IIRC - chr1:100-1000
              conversion program (BED starts are 0-based, hence the +1):
              Code:
              awk '{print $1":"($2+1)"-"$3}' myRegions.bed > myRegions.txt
              10. upload to Agilent
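
The conversion in step 9 can also be wrapped in a small shell function that skips any `track`/`browser` header lines UCSC may prepend to BED output. This is only a sketch: `bed_to_regions` is a hypothetical helper name, and the `chr:start-end` format with a 1-based start is assumed from the recollection above rather than checked against eArray.

```shell
# bed_to_regions: hypothetical helper. Reads BED on stdin, writes chr:start-end lines.
# BED starts are 0-based and ends are exclusive, so start+1 gives a 1-based inclusive range.
bed_to_regions() {
    awk '
        /^(track|browser|#)/ { next }                    # skip header and comment lines
        NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }  # ignore blank or short lines
    '
}

# Usage (file names taken from the post above):
# bed_to_regions < myRegions.bed > myRegions.txt
```

A BED line `chr1 99 1000` comes out as `chr1:100-1000`, matching the format recalled in step 9.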


              • #8
                adamdeluca, thank you for your post. I'm with you on steps 1-6. I've never used bedtools, but I could probably figure it out if necessary. Why would one expect overlapping regions? Also, for loading it back into UCSC to spot-check it, where exactly would I load it, and what would I be checking for? Thanks a lot!


                • #9
                  Originally posted by Heisman
                  adamdeluca, thank you for your post. I'm with you on steps 1-6. I've never used bedtools, but I could probably figure it out if necessary. Why would one expect overlapping regions? Also, for loading it back into UCSC to spot-check it, where exactly would I load it, and what would I be checking for? Thanks a lot!
                  An exon is listed once for every splice form (transcript) of the gene that contains it, so the same region shows up several times. It has to do with the way UCSC stores the data: one record per transcript.

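                  To run the bedtools merge (current bedtools versions expect the input sorted by chromosome, then start):
                  Code:
                  sort -k1,1 -k2,2n in.bed > in.sorted.bed
                  mergeBed -i in.sorted.bed -d 60 > out.bed
                  This will combine any features that are <=60 bp apart into a single feature.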
                  You can also use slopBed to make the baits overlap a bit into the introns if that is desirable.
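
To make the merge and padding steps concrete, here is a rough sketch in plain sort/awk of what they do to a BED stream (only the first three columns are kept). `merge_within` and `pad_regions` are hypothetical helper names, not bedtools commands. They only mimic the effect of `mergeBed -d N` and of symmetric padding with slopBed, and the padding sketch does not clip at chromosome ends (slopBed uses a genome file for that). For a real design, bedtools itself is the safer choice.

```shell
# merge_within N: hypothetical stand-in for the effect of `mergeBed -d N`.
# Sorts by chromosome and start, then joins regions whose gap is <= N bp.
merge_within() {
    sort -k1,1 -k2,2n |
    awk -v d="$1" 'BEGIN { OFS = "\t" }
        # Start a new region on the first line, a new chromosome, or a gap larger than d.
        NR == 1 || $1 != chrom || $2 - end > d {
            if (NR > 1) print chrom, start, end
            chrom = $1; start = $2 + 0; end = $3 + 0
            next
        }
        # Otherwise extend the current region if this one reaches further.
        $3 + 0 > end { end = $3 + 0 }
        END { if (NR > 0) print chrom, start, end }'
}

# pad_regions N: hypothetical stand-in for symmetric padding by N bp.
# ASSUMPTION: no clipping at chromosome ends; starts are only floored at 0.
pad_regions() {
    awk -v n="$1" 'BEGIN { OFS = "\t" }
        { s = $2 - n; if (s < 0) s = 0; print $1, s, $3 + n }'
}

# Usage (20 bp padding is an illustrative value, not a recommendation):
# pad_regions 20 < in.bed | merge_within 60 > out.bed
```

Padding before merging means any regions pushed into overlap by the padding are collapsed in the same pass.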

                  To perform the sanity check, add a custom track. From the main page, under the "genomes" tab, click the "add custom tracks" button. Look at a few of the exons you intend to target and make sure the design region looks the way you expect. Also make sure that all of the genes you really care about are included; they sometimes get missed due to difficulties parsing gene names.
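
The "are all my genes included" check can be partly automated. The sketch below rests on assumptions that may not hold for your data: that column 4 of the BED file starts with the identifier you submitted, followed by `_cds_...` or `_exon_...`, and that the identifiers you pasted are of the same kind as those in column 4 (for example RefSeq accessions with the refGene table). Gene symbols would need a mapping step that is not shown. `missing_ids` is a hypothetical helper name.

```shell
# missing_ids IDS BED: hypothetical helper.
# IDS = file with one submitted identifier per line, BED = file saved from the Table Browser.
# Prints every submitted identifier that has no region in the BED file.
missing_ids() {
    awk '
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }   # first file: submitted ids
        /^(track|browser|#)/ { next }                      # skip BED header lines
        NF >= 4 {
            name = $4
            # ASSUMPTION: names look like ID_cds_... or ID_exon_...
            sub(/_(cds|exon)_.*/, "", name)
            delete wanted[name]
        }
        END { for (id in wanted) print id }
    ' "$1" "$2" | sort
}

# Usage (myGenes.txt is a placeholder name for your identifier list):
# missing_ids myGenes.txt myRegions.bed
```

Anything this prints is worth looking up by hand in the browser before uploading the design.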


                  • #10
                    Ok, excellent. Thanks a bunch!


                    • #11
                      You can also use Galaxy to do step 7. There should be a "send results to galaxy" checkbox in the UCSC interface. Working with command-line tools is more powerful, though.
