  • rkusko
    Junior Member
    • Jul 2010
    • 4

    Gene level ensembl bed file?

    Hey all,
    I'm trying to run Scripture's score task and look at human gene level expression. Does anyone know where I can find a bed6 format file with Ensembl Gene annotation (ENSG) instead of Ensembl transcript annotation (ENST)?
    I realize I could sum the counts from the transcript-level output to get gene-level output, but then I lose the FWER p-values that Scripture outputs.
  • malachig
    Senior Member
    • Aug 2010
    • 117

    #2
    Do you mean that you wish to merge overlapping exons for Ensembl genes that have multiple isoforms? So that you have a single BED line for each gene?

    • rkusko
      Junior Member
      • Jul 2010
      • 4

      #3
      Yes, I am looking for a file (or working on creating a file) where each Ensembl gene is one line of a bed6 format file.

      I'm not so concerned with isoforms. Most Ensembl genes have multiple Ensembl transcripts that fall within the location of the gene. The only Ensembl human genome bed6 files I have found contain Ensembl transcript (ENST) IDs rather than Ensembl gene (ENSG) IDs. I do have a bed12 file at the gene level instead of the transcript level, but it is taking me some time to write a script to convert it. That is why I am looking for a human ENSG bed6 file.

      Does that answer your question?
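For the bed12-to-bed6 step, a minimal sketch in Python: the first six BED12 fields (chrom, start, end, name, score, strand) are already the BED6 fields, so truncating each line may be all that is needed. This assumes the gene-level bed12 file carries the ENSG ID in its name column, which the thread does not confirm. The function name `bed12_to_bed6` and the sample record are made up for illustration.

```python
def bed12_to_bed6(lines):
    """Keep only the first six BED fields of each record.

    The exon block columns (7-12) are dropped, so each output line
    spans the whole feature from its start to its end.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the header lines some BED files carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 fields, got {len(fields)}: {line!r}")
        out.append("\t".join(fields[:6]))
    return out


# Made-up BED12 record, for illustration only.
example = ["chr21\t100\t500\tGENE_A\t0\t+\t100\t500\t0\t2\t50,100\t0,300\n"]
print("\n".join(bed12_to_bed6(example)))
```

If the bed12 file has one line per transcript rather than per gene, this alone will not collapse them; it only changes the column layout.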

      • BetterPrimate
        Member
        • May 2010
        • 15

        #4
        Use the UCSC Table Browser. Instead of selecting output to BED, choose "selected fields from table" and pick the 6 fields you need. I'm not sure the fields will come out in the order you want. If not, that's easy to fix if you do it through Galaxy.
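If the selected fields come out in the wrong order, reordering them outside Galaxy is also straightforward. The sketch below is in Python and assumes a tab-delimited export with a header row; the column names (`chrom`, `txStart`, `txEnd`, `name2`, `strand`) are assumptions about what the export contains, so adjust them to match the actual header. The score column is filled with a placeholder 0.

```python
import csv
import io


def reorder_to_bed6(table_text, chrom="chrom", start="txStart", end="txEnd",
                    name="name2", strand="strand"):
    """Reorder a headed, tab-delimited table into BED6 column order.

    The column names are assumptions; pass different ones if the
    export uses other headers.
    """
    # A leading '#' on the header line is stripped before parsing.
    reader = csv.DictReader(io.StringIO(table_text.lstrip("#")), delimiter="\t")
    rows = []
    for rec in reader:
        # BED6 order: chrom, start, end, name, score, strand.
        rows.append([rec[chrom], rec[start], rec[end], rec[name], "0", rec[strand]])
    return rows


# Made-up export with the columns in a non-BED order.
table = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\tname2\n"
    "TRANSCRIPT_1\tchr21\t+\t100\t200\tGENE_A\n"
)
for row in reorder_to_bed6(table):
    print("\t".join(row))
```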

        • malachig
          Senior Member
          • Aug 2010
          • 117

          #5
          This was my initial thought as well. However, when I tried it in Galaxy, the import just sat there forever with the message 'waiting to run'. Perhaps this was just a temporary problem and Galaxy will do the trick...

          When I tried it directly in the UCSC table browser it gave me a BED file with one line per transcript, even though I selected the Gene table and specified that the BED be created with 'one line per whole gene'.

          Of course, the info you would need to create a gene-level BED file is in this file. I was also able to get the necessary info from Ensembl BioMart, but again I didn't see an obvious way to output directly to BED6 format.

          Since rkusko already has the required info in BED12 format, neither the UCSC nor the Ensembl option seems more convenient.

          Where did the BED12 version come from? Can you post it, or a sample of it, in case someone has a ready-made converter to try?
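Since the transcript-level output already carries the needed information, one way to get a line per gene is to group the transcript rows by gene ID and take the smallest start and largest end. This is a rough Python sketch, not a tested converter: it assumes each row has already been parsed into (gene_id, chrom, strand, start, end), and the function name and sample IDs are invented for illustration.

```python
def collapse_to_genes(rows):
    """Collapse transcript rows to one BED6 tuple per gene.

    rows: iterable of (gene_id, chrom, strand, start, end), one per transcript.
    Each gene spans from its smallest start to its largest end.
    """
    spans = {}
    for gene_id, chrom, strand, start, end in rows:
        key = (gene_id, chrom, strand)
        if key in spans:
            lo, hi = spans[key]
            spans[key] = (min(lo, start), max(hi, end))
        else:
            spans[key] = (start, end)
    # BED6 order: chrom, start, end, name, score (placeholder 0), strand.
    bed = [(chrom, lo, hi, gene_id, 0, strand)
           for (gene_id, chrom, strand), (lo, hi) in spans.items()]
    return sorted(bed, key=lambda r: (r[0], r[1], r[2]))


# Made-up transcript rows: two for GENE_A, one for GENE_B.
rows = [
    ("GENE_A", "chr21", "+", 100, 300),
    ("GENE_A", "chr21", "+", 150, 500),
    ("GENE_B", "chr21", "-", 400, 900),
]
for rec in collapse_to_genes(rows):
    print("\t".join(str(x) for x in rec))
```

Grouping on the gene ID rather than on overlap means two different genes never get merged, even if they overlap on the same strand.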

          • malachig
            Senior Member
            • Aug 2010
            • 117

            #6
            Update:

            My Galaxy task did finally complete. I used Galaxy to import the Ensembl gene annotations from UCSC and output them as BED (which can be done entirely at UCSC just as easily). Unfortunately, this produces one line per transcript, not one line per gene. In retrospect, this is not surprising given that UCSC's concept of a 'gene' is basically a transcript.

            Anyway, why not start with the transcript-level file and use BEDTools to merge overlapping features on the same strand? You should be able to do this using the 'mergeBed' function with the '-s' option to force strandedness.

            One potential problem I see with this approach is that in rare cases there may be multiple genes, on the same strand with some overlap... So you might accidentally merge these into a single gene... mergeBed allows you to report the names of the things that were merged so you could use this option and then explicitly look for cases where transcripts from different genes were merged.
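To make the check for accidental cross-gene merges concrete, here is a small Python sketch of the same idea: merge overlapping (or directly adjacent) intervals on the same chromosome and strand, keep the names of everything merged, then flag merged intervals whose transcripts map to more than one gene. This is a stand-in for the mergeBed step, not its actual implementation, and the transcript-to-gene dictionary is a hypothetical input you would have to build yourself.

```python
def merge_same_strand(records):
    """Merge overlapping or adjacent intervals per chromosome and strand.

    records: iterable of (chrom, start, end, name, strand).
    Returns [chrom, start, end, [names...], strand] lists.
    """
    merged = []
    ordered = sorted(records, key=lambda r: (r[0], r[4], r[1], r[2]))
    for chrom, start, end, name, strand in ordered:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and last[4] == strand and start <= last[2]:
            # Extend the current interval and remember what was merged.
            last[2] = max(last[2], end)
            last[3].append(name)
        else:
            merged.append([chrom, start, end, [name], strand])
    return merged


def mixed_gene_merges(merged, transcript_to_gene):
    """Return merged intervals whose transcripts belong to more than one gene."""
    return [m for m in merged
            if len({transcript_to_gene[n] for n in m[3]}) > 1]


# Made-up transcripts; T3 belongs to a different gene but overlaps T2.
records = [
    ("chr1", 100, 200, "T1", "+"),
    ("chr1", 150, 300, "T2", "+"),
    ("chr1", 250, 400, "T3", "+"),
    ("chr1", 120, 180, "T4", "-"),
]
t2g = {"T1": "G1", "T2": "G1", "T3": "G2", "T4": "G3"}
for m in mixed_gene_merges(merge_same_strand(records), t2g):
    print(m)
```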

            • adamdeluca
              Member
              • Jul 2010
              • 95

              #7
              Originally posted by malachig
              One potential problem I see with this approach is that in rare cases there may be multiple genes, on the same strand with some overlap... So you might accidentally merge these into a single gene
              A hack, but... BEDTools's mergeBed just treats the chromosome as a string. Concatenate the ENSG id and the chr# and it will merge the way you want.

              Code:
              ENSG00000166157_chr21	9928080	10012775
              ENSG00000166157_chr21	9928080	10012775
              ENSG00000166157_chr21	9928080	9993593
              ENSG00000166157_chr21	9928080	10012775
              ENSG00000166157_chr21	9928611	10012753
              becomes
              Code:
              ENSG00000166157_chr21	9928080	10012775
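The same trick can be sketched in Python to show why it works: once the gene ID is part of the "chromosome" key, intervals from different genes can never merge, because they no longer share a key. The function names below are invented for illustration, and the input rows are the coordinates from the example above. Splitting the key back apart assumes the gene ID itself contains no underscore, which holds for ENSG IDs.

```python
def merge_by_gene_key(rows):
    """Merge overlapping intervals that share a gene_chrom key.

    rows: iterable of (gene_id, chrom, start, end).
    Returns [key, start, end] lists, where key is "<gene_id>_<chrom>".
    """
    keyed = sorted((f"{gene}_{chrom}", start, end)
                   for gene, chrom, start, end in rows)
    merged = []
    for key, start, end in keyed:
        if merged and merged[-1][0] == key and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([key, start, end])
    return merged


def split_key(key):
    """Recover (gene_id, chrom) by splitting on the first underscore."""
    gene_id, chrom = key.split("_", 1)
    return gene_id, chrom


rows = [
    ("ENSG00000166157", "chr21", 9928080, 10012775),
    ("ENSG00000166157", "chr21", 9928080, 10012775),
    ("ENSG00000166157", "chr21", 9928080, 9993593),
    ("ENSG00000166157", "chr21", 9928080, 10012775),
    ("ENSG00000166157", "chr21", 9928611, 10012753),
]
for key, start, end in merge_by_gene_key(rows):
    print(key, start, end, sep="\t")
```

Note that a gene whose transcripts do not all overlap would still come out as more than one line with this approach.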

              • biocyberman
                Junior Member
                • Nov 2010
                • 2

                #8
                Originally posted by malachig
                Update:

                My Galaxy task did finally complete. I used Galaxy to import the Ensembl gene annotations from UCSC and output them as BED (which can be done entirely at UCSC just as easily). Unfortunately, this produces one line per transcript, not one line per gene. In retrospect, this is not surprising given that UCSC's concept of a 'gene' is basically a transcript.

                Anyway, why not start with the transcript-level file and use BEDTools to merge overlapping features on the same strand? You should be able to do this using the 'mergeBed' function with the '-s' option to force strandedness.

                One potential problem I see with this approach is that in rare cases there may be multiple genes, on the same strand with some overlap... So you might accidentally merge these into a single gene... mergeBed allows you to report the names of the things that were merged so you could use this option and then explicitly look for cases where transcripts from different genes were merged.
                The Bedtools are interesting. Thanks :-)
