  • Help: combining a GFF of genes of interest with TopHat BED files

    I have an unsolved problem, described below.
    I have a GFF file containing my genes of interest, with their transcripts, exons, introns, etc. For example:
    9 ensembl gene 19154953 19164082 . + . ID=GRMZM2G443985;Name=GRMZM2G443985;biotype=protein_coding
    9 ensembl mRNA 19154953 19160236 . + . ID=GRMZM2G443985_T01;Parent=GRMZM2G443985;Name=GRMZM2G443985_T01;biotype=protein_coding
    9 ensembl intron 19155466 19156260 . + . Parent=GRMZM2G443985_T01;Name=intron.9805
    9 ensembl intron 19156308 19156394 . + . Parent=GRMZM2G443985_T01;Name=intron.9806
    9 ensembl intron 19156435 19156543 . + . Parent=GRMZM2G443985_T01;Name=intron.9807
    9 ensembl intron 19156607 19156733 . + . Parent=GRMZM2G443985_T01;Name=intron.9808
    9 ensembl intron 19156872 19157016 . + . Parent=GRMZM2G443985_T01;Name=intron.9809
    9 ensembl intron 19157176 19157333 . + . Parent=GRMZM2G443985_T01;Name=intron.9810
    9 ensembl intron 19157388 19157527 . + . Parent=GRMZM2G443985_T01;Name=intron.9811
    9 ensembl intron 19157597 19157714 . + . Parent=GRMZM2G443985_T01;Name=intron.9812
    9 ensembl intron 19157845 19158737 . + . Parent=GRMZM2G443985_T01;Name=intron.9813
    9 ensembl intron 19158779 19158877 . + . Parent=GRMZM2G443985_T01;Name=intron.9814
    9 ensembl intron 19159100 19159294 . + . Parent=GRMZM2G443985_T01;Name=intron.9815
    9 ensembl intron 19159341 19159419 . + . Parent=GRMZM2G443985_T01;Name=intron.9816
    9 ensembl intron 19159575 19159711 . + . Parent=GRMZM2G443985_T01;Name=intron.9817
    9 ensembl exon 19154953 19155465 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E41
    9 ensembl exon 19156261 19156307 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E43
    9 ensembl exon 19156395 19156434 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E10
    9 ensembl exon 19156544 19156606 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E37
    9 ensembl exon 19156734 19156871 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E17
    9 ensembl exon 19157017 19157175 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E07
    9 ensembl exon 19157334 19157387 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E29
    9 ensembl exon 19157528 19157596 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E39
    9 ensembl exon 19157715 19157844 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E11
    9 ensembl exon 19158738 19158778 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E23
    9 ensembl exon 19158878 19159099 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E08
    9 ensembl exon 19159295 19159340 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E19
    9 ensembl exon 19159420 19159574 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E20
    9 ensembl exon 19159712 19160236 . + . Parent=GRMZM2G443985_T01;Name=GRMZM2G443985_E42
    9 ensembl CDS 19155187 19155465 . + . Parent=GRMZM2G443985_T01;Name=CDS.9832
    9 ensembl CDS 19156261 19156307 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9833
    9 ensembl CDS 19156395 19156434 . + 2 Parent=GRMZM2G443985_T01;Name=CDS.9834
    9 ensembl CDS 19156544 19156606 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9835
    9 ensembl CDS 19156734 19156871 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9836
    9 ensembl CDS 19157017 19157175 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9837
    9 ensembl CDS 19157334 19157387 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9838
    9 ensembl CDS 19157528 19157596 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9839
    9 ensembl CDS 19157715 19157844 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9840
    9 ensembl CDS 19158738 19158778 . + 1 Parent=GRMZM2G443985_T01;Name=CDS.9841
    9 ensembl CDS 19158878 19159099 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9842
    9 ensembl CDS 19159295 19159340 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9843
    9 ensembl CDS 19159420 19159574 . + 1 Parent=GRMZM2G443985_T01;Name=CDS.9844
    9 ensembl CDS 19159712 19159885 . + 0 Parent=GRMZM2G443985_T01;Name=CDS.9845
    9 ensembl mRNA 19155187 19164082 . + . ID=GRMZM2G443985_T02;Parent=GRMZM2G443985;Name=GRMZM2G443985_T02;biotype=protein_coding
    9 ensembl intron 19155466 19156260 . + . Parent=GRMZM2G443985_T02;Name=intron.9846
    9 ensembl intron 19156308 19156394 . + . Parent=GRMZM2G443985_T02;Name=intron.9847
    9 ensembl intron 19156435 19156543 . + . Parent=GRMZM2G443985_T02;Name=intron.9848
    9 ensembl intron 19156607 19156733 . + . Parent=GRMZM2G443985_T02;Name=intron.9849
    9 ensembl intron 19156872 19157016 . + . Parent=GRMZM2G443985_T02;Name=intron.9850
    9 ensembl intron 19157176 19157333 . + . Parent=GRMZM2G443985_T02;Name=intron.9851
    9 ensembl intron 19157388 19157527 . + . Parent=GRMZM2G443985_T02;Name=intron.9852
    9 ensembl intron 19157597 19157714 . + . Parent=GRMZM2G443985_T02;Name=intron.9853
    9 ensembl intron 19157845 19158737 . + . Parent=GRMZM2G443985_T02;Name=intron.9854
    9 ensembl intron 19158779 19158877 . + . Parent=GRMZM2G443985_T02;Name=intron.9855
    9 ensembl intron 19159100 19159294 . + . Parent=GRMZM2G443985_T02;Name=intron.9856
    9 ensembl intron 19159341 19159419 . + . Parent=GRMZM2G443985_T02;Name=intron.9857
    9 ensembl intron 19159575 19160730 . + . Parent=GRMZM2G443985_T02;Name=intron.9858
    9 ensembl intron 19161065 19161925 . + . Parent=GRMZM2G443985_T02;Name=intron.9859
    9 ensembl intron 19162122 19162246 . + . Parent=GRMZM2G443985_T02;Name=intron.9860
    9 ensembl intron 19162375 19162557 . + . Parent=GRMZM2G443985_T02;Name=intron.9861
    9 ensembl intron 19162761 19162871 . + . Parent=GRMZM2G443985_T02;Name=intron.9862
    9 ensembl intron 19162998 19163104 . + . Parent=GRMZM2G443985_T02;Name=intron.9863
    9 ensembl intron 19163264 19163338 . + . Parent=GRMZM2G443985_T02;Name=intron.9864
    9 ensembl intron 19163533 19163614 . + . Parent=GRMZM2G443985_T02;Name=intron.9865
    9 ensembl exon 19155187 19155465 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E25
    9 ensembl exon 19156261 19156307 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E34
    9 ensembl exon 19156395 19156434 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E21
    9 ensembl exon 19156544 19156606 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E28
    9 ensembl exon 19156734 19156871 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E04
    9 ensembl exon 19157017 19157175 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E33
    9 ensembl exon 19157334 19157387 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E13
    9 ensembl exon 19157528 19157596 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E18
    9 ensembl exon 19157715 19157844 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E44
    9 ensembl exon 19158738 19158778 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E30
    9 ensembl exon 19158878 19159099 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E32
    9 ensembl exon 19159295 19159340 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E02
    9 ensembl exon 19159420 19159574 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E26
    9 ensembl exon 19160731 19161064 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E24
    9 ensembl exon 19161926 19162121 . + . Parent=GRMZM2G443985_T02;Name=GRMZM2G443985_E14
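A first step toward combining the files is reading the GFF into structured records, so that each intron, exon, and CDS can be linked to its transcript through `Parent`, and each transcript to its gene. The following is a minimal Python sketch, not part of any library: `GffFeature`, `parse_gff_line`, and `children_by_parent` are hypothetical names, and it assumes GFF3-style `key=value` attributes separated by `;`, as in the excerpt above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GffFeature:
    """One GFF3 row; start and end are 1-based and inclusive."""
    seqid: str
    source: str
    ftype: str
    start: int
    end: int
    strand: str
    attributes: dict = field(default_factory=dict)


def parse_gff_line(line: str) -> GffFeature:
    """Parse one GFF3 row into a GffFeature.

    Splits on runs of whitespace (the excerpt above is space-separated) but
    stops after 8 splits so the attribute column stays in one piece.
    """
    cols = line.strip().split(None, 8)
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
    seqid, source, ftype, start, end, _score, strand, _phase, attr_text = cols
    attributes = {}
    for pair in attr_text.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return GffFeature(seqid, source, ftype, int(start), int(end), strand, attributes)


def children_by_parent(features):
    """Group features under each Parent ID (e.g. introns and exons under a transcript)."""
    groups = defaultdict(list)
    for feat in features:
        # GFF3 allows several comma-separated parents; an empty string is skipped.
        for parent in feat.attributes.get("Parent", "").split(","):
            if parent:
                groups[parent].append(feat)
    return groups
```

When reading a whole file, blank lines and lines starting with `#` should be skipped before calling `parse_gff_line`.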
    The other files I get from TopHat are junctions.bed, deletions.bed and insertions.bed. For example, from junctions.bed:
    chromosome9 19155376 19156307 JUNC00137067 36 + 19155376 19156307 255,0,0 2 89,47 0,884
    chromosome9 19156260 19156434 JUNC00137068 15 + 19156260 19156434 255,0,0 2 47,40 0,134
    chromosome9 19156394 19156606 JUNC00137069 11 + 19156394 19156606 255,0,0 2 40,63 0,149
    chromosome9 19156543 19156794 JUNC00137070 4 + 19156543 19156794 255,0,0 2 63,61 0,190
    chromosome9 19156786 19157101 JUNC00137071 21 + 19156786 19157101 255,0,0 2 85,85 0,230
    chromosome9 19157087 19157387 JUNC00137072 27 + 19157087 19157387 255,0,0 2 88,54 0,246
    chromosome9 19157333 19157596 JUNC00137073 21 + 19157333 19157596 255,0,0 2 54,69 0,194
    chromosome9 19157527 19157802 JUNC00137074 42 + 19157527 19157802 255,0,0 2 69,88 0,187
    chromosome9 19157814 19158174 JUNC00137075 1 + 19157814 19158174 255,0,0 2 30,60 0,300
    chromosome9 19157765 19158826 JUNC00137076 53 + 19157765 19158826 255,0,0 2 79,89 0,972
    chromosome9 19158723 19158964 JUNC00137077 87 + 19158723 19158964 255,0,0 2 55,87 0,154
    chromosome9 19159013 19159340 JUNC00137078 64 + 19159013 19159340 255,0,0 2 86,46 0,281
    chromosome9 19159294 19159508 JUNC00137079 134 + 19159294 19159508 255,0,0 2 46,89 0,125
    chromosome9 19159486 19159789 JUNC00137080 133 + 19159486 19159789 255,0,0 2 88,78 0,225
    chromosome9 19160992 19162008 JUNC00137081 3 + 19160992 19162008 255,0,0 2 72,83 0,933
    chromosome9 19162074 19162331 JUNC00137082 5 + 19162074 19162331 255,0,0 2 47,85 0,172
    chromosome9 19162293 19162638 JUNC00137083 17 + 19162293 19162638 255,0,0 2 81,81 0,264
    chromosome9 19162675 19162960 JUNC00137084 23 + 19162675 19162960 255,0,0 2 85,89 0,196
    chromosome9 19162920 19163191 JUNC00137085 21 + 19162920 19163191 255,0,0 2 77,87 0,184
    chromosome9 19163178 19163427 JUNC00137086 24 + 19163178 19163427 255,0,0 2 85,89 0,160
    chromosome9 19163446 19163679 JUNC00137087 30 + 19163446 19163679 255,0,0 2 86,65 0,168
    The values I had marked in red (the fifth column, the BED score field) are the numbers of supporting reads.
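TopHat's junctions.bed stores each junction as a two-block BED12 record in 0-based, half-open coordinates, with the read count in the score column. The intron it represents starts right after the first block and ends right before the second, so it can be converted to the GFF's 1-based inclusive coordinates. Worked on the first junction above: 19155376 + 89 + 1 = 19155466 and 19155376 + 884 = 19156260, which is exactly intron.9805 in the GFF. The sketch below assumes the only naming difference between the two files is the `chromosome` prefix (`chromosome9` versus `9`); `Junction` and `parse_junction_line` are hypothetical names.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    seqid: str
    intron_start: int  # 1-based, inclusive (GFF convention)
    intron_end: int    # 1-based, inclusive
    strand: str
    reads: int         # BED score column: reads spanning the junction


def parse_junction_line(line: str, chrom_prefix: str = "chromosome") -> Junction:
    """Convert one two-block junctions.bed record into GFF-style intron coordinates."""
    cols = line.split()
    chrom, chrom_start = cols[0], int(cols[1])
    reads, strand = int(cols[4]), cols[5]
    block_sizes = [int(x) for x in cols[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in cols[11].rstrip(",").split(",")]
    if len(block_sizes) != 2 or len(block_starts) != 2:
        raise ValueError(f"expected a two-block junction record: {line!r}")
    # BED is 0-based half-open: the intron occupies [end of block 1, start of block 2).
    intron_start0 = chrom_start + block_sizes[0]
    intron_end0 = chrom_start + block_starts[1]
    # Assumed naming difference: strip the prefix so "chromosome9" matches GFF "9".
    if chrom_prefix and chrom.startswith(chrom_prefix):
        chrom = chrom[len(chrom_prefix):]
    # 0-based half-open [a, b) is the same span as 1-based inclusive [a + 1, b].
    return Junction(chrom, intron_start0 + 1, intron_end0, strand, reads)
```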
    How can I get read counts for each gene, exon, intron, and transcript by combining these two (or more) files? Any help is appreciated.
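Once both files use the same coordinates, one rough way to attach junction counts to the annotation is to look up each annotated intron among the junctions and sum upward to transcripts and genes. This is a hedged sketch with hypothetical names, and it only measures spliced reads: reads lying entirely within an exon or intron never appear in junctions.bed, so proper exon- and gene-level read counts still need the alignments (accepted_hits.bam) and a counting tool. Junctions that match no annotated intron exactly (for example JUNC00137075 above) are ignored here.

```python
from collections import defaultdict


def junction_counts_per_feature(introns, junctions, transcript_to_gene):
    """Sum junction read counts onto annotated introns, transcripts and genes.

    introns: iterable of (seqid, start, end, strand, intron_name, transcript_id),
             1-based inclusive coordinates as in the GFF.
    junctions: iterable of (seqid, start, end, strand, reads), already converted
               to the same 1-based inclusive intron coordinates.
    transcript_to_gene: dict mapping transcript ID -> gene ID.
    """
    reads_by_coords = defaultdict(int)
    for seqid, start, end, strand, reads in junctions:
        reads_by_coords[(seqid, start, end, strand)] += reads

    per_intron = {}
    per_transcript = defaultdict(int)
    per_gene = defaultdict(int)
    seen_gene_introns = set()
    for seqid, start, end, strand, name, transcript in introns:
        key = (seqid, start, end, strand)
        reads = reads_by_coords.get(key, 0)  # exact coordinate match only
        per_intron[name] = reads
        per_transcript[transcript] += reads
        gene = transcript_to_gene[transcript]
        # An intron shared by several transcripts is added to its gene only once.
        if (gene, key) not in seen_gene_introns:
            seen_gene_introns.add((gene, key))
            per_gene[gene] += reads
    return per_intron, dict(per_transcript), dict(per_gene)
```

Because T01 and T02 share most of their introns, the same junction reads contribute to both transcript totals; those per-transcript sums are therefore not independent and should not be added together.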

  • #2
    Can anyone help?



    • #3
      Use Cufflinks. That will produce most of the count data that you require (at least gene-level and transcript-level counts). For exon-level counts you might need to use DEXSeq, since exon counting is somewhat tricky. I'm not sure about intron counting.
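Part of what makes exon counting tricky is that transcripts of the same gene share or partially overlap exons. In the GFF above, the first exon of GRMZM2G443985_T01 (19154953 to 19155465) and the first exon of GRMZM2G443985_T02 (19155187 to 19155465) overlap, so a read in the shared region cannot be assigned to a single exon. DEXSeq deals with this by flattening the annotation into non-overlapping exonic parts (counting bins). The sketch below is a simplified illustration of that idea for one gene on one strand, not DEXSeq's own implementation; `flatten_exons` is a hypothetical name.

```python
def flatten_exons(exons):
    """Split (possibly overlapping) exons of one gene into disjoint bins.

    exons: iterable of (start, end, transcript_id), 1-based inclusive, all on
           the same sequence and strand.
    Returns a sorted list of (bin_start, bin_end, frozenset_of_transcripts),
    also 1-based inclusive.
    """
    exons = list(exons)
    # Work in half-open coordinates [start, end + 1) so adjacent bins share a boundary.
    boundaries = sorted({s for s, _, _ in exons} | {e + 1 for _, e, _ in exons})
    bins = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Transcripts whose exon covers the whole interval [left, right).
        covering = frozenset(t for s, e, t in exons if s <= left and right <= e + 1)
        if covering:  # an empty set means the interval is intronic for every transcript
            bins.append((left, right - 1, covering))
    return bins
```

Each bin is then covered by a fixed set of transcripts, so reads can be counted per bin without double counting. The real DEXSeq preparation step additionally handles overlapping genes and both strands.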



      • #4
        Thank you. I'll give it a try.

