Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tophat: generate junctions.bed file from BAM file

    Dear all,
    I have previously mapped RNA-seq samples to the human genome using Tophat 2.0.6. This generated an "accepted_hits.bam" result file, and a "junctions.bed" file I am interested in for details on exon-exon junctions mapping.
    To be sure to compare fairly my different samples, I want to downsample the BAM files to have the same number of mapped reads for each sample. This is pretty straighforward using samtools or picard.
    Now I would like to regenerate new "junctions.bed" files from the downsampled BAM files. I couldn't find in the log files which program of the tophat package does that.
    Anyone ever dealt with a similar issue? Any hint?
    Thanks for your help
    Julien

  • #2
    I am in the process of writing a perl script that generates the file junctions.bed from a given tophat alignment BAM file and I discovered a small bug that sometimes affects the definition of the flanking regions covered by mapped reads next to the exon-exon junction. When there is an insertion or a deletion on one side of the junction, the number of flanking covered nucleotides is not counted correctly.
    For example in the following cigar line 14M1I90M664N2M, there are 14+90 nucleotides covered on the left of the junction, but tophat erroneously counts only 14. The same thing happens when there is a deletion.

    I thought this would be of interest for people working on related matters.
    Julien

    Comment


    • #3
      VCF file from lifescope

      hi everyone
      i need help in understanding the VCF file output
      GT: GQ : DP: FDP: AD: AST: AMQV
      0/1: 13: 22: 13: 2,9: 2,6: 31,21
      CHROM POS ID REF ALT QUAL FILTER INFO
      chr1 762273 . G A . PASS DP=22
      Result:
      At this site GT is 0/1 (i guess it is heterozygous) the confidence is not good GQ=13 largely because there are only 22 reads at this site DP= 22, why AD is 2,9 what i understand is it should be equal to DP
      i read somewhere that this Sum can be smaller than DP because low-quality bases are not counted is it correct.
      how will this FDP, AST and AMQV be interpreted?
      Another question is
      how i can correlate the SNPse to specific diseases by comparing allelic frequencies

      Comment


      • #4
        Hi Julien, very important information. Thank you.

        I have the same needness, and find sam_juncs, component of tophat, can output the junction position information without depth and overhang.

        The needness of a script to get junctions.bed from bam file produced by tophat is, from my point:
        I want to subset the bam file by several different criteria and check the influence on the alternative splicing detection. This script will make the alignment unnecessary.

        Thank you.
        Last edited by pengchy; 07-09-2013, 06:58 PM. Reason: add information

        Comment


        • #5
          There is a bug for sam_juncs script, which is 1 smaller than the output of bed_to_juncs script.
          here is the example:
          output of sam_juncs
          Code:
          Scaffold184317  456     11046   +
          Scaffold184317  11239   21151   +
          Scaffold184317  21385   29565   +
          Scaffold184317  34619   60733   +
          Scaffold184317  61209   68778   +
          Scaffold184317  69066   75556   +
          Scaffold184317  75631   84372   +
          Scaffold184317  75631   102072  +
          Scaffold184317  104376  112269  -
          Scaffold184317  112465  115516  -
          Scaffold184317  115669  115888  -
          Scaffold184317  116034  116975  -
          Scaffold184317  117130  117392  -
          Scaffold184317  117631  123099  -
          Scaffold184317  123260  134557  -
          here is the output of bed_to_juncs
          Code:
          Scaffold184317  457     11046   +
          Scaffold184317  11240   21151   +
          Scaffold184317  21386   29565   +
          Scaffold184317  34620   60733   +
          Scaffold184317  61210   68778   +
          Scaffold184317  69067   75556   +
          Scaffold184317  75632   84372   +
          Scaffold184317  75632   102072  +
          Scaffold184317  104377  112269  -
          Scaffold184317  112466  115516  -
          Scaffold184317  115670  115888  -
          Scaffold184317  116035  116975  -
          Scaffold184317  117131  117392  -
          Scaffold184317  117632  123099  -
          Scaffold184317  123261  134557  -

          Comment


          • #6
            Try regtools - https://github.com/griffithlab/regtools, the `junctions extract` command will do this.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Techniques and Challenges in Conservation Genomics
              by seqadmin



              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

              Avian Conservation
              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
              03-08-2024, 10:41 AM
            • seqadmin
              The Impact of AI in Genomic Medicine
              by seqadmin



              Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
              02-26-2024, 02:07 PM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 03-14-2024, 06:13 AM
            0 responses
            32 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-08-2024, 08:03 AM
            0 responses
            71 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-07-2024, 08:13 AM
            0 responses
            79 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-06-2024, 09:51 AM
            0 responses
            68 views
            0 likes
            Last Post seqadmin  
            Working...
            X