  • Tophat junctions.bed

    Has anyone seen a weird junctions file as output? Although most of my paired-end reads aligned to the genome (both mates), my junctions.bed file doesn't link together many of my exons. In fact, most of the links that were made are not even between exons; rather, they are intergenic and scattered.

    This is not due to loading it onto the wrong assembly. Has anyone else had this problem with paired-end reads?

  • #2
    Also, the coverage.wig looks good.



    • #3
      See the image attached

      Here is a good example of what I'm talking about. For this gene I have a lot of coverage, but only 3 splice junctions are detected. Any ideas?


      • #4
        Originally posted by RockChalkJayhawk:
        Here is a good example of what I'm talking about. For this gene I have a lot of coverage, but only 3 splice junctions are detected. Any ideas?

        I looked at my TopHat results for a 75 nt run (18M paired-end reads, Illumina GA IIx). There are 12 junctions detected in the chr1:879,584-894,679 region; however, only 4 of them have more than 5x coverage. If you have shorter reads and lower coverage than I do, you may see fewer junctions here.
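A minimal Python sketch of the kind of tally described above (junctions inside a browser region, and how many pass a coverage cut-off). It assumes TopHat's junctions.bed layout: BED12 lines, an optional `track` header, and the number of spanning alignments in the score column (the 5th field). The region string is taken in the UCSC browser's 1-based, inclusive convention and converted to BED's 0-based, half-open one. The function names and the `min_score` parameter are illustrative, not part of TopHat.

```python
def parse_region(region):
    """Convert 'chr1:879,584-894,679' (1-based, inclusive) to (chrom, start0, end),
    i.e. BED's 0-based, half-open convention."""
    chrom, span = region.replace(",", "").split(":")
    start, end = span.split("-")
    return chrom, int(start) - 1, int(end)


def junctions_in_region(bed_lines, region, min_score=0):
    """Return names of junctions lying entirely inside `region` whose score
    (number of spanning alignments) is strictly greater than `min_score`."""
    chrom, start0, end = parse_region(region)
    hits = []
    for line in bed_lines:
        if not line.strip() or line.startswith("track"):
            continue  # skip blank lines and the BED track header
        fields = line.split()
        if fields[0] != chrom:
            continue
        j_start, j_end, score = int(fields[1]), int(fields[2]), int(fields[4])
        if j_start >= start0 and j_end <= end and score > min_score:
            hits.append(fields[3])
    return hits
```

Calling `junctions_in_region(lines, "chr1:879,584-894,679")` would list every junction in the region, and adding `min_score=5` keeps only those with more than 5 spanning reads.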

        FYI, I detected 185,238 total junctions.

        Here are the 4 junctions from my 75 nt run:
        chr1 887451 887857 JUNC00121200 40 - 887451 887857 255,0,0 2 68,66 0,340
        chr1 889202 889440 JUNC00121202 37 - 889202 889440 255,0,0 2 70,57 0,181
        chr1 891331 891529 JUNC00121205 17 - 891331 891529 255,0,0 2 62,55 0,143
        chr1 894390 894642 JUNC00121207 7 - 894390 894642 255,0,0 2 71,48 0,204
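In these lines the chromStart/chromEnd columns span the whole junction, including the read overhangs. Per the TopHat documentation, each junction is written as two connected BED blocks, one per side of the intron, so the intron itself runs from the end of the first block to the start of the second. A small sketch of that arithmetic, assuming the standard BED12 column order (blockSizes in the 11th field, blockStarts in the 12th); the function name is hypothetical.

```python
def intron_from_junction(line):
    """Return (chrom, intron_start0, intron_end, strand, score) for one
    junctions.bed line, in 0-based, half-open coordinates.

    Block 1 and block 2 are the overhangs on either side of the intron,
    so the intron starts where block 1 ends and ends where block 2 starts.
    """
    f = line.split()
    chrom, chrom_start, score, strand = f[0], int(f[1]), int(f[4]), f[5]
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    intron_start = chrom_start + starts[0] + sizes[0]
    intron_end = chrom_start + starts[1]
    return chrom, intron_start, intron_end, strand, score


if __name__ == "__main__":
    # The four junctions quoted above.
    junctions = [
        "chr1 887451 887857 JUNC00121200 40 - 887451 887857 255,0,0 2 68,66 0,340",
        "chr1 889202 889440 JUNC00121202 37 - 889202 889440 255,0,0 2 70,57 0,181",
        "chr1 891331 891529 JUNC00121205 17 - 891331 891529 255,0,0 2 62,55 0,143",
        "chr1 894390 894642 JUNC00121207 7 - 894390 894642 255,0,0 2 71,48 0,204",
    ]
    for line in junctions:
        chrom, start, end, strand, score = intron_from_junction(line)
        print(f"{chrom}:{start}-{end} ({strand}) intron {end - start} bp, {score} reads")
```

For the four junctions above this gives introns of 272, 111, 81 and 133 bp.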



        • #5
          Originally posted by RockChalkJayhawk:
          Has anyone seen a weird junctions file as output? Although most of my paired-end reads aligned to the genome (both mates), my junctions.bed file doesn't link together many of my exons. In fact, most of the links that were made are not even between exons; rather, they are intergenic and scattered.

          This is not due to loading it onto the wrong assembly. Has anyone else had this problem with paired-end reads?

          This would depend on the settings you use. SNPs in the right location could cause the junction search to fail.

          What do you mean by "most"? If you mean that over 50% of your genes show that kind of behavior despite good coverage, that is a bit strange. Maybe you should try less strict settings.
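One way to put a number on "most" is to check what fraction of the reported junctions match an annotated intron exactly. A minimal sketch, assuming you have already extracted each transcript's exons as 0-based, half-open (chrom, start, end) tuples, for example from the GTF passed to TopHat with -G (GTF starts are 1-based, so subtract 1). The function names are hypothetical, and exact matching is a deliberately strict choice: a junction shifted by a few bases counts as unannotated.

```python
def introns_from_exons(exons):
    """Given one transcript's exons as (chrom, start0, end) tuples,
    return its introns as a set of (chrom, start0, end) tuples."""
    exons = sorted(exons, key=lambda e: e[1])
    # Each intron runs from the end of one exon to the start of the next.
    return {(a[0], a[2], b[1]) for a, b in zip(exons, exons[1:])}


def annotated_fraction(junction_lines, annotated_introns):
    """Fraction of junctions.bed entries whose intron exactly matches
    an annotated intron (0.0 if there are no junctions)."""
    total = matched = 0
    for line in junction_lines:
        if not line.strip() or line.startswith("track"):
            continue  # skip blank lines and the track header
        f = line.split()
        start = int(f[1])
        sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        starts = [int(x) for x in f[11].rstrip(",").split(",")]
        # Intron = end of the first block to the start of the second block.
        intron = (f[0], start + starts[0] + sizes[0], start + starts[1])
        total += 1
        if intron in annotated_introns:
            matched += 1
    return matched / total if total else 0.0
```

Pooling the introns of all annotated transcripts into one set and passing it with the junction lines gives a single figure; a value well below 0.5 for a well-annotated genome would match the behavior described in the first post.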

          Or you could give SpliceMap a try:
          SpliceMap: De novo detection of splice junctions from RNA-seq
          Download SpliceMap



          • #6
            Originally posted by john_mu:
            This would depend on the settings you use. SNPs in the right location could cause the junction search to fail.

            What do you mean by "most"? If you mean that over 50% of your genes show that kind of behavior despite good coverage, that is a bit strange. Maybe you should try less strict settings.

            Or you could give SpliceMap a try.

            I thought about it, but my reads are 2x36, which is too short for SpliceMap. I'm still playing around with some of the parameters...



            • #7
              Hi lifeng.tian,

              I'm running tophat with the following command:
              Code:
              tophat -r 110 -p 4 --solexa1.3-quals --library-type fr-unstranded -G homo_sapiens.gtf human_reference_genome s1_1.fq s1_2.fq
              Output files generated by TopHat:
              Code:
              junctions.bed
              insertions.bed
              deletions.bed
              logs/
              accepted_hits.bam
              Based on junctions.bed, how did you group the junctions by region on each chromosome?
              These are the numbers of junctions identified by TopHat on each chromosome:
              Code:
              chr1    13677
              chr10   5130
              chr11   7780
              chr12   7558
              chr13   2293
              chr14   4818
              chr15   4368
              chr16   6604
              chr17   8343
              chr18   1778
              chr19   8369
              chr2    9800
              chr20   3278
              chr21   1460
              chr22   3419
              chr3    7619
              chr4    4735
              chr5    5970
              chr6    6789
              chr7    6513
              chr8    4306
              chr9    5461
              chrX    4042
              chrY    312
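Per-chromosome counts like the ones above can be produced by tallying the first column of junctions.bed. A minimal sketch using Python's `collections.Counter`, assuming the file may start with a `track` header line that should not be counted; the function name is hypothetical.

```python
from collections import Counter


def junctions_per_chromosome(bed_lines):
    """Count junctions.bed entries per chromosome, skipping blank lines
    and the BED track header."""
    counts = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith("track"):
            continue
        counts[line.split()[0]] += 1  # first column is the chromosome
    return counts
```

For example, `with open("tophat_out/junctions.bed") as fh: counts = junctions_per_chromosome(fh)`, then printing `sorted(counts.items())` lists chromosomes in the same lexicographic order as the table above (chr1, chr10, chr11, ..., chrX, chrY). Summing the table above gives 134,422 junctions in total.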
              How important is it to determine the coverage of each junction at this stage?
              Does the number in column 4 represent the coverage of each junction?
              Thanks for any advice on interpreting the junctions.bed file in tophat_out.
              Many thanks in advance.



              • #8
                Originally posted by lifeng.tian:
                I looked at my TopHat results for a 75 nt run (18M paired-end reads, Illumina GA IIx). There are 12 junctions detected in the chr1:879,584-894,679 region; however, only 4 of them have more than 5x coverage. If you have shorter reads and lower coverage than I do, you may see fewer junctions here.

                FYI, I detected 185,238 total junctions.

                Here are the 4 junctions from my 75 nt run:
                chr1 887451 887857 JUNC00121200 40 - 887451 887857 255,0,0 2 68,66 0,340
                chr1 889202 889440 JUNC00121202 37 - 889202 889440 255,0,0 2 70,57 0,181
                chr1 891331 891529 JUNC00121205 17 - 891331 891529 255,0,0 2 62,55 0,143
                chr1 894390 894642 JUNC00121207 7 - 894390 894642 255,0,0 2 71,48 0,204

                For this file (in response to #4), how did you calculate the total number of junctions? Is it the total number of entries in your junctions.bed file, i.e. the number of lines in the file?
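Since each junction occupies one line (row) of junctions.bed, one way to get a total is to count the data lines. A minimal sketch, assuming the file may begin with a `track` header line (TopHat usually writes one), which a plain `wc -l` would also count; the function name is hypothetical, and this is not necessarily how the 185,238 figure above was obtained.

```python
def count_junctions(bed_lines):
    """Number of junction entries: non-blank lines that are not the track header."""
    return sum(1 for line in bed_lines if line.strip() and not line.startswith("track"))
```

Usage: `with open("tophat_out/junctions.bed") as fh: print(count_junctions(fh))`.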
