  • tophat does not find any junction

    Dear all,

    I am very new to the NGS world and my question might seem stupid.

    To practice with RNA-seq data, I downloaded the datasets from the paper by Marioni et al. (2008) (see http://genome.cshlp.org/content/18/9/1509.full), in which they compare the respective merits of Affymetrix microarrays and RNA sequencing.

    I decided to run tophat on this dataset to do the mapping.

    This dataset is composed of 2 sets of short reads, 36 nucleotides long (2 lanes for the sequencing of the kidney transcriptome and 2 lanes for the liver).

    I used these commands for the kidney and the liver, respectively:

    Code:
    tophat --segment-length 12 -p 11 -o kidney_out $BOWTIE_INDEXES/complete_hg19_reference /data/datasylvain/marioni/kidney_2runs_1.fastq,/data/datasylvain/marioni/kidney_2runs_2.fastq
    
    tophat -p 11 --segment-length 12 -o liver_out $BOWTIE_INDEXES/complete_hg19_reference /data/datasylvain/marioni/liver_2runs_1.fastq,/data/datasylvain/marioni/liver_2runs_2.fastq

    The problem is that no junctions are ever found, and I don't understand why.
    For example, this is the content of the file log/reports.log:

    Code:
    tophat_reports v1.3.2 (2689)
    ---------------------------------------
    [samopen] SAM header is present: 93 sequences.
    Loaded 0 junctions
    Reporting final accepted alignments...done.
    Printing junction BED track...done
    Printing insertions...done
    Printing deletions...done
    Found 0 junctions from happy spliced reads

    Any idea? Thanks a lot to all of you!

    Sylvain
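
The "Found 0 junctions" line in the log can be cross-checked against the junction track that tophat writes to its output directory. A minimal sketch, assuming the output directory contains a `junctions.bed` file whose header line starts with `track`; the helper name `count_junctions` is hypothetical and not part of tophat:

```shell
# Hypothetical helper (not part of tophat): count junction records in a
# junctions.bed file, skipping the "track" header line and blank lines.
count_junctions() {
    awk '!/^track/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

With the commands above, it could be called as `count_junctions kidney_out/junctions.bed`; a result of 0 would agree with the log.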

  • #2
    Hi - I had much the same problem: tophat 1.3.0 could not find any junctions, but the older version (1.2.0) could. Have you tried 1.2.0?

    • #3
      No, I haven't... Thanks for the hint! I'll give it a try.
      Last edited by sbrohee; 09-30-2011, 07:20 AM. Reason: mispelling

      • #4
        It would be great if you could let us know if it worked. I am planning to make the switch from tophat 1.2.0 to tophat 1.3.1 (1.3.2 is still in beta) and therefore am curious to know.

        Thank you.

        • #5
          It's the first thing I'll try when I arrive at work tomorrow morning! Of course, I'll let you know whether it works.

          Thanks to you all for your help.

          Cheers

          • #6
            Unfortunately... the algorithm does not find any junction with tophat 1.2 either!

            Does anyone have another idea?

            Also, could any of you recommend a good dataset, downloadable from a public repository, that you know works? That way I would have some kind of positive control!

            Many many thanks!

            • #7
              1) How many reads are there in each of the fastq files, each for kidney and liver?
              2) Are they preprocessed (clipped for adapters + removal of barcodes if any and optionally trimmed for quality)?

              3) Also, try adding these parameters and running again.
              -F 0 (the isoform fraction defaults to 0.15 in tophat 1.3.2 and 1.2.0; I use -F 0).
              --library-type fr-unstranded (if it's Illumina; check the other options if necessary)
              --solexa-quals (or) --solexa1.3-quals (if Illumina, depending on which one was used). I hope you can get these from the literature.

              In short, set -F to 0 and both other parameters according to your data. Here I assume it's Illumina. Let us know if it works. Good luck!
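
The read counts and the quality encoding asked about in this post can be checked directly from the FASTQ files. A minimal sketch, assuming plain 4-line FASTQ records; the function names `count_fastq_reads` and `quality_ascii_range` are hypothetical helpers, not part of tophat:

```shell
# Hypothetical helpers (not part of tophat) for inspecting a FASTQ file.

# Count reads, assuming the common 4-lines-per-record FASTQ layout.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print the lowest and highest ASCII codes found on the quality lines
# (every 4th line) of the first N records (default 10000).
quality_ascii_range() {
    local fastq=$1 records=${2:-10000}
    head -n $((records * 4)) "$fastq" |
        awk 'NR % 4 == 0' |
        LC_ALL=C od -An -v -tu1 |
        awk '{ for (i = 1; i <= NF; i++) {
                 c = $i + 0
                 if (c == 10) continue          # skip newline bytes
                 if (!seen || c < min) min = c
                 if (!seen || c > max) max = c
                 seen = 1 } }
             END { if (seen) print min, max }'
}
```

For example, `quality_ascii_range /data/datasylvain/marioni/liver_2runs_1.fastq` prints the minimum and maximum codes seen. A minimum below 59 suggests Phred+33 (Sanger) qualities, in which case neither --solexa-quals nor --solexa1.3-quals should be needed; a minimum of 59 or more is consistent with the older +64 Solexa/Illumina encodings.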

              • #8
                1) Number of reads per file
                a) kidney_2runs_1.fastq : 39,266,713
                b) kidney_2runs_2.fastq : 27,137,793
                c) liver_2runs_1.fastq : 54,856,271
                d) liver_2runs_2.fastq : 14,761,931

                2) Neither the paper nor the SRA entry (SRA000299) mentions any filtering.

                3) I re-ran the analysis using the parameters you advised me to use ... more news in the coming hours / minutes.

                Code:
                tophat --solexa-quals -F 0 --segment-length 12 -p 22 -o /data/datasylvain/marioni/liver_out_3 $BOWTIE_INDEXES/complete_hg19_reference /data/datasylvain/marioni/liver_2runs_1.fastq,/data/datasylvain/marioni/liver_2runs_2.fastq

                Again, thank you for your help!

                • #9
                  ... still not working ...

                  If anyone has the slightest idea ...

                  Thanks to all of you for the help you gave me!

                  • #10
                    After some head scratching and coffee: try removing --segment-length 12, use tophat 1.2.0, and see if it works.

                    Arun.

                    • #11
                      Thanks for the hint. I'll test that tomorrow morning and let you know!

                      • #12
                        Well...

                        Apparently, cedance's solution does work ... and I have a nice BAM output file with the options he proposed. Thanks a lot for your help!

                        • #13
                          Great! I wanted to localize the problem, which is why I asked you to remove the --segment-length parameter. I am glad it works. You can probably still play with the --segment-length and --segment-mismatches parameters (setting the latter to 0 or 1, as they explain on their website) and see if it works. It seems a bit tricky. If it does not work, I guess the developers should be informed.

                          How many reads are uniquely mapped, and how good do the results look?
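
The parameter exploration suggested in this post could be organized as a small sweep. A minimal dry-run sketch that only prints the tophat commands rather than running them; the helper name `print_tophat_sweep`, the candidate segment lengths (12 and 18) and the output-directory naming are illustrative assumptions, while `-p 11` is taken from the commands earlier in the thread:

```shell
# Hypothetical dry-run helper (not part of tophat): prints one tophat
# command per combination of --segment-length and --segment-mismatches.
print_tophat_sweep() {
    local index=$1 reads=$2 out_prefix=$3
    local seg mm
    for seg in 12 18; do          # illustrative values for 36-nt reads
        for mm in 0 1; do         # the 0 or 1 range mentioned above
            echo "tophat -p 11 --segment-length $seg --segment-mismatches $mm" \
                 "-o ${out_prefix}_seg${seg}_mm${mm} $index $reads"
        done
    done
}
```

Calling it with the index, the comma-separated FASTQ list and an output prefix such as `liver_out` prints four commands that can be reviewed before running, each writing to its own directory so the resulting junction counts can be compared.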
