Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tophat/cufflinks workflow question

    If I have 6 samples, call them 1,2,3,4,5,6 (they could be 6 lanes or a 6 way multiplexed experiement). Then I want to compare 1,2,3 vs 4,5,6.

    So if I map the reads using tophat, I assume I can run all the 1,2,3,4,5,6 in parallel and then cat and sort the relavant sam files together before I run cufflinks.

    So

    cat tophat1 tophat2 tophat3 | sort > 123.sam
    cat tophat4 tophat5 tophat6 | sort > 456.sam

    Then if I run cufflinks on 123.sam and 456.sam I get 123.transcripts.gtf and 456.transcripts.gtf.

    So I was wondering what is the next step? Can I do

    cuffcompare 123.transcripts.gtf -r geneannotations.gtf
    cuffcompare 456.transcripts.gtf -r geneannotations.gtf

    (geneannotations.gtf is an Ensembl annotations file)

    so I get

    123.transcripts.tmap and 456.transcripts.tmap

    then I just compare the two data sets RPKM values using Ensembl ID as the in the same way you would use probe_id in a traditional array experiement?

    Do I need to do any further normalization?

    Am I missing something!?

    Thanks in advance!

  • #2

    Then if I run cufflinks on 123.sam and 456.sam I get 123.transcripts.gtf and 456.transcripts.gtf.

    So I was wondering what is the next step? Can I do

    cuffcompare 123.transcripts.gtf -r geneannotations.gtf
    cuffcompare 456.transcripts.gtf -r geneannotations.gtf

    (geneannotations.gtf is an Ensembl annotations file)

    You probably want:
    cuffcompare -r geneannotations.gtf 123.transcripts.gtf

    Looking then at the out.tracking file will then tell you how the Ensembl transcripts (along with whatever new transcripts you find) compare to each other in the two samples.


    Do I need to do any further normalization?


    That's a tougher question. RPKM is a measure of relative gene expression designed to allow comparisons between experiments, just as you are asking to do. There are a fair number of people working on ways of correcting and normalizing for various biases in RNA-Seq, but there are no widely accepted means of doing so that I am aware of.

    Comment


    • #3
      Originally posted by Cole Trapnell View Post
      You probably want:
      cuffcompare -r geneannotations.gtf 123.transcripts.gtf

      Looking then at the out.tracking file will then tell you how the Ensembl transcripts (along with whatever new transcripts you find) compare to each other in the two samples.
      Do you mean I need to do

      cuffcompare -r geneannotations.gtf 123.transcripts.gtf 456.transcripts.gtf

      otherwise how will the tracking file know about the two samples?

      That's a tougher question. RPKM is a measure of relative gene expression designed to allow comparisons between experiments, just as you are asking to do. There are a fair number of people working on ways of correcting and normalizing for various biases in RNA-Seq, but there are no widely accepted means of doing so that I am aware of.
      Ok. Thanks for your help on this.

      Comment


      • #4
        Do you mean I need to do

        cuffcompare -r geneannotations.gtf 123.transcripts.gtf 456.transcripts.gtf

        otherwise how will the tracking file know about the two samples?
        Yes, sorry. That's what I get for responding before coffee.

        Comment


        • #5
          :-) No problem.

          Thanks for your help.

          Comment


          • #6
            I did

            cuffcompare -r geneannotations.gtf 123.transcripts.gtf 456.transcripts.gtf

            and got:

            - - q1:CUFF.49|CUFF.49.0|100|320.741503|0.000000|0.000000|uniq -
            - - q1:CUFF.922|CUFF.922.0|100|2080.321131|0.211081|0.210872|uniq -
            - - q1:CUFF.2093|CUFF.2093.0|100|320.213468|0.000000|0.000000|uniq -
            - - q1:CUFF.923|CUFF.923.1|100|2498.467902|0.159820|0.159979|uniq -
            - - q1:CUFF.2964|CUFF.2964.0|100|136.004645|0.000000|0.000000|uniq -
            - - q1:CUFF.3939|CUFF.3939.0|100|2146.036538|0.000000|0.000000|uniq -
            - - q1:CUFF.4125|CUFF.4125.0|100|612.020902|0.000000|0.000000|uniq -
            - - q1:CUFF.4410|CUFF.4410.0|100|4534.343539|0.000000|0.000000|uniq -
            - - q1:CUFF.5384|CUFF.5384.0|100|1515.019712|0.000000|0.000000|uniq -
            - - q1:CUFF.5667|CUFF.5667.0|100|1515.969648|0.000000|0.000000|uniq -
            - - q1:CUFF.5952|CUFF.5952.0|100|5568.492065|0.000000|0.000000|uniq -
            - - q1:CUFF.6333|CUFF.6333.0|100|2299.714905|0.000000|0.000000|uniq -
            - - - q2:CUFF.611|CUFF.611.0|100|2209.360572|0.000000|0.000000|uniq
            - - q1:CUFF.7536|CUFF.7536.0|100|702.690665|0.000000|0.000000|uniq -
            - - q1:CUFF.7629|CUFF.7629.0|100|3756.201011|0.000000|0.000000|uniq -
            - - q1:CUFF.9212|CUFF.9212.0|100|417.816071|0.000000|0.000000|uniq -
            - - q1:CUFF.9554|CUFF.9554.0|100|9625.347396|0.000000|0.000000|uniq -
            - - q1:CUFF.9797|CUFF.9797.0|100|2347.966478|0.000000|0.000000|uniq -
            - - q1:CUFF.9965|CUFF.9965.0|100|702.690665|0.000000|0.000000|uniq -
            - - q1:CUFF.8952|CUFF.8952.0|100|719.829462|0.000000|0.000000|uniq -
            - - q1:CUFF.10691|CUFF.10691.0|100|1806.918854|0.000000|0.000000|uniq -
            - - q1:CUFF.10799|CUFF.10799.1|100|3619.236510|0.284840|0.284870|uniq -
            - - q1:CUFF.10839|CUFF.10839.0|100|2153.582326|0.000000|0.000000|uniq -
            - - q1:CUFF.11538|CUFF.11538.0|100|7683.301808|0.000000|0.000000|uniq -
            - - q1:CUFF.10782|CUFF.10782.0|100|89.705191|0.000000|0.000000|uniq -
            - - q1:CUFF.10800|CUFF.10800.0|100|3902.596250|0.174811|0.174792|uniq -
            - - q1:CUFF.10825|CUFF.10825.0|100|6510.429805|0.000000|0.000000|uniq -
            - - q1:CUFF.12645|CUFF.12645.0|100|58.557555|0.000000|0.000000|uniq -
            - - q1:CUFF.13320|CUFF.13320.0|100|1702.673535|0.000000|0.000000|uniq -
            - - q1:CUFF.12088|CUFF.12088.0|100|6322.264673|0.000000|0.000000|uniq -
            - - q1:CUFF.13147|CUFF.13147.0|100|8946.451886|0.000000|0.000000|uniq -
            - - q1:CUFF.14256|CUFF.14256.0|100|782.998170|0.000000|0.000000|uniq -
            - - q1:CUFF.14910|CUFF.14910.0|100|1953.822826|0.000000|0.000000|uniq -
            - - - q2:CUFF.1367|CUFF.1367.0|100|2051.549103|0.000000|0.000000|uniq
            - - q1:CUFF.15282|CUFF.15282.0|100|1686.457597|0.000000|0.000000|uniq -
            - - q1:CUFF.15535|CUFF.15535.0|100|2326.148410|0.000000|0.000000|uniq -
            - - q1:CUFF.16741|CUFF.16741.0|100|1297.275075|0.000000|0.000000|uniq -
            - - q1:CUFF.17421|CUFF.17421.0|100|3055.953562|0.000000|0.000000|uniq -
            - - q1:CUFF.17710|CUFF.17710.0|100|2845.146363|0.000000|0.000000|uniq -
            - - q1:CUFF.19743|CUFF.19743.0|100|133.631524|0.000000|0.000000|uniq -
            - - q1:CUFF.19375|CUFF.19375.0|100|2369.916793|0.000000|0.000000|uniq -
            - - q1:CUFF.19582|CUFF.19582.0|100|4355.505260|0.000000|0.000000|uniq -
            - - q1:CUFF.19789|CUFF.19789.0|100|5320.372181|0.000000|0.000000|uniq -
            - - q1:CUFF.22524|CUFF.22524.0|100|647.153215|0.000000|0.000000|uniq -
            - - q1:CUFF.22483|CUFF.22483.0|100|21957.950620|0.000000|0.000000|uniq -
            - - q1:CUFF.22641|CUFF.22641.0|100|13234.040114|0.000000|0.000000|uniq -

            Does this mean there are no overlaps for annotated genes? This sample was mutiplexed on 1 lane so there aren't as many reads as you would normally expect. Would this have something to do with it?

            Comment


            • #7
              In cufflinks document, cufflinks only accepted input SAM format by “sort –k 3,3 –k 4,4n *.sam”. but when I use the *sam format generated by BWA, and type sort command, the format change from

              NCI-GA2:1:1:4:358#0 147 chr1 53372939 60 35M = 53372816 -158 ATGGGCTGGAT
              GATCCCTGTTCAGGCCTAATCCGC A>BAA>=BBBBBB@>?BBBA@8>BB>BBABB>BCB XT:A:U NM:i:0 SM:i:37 AM:i:23 X0:i:1 X1:
              i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:35
              NCI-GA2:1:1:4:1683#0 99 chrUn_gl000220 159630 0 35M = 159744 149 CTAGGGCGCGGGCCCGGGT
              GGAGCCGCCGCAGGTG BB@A=?B<B?:>BBBAA<9BABABBBBB<BA=78A XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:
              i:0 XO:i:0 XG:i:0 MD:Z:35

              to

              NCI-GA2:1:100:1000:1645#0 133 * 0 0 * * 0 0 TCCTCCTTTTTCACTTGAT
              CCCACCGATGTCTGCC BCBCCCBBCCCBCB@BACABBCCCA>BAAABABBB
              NCI-GA2:1:100:1000:1645#0 69 * 0 0 * * 0 0 CAAGTCTGCATGGCTGTTG
              ACATAGGCAGACATCG BCCB=ABBABB@@AA@;>;A75/86=376;:;/9:
              NCI-GA2:1:100:1000:391#0 133 * 0 0 * * 0 0 TACCGCGGCTGCTGGCACC
              AGACTTGCCCAGATCG BAAAAABBAB>5>7998;<1=9:A@@>A@=>;=7=


              Some part of the line are missing, so cufflinks would not recognize it. How can I change the sort parameter to get correct format for cufflinks?

              Comment


              • #8
                Try filtering for hits only (simple way is grep for chr if your genome uses this in the fasta header line) and then sort.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM
                • seqadmin
                  Techniques and Challenges in Conservation Genomics
                  by seqadmin



                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                  Avian Conservation
                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                  03-08-2024, 10:41 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, Yesterday, 06:37 PM
                0 responses
                7 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, Yesterday, 06:07 PM
                0 responses
                7 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-22-2024, 10:03 AM
                0 responses
                49 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-21-2024, 07:32 AM
                0 responses
                66 views
                0 likes
                Last Post seqadmin  
                Working...
                X