Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Another check:

    My hg18.fa constructed from hg18 UCSC from TopHat website has the size of 3131776827 bytes, same as yours?

    Comment


    • #17
      Originally posted by jiwu2573 View Post
      I only know my fastq file for 1 sample is around 3 GB after converting and joining all the 120 qseq.txt files, not sure how to find out how many reads in total? How do you know?
      I guess there are more 10 million reads. You can count the number of lines of FASEQ file, and the number of reads equals to 1/4 of the number of lines.

      The read length is 76 bp. I am running tophat with the default configuration without any argument except --solexa1.3-quals. I guess you are designating the number of mismatches, number of multi-aligned loci by the argument. If that's the case, what number do you use?
      I usually discard all the multi-reads, by specifying "-g 1". I think it is normal to take several hours to finish your job.

      PS. I am running TopHat through univ connection to the Linux server. Is it supposed to be faster than running on my local computer? How many processors do you have in your computer? Is a normal PC enough?
      I think so. The computational server will be more efficient than PC.
      Xi Wang

      Comment


      • #18
        Originally posted by jiwu2573 View Post
        Another question:

        Is there any need to run Bowtie alone as TopHat will call Bowtie anyway?
        You can just run Tophat.
        Xi Wang

        Comment


        • #19
          Originally posted by jiwu2573 View Post
          Another check:

          My hg18.fa constructed from hg18 UCSC from TopHat website has the size of 3131776827 bytes, same as yours?
          It should be ok. The size of mine is 3169831337. How about anybody else?
          Xi Wang

          Comment


          • #20
            Hi Xi,

            Thanks a lot for your help.

            Last night I managed to run 1 sample by tophat successfully (it took 17 hours).

            I tried to visualize the output, coverage.wig and junctions.bed in UCSC genome browser.

            When I load coverage.wig, it shows Error File 'coverage.wig' - Error line 3771557 of custom track: chromEnd less than 1 (0)

            When I load junctions.bed, it only shows chromosome 20?
            Name Description Type Doc Items Pos
            junctions TopHat junctions bed 80207 chr20:

            By the way, my junctions file is 6430K and my coverage file is around 173M, are these normal?

            How do you do your visualization?

            Do you use Cufflinks to quantify the expression after TopHat?

            Comment


            • #21
              1. for 'coverage.wig', you can use a text editor to see what happened in line 3771557.
              2. for "junctions.bed", you can switch the chr by typing other chromosome ID in the blank following "position/search" . E.g: chr1
              3. the sizes of the file look fine.

              You can alternatively use some local viewers to visualize the results. But UCSC genome browser is recommended. And Cufflink is useful for identifying transcripts with expression.
              Xi Wang

              Comment


              • #22
                Hi Xi,

                After I run Cufflinks and Cuffcompare, any output files can be used directly to feed into DEGseq to get differential expression analysis? I have 2 groups with each group 2 replicates.

                If conversion of files are needed, could you please tell me the easiest and fastest way to get DEGseq running for my analysis done by TopHat and then Cufflinks and Cuffcompare?

                Comment


                • #23
                  Another question is about identification of novel transcripts.

                  I understand that junctions.bed gives out all the possible splices, but how can we dig out the novel form from all these data? Certainly not by eyeballing through UCSC genome browser?

                  Comment


                  • #24
                    I also get huge size of tophat_out folder (~22 GB for 1 human RNA seq).

                    Do you recommend to keep all these folders untouched or just keep sam, wig, and bed files instead? The inside folder log and tmp not useful at all?

                    Comment


                    • #25
                      1. DEGseq: you can use my perl script for the conversion from SAM to BED (attached). I didn't test the script much, if any problem, let me know.
                      2. New transcript/junction: there should be a tool for doing such thing. Anyone knows such tools?
                      3. Tophat_out folder: the newest version of tophat will delete the tmp files automaticly, and I don't think the tmp files much help.
                      Attached Files
                      Xi Wang

                      Comment


                      • #26
                        Thanks!

                        I've converted all my samples. No error message!

                        I know there are paper and manual explaining how to use DEGSeq, I am just a bit lazy, well, it does take long for me to digest everything.

                        Could you please just tell me the command line I put in for my 4 bed files (2 groups with 2 samples in each group) for DE gene analysis?

                        Comment


                        • #27
                          I put an example below. Before you run DEGseq, you should first download a gene annotation file in refFlat format, maybe from UCSC genome browser. Some parameters for DEGseq should also be tuned according to your analysis purpose.

                          Code:
                            mapResultBatch1 <- c("group1_file1", "group1_file2")  
                            mapResultBatch2 <- c("group2_file2", "group2_file2")
                            outputDir <- "DEGseq_out_dir"
                            refFlat <- "gene annotation in refFlat format"
                            DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", refFlat=refFlat,
                                   outputDir=outputDir, method="MARS")
                          Last edited by Xi Wang; 01-21-2010, 08:24 PM.
                          Xi Wang

                          Comment


                          • #28
                            Hi Xi,

                            I've run the DEGseq, it works!

                            But I doubt actually I should use samWrapper instead?
                            2 groups are case and control, 2 samples in each group are from different individuals.
                            DEGseq seems to deal with the same sample with technology replicates, but not biology replicates.

                            Is my understanding correct? If use samWrapper, could you please tell me the command line as I can hardly copy your examples in the manual? Thanks

                            Comment


                            • #29
                              Hi,

                              Yes, for biological replicates, it is better to use samWrapper. But I am afraid that your sample size (2 vs 2) is a little bit small.
                              A piece of code example for samWrapper:
                              Code:
                              # get gene express
                                mapResultBatch <- c("group1_file1", "group1_file2", "group2_file2", "group2_file2")  
                                output <- "gene.exp"
                                refFlat <- "gene annotation in refFlat format"
                                exp <- getGeneExp(mapResultBatch, refFlat = refFlat, output = output)
                              
                              # apply samWrapper
                              geneExpFile <- "gene.exp"
                              set.seed(100)
                              geneExpFile1 <- geneExpFile
                              geneExpFile2 <- geneExpFile
                              output <- "samWrapperOut.txt"
                              expCol1 = c(2, 4)
                              expCol2 = c(6, 8)
                              measure1 = c(-1, -2)
                              measure2 = c(1, 2)
                              samWrapper(geneExpFile1 = geneExpFile1, geneCol1 = 1, expCol1 = expCol1, measure1 = measure1, 
                              geneExpFile2 = geneExpFile2, geneCol2 = 1, expCol2 = expCol2, measure2 = measure2, 
                              nperms = 100, min.foldchange = 2, max.qValue = 1e-04, output = output, paired = TRUE)
                              Xi Wang

                              Comment


                              • #30
                                Hi Xi,

                                The samWrapper also works well. But I am a bit confused about the column designation.

                                When I looked at the output "gene.exp", 1st col gives the gene ID, 2nd col for raw counts, 3rd col for RPKM, and 4th for all reads, then the pattern keeps on repeating for each sample. In this order, expCol1 should be col 2 and 5 (raw counts columns), but how comes 2 and 4 as you wrote?

                                Also I don't understand measure1 or measure2. What's the use of them? By the way, my samples are unpaired.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Recent Advances in Sequencing Analysis Tools
                                  by seqadmin


                                  The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                                  Today, 07:48 AM
                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin




                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                  04-22-2024, 07:01 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Today, 07:17 AM
                                0 responses
                                11 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 05-02-2024, 08:06 AM
                                0 responses
                                19 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-30-2024, 12:17 PM
                                0 responses
                                20 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-29-2024, 10:49 AM
                                0 responses
                                28 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X