Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 SummarizedExperiment help

    Hi!

    I feel totally stupid, but I have 48 BAMs after alignment with Tophat2 and want to try DESeq2 for diff.exp. analysis.

    I dont understand what "SummarizedExperiment" is and how to create it? I have googled to death, but now I have to ask here.

    Thank you so much, this is a new field to me.

  • #2
    Have you used htseq-count yet to get counts?

    Comment


    • #3
      Hi!
      Do I have to? Im stuck here:



      Im trying the "2 A First Example", but I dont understand what "features <- GRanges" is (seqnames/IRanges).

      Im also not quite sure how to type in "group_id": I have 2 different groups at two timepoints (as you already know).

      Thank you very much!
      Last edited by sindrle; 10-16-2013, 11:30 AM.

      Comment


      • #4
        Well, you don't have to do anything you don't want to do :P However, it's generally a bit easier to just use a quick command line script than to try to deal with GenomicRanges. The later can do it, but has a much steeper learning curve and will just produce the same results (unless you're trying to do something unusual).

        Regarding your GRanges question, seqnames would be things like "chr1" (UCSC chromosome naming scheme) or "10" (Ensembl naming scheme). IRanges are genomic bounds, so just the 5' and 3' boundaries of an exon/CDS/whatever. In general, you can get that stuff all made for you by reading in a GTF or GFF file (I've used import.gff from rtracklayer). The "group_id" in the example is more of a place-holder for gene_id later on. In essence, it's probably meant to be the actual feature level that's counted over (though the don't do that in the example), so you don't actually need it that example. In a real case, you'd read in your annotation file into a GRanges object and then
        Code:
         split(GRangesObject, mcols(GRangesObject)$label_of_interest)
        to get a GRangesList that you would then summarizeOverlaps on. Here, label_of_interest would probably be the gene_id field of the annotation file (though it could be something else, such as sprintf("%s:E%03d", mcols(blah)$gene_id, mcols(blah)$exon), which is what you'd do for DEXSeq counts).

        Comment


        • #5
          Thank you so much!

          I installed HTseq and have started HTseq-count on all 48 BAMs.

          From one BAM, is this a problem?:

          no_feature 7984183
          ambiguous 521840
          too_low_aQual 0
          not_aligned 0
          alignment_not_unique 13730878


          And also, can you compress the HTseq-count output? Its 16gb for each file..
          Last edited by sindrle; 10-17-2013, 05:46 AM.

          Comment


          • #6
            Ah, you're saving the wrong file, don't use -o. The command to execute is something like:
            Code:
            samtools view namesorted.bam | htseq-count -m intersection-nonempty -s no -a 10 - GTF_FILE > namesorted.counts
            That file will be much much much smaller and is all that you need.

            Comment


            • #7
              I should add, "namesorted.bam" and "GTF_FILE" above are just place-holder names. You'll want to change "namesorted.counts" to something meaningful so you can keep track of which counts are associated with which sample. Finally, those options for htseq-count are just examples, you can change them to fit your needs.

              Comment


              • #8
                Great!
                I aborted the run and updated the CMD as you wrote it.

                I was using the "union" option, but changed it to "intersection-nonempty". But I dont actually know why.. :P

                Thanks again!
                Last edited by sindrle; 10-17-2013, 06:45 AM.

                Comment


                • #9
                  Originally posted by sindrle View Post
                  Great!
                  I aborted the run and updated the CMD as you wrote it.

                  I was using the "union" option, but changed it to "intersection-nonempty". But I dont actually know why.. :P

                  Thanks again!
                  take a look here to see the difference between the union and the intersection-nonempty.

                  It basically tells htseq-count how to deal with reads matched to two genes, or what to do if the overlap isn't complete.

                  Comment


                  • #10
                    Thanks I have looked at that one, but what is the implications for differential expression?

                    Comment


                    • #11
                      As far as I understand it, depends on the option you take you'll be counting more or less reads.

                      If I am not mistaken, reads marked as ambiguous are not taken into consideration in the count of read numbers. So if you take union you'll have more ambiguous reads than the intersection options.

                      BTW, take a look at featureCounts from the sub read package. It works also very good with the data. It finds more results than htseq-counts and works faster.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Recent Advances in Sequencing Analysis Tools
                        by seqadmin


                        The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                        05-06-2024, 07:48 AM
                      • seqadmin
                        Essential Discoveries and Tools in Epitranscriptomics
                        by seqadmin




                        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                        04-22-2024, 07:01 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 02:46 PM
                      0 responses
                      11 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-07-2024, 06:57 AM
                      0 responses
                      13 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-06-2024, 07:17 AM
                      0 responses
                      17 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-02-2024, 08:06 AM
                      0 responses
                      23 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X