Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • sindrle
    Senior Member
    • Aug 2013
    • 266

    DESeq2 SummarizedExperiment help

    Hi!

    I feel totally stupid, but I have 48 BAMs after alignment with Tophat2 and want to try DESeq2 for diff.exp. analysis.

    I dont understand what "SummarizedExperiment" is and how to create it? I have googled to death, but now I have to ask here.

    Thank you so much, this is a new field to me.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Have you used htseq-count yet to get counts?

    Comment

    • sindrle
      Senior Member
      • Aug 2013
      • 266

      #3
      Hi!
      Do I have to? Im stuck here:



      Im trying the "2 A First Example", but I dont understand what "features <- GRanges" is (seqnames/IRanges).

      Im also not quite sure how to type in "group_id": I have 2 different groups at two timepoints (as you already know).

      Thank you very much!
      Last edited by sindrle; 10-16-2013, 11:30 AM.

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Well, you don't have to do anything you don't want to do :P However, it's generally a bit easier to just use a quick command line script than to try to deal with GenomicRanges. The later can do it, but has a much steeper learning curve and will just produce the same results (unless you're trying to do something unusual).

        Regarding your GRanges question, seqnames would be things like "chr1" (UCSC chromosome naming scheme) or "10" (Ensembl naming scheme). IRanges are genomic bounds, so just the 5' and 3' boundaries of an exon/CDS/whatever. In general, you can get that stuff all made for you by reading in a GTF or GFF file (I've used import.gff from rtracklayer). The "group_id" in the example is more of a place-holder for gene_id later on. In essence, it's probably meant to be the actual feature level that's counted over (though the don't do that in the example), so you don't actually need it that example. In a real case, you'd read in your annotation file into a GRanges object and then
        Code:
         split(GRangesObject, mcols(GRangesObject)$label_of_interest)
        to get a GRangesList that you would then summarizeOverlaps on. Here, label_of_interest would probably be the gene_id field of the annotation file (though it could be something else, such as sprintf("%s:E%03d", mcols(blah)$gene_id, mcols(blah)$exon), which is what you'd do for DEXSeq counts).

        Comment

        • sindrle
          Senior Member
          • Aug 2013
          • 266

          #5
          Thank you so much!

          I installed HTseq and have started HTseq-count on all 48 BAMs.

          From one BAM, is this a problem?:

          no_feature 7984183
          ambiguous 521840
          too_low_aQual 0
          not_aligned 0
          alignment_not_unique 13730878


          And also, can you compress the HTseq-count output? Its 16gb for each file..
          Last edited by sindrle; 10-17-2013, 05:46 AM.

          Comment

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Ah, you're saving the wrong file, don't use -o. The command to execute is something like:
            Code:
            samtools view namesorted.bam | htseq-count -m intersection-nonempty -s no -a 10 - GTF_FILE > namesorted.counts
            That file will be much much much smaller and is all that you need.

            Comment

            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              I should add, "namesorted.bam" and "GTF_FILE" above are just place-holder names. You'll want to change "namesorted.counts" to something meaningful so you can keep track of which counts are associated with which sample. Finally, those options for htseq-count are just examples, you can change them to fit your needs.

              Comment

              • sindrle
                Senior Member
                • Aug 2013
                • 266

                #8
                Great!
                I aborted the run and updated the CMD as you wrote it.

                I was using the "union" option, but changed it to "intersection-nonempty". But I dont actually know why.. :P

                Thanks again!
                Last edited by sindrle; 10-17-2013, 06:45 AM.

                Comment

                • frymor
                  Senior Member
                  • May 2010
                  • 151

                  #9
                  Originally posted by sindrle View Post
                  Great!
                  I aborted the run and updated the CMD as you wrote it.

                  I was using the "union" option, but changed it to "intersection-nonempty". But I dont actually know why.. :P

                  Thanks again!
                  take a look here to see the difference between the union and the intersection-nonempty.

                  It basically tells htseq-count how to deal with reads matched to two genes, or what to do if the overlap isn't complete.

                  Comment

                  • sindrle
                    Senior Member
                    • Aug 2013
                    • 266

                    #10
                    Thanks I have looked at that one, but what is the implications for differential expression?

                    Comment

                    • frymor
                      Senior Member
                      • May 2010
                      • 151

                      #11
                      As far as I understand it, depends on the option you take you'll be counting more or less reads.

                      If I am not mistaken, reads marked as ambiguous are not taken into consideration in the count of read numbers. So if you take union you'll have more ambiguous reads than the intersection options.

                      BTW, take a look at featureCounts from the sub read package. It works also very good with the data. It finds more results than htseq-counts and works faster.

                      Comment

                      Latest Articles

                      Collapse

                      • SEQadmin2
                        Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                        by SEQadmin2


                        I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                        Here are nine questions we think about, in roughly the order they matter, before...
                        06-18-2026, 07:11 AM
                      • SEQadmin2
                        From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                        by SEQadmin2


                        Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                        The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                        ...
                        06-02-2026, 10:05 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by SEQadmin2, Today, 05:37 AM
                      0 responses
                      5 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-26-2026, 11:10 AM
                      0 responses
                      16 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-17-2026, 06:09 AM
                      0 responses
                      49 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-09-2026, 11:58 AM
                      0 responses
                      109 views
                      0 reactions
                      Last Post SEQadmin2  
                      Working...