  • New to RNA-Seq: Help obtaining sequencing summary needed.

    We recently submitted our first RNA-Seq sample from the lab and were directed to use Galaxy to align the reads to the bovine genome and obtain a transcript profile. I have little experience with this kind of data and am trying to figure out which tool gives a basic summary of our read alignment to the reference genome. The goal is to have raw numbers for total sequenced fragments, uniquely mapped fragments, fragments mapped to annotated exons, fragments mapped to annotated genes (exons + introns), etc. Can anyone point me in the right direction? Thank you in advance for the help as I try to get started in the RNA-Seq world.

  • #2
    Galaxy is a community-driven web-based analysis platform for life science research.


    There is a tutorial on this.



    • #3
      This tutorial is fantastic, but I'm not sure it answers the OP's question, which is geared more toward post-alignment analytics. I've often wondered what else I should look at besides running samtools flagstat or EstimateLibraryComplexity.jar (Picard). (I'm not even sure how to interpret the output from EstimateLibraryComplexity.)
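For readers unsure what a flagstat-style summary is counting, here is a minimal sketch that tallies SAM FLAG bits by hand. It is only an illustration of the idea, not a reimplementation of samtools flagstat. The bit values come from the SAM format; the function name `tally_flags` and the toy records are made up for this example.

```python
# Toy tally of SAM FLAG bits, roughly the kind of counts a flagstat-style
# summary reports. Records and names here are hypothetical.
UNMAPPED = 0x4         # read is unmapped
SECONDARY = 0x100      # secondary alignment
DUPLICATE = 0x400      # marked as PCR/optical duplicate
SUPPLEMENTARY = 0x800  # supplementary alignment


def tally_flags(sam_lines):
    """Count alignment records by FLAG category, skipping header lines."""
    counts = {"records": 0, "primary": 0, "mapped": 0, "duplicates": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and @-prefixed header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts["records"] += 1
        if not flag & (SECONDARY | SUPPLEMENTARY):
            counts["primary"] += 1
        if not flag & UNMAPPED:
            counts["mapped"] += 1
        if flag & DUPLICATE:
            counts["duplicates"] += 1
    return counts


# Hypothetical records: one mapped read, one unmapped read, one duplicate,
# and one secondary alignment of the first read.
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t1024\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r1\t256\tchr2\t500\t0\t50M\t*\t0\t0\t*\t*",
]
summary = tally_flags(example)
print(summary)
```

Counting records versus primary alignments separately matters for RNA-Seq, because multi-mapped reads can appear on several lines and inflate a naive "mapped reads" figure.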



      • #4
        I am definitely looking more for post-alignment information. I just want to find out how many of my reads map to exons vs. introns, how many align to annotated vs. novel junctions, and, in general, how many align to different features within the genome. This may be a very basic task, but my lack of experience working with the data is proving troublesome. I have looked at our data in the past, gotten it aligned, and then used Cufflinks to see which transcripts are expressed at various levels based on FPKM and such. However, I need to backtrack and get general statistics on what my reads align to in the genome (exons, introns, annotated junctions, novel junctions, annotated exons, annotated genes, etc.).



        • #5
          EVER-seq is pretty good for this kind of thing.



          • #6
            You can use intersectBed from BEDtools to check and count alignments overlapping various genomic features (anything described by GFF or BED). You could set a threshold for overlap to exclude reads that overlap a feature only minimally. This is not really feasible for splice junctions, however.
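To make the overlap-threshold idea concrete, here is a small pure-Python sketch of counting reads that overlap a set of features by at least a given fraction of the read length. It is not BEDtools itself (intersectBed exposes a comparable minimum-overlap fraction through its `-f` option); the function names, coordinates, and toy intervals are all hypothetical, and spliced reads are not handled.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping bases between two 0-based half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def count_reads_on_features(reads, features, min_frac=0.0):
    """Count reads overlapping any feature by at least min_frac of the read.

    reads and features are lists of (chrom, start, end) tuples with
    0-based half-open coordinates. Each read is counted at most once.
    """
    hits = 0
    for chrom, start, end in reads:
        read_len = end - start
        for f_chrom, f_start, f_end in features:
            if f_chrom != chrom:
                continue
            ov = overlap_bp(start, end, f_start, f_end)
            if ov > 0 and ov >= min_frac * read_len:
                hits += 1
                break  # do not count the same read twice
    return hits


# Hypothetical exons and reads, for illustration only.
exons = [("chr1", 100, 200), ("chr1", 300, 400)]
reads = [
    ("chr1", 120, 170),  # fully inside the first exon
    ("chr1", 190, 240),  # only 10 of 50 bases overlap the first exon
    ("chr1", 220, 270),  # intronic, no overlap
    ("chr2", 100, 150),  # different chromosome
]

any_overlap = count_reads_on_features(reads, exons)
half_overlap = count_reads_on_features(reads, exons, min_frac=0.5)
print(any_overlap, half_overlap)
```

Running the same count against separate exon and intron feature sets gives the exon vs. intron breakdown asked about earlier in the thread, at least for unspliced alignments.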



            • #7
              RNASeQC for post-alignment metrics

              There is a GATK-based package called RNASeQC that provides a large number of post-alignment metrics along the lines of what you are looking for, including total, unique, duplicate reads, mapped reads and mapped unique reads, rRNA reads, strand specificity, GC bias, correlation to a reference sequence, and many coverage metrics.

              It is available as a standalone piece of software and as a module on the GenePattern public server at http://genepattern.broadinstitute.org. More information is at https://confluence.broadinstitute.or...Tools/RNA-SeQC .

              Michael



              • #8
                Hi Michael,
                Have you used RNAseQC? I am getting an error - wondering if you know how to fix it.

                Thanks,



                • #9
                  At CLC bio we have several tutorials for RNASeq analysis. They are a great way for a beginner to understand the various steps required, and you will also get detailed reports of your mappings. This will require that you download a trial version of the software, but it is free for at least two weeks, and you can also analyze your RNASeq samples. Let me know if you need more guidance.



                  • #10
                    I have found Picard tools very useful for generating post-alignment statistics.

                    1. Get read map statistics
                    java -Xmx10000m -jar picard-tools-1.58/BamIndexStats.jar I=sorted.bam > sorted.stats

                    2. Get the quality score distribution of the aligned reads
                    java -Xmx10000m -jar picard-tools-1.58/QualityScoreDistribution.jar I=sorted.bam O=qualstats CHART=qualstats.pdf

                    3. Get RNAseq mapped reads metrics
                    java -Xmx10000m -jar picard-tools-1.58/CollectRnaSeqMetrics.jar STRAND_SPECIFICITY=NONE REF_FLAT=annotation.refflat CHART_OUTPUT=graph.pdf INPUT=sorted.bam OUTPUT=RNA_seq.stats

                    Getting a genomic refFlat file for all genes of an organism can be tricky, so here is a simple way to build one:

                    a) Download gtf annotation file for all genes of the organism
                    b) Convert gtf into refflat

                    gtfToGenePred -genePredExt annotation.gtf tmp

                    awk 'BEGIN{FS="\t"};{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' tmp > annotation.refflat

                    The gtfToGenePred software is available at http://hgdownload.cse.ucsc.edu/admin/exe/
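For anyone who wants to see what the awk step above is doing, here is a rough Python equivalent of the column reordering. It assumes the input has the usual genePredExt layout, where column 12 is the gene name (name2) and columns 1 to 10 run from the transcript name through exonEnds. The function name and the toy record are hypothetical.

```python
def genepred_ext_to_refflat(line):
    """Reorder one tab-separated genePredExt line into refFlat column order.

    Mirrors the awk command: print column 12 (gene name) first, followed by
    columns 1-10 (transcript name through exon ends).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated columns")
    return "\t".join([fields[11]] + fields[:10])


# Hypothetical genePredExt record with 15 columns, for illustration only.
toy = "\t".join([
    "tx1", "chr1", "+", "100", "500", "150", "450", "2",
    "100,300,", "200,500,", "0", "geneA", "cmpl", "cmpl", "0,1,",
])
refflat_line = genepred_ext_to_refflat(toy)
print(refflat_line)
```

The result has 11 columns, with the gene name first and the transcript name second, which is the order the awk command writes to annotation.refflat.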

                    I hope this is helpful.



                    • #11
                      RNAseQC

                      RNAseQC seems to be a nice piece of software but I haven't seen much conversation about it on here yet.

                      So far, I have been able to get the read metrics on a single sample but have been unable to get the 'text-delimited description of samples and their bams' list into the correct format so that it can be recognized. Has anyone else run into this problem and/or know what the format of the .list file should look like?

                      Thanks so much, I'm a big fan of all the people on here who take the time to answer noob questions.



                      • #12
                        This is how it should look (the columns are separated by tabs in the actual file):

                        Sample ID    Bam File        Notes
                        Y903GFAZ     Y903GFAZ.bam    Y903GFAZ
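Since "Sample ID" and "Bam File" contain spaces, it is easy to end up with spaces where the tool expects a delimiter. One way to avoid that is to generate the file programmatically. This is a minimal sketch that assumes the file is tab-delimited with the three header columns shown above; the helper name `write_sample_list` is made up for this example.

```python
import csv
import io


def write_sample_list(rows, handle):
    """Write a tab-delimited sample list with the three-column header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample ID", "Bam File", "Notes"])
    writer.writerows(rows)


# Written to an in-memory buffer here; pass an open file handle instead
# to produce the actual .list file.
buf = io.StringIO()
write_sample_list([("Y903GFAZ", "Y903GFAZ.bam", "Y903GFAZ")], buf)
sample_list_text = buf.getvalue()
print(sample_list_text)
```

Adding more samples is then just a matter of appending more (sample ID, BAM path, notes) tuples to the list.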

                        Aparna



                        • #13
                          Thank you Aparna, that format works perfectly!

                          Have you been able to get the coverage output? I am running the newest version (v1.1.6) but have been unable to generate anything other than the read metric files.
