  • ALEXA-Seq : Alternative expression analysis by RNA sequencing paper

    The Marra lab is pleased to announce the recent publication of a manuscript describing the use of RNA-seq data for alternative expression analysis.

    The advance online publication can be found at Nature Methods here:
    Griffith et al. 2010

    Briefly, the method utilizes RNA-seq data to profile transcriptomes, identify transcript features expressed above background noise levels, identify differentially expressed genes, and identify alternatively processed transcripts. Particular emphasis is placed on comparisons between experimental conditions (tumor vs. normal, drug sensitive vs. resistant, etc.)

    Results generated using this method/pipeline can be found here:
    ALEXA-seq.

    To date, 76 libraries corresponding to 16 projects have been analyzed by the ALEXA-seq approach.

    Some specific examples of the output can be viewed here:
    UMPS expression and splicing in 5-FU sensitive and resistant cell lines

    CA12 expression and splicing among normal breast tissue sub-types

    To view these examples, your browser must support SVG (scalable vector graphics). Firefox produces the best results in my experience.

  • #2
    Hi,

    Nice-looking application. Do you have any suggestion for the minimum number of read pairs per sample? Also, for the hypothetical events in the database, would these include all possible exon junctions (i.e., assuming no known transcript or EST support for alternatives), as in the following example?

    Exon1--Exon2--Exon3--Exon4--Exon5

    Canonical transcript/EST-supported junctions

    Exon1-Exon2
    Exon2-Exon3
    Exon3-Exon4
    Exon4-Exon5

    Hypothetical junctions generated

    Exon1-Exon3
    Exon1-Exon4
    Exon1-Exon5
    Exon2-Exon4
    Exon2-Exon5
    Exon3-Exon5


    • #3
      Hello Jon,

      Thanks for the encouraging word. Your two questions are quite different so I will answer them separately.

      "Do you have any suggestion for the minimum number of read pairs per sample?"

      This is a straightforward and reasonable question to ask, but it is difficult to answer directly. It is by far the number one question I am asked about RNA-seq analysis. It has been discussed in various places in this forum, including by me here: How much coverage we need?

      The answer really depends on the particulars of your input material (e.g. RNA quality, cell heterogeneity), the type of library construction (e.g. polyA+ RNA vs. ribominus RNA), the tissues the libraries were created from, the goals of the analysis, etc. I would always rather have more data than less. When absolutely forced to give a hard number, I say that for alternative expression analysis with ALEXA-seq, the results really started to shine when I had at least 100 million paired 42-mers (of which, say, ~40-70% map to known transcripts depending on the library). If you have longer paired reads, you can get away with fewer of them.
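
For a rough sense of scale, the numbers quoted above can be turned into total sequenced bases. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and treating "the same number of sequenced bases" as the way to trade read length against read count is an assumption of this example, not something stated in the post.

```python
def total_bases(read_pairs: int, read_length: int) -> int:
    """Bases sequenced for a paired-end library (two reads per pair)."""
    return read_pairs * 2 * read_length


def mapped_range(bases: int, low_pct: int = 40, high_pct: int = 70) -> tuple[int, int]:
    """Bases mapping to known transcripts at the ~40-70% rates quoted above."""
    return bases * low_pct // 100, bases * high_pct // 100


def pairs_for_same_bases(target_bases: int, read_length: int) -> int:
    """Read pairs needed to reach target_bases at another read length.

    ASSUMPTION: equal total bases is used as the yardstick; the post only
    says that longer reads let you get away with fewer of them.
    """
    bases_per_pair = 2 * read_length
    return -(-target_bases // bases_per_pair)  # ceiling division


baseline = total_bases(100_000_000, 42)      # 100 million paired 42-mers
print(baseline)                              # 8400000000
print(mapped_range(baseline))                # (3360000000, 5880000000)
print(pairs_for_same_bases(baseline, 75))    # 56000000
```

Under that equal-bases assumption, about 56 million paired 75-mers carry the same number of bases as 100 million paired 42-mers. Input quality and library type, as described above, can matter more than this arithmetic.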

      I have analyzed libraries of highly varying depth and quality and many of these analyses are summarized here. You can browse through these and see what the outcome looks like to get a more hands-on feel for what increasing depth gets you in terms of alternative expression analysis. For example, the REMC, Morgen, and 5-FU datasets have ~100-200 million mapped paired-end reads (36-mers to 75-mers) and produce beautiful alternative expression results. On the other hand the Sutent dataset has only ~10 million mapped reads and is really only good for gene-level analysis. Similarly, the AllenBrain libraries suffered from poor quality input RNA and this caused all kinds of problems with the analysis even though the number of reads was reasonable.


      • #4
        Your second question is more straightforward. Yes, that is how we create the hypothetical events in the junction databases. Using Ensembl exons as a starting point, we create all pairwise combinations of these exons. A subset thus corresponds to canonical junctions, but the majority correspond to hypothetical connections. The number of possible junctions for a gene with n known exons is n!/(2!(n - 2)!), which simplifies to n(n - 1)/2.
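
The pairwise enumeration described here can be sketched in a few lines of Python using the five-exon example from post #2. This is an illustration only, not the ALEXA-seq code: the function name and the representation of a junction as a pair of exon names are assumptions made for the example.

```python
from itertools import combinations
from math import comb


def enumerate_junctions(exons, known_junctions):
    """Split all ordered exon pairs (5' exon, 3' exon) into canonical
    (supported by a known transcript) and hypothetical junctions."""
    all_junctions = list(combinations(exons, 2))
    canonical = [j for j in all_junctions if j in known_junctions]
    hypothetical = [j for j in all_junctions if j not in known_junctions]
    return canonical, hypothetical


exons = ["Exon1", "Exon2", "Exon3", "Exon4", "Exon5"]
# Known junctions for the single canonical transcript: adjacent exons only.
known = {(exons[i], exons[i + 1]) for i in range(len(exons) - 1)}

canonical, hypothetical = enumerate_junctions(exons, known)
print(len(canonical), len(hypothetical))   # 4 6
print(comb(len(exons), 2))                 # 10, i.e. n(n - 1)/2 for n = 5
```

For n = 5 this gives 10 junctions in total: the 4 canonical ones and the 6 hypothetical ones listed in post #2.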

        For the human hg19 transcriptome annotated by Ensembl, this results in 3,305,170 junctions, only 284,796 of which correspond to a known transcript. If you think such a database might be useful to you, please refer to the downloads page. Junction databases are available for human hg18 and hg19 here. See links to 'additional junction DBs' on this page.

        Junction databases including the sequences (fasta format) and corresponding annotation info for each are provided for 20 lengths of junction sequences (from 60mers up to 150mers). Included in the annotation files are chromosome coordinates, number of exons skipped, Ensembl support, EST and mRNA support from human and all other species, predicted peptide sequence, etc.


        • #5
          I'm playing with the ALEXA-Seq image, and I'm wondering what kind of data path the scripts require. I ask because I just point it at a common folder /home/alexa-seq/seq_files with .fastq files named s_n_1/2_sequence.txt (just for test, 2 lanes). Does it need the full pipeline analysis path?


          • #6
            I assume you mean in the config file where you point to the data... If so, then the data path can be anywhere, but it has to be a complete path to a directory that contains your data files... This doesn't have to be where the data files were originally generated. If you are using fastq files, you will have to change the SeqFileType column to fastq. I recommend using qseq files instead, as the first step will be faster.


            • #7
              Originally posted by malachig
              I assume you mean in the config file where you point to the data... If so, then the data path can be anywhere, but it has to be a complete path to a directory that contains your data files... This doesn't have to be where the data files were originally generated. If you are using fastq files, you will have to change the SeqFileType column to fastq. I recommend using qseq files instead, as the first step will be faster.
              I figured out my issue. Now I have another question: have you processed any HiSeq data with the pipeline? I started a couple of HiSeq lanes 4 hours ago and it still hasn't finished the read pre-processing step (processRawSolexaReads.sh). The last message was that the Berkeley DB was being created to save memory. Thanks for the help.


              • #8
                Hello malachig,
                I checked the ALEXA-Seq web site and found that this tool supports only paired-end data.
                Do you have a plan to develop a tool for single-end data?


                • #9
                  Lee Sam. Yes, the support for fastq was added near the end of development to support another user. It still needs some optimization as the initial read processing step is slow. If you are impatient you can convert your fastq file to either qseq or seq format and this step will run faster. We have processed some HiSeq data, and because each lane is so much larger it did tend to take longer for each step (and use more memory).

                  micrornas, no we don't have a specific plan to develop a tool for single-end data as we never generate single end RNA-seq data... I am aware of another user that processed single end data by creating 'dummy' read pairs (somewhat of a hack but apparently it worked).


                  • #10
                    single-end data

                    micrornas. I have processed single-end data with alexa-seq. I created dummy R2 qseq files with sequences of Ns at the same length as the real read and quality strings comprised of all "B" values. This allows the pipeline to run and all dummy reads are filtered out at the first step as "Low Quality" reads. A few of the library summary figures and stats will be affected by this. But, the results I got out were still usable and useful.
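
A minimal sketch of how such dummy R2 records could be generated from an existing R1 qseq file is shown below. It is not the script the poster used. It assumes the usual 11-column, tab-separated qseq layout (read number in column 8, sequence in column 9, quality in column 10), and the function names are hypothetical. The post describes sequences of Ns, so "N" is the default filler; it is a parameter in case your files use a different no-call character.

```python
def dummy_mate_line(r1_line: str, filler: str = "N", qual: str = "B") -> str:
    """Build a dummy read-2 qseq record from a read-1 record.

    ASSUMPTION: 11 tab-separated qseq columns, with read number, sequence
    and quality at 0-based indices 7, 8 and 9. All other columns,
    including the filter flag, are copied unchanged from read 1.
    """
    fields = r1_line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 qseq columns, found {len(fields)}")
    length = len(fields[8])
    fields[7] = "2"                # mark the record as read 2
    fields[8] = filler * length    # all-N sequence, same length as read 1
    fields[9] = qual * length      # all-"B" quality string
    return "\t".join(fields) + "\n"


def write_dummy_r2(r1_path: str, r2_path: str) -> int:
    """Write one dummy R2 record per R1 record; return the record count."""
    count = 0
    with open(r1_path) as r1, open(r2_path, "w") as r2:
        for line in r1:
            if line.strip():
                r2.write(dummy_mate_line(line))
                count += 1
    return count
```

Calling write_dummy_r2 with the path of an R1 file and the desired R2 path (both paths are whatever your own naming scheme requires) would produce the mate file. As noted above, some library summary figures and stats will be affected, because every read 2 is discarded as low quality.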


                    • #11
                      We're trying to get the heavy lifting (preprocessing, alignment) parts of ALEXA going on our cluster, which uses the Torque scheduler. I know that ALEXA was designed to run on a cluster; was there a particular configuration it was designed to work with? I was hoping to edit some of the configuration and the batch script generation code to generate jobs that could be submitted.


                      • #12
                        Our cluster uses Sun Grid Engine (SGE). Submitting jobs to the cluster is accomplished using a wrapper for the 'qsub' utility of SGE. Basically, the submission command just points to a batch file containing bash commands (one job per line). I assume this is a somewhat common theme in cluster job submission. If this is the case for you, it shouldn't be too hard to modify the 'createAnalysisCommands' step. You would just need to modify all the lines containing 'mqsub' to match the submission style of your cluster, and then use the option '--cluster_commands=1' when you run createAnalysisCommands.


                        • #13
                          alexa-seq cluster

                          I guess there are too many different cluster configurations for alexa-seq to anticipate. So, simple bash files are produced which can be run serially (for very small libraries) or submitted to your cluster according to its protocols. You will probably have to work with your cluster administrator to get things running optimally.

                          Our cluster here (lawrencium) uses the PBS Torque resource manager and the Moab job scheduler, and with some work I have been able to submit Alexa-seq jobs to it. I have processed four projects with over 100 libraries to date, so it is doable. Instead of trying to edit all the parts of the alexa-seq pipeline code that produce job batch files and submission commands, I created a simple Perl script that takes an alexa-seq job batch file (essentially just an sh file with one "task/command" per line) and produces submission files compatible with our scheduler. I strongly recommend this strategy. Changing the alexa-seq code will be a lot more work.

                          What I do is run the alexa-seq pipeline as instructed for steps 0 to 5B. Step 5C (submitMapBatch.sh) is the first step that requires submitting to a cluster. That sh file contains a whole bunch of bash commands for additional sh files (e.g., blast_vs_intergenics.sh). It is those files that should be submitted to a cluster, not the parent submitMapBatch.sh file. You can do them individually or cat them into combined files. I create one combined batch file for all libraries, separated only by feature type (repeats, transcripts, etc.), because the feature types have different memory and runtime requirements. I can thus optimize cluster submission parameters for each of the 6 feature types. This is necessary because our cluster uses wallclock estimates and task number to determine job priority in the queue. Maybe your cluster has a simpler setup and this step will be unnecessary for you.

                          Once I have combined the bash files, I run my submitjobs.pl script on the combined file and wait for it to finish. In later steps, whenever alexa says to submit some jobs to a cluster, the bash file typically contains the tasks/commands themselves (instead of additional bash commands as above). I just run my submitjobs.pl script on each of those bash files, check the .output and .error files for problems, and then proceed to the next step.

                          For each project, once the alexa-seq .commands file is produced, I make a new copy of this file and edit it to add my own commands that are necessary for job submission. This file can then be used as a template for running future projects.
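
The submitjobs.pl script itself is not shown in this thread, so the following is only a rough Python sketch of the same idea: take a batch file with one command per line and wrap each command in its own PBS/Torque submission script. The function names, file naming scheme, and resource values are all hypothetical, and the exact `#PBS -l` resource names accepted vary between sites, so check with your cluster administrator before using anything like this.

```python
from pathlib import Path

# Hypothetical template; adjust the directives to your own site's rules.
PBS_TEMPLATE = """#!/bin/bash
#PBS -N {name}
#PBS -l walltime={walltime}
#PBS -l mem={mem}
#PBS -o {name}.output
#PBS -e {name}.error
cd "$PBS_O_WORKDIR"
{command}
"""


def read_commands(batch_text: str) -> list[str]:
    """Return one command per non-empty, non-comment line of a batch file."""
    return [
        line.strip()
        for line in batch_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def make_pbs_scripts(batch_text: str, prefix: str, walltime: str, mem: str) -> dict[str, str]:
    """Map a script file name to its PBS script text, one per command."""
    scripts = {}
    for i, command in enumerate(read_commands(batch_text), start=1):
        name = f"{prefix}_{i:04d}"
        scripts[f"{name}.pbs"] = PBS_TEMPLATE.format(
            name=name, walltime=walltime, mem=mem, command=command
        )
    return scripts


def write_scripts(scripts: dict[str, str], out_dir: str) -> list[Path]:
    """Write the scripts to out_dir; each one can then be passed to qsub."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for file_name, text in scripts.items():
        path = out / file_name
        path.write_text(text)
        paths.append(path)
    return paths
```

Keeping one combined batch file per feature type, as described above, would let you call make_pbs_scripts once per feature type with walltime and memory values suited to that type.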


                          • #14
                            Originally posted by obig
                            micrornas. I have processed single-end data with alexa-seq. I created dummy R2 qseq files with sequences of Ns at the same length as the real read and quality strings comprised of all "B" values. This allows the pipeline to run and all dummy reads are filtered out at the first step as "Low Quality" reads. A few of the library summary figures and stats will be affected by this. But, the results I got out were still usable and useful.
                            Could you share what advantage you found in tweaking this particular tool rather than using one of the dedicated microRNA tools?
                            --
                            bioinfosm


                            • #15
                              Dear bioinfosm,

                              I was responding to a question from the user with user name = "micrornas". This thread doesn't actually have anything specifically to do with the biological entity called microRNA. And, I'm afraid I have no experience to share regarding microRNA tools. This is perhaps a cautionary tale for those choosing a user name that has specific meaning and is commonly used and searched for in the forums.
                              Last edited by obig; 11-17-2010, 03:13 PM. Reason: grammar
