  • TCGA: RNASeq version 1 pipeline

    The following documentation provides good details of the pipeline:


    However, on visiting the project website at http://seqware.sf.net,
    I was not able to find any data under the Files menu. I also browsed the UNC website using UNC IDs, with no results.

    Could you please guide me? Where can I obtain the RNASeq version 1
    pipeline?

  • #2
    Are you looking for the TCGA data from UNC? That is available from the TCGA data portal: https://tcga-data.nci.nih.gov/tcga/



    • #3
      I have a few Sequence Read Archive (SRA) studies. I want to perform gene quantification for these studies using the TCGA RNASeq version 1 pipeline.

      I need the script that runs the entire RNASeq version 1 pipeline on my SRA files.

      As I mentioned earlier, the following link provides details on obtaining the pipeline. However, on visiting the GitHub page, there is no data (RNASeq version 1 pipeline) listed! I do not want to use RNASeq version 2 right now; I want to reuse the TCGA RNASeq V1 pipeline!

      https://confluence.broadinstitute.or...=1363806109000

      Please advise.
      Last edited by Sajna; 10-27-2015, 08:22 PM.



      • #4
        As I mentioned earlier, the following link provides details on obtaining the pipeline. However, on visiting the GitHub page, there is no data (RNASeq version 1 pipeline) listed! I do not want to use RNASeq version 2 right now; I want to reuse the TCGA RNASeq V1 pipeline!

        https://confluence.broadinstitute.or...=1363806109000



        • #5
          At this point in time, trying to run SeqWare and version 1 of the TCGA RNAseq pipeline would at best be an exercise in futility. You may be better off using new versions of bwa and MapSplice.

          That said, this file has additional details about the software used in v.1 and v.2: https://tcga-data.nci.nih.gov/tcgafi...ESCRIPTION.txt

          All the data submitted under TCGA was reprocessed using v.2 of the pipeline, and that is what should be considered current, based on communication from the UNC TCGA folks.



          • #6
            Thanks, GenoMax. I will look into the details of version 2 and process the data using BWA, or I will consider MapSplice for quantification.
            Last edited by Sajna; 10-29-2015, 09:45 AM.



            • #7
              TCGA MapSplice RNASeqV2 pipeline: Error: check reads format failed

              Hi All,

              I am running MapSplice (v2.0). My FASTQ files are in Sanger/Illumina 1.9 format. I removed the blank spaces and also removed length=, and now the head of the file looks like this:

              head ERR519523_1.fastq
              @ERR519523.1:1:100
              CAAACCAATGGCTCCACCCGTACCTGGCTCTGCCTCTACCCACCGACATTGCTCCTGTGGTCCTACTCAGAAGTAGTTCAGCACTCAGGACAGCTTCCAC
              +ERR519523.1:1:100
              CCCFFFFFHHHHHJJIJJJJGHIJGIIJIIJIIIGIGIHIIJJJJGHJJJIJFJIHHHHHDFFFFECCCEEDD>CCCCDEDDDDDD?CDABC@BDCCC3>
              @ERR519523.2:2:65
              TGCATAGAGATAGAAACAGAAAATAGAATGGTGGTTGCAGGGTCTGGAAAGAGAGGAGGAGCGCA
              +ERR519523.2:2:65
              @@@DDDDDHDDDHIIBHA@FEH@@C<EEEHCFHH)?FDC<DF9BDHG9B9B;D=BF=FG;C(:5'
              @ERR519523.3:3:100
              GGACGCATAAGAGTTACAGGCTCTATACACAGGGACTTTCCTTCCTGGAAACCCGGTAGGAAATCCCATTATGGCTGCCTGTTTGCCAAACTATTCCCTT


              When I run the mapsplice.py script using the command below, I encounter this error:

              "pairend read name not end with /1 or /2 the 1th read in /ERR519523/ERR519523_1.fastq
              @ERR519523.1:1:100
              [FAILED]
              Error: check reads format failed"

              COMMAND :
              python /opt/MapSplice_multi_threads_2.0.1.9/mapsplice.py -c /hg19_chromosomes/ -x /ebwt/humanchridx_M_rCRS -1 /ERR519523_1.fastq -2 ERR519523_2.fastq
              [Thu Oct 29 17:31:33 2015] Preparing output location mapsplice_out/

              [Thu Oct 29 17:31:33 2015] Beginning Mapsplice run (v2.0)
              -----------------------------------------------
              bin directory: [/opt/MapSplice_multi_threads_2.0.1.9/bin/]
              [Thu Oct 29 17:31:33 2015] Checking for files or directory
              [Thu Oct 29 17:31:33 2015] Checking for files or directory
              [Thu Oct 29 17:31:33 2015] Checking for files or directory
              [Thu Oct 29 17:31:33 2015] Checking for Bowtie index files
              [Thu Oct 29 17:31:33 2015] reads all chromo sizes
              [Thu Oct 29 17:31:42 2015] check reads format
              ERR519523_1.fastq is fastq format
              pairend read name not end with /1 or /2
              the 1th read in /ERR519523/ERR519523_1.fastq
              @ERR519523.1:1:100
              [FAILED]
              Error: check reads format failed

              Please help!!
              Last edited by Sajna; 10-29-2015, 09:45 AM.
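The error text suggests that MapSplice checks whether paired-end read names end with /1 or /2. Below is a minimal Python sketch of a similar pre-flight check that can be run before starting an alignment. It assumes plain 4-line FASTQ records, and `find_unsuffixed_headers` is a hypothetical helper written for illustration, not part of MapSplice.

```python
import io


def find_unsuffixed_headers(handle, suffix, limit=5):
    """Return up to `limit` (record_number, header) pairs whose read name lacks `suffix`.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    """
    bad = []
    for i, line in enumerate(handle):
        if i % 4 != 0:  # only the first line of each record is the header
            continue
        header = line.rstrip("\n")
        tokens = header.split()
        name = tokens[0] if tokens else ""
        if not name.endswith(suffix):
            bad.append((i // 4 + 1, header))
            if len(bad) >= limit:
                break
    return bad


# Header taken from the post above; sequence and qualities shortened for brevity.
sample = io.StringIO("@ERR519523.1:1:100\nCAAACC\n+ERR519523.1:1:100\nCCCFFF\n")
print(find_unsuffixed_headers(sample, "/1"))
```

For a real file, pass an open file handle (for example `open("ERR519523_1.fastq")`) with "/1", and the mate file with "/2".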



              • #8
                When you extracted the reads from the SRA file, did you use the -F/--origfmt switch to preserve the Illumina read ID?



                • #9
                  I converted the .sra files to FASTQ format using the latest SRA Toolkit version with the command fastq-dump srafilenames.sra --split-3, since the data were paired-end.

                  No other options were specified.



                  • #10
                    When I converted the SRA file to FASTQ using fastq-dump, it looked like this:

                    @ERR519523.1 1 length=100
                    CAAACCAATGGCTCCACCCGTACCTGGCTCTGCCTCTACCCACCGACATTGCTCCTGTGGTCCTACTCAGAAGTAGTTCAGCACTCAGGACAGCTTCCAC
                    +ERR519523.1 1 length=100
                    CCCFFFFFHHHHHJJIJJJJGHIJGIIJIIJIIIGIGIHIIJJJJGHJJJIJFJIHHHHHDFFFFECCCEEDD>CCCCDEDDDDDD?CDABC@BDCCC3>
                    @ERR519523.2 2 length=65
                    TGCATAGAGATAGAAACAGAAAATAGAATGGTGGTTGCAGGGTCTGGAAAGAGAGGAGGAGCGCA
                    +ERR519523.2 2 length=65
                    @@@DDDDDHDDDHIIBHA@FEH@@C<EEEHCFHH)?FDC<DF9BDHG9B9B;D=BF=FG;C(:5'
                    @ERR519523.3 3 length=100
                    GGACGCATAAGAGTTACAGGCTCTATACACAGGGACTTTCCTTCCTGGAAACCCGGTAGGAAATCCCATTATGGCTGCCTGTTTGCCAAACTATTCCCTT

                    Then I replaced the blank spaces with ':' and removed 'length=', and passed the FASTQ files to MapSplice, but I got the error below:

                    "pairend read name not end with /1 or /2 the 1th read in /ERR519523/ERR519523_1.fastq
                    @ERR519523.1:1:100
                    [FAILED]
                    Error: check reads format failed"

                    Please help...



                    • #11
                      You should have used --split-files. Re-extract your data from the SRA file.

                      Edit: Let me look at that SRA#.

                      Edit 2: It appears that the submitters modified the original Illumina FASTQ read headers in this submission (or the headers were never submitted to SRA, as the -F option only generates a number). After you split the files with just "--split-files", you will have to add /1 and /2 at the end of the FASTQ headers, since MapSplice expects them to be present.
                      Last edited by GenoMax; 10-29-2015, 04:39 AM.
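Adding the /1 and /2 suffixes can be scripted. The following is a minimal Python sketch, assuming plain 4-line FASTQ records as produced by fastq-dump; `add_mate_suffix` is a hypothetical helper name. It keeps only the first whitespace-delimited token of each header and reduces the '+' line to a bare '+', which the FASTQ format permits. Whether MapSplice needs anything beyond the suffix is not confirmed here.

```python
import io


def add_mate_suffix(src, dst, suffix):
    """Copy 4-line FASTQ records from src to dst, rewriting headers as '@<read id><suffix>'."""
    for i, line in enumerate(src):
        line = line.rstrip("\n")
        if i % 4 == 0:
            tokens = line.split()
            if not tokens or not tokens[0].startswith("@"):
                raise ValueError(f"unexpected header at line {i + 1}: {line!r}")
            # '@ERR519523.1 1 length=100' becomes '@ERR519523.1/1' for suffix '/1'
            line = tokens[0] + suffix
        elif i % 4 == 2:
            line = "+"  # drop the repeated read name on the separator line
        dst.write(line + "\n")


# Header format taken from the fastq-dump output shown in the thread (sequence shortened).
src = io.StringIO("@ERR519523.1 1 length=100\nCAAACC\n+ERR519523.1 1 length=100\nCCCFFF\n")
dst = io.StringIO()
add_mate_suffix(src, dst, "/1")
print(dst.getvalue(), end="")
```

For real data, open the two split files separately and call the function with "/1" for the first mate file and "/2" for the second.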



                      • #12
                        Otherwise, I tried the tool that the MapSplice pipeline uses (UNC ubu.jar) for preparing FASTQ files for MapSplice. The command to format the FASTQ is as follows:

                        java -Xmx512M -jar ubu.jar fastq-format --phred33to64 --strip --suffix /1 --in raw_1.fastq --out working/prep_1.fastq > working/mapsplice_prep1.log

                        I tried that; however, I get the error: Fastq format not recognizable...

                        I will try out what you suggested tomorrow morning at work, and hopefully that will work. Let's see.
                        Last edited by Sajna; 10-29-2015, 09:55 AM.
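For readers unfamiliar with the --phred33to64 option name: a Phred+33 to Phred+64 conversion normally means shifting every quality character up by 31 code points. The Python sketch below illustrates that arithmetic only. It is not ubu's implementation, and `phred33_to_phred64` is a hypothetical helper.

```python
def phred33_to_phred64(quality):
    """Shift a Phred+33 quality string to Phred+64 (same scores, offset raised by 31)."""
    converted = []
    for ch in quality:
        score = ord(ch) - 33  # decode with the Sanger / Illumina 1.8+ offset
        if score < 0 or score > 62:
            raise ValueError(f"character {ch!r} is outside the Phred+33 range")
        converted.append(chr(score + 64))  # re-encode with the older Illumina offset
    return "".join(converted)


# First twelve quality characters of the first read shown in the thread.
print(phred33_to_phred64("CCCFFFFFHHHH"))
```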



                        • #13
                          That is correct.



                          • #14
                            Genomax, it worked!!!! Many Thanks and good day to you.



                            • #15
                              TCGA RSEM_ref files

                              I have used MapSplice to align all the SRA FASTQ samples successfully, and used the bedtools coverage function to retrieve the raw read counts. The next task was to combine level 3 data from TCGA with the MapSplice-aligned SRA samples for differential expression analysis. Having done that, I noticed that the number of DE genes is very high. Looking back, I understood that the "raw counts" reported by TCGA are expected counts from the RSEM software. Although the RSEM paper mentions that edgeR and DESeq can process RSEM counts, it appears that edgeR requires integers as input. So I have now decided to run RSEM on the SRA SAM/BAM files.

                              The TCGA mRNA_Seq pipeline detailed at the following URL requires the hg19_M_rCRS_ref.transcripts.fa file for running rsem-calculate-expression and for translating to transcriptome coordinates.



                              However, the file that should be available from the following URL is missing:



                              Also I require the reference mapping file to run RSEM: https://webshare.bioinf.unc.edu/publ...ownToLocus.txt

                              The file on GitHub is truncated as well.

                              Where can I access the files?
                              Last edited by Sajna; 11-23-2015, 10:27 PM.
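On the integer-input point raised in this post: a commonly used workaround, which the poster did not take, is to round RSEM expected counts to the nearest integer before passing them to a count-based tool. The Python sketch below shows only that rounding step. The gene names and values are made up for illustration, `round_expected_counts` is a hypothetical helper, and whether rounding is appropriate for a given analysis is a separate judgment call.

```python
import math


def round_expected_counts(expected):
    """Round non-negative expected counts half-up to integers, gene by gene."""
    rounded = {}
    for gene, values in expected.items():
        row = []
        for value in values:
            if value < 0:
                raise ValueError(f"negative count for {gene}: {value}")
            # floor(x + 0.5) rounds halves up; Python's round() rounds halves to even
            row.append(int(math.floor(value + 0.5)))
        rounded[gene] = row
    return rounded


# Hypothetical expected counts for two genes across three samples.
example = {"geneA": [10.4, 3.5, 0.0], "geneB": [250.75, 2.5, 99.49]}
print(round_expected_counts(example))
```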
