Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to generate counts/statistics such as % mapped to known exons f/ cufflinks output

    I'm having trouble in searching the literature online etc... of finding a way to generate counts/statistics from my cufflinks run to brake down things like:

    % mapped to known exons
    % mapped to introns, etc.....

    Does anyone know of any scripts that can do this?

  • #2
    Picard RNAseqmetrics

    Comment


    • #3
      As easy as that, thanks

      Comment


      • #4
        Ok now I'm having problems getting results from RNAseqmetrics. I'm using my BAM file as a input:

        java -Xmx2g -jar CollectRNASeqMetrics.jar REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE VALIDATION_STRINGENCY=LENIENT CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics


        But I keep getting an output that looks like this:

        ## net.sf.picard.metrics.StringHeader
        # net.sf.picard.analysis.CollectRnaSeqMetrics REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics VALIDATION_STRINGENCY=LENIENT MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        ## net.sf.picard.metrics.StringHeader
        # Started on: Fri Dec 02 15:13:54 EST 2011

        ## METRICS CLASS net.sf.picard.analysis.RnaSeqMetrics
        PF_BASES PF_ALIGNED_BASES RIBOSOMAL_BASES CODING_BASES UTR_BASES INTRONIC_BASES INTERGENIC_BASES IGNORED_READS CORRECT_STRAND_READS INCORRECT_STRAND_READS PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES PCT_USABLE_BASES PCT_CORRECT_STRAND_READS MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_TO_3PRIME_BIAS SAMPLE LIBRARY READ_GROUP
        1565548600 1565442430 0 0 0 1565442430 0 0 0 0 0 0 1 0 0 0 0 0 0 0

        I assume I'm using the refFlat format and have tried various variations to see if I'm doing anything wrong. I've also sorted the data etc... I'm positive I have mapped exons so why isn't it showing up??

        Comment


        • #5
          My command


          Code:
          ${JAVA_PATH} -Xmx2g -jar $HOME/local/bin/CollectRnaSeqMetrics.jar \
          REF_FLAT=${REFFLAT_FILE} \
          RIBOSOMAL_INTERVALS=${RIBOSOME_LIST} \
          STRAND_SPECIFICITY=NONE \
          REFERENCE_SEQUENCE=${REFERENCE_FILE} \
          INPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}.bam \ OUTPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}_Picard_RNAseqMetrics.txt \
          CHART_OUTPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}_Picard_RNAseqMetrics.pdf \
          VALIDATION_STRINGENCY=SILENT \
          TMP_DIR=${TEMP_DIRECTORY}
          Something seems off. You are missing a ribosomal_intervals and Reference_sequence. Not sure if that is the difference

          Make sure the refflat file is formated correctly and the chromosome ids match your genome (ie. ensemble genome chromsome 1 = 1 vs UCSC chromosome 1 equals = chr1) as they must match
          Code:
          ENSG00000252921 ENST00000517112 18      -       23879078        23879219        23879219        23879219        1       23879078,       23879219,
          ENSG00000207160 ENST00000384431 18      -       23946264        23946370        23946370        23946370        1       23946264,       23946370,
          ENSG00000134504 ENST00000317932 18      -       24034873        24209206        24035706        24081199        5       24034873,24039583,24056478,24081035,24209110,   24035865,24039889,24056623,24081214,24209206,
          ENSG00000134504 ENST00000417602 18      -       24034876        24128500        24035706        24128500        5       24034876,24039583,24056478,24081035,24126691,   24035865,24039889,24056623,24081214,24128500,
          ENSG00000134504 ENST00000408011 18      -       24034876        24129399        24035706        24081199        5       24034876,24039583,24056478,24081035,24128854,   24035865,24039889,24056623,24081214,24129399,
          ENSG00000252846 ENST00000517037 18      -       24166677        24166762        24166762        24166762        1       24166677,       24166762,
          ENSG00000212367 ENST00000391065 18      -       24269280        24269493        24269493        24269493        1       24269280,       24269493,
          ENSG00000171885 ENST00000383168 18      -       24432001        24445749        24436174        24445653        5       24432001,24440735,24441094,24442145,24445621,   24436453,24440816,24441259,24442560,24445749,
          ENSG00000171885 ENST00000440832 18      -       24436121        24445679        24436174        24445653        6       24436121,24440735,24441094,24442145,24442340,24445621,  24436453,24440816,24441259,24442280,24442560,24445679,
          ENSG00000171885 ENST00000383170 18      -       24436174        24445716        24436174        24445674        3       24436174,24442223,24445621,     24436444,24442560,24445716,

          Comment


          • #6
            Great thanks--now I made sure it was in the correct format as well as including the reference sequence, but now it says:


            Runtime.totalMemory()=2120679424
            Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

            Comment


            • #7
              Increased memory--its working--thanks a lot!

              Comment


              • #8
                CollectRNAMetrices-Error

                I am getting very same error. Could you please let me know what could be the problem?


                Originally posted by mmcgo002 View Post
                Ok now I'm having problems getting results from RNAseqmetrics. I'm using my BAM file as a input:

                java -Xmx2g -jar CollectRNASeqMetrics.jar REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE VALIDATION_STRINGENCY=LENIENT CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics


                But I keep getting an output that looks like this:

                ## net.sf.picard.metrics.StringHeader
                # net.sf.picard.analysis.CollectRnaSeqMetrics REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics VALIDATION_STRINGENCY=LENIENT MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
                ## net.sf.picard.metrics.StringHeader
                # Started on: Fri Dec 02 15:13:54 EST 2011

                ## METRICS CLASS net.sf.picard.analysis.RnaSeqMetrics
                PF_BASES PF_ALIGNED_BASES RIBOSOMAL_BASES CODING_BASES UTR_BASES INTRONIC_BASES INTERGENIC_BASES IGNORED_READS CORRECT_STRAND_READS INCORRECT_STRAND_READS PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES PCT_USABLE_BASES PCT_CORRECT_STRAND_READS MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_TO_3PRIME_BIAS SAMPLE LIBRARY READ_GROUP
                1565548600 1565442430 0 0 0 1565442430 0 0 0 0 0 0 1 0 0 0 0 0 0 0

                I assume I'm using the refFlat format and have tried various variations to see if I'm doing anything wrong. I've also sorted the data etc... I'm positive I have mapped exons so why isn't it showing up??

                Comment


                • #9
                  Are the chromosome ID's matching across your files?

                  Comment


                  • #10
                    I shall post my reflat file
                    Ec-00_000010 Ec-00_000010 chr_00 - 149 6731 149 6731 10 149,897,1535,2091,2535,3474,4006,4702,6245,6709, 428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
                    Ec-00_000020 Ec-00_000020 chr_00 - 28572 29122 28572 29122 2 28572,28937, 28582,29122,
                    Ec-00_000030 Ec-00_000030 chr_00 + 29412 32214 29412 32214 1 29412, 32214,
                    Ec-00_000040 Ec-00_000040 chr_00 + 34287 34360 34287 34360 1 34287, 34360,
                    And my bam file looks like
                    DHKW5DQ1:246:C1D7AACXX:1:1101:1480:2209 83 Ec-00_007150 1310 42 100M = 1274 -136 CAGTGAACAGTTTCATGTACCAGGAGGGCGGTAGCCAAGTACCTTGCCGTCAGCGTGTGAGAATTGCGGTCGGCGTCACAGNAGCAACCGTGTAGAAGGA @DDDCCDCDCCC>DEDDA>DCCDDDDDDBCCDDDDDEDDDCDDBDDDDDDDDFFHFHHHHIJJJIIJJIGIIJIJIGFFA3#JJIHFGHHHHFFFFF@CC AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:81C18 YS:i:0 YT:Z:CP
                    DHKW5DQ1:246:C1D7AACXX:1:1101:1480:2209 163 Ec-00_007150 1274 42 100M = 1310 136 ATGAAGCCAACGCAGCGAGCAAAGTCAGCCCCATCGCAGTGAACAGTTTCATGTACCAGGAGGGCGGTAGCCAAGTACCTTGCCGTCAGCGTGTGAGAAT +=?DFFFFHHHHHIJIJJJJIIGIJGJJJJJJIJJJJJIIIJJIGIEHHHIIJHHHHHHFFFDDDDDDDDDDDDDCDDCDCCDDDDDDDDDDDDDCDCDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:-1 YT:Z:CP

                    And by Chromosome ids you mean the ones like Ec-00_007150? the transcript ids right? I so i think they are similar.

                    Comment


                    • #11
                      Chromosome name is in the 3rd column, which in your case is chr_00 and does not match your bam file.

                      Comment


                      • #12
                        I tried to change and now it looks like
                        chr_00 Ec-00_000010 Ec-00_000010 - 149 6731 149 6731 10 149,897,1535,2091,2535,3474,4006,4702,6245,6709, 428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
                        chr_00 Ec-00_000020 Ec-00_000020 - 28572 29122 28572 29122 2 28572,28937, 28582,29122,
                        chr_00 Ec-00_000030 Ec-00_000030 + 29412 32214 29412 32214 1 29412, 32214,
                        chr_00 Ec-00_000040 Ec-00_000040 + 34287 34360 34287 34360 1 34287, 34360,
                        chr_00 Ec-00_000050 Ec-00_000050 - 36705 39329 36705 37902 3 36705,37422,39143, 36870,37944,39329,
                        chr_00 Ec-00_000060 Ec-00_000060 + 43007 44099 43007 44099 3 43007,43404,43829, 43046,43455,44099,
                        but i still get the same output.
                        I am sorry. I bitnew to this and couldnt figure out what is going wrong?

                        Comment


                        • #13
                          I am not exactly sure what is going wrong. I just tested this on a file and got an expected result.

                          Does you bam file have a header? What does the output of this look like?

                          Code:
                          $ samtools view -H your_bam

                          Comment


                          • #14
                            for that the output looks like
                            @HD VN:1.0 SO:coordinate
                            @SQ SN:Ec-00_000010 LN:1971
                            @SQ SN:Ec-00_000020 LN:195
                            @SQ SN:Ec-00_000030 LN:2802
                            @SQ SN:Ec-00_000050 LN:871
                            @SQ SN:Ec-00_000060 LN:360
                            @SQ SN:Ec-00_000070 LN:964
                            @SQ SN:Ec-00_000080 LN:1908
                            @SQ SN:Ec-00_000090 LN:1366
                            @SQ SN:Ec-00_000100 LN:168
                            And this is the same id in my refflat file too

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Strategies for Sequencing Challenging Samples
                              by seqadmin


                              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                              03-22-2024, 06:39 AM
                            • seqadmin
                              Techniques and Challenges in Conservation Genomics
                              by seqadmin



                              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                              Avian Conservation
                              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                              03-08-2024, 10:41 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, 03-27-2024, 06:37 PM
                            0 responses
                            13 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-27-2024, 06:07 PM
                            0 responses
                            11 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-22-2024, 10:03 AM
                            0 responses
                            53 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-21-2024, 07:32 AM
                            0 responses
                            69 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X