  • mmcgo002
    Member
    • Nov 2011
    • 10

    How to generate counts/statistics such as % mapped to known exons from Cufflinks output

    I'm having trouble finding, in the literature or online, a way to generate counts/statistics from my Cufflinks run that break things down like:

    % mapped to known exons
    % mapped to introns, etc.....

    Does anyone know of any scripts that can do this?
  • Jon_Keats
    Senior Member
    • Mar 2010
    • 279

    #2
    Picard's CollectRnaSeqMetrics


    • mmcgo002
      Member
      • Nov 2011
      • 10

      #3
      As easy as that, thanks


      • mmcgo002
        Member
        • Nov 2011
        • 10

        #4
        OK, now I'm having problems getting results from CollectRnaSeqMetrics. I'm using my BAM file as input:

        java -Xmx2g -jar CollectRNASeqMetrics.jar REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE VALIDATION_STRINGENCY=LENIENT CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics


        But I keep getting an output that looks like this:

        ## net.sf.picard.metrics.StringHeader
        # net.sf.picard.analysis.CollectRnaSeqMetrics REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics VALIDATION_STRINGENCY=LENIENT MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        ## net.sf.picard.metrics.StringHeader
        # Started on: Fri Dec 02 15:13:54 EST 2011

        ## METRICS CLASS net.sf.picard.analysis.RnaSeqMetrics
        PF_BASES PF_ALIGNED_BASES RIBOSOMAL_BASES CODING_BASES UTR_BASES INTRONIC_BASES INTERGENIC_BASES IGNORED_READS CORRECT_STRAND_READS INCORRECT_STRAND_READS PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES PCT_USABLE_BASES PCT_CORRECT_STRAND_READS MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_TO_3PRIME_BIAS SAMPLE LIBRARY READ_GROUP
        1565548600 1565442430 0 0 0 1565442430 0 0 0 0 0 0 1 0 0 0 0 0 0 0

        I believe my file is in refFlat format, and I have tried several variations to see if I'm doing something wrong. I've also sorted the data. I'm positive I have reads mapped to exons, so why aren't they showing up?
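A note for anyone reading these numbers: Picard metrics files are tab-delimited, and the row pasted above has fewer values than the header has columns, which suggests that empty fields were collapsed when the text was pasted. Looking values up by column name is safer than counting positions. A minimal sketch, assuming a tab-delimited metrics file; the function name `metric_value` and the small sample file are hypothetical, not part of Picard:

```shell
# metric_value FILE COLUMN
# Print the value of COLUMN from the first data row that follows the
# "## METRICS CLASS" line of a tab-delimited metrics file. Using a single
# tab as the field separator keeps empty fields in place.
metric_value() {
    awk -F '\t' -v want="$2" '
        /^## METRICS CLASS/ { state = 1; next }   # the column header row comes next
        state == 1 {                              # remember each column position
            for (i = 1; i <= NF; i++) col[$i] = i
            state = 2; next
        }
        state == 2 && NF > 0 {                    # first data row
            if (want in col) print $(col[want])
            exit
        }
    ' "$1"
}

# Hypothetical sample with an empty RIBOSOMAL_BASES field.
tmp=$(mktemp)
printf '## METRICS CLASS\tnet.sf.picard.analysis.RnaSeqMetrics\n' > "$tmp"
printf 'PF_BASES\tRIBOSOMAL_BASES\tCODING_BASES\tPCT_INTERGENIC_BASES\n' >> "$tmp"
printf '1000\t\t0\t1\n' >> "$tmp"

metric_value "$tmp" PCT_INTERGENIC_BASES   # prints 1
rm -f "$tmp"
```

On a real run the same call would be `metric_value Monodelphis_RNASeqMetrics PCT_CODING_BASES`, and likewise for the other PCT_ columns.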


        • Jon_Keats
          Senior Member
          • Mar 2010
          • 279

          #5
          My command


          Code:
          ${JAVA_PATH} -Xmx2g -jar $HOME/local/bin/CollectRnaSeqMetrics.jar \
          REF_FLAT=${REFFLAT_FILE} \
          RIBOSOMAL_INTERVALS=${RIBOSOME_LIST} \
          STRAND_SPECIFICITY=NONE \
          REFERENCE_SEQUENCE=${REFERENCE_FILE} \
          INPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}.bam \
          OUTPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}_Picard_RNAseqMetrics.txt \
          CHART_OUTPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}_Picard_RNAseqMetrics.pdf \
          VALIDATION_STRINGENCY=SILENT \
          TMP_DIR=${TEMP_DIRECTORY}
          Something seems off. You are missing RIBOSOMAL_INTERVALS and REFERENCE_SEQUENCE. Not sure if that is the difference.

          Make sure the refFlat file is formatted correctly and that the chromosome IDs match your genome (i.e., in an Ensembl genome chromosome 1 is 1, whereas in a UCSC genome it is chr1); they must match.
          Code:
          ENSG00000252921 ENST00000517112 18      -       23879078        23879219        23879219        23879219        1       23879078,       23879219,
          ENSG00000207160 ENST00000384431 18      -       23946264        23946370        23946370        23946370        1       23946264,       23946370,
          ENSG00000134504 ENST00000317932 18      -       24034873        24209206        24035706        24081199        5       24034873,24039583,24056478,24081035,24209110,   24035865,24039889,24056623,24081214,24209206,
          ENSG00000134504 ENST00000417602 18      -       24034876        24128500        24035706        24128500        5       24034876,24039583,24056478,24081035,24126691,   24035865,24039889,24056623,24081214,24128500,
          ENSG00000134504 ENST00000408011 18      -       24034876        24129399        24035706        24081199        5       24034876,24039583,24056478,24081035,24128854,   24035865,24039889,24056623,24081214,24129399,
          ENSG00000252846 ENST00000517037 18      -       24166677        24166762        24166762        24166762        1       24166677,       24166762,
          ENSG00000212367 ENST00000391065 18      -       24269280        24269493        24269493        24269493        1       24269280,       24269493,
          ENSG00000171885 ENST00000383168 18      -       24432001        24445749        24436174        24445653        5       24432001,24440735,24441094,24442145,24445621,   24436453,24440816,24441259,24442560,24445749,
          ENSG00000171885 ENST00000440832 18      -       24436121        24445679        24436174        24445653        6       24436121,24440735,24441094,24442145,24442340,24445621,  24436453,24440816,24441259,24442280,24442560,24445679,
          ENSG00000171885 ENST00000383170 18      -       24436174        24445716        24436174        24445674        3       24436174,24442223,24445621,     24436444,24442560,24445716,
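The layout above can be checked mechanically. A minimal sketch, assuming the file is tab-separated with the 11 refFlat fields in the order shown (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds); the function name `check_refflat` is hypothetical and the checks are illustrative, not what Picard itself runs:

```shell
# check_refflat FILE
# Report lines that do not look like refFlat records: 11 tab-separated
# fields, a strand of + or -, and exon start/end lists whose length matches
# exonCount. Prints "OK" and returns 0 if every line passes, else returns 1.
check_refflat() {
    awk -F '\t' '
        function count_items(list,   parts, n) {
            sub(/,$/, "", list)               # drop the trailing comma
            n = split(list, parts, ",")
            return n
        }
        NF != 11 {
            printf "line %d: expected 11 fields, found %d\n", NR, NF
            bad++; next
        }
        $4 != "+" && $4 != "-" {
            printf "line %d: strand is \"%s\"\n", NR, $4
            bad++; next
        }
        count_items($10) != $9 || count_items($11) != $9 {
            printf "line %d: exonCount does not match exon lists\n", NR
            bad++
        }
        END { if (!bad) print "OK"; exit (bad ? 1 : 0) }
    ' "$1"
}

# First record from the listing above, written with real tabs.
tmp=$(mktemp)
printf 'ENSG00000252921\tENST00000517112\t18\t-\t23879078\t23879219\t23879219\t23879219\t1\t23879078,\t23879219,\n' > "$tmp"
check_refflat "$tmp"    # prints OK
rm -f "$tmp"
```

A file whose columns are separated by spaces rather than tabs shows up here as "expected 11 fields, found 1".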


          • mmcgo002
            Member
            • Nov 2011
            • 10

            #6
            Great, thanks. I made sure the file was in the correct format and included the reference sequence, but now it says:


            Runtime.totalMemory()=2120679424
            Exception in thread "main" java.lang.OutOfMemoryError: Java heap space


            • mmcgo002
              Member
              • Nov 2011
              • 10

              #7
              Increased memory. It's working, thanks a lot!
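For later readers: the JVM heap ceiling is set with the -Xmx option, so increasing memory here means raising the -Xmx2g used earlier in the thread. A minimal sketch, assuming 4g is available on the machine; that value, the shortened file paths, and the helper name `picard_rnaseq_cmd` are illustrative. The helper only prints the command so the heap size can be changed in one place and inspected before running:

```shell
# picard_rnaseq_cmd HEAP ARGS...
# Print (do not run) a CollectRnaSeqMetrics command line that uses HEAP as
# the JVM maximum heap size, followed by the remaining arguments.
picard_rnaseq_cmd() {
    heap=$1; shift
    printf 'java -Xmx%s -jar CollectRnaSeqMetrics.jar' "$heap"
    for arg in "$@"; do printf ' %s' "$arg"; done
    printf '\n'
}

picard_rnaseq_cmd 4g REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE \
    INPUT=accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics
```

The printed line can be pasted into the terminal once the heap size looks right.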


              • bicodeuser
                Junior Member
                • Feb 2016
                • 4

                #8
                CollectRnaSeqMetrics error

                I am getting the very same error. Could you please let me know what the problem could be?


                Originally posted by mmcgo002
                Ok now I'm having problems getting results from RNAseqmetrics. I'm using my BAM file as a input:

                java -Xmx2g -jar CollectRNASeqMetrics.jar REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE VALIDATION_STRINGENCY=LENIENT CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics


                But I keep getting an output that looks like this:

                ## net.sf.picard.metrics.StringHeader
                # net.sf.picard.analysis.CollectRnaSeqMetrics REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics VALIDATION_STRINGENCY=LENIENT MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
                ## net.sf.picard.metrics.StringHeader
                # Started on: Fri Dec 02 15:13:54 EST 2011

                ## METRICS CLASS net.sf.picard.analysis.RnaSeqMetrics
                PF_BASES PF_ALIGNED_BASES RIBOSOMAL_BASES CODING_BASES UTR_BASES INTRONIC_BASES INTERGENIC_BASES IGNORED_READS CORRECT_STRAND_READS INCORRECT_STRAND_READS PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES PCT_USABLE_BASES PCT_CORRECT_STRAND_READS MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_TO_3PRIME_BIAS SAMPLE LIBRARY READ_GROUP
                1565548600 1565442430 0 0 0 1565442430 0 0 0 0 0 0 1 0 0 0 0 0 0 0

                I assume I'm using the refFlat format and have tried various variations to see if I'm doing anything wrong. I've also sorted the data etc... I'm positive I have mapped exons so why isn't it showing up??


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Do the chromosome IDs match across your files?


                  • bicodeuser
                    Junior Member
                    • Feb 2016
                    • 4

                    #10
                    Here is my refFlat file:
                    Ec-00_000010 Ec-00_000010 chr_00 - 149 6731 149 6731 10 149,897,1535,2091,2535,3474,4006,4702,6245,6709, 428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
                    Ec-00_000020 Ec-00_000020 chr_00 - 28572 29122 28572 29122 2 28572,28937, 28582,29122,
                    Ec-00_000030 Ec-00_000030 chr_00 + 29412 32214 29412 32214 1 29412, 32214,
                    Ec-00_000040 Ec-00_000040 chr_00 + 34287 34360 34287 34360 1 34287, 34360,
                    And my BAM file looks like this:
                    DHKW5DQ1:246:C1D7AACXX:1:1101:1480:2209 83 Ec-00_007150 1310 42 100M = 1274 -136 CAGTGAACAGTTTCATGTACCAGGAGGGCGGTAGCCAAGTACCTTGCCGTCAGCGTGTGAGAATTGCGGTCGGCGTCACAGNAGCAACCGTGTAGAAGGA @DDDCCDCDCCC>DEDDA>DCCDDDDDDBCCDDDDDEDDDCDDBDDDDDDDDFFHFHHHHIJJJIIJJIGIIJIJIGFFA3#JJIHFGHHHHFFFFF@CC AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:81C18 YS:i:0 YT:Z:CP
                    DHKW5DQ1:246:C1D7AACXX:1:1101:1480:2209 163 Ec-00_007150 1274 42 100M = 1310 136 ATGAAGCCAACGCAGCGAGCAAAGTCAGCCCCATCGCAGTGAACAGTTTCATGTACCAGGAGGGCGGTAGCCAAGTACCTTGCCGTCAGCGTGTGAGAAT +=?DFFFFHHHHHIJIJJJJIIGIJGJJJJJJIJJJJJIIIJJIGIEHHHIIJHHHHHHFFFDDDDDDDDDDDDDCDDCDCCDDDDDDDDDDDDDCDCDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:-1 YT:Z:CP

                    And by chromosome IDs, you mean the ones like Ec-00_007150? Those are the transcript IDs, right? If so, I think they are similar.


                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      Chromosome name is in the 3rd column, which in your case is chr_00 and does not match your bam file.
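This comparison can be scripted. A minimal sketch, assuming a tab-separated refFlat file and a SAM header saved to a text file (for example the saved output of `samtools view -H your_bam`); the function names and the sample values, which mirror the ones posted above, are illustrative:

```shell
# refflat_chroms FILE: unique values of column 3 (chrom) of a refFlat file.
refflat_chroms() { cut -f 3 "$1" | sort -u; }

# header_refs FILE: sequence names (SN:) from the @SQ lines of a saved SAM header.
header_refs() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | sort -u
}

# missing_chroms REFFLAT HEADER: refFlat chromosome names absent from the header.
missing_chroms() {
    a=$(mktemp); b=$(mktemp)
    refflat_chroms "$1" > "$a"
    header_refs "$2" > "$b"
    comm -23 "$a" "$b"          # lines only in the refFlat list
    rm -f "$a" "$b"
}

# Sample data modelled on the thread: chr_00 in refFlat, Ec-00_... in the BAM.
ref=$(mktemp); hdr=$(mktemp)
printf 'Ec-00_000030\tEc-00_000030\tchr_00\t+\t29412\t32214\t29412\t32214\t1\t29412,\t32214,\n' > "$ref"
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:Ec-00_000010\tLN:1971\n' > "$hdr"
missing_chroms "$ref" "$hdr"    # prints chr_00
rm -f "$ref" "$hdr"
```

An empty result means every chromosome name in the refFlat file also appears in the BAM header.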


                      • bicodeuser
                        Junior Member
                        • Feb 2016
                        • 4

                        #12
                        I tried changing it, and now it looks like this:
                        chr_00 Ec-00_000010 Ec-00_000010 - 149 6731 149 6731 10 149,897,1535,2091,2535,3474,4006,4702,6245,6709, 428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
                        chr_00 Ec-00_000020 Ec-00_000020 - 28572 29122 28572 29122 2 28572,28937, 28582,29122,
                        chr_00 Ec-00_000030 Ec-00_000030 + 29412 32214 29412 32214 1 29412, 32214,
                        chr_00 Ec-00_000040 Ec-00_000040 + 34287 34360 34287 34360 1 34287, 34360,
                        chr_00 Ec-00_000050 Ec-00_000050 - 36705 39329 36705 37902 3 36705,37422,39143, 36870,37944,39329,
                        chr_00 Ec-00_000060 Ec-00_000060 + 43007 44099 43007 44099 3 43007,43404,43829, 43046,43455,44099,
                        but I still get the same output.
                        I am sorry, I am a bit new to this and couldn't figure out what is going wrong.


                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          I am not exactly sure what is going wrong. I just tested this on a file and got an expected result.

                          Does your BAM file have a header? What does the output of this look like?

                          Code:
                          $ samtools view -H your_bam


                          • bicodeuser
                            Junior Member
                            • Feb 2016
                            • 4

                            #14
                            For that, the output looks like this:
                            @HD VN:1.0 SO:coordinate
                            @SQ SN:Ec-00_000010 LN:1971
                            @SQ SN:Ec-00_000020 LN:195
                            @SQ SN:Ec-00_000030 LN:2802
                            @SQ SN:Ec-00_000050 LN:871
                            @SQ SN:Ec-00_000060 LN:360
                            @SQ SN:Ec-00_000070 LN:964
                            @SQ SN:Ec-00_000080 LN:1908
                            @SQ SN:Ec-00_000090 LN:1366
                            @SQ SN:Ec-00_000100 LN:168
                            And these are the same IDs as in my refFlat file.
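This is not a diagnosis of the problem above, only a sketch of one further sanity check: annotation coordinates have to fall within the length of the sequence they refer to, so each refFlat txEnd (column 6) can be compared with the LN value of the matching @SQ line. It assumes tab-separated input and a non-empty header file; the function name `out_of_range` is hypothetical, and the sample values are taken from the posts above (a record ending at 6731 on Ec-00_000010, whose header length is 1971):

```shell
# out_of_range REFFLAT HEADER
# Print refFlat records whose sequence name (column 3) is missing from the
# @SQ lines of HEADER, or whose txEnd (column 6) exceeds that sequence's LN.
out_of_range() {
    awk -F '\t' '
        NR == FNR {                               # first file: saved SAM header
            if ($1 == "@SQ") {
                name = ""; len = ""
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^SN:/) name = substr($i, 4)
                    if ($i ~ /^LN:/) len = substr($i, 4)
                }
                if (name != "") ln[name] = len
            }
            next
        }
        !($3 in ln) { printf "%s: %s not in header\n", $2, $3; next }
        $6 + 0 > ln[$3] + 0 {
            printf "%s: txEnd %d > length %d of %s\n", $2, $6, ln[$3], $3
        }
    ' "$2" "$1"
}

ref=$(mktemp); hdr=$(mktemp)
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:Ec-00_000010\tLN:1971\n' > "$hdr"
printf 'chr_00\tEc-00_000010\tEc-00_000010\t-\t149\t6731\t149\t6731\t1\t149,\t6731,\n' > "$ref"
out_of_range "$ref" "$hdr"   # prints: Ec-00_000010: txEnd 6731 > length 1971 of Ec-00_000010
rm -f "$ref" "$hdr"
```

No output means every record names a sequence in the header and ends within its length.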

