SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
problems with zero counts using cufflinks yaaraore Bioinformatics 1 08-07-2013 06:52 AM
Cufflinks extending/merging exons kbushley Bioinformatics 2 10-18-2011 10:39 PM
Coverage statistics on single exons gedoardo83 Bioinformatics 0 07-06-2011 06:40 AM
cufflinks probem: start pos of exons cbil Bioinformatics 2 03-25-2011 02:20 PM
How to convert cufflinks output to raw counts jebe RNA Sequencing 0 01-26-2010 12:29 PM

Reply
 
Thread Tools
Old 11-30-2011, 02:53 PM   #1
mmcgo002
Member
 
Location: Detroit, MI, USA

Join Date: Nov 2011
Posts: 10
Default How to generate counts/statistics such as % mapped to known exons f/ cufflinks output

I'm having trouble in searching the literature online etc... of finding a way to generate counts/statistics from my cufflinks run to brake down things like:

% mapped to known exons
% mapped to introns, etc.....

Does anyone know of any scripts that can do this?
mmcgo002 is offline   Reply With Quote
Old 12-01-2011, 02:57 PM   #2
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

Picard RNAseqmetrics
Jon_Keats is offline   Reply With Quote
Old 12-01-2011, 08:03 PM   #3
mmcgo002
Member
 
Location: Detroit, MI, USA

Join Date: Nov 2011
Posts: 10
Default

As easy as that, thanks
mmcgo002 is offline   Reply With Quote
Old 12-02-2011, 12:27 PM   #4
mmcgo002
Member
 
Location: Detroit, MI, USA

Join Date: Nov 2011
Posts: 10
Default

Ok now I'm having problems getting results from RNAseqmetrics. I'm using my BAM file as a input:

java -Xmx2g -jar CollectRNASeqMetrics.jar REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE VALIDATION_STRINGENCY=LENIENT CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics


But I keep getting an output that looks like this:

## net.sf.picard.metrics.StringHeader
# net.sf.picard.analysis.CollectRnaSeqMetrics REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics VALIDATION_STRINGENCY=LENIENT MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
## net.sf.picard.metrics.StringHeader
# Started on: Fri Dec 02 15:13:54 EST 2011

## METRICS CLASS net.sf.picard.analysis.RnaSeqMetrics
PF_BASES PF_ALIGNED_BASES RIBOSOMAL_BASES CODING_BASES UTR_BASES INTRONIC_BASES INTERGENIC_BASES IGNORED_READS CORRECT_STRAND_READS INCORRECT_STRAND_READS PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES PCT_USABLE_BASES PCT_CORRECT_STRAND_READS MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_TO_3PRIME_BIAS SAMPLE LIBRARY READ_GROUP
1565548600 1565442430 0 0 0 1565442430 0 0 0 0 0 0 1 0 0 0 0 0 0 0

I assume I'm using the refFlat format and have tried various variations to see if I'm doing anything wrong. I've also sorted the data etc... I'm positive I have mapped exons so why isn't it showing up??
mmcgo002 is offline   Reply With Quote
Old 12-02-2011, 12:38 PM   #5
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

My command


Code:
${JAVA_PATH} -Xmx2g -jar $HOME/local/bin/CollectRnaSeqMetrics.jar \
REF_FLAT=${REFFLAT_FILE} \
RIBOSOMAL_INTERVALS=${RIBOSOME_LIST} \
STRAND_SPECIFICITY=NONE \
REFERENCE_SEQUENCE=${REFERENCE_FILE} \
INPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}.bam \ OUTPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}_Picard_RNAseqMetrics.txt \
CHART_OUTPUT=${SAMPLE_NAME}_${GENOME_NAME}_${TOPHAT_TAG}_Picard_RNAseqMetrics.pdf \
VALIDATION_STRINGENCY=SILENT \
TMP_DIR=${TEMP_DIRECTORY}
Something seems off. You are missing a ribosomal_intervals and Reference_sequence. Not sure if that is the difference

Make sure the refflat file is formated correctly and the chromosome ids match your genome (ie. ensemble genome chromsome 1 = 1 vs UCSC chromosome 1 equals = chr1) as they must match
Code:
ENSG00000252921 ENST00000517112 18      -       23879078        23879219        23879219        23879219        1       23879078,       23879219,
ENSG00000207160 ENST00000384431 18      -       23946264        23946370        23946370        23946370        1       23946264,       23946370,
ENSG00000134504 ENST00000317932 18      -       24034873        24209206        24035706        24081199        5       24034873,24039583,24056478,24081035,24209110,   24035865,24039889,24056623,24081214,24209206,
ENSG00000134504 ENST00000417602 18      -       24034876        24128500        24035706        24128500        5       24034876,24039583,24056478,24081035,24126691,   24035865,24039889,24056623,24081214,24128500,
ENSG00000134504 ENST00000408011 18      -       24034876        24129399        24035706        24081199        5       24034876,24039583,24056478,24081035,24128854,   24035865,24039889,24056623,24081214,24129399,
ENSG00000252846 ENST00000517037 18      -       24166677        24166762        24166762        24166762        1       24166677,       24166762,
ENSG00000212367 ENST00000391065 18      -       24269280        24269493        24269493        24269493        1       24269280,       24269493,
ENSG00000171885 ENST00000383168 18      -       24432001        24445749        24436174        24445653        5       24432001,24440735,24441094,24442145,24445621,   24436453,24440816,24441259,24442560,24445749,
ENSG00000171885 ENST00000440832 18      -       24436121        24445679        24436174        24445653        6       24436121,24440735,24441094,24442145,24442340,24445621,  24436453,24440816,24441259,24442280,24442560,24445679,
ENSG00000171885 ENST00000383170 18      -       24436174        24445716        24436174        24445674        3       24436174,24442223,24445621,     24436444,24442560,24445716,
Jon_Keats is offline   Reply With Quote
Old 12-02-2011, 02:00 PM   #6
mmcgo002
Member
 
Location: Detroit, MI, USA

Join Date: Nov 2011
Posts: 10
Default

Great thanks--now I made sure it was in the correct format as well as including the reference sequence, but now it says:


Runtime.totalMemory()=2120679424
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
mmcgo002 is offline   Reply With Quote
Old 12-02-2011, 02:21 PM   #7
mmcgo002
Member
 
Location: Detroit, MI, USA

Join Date: Nov 2011
Posts: 10
Default

Increased memory--its working--thanks a lot!
mmcgo002 is offline   Reply With Quote
Old 02-03-2016, 07:50 AM   #8
bicodeuser
Junior Member
 
Location: Sweden

Join Date: Feb 2016
Posts: 4
Default CollectRNAMetrices-Error

I am getting very same error. Could you please let me know what could be the problem?


Quote:
Originally Posted by mmcgo002 View Post
Ok now I'm having problems getting results from RNAseqmetrics. I'm using my BAM file as a input:

java -Xmx2g -jar CollectRNASeqMetrics.jar REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE VALIDATION_STRINGENCY=LENIENT CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics


But I keep getting an output that looks like this:

## net.sf.picard.metrics.StringHeader
# net.sf.picard.analysis.CollectRnaSeqMetrics REF_FLAT=Monodelphis_refFlat.txt STRAND_SPECIFICITY=NONE CHART_OUTPUT=Monodelphis_chart.pdf INPUT=/Users/Papio/tophat_out_Monodelphis/accepted_hits.bam OUTPUT=Monodelphis_RNASeqMetrics VALIDATION_STRINGENCY=LENIENT MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
## net.sf.picard.metrics.StringHeader
# Started on: Fri Dec 02 15:13:54 EST 2011

## METRICS CLASS net.sf.picard.analysis.RnaSeqMetrics
PF_BASES PF_ALIGNED_BASES RIBOSOMAL_BASES CODING_BASES UTR_BASES INTRONIC_BASES INTERGENIC_BASES IGNORED_READS CORRECT_STRAND_READS INCORRECT_STRAND_READS PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES PCT_USABLE_BASES PCT_CORRECT_STRAND_READS MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_TO_3PRIME_BIAS SAMPLE LIBRARY READ_GROUP
1565548600 1565442430 0 0 0 1565442430 0 0 0 0 0 0 1 0 0 0 0 0 0 0

I assume I'm using the refFlat format and have tried various variations to see if I'm doing anything wrong. I've also sorted the data etc... I'm positive I have mapped exons so why isn't it showing up??
bicodeuser is offline   Reply With Quote
Old 02-03-2016, 07:58 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

Are the chromosome ID's matching across your files?
GenoMax is offline   Reply With Quote
Old 02-03-2016, 08:05 AM   #10
bicodeuser
Junior Member
 
Location: Sweden

Join Date: Feb 2016
Posts: 4
Default

I shall post my reflat file
Quote:
Ec-00_000010 Ec-00_000010 chr_00 - 149 6731 149 6731 10 149,897,1535,2091,2535,3474,4006,4702,6245,6709, 428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
Ec-00_000020 Ec-00_000020 chr_00 - 28572 29122 28572 29122 2 28572,28937, 28582,29122,
Ec-00_000030 Ec-00_000030 chr_00 + 29412 32214 29412 32214 1 29412, 32214,
Ec-00_000040 Ec-00_000040 chr_00 + 34287 34360 34287 34360 1 34287, 34360,
And my bam file looks like
Quote:
DHKW5DQ1:246:C1D7AACXX:1:1101:1480:2209 83 Ec-00_007150 1310 42 100M = 1274 -136 CAGTGAACAGTTTCATGTACCAGGAGGGCGGTAGCCAAGTACCTTGCCGTCAGCGTGTGAGAATTGCGGTCGGCGTCACAGNAGCAACCGTGTAGAAGGA @DDDCCDCDCCC>DEDDA>DCCDDDDDDBCCDDDDDEDDDCDDBDDDDDDDDFFHFHHHHIJJJIIJJIGIIJIJIGFFA3#[email protected] AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:81C18 YS:i:0 YT:Z:CP
DHKW5DQ1:246:C1D7AACXX:1:1101:1480:2209 163 Ec-00_007150 1274 42 100M = 1310 136 ATGAAGCCAACGCAGCGAGCAAAGTCAGCCCCATCGCAGTGAACAGTTTCATGTACCAGGAGGGCGGTAGCCAAGTACCTTGCCGTCAGCGTGTGAGAAT +=?DFFFFHHHHHIJIJJJJIIGIJGJJJJJJIJJJJJIIIJJIGIEHHHIIJHHHHHHFFFDDDDDDDDDDDDDCDDCDCCDDDDDDDDDDDDDCDCDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:-1 YT:Z:CP

And by Chromosome ids you mean the ones like Ec-00_007150? the transcript ids right? I so i think they are similar.
bicodeuser is offline   Reply With Quote
Old 02-03-2016, 08:13 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

Chromosome name is in the 3rd column, which in your case is chr_00 and does not match your bam file.
GenoMax is offline   Reply With Quote
Old 02-03-2016, 09:03 AM   #12
bicodeuser
Junior Member
 
Location: Sweden

Join Date: Feb 2016
Posts: 4
Default

I tried to change and now it looks like
Quote:
chr_00 Ec-00_000010 Ec-00_000010 - 149 6731 149 6731 10 149,897,1535,2091,2535,3474,4006,4702,6245,6709, 428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
chr_00 Ec-00_000020 Ec-00_000020 - 28572 29122 28572 29122 2 28572,28937, 28582,29122,
chr_00 Ec-00_000030 Ec-00_000030 + 29412 32214 29412 32214 1 29412, 32214,
chr_00 Ec-00_000040 Ec-00_000040 + 34287 34360 34287 34360 1 34287, 34360,
chr_00 Ec-00_000050 Ec-00_000050 - 36705 39329 36705 37902 3 36705,37422,39143, 36870,37944,39329,
chr_00 Ec-00_000060 Ec-00_000060 + 43007 44099 43007 44099 3 43007,43404,43829, 43046,43455,44099,
but i still get the same output.
I am sorry. I bitnew to this and couldnt figure out what is going wrong?
bicodeuser is offline   Reply With Quote
Old 02-03-2016, 10:58 AM   #13
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

I am not exactly sure what is going wrong. I just tested this on a file and got an expected result.

Does you bam file have a header? What does the output of this look like?

Code:
$ samtools view -H your_bam
GenoMax is offline   Reply With Quote
Old 02-03-2016, 11:43 PM   #14
bicodeuser
Junior Member
 
Location: Sweden

Join Date: Feb 2016
Posts: 4
Default

for that the output looks like
Quote:
@HD VN:1.0 SO:coordinate
@SQ SN:Ec-00_000010 LN:1971
@SQ SN:Ec-00_000020 LN:195
@SQ SN:Ec-00_000030 LN:2802
@SQ SN:Ec-00_000050 LN:871
@SQ SN:Ec-00_000060 LN:360
@SQ SN:Ec-00_000070 LN:964
@SQ SN:Ec-00_000080 LN:1908
@SQ SN:Ec-00_000090 LN:1366
@SQ SN:Ec-00_000100 LN:168
And this is the same id in my refflat file too
bicodeuser is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:33 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO