
  • Picard Tools "loaded 0 genes"

    I am testing the CollectRnaSeqMetrics script in Picard Tools for the first time and received the following error message:


    INFO Date Time CollectRnaSeqMetrics Loaded 0 genes.

    Then the output begins to show “Processed 1,000,000 records” and so on. The file created at the end of the process includes numbers for PF_Bases, PF_Aligned_Bases, and Intergenic_bases but all other fields (Ribosomal_bases, coding_bases, pct_UTR_bases, etc) contain 0’s.

    What am I doing wrong? All help is appreciated, thanks.

  • #2
    Likely the refFlat file and/or the ribosomal interval file is not formatted correctly. Which genome are you aligning against?

    refFlat example
    Code:
    #geneName	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds
    WASH7P	NR_024540	chr1	-	14361	29370	29370	29370	11	14361,14969,15795,16606,16857,17232,17605,17914,18267,24737,29320,	14829,15038,15947,16765,17055,17368,17742,18061,18366,24891,29370,
    FAM138A	NR_026818	chr1	-	34610	36081	36081	36081	3	34610,35276,35720,	35174,35481,36081,
    FAM138F	NR_026820	chr1	-	34610	36081	36081	36081	3	34610,35276,35720,	35174,35481,36081,
    OR4F5	NM_001005484	chr1	+	69090	70008	69090	70008	1	69090,	70008,
    LOC729737	NR_039983	chr1	-	136697	140566	140566	140566	5	136697,136805,136952,139789,140074,	136756,136854,139696,139847,140566,
    LOC100132287	NR_028322	chr1	+	323891	328581	328581	328581	3	323891,324287,324438,	324060,324345,328581,
    LOC100133331	NR_028327	chr1	+	323891	328581	328581	328581	4	323891,324287,324438,327035,	324060,324345,326938,328581,
    LOC100132062	NR_028325	chr1	+	323891	328581	328581	328581	3	323891,324287,324438,	324060,324345,328581,
    OR4F3	NM_001005224	chr1	+	367658	368597	367658	368597	1	367658,	368597,
    OR4F16	NM_001005277	chr1	+	367658	368597	367658	368597	1	367658,	368597,
    OR4F29	NM_001005221	chr1	+	367658	368597	367658	368597	1	367658,	368597,
    OR4F16	NM_001005277	chr1	-	621095	622034	621095	622034	1	621095,	622034,
    Ribosomal interval file
    Code:
    @SQ	SN:chr1	LN:249250621
    @SQ	SN:chr10	LN:135534747
    @SQ	SN:chr11	LN:135006516
    @SQ	SN:chr12	LN:133851895
    @SQ	SN:chr13	LN:115169878
    @SQ	SN:chr14	LN:107349540
    @SQ	SN:chr15	LN:102531392
    @SQ	SN:chr16	LN:90354753
    @SQ	SN:chr17	LN:81195210
    @SQ	SN:chr18	LN:78077248
    @SQ	SN:chr19	LN:59128983
    @SQ	SN:chr2	LN:243199373
    @SQ	SN:chr20	LN:63025520
    @SQ	SN:chr21	LN:48129895
    @SQ	SN:chr22	LN:51304566
    @SQ	SN:chr3	LN:198022430
    @SQ	SN:chr4	LN:191154276
    @SQ	SN:chr5	LN:180915260
    @SQ	SN:chr6	LN:171115067
    @SQ	SN:chr7	LN:159138663
    @SQ	SN:chr8	LN:146364022
    @SQ	SN:chr9	LN:141213431
    @SQ	SN:chrM	LN:16571
    @SQ	SN:chrX	LN:155270560
    @SQ	SN:chrY	LN:59373566
    chr18	9844714	9844843	-	ENSG00000252680
    chr18	9923917	9924018	+	ENSG00000223138
    chr18	19508946	19509055	-	ENSG00000222520
    chr18	21191674	21191827	+	ENSG00000240442
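
One way to test whether the two files above agree with a BAM file is to compare chromosome names directly. The following is a minimal Python sketch, not part of Picard: the function names are hypothetical, and it assumes the header text comes from something like `samtools view -H` and that the refFlat file is tab-separated with the chromosome in the third column, as in the example above. The sample header in the demo is made up for illustration.

```python
def sq_names(header_text):
    """Return the set of SN: values found on @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        fields = line.strip().split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def refflat_chroms(refflat_text):
    """Return the set of chromosome names (third column) in refFlat text."""
    chroms = set()
    for line in refflat_text.splitlines():
        # Skip blank lines and a commented header such as "#geneName ...".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            chroms.add(fields[2])
    return chroms


def unmatched_chroms(header_text, refflat_text):
    """Chromosome names used in the refFlat text but absent from the header."""
    return sorted(refflat_chroms(refflat_text) - sq_names(header_text))


if __name__ == "__main__":
    # Hypothetical BAM header that names chromosomes without the "chr" prefix.
    bam_header = "@SQ\tSN:1\tLN:249250621\n"
    refflat = "WASH7P\tNR_024540\tchr1\t-\t14361\t29370\n"
    print(unmatched_chroms(bam_header, refflat))
```

If the returned list is non-empty, the annotation uses names the alignment does not, which would be consistent with the "Loaded 0 genes" message.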



    • #3
      Thanks!

      Hi Jon,

      You are correct: formatting is the problem. I am aligning to hg19, but the BAM file does not include the "chr" in "chr1". Any suggestions on how to change the formatting of an existing file, or how to recreate the file with the proper formatting?

      Thanks!



      • #4
        If the BAM file does not contain "chr" in the chromosome column, then you didn't align to the hg19 downloaded from UCSC. Sometimes people get the reference file from, say, Ensembl and then rename it hg19 because they don't like the long description used. This is bad practice, as there are subtle differences between the two versions, such as chr1 vs 1 and chrM vs MT. If you want, run "samtools view -H YOURBAMFILE.bam" and post the output or PM me; if I have the matching genome, I'll send you the required files that should work.
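
For readers who only need the annotation to use the same names as the BAM file, the chr1 vs 1 and chrM vs MT differences mentioned above could be handled with a small rename script. This is a rough sketch under stated assumptions, not a Picard feature: the function names are hypothetical, the refFlat file is assumed to be tab-separated with the chromosome in the third column, and the rename is only meaningful if both files really describe the same assembly. It does not address sequence-level differences between references, and names of unplaced contigs may not map by simply dropping the prefix.

```python
import sys


def to_unprefixed(chrom):
    """Map a UCSC-style name (chr1, chrM) to the unprefixed style (1, MT)."""
    if chrom == "chrM":
        return "MT"
    if chrom.startswith("chr"):
        return chrom[3:]
    return chrom


def rename_refflat_line(line):
    """Rewrite the chrom column (third field) of one tab-separated refFlat line."""
    if line.startswith("#"):
        return line  # leave a commented header line untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return line  # not a refFlat record; pass through unchanged
    fields[2] = to_unprefixed(fields[2])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


if __name__ == "__main__":
    # Reads refFlat text on stdin and writes the renamed text to stdout.
    for line in sys.stdin:
        sys.stdout.write(rename_refflat_line(line))
```

The ribosomal interval file would need the same treatment for both its @SQ lines and its interval rows, and the result should be checked against the output of "samtools view -H" before rerunning CollectRnaSeqMetrics.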
