Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • dexseq-count.py DEXSeq gives no results

    Hello,

    I am running into a bit of a problem with dexseq-count.py from DEXSeq.

    dexseq-count.py. was installed and it runs OK with the sorted bam file and the gtf I provide (see below) but there are no counts in the output - not even the exons are annotated in the output. My output file looks like:


    Code:
    _ambiguous	0
    _ambiguous_readpair_position	290728
    _empty	12569972
    _lowaqual	0
    _notaligned	0

    My sequencing data is paired and the bam file has been sorted by name (samtools sort -n file.bam). This is how it looks like:

    Code:
    HS21_13853:1:1101:1000:39323#6	99	Schisto_mansoni.Chr_ZW	29878411	50	97M3243N3M	=	29878440	3372	NGCCGAACCAACAGATAATTTGAACTAATAATGGATCAAATGAATAACTCAAATCAGCTGTATCCGAATAATAATACTGAATTAGTGATTTTTTATTTTT	!4=DDDFFHHHHHJIJJJJJJJHHIIIIIJJJJJFHJJJJJJIIIIJJJJJJJJJJJJJJHIJJJJHHHHHHHHFFFFFCDCEEECCADDEDDDDDDEED	AS:i:-1	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:3	MD:Z:0T99	YT:Z:UU	XS:A:-	NH:i:1
    HS21_13853:1:1101:1000:39323#6	147	Schisto_mansoni.Chr_ZW	29878440	50	68M3243N32M	=	29878411	-3372	AATGGATCAAATGAATAACTCAAATCAGCTGTATCCGAATAATAATACTGAATTAGTGATTTTTTATTTTTTCTGCTTGACAAAGAATTTTTGGTATCCG	EDDDDDEEFEEFDDEDDCBAEEEE@EFFFFFFHHJJJJIHIHGJHIHJJJIGHIJJJJJJJJJJJJJJJJJIJJJJIJIIJJJJJJJHHHHHFFFFFCCC	AS:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:100	NM:i:0	XS:A:-	NH:i:1
    HS21_13853:1:1101:1000:44900#6	89	Schisto_mansoni.Chr_3	22069811	50	100M	*	0	0	TGTTGGTTTGGCGATGGAGAGAGTGGACTGTGAAGAGGTGTAGTTGTATCGAGAGATGTGAGCGAGCGAGTGTATTTTTGAATTGGAGTGTACTGACTAN	<<B@?<B=B;8?:@A?>>AAB@@;;;;@BBBBBFFECDEEEDCC84=C8F?@BF>IIEEIIGDDDBIIFGEFHFIIIIHIIHEEFG?FC?FFDDBDB:1!	AS:i:-1	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:99G0	YT:Z:UU	NH:i:1	XS:A:+
    HS21_13853:1:1101:1000:58791#6	99	Schisto_mansoni.Chr_ZW	51543719	50	19M34N81M	=	51543732	147	NTTCGAGATCCCGTTCCAGGTCTTACTCCAATTCTAGACAAAAGGGGGTCGAACCGTACTATCGTGGTCGCCCTAACGACCGCCGAGATGATTCACGAAG	!1=DDFFFHHHHHIJJIJJJEHIJIJJJJJJJJJJIIDHJJJJIJJJJHGIGHHFFCDDDEFDDCDDDDDDDDDDDDDDDDDDDB@BBDDCEEEDDDDBD	AS:i:-1	XM:i:1	XO:i:0	XG:i:0	MD:Z:0G99	NM:i:1	XS:A:+	NH:i:1
    HS21_13853:1:1101:1000:58791#6	147	Schisto_mansoni.Chr_ZW	51543732	50	6M34N94M	=	51543719	-147	TTCCAGGTCTTACTCCAATTCTAGACAAAAGGGGGTCGAACCGTACTATCGTGGTCGCCCTAACGACCGCCGAGATGATTCACGAAGGCGAATAAATAAT	:2BCCAAA>A?:A:C@ACA:>:@:8>DDB>DDB@?<7<?9B?:B@;<9;<?@B>>D?HHEAE;A@@IIIJIJJIJIIGGJIHJJJJJHHGHHFFFFFCCB	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:2	MD:Z:100	YT:Z:UU	XS:A:+	NH:i:1
    HS21_13853:1:1101:1001:7799#6	97	Schisto_mansoni.Chr_ZW	17759587	50	99M42N1M	=	17760095	608	NGTCTGACATTTTTTCAGTGCACCTCTCAGAGCATTTTCATTCATATGATAAAATTCTATTTCTTTCTCCAACTCGAAAACCTTTCTGTTTAGGTTTACC	!1=DDFFFHHHHHJJJJJIJJJJJJJJJJJJJJJJJJJIIIJIIIIJJIIJJJJJJIJJIJJJJJJJJJJJJJJJJJHHFFFFFEEEEEEEDDDDDDDDC	AS:i:-1	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0G99	YT:Z:UU	XS:A:-	NH:i:1
    HS21_13853:1:1101:1001:7799#6	145	Schisto_mansoni.Chr_ZW	17760095	50	100M	=	17759587	-608	GAGTATTAAACATCTCTAATAGGTATAAATTCTTGTCTCCAATCTGAGCAACTAAATCTCTGAGTTCATCATTTTCTTTACTGTTATTTTTCAACATTTG	CEEEEEEEFFFFFFFHHHHHHIIJJIJJIJJJJJIIIIIIHGEIIIIJIIJJJJIHFHFHIJJJJJJJJJJJJJIJJJIHGJJJJJJHHHFHFFFFFCCC	AS:i:-6	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:21C78	YT:Z:UU	NH:i:1	XS:A:-
    HS21_13853:1:1101:1001:25299#6	99	Schisto_mansoni.Chr_ZW	32388361	50	100M	=	32388529	266	NGTTTTTGTAATTTTGACTGGTGTTACCTCAATTATGGGTAAACGAAGATCTAGATTATTCAATGTTTGTTGTCTTTTTCTTTTGGTAACATAAGGTGAA	!1=DDFFFHHHHHJJJJJJJJIHHIJJJJJJJIJJJJJJFHIIJJIIIHJIIJJIJJJJJHIJJJIJJJHIJHEHHHHFFFFFFEDDEDDDDDDDDCDDC	AS:i:-1	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0T99	YT:Z:UU	NH:i:1	XS:A:-
    HS21_13853:1:1101:1001:25299#6	147	Schisto_mansoni.Chr_ZW	32388529	50	48M2I50M	=	32388361	-266	CATAGGAAAGAATAGAGAAAAATCAAAATATAGATCTCAAAATAAACAATATATATATATATATAGCATAATACTAGAATCTATCAAAAAGTTAAAACTG	FFDECEHHHEHHEIGJHJJIHE@IJJIHGJIHHHFHGIJJIHBJJIGJIGIJJJJJIIJJJJJJJIIIHJIJIIIJJJIHJJJIJJJHHHHHFFFFFB@B	AS:i:-11	XN:i:0	XM:i:0	XO:i:1	XG:i:2	NM:i:2	MD:Z:98	YT:Z:UU	NH:i:1	XS:A:-
    HS21_13853:1:1101:1001:48797#6	97	Schisto_mansoni.Chr_1.unplaced.SC_0094	893948	50	33M20753N60M1378N7M	=	916349	22533	NTTTCTTTCGGAATTTTCATATGTATCGATCGCCCTCTGCAACTGTTCCCGTAAGTCTTTTTCCTTCATTTCCAAATCATCTGTAGATAGATACTGTATA	!1=DDFFFHHHHHJJJIJJJJJJJJJJIJJJIIJJIJJJJJIIHJJJFJIIJIIJJIHIIJJIGHHHHHHFDFFFFECEEEEEDEDEEEDDEEDDEDCED	AS:i:-1	XM:i:1	XO:i:0	XG:i:0	MD:Z:0G99	NM:i:1	XS:A:-	NH:i:1
    My gtf file is costume made - maybe here's where the problem lies:

    Code:
    Schisto_mansoni.SC_0612	chado	exon	10730	10930	.	+	0	gene_id	"Smp_205760";
    Schisto_mansoni.SC_0762	chado	exon	5085	5118	.	-	1	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0762	chado	exon	5155	5342	.	-	0	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0762	chado	exon	5632	5868	.	-	0	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0762	chado	exon	5906	6023	.	-	1	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0762	chado	exon	6055	6168	.	-	1	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0762	chado	exon	6208	6285	.	-	1	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0762	chado	exon	6321	6430	.	-	0	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0762	chado	exon	6471	6479	.	-	0	gene_id	"Smp_186560";
    Schisto_mansoni.SC_0307	chado	exon	14907	15752	.	+	0	gene_id	"Smp_139870";
    Finally, the command I used to run dexseq_counts.py

    Code:
    python ~/bin/dexseq_count.py -p yes -s yes -f bam -r name in_annotation.gtf file.sorted.bam counts.out
    I've tried with switching the -s flag but that doesn't seem to make any difference. These same sample I have run before using HTseq_counts and it works a treat.

    Any help would be much appreciated.
    Thanks,
    Anna

  • #2
    Shouldn't the first field of the GTF file correspond to the seqnames?

    The third field of the BAM file contains the seqnames Schisto_mansoni.Chr_ZW, Schisto_mansoni.Chr_3, ...
    The first field of the GTF file contains the seqnames Schisto_mansoni.SC_0612, Schisto_mansoni.SC_0762, ...

    The seqnames do not seem to correspond.
    This would seem to be at least one apparent problem.

    You can check the proper format for a GTF file here.

    The format for a BAM file is defined here.
    Last edited by blancha; 07-02-2015, 10:06 AM. Reason: Added links describing the file formats

    Comment


    • #3
      Hi Blancha,

      Thanks for your comments.

      Agreed, the bam and gtf files do look different. But these are only the ten first lines of the files. Both the bam and the gtf have the information for the mapping and annotation for the whole genome. As far as know, the bam file needs to be sorted by read name and there is no specific order for the gtf.

      In addition, I have used the same bam and the same gtf (albeit slightly modified) to run HTseq-counts and it work fine.

      Cheers,

      Anna

      Comment


      • #4
        As expected, the problem was in the GFF file.

        It was probably me just not reading/interpreting the directions properly, but I suspect this could happen to other people as well.

        Be aware that:

        1) your GFF (input for dexseq_prepare_annotation.py) MUST contain a transcript_id tag in the last column, as well as a gene_id
        2) it is possible that gene_id and transcript_id are expected in that order and not the other way around.

        Your input GFF should look like:

        Code:
        Schisto_mansoni.SC_0612	chado	exon	10730	10930	.	+	0	gene_id	"Smp_205760";  transcript_id "Smp_205760.1";
        Schisto_mansoni.SC_0762	chado	exon	5085	5118	.	-	1	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0762	chado	exon	5155	5342	.	-	0	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0762	chado	exon	5632	5868	.	-	0	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0762	chado	exon	5906	6023	.	-	1	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0762	chado	exon	6055	6168	.	-	1	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0762	chado	exon	6208	6285	.	-	1	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0762	chado	exon	6321	6430	.	-	0	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0762	chado	exon	6471	6479	.	-	0	gene_id	"Smp_186560"; transcript_id "Smp_186560.1";
        Schisto_mansoni.SC_0307	chado	exon	14907	15752	.	+	0	gene_id	"Smp_139870"; transcript_id "Smp_139870.1";

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM
        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 06:37 PM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, Yesterday, 06:07 PM
        0 responses
        9 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-22-2024, 10:03 AM
        0 responses
        50 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-21-2024, 07:32 AM
        0 responses
        67 views
        0 likes
        Last Post seqadmin  
        Working...
        X