Hi all
When do RNA-Seq analysis, after I have finished the alignment by Tophat and get the accpeted_hits.bam. Now I want to looked at if it is has differential expression at the corresponding area (We called it 'pseduo gene' which haven't been annotated by reference genome ). How could I do it ?
I think the best way is to add this pseduo gene to reference genome GTF file. And then I could play HTSeq-count or Cuffdiff to look at the differential expression of this gene .
Here is the effort I made so far . I have add the Chr21:100000-200000 as 'exon' (a pseudo gene) to the genes.gtf of Ensembl, rename the GTF file.
However, if I run the Cuffdiff directly, the output (DE result) didn't contain any my pseudo gene information. And if I ran HTSeq-count , it gave me the error:
It seems that I didn't add the pseduo gene properly. Does anyone know how to do it?
Thank you very much!
When do RNA-Seq analysis, after I have finished the alignment by Tophat and get the accpeted_hits.bam. Now I want to looked at if it is has differential expression at the corresponding area (We called it 'pseduo gene' which haven't been annotated by reference genome ). How could I do it ?
I think the best way is to add this pseduo gene to reference genome GTF file. And then I could play HTSeq-count or Cuffdiff to look at the differential expression of this gene .
Here is the effort I made so far . I have add the Chr21:100000-200000 as 'exon' (a pseudo gene) to the genes.gtf of Ensembl, rename the GTF file.
Code:
$ head genes_hello.gtf [B]21 protein_coding exon 100000 200000 . + . exon_id "ENSECAE00000000001"; exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSECAG00000099999"; gene_name "HELLO"; gene_source "ensembl"; p_id "P99999"; transcript_id "ENSECAT00000099999"; transcript_name "HELLO-001"; transcript_source "ensembl"; tss_id "TSS9999";[/B] 1 protein_coding UTR 11193 11209 . + . gene_biotype "protein_coding"; gene_id "ENSECAG00000012421"; gene_name "SYCE1"; gene_source "ensembl"; p_id "P20975"; transcript_id "ENSECAT00000013004"; transcript_name "SYCE1-201"; transcript_source "ensembl"; tss_id "TSS1013"; 1 protein_coding exon 11193 11261 . + . exon_id "ENSECAE00000079002"; exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSECAG00000012421"; gene_name "SYCE1"; gene_source "ensembl"; p_id "P20975"; transcript_id "ENSECAT00000013004"; transcript_name "SYCE1-201"; transcript_source "ensembl"; tss_id "TSS1013";
Code:
$ htseq-count -s yes aligned_sorted.sam genes_hello.gtf>gene_count.txt Error occured when processing GFF file (line 1 of file genes.gtf): need more than 1 value to unpack [Exception type: ValueError, raised in __init__.py:207] Error occured when processing GFF file (line 1 of file genes.gtf): need more than 1 value to unpack [Exception type: ValueError, raised in __init__.py:207]
Thank you very much!
Comment