Hi all
I am doing an RNA-Seq analysis. I have finished the alignment with TopHat and have the accepted_hits.bam. Now I want to check whether a particular region is differentially expressed (we call it a 'pseudo gene'; it is not annotated in the reference annotation). How can I do this?
I think the best way is to add this pseudo gene to the reference GTF file, and then run HTSeq-count or Cuffdiff to look at the differential expression of this gene.
Here is what I have done so far: I added Chr21:100000-200000 as an 'exon' (the pseudo gene) to the Ensembl genes.gtf and renamed the GTF file to genes_hello.gtf.
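To make concrete what I mean by adding the region as an exon, here is a minimal Python sketch of the kind of line I am trying to produce. It assumes a GTF record is nine tab-separated columns whose last column carries at least `gene_id` and `transcript_id`. `make_gtf_exon_line` is a hypothetical helper of mine, not part of HTSeq or Cufflinks. The source label `custom` is made up, and the seqname `21` assumes the chromosome is named the Ensembl way rather than `Chr21`. The IDs are the placeholder ones I used for my added gene.

```python
def make_gtf_exon_line(seqname, start, end, strand, gene_id, transcript_id,
                       source="custom", gene_name=None):
    """Build one GTF 'exon' record as nine tab-separated columns.

    Hypothetical helper for illustration only; not part of any library.
    """
    if start > end:
        raise ValueError("start must be <= end (GTF is 1-based, inclusive)")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")

    # Attributes are 'key "value";' pairs separated by single spaces.
    attributes = [("gene_id", gene_id), ("transcript_id", transcript_id)]
    if gene_name is not None:
        attributes.append(("gene_name", gene_name))
    attribute_column = " ".join('{} "{}";'.format(k, v) for k, v in attributes)

    columns = [
        seqname,           # 1: must match the sequence name in the alignments
        source,            # 2: free-text source label (assumed value)
        "exon",            # 3: feature type
        str(start),        # 4: start
        str(end),          # 5: end
        ".",               # 6: score (unused)
        strand,            # 7: strand
        ".",               # 8: frame (unused for exons)
        attribute_column,  # 9: attributes
    ]
    # The columns are joined with real tab characters, not spaces.
    return "\t".join(columns)


line = make_gtf_exon_line("21", 100000, 200000, "+",
                          "ENSECAG00000099999", "ENSECAT00000099999",
                          gene_name="HELLO")
print(line)
```

The resulting line could then be appended to a copy of genes.gtf. Whether the seqname should be `21` or `Chr21` depends on how the chromosomes are named in the reference the reads were aligned to, which I have not double-checked here.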
However, when I run Cuffdiff directly, the output (i.e. gene_exp.diff) does not contain any information about my pseudo gene. And when I run HTSeq-count, it gives me this error:
[CODE]
$ htseq-count -s yes aligned_sorted.sam genes_hello.gtf > gene_count.txt
Error occured when processing GFF file (line 1 of file genes.gtf):
  need more than 1 value to unpack
  [Exception type: ValueError, raised in __init__.py:207]
[/CODE]
Here are the first lines of the modified GTF file (the line I added is the first one, marked in bold):
[CODE]
$ head genes_hello.gtf
[B]29 protein_coding exon 28820825 28834944 . + . exon_id "ENSECAE00000000001"; exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSECAG00000099999"; gene_name "HELLO"; gene_source "ensembl"; p_id "P99999"; transcript_id "ENSECAT00000099999"; transcript_name "HELLO-001"; transcript_source "ensembl"; tss_id "TSS9999";[/B]
1 protein_coding UTR 11193 11209 . + . gene_biotype "protein_coding"; gene_id "ENSECAG00000012421"; gene_name "SYCE1"; gene_source "ensembl"; p_id "P20975"; transcript_id "ENSECAT00000013004"; transcript_name "SYCE1-201"; transcript_source "ensembl"; tss_id "TSS1013";
1 protein_coding exon 11193 11261 . + . exon_id "ENSECAE00000079002"; exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSECAG00000012421"; gene_name "SYCE1"; gene_source "ensembl"; p_id "P20975"; transcript_id "ENSECAT00000013004"; transcript_name "SYCE1-201"; transcript_source "ensembl"; tss_id "TSS1013";
1 protein_coding transcript 11193 15975 . + . gene_biotype "protein_coding"; gene_id "ENSECAG00000012421"; gene_name "SYCE1"; gene_source "ensembl"; p_id "P20975"; transcript_id "ENSECAT00000013004"; transcript_name "SYCE1-201"; transcript_source "ensembl"; tss_id "TSS1013";
[/CODE]
It seems that I did not add the pseudo gene properly. Does anyone know how to do it?
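One check that might narrow this down, as a rough sketch: the "need more than 1 value to unpack" message is what Python 2 raises when a split returns a single item. That would be consistent with a line that has no tab characters, though I have not confirmed that this is the cause here. The hypothetical helper below (`find_malformed_gtf_lines`, my own name, not an HTSeq function) reports every non-comment line that does not have exactly nine tab-separated fields. The two example records are shortened versions of lines from my file, the first deliberately written with spaces only.

```python
def find_malformed_gtf_lines(lines):
    """Return (line_number, field_count) for records without 9 tab-separated fields.

    Hypothetical helper for illustration; blank lines and '#' comments are skipped.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        if not record or record.startswith("#"):
            continue
        field_count = len(record.split("\t"))
        if field_count != 9:
            problems.append((number, field_count))
    return problems


example = [
    # Space-separated record: looks fine on screen but contains no tabs.
    '29 protein_coding exon 28820825 28834944 . + . '
    'gene_id "ENSECAG00000099999"; transcript_id "ENSECAT00000099999";\n',
    # Properly tab-separated record.
    '1\tprotein_coding\texon\t11193\t11261\t.\t+\t.\t'
    'gene_id "ENSECAG00000012421"; transcript_id "ENSECAT00000013004";\n',
]
print(find_malformed_gtf_lines(example))  # [(1, 1)]
```

On the real file this could be run as `find_malformed_gtf_lines(open("genes_hello.gtf"))`. An empty list would mean the column layout is fine and the problem lies elsewhere.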
Thank you very much!