Hi everyone
I used Cufflinks in the following workflow:
Cufflinks -> Cuffmerge -> Cuffdiff
In the end I got the gene_exp.diff file.
1. I extracted the significant genes from gene_exp.diff and then took the fourth column, the genomic locus, to make a list (list_significant).
2. I ran bedtools getfasta -fi ref_genome.fasta -bed cuffmerge_out/merged.gtf -fo pool.fa to extract the sequences corresponding to the annotation in merged.gtf.
3. I used a Perl script to extract the sequences in list_significant from pool.fa.
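For reference, step 1 can be sketched roughly as below. This is only a minimal sketch, assuming gene_exp.diff is tab-separated with a header row that contains the columns named locus and significant, and that significant rows are flagged "yes". The function name significant_loci and the two data rows are made up for illustration.

```python
import csv
import io


def significant_loci(diff_file):
    """Return the locus of every row whose 'significant' column is 'yes'."""
    reader = csv.DictReader(diff_file, delimiter="\t")
    return [row["locus"] for row in reader if row["significant"] == "yes"]


# Made-up rows in the assumed gene_exp.diff layout (values are illustrative only).
example = (
    "test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\t"
    "log2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n"
    "XLOC_000001\tXLOC_000001\tgene1\tchr1:1-5000\tq1\tq2\tOK\t10\t40\t2\t3.1\t0.0001\t0.001\tyes\n"
    "XLOC_000002\tXLOC_000002\tgene2\tchr1:7000-7500\tq1\tq2\tOK\t10\t11\t0.14\t0.2\t0.8\t0.9\tno\n"
)

list_significant = significant_loci(io.StringIO(example))
print(list_significant)
```

With a real file, the io.StringIO object would be replaced by open("gene_exp.diff", newline="").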
Then I ran into some problems:
1. The annotation files, either transcripts.gtf or merged.gtf, annotate genes exon by exon. For example, gene1 exon1 chr1:1-500;
gene1 exon2 chr1:2000-5000.
But in the gene_exp.diff file, each gene appears only once, with a locus that spans all of its exons, for example, gene1 chr1:1-5000. All genes in gene_exp.diff with only one exon are extracted. But for genes with multiple exons, like gene1 above, the locus does not match any single exon record in pool.fa, so I can't get them. How can I extract the sequences of genes with multiple exons?
2. In my case, I have 3000 differentially expressed genes. 2800 of them are single-exon genes and the other 200 are multi-exon genes. Can anyone explain this result? It looks weird.
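To make the mismatch in problem 1 concrete, here is a small sketch using the gene1 example above. It is only an illustration under assumptions: the exon records are hand-written tuples rather than parsed from merged.gtf, gene2 is a made-up single-exon gene, the helper gene_spans is hypothetical, and any differences in coordinate conventions between the tools are ignored.

```python
def gene_spans(exons):
    """Collapse (gene, chrom, start, end) exon records into one span per gene."""
    spans = {}
    for gene, chrom, start, end in exons:
        if gene in spans:
            _, lo, hi = spans[gene]
            spans[gene] = (chrom, min(lo, start), max(hi, end))
        else:
            spans[gene] = (chrom, start, end)
    return {g: f"{c}:{s}-{e}" for g, (c, s, e) in spans.items()}


# gene1 is the two-exon example from the post; gene2 is a made-up single-exon gene.
exons = [
    ("gene1", "chr1", 1, 500),
    ("gene1", "chr1", 2000, 5000),
    ("gene2", "chr1", 7000, 7500),
]

# Loci as they would appear per exon (one record per exon, as in pool.fa).
exon_loci = {f"{c}:{s}-{e}" for _, c, s, e in exons}

# Loci as they would appear per gene (one span per gene, as in gene_exp.diff).
spans = gene_spans(exons)

# A gene-level locus only matches an exon-level record for single-exon genes.
matched = {gene: locus in exon_loci for gene, locus in spans.items()}
print(spans)
print(matched)
```

In this toy case only gene2 is matched, which is the same pattern I see: the lookup by locus can only ever succeed for single-exon genes.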
Thanks