Hi!
I hope these questions aren't too stupid, as I'm pretty new to the bioinformatics field. If they have already been answered, please point me in the right direction, as I haven't found the answers.
The project I'm working on needs expression values for transcripts, genes, or proteins, and I am supposed to look into high-throughput sequencing data, mainly RNA-Seq in this case.
I have set up bowtie->tophat->cufflinks to extract expression values for transcripts: I put a FASTQ file in at one end and get a couple of output files out at the other.
I use the basic commands
Code:
$ tophat d_mel_genome_fb5_22/d_melanogaster_fb5_22 SRR034813.fastq
and
Code:
$ cufflinks ../tophat_out/accepted_hits.bam
1) Is this a good enough approach? Are there any flags that are commonly used to get better (more accurate) results? Is there anything I should think about in general?
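For reference, the two commands above could be chained in one small script, roughly like this. It is only a sketch of my setup: it assumes both steps run from the same directory (so the BAM path is tophat_out/accepted_hits.bam rather than ../tophat_out/...), and DRY_RUN and the run/pipeline helpers are names I made up for the sketch, not TopHat or Cufflinks options. By default it only prints the commands; setting DRY_RUN=0 would actually execute them.

```shell
#!/bin/sh
# Sketch: chain TopHat and Cufflinks for a single FASTQ file.
# INDEX and READS are the same values used in the commands above.
INDEX="d_mel_genome_fb5_22/d_melanogaster_fb5_22"
READS="SRR034813.fastq"

# run: print a command, and execute it only when DRY_RUN=0.
# DRY_RUN is a hypothetical helper variable of this sketch.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# pipeline: align reads with TopHat, then hand the alignments to Cufflinks.
pipeline() {
    # TopHat writes its results to ./tophat_out unless told otherwise.
    run tophat "$INDEX" "$READS" || return 1
    # Cufflinks reads the alignments and writes its output files
    # to the current directory.
    run cufflinks tophat_out/accepted_hits.bam
}

pipeline
```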
2) When I get my output files (genes.expr, transcripts.expr, transcripts.gtf), how do I map the cufflinks internal gene IDs to real IDs found in, for example, Ensembl or other more general sources?
3) When I find an already processed .bam file in the GEO database, for example, and try to run it through cufflinks, it spits out mainly errors. I assume there is more than one type of .bam format. How do I convert the file (if possible) to something cufflinks accepts?
4) Does cufflinks accept other types of processed files, like wiggle or .bed files, and convert them to expression values for transcripts? Are there other programs that do, or do these files simply not contain that kind of information? The main point is whether there is a way to skip a couple of steps and so lower the risk of making errors in the procedure.
Any help is appreciated, and I hope my newbieness doesn't shine through all that much. Thanks