![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
transcripts missing from cufflinks | djm2007 | RNA Sequencing | 3 | 05-30-2013 12:39 AM |
Cufflinks Annotated Transcripts | magarolo | RNA Sequencing | 1 | 12-08-2011 02:34 AM |
Visualizing novel transcripts through cufflinks | adrian | Bioinformatics | 0 | 06-13-2011 09:07 AM |
transcripts by cufflinks and cuffdiff | mrfox | Bioinformatics | 1 | 11-22-2010 06:44 PM |
Need Reference Annotation for MCF-7 Cell Line Transcripts | AndrewCarpenter | Bioinformatics | 0 | 07-20-2010 07:52 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Jena Join Date: Nov 2011
Posts: 5
|
![]()
Hello everyone,
I have a little problem with using the cufflinks pipeline. My data consists of RNA Seq reads (one experiment single end and one experiment paired end) which have to be mapped against the human hg19 genome and finally assembled to transcripts. I started my experiments with the new tophat version (1.4.1) to map the reads. I tried it with an annotation file (Illumina package for cufflinks with ncbi annotation) and without one. There seem to be no problem with the mapping, so i started the next step using cufflinks with the same annotation file. And here is the problem. My output consists of two typs of transcripts, exons... . I have transcripts which have a name like in the annotation file, but an FPKM value 0 and transcripts with an cufflinks identifier, that have an FPKM > 0. I tried also using cuffcompare to map the annotation to the cufflinks transcripts but nothing happens. I have tried three different annotations: ensembl, refseq, ucsc; 2 different parameters for the annotation (-g and -G) and also use the -b option for bias correction. But I dont understand why cufflinks do not label my transcripts with one of the annotation names. Have anybody the same problem and/or any solution for me? Here are my commands Code:
cufflinks -p 4 -o ../cufflinks.out.deNovo/ -b ../../../human_genom/Illumina_paket/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa -g ../../../human_genom/Illumina_paket/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf accepted_hits.bam You are using Cufflinks v1.3.0, which is the most recent release. [16:48:35] Loading reference annotation. [16:48:41] Inspecting reads and determining fragment length distribution. > Processed 408173 loci. [*************************] 100% > Map Properties: > Total Map Mass: 141260632.27 > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 161.78 > Estimated Std Dev: 47.67 [17:55:29] Assembling transcripts and estimating abundances. > Processed 408173 loci. [*************************] 100% [18:51:09] Loading reference annotation and sequence. Warning: couldn't find fasta record for 'chr1'! This contig will not be bias corrected. Warning: couldn't find fasta record for 'chr10'! This contig will not be bias corrected. ... Warning: couldn't find fasta record for 'Un|NT_167236.1'! This contig will not be bias corrected. [18:51:53] Learning bias parameters. > Processed 66680 loci. [*************************] 100% [19:48:34] Re-estimating abundances with bias correction. > Processed 66680 loci. Code:
cufflinks -p 4 -b /Daten2/rna-seq/human_genom/Illumina_paket/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa -o cufflinks.out.ensembl -g ../../human_genom/Homo_sapiens_ENSEMBL.gtf tophat.out.sample1.reference/accepted_hits.bam Code:
chr feature start end score strand gene_id transcript_id FPKM frac conf_lo conf_hi cov full_read_support chr1 transcript 135727 136178 1000 . CUFF.3 CUFF.3.1 0.2633463896 1 0.104543 0.42215 1.920432 yes chr1 transcript 165727 169225 1 - CUFF.2 CUFF.2.1 0 0 0 0 0 yes chr1 transcript 167772 169225 1000 - CUFF.2 CUFF.2.2 0.9961441143 1 0.769102 1.223186 11 transcript 57219163 57219495 1 + ENSG00000222998 ENST00000411066 0 0 0 0 0 no 11 transcript 57174429 57194523 1 - ENSG00000134802 ENST00000395123 0 0 0 0 0 no 11 transcript 57174429 57194594 1 - ENSG00000134802 ENST00000395124 0 0 0 0 0 no |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
![]()
Psikon,
Your hg19 reference file has chromosome names in the format 'chr1' so that is the format they would be in the BAM file(s) generated by TopHat. Whereas the Ensembl GTF you showed in the example has chromosome names as just the numbers (e.g. '11'). Since the alignments in your BAM file(s) are to references named 'chrN' and the annotations in the GTF file are to references named just 'N' Cufflinks naturally won't be able to match them. You need to ensure that the chromosome names in the reference FASTA you use to map exactly match the names in the annotation file. This is typically best done by getting your reference and annotation from the same source. |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: Jena Join Date: Nov 2011
Posts: 5
|
![]()
Thank you for your reply. Is it also possible to convert the annotation from ensembl to the needs of cufflinks. The only thing I have to do is to convert the chromosome names from e.g. "11" to "chr11", isn't it?
|
![]() |
![]() |
![]() |
#4 |
Member
Location: Newcastle upon Tyne Join Date: Aug 2009
Posts: 18
|
![]()
Alternatively, I think you can go to cufflinks page to download hg19 annotion set and use the GTF file in the tar ball as the annotation.
http://cufflinks.cbcb.umd.edu/igenomes.html |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: Jena Join Date: Nov 2011
Posts: 5
|
![]()
Sorry for the long delay. I have tested your suggestion and my machine is not the fastest. Also my Datasets are very huge. I have now downloaded two Igenome packages. One from NCBI and one from Ensembl. For my analysis i need the ensembl annotation. But after inspecting the annotation file i recognized that the chromsome names are not in the format of cufflinks.
I have started an test run with with this command: Code:
cufflinks -p 6 -o ../cufflinks.out.ensembl -g /Daten2/rna-seq/human_genom/Illumina_paket/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf -b /Daten2/rna-seq/human_genom/Illumina_paket/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa accepted_hits-sorted.bam Take the labeling part at the end of the run or just in time? If it just in time the format of the annotation file is also wrong like the other ones. I need help please. I dont understand why there is no labeling or how i must change my annotation to label the transcripts. The problem also is, how is the command for changing the chromosome names in the annotation.gtf. I tried to change the chromosome names with R. Change the <name> to chr<name>. Is it enough for cufflinks or it there anything i forgot. The basis file for this manipulation was the iGenome annotation from ensembl. Last edited by Psikon; 03-28-2012 at 08:04 AM. Reason: Forgot something |
![]() |
![]() |
![]() |
#6 |
Junior Member
Location: Jena Join Date: Nov 2011
Posts: 5
|
![]()
Hello everybody,
Thank you for your advises. The annotation now works fine and i am really happy that i can go on with my analysis. Thank you very much. |
![]() |
![]() |
![]() |
#7 | |
Junior Member
Location: US Join Date: Mar 2013
Posts: 9
|
![]()
Hello,
I know this is an older topic but I had the exact same issue and this thread solved it. I wrote a very simple Perl script to convert GFF's which don't have "chr" in the chromosome number to those that do to resolve this. So no one has to duplicate my work, the code is attached below. Its not the most elegant solution but it worked for me! Quote:
|
|
![]() |
![]() |
![]() |
#8 |
Member
Location: Seoul, Korea Join Date: Feb 2013
Posts: 57
|
![]()
I got the same problem, I read your previous tags but just confused... I am using latest version of cufflinks 2.1.1 and all data from ensembl.
Warning: couldn't find fasta record for 'HSCHR9_3_CTG35'! This contig will not be bias corrected. if my .bam file is in different format then can I try converting in .sam. will it work or you have another solution... please do reply thanks |
![]() |
![]() |
![]() |
#9 | |
Junior Member
Location: US Join Date: Mar 2013
Posts: 9
|
![]() Quote:
What does your transcripts.gtf file look like? |
|
![]() |
![]() |
![]() |
#10 |
Member
Location: Seoul, Korea Join Date: Feb 2013
Posts: 57
|
![]()
Dear Helical
I am gettin that error for most of my genes, not one specific at cuffmerge and cuffdiff. |
![]() |
![]() |
![]() |
#11 |
Junior Member
Location: US Join Date: Mar 2013
Posts: 9
|
![]()
That is strange. When you run cufflinks, are you running it with -g (use reference transcripts and assemble new ones) or -G (only use reference transcripts)? I am no expert but I imagine this error could arise depending on how you mapped to the known transcripts.
|
![]() |
![]() |
![]() |
#12 | |
Member
Location: Seoul, Korea Join Date: Feb 2013
Posts: 57
|
![]() Quote:
these seems to be lincRNA or pseudo... and excluded from process by cufflinks... do you know something about what will happen if I just continue because these are just warnings not errors.....?? and what if I want to include these lincRNAs ?? please do reply |
|
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
Charitra used the -G option based on the log file provided in the other post.
|
![]() |
![]() |
![]() |
#14 |
Member
Location: Seoul, Korea Join Date: Feb 2013
Posts: 57
|
![]() |
![]() |
![]() |
![]() |
#15 |
Member
Location: Seoul, Korea Join Date: Feb 2013
Posts: 57
|
![]()
I got confusions again and posted here http://seqanswers.com/forums/showthr...818#post102818
is there anything to add large bundle ? I got he answer for my questions... cufflinks exclude the large bundle possibly pseudogene and the default value is good enough for dealing with human and mouse genome... and if so then these warnings which I have are good to exclude or ignore... and I don't have to include as a asked here. Thank you GenoMax and Helical for you very kind help in this great forum. However, I will put my q here again if my differentially expressed genes are not good enough .. Second thing is for me to know about sorting .bam file using sam tools.... is one command is enough to sort .bam files... samtools sort aln.bam aln.sorted or there are bunch of commands to do complete sorting...? Last edited by Charitra; 04-26-2013 at 06:42 PM. |
![]() |
![]() |
![]() |
#16 |
Senior Member
Location: NikoNarita.jp Join Date: Jul 2013
Posts: 142
|
![]()
Hi
may I get your views on this thread of my question: http://seqanswers.com/forums/showthr...098#post112098 |
![]() |
![]() |
![]() |
#17 | |
Senior Member
Location: NikoNarita.jp Join Date: Jul 2013
Posts: 142
|
![]()
Hi all
I am having the same problem... Warning: couldn't find fasta record for 'NT_166469'! This contig will not be bias corrected. Warning: couldn't find fasta record for 'X'!Warning: couldn't find fasta record for 'Y'! This contig will not be bias corrected. Initially, I used ensemble ref and got this warning and then I used UCSC genome.fa. Both gives me the same error. Why ?? Quote:
|
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|