SEQanswers

Cufflinks/Cuffmerge/Cuffdiff error 'subsequence cannot be larger than 16338' (http://seqanswers.com/forums/showthread.php?t=50804)

super0925 03-05-2015 04:33 AM

Cufflinks/Cuffmerge/Cuffdiff error 'subsequence cannot be larger than 16338'
 
Hi all,
I'm doing DE analysis on cow samples.
I use the Ensembl UMD3.1 reference genome, which I thought was the latest version.
However, when I ran the Tophat2-Cuffdiff2 pipeline (default parameter settings), I still got this warning:

Warning: couldn't find fasta record for 'GJ058256.1'!
This contig will not be bias corrected.
Warning: couldn't find fasta record for 'GJ058424.1'!
......



(1) What does it mean? Is it a serious problem for my downstream analysis?

And when I ran Tophat2-Cufflinks/Cuffmerge-Cuffdiff2 (default parameter settings), I got this error:
Warning: couldn't find fasta record for 'GJ060129.1'!
This contig will not be bias corrected.
......

Error (GFaSeqGet): subsequence cannot be larger than 16338
Error getting subseq for TCONS_00062149 (1..16448)!


(2) Why do I get this result? How can I proceed with Tophat2-Cufflinks/Cuffmerge-Cuffdiff2?


Thank you!

GenoMax 03-05-2015 06:10 AM

I don't have the cow iGenomes set, but my guess is that "GJ058424.1" is in the GTF file but not in the genome sequence file. It appears to be the SH3YL1 gene now.

Someone else will need to comment on the other error.
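
The guess above (a contig named in the GTF but absent from the genome FASTA) can be checked directly. Below is a minimal Python sketch, assuming plain-text (uncompressed) files and that the sequence name is the first word after ">" in each FASTA header; the function names are made up for illustration and are not part of Cufflinks or iGenomes.

```python
def fasta_names(path):
    """Return the set of sequence names (first word after '>') in a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Return the set of values found in column 1 of a GTF file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            # Skip comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0].strip())
    return names


def missing_from_fasta(gtf_path, fasta_path):
    """Names used in the GTF that have no matching record in the FASTA."""
    return sorted(gtf_names(gtf_path) - fasta_names(fasta_path))
```

Calling `missing_from_fasta("genes.gtf", "genome.fa")` (file names assumed) should list every name that would trigger a "couldn't find fasta record" warning, if the guess is right.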

super0925 03-05-2015 07:40 AM

Quote:

Originally Posted by GenoMax (Post 161668)
I don't have the cow iGenomes set, but my guess is that "GJ058424.1" is in the GTF file but not in the genome sequence file. It appears to be the SH3YL1 gene now.

Someone else will need to comment on the other error.

Why do I get the warning in (1) and the error in (2)?
Is the warning in (1) critical for downstream analysis?
How can I solve it?
Cheers

super0925 05-12-2015 11:12 AM

Could anyone help?

The GTF file is genes.gtf from UMD3.1.

The first column of genes.gtf (I think it is the chromosome) contains these values:
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
23
24
25
26
27
28
29
3
4
5
6
7
8
9
GJ058256.1
GJ058424.1
GJ058425.1
GJ058430.1
GJ058433.1
GJ058437.1
GJ058729.1
GJ059463.1
GJ059486.1
GJ059509.1
GJ059556.1
GJ059670.1
GJ060027.1
GJ060032.1
GJ060118.1
GJ060120.1
GJ060129.1
MT
X

GenoMax 05-12-2015 11:43 AM

Did you get these files from iGenomes or is this something you put together by getting files (seq, annotation etc) from individual sources?

super0925 05-12-2015 01:50 PM

Quote:

Originally Posted by GenoMax (Post 172310)
Did you get these files from iGenomes or is this something you put together by getting files (seq, annotation etc) from individual sources?

I downloaded them from iGenomes...
Do you mean my files are abnormal?

GenoMax 05-12-2015 02:02 PM

Quote:

Originally Posted by super0925 (Post 172323)
I downloaded them from iGenomes...
Do you mean my files are abnormal?

No. One of the reasons to get this data from iGenomes is it has (supposedly) been checked for consistency so the kind of thing you have run into does not happen. It is possible that you may have downloaded a flawed version that has since been fixed (you could download a new copy and compare).

I hesitate to recommend that you get the sequences of the missing fasta records from NCBI and append them to your genome.fa file (you will likely need to re-index it afterwards). But this may get you past one of the errors.
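
If you do try the append route, here is a rough Python sketch of it. It assumes the missing records have already been downloaded into a separate FASTA file and that the first word of each header matches the name used in the GTF (NCBI headers may need editing first). It writes a new file instead of modifying genome.fa; the function names are hypothetical.

```python
import shutil


def record_name(header_line):
    """First word after '>' in a FASTA header line."""
    fields = header_line[1:].split()
    return fields[0] if fields else ""


def ends_with_newline(path):
    """True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end of the file
        if handle.tell() == 0:
            return True
        handle.seek(-1, 2)
        return handle.read(1) == b"\n"


def append_missing_records(genome_path, extra_path, out_path):
    """Copy genome_path to out_path, then append records from extra_path
    whose names are not already present. Returns the names that were added."""
    with open(genome_path) as handle:
        present = {record_name(l) for l in handle if l.startswith(">")}
    shutil.copyfile(genome_path, out_path)
    needs_newline = not ends_with_newline(out_path)
    added = []
    keep = False
    with open(extra_path) as extra, open(out_path, "a") as out:
        if needs_newline:
            out.write("\n")
        for line in extra:
            if line.startswith(">"):
                name = record_name(line)
                keep = name not in present  # skip records already in the genome
                if keep:
                    present.add(name)
                    added.append(name)
            if keep:
                out.write(line)
    return added
```

After this, any old index built from the original file (for example a genome.fa.fai next to it, or the Bowtie2 index used by Tophat2) no longer matches and would need to be rebuilt, e.g. with `samtools faidx` and `bowtie2-build`.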

I am not sure how much work you have put into this already but if the new download from iGenomes does have these sequences then you could use that genome.fa file.

As for your second error, this thread seems to have some options: https://www.biostars.org/p/57249/
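
One reading of the second error text is that a transcript (TCONS_00062149, coordinates 1..16448) asks for more bases than are available for its sequence (16338). That is only an interpretation of the message and is not confirmed in this thread. Under that assumption, the Python sketch below lists GTF features whose end coordinate lies past the end of the matching FASTA record; it could be run on the Cuffmerge output GTF. The function names are illustrative only.

```python
def fasta_lengths(path):
    """Map each sequence name (first word after '>') to its length in bases."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def features_past_end(gtf_path, lengths):
    """Return (seqname, start, end, seq_length) for every GTF row whose
    end coordinate exceeds the length of the named sequence."""
    problems = []
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 5:
                continue  # not a well-formed GTF row
            seqname, start, end = cols[0], int(cols[3]), int(cols[4])
            # Rows on sequences absent from the FASTA are ignored here.
            if seqname in lengths and end > lengths[seqname]:
                problems.append((seqname, start, end, lengths[seqname]))
    return problems
```

If this reports rows, the annotation and the genome sequence disagree about the length of that sequence, which would point to mismatched files or a stale index.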

karimhasanpur@yahoo.com 06-26-2015 01:35 AM

cufflinks warnings: could not find fasta records
 
Dear friends,

I am getting the same warnings. I downloaded the Galgal4 reference files from iGenomes. When running cufflinks, I get "warning: couldn't find fasta record for LGE64 ...". I think these are contigs that are present in genes.gtf but not in the genome.fasta. My question is: could these warnings affect my downstream analyses? If so, what should I do to resolve this problem?

Any comment would be appreciated.
Karim

