Hi,
I could finally run cuffmerge, but a colleague and I noticed that some regions are
being skipped with our human samples:
chr21:38435145-45760353 Warning: Skipping large bundle.
chr6:126102278-130463972 Warning: Skipping large bundle.
Looking at the genome browser, I see a lot of genes in these regions. Do I have to assume that cuffmerge has produced one single gene ID spanning the whole region, when in fact my reads are distributed over many smaller genes?
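To see how far over the limit these bundles are, the span of each skipped region can be computed from the warning lines. This is only a minimal sketch: the function name `skipped_bundle_lengths` is my own, and it assumes the warnings always have the `chrom:start-end Warning: Skipping large bundle.` form shown above.

```python
import re

# Matches lines such as "chr21:38435145-45760353 Warning: Skipping large bundle."
BUNDLE_RE = re.compile(r"^(\S+):(\d+)-(\d+)\s+Warning: Skipping large bundle")


def skipped_bundle_lengths(log_lines):
    """Return (chrom, start, end, span) for every skipped-bundle warning.

    The span is simply end - start, taken from the coordinates in the warning.
    """
    bundles = []
    for line in log_lines:
        match = BUNDLE_RE.match(line.strip())
        if match:
            chrom = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            bundles.append((chrom, start, end, end - start))
    return bundles


# The two warnings quoted in this post.
warnings = [
    "chr21:38435145-45760353 Warning: Skipping large bundle.",
    "chr6:126102278-130463972 Warning: Skipping large bundle.",
]

for chrom, start, end, span in skipped_bundle_lengths(warnings):
    print(f"{chrom}\t{start}\t{end}\t{span}")
```

Any --max-bundle-length value would have to be larger than the biggest span reported here for those bundles not to be skipped.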
I naively tried to fix this by increasing --max-bundle-length, adding the option to the cufflinks call in the cuffmerge python script:
Code:
def cufflinks(out_dir, sam_file, min_isoform_frac, gtf_file=None, extra_opts=["-q", "--overhang-tolerance", "200", "--library-type=transfrags", "-A","0.0", "--min-frags-per-transfrag", "0", "--no-5-extend", "--max-bundle-length", "9925208"], lsf=False, curr_queue=None):
But this resulted in the following error at the cuffcompare step:
Error: duplicate GFF ID 'ENST00000506472' encountered!
[FAILED]
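To track down where such a duplicate comes from, one could scan the GTF that is being compared for transcript IDs that occur in more than one place. The sketch below is only a diagnostic idea, not what cuffcompare does internally: I am assuming the error relates to the same transcript_id appearing on more than one chromosome or strand, the function name is made up, and the toy records are invented for illustration.

```python
import re
from collections import defaultdict

# Pulls the transcript_id value out of the GTF attribute column.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_in_multiple_places(gtf_lines):
    """Map transcript_id -> sorted (seqname, strand) pairs, for IDs seen in
    more than one chromosome/strand combination."""
    locations = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match:
            locations[match.group(1)].add((fields[0], fields[6]))
    return {tid: sorted(locs) for tid, locs in locations.items() if len(locs) > 1}


# Invented toy records: T1 appears on two chromosomes, T2 on only one.
toy_gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr2\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr3\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]

for tid, locs in transcripts_in_multiple_places(toy_gtf).items():
    print(tid, locs)
```

For a real file, the lines could be passed in with `open("annotation.gtf")`, where the file name is a placeholder.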
So I wonder:
First, is it a good idea to use the merge.gtf at all, given that you either skip whole chromosome regions or potentially get huge merged gene entries?
And second, how can I run the script with the --max-bundle-length option?
Thanks,
Marc