cufflinks question

**KAP** · 12-22-2011, 04:36 PM

Hi James,

Your post caught my attention. Could you tell us a bit more about what you're working with?
- Where do your reads come from: RNA-seq or genomic DNA?
- Which version of Cufflinks are you using (since you're getting an update error)?
- What genome size do you expect? Is it very small? Eukaryotic?
- Do you have a reference annotation?
- Do you have a reference annotation?

I ask because most of your output looks quite odd to me, even before the segmentation fault, so I wondered whether Cufflinks might not be appropriate for what you're trying to do. Compare your map mass and number of loci processed with mine. Here is my output so you can see what I mean (although it's not from the most up-to-date version of Cufflinks either):

Code:

cufflinks: /usr/lib64/libz.so.1: no version information available (required by cufflinks)
You are using Cufflinks v1.1.0, which is the most recent release.
[15:21:00] Loading reference annotation.
[15:21:05] Inspecting reads and determining fragment length distribution.
> Processed 149777 loci.                       [*************************] 100%
> Map Properties:
>       Total Map Mass: 60566592.87
>       Read Type: 0bp single-end
>       Fragment Length Distribution: Truncated Gaussian (default)
>                     Default Mean: 200
>                  Default Std Dev: 80
[15:30:57] Assembling transcripts and estimating abundances.
> Processed 149777 loci.                       [*************************] 100%
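If you want to line up the two numbers KAP suggests comparing (total map mass and loci processed) across several runs, a small log parser can pull them out. This is a minimal sketch, assuming the log text looks like the v1.1.0 output above; `summarize_cufflinks_log` is a hypothetical helper, not part of Cufflinks.

```python
import re


def summarize_cufflinks_log(log_text):
    """Return (total_map_mass, loci_processed) parsed from Cufflinks log text.

    Hypothetical helper. Assumes lines like '> Total Map Mass: 60566592.87'
    and '> Processed 149777 loci.' as in the v1.1.0 log above.
    Returns None for any value that is not found.
    """
    mass_match = re.search(r"Total Map Mass:\s*([\d.]+)", log_text)
    loci_matches = re.findall(r"Processed\s+(\d+)\s+loci", log_text)
    map_mass = float(mass_match.group(1)) if mass_match else None
    # The loci line appears once per pass in the log; keep the last one.
    loci = int(loci_matches[-1]) if loci_matches else None
    return map_mass, loci


example_log = """\
> Processed 149777 loci.                       [*************************] 100%
> Map Properties:
>       Total Map Mass: 60566592.87
> Processed 149777 loci.                       [*************************] 100%
"""
print(summarize_cufflinks_log(example_log))
```

Running it on the excerpt above gives `(60566592.87, 149777)`; feeding in a log from a problematic run makes the difference easy to see.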

**jawalsh2** · 12-22-2011, 06:55 PM

Hi,

I have RNA seq data and version 1.2.0 of cufflinks.

I work with sugarcane, which has a large, complex genome. S. officinarum has a monoploid genome of about 930 Mbp, only slightly larger than that of Sorghum at 760 Mbp.

That said, I have two choices of reference: Sorghum, or a decent assembly I built from 42 million paired-end reads of S. officinarum, specifically LA Purple. Its N50 is around 950.
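For readers less familiar with the N50 figure quoted above, here is a minimal sketch of how it is usually computed: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The contig lengths in the example are made up for illustration and are not from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length


# Toy contig lengths, invented for illustration only.
print(n50([2000, 1500, 1000, 800, 500, 200]))
```

Here the total is 6000 bases; the two longest contigs already hold 3500 of them, so the N50 is 1500.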

I have not been able to get decent assemblies from any of my other sequenced sugarcanes. I expect this is due to the high ploidy: differences between the many copies of homologous chromosomes (haplotypes) pull contigs apart.

I am hoping to get around that by using Cufflinks to create contigs based on the alignment. I can get 70-80% of my reads aligned to a reference, so by building contigs of a sort from the alignment, I may be able to get a decent assembly, which would greatly aid my work.
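The idea of "contigs of a sort based on the alignment" can be illustrated with the simplest possible version: merge overlapping aligned-read intervals on one reference sequence into continuously covered islands. This is a minimal interval-merging sketch, not what Cufflinks does internally (Cufflinks assembles transcripts from spliced alignments); the `max_gap` parameter and the coordinates are hypothetical.

```python
def coverage_islands(read_intervals, max_gap=0):
    """Merge aligned read intervals into contiguous covered regions.

    read_intervals: iterable of (start, end) half-open reference coordinates
    for reads aligned to a single reference sequence.
    max_gap: hypothetical parameter; islands separated by at most max_gap
    uncovered bases are joined.
    """
    islands = []
    for start, end in sorted(read_intervals):
        if islands and start <= islands[-1][1] + max_gap:
            # Overlaps (or lies within max_gap of) the current island: extend it.
            islands[-1][1] = max(islands[-1][1], end)
        else:
            # Gap too large: start a new island.
            islands.append([start, end])
    return [tuple(island) for island in islands]


reads = [(100, 200), (150, 260), (400, 500), (255, 300), (505, 600)]
print(coverage_islands(reads))              # [(100, 300), (400, 500), (505, 600)]
print(coverage_islands(reads, max_gap=10))  # [(100, 300), (400, 600)]
```

Note that the sequence of such an island would still be read off the reference, which is exactly the limitation discussed in the reply below.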

Thanks for your help!

**KAP** · 12-23-2011, 10:15 AM

Hi again,

First of all, a couple more questions:

When you say you could not get good assemblies, do you mean using RNA-seq data or genomic data? If RNA-seq, do you think the poor assembly is due to alternative splicing isoforms?

I am still trying to get my head around what you're trying to do, but it does not seem to be the standard use of Cufflinks. Cufflinks is really only good for one thing: using read alignments to define where the genes lie on your reference chromosome or contig. It will help you work out which reads originate from the same gene when alternative splicing is occurring, and what the structure of that gene is, but that is about all. It outputs genomic coordinates, generally not contigs, and if you do get contigs, they will be based on the reference species' genomic sequence, with mismatches between your reads and the genome assumed to be errors. What type of information are you hoping to obtain? Gene expression? Gene structure? Something else?
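To make the point about coordinates versus contigs concrete: turning a transcript's exon coordinates back into sequence means slicing the reference, so any base where your reads disagree with the reference is simply not represented. A minimal sketch, assuming 1-based inclusive exon coordinates as in GTF; `transcript_sequence` and the toy reference are hypothetical.

```python
def transcript_sequence(reference, exons, strand="+"):
    """Build a transcript's sequence from reference exon coordinates.

    reference: genomic sequence of one chromosome/contig (a string).
    exons: list of (start, end) pairs, 1-based and inclusive as in GTF.
    Every base comes from the reference, so read-level differences are lost.
    """
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    # Join exons in genomic order, converting to 0-based Python slices.
    spliced = "".join(reference[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        # Minus-strand transcripts are reported as the reverse complement.
        spliced = spliced.translate(complement)[::-1]
    return spliced


ref = "AAAACCCCGGGGTTTT"
print(transcript_sequence(ref, [(1, 4), (9, 12)]))               # AAAAGGGG
print(transcript_sequence(ref, [(9, 12), (1, 4)], strand="-"))   # CCCCTTTT
```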

I just noticed your original post and wondered whether you should be using TopHat to align your reads anyway. TopHat chops your reads into smaller chunks, or segments (I think the default is ~25-40 bp; check the manual), and you can then specify how many mismatches to allow per segment. You should have no problem allowing as many mismatches as you want this way, and you can change the segment size too if you wish. TopHat will also produce alignments with gaps for introns, which in the long run will give the best output from Cufflinks.
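The per-segment mismatch idea can be sketched in a few lines: split a read into consecutive segments and check each one against its reference window separately, so the total number of tolerated mismatches grows with read length. This is a simplified illustration, not TopHat's actual algorithm; the segment length of 25 and the budget of 2 mismatches per segment are assumed values, and the function names are hypothetical.

```python
def split_into_segments(read, segment_length=25):
    """Cut a read into consecutive, non-overlapping segments.

    The last segment is shorter when len(read) is not a multiple of
    segment_length. (Simplified: TopHat's own splitting rules may differ.)
    """
    return [read[i:i + segment_length] for i in range(0, len(read), segment_length)]


def mismatches_per_segment(read, ref_window, segment_length=25):
    """Count mismatches in each segment of a read placed without gaps on ref_window."""
    if len(ref_window) < len(read):
        raise ValueError("reference window must be at least as long as the read")
    counts = []
    for seg_index, segment in enumerate(split_into_segments(read, segment_length)):
        offset = seg_index * segment_length
        ref_segment = ref_window[offset:offset + len(segment)]
        counts.append(sum(r != g for r, g in zip(segment, ref_segment)))
    return counts


def segments_pass(read, ref_window, max_mismatches=2, segment_length=25):
    """True if every segment stays within the (assumed) per-segment mismatch budget."""
    return all(c <= max_mismatches
               for c in mismatches_per_segment(read, ref_window, segment_length))


ref = "A" * 50
read = "AAAC" + "A" * 26 + "G" + "A" * 19  # mismatches at positions 3 and 30
print(mismatches_per_segment(read, ref))            # [1, 1]
print(segments_pass(read, ref, max_mismatches=0))   # False
```

With a 50 bp read split into two segments and a budget of 2 per segment, up to 4 mismatches in total can be tolerated, provided they are spread across both segments.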
