**pardonliang** · 02-29-2012, 10:47 PM

Hi gfmgfm,
I have the same problem. So far I haven't found an answer. Can you tell me how you dealt with it? Thank you!

**sklages** · 03-01-2012, 01:53 AM

You might want to have a look at cd-hit (http://code.google.com/p/cdhit/) to remove some redundancy.

**pardonliang** · 03-01-2012, 11:15 PM

Thank you for your advice.

**pardonliang** · 03-01-2012, 11:16 PM

TGICL for denovo transcriptome

Recently I used Trinity for de novo assembly of a transcriptome from 25,804,627 paired reads of 90 bp. Trinity gave me 67,683 ESTs. But when I clustered them with TGICL using default parameters, I got only 205 clusters and 67,247 singletons. So few ESTs were clustered that I think something is wrong, but I can't work out what. Can someone give me some advice?

**arvid** · 03-01-2012, 11:46 PM

Trinity is already geared towards low redundancy, that's why you won't gain much by clustering the contigs with cd-hit-est afterwards - the numbers you gave sound reasonable.
67 k transcript contigs sounds reasonable for a sample from a heterozygous eukaryote. How many of your reads multi-map? Did you try to discard low-support contigs by checking the RSEM support (see the Trinity website for details)?

**pardonliang** · 03-08-2012, 09:34 PM

Thanks for your advice. I mapped 25,668,103 paired reads to the reference that was clustered by TGICL. 20,387,812 (79.43%) paired reads and 3,690,778 (7.19%) single reads mapped to the reference. Do you think this result is reasonable? I also have a question: why should I discard low-support contigs? Do low-support contigs affect the gene expression analysis?

**arvid** · 03-08-2012, 11:50 PM

If I understand your numbers, you're saying that ~80 % of your paired reads map (as pairs), and ~7 % map as singles. If so, I think that is very reasonable. You might want to try to scaffold your contigs with the 7 % of the reads that only map as singles; however, you risk getting more chimeras, and I guess the benefit for expression analysis is marginal.
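As a quick check of the percentages quoted above, here is a small sketch. It assumes (this is not stated in the thread) that the pair percentage is relative to the number of read pairs, while the single percentage is relative to the total number of individual reads, i.e. twice the number of pairs. The variable names are made up for illustration.

```python
# Counts quoted in the thread.
total_pairs = 25_668_103      # read pairs used for mapping
mapped_pairs = 20_387_812     # pairs reported as mapped
mapped_singles = 3_690_778    # reads reported as mapped without their mate

# Assumption: pairs are reported relative to all pairs,
# singles relative to all individual reads (2 reads per pair).
pair_rate = 100 * mapped_pairs / total_pairs
single_rate = 100 * mapped_singles / (2 * total_pairs)

# Share of all individual reads that mapped in either way.
overall_rate = 100 * (2 * mapped_pairs + mapped_singles) / (2 * total_pairs)

print(f"pairs:   {pair_rate:.2f}%")
print(f"singles: {single_rate:.2f}%")
print(f"overall: {overall_rate:.2f}%")
```

Under that assumption both quoted percentages (79.43% and 7.19%) are reproduced, which suggests the two figures use different denominators.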

If you have contigs with low read support, they shouldn't interfere with the expression analysis, but they will slow it down (and slow down other downstream analyses). I usually discard contigs to which RSEM assigns no reads.
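A minimal sketch of that filtering step, assuming an RSEM `isoforms.results` table (tab-separated, with `transcript_id` and `expected_count` columns) and a FASTA file whose header IDs match the transcript IDs. The function names and the `min_count` parameter are illustrative only; they are not part of RSEM or Trinity.

```python
import csv
import io


def supported_ids(rsem_table, min_count=0.0):
    """Return transcript IDs whose RSEM expected_count exceeds min_count.

    rsem_table is an open text handle on an RSEM isoforms.results file.
    """
    reader = csv.DictReader(rsem_table, delimiter="\t")
    return {
        row["transcript_id"]
        for row in reader
        if float(row["expected_count"]) > min_count
    }


def filter_fasta(fasta_lines, keep_ids):
    """Yield FASTA lines of records whose ID (first word of the header) is kept."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line


# Tiny made-up example; real input would be files opened with open().
rsem_text = (
    "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
    "c1_seq1\tc1\t500\t400.0\t12.00\t30.0\t25.0\t100.00\n"
    "c2_seq1\tc2\t300\t200.0\t0.00\t0.0\t0.0\t0.00\n"
)
fasta_text = ">c1_seq1 len=500\nACGT\n>c2_seq1 len=300\nTTTT\n"

ids = supported_ids(io.StringIO(rsem_text))
kept = list(filter_fasta(io.StringIO(fasta_text), ids))
print("".join(kept), end="")
```

Raising `min_count` above zero would also drop contigs with only a handful of reads, but where to set that cut-off is a judgment call.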

**pardonliang** · 03-15-2012, 08:18 AM

cluster of different transcriptome

Thanks, arvid. But I have another question. I have sequenced two transcriptomes of the same species, one susceptible and one resistant, and I used Trinity to assemble the transcriptome of each of the two species separately. I want to find the differentially expressed transcripts between the susceptible and resistant species. I used clustering software to cluster the two assemblies so that reads could be mapped to the same reference, but very few of the assembled ESTs could be clustered. So I used the assembled contigs of the susceptible and resistant species as references for the differential expression analysis. But I found an interesting result. When I used the assembled contigs of the susceptible species as the reference, there were 211 up expressed transcripts. But for the resistant species, there were 3,021 up expressed transcripts. I don't know what I can do to get the correct up expressed transcripts.

**arvid** · 03-15-2012, 11:59 PM

Now it is much clearer what you are trying to achieve - previously you didn't mention that you have samples from genetically diverging material (if you are talking about the same experiment). IMHO "correct up expressed transcripts" in that context will be very difficult to define, unless you have useful prior information from both strains/ecotypes/species (your information here is not clear, first you say two of the same species, then you say resistant and susceptible species).
If your strains/ecotypes/species are very closely related (only small indels and SNPs) you might be fine using one of the transcriptomes as reference like you did, provided that your alignment allows for such sequence variation (still, I would carefully study the alignments from both samples on transcripts with DE calls). If they are not closely related this changes everything, please let us know.
The number of differentially expressed transcripts is not relevant to the interest of the experiment and can not be judged unless you give an exact description on how you came up with that number: the software and statistics used, the amount and type of replicates, the biological system you are working on, and the way the samples were collected. On a genome-wide level, it is easy to find ~3000 differentially expressed transcripts - the more important question is to find out which of them are actually differentially expressed due to biological reasons and are interesting to study further. That might be all, 1000, 100, 10, 1 or none of them.

**pardonliang** · 03-18-2012, 08:17 AM

I'm sorry for my unclear description

I'm sorry for my unclear description. For example, I have two Drosophila melanogaster species. The susceptible species has been bred in the lab without insecticide for several years. The resistant species survived selection with a high concentration of insecticide. Both were sequenced with Solexa. I want to know the up expressed transcripts. I used SOAPaligner to map the reads to the reference and calculated Unigene expression with the RPKM method. The formula is shown below.
RPKM = 1,000,000 * C / (N * L / 1,000)
Here RPKM is the expression level of Unigene A, C is the number of reads that align uniquely to Unigene A, N is the total number of reads that align uniquely to all Unigenes, and L is the number of bases in the CDS of Unigene A. The RPKM method removes the influence of gene length and sequencing depth on the calculated expression, so the values can be compared directly between samples.
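The formula above can be sketched in a few lines of Python. The function name and the example numbers are made up for illustration; only the formula itself comes from the post.

```python
def rpkm(c, n, l):
    """RPKM = 1,000,000 * C / (N * L / 1,000).

    c: reads uniquely aligned to the Unigene
    n: total reads uniquely aligned to all Unigenes
    l: length of the Unigene CDS in bases
    """
    if n <= 0 or l <= 0:
        raise ValueError("N and L must be positive")
    return 1_000_000 * c / (n * l / 1_000)


# Made-up numbers: 500 reads on a 2,000-base Unigene, 20 million mapped reads.
print(rpkm(c=500, n=20_000_000, l=2_000))  # 12.5

# Same read count on a Unigene twice as long gives half the RPKM.
print(rpkm(c=500, n=20_000_000, l=4_000))  # 6.25
```

The second call shows the length normalization: the same number of reads spread over a longer sequence yields a proportionally lower value.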

I used the susceptible assembly, the resistant assembly, and the combined susceptible and resistant assembly as references, and compared the number of reads mapped as pairs, the number of reads mapped as singles, and the number of up expressed transcripts. The result of the comparison is shown below.

Based on this result, I want to use the combined susceptible and resistant assembly as the reference to obtain the up and down expressed transcripts, because it gives the largest number of mapped reads and of differentially expressed transcripts. I don't know whether this choice is a problem. Thanks, arvid.

**arvid** · 03-19-2012, 12:21 AM

I'm no fly researcher, so I can't judge your choices based on the information you gave above. How divergent are your Drosophila melanogaster strains from the publicly available sequence references at FlyBase? I guess you would get further in resolving common transcripts with a genome reference-based mapping and assembly approach (e.g. TopHat-Cufflinks). In your case I would definitely try that in addition to the de novo assembly approach you did, or at least map the assembled transcripts to the reference genome.
I just see an image with a :-( smiley and something written in Chinese (I guess this is an error from an image host?), and since you didn't say how you compared your numbers or how you replicated your samples (or if you used any statistics), there is no way to tell whether your comparison method is sound.
In any case, you need to examine your candidate differentially expressed transcripts for sequence variants between the samples!

**pardonliang** · 03-19-2012, 12:43 AM

Thank you for your advice.

Thank you, arvid, for your advice. I have uploaded the picture again and hope you can see it. Because the data have not yet been submitted to a journal, I just used Drosophila melanogaster as an example. The species I actually study is an agricultural insect that has no reference genome. Thank you so much for your valuable advice.

**arvid** · 03-19-2012, 12:45 AM

There is still no image. And please, please stop providing incorrect information about your experiment - just say up front that you can't talk about the details. I guess the best thing you can do at the moment is to assemble the transcriptomes together and analyze the differential expression of the transcripts that have good read support from both samples, but bear in mind that your analysis can't be really quantitative.
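One way to read "good read support from both samples" is a minimum read count per transcript in each sample. The sketch below assumes per-transcript read counts are already available as dictionaries; the threshold of 10 reads and all names are arbitrary placeholders, not recommended values.

```python
def well_supported(counts_a, counts_b, min_reads=10):
    """Return sorted transcript IDs with at least min_reads in both samples."""
    shared = counts_a.keys() & counts_b.keys()
    return sorted(
        tid
        for tid in shared
        if counts_a[tid] >= min_reads and counts_b[tid] >= min_reads
    )


# Made-up read counts per transcript of a joint assembly.
susceptible = {"t1": 120, "t2": 3, "t3": 45}
resistant = {"t1": 98, "t2": 250, "t4": 60}

print(well_supported(susceptible, resistant))  # ['t1']
```

Transcripts seen in only one sample, or barely covered in one of them, are excluded, so the remaining comparison is restricted to transcripts that both samples support.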

**pardonliang** · 03-20-2012, 04:22 PM

I'm so sorry for my behavior.
I only wanted to understand what happened with my data. I'm sorry that the link to my image was unreachable, and I apologize to arvid. I didn't want to waste your time and your patience.
I used the susceptible assembly, the resistant assembly, and the combined susceptible and resistant assembly as references to compare the number of reads mapped as pairs, the number of reads mapped as singles, and the number of up expressed transcripts. The result is stored in the attached file.
Regarding your questions (the software and statistics used, the amount and type of replicates, the biological system you are working on, and the way the samples were collected): I have not used statistics or replicates yet. The susceptible species has been bred in the lab without insecticide for ten years. The resistant species was collected in the field and selected with a high concentration of one insecticide. Finally, the susceptible insects and the insects that survived the insecticide selection, both collected at the same growth stage, were sequenced.
Thanks so much for arvid's advice.

Attached Files

22.jpg (91.9 KB, 50 views)
