
**flxlex** · 07-14-2010, 11:55 PM

Isotigs are transcripts, built out of the contigs. Different isogroups within the same isogroup represent alternative splice variants. This makes the isogroup the equivalent of a gene.

Take this with a grain of salt, though: it is based on mining the contig graph for subgraphs (isogroups) and traversing all possible paths through each subgraph (isotigs). We find, for example, that small variations (SNPs, indels) generate almost identical isotigs. So perhaps clustering the isotigs with CD-HIT would help.
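As a conceptual sketch only (this is not how Newbler is implemented), the subgraph idea can be illustrated by treating contigs as nodes and the links between them as edges: each connected component then plays the role of an isogroup. The contig names and links are made up.

```python
from collections import defaultdict


def isogroups(edges, contigs):
    """Group contigs into connected components of an undirected contig graph.

    edges: iterable of (contig_a, contig_b) links (hypothetical input).
    contigs: all contig names, so that unlinked contigs form their own group.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sorted(contigs):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    links = [("contig1", "contig2"), ("contig2", "contig3"), ("contig4", "contig5")]
    names = ["contig1", "contig2", "contig3", "contig4", "contig5", "contig6"]
    print(isogroups(links, names))
```

Here contig1 to contig3 end up in one group, contig4 and contig5 in another, and the unlinked contig6 on its own.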

Visualizing the graph is a wish we all have.

**litali** · 07-15-2010, 12:14 AM

more about cDNA

Thanks a lot. I have read your blog, which explains this very well. Still, some questions remain:
1. In the file 454AllContigs, there are some "contigs" with one or a few nucleotides.
What are those "contigs"?
2. Some isogroups include only contigs and no isotigs (the first two groups in our case), and the short "contigs" from the previous question are also assigned to these isogroups. So what are these isogroups? Is it all the same gene? Different genes? Why are there no isotigs?
3. In the "454 graph" file there is a scaffold section; however, we did not do paired-end sequencing, so what is the basis for this scaffold?
4. Which of the files are recommended for further analysis, such as BLAST? 454Isotigs.fna? 454AllContigs.fna (and if so, how should all the very short sequences be treated)?

**cram** · 07-15-2010, 09:25 AM

1. In the file 454AllContigs, there are some "contigs" with one or a few nucleotides.
What are those "contigs"?

These very small contigs seem to be produced when Newbler has difficulty resolving the edges of real contigs. We often see them in very highly abundant transcripts, presumably because the number of sequencing errors is high enough to make Newbler think these are real variations. So if the edge of an exon looks like:

...CATGCATGAAA
...CATGCATGAAA
...CATGCATGAAA
...CATGCATGAAAA
...CATGCATGAAAA

Newbler might consider that fourth 'A' in the last two reads to be a separate exon/contig.
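A toy illustration of the reads above: counting the length of the trailing poly-A run in each read shows the minority run length that an assembler could mistake for a real branch. This only sketches the counting; it does not model how Newbler actually decides to split a contig.

```python
from collections import Counter


def trailing_run_length(read, base="A"):
    """Length of the run of `base` at the 3' end of `read`."""
    return len(read) - len(read.rstrip(base))


reads = [
    "CATGCATGAAA",
    "CATGCATGAAA",
    "CATGCATGAAA",
    "CATGCATGAAAA",
    "CATGCATGAAAA",
]

counts = Counter(trailing_run_length(r) for r in reads)
print(counts)  # Counter({3: 3, 4: 2})
```

Three reads support a run of three A's and two support four, so the "extra" A is not a rare outlier that is easy to dismiss.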

2. Some isogroups include only contigs and no isotigs (the first two groups in our case), and the short "contigs" from the previous question are also assigned to these isogroups. So what are these isogroups? Is it all the same gene? Different genes? Why are there no isotigs?

The isotigs are computed by traversing the contig graph, and Newbler has limits on how deep it will recurse when doing this. So if an isogroup contains a lot of these false contigs, each one adds more branch points, and Newbler will eventually give up on trying to produce isotigs. You can try increasing the default limits, but in my experience even the maximum allowed values are not always sufficient.
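A rough sketch of why extra branch points hurt, assuming isotigs are modelled as source-to-sink paths through an acyclic contig graph: every two-way branch doubles the number of paths, so a traversal with a fixed budget eventually gives up. The `max_paths` cap is a hypothetical stand-in for Newbler's internal limits, not one of its real parameters.

```python
def enumerate_paths(graph, max_paths):
    """Enumerate source-to-sink paths in an acyclic contig graph.

    graph: dict mapping contig -> list of successor contigs (hypothetical input).
    Returns the list of paths, or None if more than max_paths exist
    (mimicking an assembler that gives up on building isotigs).
    """
    nodes = set(graph) | {s for succ in graph.values() for s in succ}
    has_pred = {s for succ in graph.values() for s in succ}
    sources = sorted(nodes - has_pred)

    paths = []
    # Explicit stack of partial paths; cycles are not handled.
    stack = [[s] for s in reversed(sources)]
    while stack:
        path = stack.pop()
        successors = graph.get(path[-1], [])
        if not successors:
            paths.append(path)
            if len(paths) > max_paths:
                return None
            continue
        for nxt in reversed(successors):
            stack.append(path + [nxt])
    return paths


def bubble_chain(n):
    """n consecutive two-way branches that rejoin (e.g. spurious splits)."""
    graph = {}
    for i in range(n):
        graph[f"c{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [f"c{i + 1}"]
        graph[f"b{i}"] = [f"c{i + 1}"]
    return graph


if __name__ == "__main__":
    for n in (2, 5, 10):
        result = enumerate_paths(bubble_chain(n), max_paths=100)
        print(n, "branches:", "gave up" if result is None else f"{len(result)} paths")
```

With a budget of 100 paths, 2 and 5 branches give 4 and 32 paths, while 10 branches (1024 paths) exceed the budget, which is the same kind of blow-up that makes raising the limits only a partial fix.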

Which of the files are recommended for further analysis, such as BLAST? 454Isotigs.fna? 454AllContigs.fna (and if so, how should all the very short sequences be treated)?

Unfortunately, the only way to make sure your further analyses are using all your data is to take the 454Isotigs.fna plus the larger contigs from those isogroups where proper isotig formation failed.
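A minimal sketch of that selection step, assuming the two FASTA files have already been read into dicts and that you have built isotig-to-isogroup and contig-to-isogroup mappings yourself (parsing is left out here). The function name and the 200 bp cutoff for "larger" contigs are hypothetical placeholders.

```python
def select_for_analysis(isotigs, contigs, isotig_group, contig_group, min_len=200):
    """Combine all isotigs with the larger contigs from isogroups that lack isotigs.

    isotigs: dict of isotig ID -> sequence
    contigs: dict of contig ID -> sequence
    isotig_group: dict of isotig ID -> isogroup ID
    contig_group: dict of contig ID -> isogroup ID (contigs absent from this
        mapping are treated as belonging to no isogroup)
    min_len: hypothetical cutoff for what counts as a "larger" contig
    """
    groups_with_isotigs = {isotig_group[i] for i in isotigs}
    selected = dict(isotigs)
    for contig_id, seq in contigs.items():
        if contig_group.get(contig_id) in groups_with_isotigs:
            continue  # this isogroup is already represented by its isotigs
        if len(seq) >= min_len:
            selected[contig_id] = seq
    return selected


if __name__ == "__main__":
    isotigs = {"isotig1": "A" * 300}
    contigs = {"contig1": "C" * 300, "contig2": "G" * 500, "contig3": "T" * 5}
    isotig_group = {"isotig1": "isogroup1"}
    contig_group = {"contig1": "isogroup1", "contig2": "isogroup2", "contig3": "isogroup2"}
    print(sorted(select_for_analysis(isotigs, contigs, isotig_group, contig_group)))
    # ['contig2', 'isotig1']
```

contig1 is dropped because its isogroup already has an isotig, and contig3 is dropped as one of the very short "contigs".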

**flxlex** · 07-16-2010, 12:24 AM

Originally posted by litali

3. In the "454 graph" file there is a scaffold section; however, we did not do paired-end sequencing, so what is the basis for this scaffold?

Scaffolding is not really scaffolding here, just a description of how the contigs relate to the isotigs. The same information is given in different forms in the 454IsotigsLayout.txt and 454Isotigs.txt files.
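One way to picture that relation, as a sketch: an isotig is an ordered list of contigs, and inverting that mapping shows which isotigs share a given contig. Under the splice-variant reading, contigs shared by several isotigs would correspond to shared exons. The dict below is made-up example data, not parsed from either file.

```python
from collections import defaultdict


def contig_to_isotigs(layout):
    """Invert isotig -> ordered contig list into contig -> sorted isotig list."""
    index = defaultdict(set)
    for isotig, contig_list in layout.items():
        for contig in contig_list:
            index[contig].add(isotig)
    return {contig: sorted(isotigs) for contig, isotigs in index.items()}


layout = {
    "isotig1": ["contig1", "contig2", "contig4"],
    "isotig2": ["contig1", "contig3", "contig4"],
}
print(contig_to_isotigs(layout))
```

In this example contig1 and contig4 appear in both isotigs, while contig2 and contig3 are each specific to one variant.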

**CHRYSES** · 09-10-2010, 01:38 AM

Originally posted by flxlex

Different isogroups within the same isogroup represent alternative splice variants.

I guess you meant: Different "isotigs" within the same isogroup represent (...)

**flxlex** · 09-12-2010, 09:42 PM

Originally posted by CHRYSES

I guess you meant: Different "isotigs" within the same isogroup represent (...)

Yep. Thanks...

**jordi** · 10-15-2010, 04:13 AM

Hi all!
I did a Newbler transcriptome assembly a year ago, and it was very difficult to find information about the outcome of the process (flxlex, thank you very much for your blog!). On that note, I tried to find out how many reads were assembled, and I got different results depending on the file I looked at. For instance, according to 454AllContigs.fna, 12310 reads were assembled in a sample identified by a MID tag (multiplexed) (I added up all the values in the last column, numreads=), but 454NewblerMetrics.txt reports the following:
numberAssembled = 6603;
numberPartial = 5359;
numberSingleton = 8674;
numberRepeat = 1101;
numberOutlier = 723;
Total reads = 22460
What could be the reason for this discrepancy?
I did the assembly with the release 1.1.03.24 of Newbler.
Regards,
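A quick check of the numbers in the post (values copied from the metrics excerpt): the five categories add up to the stated total, and assembled plus partially assembled reads come to 11962, which is 348 fewer than the 12310 obtained by summing numreads=. The `sum_numreads` helper is a hypothetical sketch that assumes each FASTA header carries a `numreads=N` field, as described above.

```python
import re

# Values copied from the 454NewblerMetrics.txt excerpt in the post.
metrics = {
    "numberAssembled": 6603,
    "numberPartial": 5359,
    "numberSingleton": 8674,
    "numberRepeat": 1101,
    "numberOutlier": 723,
}


def sum_numreads(fasta_lines):
    """Add up the numreads= values found on FASTA header lines.

    Assumes each header carries a 'numreads=N' field; other lines are ignored.
    """
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            match = re.search(r"numreads=(\d+)", line)
            if match:
                total += int(match.group(1))
    return total


total = sum(metrics.values())
in_contigs = metrics["numberAssembled"] + metrics["numberPartial"]
reported_numreads = 12310  # the sum reported in the post

print(total)                           # 22460
print(in_contigs)                      # 11962
print(reported_numreads - in_contigs)  # 348
```

To recompute the sum yourself, pass an open file handle for 454AllContigs.fna to `sum_numreads`.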

**westerman** · 10-15-2010, 09:40 AM

Originally posted by jordi

Hi all!
What could be the reason for this discrepancy?

I suspect that some of the reads are being split among contigs. Such reads would then be counted once for each contig they contribute to.
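A toy sketch of that effect, with made-up contig and read names: if one read contributes to two contigs, summing the per-contig read counts counts it twice, while counting distinct read IDs counts it once.

```python
contig_reads = {
    "contig1": {"readA", "readB", "readC"},
    "contig2": {"readC", "readD"},  # readC is split across both contigs
}

summed = sum(len(reads) for reads in contig_reads.values())
distinct = len(set().union(*contig_reads.values()))

print(summed, distinct)  # 5 4
```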

**poisson200** · 10-16-2010, 05:45 AM

mmmmm, Ponder

Hello.
I also want to make sure that every possible sequence is used in my further data analyses.

Originally posted by flxlex

"Isotigs are transcripts, build out of the contigs."

Originally posted by cram

"Unfortunately, the only way to make sure your further analyses are using all your data is to take the 454Isotigs.fna plus the larger contigs from those isogroups where proper isotig formation failed."

Originally posted by flxlex

CD-hit would help

Thanks flxlex, that program is a real help.

To clarify: would combining the Isotig.fna and the contigs.fna files into a single file and then running CD-HIT give you a comprehensive, non-redundant set of transcripts from your 454 transcriptome for further analyses?

Are there any single reads anywhere else that are neither contigs nor isotigs but are still useful?

Thank you for any advice,

John.

**flxlex** · 10-16-2010, 06:11 AM

Originally posted by poisson200

To clarify: would combining the Isotig.fna and the contigs.fna files into a single file and then running CD-HIT give you a comprehensive, non-redundant set of transcripts from your 454 transcriptome for further analyses?

Hmm, that could actually work; I hadn't thought of that. I had always thought of running CD-HIT per isogroup with some looping script. Putting all contigs and isotigs into a single CD-HIT run might collapse paralogues, though...
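For the per-isogroup approach, a looping script could look roughly like the sketch below. It does not call CD-HIT; it swaps in a simplified greedy clustering in the same spirit (longest sequence first; each sequence joins the first representative it matches at or above the identity threshold, otherwise it starts a new cluster). The difflib-based identity is a crude stand-in for CD-HIT's alignment-based identity, and the input dict and function names are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude identity: matched bases divided by the length of the shorter sequence."""
    # autojunk=False matters: with the default heuristic, characters that are
    # frequent in long sequences (i.e. all four bases) would be ignored as junk.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.9):
    """Greedy, length-sorted clustering; returns {representative ID: [member IDs]}."""
    clusters = {}
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no match: this sequence founds a new cluster
    return clusters


def cluster_per_isogroup(by_group, threshold=0.9):
    """Run greedy_cluster separately within each isogroup."""
    return {group: greedy_cluster(seqs, threshold) for group, seqs in by_group.items()}


if __name__ == "__main__":
    by_group = {
        "isogroup1": {
            "isotig1": "ACGTACGTACGTACGTACGT",
            "isotig2": "ACGTACGTACGTACGTACGA",  # one mismatch: 95% identical
            "isotig3": "TTTTGGGGCCCCAAAATTTT",
        },
    }
    print(cluster_per_isogroup(by_group, threshold=0.9))
```

Because sequences are only ever compared within their own isogroup, two paralogues that Newbler placed in different isogroups cannot be merged, which is the concern raised above about a single combined run.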

Are there any single reads anywhere else that are neither contigs nor isotigs but are still useful?

Yes, but so far Newbler does not output them to a separate file. You can get the IDs of the singleton reads from the 454ReadStatus file. Also, check this post:

singeltons+contigs for 454 data - SEQanswers

http://seqanswers.com/forums/showthread.php?t=6929
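A minimal sketch of pulling the singleton IDs out of the read status file, assuming it is tab-separated with one header line, the read accession in the first column and the status (e.g. `Singleton`) in the second. That column layout is an assumption, so check it against your own file; the example lines are made up.

```python
import csv
import io


def singleton_ids(handle):
    """Return read IDs whose status column is 'Singleton'.

    Assumes: tab-separated, one header line, accession in column 1,
    status in column 2.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line
    return [row[0] for row in reader if len(row) > 1 and row[1] == "Singleton"]


if __name__ == "__main__":
    example = io.StringIO(
        "Accno\tRead Status\n"
        "READ0001\tAssembled\n"
        "READ0002\tSingleton\n"
        "READ0003\tPartiallyAssembled\n"
        "READ0004\tSingleton\n"
    )
    print(singleton_ids(example))  # ['READ0002', 'READ0004']
```

The resulting ID list can then be written to a file and used to extract those reads from the original read data.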


**poisson200** · 10-16-2010, 06:34 AM

Hi flxlex,
Thanks for the quick reply and the answers.

Originally posted by flxlex

Taking all contigs and isotigs into a CD-HIT run might collapse paralogues, though...

Looking at CD-hit, by default it looks for 98% identity or greater, which I think should be stringent enough not to collapse any paralogs (paralogs would have to be from a very recent gene duplication event or from a CNV for that to happen) but it is a good point to bear in mind.

To correct myself: for my purposes cd-hit-est should be set to 0.98 (-c 0.98); the default is 0.9.
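To see what the threshold means in practice: cd-hit-est's -c value is, roughly, the fraction of identical bases over the length of the shorter sequence, so two 100 bp sequences differing at 5 positions (95% identical) would be merged at 0.9 but kept apart at 0.98. The tiny sketch below uses equal-length sequences and a plain mismatch count; CD-HIT itself works from alignments, so this is only an approximation.

```python
def hamming_identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    same = sum(x == y for x, y in zip(a, b))
    return same / len(a)


seq1 = "ACGT" * 25               # 100 bp
seq2 = "TCGT" * 5 + "ACGT" * 20  # differs from seq1 at 5 positions

ident = hamming_identity(seq1, seq2)
for c in (0.9, 0.98):
    print(f"-c {c}: identity {ident:.2f} ->", "same cluster" if ident >= c else "separate")
```

At 0.98, any pair of paralogues that differ at more than 2% of positions stays in separate clusters.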

Thanks again,

John.

cDNA analysis 454 assembler

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News