
**kmcarr** · 01-30-2013, 12:21 PM

Hi Veronica,

I'm sorry to say that I don't have a good, easy method to identify chimeras in de novo assembled putative transcripts. To be honest, normally I acknowledge that it is likely there will be chimeras but don't do anything to identify them.

Here are some theoretical methods:

Align the putative transcripts against a reference protein set with BLASTX and check whether different proteins align to different segments of the same putative transcript.
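A minimal sketch of that BLASTX check, assuming the hits were written in the standard 12-column tabular format (`-outfmt 6`: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore). The function names, the e-value cutoff and the 20% overlap tolerance are illustrative choices, not an established method. Homologs of one protein family tend to hit the *same* stretch of the transcript, so the sketch only flags pairs of different proteins that hit largely *disjoint* stretches.

```python
from collections import defaultdict


def parse_hits(lines, max_evalue=1e-5):
    """Read BLASTX -outfmt 6 lines into {query: [(subject, qlo, qhi, bitscore), ...]}."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        qstart, qend = int(f[6]), int(f[7])
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > max_evalue:
            continue
        # Minus-frame BLASTX hits report qstart > qend; normalise to (lo, hi).
        hits[qseqid].append((sseqid, min(qstart, qend), max(qstart, qend), bitscore))
    return hits


def flag_multi_protein(hits, max_overlap_frac=0.2):
    """Return {query: [(subject1, subject2), ...]} for different proteins hitting
    mostly non-overlapping segments of the same putative transcript."""
    flagged = {}
    for query, hsps in hits.items():
        # Keep only the highest-scoring HSP per subject protein.
        best = {}
        for sseqid, lo, hi, score in hsps:
            if sseqid not in best or score > best[sseqid][2]:
                best[sseqid] = (lo, hi, score)
        subjects = sorted(best.items())
        pairs = []
        for i in range(len(subjects)):
            for j in range(i + 1, len(subjects)):
                s1, (lo1, hi1, _) = subjects[i]
                s2, (lo2, hi2, _) = subjects[j]
                shared = max(0, min(hi1, hi2) - max(lo1, lo2) + 1)
                shorter = min(hi1 - lo1, hi2 - lo2) + 1
                if shared / shorter <= max_overlap_frac:
                    pairs.append((s1, s2))
        if pairs:
            flagged[query] = pairs
    return flagged
```

Input could come from something like `blastx -query transcripts.fa -db ref_proteins -outfmt 6 -evalue 1e-5 -out hits.tsv`, then `flag_multi_protein(parse_hits(open("hits.tsv")))`. Genuine multi-domain proteins and real fusion genes will also be flagged, so the output is a list of candidates to inspect, not a verdict.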

Use ORF prediction software on the putative transcripts. If multiple large ORFs are identified, BLAST the translated protein sequences to test whether they are all consistent (e.g. multiple ORFs in different frames on the same strand may result from frameshifts introduced by misassembly).
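To illustrate the "multiple large ORFs in different frames on the same strand" signal, here is a toy ATG-to-stop scan. It is an assumption-laden stand-in for real ORF prediction software: standard genetic code only, ORFs must be closed by a stop codon, and the 100-codon minimum and the function names are arbitrary choices for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq, min_codons=100):
    """ATG-to-stop ORFs on the given strand as (frame, start, end), 0-based,
    end exclusive and including the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def suspicious_strands(seq, min_codons=100):
    """Strands ('+' and/or '-') carrying large ORFs in two or more different frames."""
    result = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        frames = {frame for frame, _, _ in find_orfs(s, min_codons)}
        if len(frames) >= 2:
            result.append(strand)
    return result
```

A flagged strand is only a hint: the translated ORFs still need to be BLASTed, as described above, to see whether they look like pieces of one protein broken by a frameshift or like two unrelated proteins.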

Align the original RNA-Seq reads to your putative transcripts and examine how even the depth of coverage is across the length of each transcript. A contig whose coverage differs dramatically between one end and the other, or whose two ends have deep coverage separated by a region of very shallow coverage, may be a chimera.
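The two coverage patterns described above can be checked roughly as follows. This sketch assumes you already have a per-base depth list for one contig (for example the third column of `samtools depth -a` restricted to that contig). The window size, the log2 ratio cutoff of 2 (a 4-fold difference) and the 10% dip fraction are placeholder values, not validated thresholds.

```python
import math


def window_means(depth, win):
    """Mean depth in consecutive non-overlapping windows (a trailing partial window is dropped)."""
    return [sum(depth[i:i + win]) / win for i in range(0, len(depth) - win + 1, win)]


def coverage_flags(depth, win=50, max_log2_ratio=2.0, max_dip_frac=0.1, pseudo=1.0):
    """Flag 'end_imbalance' (first vs last third differ strongly) and/or
    'internal_dip' (a middle window far below both flanks)."""
    means = window_means(depth, win)
    if len(means) < 3:
        return []
    flags = []
    third = max(1, len(means) // 3)
    left = sum(means[:third]) / third
    right = sum(means[-third:]) / third
    # Pseudocount avoids division by zero for uncovered ends.
    log2_ratio = math.log2((left + pseudo) / (right + pseudo))
    if abs(log2_ratio) >= max_log2_ratio:
        flags.append("end_imbalance")
    middle = means[third:len(means) - third]
    if middle and min(middle) < max_dip_frac * min(left, right):
        flags.append("internal_dip")
    return flags
```

Keep in mind that 3' bias in library preparation and overlapping isoforms also produce uneven coverage, so a flag here marks a contig for closer inspection rather than proving a chimera.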

**jordi** · 02-13-2014, 05:13 AM

Hi all!
I am now dealing with this issue, i.e. how to detect chimeras in an RNA-Seq assembly without a reference transcriptome.
I've read that chimeras joining different genes cannot be resolved without a reference. Self-chimeras, on the other hand, could be detected as repeated regions within the same contig.
However, a sudden change in coverage along a contig could help estimate the number of chimeras in an assembly project. Here is my question: given the coverage of a transcript, how do I set a threshold for deciding that a change in coverage points to a chimera? Regarding kmcarr's comment: what counts as "dramatically different coverage", and how can it be determined?
Thank you very much for your help!!
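One possible way to answer the threshold question empirically, offered as an assumption rather than a validated method: score every contig by its largest coverage step (the maximum |log2 fold change| between the mean depth left and right of any split point) and treat the top percentile of the assembly-wide score distribution as candidates. This presumes most contigs are not chimeric; `min_side`, the pseudocount and the 99th percentile are arbitrary illustrative choices.

```python
import math


def max_breakpoint_lfc(depth, min_side=100, pseudo=1.0):
    """Largest |log2(mean left / mean right)| over all split points leaving
    at least min_side bases on each side; returns (score, split_position)."""
    n = len(depth)
    if n < 2 * min_side:
        return 0.0, None
    prefix = [0.0]
    for d in depth:
        prefix.append(prefix[-1] + d)
    best, best_pos = 0.0, None
    for k in range(min_side, n - min_side + 1):
        left = prefix[k] / k
        right = (prefix[n] - prefix[k]) / (n - k)
        lfc = abs(math.log2((left + pseudo) / (right + pseudo)))
        if lfc > best:
            best, best_pos = lfc, k
    return best, best_pos


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def flag_by_assembly_distribution(depths, q=99.0, min_side=100):
    """depths: {contig_name: per-base depth list}. Returns (cutoff, flagged names)."""
    scores = {name: max_breakpoint_lfc(d, min_side)[0] for name, d in depths.items()}
    cutoff = percentile(list(scores.values()), q)
    flagged = sorted(name for name, s in scores.items() if s >= cutoff and s > 0)
    return cutoff, flagged
```

By construction a percentile cutoff always selects some fraction of the assembly, so it estimates which contigs are *most* suspicious rather than how many are truly chimeric; the reported split position gives a place to look at read pairs and BLASTX hits.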

**martin2** · 03-04-2014, 06:26 AM

Originally posted by kmcarr:

I'm sorry to say that I don't have a good, easy method to identify chimeras in de novo assembled putative transcripts. To be honest, normally I acknowledge that it is likely there will be chimeras but don't do anything to identify them.

Here are some theoretical methods:

Use BLASTX alignment of a reference protein set and examine results to see if multiple proteins align to different segments of the putative transcript.

Use ORF prediction software on the putative transcripts. If multiple large ORFs are identified, BLAST the translated protein sequences to test if all of them are consistent (i.e. the multiple ORFs in different frames on the same strand may result from frame shifts introduced by misassembly).

The most important check is whether you have full-length matches. Often, the N- or C-terminus ends up on a different contig/scaffold from the core of the protein. In diploid/polyploid organisms, because of sequencing errors, you won't even get a definite answer as to whether a fragment of a transcript originated from locus 1, 2 or 3, provided they all share 95-100% identity (and they do, at least in some places, thanks to recent whole-genome duplication events). There are many cases like this.

This is one of the reasons why I always say that with NGS one can never, ever get a correct answer for alternatively spliced genes. Unless we sequence a transcript as a whole piece, it is all just guesswork. A short, 80 nt overlap between two reads does not justify concluding that exons C and D are present in the same transcript. An assembler will always propose that A-B-C-D-E-F-G form one transcript, but will hardly ever reveal that only A-B-E-F and A-B-C-F-G are actually expressed. With high coverage the situation could be more optimistic, but here it depends on the number of biological and lab replicates, not just on the number of emPCR droplets or clusters derived from the same PCR experiment. Although instructing an assembler to watch uniformity of coverage is cheating a bit, I believe it helps at least in some cases.
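A rough sketch of the full-length check, assuming BLASTX was run with `-outfmt "6 std slen"` so that the subject (protein) length is the 13th column. The 10-residue terminal slack, the 90% coverage cutoff and the category names are illustrative assumptions.

```python
def classify_hit(line, term_slack=10, min_cov=0.9):
    """Classify one BLASTX '-outfmt "6 std slen"' line by how much of the
    reference protein it covers. Returns (qseqid, sseqid, coverage, status)."""
    f = line.rstrip("\n").split("\t")
    qseqid, sseqid = f[0], f[1]
    sstart, send, slen = int(f[8]), int(f[9]), int(f[12])
    lo, hi = min(sstart, send), max(sstart, send)
    coverage = (hi - lo + 1) / slen
    has_n = lo <= 1 + term_slack        # alignment reaches the protein N-terminus
    has_c = hi >= slen - term_slack     # alignment reaches the protein C-terminus
    if coverage >= min_cov and has_n and has_c:
        status = "full_length"
    elif has_n:
        status = "missing_C_terminus"
    elif has_c:
        status = "missing_N_terminus"
    else:
        status = "internal_fragment"
    return qseqid, sseqid, round(coverage, 3), status
```

Contigs whose best hits are `missing_N_terminus` or `missing_C_terminus` are the ones where, as described above, the missing end may sit on another contig/scaffold; with near-identical paralogs the best hit itself can be ambiguous.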

The second important check is for seemingly new exon extensions or truncations, and for seemingly "unremoved" introns, which simply break the multiple sequence alignment of your favourite gene.
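As a sketch of how such anomalies can be spotted, assume you already have the contig and a trusted reference sequence as two equal-length gapped rows from a pairwise or multiple alignment. Long runs where only the contig has sequence are candidate retained introns or exon extensions; long runs where only the reference has sequence are candidate truncations or missing exons (runs at the alignment ends usually just mean a partial contig). The 20-column minimum and the function names are hypothetical choices.

```python
def gap_runs(row, other, min_len):
    """Half-open column ranges (start, end) where `row` has a gap but `other`
    has a residue, keeping only runs of at least min_len columns."""
    runs, start = [], None
    for i, (a, b) in enumerate(zip(row, other)):
        if a == "-" and b != "-":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(row) - start >= min_len:
        runs.append((start, len(row)))
    return runs


def alignment_anomalies(contig_row, ref_row, min_len=20):
    """Report contig-only insertions and contig-missing segments in alignment columns."""
    if len(contig_row) != len(ref_row):
        raise ValueError("aligned rows must have equal length")
    return {
        "insertions_in_contig": gap_runs(ref_row, contig_row, min_len),
        "missing_in_contig": gap_runs(contig_row, ref_row, min_len),
    }
```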

Originally posted by kmcarr:

Align the original RNA-Seq reads to your putative transcripts and examine how even the depth of coverage is across the length of each transcript. A contig whose coverage differs dramatically between one end and the other, or whose two ends have deep coverage separated by a region of very shallow coverage, may be a chimera.

From my experience, chimeras remain even in combined shotgun + paired-end datasets, and even when technologies are combined, e.g. Illumina+454. That is a nightmare for me. I have some idea why that is and what "requirements" need to be fulfilled for them to remain. Luckily, in other cases removing chimeras results in longer contigs/scaffolds, fewer contigs/scaffolds, and better N50/N90 numbers. But the numbers are not 10x better: you have to understand that if you ban one chimeric join, you split 1 contig into 2, so the assembler starts from a much worse position and has to find completely different assembly paths. Once you accept the situation, it is pleasing that in the end one gets a somewhat better assembly in terms of these semi-useful numbers. But the scaffolds/contigs are different.

Depending on which lab protocol you used to obtain the data, maybe you would appreciate a commercial service from me? I can properly trim datasets from some complex protocols, with almost no overtrimming and no misses. See http://www.bioinformatics.cz/softwar...rted-protocols . Although I developed that for 454-based datasets, I could help with data from some other technologies. It depends.

**JackieBadger** · 03-04-2014, 07:06 AM

I believe the MIRA assembler can detect chimeras.

How to determine chimeras in my de novo assembly?

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News