**swbarnes2** · 05-12-2008, 03:05 PM

Try using something like Velvet to assemble the unaligned reads against each other, then BLAST the resulting contigs against nr. If they are just crummy reads, they won't assemble into anything.

We tested an in-house clone collection, and I found a fair bit of E. coli contamination. I've also found vector-looking things in microbial samples, stuff like that. If your reference has a biggish deletion compared to what you really sequenced, you might find it this way.
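
Before the unaligned reads can go to an assembler, they have to be pulled out of the ELAND output. A minimal sketch, assuming a tab-separated ELAND result file in which the first three columns are read name, sequence, and match code, with `NM` marking reads that had no match. The function name `eland_nm_to_fasta` is made up for this example, and the column layout should be checked against your own files:

```python
def eland_nm_to_fasta(lines):
    """Yield FASTA records for reads whose match code is NM.

    Assumes tab-separated columns: read name, sequence, match code.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, seq, code = fields[0], fields[1], fields[2]
        if code == "NM":
            # Drop a leading '>' so it is not doubled in the FASTA header.
            yield ">" + name.lstrip(">") + "\n" + seq + "\n"


if __name__ == "__main__":
    example = [
        ">read_1\tACGTACGT\tNM\t0\t0\t0\n",
        ">read_2\tTTTTCCCC\tU0\t1\t0\t0\tchr1.fa\t100\tF\n",
    ]
    for record in eland_nm_to_fasta(example):
        print(record, end="")
```

The resulting FASTA can then be handed to whichever assembler you prefer.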

**acnoll** · 05-15-2008, 01:26 PM

> Originally posted by bioinfosm
>
> Anyone looking into the No Match eland reads, or reads that come off solexa that are not mapped to the reference?
> Any other kind of contamination control like eColi, etc?
>
> I was looking into blat on the entire nt, but would love to hear what people are using.
>
> sm

One approach that takes a while but looks exhaustively at all the NMs is to BLAT them against the genome of interest to kick out the gapped hits, then BLAST what is left against nr to find contaminants. I was thinking of then taking the top couple of contaminants and looking at the matching hits to see whether there is any overlap, since reads from the contaminant may intersect with those mapped to the genome of interest. This might matter most for SNP calling.
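
The "top couple of contaminants" step can be sketched as a small tally over BLAST tabular output. This assumes the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The function name `top_contaminants` is invented for this example, and keeping only the best-scoring hit per read is one possible choice rather than the only reasonable one:

```python
from collections import Counter


def top_contaminants(blast_lines, n=3):
    """Return the n subjects that are the best hit for the most reads."""
    best = {}  # query id -> (bit score, subject id)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep only the highest-scoring subject for each read.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    counts = Counter(subject for _, subject in best.values())
    return counts.most_common(n)


if __name__ == "__main__":
    with open("nm_vs_nr.tsv") as handle:  # hypothetical file name
        for subject, count in top_contaminants(handle):
            print(subject, count)
```

Reads whose best hit is one of the top subjects are the ones worth comparing against the reads that mapped to the genome of interest.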

**Mr. Gunn** · 05-16-2008, 03:11 PM

Here's a nice comparison of the various short-read aligners, including ELAND.

http://massgenomics.wordpress.com/20...nd-and-others/

**bioinfosm** · 05-21-2008, 01:18 PM

Thanks for your input...

Edena and Velvet, two de novo assemblers for short-read data, gave such different outputs!

Velvet gave 2 contigs that pointed to a fragment that was supposedly deleted and should not have been sequenced.

Edena, on the other hand, gave 10 or so contigs, 100-120 bp long, that align perfectly to E. coli K-12!

**zee** · 08-06-2008, 08:09 AM

Reads that aren't matched by Eland are interesting because we can suppose they're not repeats: Eland does report reads that match multiple locations.
I would say that reads containing gaps would probably be missed by Eland, so run these reads through a short-read aligner that can find gaps. I've been using novoalign (www.novocraft.com); it can find up to 7/8 gaps in a 36bp read matching a reference sequence, and it is fast on large references. I've even tested it on simulated data with mutation rates in excess of 15% and it still finds them. Use a very high threshold, e.g. -t 200, to find potentially all permutations for your read.
I'd be interested to know how many more of your Eland NM reads you are able to match.
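
To see why a mismatch-only comparison loses reads with gaps, here is a toy sketch (plain Python, not how Eland or novoalign actually work). It compares a read against a reference window position by position (Hamming distance) and with insertions and deletions allowed (edit distance). The sequences are invented for illustration:

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion from a
                           cur[j - 1] + 1,       # insertion into a
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    window = "ACGTTGCATCGGATCCTAGA"  # invented reference window
    read = "ACGTTCATCGGATCCTAGAC"    # same locus, one base deleted
    print("mismatch-only:", hamming(window, read))
    print("with gaps:", edit_distance(window, read))
```

A single deleted base shifts everything downstream, so the mismatch-only count blows well past a two-mismatch limit, while the gapped comparison needs only two edits.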

**kmay** · 08-06-2008, 09:10 AM

Just a note from my side:

As you know from other threads, we can map from 10bp onwards, with gaps and PMs. However, before tweaking the unmapped reads into the reference genome, look at viral genomes, vectors etc.
We found numerous perfect matches there. Especially when working on specific cell lines, check the history of that line, how it was immortalized etc. You'll be surprised how many good old retroviral friends you find!

Cheers

Klaus
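
The perfect-match screen against viral and vector sequences can be sketched in a few lines of Python. This is only an illustration of the idea, not the GMS mapper: it assumes all reads have the same length, indexes every substring of that length on both strands of each panel sequence, and looks each read up. The names `build_index` and `screen_reads` and the toy panel are made up for the example:

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_index(panel, read_len):
    """Map every read_len-mer (both strands) to the panel entries containing it."""
    index = {}
    for name, seq in panel.items():
        seq = seq.upper()
        for strand in (seq, revcomp(seq)):
            for i in range(len(strand) - read_len + 1):
                index.setdefault(strand[i:i + read_len], set()).add(name)
    return index


def screen_reads(reads, panel, read_len):
    """Return {read id: [panel names]} for reads that match a panel entry perfectly."""
    index = build_index(panel, read_len)
    hits = {}
    for read_id, seq in reads.items():
        # Reads of any other length simply find nothing in the index.
        names = index.get(seq.upper())
        if names:
            hits[read_id] = sorted(names)
    return hits


if __name__ == "__main__":
    panel = {"toy_vector": "ACGTTGCATCGGATCCTAGACT"}  # invented sequence
    reads = {"r1": "TTGCATCG", "r2": "AAAAAAAA"}
    print(screen_reads(reads, panel, read_len=8))
```

For a panel of viral genomes and vectors the index stays small, so this kind of exact screen is cheap to run before trying anything more permissive.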

**Chipper** · 08-06-2008, 10:19 AM

Interesting note. Have you also looked at whether the retroviral sequences can be remapped to human with mismatches, and whether that seems to be a source of background in alignments?

**kmay** · 08-07-2008, 06:33 AM

Chipper,

More on that, with HEK cells, SV40 and Adenovirus, is described in our paper.

Klaus

**zee** · 08-07-2008, 08:25 PM

I just read the Sultan paper, Kmay. Nice work.

However, I am a little confused, because it says that reads were mapped with ELAND: "Illumina deep sequencing was used to generate 27-bp reads from replicate samples for each cell line. Reads were mapped to the human genome (hg18, NCBI build 36.1) using the Eland software, allowing up to two mismatches (see SOM). Of the total reads, 50% matched to unique genomic locations," (http://www.sciencemag.org/cgi/content/full/1160342/DC1)

And the actual read data is unavailable. So I'm assuming that you all used the proprietary Genomatix mapper in a separate study? Where can we get this read data?

**kmay** · 08-08-2008, 02:50 AM

zee,

You are right. The original data were mapped with ELAND. In those days our GMS was still under development. Later we looked at the reads ELAND had not mapped and ran them against the viral genomes with our GMS. The actual read data are deposited at GEO.

short reads missed by aligners