**ketan_bnf** · 10-07-2011, 10:15 PM

Hi Peitx,

Sequences may be of low quality and/or too short (<20 bp by default). Not all sequences will necessarily be used for mapping to the genome.

Regards,

**sklages** · 10-09-2011, 07:43 AM

Have a look at 454ReadStatus.txt

| Read Accno | Mapping Status | Mapped Accuracy(%) | % of Read Mapped | Ref Accno | Ref Start | Ref Stop | Strand |
|---|---|---|---|---|---|---|---|
| G5FF2WU01DTSD6 | Full | 95 | 100 | chr2 | 227896723 | 227896852 | - |
| G5FF2WU01CKAXT | Full | 97 | 100 | chr10 | 73453619 | 73453688 | + |
| G5FF2WU01BP3ZV | Full | 98 | 100 | chr12 | 48373154 | 48373213 | + |
| G5FF2WU01CMIB1 | Full | 99 | 100 | chr14 | 76948381 | 76948530 | - |
| G5FF2WU01ARMHW | TooShort | | | | | | |
| G5FF2WU01EVYYN | Repeat | | | | | | |
| G5FF2WU01EL8WA | Repeat | | | | | | |
[...]

It should at least answer your question why your reads are not mapped.
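For a larger file, the status column can be tallied rather than read by eye. The following is only a minimal sketch: it assumes the whitespace-separated layout shown above (two header lines, then the read accession followed by its mapping status), and `count_read_status` is a hypothetical helper name, not part of the Roche software.

```python
from collections import Counter

def count_read_status(lines):
    """Count the mapping status (second column) of every read line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and the two header lines ("Read ..." / "Accno ...").
        if len(fields) < 2 or fields[0] in ("Read", "Accno"):
            continue
        counts[fields[1]] += 1
    return counts

# A few lines copied from the excerpt above.
example = """\
Read Mapping Mapped % of Read Ref Ref Ref
Accno Status Accuracy(%) Mapped Accno Start Stop Strand
G5FF2WU01DTSD6 Full 95 100 chr2 227896723 227896852 -
G5FF2WU01ARMHW TooShort
G5FF2WU01EVYYN Repeat
G5FF2WU01EL8WA Repeat
""".splitlines()

status_counts = count_read_status(example)
print(status_counts)
```

On a real run the same function could be fed `open("454ReadStatus.txt")` instead of the inline example.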

cheers,
Sven

**Peitx** · 10-10-2011, 10:44 AM

Thanks to both of you for the replies.

Ketan, I'm sure that the length is more than 20 bp (the minimum is 146, as you can see below). I don't have quality scores, but it seems that this is not a problem, because it accepts the sequences (and I'm not interested in variant calling).

| Accno | Trimpoints Used | Used Trimmed Length | Orig Trimpoints | Orig Trimmed Length | Raw Length |
|---|---|---|---|---|---|
| ADR_F. | 1-146 | 146 | 1-146 | 146 | 146 |
| ATROP_F. | 151-151 | 1 | 1-341 | 341 | 341 |
| CITOCHROME-C_F. | 105-105 | 1 | 1-280 | 280 | 280 |
| CITRATO3. | 42-42 | 1 | 1-645 | 645 | 645 |
| CITRATO5. | 43-43 | 1 | 1-551 | 551 | 551 |
| GNRH3-1_F. | 1-197 | 197 | 1-197 | 197 | 197 |
| HGFL_R. | 1-558 | 558 | 1-558 | 558 | 558 |
| HIF2-3_F. | 20-20 | 1 | 1-575 | 575 | 575 |
| INTFGP_F. | 1-307 | 307 | 1-307 | 307 | 307 |
| INTRAOPCO2_F. | 1-306 | 306 | 1-306 | 306 | 306 |
| L12_F. | 1-551 | 551 | 1-551 | 551 | 551 |
| LACDB_F. | 37-37 | 1 | 1-368 | 368 | 368 |
| LYS2_F. | 1-591 | 591 | 1-591 | 591 | 591 |
| MTF_F. | 1-636 | 636 | 1-636 | 636 | 636 |
| S7-2_F. | 1-605 | 605 | 1-605 | 605 | 605 |

I've checked the positions where the trimming happens, and in some cases I've found an IUPAC ambiguity code (e.g. Y). In other sequences the problem is an N. The thing is that the cause is one in some sequences and the other in others, so I can't pin down a single reason. I've been looking for these issues in the documentation, but without success...
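One way to make this check systematic is to list every non-ACGT base in each sequence and compare the positions with the trim points in the table. This is a rough sketch only: `read_fasta` and `ambiguous_positions` are hypothetical helper names, the toy sequences are invented, and whether gsMapper actually trims at such bases is not confirmed by its documentation here.

```python
def read_fasta(lines):
    """Minimal FASTA reader yielding (name, sequence) pairs."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def ambiguous_positions(seq):
    """Return (1-based position, base) for every base that is not A, C, G or T."""
    return [(i, base) for i, base in enumerate(seq.upper(), start=1)
            if base not in "ACGT"]

# Invented toy records, for illustration only.
toy = [">toy_1", "ACGTYACGT", ">toy_2", "ACGNNA", "CGT"]
report = {name: ambiguous_positions(seq) for name, seq in read_fasta(toy)}
print(report)
```

If the reported positions line up with single-base trim points such as 151-151 or 105-105, that would support the ambiguity-code explanation; if they do not, the cause is something else.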

sklages, this is my file:

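| Read Accno | Mapping Status | Mapped Accuracy(%) | % of Read Mapped | Ref Accno | Ref Start | Ref Stop | Strand |
|---|---|---|---|---|---|---|---|
| ADR_F. | Unmapped | | | | | | |
| GNRH3-1_F. | Unmapped | | | | | | |
| HGFL_R. | Repeat | | | | | | |
| INTFGP_F. | Unmapped | | | | | | |
| INTRAOPCO2_F. | Unmapped | | | | | | |
| L12_F. | Partial | 94 | 99 | clc_genomicrefv1_contig102970 | 4336 | 4884 | + |
| LYS2_F. | Unmapped | | | | | | |
| MTF_F. | Unmapped | | | | | | |
| S7-2_F. | Full | 94 | 100 | clc_genomicrefv1_contig88520 | 6032 | 6633 | + |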

As you can see, most of the reads are unmapped, but my problem is that some reads are trimmed and I don't know why.

I've tried mapping only the 40 bp upstream and downstream of the SNP (to avoid IUPAC nucleotides and to check for different mapping) and I've found differences:

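| Read Accno | Mapping Status | Mapped Accuracy(%) | % of Read Mapped | Ref Accno | Ref Start | Ref Stop | Strand |
|---|---|---|---|---|---|---|---|
| DR_F. | Unmapped | | | | | | |
| ATROP_F. | Unmapped | | | | | | |
| CITOCHROME-C_F. | Full | 93 | 100 | clc_genomicrefv1_contig152775 | 2837 | 2917 | + |
| CITRATO3. | Unmapped | | | | | | |
| CITRATO5. | Unmapped | | | | | | |
| GNRH3-1_F. | Unmapped | | | | | | |
| HGFL_R. | Unmapped | | | | | | |
| HIF2-3_F. | Unmapped | | | | | | |
| INTFGP_F. | Unmapped | | | | | | |
| INTRAOPCO2_F. | Unmapped | | | | | | |
| L12_F. | Full | 99 | 100 | clc_genomicrefv1_contig102970 | 4717 | 4797 | + |
| LACDB_F. | Unmapped | | | | | | |
| LYS2_F. | Unmapped | | | | | | |
| MTF_F. | Unmapped | | | | | | |
| S7-2_F. | Full | 96 | 100 | clc_genomicrefv1_contig88520 | 6304 | 6384 | + |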

Now all the sequences are accepted and I get another mapped sequence! Do you know what is happening? I know that the sequences are too long in the first case, but I also thought that the mapper would "split" the sequences into smaller parts, using the seed value. Am I wrong? This would definitely clear up some of my doubts...
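For reference, cutting a window of 40 bp on each side of a SNP could look like the sketch below. It assumes the SNP position is known as a 1-based coordinate in each Sanger sequence; `snp_window` is a hypothetical helper, and the version here keeps the SNP base itself, so a full window is 81 bases.

```python
def snp_window(seq, snp_pos, flank=40):
    """Return up to `flank` bases on each side of a 1-based SNP position, SNP included."""
    if not 1 <= snp_pos <= len(seq):
        raise ValueError("snp_pos is outside the sequence")
    start = max(0, snp_pos - 1 - flank)   # 0-based start, clipped at the sequence start
    end = min(len(seq), snp_pos + flank)  # exclusive end, clipped at the sequence end
    return seq[start:end]

# Invented toy sequence: 50 A's, an ambiguous Y at position 51, then 50 C's.
toy_seq = "A" * 50 + "Y" + "C" * 50
window = snp_window(toy_seq, 51)
print(len(window), window)
```

Near the ends of a sequence the window is simply shorter, since the slice is clipped rather than padded.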

Thanks for helping with these silly questions, I'm new to this field and I want to learn.

**sklages** · 10-10-2011, 10:58 AM

These are pre-assembled contigs, not reads. gsMapper won't split the large sequences into smaller chunks.
What are you mapping against? A finished (contiguous) or a draft (multi-contig) genome? Why don't you map your reads directly against your reference genome instead of pre-assembling and mapping afterwards?

Maybe you should give blast or blat a try (you don't have too many contigs) for mapping/positioning your contigs on your reference.

my 2p,
Sven

**Peitx** · 10-10-2011, 11:14 AM

Sorry for the misinformation, sklages.

What I'm trying to map are Sanger sequences, not NGS reads, to a draft genome (constructed by HiSeq sequencing + assembly). So, as I suspected, the reads are too long for mapping and are definitely not split. Now my approach of using the 40 bp upstream and downstream makes more sense.

I'll also try blat, but I have to install it and I have no experience with it. Do you think it is worth it after doing the 80 bp approach, taking into account that my only objective is to find out whether my sequences are in the reference genome?

Can you give me your address so I can send you some cookies for the help? :P

**sklages** · 10-10-2011, 11:23 AM

OK, Sanger reads on CLC-assembled contigs ... as you don't have any NGS reads, there is no need to use gsMapper. 'Blast'/'Blast+' should do the job for your handful of sequences; have a look at NCBI's software archive. You could also use 'blat' (have a look at UCSC) or even CLC Genomics WB, if you have access to that software (which is commercial).
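If BLAST+ is run with tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), deciding whether each sequence is present in the draft comes down to picking the best hit per query. A minimal sketch, assuming that default column order; `best_hits` is a hypothetical helper and the rows below are made up for illustration, not results from this thread.

```python
def best_hits(lines):
    """Keep the highest-bitscore hit per query from BLAST+ tabular (-outfmt 6) output."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hit = {
            "subject": f[1],
            "identity": float(f[2]),
            "length": int(f[3]),
            "sstart": int(f[8]),   # sstart > send indicates a minus-strand hit
            "send": int(f[9]),
            "bitscore": float(f[11]),
        }
        if f[0] not in best or hit["bitscore"] > best[f[0]]["bitscore"]:
            best[f[0]] = hit
    return best

# Made-up rows, for illustration only.
toy_rows = [
    "query_1\tcontig_A\t98.5\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350",
    "query_1\tcontig_B\t90.0\t120\t12\t0\t1\t120\t10\t129\t1e-40\t150",
    "query_2\tcontig_C\t99.0\t100\t1\t0\t1\t100\t300\t201\t1e-45\t180",
]
hits = best_hits(toy_rows)
for query, hit in hits.items():
    print(query, hit["subject"], hit["identity"], hit["length"])
```

Queries that never appear in the output have no hit at all, which for this purpose would mean the sequence was not found in the assembly at the chosen thresholds.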

Do you have a usable N50 size of your genome assembly?
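In case the N50 is not at hand, it can be computed from the contig lengths alone: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly. A minimal sketch, with `n50` as a hypothetical helper and invented lengths:

```python
def n50(lengths):
    """Return the contig length at which the cumulative sum, longest first, reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")

# Invented contig lengths, for illustration only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
print(n50([10, 20, 30, 40]))
```

A low N50 relative to the length of the Sanger sequences would mean that many of them could span contig ends, which is one plausible reason for partial or missing hits.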

cheers, Sven

GSMapper trimming
