#1 |
Junior Member
Location: Spain Join Date: Jan 2011
Posts: 3
|
Hello everyone,
I have some SNP-containing sequences, obtained some months ago from a fish species, and we have now obtained a genome from a species close to our fish. I'm using gsMapper to map them to that genome, but I don't know why the program drops 6 of our 15 sequences from the analysis. I've specified that I don't want a trimming step, so I can't understand why the program is doing this. The documentation didn't help either. It's a very silly question, but I can't find the solution. Any help from experienced people? Thanks in advance!
#2 |
Member
Location: India Join Date: Oct 2010
Posts: 59
|
Hi Peitx,
Sequences may be of low quality and/or too short (<20 bp by default). Not all sequences will necessarily be used for mapping to the genome. Regards,
#3 |
Senior Member
Location: Berlin, DE Join Date: May 2008
Posts: 628
|
Have a look at 454ReadStatus.txt

Read Accno | Mapping Status | Mapped Accuracy(%) | % of Read Mapped | Ref Accno | Ref Start | Ref Stop | Ref Strand |
G5FF2WU01DTSD6 | Full | 95 | 100 | chr2 | 227896723 | 227896852 | - |
G5FF2WU01CKAXT | Full | 97 | 100 | chr10 | 73453619 | 73453688 | + |
G5FF2WU01BP3ZV | Full | 98 | 100 | chr12 | 48373154 | 48373213 | + |
G5FF2WU01CMIB1 | Full | 99 | 100 | chr14 | 76948381 | 76948530 | - |
G5FF2WU01ARMHW | TooShort |
G5FF2WU01EVYYN | Repeat |
G5FF2WU01EL8WA | Repeat |
[...]

It should at least answer your question why your reads are not mapped.
cheers, Sven
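To get a quick overview of such a file, the statuses can be tallied with a few lines of Python. This is only a minimal sketch: it assumes whitespace-separated columns with the read accession first and the mapping status second, as in the excerpt above, and the function name `summarize_read_status` is made up for illustration.

```python
from collections import Counter

def summarize_read_status(lines):
    """Count reads per mapping status and list the reads that were not placed.

    Assumes (not verified against the gsMapper docs) that each data line
    starts with the read accession followed by its status.
    """
    counts = Counter()
    not_placed = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and the two header lines of the excerpt.
        if len(fields) < 2 or fields[0] in ("Read", "Accno"):
            continue
        accno, status = fields[0], fields[1]
        counts[status] += 1
        if status not in ("Full", "Partial"):
            not_placed[accno] = status
    return counts, not_placed

# A few lines copied from the excerpt above.
example = """\
Read Mapping Mapped % of Read Ref Ref Ref
Accno Status Accuracy(%) Mapped Accno Start Stop Strand
G5FF2WU01DTSD6 Full 95 100 chr2 227896723 227896852 -
G5FF2WU01ARMHW TooShort
G5FF2WU01EVYYN Repeat
""".splitlines()

counts, not_placed = summarize_read_status(example)
for accno, status in not_placed.items():
    print(accno, status)
```

For a real run, the same function could be fed the lines of the actual file, e.g. `summarize_read_status(open("454ReadStatus.txt"))`.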
#4 |
Junior Member
Location: Spain Join Date: Jan 2011
Posts: 3
|
Thanks both for the replies.

Ketan, I'm sure that the length is more than 20 bp (the minimum is 146, as you can see below). I don't have quality scores, but it seems that this is not a problem, because the program accepts the sequences (and I'm not interested in variant calling).

Accno | Trimpoints Used | Used Trimmed Length | Orig Trimpoints | Orig Trimmed Length | Raw Length |
ADR_F. | 1-146 | 146 | 1-146 | 146 | 146 |
ATROP_F. | 151-151 | 1 | 1-341 | 341 | 341 |
CITOCHROME-C_F. | 105-105 | 1 | 1-280 | 280 | 280 |
CITRATO3. | 42-42 | 1 | 1-645 | 645 | 645 |
CITRATO5. | 43-43 | 1 | 1-551 | 551 | 551 |
GNRH3-1_F. | 1-197 | 197 | 1-197 | 197 | 197 |
HGFL_R. | 1-558 | 558 | 1-558 | 558 | 558 |
HIF2-3_F. | 20-20 | 1 | 1-575 | 575 | 575 |
INTFGP_F. | 1-307 | 307 | 1-307 | 307 | 307 |
INTRAOPCO2_F. | 1-306 | 306 | 1-306 | 306 | 306 |
L12_F. | 1-551 | 551 | 1-551 | 551 | 551 |
LACDB_F. | 37-37 | 1 | 1-368 | 368 | 368 |
LYS2_F. | 1-591 | 591 | 1-591 | 591 | 591 |
MTF_F. | 1-636 | 636 | 1-636 | 636 | 636 |
S7-2_F. | 1-605 | 605 | 1-605 | 605 | 605 |

I've checked the positions where the trimming happens, and in some cases I've found an IUPAC ambiguity code (e.g. Y). In other sequences the problem is an N. The fact is that in some sequences the cause is one and in others it is the other, so I can't pin down a single reason. I've been looking for these issues in the documentation, but without success...

sklages, this is my file:

Read Accno | Mapping Status | Mapped Accuracy(%) | % of Read Mapped | Ref Accno | Ref Start | Ref Stop | Ref Strand |
ADR_F. | Unmapped |
GNRH3-1_F. | Unmapped |
HGFL_R. | Repeat |
INTFGP_F. | Unmapped |
INTRAOPCO2_F. | Unmapped |
L12_F. | Partial | 94 | 99 | clc_genomicrefv1_contig102970 | 4336 | 4884 | + |
LYS2_F. | Unmapped |
MTF_F. | Unmapped |
S7-2_F. | Full | 94 | 100 | clc_genomicrefv1_contig88520 | 6032 | 6633 | + |

As you can see, most of the reads are unmapped, but my problem is that some reads are trimmed and I don't know why. I've tried mapping only the 40 bp up- and downstream of the SNP (to avoid IUPAC nucleotides and to check for different mappings) and I've found differences:

Read Accno | Mapping Status | Mapped Accuracy(%) | % of Read Mapped | Ref Accno | Ref Start | Ref Stop | Ref Strand |
ADR_F. | Unmapped |
ATROP_F. | Unmapped |
CITOCHROME-C_F. | Full | 93 | 100 | clc_genomicrefv1_contig152775 | 2837 | 2917 | + |
CITRATO3. | Unmapped |
CITRATO5. | Unmapped |
GNRH3-1_F. | Unmapped |
HGFL_R. | Unmapped |
HIF2-3_F. | Unmapped |
INTFGP_F. | Unmapped |
INTRAOPCO2_F. | Unmapped |
L12_F. | Full | 99 | 100 | clc_genomicrefv1_contig102970 | 4717 | 4797 | + |
LACDB_F. | Unmapped |
LYS2_F. | Unmapped |
MTF_F. | Unmapped |
S7-2_F. | Full | 96 | 100 | clc_genomicrefv1_contig88520 | 6304 | 6384 | + |

Now all the sequences are accepted and I get one more mapped sequence! Do you know what is happening? I know that the sequences are too long in the first case, but I also thought that the mapper would "split" the sequences into smaller parts, using the seed value. Am I wrong? This would definitely clear up some of my doubts...

Thanks for helping with these silly questions. I'm new to this field and I want to learn.
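The two checks described in this post (looking for ambiguity codes near the trim point, and cutting out 40 bp on each side of the SNP) could be sketched in Python as below. This is an illustration under stated assumptions, not what gsMapper does internally: the helper names `ambiguous_positions` and `snp_window` are hypothetical, positions are taken to be 1-based, and the example sequence is synthetic.

```python
# IUPAC nucleotide codes other than A, C, G, T (N included).
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")

def ambiguous_positions(seq):
    """Return the 1-based positions of ambiguity codes in seq."""
    return [i for i, base in enumerate(seq.upper(), start=1)
            if base in IUPAC_AMBIGUOUS]

def snp_window(seq, snp_pos, flank=40):
    """Return the SNP base plus up to `flank` bases on each side.

    snp_pos is 1-based; the window is clipped at the sequence ends.
    """
    if not 1 <= snp_pos <= len(seq):
        raise ValueError("snp_pos lies outside the sequence")
    start = max(0, snp_pos - 1 - flank)
    end = min(len(seq), snp_pos + flank)
    return seq[start:end]

# Synthetic 120 bp sequence with a Y (C/T) at position 61.
seq = "A" * 60 + "Y" + "C" * 59
positions = ambiguous_positions(seq)
window = snp_window(seq, positions[0], flank=40)
print(positions, len(window))  # [61] 81
```

A window of 40 + 1 + 40 = 81 bp is consistent with the 81 bp reference spans in the second table (e.g. 2837 to 2917).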
#5 |
Senior Member
Location: Berlin, DE Join Date: May 2008
Posts: 628
|
These are pre-assembled contigs, not reads. gsMapper won't split large sequences into smaller chunks.
What are you mapping against? A finished (contiguous) or a draft (multi-contig) genome? Why don't you map your reads directly against your reference genome instead of pre-assembling and mapping afterwards? Maybe you should give blast or blat a try (you don't have too many contigs) for mapping/positioning your contigs on your reference.
my 2p, Sven
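If BLAST+ is used for this, its tabular output (`-outfmt 6`) has twelve default columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. A minimal Python sketch for keeping the best-scoring hit per query could look like this. The function name `best_hits` and the two example rows (`seqA`, `contig1`, `contig2` and all numbers) are invented purely for illustration.

```python
def best_hits(lines):
    """Keep the highest-bitscore hit per query from BLAST tabular output."""
    best = {}
    for line in lines:
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hit = {
            "query": f[0],
            "subject": f[1],
            "pident": float(f[2]),
            "length": int(f[3]),
            "sstart": int(f[8]),
            "send": int(f[9]),
            "evalue": float(f[10]),
            "bitscore": float(f[11]),
        }
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return best

# Invented example rows, NOT taken from the thread.
rows = [
    "seqA\tcontig1\t94.0\t550\t30\t3\t1\t550\t4336\t4884\t1e-50\t800",
    "seqA\tcontig2\t88.0\t120\t14\t0\t10\t129\t77\t196\t1e-10\t150",
]
hits = best_hits(rows)
print(hits["seqA"]["subject"], hits["seqA"]["sstart"], hits["seqA"]["send"])
```

With only a handful of sequences, reading the table by eye works just as well; the script is mainly useful once the number of queries grows.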
#6 |
Junior Member
Location: Spain Join Date: Jan 2011
Posts: 3
|
Sorry for the misinformation, sklages.
What I'm trying to map are Sanger sequences, not NGS reads, to a draft genome (constructed by HiSeq sequencing + assembly). So, as I suspected, the reads are too long to map and are definitely not split. Now my approach of using the 40 bp up- and downstream makes more sense. I'll also try blat, but I have to install it and I have no experience with it. Do you think it is worth it after doing the 80 bp approach, taking into account that my only objective is to check whether my sequences are in the reference genome? Can you give me your address so I can send you some cookies for the help? :P
#7 |
Senior Member
Location: Berlin, DE Join Date: May 2008
Posts: 628
|
OK, Sanger reads on CLC-assembled contigs ... as you don't have any NGS reads, there is no need to use gsMapper. 'Blast'/'Blast+' should do the job for your handful of sequences; have a look at NCBI's software archive. You could also use 'blat' (have a look at UCSC) or even CLC Genomics WB, if you have access to that software (which is commercial).
Do you have a usable N50 size for your genome assembly? cheers, Sven
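For reference, N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A small Python sketch, with a hypothetical function name and made-up contig lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half:
            return length
    raise ValueError("no contig lengths given")

# Made-up lengths: total 300, half 150; 100 + 80 = 180 >= 150.
print(n50([100, 80, 60, 40, 20]))  # 80
```

If the N50 of the draft assembly is much smaller than the Sanger sequences being placed (146 to 645 bp here), many of them may span contig ends and only align partially.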