  • GSMapper trimming

    Hello everyone,

    I have some SNP-containing sequences, obtained some months ago from a fish species, and we have now obtained a genome closely related to our fish. I'm using GSMapper to map them to that genome, but I don't know why the program drops 6 of our 15 sequences from the analysis. I've specified that I don't want a trimming step, so I can't understand why the program is doing this. The documentation didn't help either.

    It's a very silly question, but I can't find the solution. Can anyone with experience help?

    Thanks in advance!

  • #2
    Hi Peitx,

    The sequences may be of low quality and/or too short (<20 bp by default). Not all sequences will necessarily be used for mapping to the genome.

    Regards,



    • #3
      Have a look at 454ReadStatus.txt

      Read            Mapping   Mapped       % of Read  Ref    Ref        Ref        Ref
      Accno           Status    Accuracy(%)  Mapped     Accno  Start      Stop       Strand
      G5FF2WU01DTSD6  Full      95           100        chr2   227896723  227896852  -
      G5FF2WU01CKAXT  Full      97           100        chr10  73453619   73453688   +
      G5FF2WU01BP3ZV  Full      98           100        chr12  48373154   48373213   +
      G5FF2WU01CMIB1  Full      99           100        chr14  76948381   76948530   -
      G5FF2WU01ARMHW  TooShort
      G5FF2WU01EVYYN  Repeat
      G5FF2WU01EL8WA  Repeat
      [...]


      It should at least tell you why your reads are not mapped.
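
A quick way to summarise such a file is to count the reads per status. This is only a minimal sketch: it assumes the file is whitespace- or tab-separated with the two header lines shown above, and the function name `tally_mapping_status` is made up for illustration.

```python
from collections import Counter


def tally_mapping_status(lines):
    """Count reads per mapping status, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # The two header lines start with "Read" and "Accno" (assumption
        # based on the excerpt above); blank lines yield no fields.
        if len(fields) < 2 or fields[0] in ("Read", "Accno"):
            continue
        counts[fields[1]] += 1
    return counts


# A few lines copied from the excerpt above, used as sample input.
sample = """\
Read Mapping Mapped % of Read Ref Ref Ref
Accno Status Accuracy(%) Mapped Accno Start Stop Strand
G5FF2WU01DTSD6 Full 95 100 chr2 227896723 227896852 -
G5FF2WU01CKAXT Full 97 100 chr10 73453619 73453688 +
G5FF2WU01ARMHW TooShort
G5FF2WU01EVYYN Repeat
G5FF2WU01EL8WA Repeat
""".splitlines()

status_counts = tally_mapping_status(sample)
for status, n in sorted(status_counts.items()):
    print(status, n)
```

For a real run you would pass `open("454ReadStatus.txt")` instead of the sample lines.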

      cheers,
      Sven



      • #4
        Thanks both for the reply

        Ketan, I'm sure that the length is more than 20 bp (the minimum is 146, as you can see below). I don't have quality scores, but it seems that this is not a problem, because the program accepts the sequences (and I'm not interested in variant calling).


        Accno            Trimpoints Used  Used Trimmed Length  Orig Trimpoints  Orig Trimmed Length  Raw Length
        ADR_F.           1-146            146                  1-146            146                  146
        ATROP_F.         151-151          1                    1-341            341                  341
        CITOCHROME-C_F.  105-105          1                    1-280            280                  280
        CITRATO3.        42-42            1                    1-645            645                  645
        CITRATO5.        43-43            1                    1-551            551                  551
        GNRH3-1_F.       1-197            197                  1-197            197                  197
        HGFL_R.          1-558            558                  1-558            558                  558
        HIF2-3_F.        20-20            1                    1-575            575                  575
        INTFGP_F.        1-307            307                  1-307            307                  307
        INTRAOPCO2_F.    1-306            306                  1-306            306                  306
        L12_F.           1-551            551                  1-551            551                  551
        LACDB_F.         37-37            1                    1-368            368                  368
        LYS2_F.          1-591            591                  1-591            591                  591
        MTF_F.           1-636            636                  1-636            636                  636
        S7-2_F.          1-605            605                  1-605            605                  605
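
The affected sequences can be picked out of this table automatically by comparing the used trimmed length with the raw length. A minimal sketch, assuming six whitespace-separated fields per data row as above; `find_trimmed` is a hypothetical helper name, not part of any 454 tool.

```python
def find_trimmed(rows):
    """Return (accno, used_length, raw_length) for rows trimmed below raw length."""
    trimmed = []
    for row in rows:
        fields = row.split()
        # Data rows have exactly six fields with numeric lengths in
        # columns 3 and 6; anything else (e.g. the header) is skipped.
        if len(fields) != 6 or not (fields[2].isdigit() and fields[5].isdigit()):
            continue
        accno, used_len, raw_len = fields[0], int(fields[2]), int(fields[5])
        if used_len < raw_len:
            trimmed.append((accno, used_len, raw_len))
    return trimmed


# Four rows copied from the table above, plus its header line.
rows = [
    "Accno Trimpoints Used Used Trimmed Length Orig Trimpoints Orig Trimmed Length Raw Length",
    "ADR_F. 1-146 146 1-146 146 146",
    "ATROP_F. 151-151 1 1-341 341 341",
    "CITRATO3. 42-42 1 1-645 645 645",
    "GNRH3-1_F. 1-197 197 1-197 197 197",
]

trimmed = find_trimmed(rows)
for accno, used_len, raw_len in trimmed:
    print(f"{accno} used {used_len} of {raw_len} bp")
```

Run on the full table, this flags the six sequences whose used length is 1.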

        I've checked the positions where the trimming occurs, and in some cases I found an IUPAC ambiguity code (e.g. Y). In other sequences the problem is an N. In some sequences the cause is one and in others the other, so I can't pin down a single reason. I've been looking for this issue in the documentation, but without success...
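
To check this systematically, one could list every non-ACGT character with its 1-based position and compare those positions with the "Trimpoints Used" column. A minimal sketch; the function name and the short example sequence are invented for illustration and are not one of the real sequences.

```python
def ambiguous_positions(seq):
    """Return 1-based (position, base) pairs for every character other than A, C, G, T."""
    return [
        (pos, base)
        for pos, base in enumerate(seq.upper(), start=1)
        if base not in "ACGT"
    ]


# Hypothetical toy sequence containing one IUPAC code (Y) and one N.
example = "ACGTYACGNA"
hits = ambiguous_positions(example)
print(hits)
```

If the reported trim point of a sequence always coincides with one of these positions, that would support the idea that ambiguity codes trigger the trimming; the post itself does not confirm this.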

        sklages, this is my file:

        Read           Mapping   Mapped       % of Read  Ref                            Ref    Ref   Ref
        Accno          Status    Accuracy(%)  Mapped     Accno                          Start  Stop  Strand
        ADR_F.         Unmapped
        GNRH3-1_F.     Unmapped
        HGFL_R.        Repeat
        INTFGP_F.      Unmapped
        INTRAOPCO2_F.  Unmapped
        L12_F.         Partial   94           99         clc_genomicrefv1_contig102970  4336   4884  +
        LYS2_F.        Unmapped
        MTF_F.         Unmapped
        S7-2_F.        Full      94           100        clc_genomicrefv1_contig88520   6032   6633  +

        As you can see, most of the reads are unmapped, but my real problem is that some reads are trimmed and I don't know why.

        I tried mapping using only the 40 bp upstream and downstream of the SNP (to avoid IUPAC nucleotides and to check whether the mapping changes), and I found differences:

        ADR_F.           Unmapped
        ATROP_F.         Unmapped
        CITOCHROME-C_F.  Full      93  100  clc_genomicrefv1_contig152775  2837  2917  +
        CITRATO3.        Unmapped
        CITRATO5.        Unmapped
        GNRH3-1_F.       Unmapped
        HGFL_R.          Unmapped
        HIF2-3_F.        Unmapped
        INTFGP_F.        Unmapped
        INTRAOPCO2_F.    Unmapped
        L12_F.           Full      99  100  clc_genomicrefv1_contig102970  4717  4797  +
        LACDB_F.         Unmapped
        LYS2_F.          Unmapped
        MTF_F.           Unmapped
        S7-2_F.          Full      96  100  clc_genomicrefv1_contig88520   6304  6384  +

        Now all the sequences are accepted and I get one more mapped sequence! Do you know what is happening? I know that the sequences are too long in the first case, but I thought that the mapper would "split" the sequences into smaller parts using the seed value. Am I wrong? Knowing this would definitely clear up some of my doubts...

        Thanks for helping with these silly questions. I'm new to this field and I want to learn.
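
For reference, cutting out the 40 bp on each side of a SNP could be sketched as below. This assumes a 1-based SNP position and keeps the SNP base itself in the window (81 bp in total when the SNP is far enough from both ends); whether the SNP base was kept as an ambiguity code or replaced by one allele is not stated here. `snp_window` is a hypothetical name.

```python
def snp_window(seq, snp_pos, flank=40):
    """Return the SNP base plus up to `flank` bases on each side (snp_pos is 1-based)."""
    if not 1 <= snp_pos <= len(seq):
        raise ValueError("snp_pos is outside the sequence")
    start = max(0, snp_pos - 1 - flank)   # 0-based start, clipped at the sequence start
    end = min(len(seq), snp_pos + flank)  # exclusive end, clipped at the sequence end
    return seq[start:end]


# Toy sequence: 100 A's, a Y at position 101, then 100 C's (invented data).
toy = "A" * 100 + "Y" + "C" * 100
window = snp_window(toy, 101)
print(len(window), window[40])
```

The window is clipped rather than padded when the SNP lies closer than 40 bp to either end of the sequence.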



        • #5
          These are pre-assembled contigs, not reads. gsMapper won't split the large sequences into smaller chunks.
          What are you mapping against? A finished (contiguous) or a draft (multiple contigs) genome? Why don't you map your reads directly against your reference genome instead of pre-assembling and mapping afterwards?

          Maybe you should give blast or blat a try (you don't have too many contigs) for mapping/positioning your contigs on your reference.

          my 2p,
          Sven



          • #6
            Sorry for the misinformation, sklages.

            What I'm trying to map are Sanger sequences, not NGS reads, to a draft genome (constructed by HiSeq sequencing + assembly). So, as I suspected, the reads are too long for mapping and are definitely not split. Now my approach of using the 40 bp upstream and downstream makes more sense.

            I'll also try blat, but I have to install it and I have no experience with it. Do you think it is worth it after doing the 80 bp approach, given that my only objective is to find out whether my sequences are in the reference genome?

            Can you give me your address so I can send you some cookies for the help? :P



            • #7
              OK, Sanger reads on CLC-assembled contigs ... as you don't have any NGS reads, there is no need to use gsMapper. 'Blast'/'Blast+' should do the job for your handful of sequences; have a look at NCBI's software archive. You could also use 'blat' (have a look at UCSC) or even CLC Genomics WB, if you have access to that software (which is commercial).
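
As a rough idea of the BLAST+ route, the two steps (build a nucleotide database from the assembly, then search the Sanger sequences against it) could be wrapped like this. It is only a sketch: the file names, the database name and the e-value cutoff are placeholders, and it assumes `makeblastdb` and `blastn` from BLAST+ are installed and on the PATH.

```python
import shutil
import subprocess


def blast_commands(reference_fasta, query_fasta, db_name="refdb", out_tsv="hits.tsv"):
    """Build the makeblastdb and blastn command lines (file names are placeholders)."""
    make_db = ["makeblastdb", "-in", reference_fasta, "-dbtype", "nucl", "-out", db_name]
    search = [
        "blastn", "-query", query_fasta, "-db", db_name,
        "-outfmt", "6",       # tabular output, one line per hit
        "-evalue", "1e-10",   # arbitrary cutoff chosen for illustration
        "-out", out_tsv,
    ]
    return make_db, search


def run_blast(reference_fasta, query_fasta):
    """Run both steps, failing early if BLAST+ is not installed."""
    if shutil.which("makeblastdb") is None or shutil.which("blastn") is None:
        raise RuntimeError("BLAST+ executables not found on PATH")
    for cmd in blast_commands(reference_fasta, query_fasta):
        subprocess.run(cmd, check=True)


# Only print the commands here; call run_blast(...) to actually execute them.
make_db, search = blast_commands("draft_genome.fasta", "sanger_seqs.fasta")
print(" ".join(make_db))
print(" ".join(search))
```

The tabular output then lists, for each query sequence, the contig it hits together with percent identity and coordinates.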

              Do you have a usable N50 size of your genome assembly?
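
In case it helps, N50 is the contig length at which the contigs of that length or longer together cover at least half of the total assembly. A minimal sketch, assuming you already have the contig lengths as a list of integers (the toy numbers below are invented):

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

A small N50 relative to the length of the Sanger sequences would mean many of them may span contig ends, which could partly explain unmapped or partial hits.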

              cheers, Sven
