  • Mosaik Aligning with Solexa Reads

    I've used the Mosaik tools quite successfully in the past, but I'm now having an issue that I'd like to ask for help on.

    I have _sequence.txt files which were sent to me from another lab for analysis, so I don't have access to the raw data or QC data. The reads are from an enriched library (Nimblegen, I believe; Exons from a region of interest).

    For a given lane of Solexa data (36 nt, not paired-end), I have 11.6 M reads. I use MosaikBuild to create the .dat file, and then I align these reads to an artificial sequence that represents all the exons from the enrichment region. My aligner parameters are:

    MosaikAligner -in lane4.dat -ia chip.dat -out lane4.align -hs 15 -mm 3 -p 7 -a all -m all -mhp 100

    The resulting output is the conundrum. Why would I be losing almost 60% of the reads to hash failure, and another 30% to filtering? I'm losing about 90% of my reads in this step. I've tried several samples from 2 different Solexa runs and gotten the same result.

    All thoughts and comments are welcome !!

    Jim

    *******************
    - Using the following alignment algorithm: all positions
    - Using the following alignment mode: aligning reads to all possible locations
    - Using a maximum mismatch threshold of 3
    - Using a hash size of 15
    - Using 7 processors
    - Setting hash position threshold to 100

    Hashing reference sequence:
    100%[==========================================================================================] 621,565.7 ref bases/s in 5 s

    - loading reference sequence... finished.

    Aligning read library (11573312):
    100%[==============================================================================================] 12,524.9 reads/s in 15:24

    Alignment statistics:
    ===================================
    # failed hash: 6818036 (58.9 %)
    # filtered out: 3537110 (30.6 %)
    # unique: 343500 ( 3.0 %)
    # non-unique: 874666 ( 7.6 %)
    ---------------------------------------------
    total: 11573312
    total aligned: 1218166 (10.5 %)
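
As a quick sanity check on the numbers above, here is a minimal Python sketch that recomputes the percentages from the raw counts in the statistics block. The dictionary keys and the `percent` helper are just illustrative names; nothing here calls Mosaik itself.

```python
# Counts copied from the MosaikAligner "Alignment statistics" block.
counts = {
    "failed hash": 6818036,
    "filtered out": 3537110,
    "unique": 343500,
    "non-unique": 874666,
}
total = sum(counts.values())


def percent(n, total):
    """Return n as a percentage of total."""
    return 100.0 * n / total


for name, n in counts.items():
    print(f"{name:>12}: {n:>9} ({percent(n, total):4.1f} %)")

# Reads that produced at least one alignment vs. reads that were dropped.
aligned = counts["unique"] + counts["non-unique"]
lost = counts["failed hash"] + counts["filtered out"]
print(f"aligned: {aligned} ({percent(aligned, total):.1f} %)")
print(f"lost:    {lost} ({percent(lost, total):.1f} %)")
```

The four categories add up to the reported total, so the "90% lost" figure is simply hash failures plus filtered reads.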

  • #2
    Did you try a different hash size? Do you have Solexa's Summary report (Eland's error values)?
    --
    bioinfosm



    • #3
      I've played with Mosaik a bit, and while this is perhaps already known, a hash failure basically means that the read has no seeds/alignments at the hash length. Lowering the hash size will certainly get you fewer hash failures, but it seems there are larger issues here. "Filtered out" means that those reads did pass the hash step (they aligned) but have more than 3 errors at your read length.

      I'd try another aligner as well to make sure the reads aren't in bad shape.

      Good luck.
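
To illustrate why hash size matters, here is a toy Python sketch of exact k-mer seeding. It is a simplified model, not Mosaik's implementation: it assumes a read can only be seeded by a k-mer that matches the reference exactly, the reference string is made up, and miscalled bases are marked with `N` so they can never match. With a 36-nt read and a hash size of 15, two well-placed miscalls are enough to destroy every 15-mer, while a smaller hash size still finds a seed.

```python
def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference, k):
    """Map each k-mer in the reference to the positions where it starts."""
    index = {}
    for pos, kmer in enumerate(kmers(reference, k)):
        index.setdefault(kmer, []).append(pos)
    return index


def has_seed(read, index, k):
    """True if at least one k-mer of the read occurs in the index."""
    return any(kmer in index for kmer in kmers(read, k))


def corrupt(read, positions):
    """Replace the bases at the given positions with 'N' to mimic miscalls."""
    bases = list(read)
    for p in positions:
        bases[p] = "N"
    return "".join(bases)


# Made-up 60-base reference, purely for illustration.
reference = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGTATTGACCAGTCGGATACCTGAAGT"
read = reference[10:46]            # a perfect 36-nt read
bad_read = corrupt(read, [11, 23])  # same read with two miscalled bases

index15 = build_index(reference, 15)
index11 = build_index(reference, 11)

print(has_seed(read, index15, 15))      # perfect read has a 15-mer seed
print(has_seed(bad_read, index15, 15))  # every 15-mer contains a miscall
print(has_seed(bad_read, index11, 11))  # an intact 11-mer still exists
```

Under these assumptions, a read with only two errors (well inside the -mm 3 limit) would still be counted as a hash failure at hash size 15, which is one reason to retry with a smaller value.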



      • #4
        JimC, I am having a similar problem. Did you figure out why you were losing most of your reads?
        Thanks!



        • #5
          I have bacterial Illumina data in SCARF format, which I converted to FASTQ format. When I run the MosaikBuild command on it, I get an error like this:
          parsing paired-end/mate-pair FASTQ files:
          ERROR: The number of qualities (127) do not match the number of bases (75) in HWUSI-EAS1688_9337_FC618BE_1_1_1112_15990#CGATGT/1.

          Can anybody explain what this error means and how to get rid of it?
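
One way to narrow this down before running MosaikBuild is to scan the converted file for records whose sequence and quality lines differ in length. The sketch below is a minimal Python check, assuming plain 4-line FASTQ records with no line wrapping; the function name and the demo records are made up for illustration.

```python
import io


def find_length_mismatches(handle):
    """Return (read_name, n_bases, n_quals) for every 4-line FASTQ record
    whose sequence and quality strings differ in length."""
    problems = []
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        if len(seq) != len(qual):
            name = header.strip()[1:]  # drop the leading '@'
            problems.append((name, len(seq), len(qual)))
    return problems


# Tiny in-memory demo: the second record has two extra quality characters.
demo = io.StringIO(
    "@read_ok\nACGT\n+\nIIII\n"
    "@read_bad\nACGT\n+\nIIIIII\n"
)
print(find_length_mismatches(demo))
```

On a real file, pass an open file handle instead of the `io.StringIO` demo. If mismatches show up, the problem is in the SCARF-to-FASTQ conversion step rather than in Mosaik.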



          • #6
            Was this problem ever solved? I have converted Solexa reads using Maq sol2sanger, and it adds an extra quality value to each read. So then, of course, Mosaik complains that the number of qualities (66) does not match the number of bases (65).

            Can anyone help?

            Thank you alig
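
For reference, a Solexa-to-Sanger quality conversion works one character at a time, so the output quality string should have exactly the same length as the input. The Python sketch below is not Maq's code. It assumes Solexa-scaled qualities encoded with an ASCII offset of 64 and uses the standard relation Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), with the result written at offset 33. The function names are made up for illustration.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled quality score to the Phred scale."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


def solexa_string_to_sanger(solexa_quals):
    """Convert a Solexa quality string (ASCII offset 64) to a Sanger
    quality string (ASCII offset 33), one character per base."""
    return "".join(
        chr(int(round(solexa_to_phred(ord(c) - 64))) + 33)
        for c in solexa_quals
    )


solexa_quals = "hhhh@@;;"  # made-up example string
sanger_quals = solexa_string_to_sanger(solexa_quals)
print(sanger_quals, len(solexa_quals), len(sanger_quals))
```

If the converted file has one more quality character than bases per read, comparing the lengths of the sequence and quality lines before and after conversion should show at which step the extra character appears.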
