  • short reads missed by aligners

    Is anyone looking into the No Match (NM) ELAND reads, i.e. reads that come off the Solexa but do not map to the reference?
    Are you running any other kind of contamination control, e.g. screening for E. coli?

    I was looking into running BLAT against the entire nt database, but would love to hear what people are using.

    sm
    --
    bioinfosm

  • #2
    Try using something like Velvet to assemble all the unaligned reads against each other, then BLAST the resulting contigs against nr. If they are crummy reads, they won't assemble.

    We tested an in-house clone collection, and I found a fair bit of E. coli contamination. I've also found vector-looking things in microbial samples, stuff like that. If your reference has a biggish deletion compared to what you really sequenced, you might find it this way.
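
The first step of that workflow, pulling the unaligned reads out so they can be handed to an assembler, might look like the sketch below. It is only an illustration: it assumes a tab-separated ELAND result file with the read name in column 1, the sequence in column 2 and the match code (NM for no match) in column 3, and the function name `nm_reads_to_fasta` and the toy reads are made up.

```python
import io

def nm_reads_to_fasta(eland_lines, out):
    """Write reads flagged NM (no match) to `out` in FASTA format.

    Assumed layout (tab-separated): read name, sequence, match code, ...
    Returns the number of reads written.
    """
    written = 0
    for line in eland_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, seq, code = fields[0], fields[1], fields[2]
        if code == "NM":
            # Read names may or may not already carry a leading '>'.
            out.write(">" + name.lstrip(">") + "\n" + seq + "\n")
            written += 1
    return written

# Toy input (made-up data): two mapped reads and one NM read.
example = [
    ">read_1\tACGTACGTACGT\tU0\t1\t0\t0\tchr1.fa\t100\tF\n",
    ">read_2\tTTTTGGGGCCCC\tNM\n",
    ">read_3\tGGGGAAAACCCC\tR1\t0\t3\t0\n",
]
buf = io.StringIO()
count = nm_reads_to_fasta(example, buf)
print(count)
print(buf.getvalue(), end="")
```

The resulting FASTA can then be given to whichever assembler you prefer, and the contigs to BLAST.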



    • #3
      One approach that takes a while but looks exhaustively at all the NMs: BLAT them against the genome of interest to kick out gapped hits, then BLAST what is left against nr to find contaminants. I was thinking of then taking the top couple of contaminants and looking at their matching hits to see whether there is any overlap, since reads from a contaminant may intersect with those mapped to the genome of interest. This might matter most for SNP calling.
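
To pick out "the top couple of contaminants" from the BLAST step, one option is to tally the best hit per read. The sketch below assumes BLAST tabular output (12 tab-separated columns: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the function name and the toy hits are hypothetical.

```python
from collections import Counter

def top_contaminants(blast_lines, n=3):
    """Rank subjects by how many reads have them as their best hit.

    The best hit of a read is the line with the highest bit score
    (column 12 of the assumed tabular layout).
    """
    best = {}  # read id -> (bit score, subject id)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    counts = Counter(subject for _, subject in best.values())
    return counts.most_common(n)

# Toy hits (made-up data): r1 hits two subjects, r2 and r3 one each.
hits = [
    "r1\tEcoli_K12\t100.0\t36\t0\t0\t1\t36\t500\t535\t1e-12\t71.9",
    "r1\tphiX174\t91.7\t36\t3\t0\t1\t36\t10\t45\t1e-05\t50.1",
    "r2\tEcoli_K12\t100.0\t36\t0\t0\t1\t36\t900\t935\t1e-12\t71.9",
    "r3\tphiX174\t100.0\t36\t0\t0\t1\t36\t200\t235\t1e-12\t71.9",
]
ranking = top_contaminants(hits)
print(ranking)  # [('Ecoli_K12', 2), ('phiX174', 1)]
```

The subjects at the top of the ranking are the ones worth checking against the reads that did map to the genome of interest.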



      • #4
        Here's a nice comparison of the various short-read aligners, including ELAND.

        http://massgenomics.wordpress.com/20...nd-and-others/



        • #5
          Thanks for your input...

          Edena and Velvet, two de novo assemblers for short-read data, gave such different outputs!

          Velvet gave 2 contigs that pointed to a fragment that was supposedly deleted and should not have been sequenced.

          Edena, on the other hand, gave 10 or so contigs, 100-120 bp long, that align perfectly to E. coli K-12!
          --
          bioinfosm



          • #6
            Reads that Eland fails to match are interesting: presumably they are not repeats, because Eland does report reads that match at multiple locations.
            I would say that a read containing gaps would probably be missed by Eland, so run these reads through a short-read aligner that can find gaps. I've been using novoalign (www.novocraft.com). It can find up to 7/8 gaps in a 36 bp read matched against a reference sequence, and it is fast on large references. I've even tested it on simulated data with mutation rates in excess of 15% and it still finds the reads. Use a very high threshold, e.g. -t 200, to find potentially all permutations for your read.
            I'd be interested to know how many more of your Eland NM reads you are able to match.
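
To see why a mismatch-only search drops reads with gaps, here is a toy comparison. It is not how Eland or novoalign work internally, just a brute-force illustration: `best_ungapped_mismatches` slides the read along the reference without gaps, while `best_gapped_edits` uses a small dynamic-programming table that also allows insertions and deletions. The sequences and function names are made up.

```python
def best_ungapped_mismatches(read, ref):
    """Fewest mismatches over all ungapped placements of read on ref."""
    best = len(read)
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        mism = sum(1 for a, b in zip(read, window) if a != b)
        best = min(best, mism)
    return best

def best_gapped_edits(read, ref):
    """Fewest edits (mismatch, insertion, deletion) needed to align the
    whole read to any substring of ref."""
    prev = [0] * (len(ref) + 1)  # the read may start anywhere in ref
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or mismatch
                         prev[j] + 1,         # extra base in the read
                         cur[j - 1] + 1)      # reference base skipped
        prev = cur
    return min(prev)  # the read may end anywhere in ref

# Toy reference and a read that lacks one reference base (index 8).
ref = "ACGTTGCATGCCAGTAGCTA"
read = ref[2:8] + ref[9:15]

print(best_ungapped_mismatches(read, ref))  # 5
print(best_gapped_edits(read, ref))         # 1
```

A single missing base costs five mismatches in the best ungapped placement here, well beyond a two-mismatch limit, yet only one edit once gaps are allowed.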



            • #7
              Just a note from my side:

              As you know from other threads, we can map from 10 bp onwards, with gaps and PMs. However, before trying to force the unmapped reads onto the reference genome, look at viral genomes, vectors etc.
              We found numerous perfect matches there. Especially when working on specific cell lines, check the history of that line, how it was immortalized etc. You'll be surprised how many good old retroviral friends you find!
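
A minimal sketch of that kind of screen, looking for perfect matches of unmapped reads in a panel of viral or vector sequences on either strand, could be as simple as the following. This is a naive substring search for illustration only (not the GMS), and the panel sequences, read names and function names are invented.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def screen_reads(reads, panel):
    """Return {read name: [panel entries containing the read exactly]}.

    Both the read and its reverse complement are tested, so hits on
    either strand are reported.
    """
    hits = {}
    for name, seq in reads.items():
        seq = seq.upper()
        rc = revcomp(seq)
        matched = [entry for entry, target in panel.items()
                   if seq in target or rc in target]
        if matched:
            hits[name] = matched
    return hits

# Toy panel (made-up sequences standing in for a vector and a virus).
panel = {
    "vector_toy": "GGATCCGAATTCAAGCTTGGTACC",
    "virus_toy": "ATGGCTAGCTAGGATCGATCGTTA",
}
reads = {
    "r1": panel["vector_toy"][6:16],          # forward-strand match
    "r2": revcomp(panel["virus_toy"][8:18]),  # reverse-strand match
    "r3": "CCCCCCCCCC",                       # matches nothing
}
hits = screen_reads(reads, panel)
print(hits)
```

For real data you would swap the substring test for a proper index or aligner, but the order of operations is the same: screen against the contaminant panel first, then worry about the reference.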

              Cheers

              Klaus



              • #8
                Interesting note. Have you also looked at whether the retroviral sequences can be remapped to human with mismatches, and whether that seems to be a source of background in alignments?



                • #9
                  Chipper,

                  More on that, with HEK cells, SV40 and adenovirus, is described in our paper.

                  Klaus



                  • #10
                    Kmay, I just read the Sultan paper, nice work.

                    However, I am a little confused because it says that reads were mapped with ELAND: "Illumina deep sequencing was used to generate 27-bp reads from replicate samples for each cell line. Reads were mapped to the human genome (hg18, NCBI build 36.1) using the Eland software, allowing up to two mismatches (see SOM). Of the total reads, 50% matched to unique genomic locations," (http://www.sciencemag.org/cgi/content/full/1160342/DC1)

                    And the actual read data is unavailable. So I'm assuming that you used the proprietary Genomatix mapper in a separate study? Where can we get this read data?



                    • #11
                      zee,

                      You are right. The original data were mapped with ELAND. In those days our GMS was still under development. Later we looked at the reads ELAND did not map and ran those against the viral genomes with our GMS. The actual read data are deposited at GEO.
