  • Checking reads for contamination

    Is there a tool to check for the source of contamination in sequencing reads? I am looking for something like BLAST, but one that summarizes the results across many reads.

    For example, I have a FASTQ that is supposed to be human. Only 50% of the reads align to human. Where are the other reads coming from?
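To put a number on the "only 50% align" observation, here is a minimal sketch. It assumes the reads were aligned to human and written as uncompressed SAM. The hypothetical `unmapped_fraction` shell function counts primary records (secondary and supplementary alignments skipped) that have the unmapped flag (0x4) set. If samtools is installed, `samtools flagstat` reports comparable totals.

```shell
#!/usr/bin/env bash
# unmapped_fraction (hypothetical helper): report how many primary reads in a SAM file are unmapped.
# Assumes uncompressed SAM as an argument or on stdin; for BAM, pipe in "samtools view -h file.bam".
unmapped_fraction() {
  awk -F'\t' '
    /^@/ { next }                                   # skip SAM header lines
    {
      flag = $2
      # skip secondary (0x100) and supplementary (0x800) records so each read counts once
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next
      total++
      if (int(flag / 4) % 2) unmapped++             # 0x4 = read unmapped
    }
    END {
      if (total == 0) { print "no primary records"; exit 1 }
      printf "%d of %d reads unmapped (%.1f%%)\n", unmapped, total, 100 * unmapped / total
    }' "$@"
}
```

The flag bits are tested with integer division rather than gawk's `and()` so the sketch also runs under a plain POSIX awk.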

  • #2
    Maybe this: https://github.com/blaxterlab/blobology. It looks like a good tool for a quick look.

    • #3
      Originally posted by skbrimer:
      Maybe this: https://github.com/blaxterlab/blobology. It looks like a good tool for a quick look.
      It looks like it performs an ABySS assembly. That seems computationally intensive. More importantly, I am not sure how well it would do with dilute samples.

      • #4
        I suggest using BBSplit from BBMap with the human reference, then collecting the unmapped reads in a separate file for examination.
        Last edited by GenoMax; 06-22-2015, 03:58 PM.
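BBSplit can write the unmapped reads to their own file directly. If your aligner produces SAM instead, the hypothetical `sam_unmapped_to_fastq` function below is a rough sketch of the same "collect the unmapped reads" step: it keeps primary records flagged unmapped (0x4) and prints them as FASTQ. It assumes uncompressed SAM and does not reverse-complement records that also carry the 0x10 flag; `samtools fastq -f 4` handles that case properly.

```shell
#!/usr/bin/env bash
# sam_unmapped_to_fastq (hypothetical helper): write primary unmapped reads from SAM as FASTQ.
# Assumes uncompressed SAM; records flagged reverse (0x10) are NOT reverse-complemented here.
sam_unmapped_to_fastq() {
  awk -F'\t' '
    /^@/ { next }                                   # skip SAM header lines
    {
      flag = $2
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next   # skip secondary/supplementary
      if (int(flag / 4) % 2 == 0) next              # keep only unmapped reads (0x4)
      # SAM columns: 1 = read name, 10 = sequence, 11 = base qualities
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }' "$@"
}
```

For example, `sam_unmapped_to_fastq sample_vs_human.sam > unmapped.fq` (file names are placeholders).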

        • #5
          Both BBMap and BBDuk (and BBSplit) can output a file reporting the number and percentage of reads matching a given sequence, and they do so quickly even for large numbers of reads. We run all of our reads through BBDuk to screen for small synthetic contaminants (primers, spike-ins, vectors, etc.), and it does a nice job of quantifying their absolute abundance. However, it would run out of memory processing a reference as big as nt (I don't normally give BBDuk a reference bigger than about 1 Gbp). If you follow GenoMax's advice, just grab a handful (~1000) of the reads that don't map to human and BLAST them against nt; hopefully something will turn up.
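For the "grab ~1000 unmapped reads and BLAST them" step, here is a minimal sketch assuming plain four-line FASTQ records (no wrapped sequence lines). The hypothetical `fastq_head_to_fasta` function takes the first N reads and writes them as FASTA, keeping only the first word of each header as the ID. The first N reads of a file are not a random sample; `seqtk sample` is one way to draw a random subset instead.

```shell
#!/usr/bin/env bash
# fastq_head_to_fasta (hypothetical helper): convert the first N FASTQ records to FASTA.
# Usage: fastq_head_to_fasta reads.fq [N]   (N defaults to 1000)
# Assumes 4-line records: header, sequence, "+", qualities.
fastq_head_to_fasta() {
  local n="${2:-1000}"
  head -n $((4 * n)) "$1" |
    awk 'NR % 4 == 1 { print ">" substr($1, 2) }    # header: drop "@", keep first word
         NR % 4 == 2 { print }'                    # sequence line
}
```

The resulting file could then be searched with BLAST+, for example `blastn -query unmapped_1000.fa -db nt -remote -outfmt 6 > hits.tsv`; the file names here are placeholders.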

          • #6
            Check the barcodes of the reads that were not demultiplexed. Any read carrying a barcode you didn't use when making your libraries is contamination.
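One way to run this check, sketched under assumptions: the undetermined reads carry Casava 1.8-style headers in which the index sequence is the last colon-separated field of the header comment (e.g. `1:N:0:ACGTACGT`), and you have a text file listing the barcodes you actually used, one per line. The hypothetical `tally_unexpected_barcodes` function counts every barcode that is not in that list. Sequencing errors in your real barcodes will also appear here, typically as near-matches at low counts, so not every entry is contamination.

```shell
#!/usr/bin/env bash
# tally_unexpected_barcodes (hypothetical helper): count index sequences absent from an expected list.
# Usage: tally_unexpected_barcodes expected_barcodes.txt undetermined.fq
# Assumes headers like "@INSTR:RUN:FC:LANE:TILE:X:Y 1:N:0:ACGTACGT" and 4-line FASTQ records.
# Note: the expected-barcode file must not be empty (the NR == FNR idiom relies on it).
tally_unexpected_barcodes() {
  local expected="$1" fastq="$2"
  awk -v OFS='\t' '
    NR == FNR { want[$1] = 1; next }               # first file: load expected barcodes
    FNR % 4 == 1 {                                 # FASTQ header lines
      n = split($2, f, ":")
      bc = f[n]                                    # last field of the comment = index read
      if (!(bc in want)) count[bc]++
    }
    END { for (bc in count) print count[bc], bc }' "$expected" "$fastq" |
    sort -k1,1nr                                   # most frequent unexpected barcodes first
}
```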

            • #7
              In my sequencing class, the students take a cheek swab, do a Nextera prep, and get 10M reads. The first exercise is to see what is living in their mouths. They align the reads to the human reference with Novoalign, pull out the non-aligners, convert them to FASTA, and submit a blastn job to see which bacteria are there. Students report a huge increase in flossing frequency after seeing the typical results! The one-liner to find the non-aligners and make a FASTA file is:

              grep -w NM yourname_vs_hg19.align | head -500 | cut -f 3 | awk '{print ">read" NR "\n" $1}'
              This is for Novoalign, which reports 'NM' for non-aligners and puts the read sequence in column 3. It should be easy to modify for other aligners.

              As part of our population genotyping, we always check 1000 reads from each sample. This often explains discordant results (lots of reads but low depth at the loci, because most of the sample is something else!).
              Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
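To get the per-sample summary the original question asked for, one rough approach assumes BLAST+ tabular output (`-outfmt 6`, query ID in column 1 and subject ID in column 2) with the hits for each query listed best first. The hypothetical `top_hit_summary` function counts how many reads have each subject as their top hit. Subject IDs are accessions; grouping by organism would need taxonomy columns such as `sscinames`, which depend on the BLAST taxonomy database being installed.

```shell
#!/usr/bin/env bash
# top_hit_summary (hypothetical helper): tally the first-listed (best) hit per query
# from BLAST tabular output, most common subjects first.
# Assumes tab-separated columns: 1 = query ID, 2 = subject ID, hits ordered best first per query.
top_hit_summary() {
  awk -F'\t' '
    !seen[$1]++ { count[$2]++ }                    # only the first line for each query counts
    END { for (s in count) print count[s] "\t" s }' "$@" |
    sort -k1,1nr
}
```

For example, `top_hit_summary hits.tsv | head` would list the ten most common top hits in a sample.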
