  • Massive (viral?) contamination of Illumina reads

    Hello,

    We recently did a Paired-End run on the Illumina machine to sequence several bacterial genomes (one genome per lane). When we tried mapping our reads to a reference genome, none of the reads would map. After performing a de novo assembly, which resulted in ~2,000 contigs, we found that most of the sequences have some identity to viral genomes. Furthermore, the contigs are highly AT rich (average of 38% GC) and predicted open reading frames are short (300-825bp), all indicative of viral sequence.
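For anyone wanting to re-check a GC figure like the one above on their own contigs, here is a minimal sketch in awk. It assumes the contigs sit in a plain FASTA file (the name `contigs.fa` is hypothetical) and that N and other ambiguity codes should be left out of the denominator; the function name is made up for illustration.

```shell
# gc_percent: read FASTA on stdin, print the overall GC percentage (one decimal).
# Assumption: only A, C, G, T (either case) are counted; N and other
# ambiguity codes are ignored.
gc_percent() {
    awk '
        /^>/ { next }                       # skip header lines
        {
            seq = toupper($0)
            gc += gsub(/[GC]/, "", seq)     # gsub returns the number of replacements
            at += gsub(/[AT]/, "", seq)
        }
        END {
            if (gc + at == 0) { print "NA"; exit }
            printf "%.1f\n", 100 * gc / (gc + at)
        }
    '
}

# Usage (hypothetical file name):
#   gc_percent < contigs.fa
```

This gives one number for the whole assembly; a per-contig breakdown would need the counters reset at each header line.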

    We are fairly confident that the viral sequences did not contaminate the bacterial cultures we extracted DNA from, since the cells would have been lysed (and we wouldn't have extracted them). Furthermore, one of our lanes contained an environmental sample that had never been cultured. These observations make us think that something may have happened during the Illumina sample prep.

    Has anyone had similar contamination issues?

    Many thanks in advance.

  • #2
    It's not a spike-in PhiX control, is it?


    • #3
      I'm fairly confident it isn't a PhiX spike. When we do the PhiX controls they get their own lane. Not sure where it came from...


      • #4
        Viral and bacterial contamination is common

        I've dumped the unmapped reads from several diverse human sample BAM files, then run bwa and a home-brew aligner on these unmapped reads against bacterial and viral genomes from NCBI. The bacterial and viral hits (spot-confirmed by BLAST against NR) indicate that there is often some environmental contamination. Some is likely really in the organism (the adenoviruses and herpesviruses), but the mottled virus and its buddies are probably floating in the air. Also, there's no reason viruses can't attack bacteria (I think). I've heard that others find bacterial and viral contamination too.
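Pulling out the unmapped reads comes down to testing bit 0x4 of the SAM FLAG field. In practice `samtools view -f 4` does this directly on a BAM file. The sketch below shows the same test in plain awk on SAM text and writes FASTQ, so the logic can be followed without any tool installed. The function name and file names are placeholders, not part of the workflow described in the post.

```shell
# sam_unmapped_to_fastq: read SAM text on stdin, write FASTQ for every record
# whose FLAG has bit 0x4 (segment unmapped) set.
# Assumption: reads are emitted exactly as stored in the SEQ and QUAL columns;
# no mate suffix is added to the read name.
sam_unmapped_to_fastq() {
    awk -F '\t' '
        /^@/ { next }                    # skip SAM header lines
        int($2 / 4) % 2 == 1 {           # FLAG bit 0x4 is set
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }
    '
}

# Usage (requires samtools; file names are hypothetical):
#   samtools view sample.bam | sam_unmapped_to_fastq > unmapped.fastq
```

The integer-division test is used instead of a bitwise function because not every awk has one.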


        • #5
          Bacteria have their own viruses -- they won't get ours!

          If you'd put a sample in a public location, some folks (such as myself) might have fun playing with the data & perhaps some insight could arise.


          • #6
            bwa on viral genomes

            Originally posted by Richard Finney:
            I've dumped the unmapped reads from several, diverse human sample bam files. I've run bwa and a home-brew alignment on bacterial and viral genomes from NCBI on these unmapped reads.

            Could you please give a hint on how you selected these genomes? We want to check for viral infections in human cancer samples and wonder whether we should take all available genomes or a representative sample of them. I'm concerned that if we take all of them, bwa will find a lot of ambiguous alignments because of the high homology. Also, are there any special parameters to consider when using bwa, besides building the indexes with the option for short genomes?

            Thanks in advance,

            Barbara


            • #7
              aligning to the virome (notes from the underground)

              "I'm concerned that if we take all, because of the high homology bwa will find a lot of ambiguous alignments. Also, are there any special parameters to consider for using bwa, besides making indexes with the option for short genomes?"

              Yeah, you can get "repeat hits" to several similar viruses (or bacteria). A unique hit means you might have narrowed it down to the subspecies at best. There are probably unassembled cousins of the herpes virus you hit.
              Culling the list to a "representative" sampling sounds like a daunting task.
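One rough way to see how many hits per genome are "repeat hits" is to split the mapped reads by mapping quality. The sketch below assumes, as is my understanding of BWA's behaviour, that reads with several equally good placements are reported with MAPQ 0; the cutoff is a parameter in case that assumption doesn't hold for your aligner. The function name is hypothetical.

```shell
# tally_hits: read SAM text on stdin and print one tab-separated line per
# reference sequence: <reference> <confident> <ambiguous>.
# "ambiguous" means MAPQ below the cutoff given as $1.
# Assumption: the default cutoff of 1 treats only MAPQ 0 as ambiguous.
tally_hits() {
    awk -F '\t' -v min="${1:-1}" '
        /^@/ { next }                        # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }        # skip unmapped reads (FLAG bit 0x4)
        {
            refs[$3] = 1
            if ($5 + 0 >= min) good[$3]++
            else               amb[$3]++
        }
        END {
            for (r in refs) printf "%s\t%d\t%d\n", r, good[r], amb[r]
        }
    ' | sort
}

# Usage (requires samtools; file name is hypothetical):
#   samtools view reads_vs_viruses.bam | tally_hits
```

A genome whose hits are almost all in the ambiguous column is one where the reads can't tell it apart from its relatives.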

              Perhaps a "mappability" database for virae might be useful: make reads from all assembled viromes and run against all known viromes (should keep your beowulf cluster processors warm and toasty for a few days).
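The "make reads" half of that idea can be sketched as tiling each genome into fixed-length pieces and naming every piece after where it came from, so that after alignment you can see which pieces land somewhere else. This is only an illustration under simple assumptions: read length and step are free parameters, only the forward strand is tiled, and no sequencing errors are simulated. The function name is made up.

```shell
# tile_reads: read FASTA on stdin, write fixed-length "reads" as FASTA.
# Each read is named <record_id>:<1-based start>.
# $1 = read length (default 50), $2 = step between read starts (default 10).
# Assumption: records shorter than the read length produce no reads.
tile_reads() {
    awk -v len="${1:-50}" -v step="${2:-10}" '
        function emit(   i) {
            if (id == "") return
            for (i = 1; i + len - 1 <= length(seq); i += step)
                printf ">%s:%d\n%s\n", id, i, substr(seq, i, len)
        }
        BEGIN { if (step < 1) step = 1 }     # avoid a loop that never advances
        /^>/ {
            emit()                           # finish the previous record
            id = substr($1, 2)               # first word of the header, without ">"
            seq = ""
            next
        }
        { seq = seq $0 }                     # join wrapped sequence lines
        END { emit() }
    '
}

# Usage (file names are hypothetical):
#   tile_reads 50 10 < viruses.fa > virome_reads.fa
```

The tiled reads would then be aligned back against the full set of genomes, which is the part that keeps the cluster busy.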

              bwa index might complain if the reference is too big or too small for the chosen indexing algorithm. Just switch to the other -a option (is or bwtsw) and it will shut up.

              Here's a script to grab the viral genomes from NCBI:

              # accession prefixes a line must contain to be kept
              echo "NC_" > mustmatchthese.txt
              echo "AC_" >> mustmatchthese.txt

              echo "contacting http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=10239&type=5&name=Viruses"
              wget "http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=10239&type=5&name=Viruses" -O list_of_virae

              # old way: fgrep -f mustmatchthese.txt list_of_virae | grep accn | cut -b98-106 > viral.acc.lst
              # new way (NCBI's format changed): the sed command removes the HTML tags
              fgrep -f mustmatchthese.txt list_of_virae | sed -e 's/<[^>]*>//g' | fgrep -f mustmatchthese.txt > viral.acc.lst

              echo "note: viral.acc.lst should have more than 3000 lines or so"
              echo "viral.acc.lst has this many lines ..."
              wc -l viral.acc.lst

              # idfetch is an old-school program from the NCBI toolkit
              idfetch -c 1 -o viruses.sep24.2010 -G viral.acc.lst -t 5

              echo 'DONE !!!'

              ls -l viruses.sep24.2010

              # you'll then fix the FASTA record headers to shorten the names, and index using bwa
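The header-shortening step mentioned at the end of the script could look something like the sketch below. It assumes the pipe-delimited NCBI header style of the form `gi|<number>|ref|<accession>|` followed by a description, and keeps only the accession; any other header is cut down to its first word. The function name and the output file name are hypothetical.

```shell
# shorten_headers: read FASTA on stdin, rewrite each header to a short name.
# Assumption: headers look like ">gi|<number>|ref|<accession>| description";
# anything else is reduced to its first whitespace-delimited word.
shorten_headers() {
    awk '
        /^>/ {
            n = split(substr($1, 2), part, "|")
            if (n >= 4 && part[1] == "gi") print ">" part[4]   # keep the accession
            else                           print $1            # keep the first word
            next
        }
        { print }                                              # sequence lines unchanged
    '
}

# Usage (output file name is hypothetical; bwa must be installed):
#   shorten_headers < viruses.sep24.2010 > viruses.fa
#   bwa index -a is viruses.fa
```

Short names keep the reference column of the alignments readable and avoid spaces or pipes ending up in sequence names.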

              The wget fetch line for bacteria is
              wget "http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&type=0&name=Complete%20Bacteria"
              but it ultimately produces a FASTA "bacteriaome" that is too big for BWA and must be split in two.
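Splitting the big FASTA can be done on record boundaries so that no genome is cut in half. A minimal sketch follows. The size limit is a parameter because the thread doesn't state one, and the function name and file names are hypothetical.

```shell
# split_fasta: read FASTA on stdin and write <prefix>.1.fa, <prefix>.2.fa, ...
# $1 = approximate maximum number of bases per output file, $2 = output prefix.
# A new file is started at the first header seen after the limit is reached,
# so a file can overshoot the limit by part of one record.
split_fasta() {
    awk -v max="$1" -v prefix="$2" '
        BEGIN { part = 1; out = prefix "." part ".fa" }
        /^>/ {
            if (bases >= max) {              # current file is full: open the next
                close(out)
                part++
                out = prefix "." part ".fa"
                bases = 0
            }
            print > out
            next
        }
        { bases += length($0); print > out } # sequence line: count and copy
    '
}

# Usage (file names and limit are hypothetical):
#   split_fasta 2000000000 bacteria < bacteria.fa
#   bwa index -a bwtsw bacteria.1.fa
#   bwa index -a bwtsw bacteria.2.fa
```

Each part then gets its own index, and the reads are aligned against each part in turn.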
