  • strange FastQC kmer plot even after trimming

    Hi,
    I'm getting the attached strange FastQC k-mer plot even after adapter and quality trimming. The data is from a 400bp PE library sequenced on a GAII. I used Trimmomatic to trim the TruSeq adapter.
    Code:
    GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
    Many other posts suggest not worrying too much about things like this. However, I am coming back to it only after getting a highly fragmented de novo assembly of a large genome. I understand that a de novo assembly can be fragmented for many reasons, but I want to make sure I am supplying high-quality reads to the assembler, not to mention that the plot looks ugly.
    Thanks for any suggestions.

  • #2
    Oh, for de novo assembly you should definitely worry about that (if you were just mapping reads to a genome, it likely wouldn't matter). Those 5-mers are being generated from two dinucleotide repeats (just in different frames and on different strands). That is going to screw up your assembly if very many of them are infiltrating your reads. We can't tell how many from that plot, though, because it is only relative to the most abundant k-mer.

    Are you sure you put in the correct adapter for trimming? The TruSeq adapter alone is often not enough; you usually need a set of indexed adapters, PCR primers, etc. I generally give Trimmomatic a pretty long list of every adapter/primer set that was used in the whole group of library preps being sequenced, just to be sure. If you don't, you'll find adapter/primer sequence from all kinds of stuff in your assemblies.
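
To illustrate the point about frames and strands: a single dinucleotide repeat only ever yields two distinct 5-mers on one strand, plus two more on the opposite strand. A minimal Python sketch, assuming (AC)n as a made-up example repeat (the thread does not list the actual k-mers from the plot):

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k=5):
    """Return the set of distinct k-mers found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


repeat = "AC" * 10  # hypothetical dinucleotide repeat, not taken from the data
forward = kmers(repeat)           # the two reading frames on this strand
reverse = kmers(revcomp(repeat))  # the two frames on the opposite strand
print(sorted(forward))
print(sorted(reverse))
```

So four enriched 5-mers in a FastQC table can come from one repeat, and eight from two.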


    • #3
      Here are two k-mer plots from before (bottom) and after (top) aggressive trimming with Trimmomatic (including a quality trim) and overlapping with FLASH (do your 150bp reads overlap?). The CCCCC repeat in the top plot is very much lower than the spikes you see in the bottom plot. Here is the adapter file I went with too; as you can see, it was a bit of the kitchen sink.
      [Attached image: kmer_profiles.png]
      [Attached image: kmer_profiles_1.png]

      Adapters.txt


      • #4
        Hi Wallysb01,

        Thanks for your reply. I haven't explored the overlapping reads.
        I've used your adapters and it seems most of those kmers are still having fun out there.
        Also for your info, here is the trimmomatic command I used:
        Code:
        java -classpath trimmomatic-0.30.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 16 -phred33 ../lane2_NoIndex_L002_R1_001_val_1.fq ../lane2_NoIndex_L002_R2_001_val_2.fq paired21.fq unpaired21.fq paired22.fq unpaired22.fq ILLUMINACLIP:Adapters.txt:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:30 MINLEN:50


        • #5
          Eek, OK. Did the frequency drop much? You can tell from the table under that figure.

          Also, how big are your inserts? Can you even attempt overlapping the reads?

          And you may just want to trim off those first 10bp for your next assembly. That may help.

          Finally, what kind of coverage do you have?
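
The "trim off those first 10bp" suggestion can be sketched in a few lines of Python. This is only an illustration on a made-up record held as a 4-tuple (header, sequence, separator, qualities); the helper name `head_crop` is hypothetical. In practice, Trimmomatic's HEADCROP step should do the same thing as part of the trimming run.

```python
def head_crop(record, n=10):
    """Drop the first n bases and their qualities from a 4-line FASTQ record."""
    header, seq, plus, qual = record
    return (header, seq[n:], plus, qual[n:])


# Made-up read: 10 repeat bases followed by 10 ordinary bases.
record = ("@read1", "AC" * 5 + "GGATTCCAGT", "+", "I" * 10 + "F" * 10)
cropped = head_crop(record)
print(cropped[1])
print(cropped[3])
```

The sequence and quality strings must be cropped together, otherwise the record is no longer valid FASTQ.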


          • #6
            Nope, the frequency doesn't drop much. Reads are 150bp and the insert size is 400bp for two lanes and 700bp for another two lanes, hence not much chance of overlaps.
            Yes, I did trim off 10bp in both directions and it's almost the same. It seems like I am running out of options.
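
The arithmetic behind "not much chance of overlaps" can be written out as below. This sketch assumes "insert size" means the full fragment length covered by both mates, which is the usual convention but is not stated in the thread; `expected_overlap` is a hypothetical helper name.

```python
def expected_overlap(read_len, insert_size):
    """Bases shared by the two mates; a negative value is an unsequenced gap."""
    return 2 * read_len - insert_size


for insert_size in (400, 700):
    print(insert_size, expected_overlap(150, insert_size))
```

With 2x150bp reads, both libraries come out negative under that assumption: a gap of about 100bp for the 400bp library and about 400bp for the 700bp one, unless the real fragments are shorter than reported.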


            • #7
              You might try COPE (http://sourceforge.net/projects/coperead/). It can overlap reads using kmers, so the reads don't have to actually overlap; they just need to be close enough for high-frequency kmers to span the gap. It may work pretty well with 2x150bp reads, because you could increase the kmer size a little, assuming your coverage is pretty high too. And you can use the 700bp insert library to add to the kmer pool without attempting overlaps.

              With my shorter 170bp library, I found FLASH to work better, but that library really was that small, with very few fragments >190bp, so the kmer method didn't seem to help much. And while your library may look like it's 400bp, I've generally found libraries to be shorter than what sequencing cores say.


              • #8
                Thanks for your suggestions. It would be good to get longer reads through overlaps; however, I think I need to get rid of those funny k-mers first, don't I? I can't find a way to deal with them. Once I have quality data I can move on to the next step.


                • #9
                  Those repetitive kmers are really just dinucleotide repeats, so at kmer lengths around 21, for error correction and overlapping, they may not be a huge obstacle.

                  In fact, you could raise the kmer length to 10bp in FastQC to see if these sequences continue to be a problem. It may be that certain reads are just filled with them, and those could be removed with very strict dust filtering. Say you remove reads with a dust score above 30? There is really no reason to try to keep reads with so much very, very low-complexity sequence. Ideally you'd of course want to assemble low-complexity sequences, but in this case they may be artifacts that cause more problems than they are worth.

                  Prinseq can do dust filtering, if you want to give it a shot. And it will separate out the good and bad seqs for inspection.

                  After playing with prinseq, you might actually want to drop that score a little lower, 20-ish?
                  Last edited by Wallysb01; 08-01-2013, 10:27 PM.
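
For readers who want to see what a dust-style filter measures, here is a minimal Python sketch. It is a simplified, whole-read score based on repeated trinucleotides and scaled so that a homopolymer scores 100; it is an assumption-laden illustration, not PRINSEQ's implementation, which may differ (for example in windowing and scaling). The names `dust_score` and `split_by_dust` are hypothetical.

```python
from collections import Counter


def dust_score(seq):
    """Simplified DUST-style score in [0, 100]; higher means lower complexity."""
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    n = len(triplets)
    if n < 2:
        return 0.0  # too short to contain a repeated triplet
    counts = Counter(triplets)
    # Ordered pairs of identical triplets; equals n * (n - 1) for a homopolymer.
    repeated_pairs = sum(c * (c - 1) for c in counts.values())
    return 100.0 * repeated_pairs / (n * (n - 1))


def split_by_dust(reads, max_score=30.0):
    """Separate reads into (kept, removed) using the score threshold."""
    kept = [r for r in reads if dust_score(r) <= max_score]
    removed = [r for r in reads if dust_score(r) > max_score]
    return kept, removed


# Made-up reads: a homopolymer, a dinucleotide repeat, and a mixed sequence.
reads = ["A" * 64, "TA" * 32, "ACGTTGCAAGCTTAGGCATCGATCCGTAAGCTTGACCGTA"]
kept, removed = split_by_dust(reads)
print(len(kept), len(removed))
```

Under this scaling a pure dinucleotide repeat scores about 49, so a threshold of 30 would remove it while keeping the mixed read. Returning both lists mirrors the idea of inspecting the good and bad sequences separately before settling on a threshold.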
