  • byou678
    Member
    • Aug 2011
    • 52

    Need help for FastQC results. Thanks!!

    Hi All,

    My data (paired-end reads) come from an Illumina GA II RNA-seq run. There are four data files: 5_1, 5_2, 6_1 and 6_2. 5_1 and 5_2 are the paired-end reads from lane 5 of the flowcell, and 6_1 and 6_2 are the pair from lane 6. Within each pair (5_1 vs 5_2; 6_1 vs 6_2), the FastQC results on the raw data look similar.

    But after quality trimming and adapter trimming (I used the same adapters for every file), the FastQC results of 5_1 vs 5_2 look different, especially in the "Per Base Sequence Content" and "Overrepresented Kmers" modules. 6_1 vs 6_2 have the same problem. Because they are paired-end reads and were trimmed the same way, I would expect the FastQC results to still look similar. Could anybody suggest a reasonable explanation? Any reply will be greatly appreciated.
    Last edited by byou678; 08-22-2011, 06:51 AM.
  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    #2
    Without seeing your actual data it's really difficult to make any sensible suggestions as to what might be different. If you're seeing differences in the sequence content plots then these will bias the results of the Kmer plots. If you could post 2 of your sequence content plots which look different we might be able to offer more concrete suggestions.


    • byou678
      Member
      • Aug 2011
      • 52

      #3
      Any other ideas? Thanks in advance.


      • byou678
        Member
        • Aug 2011
        • 52

        #4
        [IMG]C:\Users\whittier.2\Desktop[/IMG]
        Last edited by byou678; 08-23-2011, 05:44 AM.


        • byou678
          Member
          • Aug 2011
          • 52

          #5


          • simonandrews
            Simon Andrews
            • May 2009
            • 870

            #6
            The links you tried to post point to a file on your desktop which we can't see. You either need to put the images on a public facing webserver, or add them as attachments to your post. To add an attachment go into the 'advanced' posting options and click on the paper clip icon.


            • byou678
              Member
              • Aug 2011
              • 52

              #7
              Thanks simonandrews!

              The picture of " Per Base Sequence Content " of 5_1 is in the attachment.

              Attached Files


              • byou678
                Member
                • Aug 2011
                • 52

                #8
                The picture of "Per Base Sequence Content" of 5_2 is in the following attachment. Please take a look, compare it to 5_1, and give me some ideas. Thanks!
                Attached Files


                • byou678
                  Member
                  • Aug 2011
                  • 52

                  #9
                  The Picture of "Overrepresented Kmers" of 5_1 is in the attachment.
                  Attached Files


                  • byou678
                    Member
                    • Aug 2011
                    • 52

                    #10
                    The Picture of "Overrepresented Kmers" of 5_2 is in the attachment.

                    All the four pictures above are FASTQC results after quality trimming and adapter trimming.
                    Attached Files
                    Last edited by byou678; 08-23-2011, 06:42 AM.


                    • simonandrews
                      Simon Andrews
                      • May 2009
                      • 870

                      #11
                      It looks like in the 5_1 sample you are reading through your diverse insert sequence into some kind of adapter. You can actually read part of the adapter sequence from the Kmer plot: it starts with GAGCGGTCA. I've had a quick look but I couldn't match that to any of the standard Illumina adapter sequences (it does occur in pBT-6 if that rings any bells), but you should check to see if it matches any of the primers or vectors you've used during your library construction.

                      If you can figure out the source of the sequence and you're happy the rest of your library is OK then you can rerun the adapter trimmer with this new sequence to remove the additional sequence and your libraries should look more like each other again.
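One way to do that check is a plain substring search of the Kmer against each candidate sequence and its reverse complement. The sketch below is only an illustration: the four sequences are the adapters quoted later in this thread (post #12), and the labels `adapter_1` to `adapter_4` and the function names are made up for the example. Any primers or vector sequences from your own library construction would be added to the same dictionary.

```python
# Minimal sketch: does a Kmer occur in any candidate sequence, on either strand?
# The dictionary keys are hypothetical labels; the sequences are the four
# adapters quoted later in this thread.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_kmer(kmer: str, candidates: dict) -> list:
    """Return (name, strand, position) for every candidate containing kmer."""
    hits = []
    for name, seq in candidates.items():
        for strand, s in (("forward", seq), ("reverse", reverse_complement(seq))):
            pos = s.find(kmer)
            if pos != -1:
                hits.append((name, strand, pos))
    return hits


ADAPTERS = {
    "adapter_1": "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "adapter_2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "adapter_3": "GACGGCATACGAGCTCTTCCGATCT",
    "adapter_4": "AGATCGGAAGAGCTCGTATGCCGTC",
}

hits = find_kmer("GAGCGGTCA", ADAPTERS)
print(hits if hits else "GAGCGGTCA not found in any candidate")
```

An exact search like this misses matches with sequencing errors, so a negative result is a hint rather than proof.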


                      • byou678
                        Member
                        • Aug 2011
                        • 52

                        #12
                        Thanks again.

                        The adapter sequences are as below:
                        5' TACACTCTTTCCCTACACGACGCTCTTCCGATCT
                        5' AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
                        5' GACGGCATACGAGCTCTTCCGATCT
                        5' AGATCGGAAGAGCTCGTATGCCGTC

                        I use "Trim.pl" for quality trimming. Its options are shown below:

                        Options:
                        --type <num> 0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0
                        --qual-threshold <num> quality threshold for trimming, default 20
                        --length-threshold <num> length threshold for trimming, default 20
                        --qual-type <num> 0=sanger qualities, 1=illumina qualities pipeline>=1.3, 2=illumina qualities pipeline<1.3. Default 0.
                        --pair1 <paired end input filename> fastq, paired end file. Must have same number of records as pair2. Required.
                        --pair2 <paired end input filename> fastq, paired end file. Must have same number of records as pair1. Required.
                        --outpair1 <paired end output file> Required.
                        --outpair2 <paired end output file> Required.
                        --single <single end output file> Required.


                        I chose the appropriate options and values to run quality trimming: --type 2 (windowed adaptive trimming), --qual-type 1 (illumina qualities pipeline>=1.3), and the default values for --qual-threshold and --length-threshold.
                        I put the corresponding sequence file names after --pair1 and --pair2, and named the output files for --outpair1, --outpair2 and --single.
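For readers unfamiliar with window-based quality trimming, here is a generic sketch of the idea. It is not Trim.pl's actual algorithm, which is not shown in this thread: the window size of 5, the cut rule, and the function name `window_trim` are all assumptions made for illustration. The offset of 64 corresponds to the Illumina 1.3+ quality encoding selected by --qual-type 1, and the threshold of 20 matches the default listed above.

```python
# Generic sliding-window quality trimming (illustrative only, not Trim.pl).
# Assumptions: Phred+64 encoding (Illumina pipeline >= 1.3), window of 5 bases,
# and the read is cut at the start of the first window whose mean quality
# falls below the threshold.

def window_trim(seq: str, qual: str, threshold: int = 20,
                window: int = 5, offset: int = 64):
    """Return (trimmed_seq, trimmed_qual) for one read."""
    scores = [ord(c) - offset for c in qual]
    keep = len(scores)
    for start in range(max(len(scores) - window + 1, 1)):
        chunk = scores[start:start + window]
        if chunk and sum(chunk) / len(chunk) < threshold:
            keep = start
            break
    return seq[:keep], qual[:keep]


# Toy read: six bases at Q40 ('h') followed by four bases at Q2 ('B').
trimmed_seq, trimmed_qual = window_trim("ACGTACGTAC", "hhhhhhBBBB")
print(trimmed_seq, len(trimmed_seq))
```

A read whose trimmed length falls below --length-threshold would then be dropped. Because each mate is trimmed on its own qualities, the two files of a pair can end up with different length profiles even when the same settings are used.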

                        I chose cutadapt for adapter trimming and ran the following command in the terminal on the 5.1 file (the 5.1 file after quality trimming). The output is saved in the file “5.1_adaptortrim.fastq”.
                        $ cutadapt -b TACACTCTTTCCCTACACGACGCTCTTCCGATCT -b AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -b GACGGCATACGAGCTCTTCCGATCT -b AGATCGGAAGAGCTCGTATGCCGTC 5.1_trimmed.fastq > 5.1_adaptortrim.fastq

                        Any more suggestions and many thanks!!




                        • simonandrews
                          Simon Andrews
                          • May 2009
                          • 870

                          #13
                          So the problem is that you have a sequence in your library which isn't one of the adapters you passed to cutadapt. I can't immediately see where it's come from, but since cutadapt didn't know about it, it didn't remove it, and your trimmed library is still biased. I'd suspect that if you looked at the size distribution of your two libraries after trimming you'll see that one has been trimmed significantly more than the other.
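A quick way to compare the size distributions is to count read lengths in each trimmed FASTQ file. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines) and uses tiny in-memory toy records so it runs on its own; the function names are made up for the example.

```python
# Minimal sketch: read-length distribution of a FASTQ file.
# Assumes exactly four lines per record. The toy records below are invented.
import io
from collections import Counter


def read_length_counts(handle) -> Counter:
    """Count how many reads have each length."""
    counts = Counter()
    for line_no, line in enumerate(handle):
        if line_no % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def mean_length(counts: Counter) -> float:
    """Mean read length from a length -> count table."""
    total = sum(counts.values())
    return sum(length * n for length, n in counts.items()) / total if total else 0.0


mate1 = io.StringIO("@r1/1\nACGTACGT\n+\nhhhhhhhh\n@r2/1\nACGT\n+\nhhhh\n")
mate2 = io.StringIO("@r1/2\nACGTACGTAC\n+\nhhhhhhhhhh\n@r2/2\nACGTACGTAC\n+\nhhhhhhhhhh\n")

counts1 = read_length_counts(mate1)
counts2 = read_length_counts(mate2)
for name, counts in (("mate 1", counts1), ("mate 2", counts2)):
    print(name, sorted(counts.items()), "mean length:", mean_length(counts))
```

On real data you would pass an open file instead of the toy records, for example `open("5.1_adaptortrim.fastq")` and the corresponding trimmed file for the other mate.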

                          You need to figure out as much of this mystery sequence as you can (either by finding the sequence in one of your primers or by looking at some of your sequences and seeing where the common sequence at the end stops). You can then pass this as an extra sequence to cutadapt which can remove it from your library.
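The second approach (looking at reads to see where the common sequence stops) can be sketched as follows: take the reads containing the known start GAGCGGTCA, line up whatever follows it, and extend a consensus base by base until the reads stop agreeing. The reads below are invented toy data, so the extension they produce is not the real contaminant; the 80% agreement cutoff, the minimum of 3 supporting reads, and the function name are likewise arbitrary assumptions.

```python
# Minimal sketch: extend a known seed by majority vote over the bases that
# follow it in the reads. Toy reads only; thresholds are arbitrary assumptions.
from collections import Counter


def extend_seed(reads, seed, min_fraction=0.8, min_reads=3):
    """Return seed plus the downstream bases on which the reads agree."""
    tails = [r[r.index(seed) + len(seed):] for r in reads if seed in r]
    consensus = seed
    pos = 0
    while True:
        column = Counter(t[pos] for t in tails if len(t) > pos)
        total = sum(column.values())
        if total < min_reads:
            break  # too few reads reach this position
        base, count = column.most_common(1)[0]
        if count / total < min_fraction:
            break  # reads diverge here, so the common sequence has ended
        consensus += base
        pos += 1
    return consensus


toy_reads = [
    "ACGTGAGCGGTCATTGACAA",
    "GGGAGCGGTCATTGACCT",
    "TTGAGCGGTCATTGACGG",
    "CAGAGCGGTCATTGACTC",
    "ACGTACGTACGT",  # no seed, ignored
]
print(extend_seed(toy_reads, "GAGCGGTCA"))
```

If the common sequence runs to the very end of the reads, the extension is limited by read length rather than by the true end of the contaminant, so the result is a lower bound on its length.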


                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            byou678: Have you tried aligning the reads to your reference? Since this is an RNA-seq sample are we seeing just the standard "random primer effect" at the beginning of the read?
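For context, the "random primer effect" refers to the skewed base composition that random-hexamer priming tends to leave in roughly the first dozen bases of RNA-seq reads. It can be told apart from adapter read-through by tabulating base fractions per position, which is essentially what the "Per Base Sequence Content" plot shows. The sketch below uses invented toy reads and a made-up function name.

```python
# Minimal sketch: per-position base fractions, as in a per-base sequence
# content plot. Toy reads only.
from collections import Counter


def per_base_content(reads, n_positions=15):
    """Return one {base: fraction} dict per read position."""
    table = []
    for pos in range(n_positions):
        column = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(column.values())
        table.append({b: (column[b] / total if total else 0.0) for b in "ACGT"})
    return table


toy_reads = ["ACGT", "ACGA", "AGGT", "ACCT"]
for pos, fractions in enumerate(per_base_content(toy_reads, n_positions=4), start=1):
    print(pos, fractions)
```

A priming bias shows up as a skew confined to the first positions of the read, whereas read-through into an adapter shows up as a skew that grows toward the 3' end.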


                            • byou678
                              Member
                              • Aug 2011
                              • 52

                              #15
                              Yes, I aligned them using BWA. Could you explain your second question in more detail? Thanks for your reply.


