
  • Need help for FastQC results. Thanks!!

    Hi All,

    My data (paired-end reads) come from an RNA-seq run on an Illumina GA II. There are four data files: 5_1, 5_2, 6_1 and 6_2. 5_1 and 5_2 are the paired-end reads from Lane 5 of the flowcell, and 6_1 and 6_2 are the pair from Lane 6. On the raw data, the FastQC results within each pair (5_1 vs 5_2; 6_1 vs 6_2) look similar.

    But after quality trimming and adapter trimming (I used the same adapters for both files in this step), the FastQC results of 5_1 and 5_2 look different, especially in the modules "Per Base Sequence Content" and "Overrepresented Kmers". 6_1 vs 6_2 have the same problem. Because they are paired-end reads and were trimmed the same way, I would expect the FastQC results to look similar as well. Could anybody suggest an explanation? Any reply will be greatly appreciated.
    Last edited by byou678; 08-22-2011, 06:51 AM.

  • #2
    Without seeing your actual data it's really difficult to make any sensible suggestions as to what might be different. If you're seeing differences in the sequence content plots then these will bias the results of the Kmer plots. If you could post 2 of your sequence content plots which look different we might be able to offer more concrete suggestions.



    • #3
      Any other ideas? Thanks in advance.



      • #4
        [IMG]C:\Users\whittier.2\Desktop[/IMG]
        Last edited by byou678; 08-23-2011, 05:44 AM.





          • #6
            The links you tried to post point to a file on your desktop which we can't see. You either need to put the images on a public facing webserver, or add them as attachments to your post. To add an attachment go into the 'advanced' posting options and click on the paper clip icon.



            • #7
              Thanks simonandrews!

              The picture of "Per Base Sequence Content" of 5_1 is in the attachment.



              • #8
                The picture of "Per Base Sequence Content" of 5_2 is in the following attachment. Please take a look, compare it to 5_1 and give me some ideas. Thanks!


                • #9
                  The Picture of "Overrepresented Kmers" of 5_1 is in the attachment.


                  • #10
                    The picture of "Overrepresented Kmers" of 5_2 is in the attachment.

                    All four pictures above are FastQC results after quality trimming and adapter trimming.
                    Last edited by byou678; 08-23-2011, 06:42 AM.



                    • #11
                      It looks like in the 5_1 sample you are reading through your diverse insert sequence into some kind of adapter. You can actually read part of the adapter sequence from the Kmer plot: it starts with GAGCGGTCA. I've had a quick look but I couldn't match that to any of the standard Illumina adapter sequences (it does occur in pBT-6, if that rings any bells), so you should check whether it matches any of the primers or vectors you used during your library construction.

                      If you can figure out the source of the sequence and you're happy the rest of your library is OK then you can rerun the adapter trimmer with this new sequence to remove the additional sequence and your libraries should look more like each other again.
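
One way to run that check is a plain substring search on both strands. The sketch below is only an illustration: the function names are made up for this example, and the candidate list uses the four adapter sequences byou678 lists in post #12 of this thread. Any primer or vector sequences from the library prep would be added to the same dictionary.

```python
# Minimal sketch: look for a k-mer (and its reverse complement) in a set of
# candidate sequences. Names here are illustrative, not from any library.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def find_kmer(kmer, candidates):
    """Return (name, strand, position) for every occurrence of kmer."""
    hits = []
    kmer = kmer.upper()
    rc = reverse_complement(kmer)
    for name, seq in candidates.items():
        seq = seq.upper()
        for strand, query in (("+", kmer), ("-", rc)):
            pos = seq.find(query)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = seq.find(query, pos + 1)
    return hits


# Adapter sequences as listed in post #12 of this thread.
ADAPTERS = {
    "adapter_1": "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "adapter_2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "adapter_3": "GACGGCATACGAGCTCTTCCGATCT",
    "adapter_4": "AGATCGGAAGAGCTCGTATGCCGTC",
}

if __name__ == "__main__":
    print(find_kmer("GAGCGGTCA", ADAPTERS))
```

For the four adapters above the search returns an empty list, which agrees with the observation that GAGCGGTCA is not part of any of them.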



                      • #12
                        Thanks again.

                        The adapter sequences are as below:
                        5' TACACTCTTTCCCTACACGACGCTCTTCCGATCT
                        5' AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
                        5' GACGGCATACGAGCTCTTCCGATCT
                        5' AGATCGGAAGAGCTCGTATGCCGTC

                        I use "Trim.pl" to do the quality trimming; its options are shown below.

                        Options:
                        --type <num> 0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0
                        --qual-threshold <num> quality threshold for trimming, default 20
                        --length-threshold <num> length threshold for trimming, default 20
                        --qual-type <num> 0=sanger qualities, 1=illumina qualities pipeline>=1.3, 2=illumina qualities pipeline<1.3. Default 0.
                        --pair1 <paired end input filename> fastq, paired end file. Must have same number of records as pair2. Required.
                        --pair2 <paired end input filename> fastq, paired end file. Must have same number of records as pair1. Required.
                        --outpair1 <paired end output file> Required.
                        --outpair2 <paired end output file> Required.
                        --single <single end output file> Required.


                        I chose the options appropriate for my data to run the quality trimming: --type 2 (windowed adaptive trimming), --qual-type 1 (illumina qualities pipeline>=1.3), and the default values for --qual-threshold and --length-threshold.
                        I put the corresponding input file names after --pair1 and --pair2, and named the output files with --outpair1, --outpair2 and --single.

                        I chose "cutadapt" for adapter trimming and ran the following command in the Terminal on the 5.1 file (the 5.1 file after quality trimming). The output is saved in the file "5.1_adaptortrim.fastq".
                        $ cutadapt -b TACACTCTTTCCCTACACGACGCTCTTCCGATCT -b AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -b GACGGCATACGAGCTCTTCCGATCT -b AGATCGGAAGAGCTCGTATGCCGTC 5.1_trimmed.fastq > 5.1_adaptortrim.fastq

                        Any more suggestions? Many thanks!!





                        • #13
                          So the problem is that you have a sequence in your library which isn't one of the adapters you passed to cutadapt. I can't immediately see where it's come from, but since cutadapt didn't know about it, it didn't remove it, and your trimmed library is still biased. I'd suspect that if you looked at the size distribution of your two libraries after trimming you'd see that one has been trimmed significantly more than the other.
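
A quick way to compare the size distributions is to tally read lengths in each trimmed FASTQ file. This is a rough sketch that assumes strict four-line FASTQ records (no wrapped sequence lines); the function names are invented for this example.

```python
from collections import Counter
from itertools import islice


def read_lengths(fastq_lines):
    """Count read lengths from an iterable of FASTQ lines.

    Assumes exactly four lines per record, so the sequence is every
    fourth line starting from the second one.
    """
    return Counter(len(line.strip()) for line in islice(fastq_lines, 1, None, 4))


def summarize(counts):
    """Return (number of reads, mean read length) for a length Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    mean = sum(length * n for length, n in counts.items()) / total
    return total, mean


# Usage (the file name is the one given in post #12; run the same thing
# on the trimmed 5_2 file and compare the two tables):
#
# with open("5.1_adaptortrim.fastq") as handle:
#     counts = read_lengths(handle)
# for length in sorted(counts):
#     print(length, counts[length])
# print(summarize(counts))
```

If one file of the pair shows many more short reads than the other, that would support the idea that the two files were trimmed to different extents.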

                          You need to figure out as much of this mystery sequence as you can (either by finding the sequence in one of your primers or by looking at some of your sequences and seeing where the common sequence at the end stops). You can then pass this as an extra sequence to cutadapt which can remove it from your library.
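
The second approach (looking at where the common sequence stops) can be roughed out with a majority vote over the bases that follow the known seed. This is only a sketch: `extend_seed`, `min_fraction` and `min_reads` are names and thresholds made up for this example, and the thresholds would need tuning on real data.

```python
from collections import Counter


def extend_seed(reads, seed, min_fraction=0.8, min_reads=10):
    """Extend seed to the right while the reads containing it still agree.

    For every read that contains the seed, take the bases after its first
    occurrence. At each following position, add the most common base as
    long as at least min_reads reads cover that position and at least
    min_fraction of them carry that base.
    """
    tails = []
    for read in reads:
        pos = read.find(seed)
        if pos != -1:
            tails.append(read[pos + len(seed):])

    consensus = seed
    offset = 0
    while True:
        bases = Counter(tail[offset] for tail in tails if len(tail) > offset)
        support = sum(bases.values())
        if support < max(min_reads, 1):
            break  # too few reads reach this position
        base, count = bases.most_common(1)[0]
        if count / support < min_fraction:
            break  # reads disagree here, the common sequence has ended
        consensus += base
        offset += 1
    return consensus


# Usage: pass the sequence lines of the trimmed 5_1 file and the k-mer read
# off the Kmer plot, e.g. extend_seed(sequences, "GAGCGGTCA").
```

Because the mystery sequence sits at the 3' end of the reads, fewer and fewer reads cover each further position, so the extension stops at whichever comes first: disagreement between reads or lack of coverage. The resulting sequence could then be passed to cutadapt as an extra -b argument.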



                          • #14
                            byou678: Have you tried aligning the reads to your reference? Since this is an RNA-seq sample are we seeing just the standard "random primer effect" at the beginning of the read?
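
For context, RNA-seq libraries made with random hexamer priming commonly show uneven base composition over roughly the first dozen positions of each read, and this shows up in the "Per Base Sequence Content" plot. A rough way to look at it outside FastQC is to tabulate base fractions for the first positions of the reads. The sketch below is illustrative only; `base_composition` and `n_positions` are names chosen for this example.

```python
from collections import Counter


def base_composition(reads, n_positions=15):
    """Return, for each of the first n_positions, the fraction of A/C/G/T/N."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base.upper()] += 1

    table = []
    for position_counts in counts:
        total = sum(position_counts.values())
        table.append(
            {b: (position_counts[b] / total if total else 0.0) for b in "ACGTN"}
        )
    return table


# Usage: feed the sequence lines of a FASTQ file and print one row per
# position, for example:
#
# for i, row in enumerate(base_composition(sequences), start=1):
#     print(i, " ".join(f"{b}={row[b]:.2f}" for b in "ACGT"))
```

A bias confined to the first positions of both files of a pair would point towards the priming effect, whereas a bias towards the end of the reads in only one file fits better with read-through into an untrimmed adapter.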



                            • #15
                              Yes, I aligned them using BWA. Could you explain your second question in more detail? Thanks for your reply.



