
  • I'm an eternal optimist :-)

    The date is fixed now.

    For future reference, if anyone has any interesting datasets which might be useful examples to put on the project page we're always interested in providing more of these.



    • Simon,

      Thank you for your efforts, this looks like a really terrific update and I am eager to explore it more. I have a couple of questions.

      I deduced that the primary configuration is done by editing $FASTQCHOME/Configuration/limits.txt. Is it possible to specify an alternate config file on the command line? This would be helpful if one wanted to use different QC tests/thresholds for different sequence types.

      The new single file HTML output is very handy. I noted that the default behavior is to output both this new single file report and the old multi-file report. Is there an option to only output the single file report?



      • Hi Simon,
        Thanks for all your work on the update! One feature which would be really nice is the ability to run fastqc on paired end data and to show the reports for each read next to each other in the output. Cheers!



        • Originally posted by kmcarr
          Simon,

          Thank you for your efforts, this looks like a really terrific update and I am eager to explore it more. I have a couple of questions.

          I deduced that the primary configuration is done by editing $FASTQCHOME/Configuration/limits.txt. Is it possible to specify an alternate config file on the command line? This would be helpful if one wanted to use different QC tests/thresholds for different sequence types.
          I thought I'd added that already, but looking at the code I'd only added the ability to use a different file to the back end and never linked up the options so that you could change this from the command line. I've just enabled this in the development branch and will push out a fix in the next release (which will be soon as there were a couple of other minor issues to fix).

          Originally posted by kmcarr
          The new single file HTML output is very handy. I noted that the default behavior is to output both this new single file report and the old multi-file report. Is there an option to only output the single file report?
          No, but the old format will now just sit inside the zip file unless you specifically ask for it to be unzipped. The reason for keeping the zip file is that the other data files which are generated need to be stored somewhere and I know many sites use these for automated monitoring. I think the only easy way to handle this would be to have a --nozip flag which would simply delete the zip file after the run was complete. I could do that but I'm always a bit wary of ever adding in code to delete files unless it's absolutely necessary.
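As an illustration of the automated monitoring mentioned above, here is a minimal Python sketch that reads per-module results straight out of the zip file without unpacking it. It assumes (this is not confirmed in the thread) that the archive contains a tab-separated `summary.txt` whose lines have the form status, module name, file name. The function names `read_fastqc_summary` and `failed_modules` are hypothetical helpers, not part of FastQC.

```python
import io
import zipfile


def read_fastqc_summary(zip_source):
    """Return {module name: status} from the summary.txt inside a FastQC zip.

    Assumption: summary.txt is tab-separated with lines of the form
    STATUS<TAB>module name<TAB>file name.
    """
    results = {}
    with zipfile.ZipFile(zip_source) as archive:
        # Find summary.txt regardless of the top-level folder name.
        names = [n for n in archive.namelist() if n.endswith("summary.txt")]
        if not names:
            raise FileNotFoundError("no summary.txt found in archive")
        with archive.open(names[0]) as handle:
            for raw in io.TextIOWrapper(handle, encoding="utf-8"):
                fields = raw.rstrip("\r\n").split("\t")
                if len(fields) >= 2:
                    results[fields[1]] = fields[0]
    return results


def failed_modules(summary):
    """List the modules whose status is FAIL, sorted by name."""
    return sorted(name for name, status in summary.items() if status == "FAIL")
```

`read_fastqc_summary` accepts either a path or an open binary file object, so a monitoring script could loop over every zip in an output folder and flag the runs where `failed_modules` returns a non-empty list.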



          • Originally posted by frozenlyse
            Hi Simon,
            Thanks for all your work on the update! One feature which would be really nice is the ability to run fastqc on paired end data and to show the reports for each read next to each other in the output. Cheers!
            This is on the list of features for the next major release - it was one step too far for this release which had already been held up for ages, but it's definitely something we'd like to see too.



            • I've just released FastQC v0.11.2 to address a few issues which were reported against the recent v0.11.1 release. These are:
              • Fixed incorrect warn/error values for the per sequence quality module
              • Added the ability to specify a custom limits file on the command line
              • Fixed some memory issues in the Kmer and per sequence quality modules
              • Fixed the naming of the folder extracted from the zip file
              • Fixed an error when using the --extract option
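For anyone scripting runs with different thresholds per sequence type, a rough Python sketch of how the custom limits file might be wired in is below. It only builds the argument list and does not run anything. The flag name `--limits` is an assumption here (it is not spelled out in this post), so check `fastqc --help` on your installed version; `--extract` is the option named above. The limits file names and the helper `fastqc_command` are hypothetical.

```python
import shlex


def fastqc_command(fastq_files, limits_file=None, outdir=None, extract=False):
    """Build an argument list for a FastQC run (the command is not executed)."""
    cmd = ["fastqc"]
    if limits_file is not None:
        # Assumed flag name; confirm against `fastqc --help`.
        cmd += ["--limits", str(limits_file)]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    if extract:
        cmd.append("--extract")
    cmd += [str(f) for f in fastq_files]
    return cmd


# Hypothetical limits files, one per library type.
LIMITS_BY_TYPE = {
    "rnaseq": "limits_rnaseq.txt",
    "amplicon": "limits_amplicon.txt",
}

cmd = fastqc_command(["sample.fastq.gz"], limits_file=LIMITS_BY_TYPE["rnaseq"])
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the flag names have been verified for your version.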


              If anyone discovers any other issues with this release please let me know.

              Thanks

              Simon.



              • Hey guys,

                I am having trouble understanding the "Sequence duplication levels" module. I already checked http://www.bioinformatics.babraham.a...Sequences.html

                As far as I understand, the module takes the first 200k sequences of my fastq file, then iterates over the entire fastq file and counts how often each of those first 200k sequences appears. The frequency distribution is then given by the blue line in the figure.

                Am I correct up to this point?

                Where I get stuck is the de-duplicated sequences (red line in the figure). Is the set reduced by excluding duplicated sequences, and the frequency distribution then calculated again? But then I would expect all sequences to have a duplication level of 1, so I think my assumption is incorrect.

                Perhaps somebody could shed some light on this.

                Greetings
                Mchicken



                • Originally posted by Mchicken
                  Hey guys,

                  I am having trouble understanding the "Sequence duplication levels" module. I already checked http://www.bioinformatics.babraham.a...Sequences.html

                  As far as I understand, the module takes the first 200k sequences of my fastq file, then iterates over the entire fastq file and counts how often each of those first 200k sequences appears. The frequency distribution is then given by the blue line in the figure.

                  Am I correct up to this point?

                  Where I get stuck is the de-duplicated sequences (red line in the figure). Is the set reduced by excluding duplicated sequences, and the frequency distribution then calculated again? But then I would expect all sequences to have a duplication level of 1, so I think my assumption is incorrect.

                  Perhaps somebody could shed some light on this.

                  Greetings
                  Mchicken
                  You might also want to read this which has a bit more explanation about the duplication plot.

                  Briefly though, the red line in the plot uses the duplication levels from the original data, but expresses the proportions of the library from the counts in the deduplicated data. The idea is that you can see the effect that deduplication had on the relative composition of your library.

                  If you had a lot of low level duplication then you would see that the two traces ended up being quite close to each other. If you have high level duplication (a small fraction of the library which is highly duplicated) then you would see a really large shift at the bottom end of the plot where the low level sequences would go from making up a small proportion of the original library, to a high proportion of the deduplicated sequences.

                  Does this make it any clearer?



                  • Hey Simon,

                    Thanks for your fast answer. It is slightly clearer to me, but I think I am still missing the important point.

                    For example, take one of the figures from your link:
                    http://proteo.me.uk/wp-content/uploa...uplication.png

                    I can see that 49.38% of my reads survive deduplication, so I think the red line is based on those 49.38%?

                    And now my problem:

                    The red line tells me that about 2% of the deduplicated library (49.38% of the raw library) is made up of sequences that occur more than ten times. I draw this conclusion because of the small peak in the red line at the >10 bin.

                    So how can sequences occur more than 10 times if I have deduplicated my library?

                    I hope you understand my problem.



                    • In the red line the duplication levels still come from the original data, so the set of sequences which underlie each point in both graphs is the same. The only thing which changes is the sequence counts which go to working out the proportions of the library (the y-axis). As you said it would be pointless doing the whole analysis again for deduplicated data as everything would only be present once.

                      You could for example say from looking at the graph you linked to that sequences with 10-50 copies made up around 15% of the original library, but that after deduplication those same sequences (of which there is now only one copy of each) make up around 2% of the deduplicated set.
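To make the arithmetic concrete, here is a small Python sketch of how the two traces relate. It is a simplified illustration and not FastQC's actual code: it ignores the binning of high duplication levels and any sampling or extrapolation the module performs, and the function name `duplication_traces` and the toy counts are made up for the example.

```python
def duplication_traces(distinct_by_level):
    """Compute both traces of the duplication plot from one set of counts.

    distinct_by_level maps a duplication level (copies seen in the original
    data) to the number of distinct sequences observed at that level.
    Returns (pct_of_total, pct_of_dedup, pct_remaining).
    """
    total_reads = sum(level * n for level, n in distinct_by_level.items())
    total_distinct = sum(distinct_by_level.values())
    if total_reads == 0:
        raise ValueError("no sequences supplied")

    pct_of_total = {}   # blue line: share of the original library
    pct_of_dedup = {}   # red line: share of the deduplicated library
    for level, n in sorted(distinct_by_level.items()):
        # Original library: each distinct sequence contributes `level` reads.
        pct_of_total[level] = 100.0 * level * n / total_reads
        # Deduplicated library: each distinct sequence contributes one read,
        # but it stays at the level it had in the original data.
        pct_of_dedup[level] = 100.0 * n / total_distinct

    pct_remaining = 100.0 * total_distinct / total_reads
    return pct_of_total, pct_of_dedup, pct_remaining


# Toy library: 800 unique sequences, 100 seen twice, 100 seen ten times.
blue, red, remaining = duplication_traces({1: 800, 2: 100, 10: 100})
print(blue)
print(red)
print(remaining)
```

In this toy library the sequences with ten copies account for 50% of the original reads but only 10% of the deduplicated set, and 50% of the reads remain after deduplication. The x-axis position (level 10) does not change between the two traces, only the y-axis proportion does.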



                      • Now I get it.

                        Thanks a lot for your explanations!
                        Last edited by Mchicken; 06-25-2014, 04:06 AM.



                        • Could you please update Linux apt-get with the latest version of FastQC? Thank you so much.



                          • Originally posted by ronton
                            Could you please update Linux apt-get with the latest version of FastQC? Thank you so much.
                            Are you using an Ubuntu/Debian distribution? It may be easier to upgrade your system to Bio-Linux 8 (which includes FastQC, though I don't remember whether it is the latest version). Bio-Linux 7, on the other hand, comes with FastQC v0.10.1 installed.

                            Alternatively, just download the new version of FastQC and run it from a terminal. I did that and then ran FastQC without options, which opens the GUI of the new version.



                            • Ubuntu, yes. I downloaded the newest version and ran it from the /FastQC folder. You can use apt-get install fastqc, but I am not 100% sure where all of the installation files go, so I can't easily 'upgrade' that way. Thank you.



                              • Originally posted by ronton
                                Could you please update Linux apt-get with the latest version of FastQC? Thank you so much.
                                If fastqc is in a Debian/Ubuntu apt repository then you should put in an upgrade request through your distribution's normal bug tracker. This isn't something which we control.

                                Manually installing fastqc is as simple as downloading and unzipping a zip file so that's a really easy option to get the latest version.
