  • #61
    Originally posted by debarryj View Post
    Many thanks! If I may ask one more noob question, what indicated to you that the MiSeq trimming was enabled? I have been handed this data with very little information and would like to know how to spot it.
    It's pretty easy - the read lengths vary. Because of the way Illumina sequencing works, all sequences (in the same direction at least) have the same 'raw' read length before trimming.
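The check described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming an uncompressed FASTQ with exactly four lines per record; the function names and the tiny inline example are made up for this sketch.

```python
from collections import Counter
import io

def read_length_counts(fastq_handle):
    """Count how many reads have each length in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts

def looks_trimmed(counts):
    """More than one distinct read length suggests trimming was already applied."""
    return len(counts) > 1

# Tiny made-up example: two reads of different lengths.
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
counts = read_length_counts(example)
print(dict(counts), looks_trimmed(counts))
```

For a real file, pass `open("reads.fastq")` (a hypothetical file name) instead of the `StringIO` object, and check R1 and R2 separately, since the two directions can legitimately have different raw lengths.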

    Comment


    • #62
      Hello everybody,

      I have been trying Trimmomatic for the first time over the past few days on my paired-end 100 bp read data. I am facing a problem: the 4 output files created are empty.

      I want to use Trimmomatic with these parameters:
      java -jar /Trimmomatic-0.30/trimmomatic-0.30.jar PE GLE7.R1.fastq GLE7.R2.fastq GLE7_paired.R1.fastq GLE7_unpaired.R1.fastq GLE7_paired.R2.fastq GLE7_unpaired.R2.fastq SLIDINGWINDOW:6:20 LEADING:20 TRAILING:20 AVGQUAL:20 MINLEN:36
      The "output log" says:
      TrimmomaticPE: Started with arguments: GLE7.R1.fastq GLE7.R2.fastq GLE7_paired.R1.fastq GLE7_unpaired.R1.fastq GLE7_paired.R2.fastq GLE7_unpaired.R2.fastq SLIDINGWINDOW:6:20 LEADING:20 TRAILING:20 AVGQUAL:20 MINLEN:36
      Input Read Pairs: 61314232 Both Surviving: 0 (0.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 61314232 (100.00%)
      TrimmomaticPE: Completed successfully
      I tried several combinations of the parameters. The output files were non-empty only when I used the MINLEN option alone.

      I don't think that my thresholds are too drastic for my data. Attached is the FastQC output.
      Do you have any idea of what is happening?

      Thank you in advance,
      Jane
      Attached Files
      Last edited by Jane M; 10-31-2013, 04:25 AM.

      Comment


      • #63
        Originally posted by Jane M View Post
        Hello everybody,

        I have been trying Trimmomatic for the first time over the past few days on my paired-end 100 bp read data. I am facing a problem: the 4 output files created are empty.

        I want to use Trimmomatic with these parameters:

        Trimmomatic, by default, assumes that FASTQ reads still use the very old ASCII phred+64 encoding for their Q-scores. Here is the quote from the Trimmomatic manual:
        If no quality score is specified, phred-64 is the default for historical reasons but is correct only for the older Illumina machines / pipeline versions. If you are using the Illumina HiSeq or MiSeq, you will need to add -phred33. This will be changed to an 'autodetected' quality score in a future version!
        Using that default, Trimmomatic believes all your base calls are crap (<Q20). You have to add '-phred33' to your command line to change this default behavior. E.g.

        Code:
        java -jar /Trimmomatic-0.30/trimmomatic-0.30.jar PE -phred33 GLE7.R1.fastq GLE7.R2.fastq GLE7_paired.R1.fastq GLE7_unpaired.R1.fastq GLE7_paired.R2.fastq GLE7_unpaired.R2.fastq SLIDINGWINDOW:6:20 LEADING:20 TRAILING:20 AVGQUAL:20 MINLEN:36
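To see why the wrong offset wipes out every read, it helps to decode a quality string both ways. The sketch below is a rough illustration, not part of Trimmomatic: `decode` and `guess_offset` are hypothetical helper names, the quality strings are made up, and the guessing rule is a simple heuristic (characters below ASCII 59 cannot occur in a +64 encoding; characters above 'J' are unusual in Illumina +33 output).

```python
def decode(qual, offset):
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(c) - offset for c in qual]

def guess_offset(qual_lines):
    """Heuristic guess of the ASCII offset (33 or 64); None if inconclusive."""
    codes = [ord(c) for line in qual_lines for c in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:  # below ';': impossible under a +64 encoding
        return 33
    if max(codes) > 74:  # above 'J': unusual for Illumina +33 output
        return 64
    return None

# Made-up quality strings in the style of HiSeq phred+33 output.
quals = ["CCCFFFFFHHHHHJJJ", "@@@DDDDD#######"]
print(guess_offset(quals))
print(decode("CJ", 33), decode("CJ", 64))
```

Read with offset 33, 'C' and 'J' are Q34 and Q41. Read with offset 64, the same characters become Q3 and Q10, so every base falls below the thresholds of 20 used in the command above and all reads are dropped.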

        Comment


        • #64
          Originally posted by kmcarr View Post
          Trimmomatic, by default, assumes that FASTQ reads still use the very old ASCII phred+64 encoding for their Q-scores. Here is the quote from the Trimmomatic manual:

          Using that default, Trimmomatic believes all your base calls are crap (<Q20). You have to add '-phred33' to your command line to change this default behavior. E.g.
          Yes, that was the problem! Thank you, it's working fine now.

          Now that the main problem is solved, there are 3 details I would like to discuss:

          - I read in this thread that Trimmomatic should be multi-threaded. I have not found an option to do that. Is it possible?

          - I don't clearly understand what keepBothReads is. Could you please explain it to me in other words?
          keepBothReads: After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read.
          - Last point: I intend to keep both paired and unpaired reads. I will use tophat2 for alignment, which seems to deal with unpaired reads.
          For gene fusion detection, I will use tophat2 --fusion-search. Do you know if it's a good idea to use unpaired reads for fusion detection? Should I set keepBothReads=true?

          Thank you,
          Jane

          Comment


          • #65
            Originally posted by Jane M View Post
            - I read in this thread that Trimmomatic should be multi-threaded. I have not found an option to do that. Is it possible?
            Use the '-threads <int>' option described in the manual or in the command line usage message shown when you run trimmomatic with just the '-h' (help) parameter

            Code:
            java -jar <path to trimmomatic.jar> PE -threads <int> -phred33 <inputFiles> <outputFiles> <trimmerParameters>...
            Replace <int> with the number of threads you wish to use. In my experience trimmomatic does not scale much beyond 3-4 threads.

            - I don't clearly understand what keepBothReads is. Could you please explain it to me in other words?
            Look at the figure at the top of page 5 of the Trimmomatic Manual. Part D of the figure shows the case where the insert (green) is shorter than the read length, so that you get read-through of the insert into the Illumina adapter at the 3' end (red). In such a case, with PE reads, read #2 will completely overlap read #1, as its reverse complement. No additional sequence information is provided by read #2. Trimmomatic's default behavior is to keep read #1 (after trimming the adapter (red) portion) as a singleton and discard read #2, since it is simply redundant information. The 'keepBothReads' setting (an optional trailing parameter of the ILLUMINACLIP step, not a separate command-line flag) changes the default, so that read 1 and read 2 are both kept as paired reads.
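The read-through case can be reproduced with a toy example. This is a minimal sketch under simplifying assumptions: error-free reads, a made-up 9 bp insert, and placeholder adapter strings (`ADAPTER_1` and `ADAPTER_2` are illustrative names, not the real TruSeq sequences).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

READ_LEN = 12
INSERT = "GATTACAGG"      # made-up insert, shorter than the read length
ADAPTER_1 = "AGATCGGAAG"  # placeholder adapter following the insert on read 1
ADAPTER_2 = "AGATCGGTCA"  # placeholder adapter following the insert on read 2

# Each read starts at one end of the insert and runs on into the adapter.
read1 = (INSERT + ADAPTER_1)[:READ_LEN]
read2 = (revcomp(INSERT) + ADAPTER_2)[:READ_LEN]

# After the adapter portion is clipped, only the insert remains in each read.
read1_clipped = read1[:len(INSERT)]
read2_clipped = read2[:len(INSERT)]

# Read 2 is exactly the reverse complement of read 1: no new information.
print(read1_clipped, read2_clipped, revcomp(read2_clipped) == read1_clipped)
```

Once the adapters are removed, the reverse read carries the same bases as the forward read, which is why dropping it by default loses nothing. Keeping both only matters if a downstream tool expects every read to still have its mate.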
            - Last point: I intend to keep both paired and unpaired reads. I will use tophat2 for alignment, which seems to deal with unpaired reads.
            For gene fusion detection, I will use tophat2 --fusion-search. Do you know if it's a good idea to use unpaired reads for fusion detection? Should I set keepBothReads=true?
            I'm not familiar with Tophat fusion or how it would deal with the case of completely overlapping reads so I can't comment.

            Comment


            • #66
              Originally posted by kmcarr View Post
              Use the '-threads <int>' option described in the manual or in the command line usage message shown when you run trimmomatic with just the '-h' (help) parameter


              Replace <int> with the number of threads you wish to use. In my experience trimmomatic does not scale much beyond 3-4 threads.
              Thank you kmcarr!

              Look at the figure at the top of page 5 of the Trimmomatic Manual. Part D of the figure shows the case where the insert (green) is shorter than the read length, so that you get read-through of the insert into the Illumina adapter at the 3' end (red). In such a case, with PE reads, read #2 will completely overlap read #1, as its reverse complement. No additional sequence information is provided by read #2. Trimmomatic's default behavior is to keep read #1 (after trimming the adapter (red) portion) as a singleton and discard read #2, since it is simply redundant information. The 'keepBothReads' setting (an optional trailing parameter of the ILLUMINACLIP step, not a separate command-line flag) changes the default, so that read 1 and read 2 are both kept as paired reads.
              It's clearer now, thank you.


              I'm not familiar with Tophat fusion or how it would deal with the case of completely overlapping reads so I can't comment.
              If someone has experience with detection of gene fusion and unpaired data, I would be interested in hearing it.

              Comment


              • #67
                Hi kmcarr,

                I hope you could also help me out

                I am having a similar problem here, no idea what is wrong with these reads

                Exception processing reads: HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/1 and HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/2
                java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 58
                ......

                Cheers

                Comment


                • #68
                  Trimmomatic is among the top ranking trimmers in the recent review. Congrats Tony!

                  Comment


                  • #69
                    Originally posted by luiscunhamx View Post
                    Hi kmcarr,

                    I hope you could also help me out

                    I am having a similar problem here, no idea what is wrong with these reads

                    Exception processing reads: HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/1 and HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/2
                    java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 58
                    ......

                    Cheers
                    Hi,

                    Could you tell me the command line used, including the version of trimmomatic, and also the remainder of the stack trace?

                    Tony.

                    Comment


                    • #70
                      Sorry Tony,

                      not sure why I did not post it in the first place

                      here goes the command:

                      java -classpath /media/scratch/sbilnc/appz/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE -phred33 w2_1_sufx.fastq w2_2_sufx.fastq trimmomatic_w1_1_sufx.fastq trimmomatic_w1_1_sufx_unpaired.fastq trimmomatic_w1_2_sufx.fastq trimmomatic_w1_2_sufx_unpaired.fastq ILLUMINACLIP:/media/scratch/sbilnc/appz/Trimmomatic-0.32/adapters/TruSeq2-PE.fa:2:30:10 LEADING:3 SLIDINGWINDOW:4:15 MINLEN:100

                      and the rest you can find it in text file attached,


                      Thanking in advance
                      Attached Files

                      Comment


                      • #71
                        Originally posted by luiscunhamx View Post
                        Sorry Tony,

                        not sure why I did not post it in the first place.
                        No problem.

                        Can you also post the reads:

                        HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/1 and HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/2

                        from the input files? It seems they trigger something, and even though it looks like a bug in Trimmomatic (or at least a lack of graceful failure), I would like to know the trigger so I can handle it better or survive it.

                        Thanks,

                        Tony.

                        Comment


                        • #72
                          Hi Tony


                          Strangely the reads are these

                          @HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/1
                          CA
                          +
                          CC

                          and

                          @HISEQ2000:406:H0JYCADXX:2:1101:13467:8659_/2
                          G
                          +
                          @


                          I had no idea that I had this type of error, as the data were produced by a HiSeq 2500 rapid run and I would expect all reads to be 150 bp, since this is the raw data (with added suffixes). Nevertheless, do you think it would be a good idea to filter by length before feeding it to Trimmomatic?

                          Thanking in advance for the attention


                          Luis

                          Comment


                          • #73
                            Originally posted by luiscunhamx View Post
                            I had no idea that I had this type of error, as the data were produced by a HiSeq 2500 rapid run and I would expect all reads to be 150 bp, since this is the raw data (with added suffixes). Nevertheless, do you think it would be a good idea to filter by length before feeding it to Trimmomatic?
                            I guess some processing was already applied to the reads, because as you say, the reads should be 150bp each. I normally recommend against using already-trimmed data with Trimmomatic if at all possible - it was designed to work on the raw output. Perhaps you could contact whoever supplied the data and see if they have the original data also.

                            In any case, trimmomatic appears to break on such short read pairs - I guess I never tested such a scenario. You could try adding a MINLEN filter as the first step, to prevent the short reads causing problems.
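If a length pre-filter outside Trimmomatic is preferred, as asked in the quoted question, something along these lines might work. It is only a sketch: it assumes uncompressed four-line FASTQ records with R1 and R2 in the same order, the helper names are hypothetical, and the threshold of 3 in the example is arbitrary. A pair is dropped whenever either mate is too short, so the two files stay in sync.

```python
import io

def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # empty header line means end of file
            return
        yield tuple(record)

def filter_pairs(r1_handle, r2_handle, min_len):
    """Yield only the pairs in which both mates are at least min_len bases long."""
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            yield rec1, rec2

# Made-up input: one very short pair (like the one reported above) and one normal pair.
r1 = io.StringIO("@short/1\nCA\n+\nCC\n@ok/1\nACGTACGT\n+\nIIIIIIII\n")
r2 = io.StringIO("@short/2\nG\n+\n@\n@ok/2\nTTGCAAGC\n+\nIIIIIIII\n")
kept = list(filter_pairs(r1, r2, min_len=3))
print([pair[0][0] for pair in kept])
```

Writing the surviving records back out is a matter of joining each tuple with newlines into two new files, one per direction.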

                            Thanks,

                            Tony.

                            Comment


                            • #74
                              Thanks Tony,

                              I will give it a try

                              Cheers

                              Comment


                              • #75
                                Hi tonybolger,

                                How about adding a function to Trimmomatic to remove the adapter from small RNA reads? The read looks like this:

                                small RNA + ADAPTER

                                I just need the reads that contain the adapter sequence. If the read has the adapter, clip it. Otherwise, drop it.
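The requested behaviour (clip when the adapter is present, drop otherwise) can be sketched outside Trimmomatic as follows. This is a simplified illustration and not an existing Trimmomatic feature: it uses exact string matching on the first few adapter bases with no mismatch tolerance, and `ADAPTER`, `clip_or_drop` and `min_match` are placeholder names and values to be replaced with whatever fits your kit.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # placeholder 3' adapter; substitute your own

def clip_or_drop(seq, qual, adapter=ADAPTER, min_match=8):
    """Return (seq, qual) clipped at the adapter start, or None to drop the read.

    Only an exact match of the first min_match adapter bases is searched for.
    """
    idx = seq.find(adapter[:min_match])
    if idx <= 0:  # adapter absent (-1), or nothing left before it (0)
        return None
    return seq[:idx], qual[:idx]

# Made-up reads: one with the adapter after a 10 bp insert, one without.
with_adapter = "ACGTACGTAC" + ADAPTER
print(clip_or_drop(with_adapter, "I" * len(with_adapter)))
print(clip_or_drop("ACGTACGT", "IIIIIIII"))
```

Real data contains sequencing errors and adapters truncated at the end of the read, so a production tool would need mismatch-tolerant and partial matching rather than the exact search used here.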

                                Thanks

                                Comment
