
  • #31
    Thank you very much for the reply!

    Originally posted by relipmoc View Post
    It means the 3rd case which is different from the case of non-junction read pairs. For the 3rd case, we can not declare that there is no junction adapter in the fragment. However, for the non-junction read pairs, the fragment length is shorter than the read length, we can declare confidently that they do not contain junction adapters.
    Just to be sure, and correct me if I am wrong: there are two cases of non-detected JA (junction adapter) in a pair.
    1. 5968 ( 0.20%) non-junction read pairs filtered out by contaminant control
    With MiSeq reads of 300 bp, do only fragments shorter than 300 bp without JA belong to this group? Or only those with an overlap between R1 and R2?
    2. 549499 (24.26%) untrimmed read pairs available after processing
    Are these almost all the pairs without a detected JA?
    So is a fragment of the form (300)+N1+JA+N2+(300) in this group? Do you recommend excluding this group from de novo assembly?

    Comment


    • #32
      Originally posted by MikhailFokin View Post
      Just to be sure, and correct me if I am wrong: there are two cases of non-detected JA (junction adapter) in a pair.
      1. 5968 ( 0.20%) non-junction read pairs filtered out by contaminant control. With MiSeq reads of 300 bp, do only fragments shorter than 300 bp without JA belong to this group? Or only those with an overlap between R1 and R2?
      Fragments equal to or shorter than 300 bp without JA belong to this group.

      Originally posted by MikhailFokin View Post
      2. 549499 (24.26%) untrimmed read pairs available after processing
      Are these almost all the pairs without a detected JA?
      So is a fragment of the form (300)+N1+JA+N2+(300) in this group?
      Fragments longer than 300 bp for which neither of the two paired reads has a detected JA belong to this group.
      Originally posted by MikhailFokin View Post
      Do you recommend excluding this group from de novo assembly?
      These fragments tend to have long insert sizes. They may or may not contain a JA. For an aggressive de novo assembly, it is recommended to include this group; for a more accurate and conservative assembly, it is recommended to exclude it.

      Note that having a JA is a prerequisite for a correctly constructed MP (mate-pair) read.
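The grouping described in this reply can be written down as a small decision rule. This is only an illustrative sketch: the function and class names are hypothetical, and the true fragment length is normally unknown in practice (the tool has to infer it), so this is not how skewer itself is implemented.

```python
from enum import Enum


class PairClass(Enum):
    JUNCTION = "JA detected in at least one read"
    NON_JUNCTION = "no JA, fragment no longer than a read"
    UNDETERMINED = "no JA detected, fragment longer than a read"


def classify_pair(fragment_len: int, read_len: int,
                  ja_in_r1: bool, ja_in_r2: bool) -> PairClass:
    """Apply the rule from the reply above (hypothetical helper)."""
    if ja_in_r1 or ja_in_r2:
        return PairClass.JUNCTION
    if fragment_len <= read_len:
        # Fragments equal to or shorter than the read length without JA.
        return PairClass.NON_JUNCTION
    # Longer fragments where neither read shows the JA: it may sit in the
    # unsequenced middle, e.g. (300)+N1+JA+N2+(300), or be absent entirely.
    return PairClass.UNDETERMINED


if __name__ == "__main__":
    print(classify_pair(fragment_len=700, read_len=300,
                        ja_in_r1=False, ja_in_r2=False))
```

With 300 bp reads, a 700 bp fragment whose JA falls between the two reads ends up in the "untrimmed" group, which is why that group can still contain valid mate pairs.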

      Comment


      • #33
        Recommended command for paired ends

        Hi, relipmoc

        I'm really new to all this stuff, so I would like some guidance. I did an MNase-seq experiment, got paired-end reads, and got the following FastQC results:
        Overrepresented sequences
        Sequence Count Percentage Possible Source
        CGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGA 9634298 3.823442763780559 Illumina Paired End PCR Primer 2 (100% over 38bp)
        CAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCG 402848 0.15987322278213426 Illumina Paired End PCR Primer 2 (100% over 43bp)
        CGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAA 362862 0.14400448150461417 Illumina Paired End PCR Primer 2 (100% over 49bp)
        with per base sequence quality like in index1 file for the first pair

        and for the second

        Overrepresented sequences
        Sequence Count Percentage Possible Source
        CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATC 3344349 1.3272297559829216 Illumina Single End PCR Primer 1 (100% over 38bp)
        CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGGGTAGATC 857909 0.34046756266333217 Illumina Single End PCR Primer 1 (97% over 38bp)
        CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGGAAGAGGGTAGATC 275742 0.10943026202535763 Illumina Single End PCR Primer 1 (96% over 30bp)
        CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGGAAGAGTGTAGATC 255724 0.10148596995079658 Illumina Single End PCR Primer 1 (97% over 38bp)
        with per base sequence quality like in index2 file for the second pair

        In both cases FastQC reported that the per-base sequence quality is OK.

        1. What is the best way to remove those adapters without doing any filtering of the reads, by per-base quality or anything else?
        2. In addition, how can I chop bases from the 3' end of both files, again without doing any quality-control filtering?
        Attached Files

        Comment


        • #34
          Sorry, one more dull question.
          Is there any way to redirect the result files into a directory different from the one with the input data? Not to stdout... it seems that the -o option is for the base name only?

          Comment


          • #35
            And one more: why is there a difference between the read counts in "trimmed read pairs available after processing" and "Barcode dispatch after trimming"? Shouldn't they be the same? For example, I don't want to include reads without JA in further processing, so I use the -b option to put them in separate files. Is that right?

            I can easily manipulate the number of "trimmed read pairs available after processing" by changing the stringency options, but this only slightly affects the real output; see the 2 cases below.

            default settings
            Wed Aug 13 22:42:51 2014 >> done (0.139s)
            1000 read pairs processed; of these:
            2 ( 0.20%) degenerative read pairs filtered out
            2 ( 0.20%) non-junction read pairs filtered out by contaminant control
            86 ( 8.60%) short read pairs filtered out after trimming by size control
            2 ( 0.20%) empty read pairs filtered out after trimming by size control
            908 (90.80%) read pairs available; of these:
            718 (79.07%) trimmed read pairs available after processing
            190 (20.93%) untrimmed read pairs available after processing

            Barcode dispatch after trimming:
            category count percentage:

            X01Y01 575 80.08%

            relaxed settings

            Wed Aug 13 22:48:16 2014 >> done (0.257s)
            1000 read pairs processed; of these:
            0 ( 0.00%) degenerative read pairs filtered out
            5 ( 0.50%) non-junction read pairs filtered out by contaminant control
            65 ( 6.50%) empty read pairs filtered out after trimming by size control
            930 (93.00%) read pairs available; of these:
            908 (97.63%) trimmed read pairs available after processing
            22 ( 2.37%) untrimmed read pairs available after processing

            Barcode dispatch after trimming:
            category count percentage:

            X01Y01 767 84.47%
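A quick arithmetic check of the two summaries above, as a minimal sketch. The counts are copied from the logs; the dictionary keys and the `pct` helper are made up for illustration. The check suggests that the barcode dispatch percentage is computed against the trimmed read pairs, not against all available pairs.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as printed in the logs."""
    return round(100.0 * part / whole, 2)


# Counts copied from the two skewer summaries above.
default_run = {"available": 908, "trimmed": 718, "untrimmed": 190, "X01Y01": 575}
relaxed_run = {"available": 930, "trimmed": 908, "untrimmed": 22, "X01Y01": 767}

for name, run in (("default", default_run), ("relaxed", relaxed_run)):
    # Trimmed and untrimmed pairs should add up to the available pairs.
    assert run["trimmed"] + run["untrimmed"] == run["available"]
    print(name,
          "trimmed/available:", pct(run["trimmed"], run["available"]),
          "X01Y01/trimmed:", pct(run["X01Y01"], run["trimmed"]))
```

In both runs the X01Y01 count is smaller than the trimmed count (575 of 718, 767 of 908), which is the discrepancy asked about in this post.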

            And it is very strange that, even with k=14 (please check the settings below), plenty of adapters of 17-24 bp (without mismatches or indels) are left in the final untrimmed file. For example, of 286 untrimmed sequences (from 1000), 95 contain a single JA. So almost all single JAs were left in the final untrimmed file. Strongly reducing the stringency to -r 0.3 -d 0.2 doesn't affect the result significantly. What else should I change to detect these single JAs?


            Parameters used:
            -- 3' end adapter sequence (-x): AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
            -- paired 3' end adapter sequence (-y): GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
            -- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
            -- maximum error ratio allowed (-r): 0.100
            -- maximum indel error ratio allowed (-d): 0.030
            -- minimum read length allowed after trimming (-l): 0
            -- file format (-f): Sanger/Illumina 1.8+ FASTQ (auto detected)
            -- minimum overlap length for junction adapter detection (-k): 14
            -- number of concurrent threads (-t): 4
            Last edited by MikhailFokin; 08-13-2014, 06:57 PM. Reason: more questions and details

            Comment


            • #36
              Originally posted by MikhailFokin View Post
              And one more: why is there a difference between the read counts in "trimmed read pairs available after processing" and "Barcode dispatch after trimming"? Shouldn't they be the same? For example, I don't want to include reads without JA in further processing, so I use the -b option to put them in separate files. Is that right?
              Sorry for not giving you a quick reply! In fact, JA (Junction Adapter) has nothing to do with barcode dispatch. JA can be regarded as a marker for correctly constructed LMP pairs. However, JA may be undetectable for some correctly constructed LMP pairs. For barcode dispatch, PE adapters instead of JA are used.

              For your case, you may use '-u' instead of '-b' to filter out the so-called "undetermined mate-pair reads" (the original fragments are equal to or greater than the read length, and JA is not found in either read of the pair). It is not recommended to include only those reads that have JA found. BTW: you helped me find a bug in the program. The statistics for "barcode dispatch after trimming" are not correct! They should count those pairs that have PE adapters detected. I'll update the program and release it after full testing. Thanks!


              Originally posted by MikhailFokin View Post
              And it is very strange that, even with k=14 (please check the settings below), plenty of adapters of 17-24 bp (without mismatches or indels) are left in the final untrimmed file. For example, of 286 untrimmed sequences (from 1000), 95 contain a single JA. So almost all single JAs were left in the final untrimmed file. Strongly reducing the stringency to -r 0.3 -d 0.2 doesn't affect the result significantly. What else should I change to detect these single JAs?
              Could you send me the FastQ files that cause this problem?

              Comment


              • #37
                Originally posted by MikhailFokin View Post
                Sorry, one more dull question.
                Is there any way to redirect the result files into a directory different from the one with the input data? Not to stdout... it seems that the -o option is for the base name only?
                Thanks for this question! Now the result files can be redirected into a directory using -o. The difference is that a directory name must end with a slash '/'. I'll release the updated version soon.
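The trailing-slash rule is easy to get wrong in scripts, so here is a minimal sketch of a wrapper that builds the argument list. It assumes the updated skewer version mentioned in this reply; the helper names and file names are hypothetical, and only flags that appear in this thread (-m, -t, -o) are used.

```python
from typing import List


def output_target(path: str, is_directory: bool) -> str:
    """Return a value for -o; a directory must end with '/' per the reply above."""
    if is_directory and not path.endswith("/"):
        return path + "/"
    return path


def build_skewer_command(r1: str, r2: str, out: str, threads: int = 4) -> List[str]:
    """Assemble an argument list (hypothetical wrapper, not part of skewer)."""
    return ["skewer", "-m", "mp", "-t", str(threads), "-o", out, r1, r2]


if __name__ == "__main__":
    cmd = build_skewer_command("R1.fastq", "R2.fastq",
                               output_target("results", is_directory=True))
    print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once skewer is on the PATH; without the slash, "results" would be treated as a base name for the output files.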
                Last edited by relipmoc; 08-25-2014, 05:24 AM. Reason: typo

                Comment


                • #38
                  Originally posted by blsfoxfox View Post
                  Actually, I am more curious about why would skewer produce trimmed reads longer than original one? Then we may avoid getting the long reads and do not need another parameter to deal with it.
                  Now skewer provides an option for it. Please download the updated version from http://sourceforge.net/projects/skewer.

                  Comment


                  • #39
                    Hi,

                    First I used

                    prinseq-lite-0.20.4/prinseq-lite -fastq R1.fq -fastq2 R2.fq -out_format 3 -out_good good_reads -out_bad bad_reads -phred64 -log 8_.log -graph_data 8_.gd -graph_stats gc,qd,ns,pt,ts,da -min_qual_mean 30 -trim_qual_right 30 -trim_qual_window 2 -trim_qual_type mean -min_len 15 > run.log 2>&1
                    Then I used your software as:

                    skewer/skewer-0.1.118-linux-x86_64 -x GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -y GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -t 20 good_reads_1.fastq good_reads_2.fastq > terminning-run.log 2>&1
                    After that I used bowtie to see how the alignment would go, but:

                    Time loading reference: 00:00:08
                    Time loading forward index: 00:00:12
                    Time loading mirror index: 00:00:12
                    Seeded quality full-index search: 00:07:55
                    # reads processed: 1866894
                    # reads with at least one reported alignment: 131512 (7.04%)
                    # reads that failed to align: 1735382 (92.96%)
                    Reported 926715 paired-end alignments to 1 output stream(s)
                    Time searching: 00:08:27
                    Overall time: 00:08:27
                    So did I use the parameters in the wrong way? What should I add or change?

                    Comment


                    • #40
                      Originally posted by Medhat View Post
                      Hi,

                      First I used

                      prinseq-lite-0.20.4/prinseq-lite -fastq R1.fq -fastq2 R2.fq -out_format 3 -out_good good_reads -out_bad bad_reads -phred64 -log 8_.log -graph_data 8_.gd -graph_stats gc,qd,ns,pt,ts,da -min_qual_mean 30 -trim_qual_right 30 -trim_qual_window 2 -trim_qual_type mean -min_len 15 > run.log 2>&1
                      Then I used your software as:

                      skewer/skewer-0.1.118-linux-x86_64 -x GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -y GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -t 20 good_reads_1.fastq good_reads_2.fastq > terminning-run.log 2>&1
                      After that I used bowtie to see how the alignment would go, but:

                      Time loading reference: 00:00:08
                      Time loading forward index: 00:00:12
                      Time loading mirror index: 00:00:12
                      Seeded quality full-index search: 00:07:55
                      # reads processed: 1866894
                      # reads with at least one reported alignment: 131512 (7.04%)
                      # reads that failed to align: 1735382 (92.96%)
                      Reported 926715 paired-end alignments to 1 output stream(s)
                      Time searching: 00:08:27
                      Overall time: 00:08:27
                      So did I use the parameters in the wrong way? What should I add or change?
                      Suggestion: 1) do adapter trimming before quality trimming, because quality trimming may degrade the pairing information that is useful for adapter detection; 2) in your case, you may run skewer with the following command:

                      skewer/skewer-0.1.118-linux-x86_64 -m mp -t 20 good_reads_1.fastq good_reads_2.fastq > terminning-run.log 2>&1
                      Note that the adapter sequences provided by Illumina are not appropriate for adapter trimming as given. In fact, there's an additional step called "Adenylate the 3' Ends" before "Ligate Paired-End Adapters" in the Paired-End Sample Preparation Protocol. Therefore, an additional leading 'A' should be added before the vendor-provided sequence.

                      Nevertheless, the junction adapter is what we want.
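The leading-'A' correction can be sketched as follows. The helper name is hypothetical; the two input sequences are the -x and -y values from the quoted command.

```python
def with_leading_a(vendor_sequence: str) -> str:
    """Prepend the 'A' contributed by the 3' adenylation step.

    The caller is responsible for passing a sequence that does not
    already include this base (hypothetical helper).
    """
    return "A" + vendor_sequence.strip().upper()


if __name__ == "__main__":
    for label, seq in (("-x", "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"),
                       ("-y", "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT")):
        print(label, with_leading_a(seq))
```

Applied to the -x value, this gives AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC, the same 3' end adapter sequence (-x) that appears in the parameter listing earlier in the thread.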

                      Comment


                      • #41
                        Corrected adapter sequences

                        The following adapter sequences are provided for your convenience:

                        >TruSeq read 1 universal adapter
                        AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

                        >TruSeq read 2 adapter
                        AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

                        Comment


                        • #42
                          I'm probably doing something dumb, but I'm tired of banging my head against this and will just ask instead...
                          Are the default adapter sequences for skewer correct for Nextera PE libraries?

                          I'm attempting to trim Nextera PE 150 sequences for readthrough (and quality).
                          This data has already been demultiplexed (with CASAVA by the sequencing center), so the leading adapters have already been removed.

                          I've already trimmed this data with trimmomatic and mapped it, so I know there is a lot of readthrough (very short insert sizes for this particular library). Otherwise, the quality is very good.
                          Trimmomatic only passed the forward read of >60% of the pairs (it sensibly drops the reverse on readthrough, since that reverse read contains no new data).
                          Input Read Pairs: 13211394 Both Surviving: 4473051 (33.86%) Forward Only Surviving: 8561962 (64.81%) Reverse Only Surviving: 27843 (0.21%) Dropped: 148538 (1.12%)
                          So the problem is that skewer (with approximately the same quality filtering) is giving:
                          $ skewer -t 30 -l 36 -q 3 -Q 15 foo1.fastq.gz foo2.fastq.gz
                          ....
                          94344 ( 0.71%) read pairs filtered out by quality control
                          167205 ( 1.27%) short read pairs filtered out after trimming by size control
                          95108 ( 0.72%) empty read pairs filtered out after trimming by size control
                          12854737 (97.30%) read pairs available; of these:
                          5500721 (42.79%) trimmed read pairs available after processing
                          7354016 (57.21%) untrimmed read pairs available after processing
                          I've tried some other adapter sequences, which disturbingly seem to produce pretty much the same results. I guess I don't quite understand how one is supposed to tell skewer the proper sequence. Combine that with the fact that it is very difficult for me to find good documentation of the Nextera adapter sequences, and I'm just getting more and more confused. Another option is that I just don't understand what skewer is outputting.

                          Would it be possible for you (with help from users, of course) to write up some brief usage examples for skewer for common situations like this one? GitHub has a very nice wiki feature...
                          Last edited by travc; 10-17-2014, 12:33 AM.

                          Comment


                          • #43
                            Hi travc,

                            The best scenario for using skewer is pre-processing raw data, especially data from complex libraries such as Nextera LMPs. To make things clearer, I suggest you ask the sequencing center to send you the raw data without adapter trimming. Then you can use skewer to do the adapter trimming yourself.

                            To detect readthroughs, skewer uses the reverse-complementary information of readthrough paired reads as well as the trailing adapter sequences. So feeding trimmed data to skewer will confuse it.
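To illustrate why the reverse-complement relationship matters, here is a minimal exact-match sketch. It is not skewer's actual algorithm (which tolerates mismatches and indels and also scores the adapter); the function names and the `min_overlap` default are assumptions made for the example. When the insert is shorter than the read, R1 starts with the insert and R2 starts with its reverse complement, so the insert length is the prefix length at which the two agree.

```python
from typing import Optional

# Upper-case DNA only; other characters are left unchanged by translate().
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def readthrough_insert_length(r1: str, r2: str,
                              min_overlap: int = 10) -> Optional[int]:
    """Return the insert length if the pair looks like a readthrough.

    Tries prefix lengths from longest to shortest and returns the first n
    for which R1[:n] equals the reverse complement of R2[:n]; None if no
    prefix of at least min_overlap bases matches.
    """
    max_len = min(len(r1), len(r2))
    for n in range(max_len, min_overlap - 1, -1):
        if r1[:n] == reverse_complement(r2[:n]):
            return n
    return None


if __name__ == "__main__":
    insert = "ACGTTGCAAGGCTTACCGATGG"   # made-up 22 bp insert
    tail = "AGATCGGA"                   # start of an adapter read past the insert
    r1 = insert + tail
    r2 = reverse_complement(insert) + tail
    print(readthrough_insert_length(r1, r2))
```

If a previous tool has already cut or dropped one read of the pair, this prefix relationship no longer holds, which fits the advice above to start from raw data.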

                            We will write a user manual in the near future. Thank you for your suggestion!

                            Comment


                            • #44
                              Thanks for that info. I guessed that it might need raw sequences, but wasn't sure.

                              Comment


                              • #45
                                Originally posted by relipmoc View Post
                                Now skewer provides an option for it. Please download the updated version from http://sourceforge.net/projects/skewer.
                                Thanks for the good news! Is it the option "-i, --intelligent For mate-pair mode, whether to redistribute reads based on junction information; (no)"? So just keeping it at the default will avoid getting longer reads, right?

                                Comment
