
  • #61
    Originally posted by safina View Post
    Now I have provided the complete info.
    ...
    I hope my problem is clear now?
    Not really. You need to do two things that you're not doing:

    1) Provide the exact command lines you typed.
    2) Provide the exact output of the programs (what they printed to the screen).

    Though really, it would be easier to skip all that and just not use fastx because it can't deal with paired reads. I suggest BBDuk.

    Comment


    • #62
      Originally posted by safina View Post
      Yes, I tried repair.sh, but it is also just making empty files. No result!
      This confirms that it is not an issue with any program, and it is clearly an issue with the data. If you got things to work before, try to re-trace your process with this data set to find out where you went wrong. Make sure the files are non-empty for starters, then see if you can count the reads in each file to see if there are blank lines or a newline issue that has broken the files.
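The checks suggested above (non-empty files, record counts, stray blank lines) could be scripted roughly as follows. This is only a minimal sketch, assuming plain four-line FASTQ records; the function name `fastq_stats` and the in-memory example are made up for illustration.

```python
import io

def fastq_stats(handle):
    """Count lines, blank lines and 4-line records in an open FASTQ text stream."""
    total = 0
    blank = 0
    for line in handle:
        total += 1
        if not line.strip():
            blank += 1
    return {
        "lines": total,
        "blank_lines": blank,
        "records": total // 4,               # only meaningful if the next field is True
        "multiple_of_four": total % 4 == 0,  # False hints at a broken file
    }

# Tiny in-memory example with one stray blank line between two records.
# A real file would be passed as open("reads_1.fq") instead (hypothetical name).
example = io.StringIO("@r1\nACGT\n+\nIIII\n\n@r2\nACGT\n+\nIIII\n")
stats = fastq_stats(example)
print(stats)
```

Running it on both mate files and comparing the `records` values shows quickly whether the two files still hold the same number of reads.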

      Also, try to keep the discussion in one forum. I noticed you created two stackoverflow posts to ask the same question as here, and that is discouraged because it causes a lot of duplication of efforts. Along those lines, if you have another issue, create a new post but don't delete your comments saying something worked and give a downvote for an unrelated issue. We are trying to help but these things are not the best way to get help or show appreciation.

      Comment


      • #63
        Originally posted by SES View Post
        This confirms that it is not an issue with any program, and it is clearly an issue with the data. If you got things to work before, try to re-trace your process with this data set to find out where you went wrong. Make sure the files are non-empty for starters, then see if you can count the reads in each file to see if there are blank lines or a newline issue that has broken the files.

        Also, try to keep the discussion in one forum. I noticed you created two stackoverflow posts to ask the same question as here, and that is discouraged because it causes a lot of duplication of efforts. Along those lines, if you have another issue, create a new post but don't delete your comments saying something worked and give a downvote for an unrelated issue. We are trying to help but these things are not the best way to get help or show appreciation.
        I checked, and the reads are correct. I can count the reads, and there are no spaces or blank lines in between. As I mentioned, I have posted the GenBank accession IDs of my raw data, so if you can check it yourself I would be very glad. Thank you.

        Comment


        • #64
          Originally posted by safina View Post
          I checked, and the reads are correct. I can count the reads, and there are no spaces or blank lines in between. As I mentioned, I have posted the GenBank accession IDs of my raw data, so if you can check it yourself I would be very glad. Thank you.
          Okay, we would need to see the command you used to trim the files (and any other operations), as Brian mentioned above. That way we can try to reproduce the issue. One other thing to consider is that sometimes a download will not complete for these large files, so a good thing to check is that you have the same number of records locally as those files from the SRA.

          Comment


          • #65
            Originally posted by SES View Post
            Okay, we would need to see the command you used to trim the files (and any other operations), as Brian mentioned above. That way we can try to reproduce the issue. One other thing to consider is that sometimes a download will not complete for these large files, so a good thing to check is that you have the same number of records locally as those files from the SRA.

            I have cross-checked the files. The download was complete.

            I used the SRA toolkit to split my files into 1.fq and 2.fq using the command:
            Code:
            ./fastq-dump -I --split-files file.sra --outdir mexico

            Then I changed the headers:

            > sed 's|^@SRR|@mexD1SRR|; s| HWI.*|/1|g' file.fq > output.fq

            Then I did quality filtering using the FASTX-Toolkit:

            Code:
            quality-filter
            >./fastq_quality_filter -i input_1.fq -q 28 -p 100 -o filt_1.fq
            Code:
            quality-trimmer:
            >./fastx_trimmer  -i filt.fq -f 14 -l 100 -o filt_trim.fq
            After this I am having issues with pairing.
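To make it easier to see what the sed command above does to each line, here is a rough Python equivalent of its two substitutions. The header used below is hypothetical (an invented instrument string in the general `@SRR... HWI-... length=N` shape); the real headers in the file may differ, so treat the output as illustrative only.

```python
import re

def rewrite_line(line, prefix="mexD1", mate="1"):
    """Mimic: sed 's|^@SRR|@mexD1SRR|; s| HWI.*|/1|g' on a single line."""
    line = re.sub(r"^@SRR", "@" + prefix + "SRR", line)  # first expression
    line = re.sub(r" HWI.*", "/" + mate, line)           # second expression
    return line

# Hypothetical header and '+' line, not copied from the real data set.
header = "@SRR1561197.1.1 HWI-ST0000:1:1101:1000:2000 length=101"
plus_line = "+SRR1561197.1.1 HWI-ST0000:1:1101:1000:2000 length=101"

print(rewrite_line(header))
print(rewrite_line(plus_line))
```

Under that assumption the header becomes `@mexD1SRR1561197.1.1/1`. Note that the second expression is not anchored to `@` lines, so it would also rewrite any `+` line that repeats the read name, and the same command run unchanged on the second file would label those reads `/1` as well.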

            Comment


            • #66
              Tried the following.

              Code:
              $ fastq-dump -F --split-files ./SRR1561197.sra
              @safina: Not sure why you are changing the fastq headers

              Code:
              $ fastq_quality_filter -i SRR1561197_1.fastq -q 28 -p 100 -Q33 -o SRR1561197_1_filt.fastq
              @safina: Note the -Q33 option. This data is most certainly sanger fastq formatted so you need to add this option (it remains undocumented in fastx_toolkit). I used the latest fastx_toolkit.
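For anyone wondering why the offset matters: in Sanger-style FASTQ each quality character encodes a Phred score as its ASCII code minus 33. The sketch below decodes a quality string and applies the rule that `-q 28 -p 100` expresses (every base must score at least 28). It illustrates the idea only and is not fastx_toolkit's code; the function names and the choice to reject empty reads are my own assumptions.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]

def passes_filter(quality, min_q=28, min_percent=100, offset=33):
    """True if at least min_percent of the bases have a score >= min_q."""
    scores = phred_scores(quality, offset)
    if not scores:
        return False  # assumption: an empty read is rejected
    good = sum(1 for q in scores if q >= min_q)
    return good * 100 >= min_percent * len(scores)

print(phred_scores("I=#"))             # decoded with offset 33
print(phred_scores("I=#", offset=64))  # same characters read with offset 64
print(passes_filter("IIII"), passes_filter("III#"))
```

Decoded with the wrong offset, the same characters give nonsensical (here negative) scores, which is why the offset has to match the data.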

              I chose to use repair.sh from BBMap and did

              Code:
              $ repair.sh in1=SRR1561197_1_filt.fastq in2=SRR1561197_2_filt.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
              Here is where things fell apart.

              @Brian: I get the following error about a Gb into the filtered files.

              Exception in thread "Thread-2" java.lang.AssertionError:
              There appear to be different numbers of reads in the paired input files.
              The pairing may have been corrupted by an upstream process. It may be fixable by running repair.sh.
              at stream.ConcurrentGenericReadInputStream.pair(ConcurrentGenericReadInputStream.java:492)
              at stream.ConcurrentGenericReadInputStream.readLists(ConcurrentGenericReadInputStream.java:358)
              at stream.ConcurrentGenericReadInputStream.run(ConcurrentGenericReadInputStream.java:195)
              at java.lang.Thread.run(Thread.java:745)
              Multiple possibilities:

              1. Original sra file from SRA is corrupt
              2. fastx_toolkit is messing up the files in the filter process
              3. Not sure why repair.sh is asking to run itself

              Should try BBDuk to see if that works instead of fastq_filter.

              Comment


              • #67
                Hi GenoMax,

                I identified this problem and solved it earlier today, and the fix will be in the next release. repair.sh currently works fine for disordered reads, or with a single file input (e.g. concatenating r1.fq and r2.fq and feeding that to repair as a file or through stdin), but crashes with 2-file input if the files have different numbers of reads.

                I'll upload it later today. Sorry about that!
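For readers unfamiliar with what re-pairing involves, here is a minimal in-memory sketch of the idea: match reads between the two files by name and set aside the ones whose mate is missing. It is not how repair.sh is implemented; the function names are hypothetical, and it assumes mates share a name apart from an optional trailing `/1` or `/2`.

```python
def base_id(header):
    """Drop the leading '@', any description, and a trailing /1 or /2."""
    name = header.split()[0].lstrip("@")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def split_pairs(forward_headers, reverse_headers):
    """Return (pairs, forward singletons, reverse singletons), in forward-file order."""
    fwd = {base_id(h): h for h in forward_headers}
    rev = {base_id(h): h for h in reverse_headers}
    pairs = [(fwd[k], rev[k]) for k in fwd if k in rev]
    fwd_single = [h for k, h in fwd.items() if k not in rev]
    rev_single = [h for k, h in rev.items() if k not in fwd]
    return pairs, fwd_single, rev_single

# Disordered input with a different number of reads in each "file".
forward = ["@r1/1", "@r3/1", "@r4/1"]
reverse = ["@r4/2", "@r2/2", "@r1/2", "@r5/2"]
pairs, fwd_single, rev_single = split_pairs(forward, reverse)
print(pairs, fwd_single, rev_single)
```

Holding every name in a dictionary is fine for a toy example but costs memory on real data sets, which is one reason dedicated tools exist for this.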

                Comment


                • #68
                  Tried bbduk.sh from BBMap.

                  Code:
                  $ bbduk.sh -Xmx2g in1=SRR1561197_1.fastq in2=SRR1561197_2.fastq out1=SRR1561197_1_clean.fq out2=SRR1561197_2_clean.fq qtrim=rl trimq=28
                  Worked in under 4 mins. Reported no problems. Original fastq files must be ok.
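As a rough picture of what trimming from both ends at quality 28 means, the sketch below removes bases from the left and right until it meets one scoring at least 28. This is a deliberately naive version written for illustration; it is an assumption-laden stand-in and not the algorithm BBDuk actually uses, and the function name is invented.

```python
def trim_both_ends(seq, qual, min_q=28, offset=33):
    """Naively strip bases below min_q from both ends of a read."""
    scores = [ord(ch) - offset for ch in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    # Sequence and quality are cut identically so they stay the same length.
    return seq[start:end], qual[start:end]

print(trim_both_ends("ACGTAC", "#I=II#"))
print(trim_both_ends("AC", "##"))
```

Because both mates go through the tool together, a read that is trimmed to nothing can be handled as a pair rather than silently dropped from one file only.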

                  Comment


                  • #69
                    I guess it has been mentioned before, but I do not see any good reason for still using the fastx toolkit (especially not for paired read data). It was last updated in 2010.
                    BBMap and all its tools, as well as seqtk, are better alternatives.
                    Last edited by luc; 04-03-2015, 04:44 PM.

                    Comment


                    • #70
                      Originally posted by GenoMax View Post
                      I chose to use repair.sh from BBMap and did

                      Code:
                      $ repair.sh in1=SRR1561197_1_filt.fastq in2=SRR1561197_2_filt.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
                      Here is where things fell apart.
                      Hi GenoMax, that's now fixed in the latest version (34.79).

                      Comment


                      • #71
                        I tried the whole process using the commands above and did not find any issues. Here is the script: seqanswers163784.sh (link to a gist, not a direct link). You can fetch that script and run it on your own machine. Here is the output:

                        Code:
                        ========= pairfq version : 0.14.1 (completion time: Wed Apr  8 12:14:41 EDT 2015)
                        Total forward reads (SRR1561197_1_filt_info.fastq)                   :    8492638
                        Total reverse reads (SRR1561197_2_filt_info.fastq)                   :   13525478
                        Total forward paired reads (SRR1561197_1_filt_info_p.fastq)          :    7105003
                        Total reverse paired reads (SRR1561197_2_filt_info_p.fastq)          :    7105003
                        Total forward unpaired reads (SRR1561197_1_filt_info_s.fastq)        :    1387635
                        Total reverse unpaired reads (SRR1561197_2_filt_info_s.fastq)        :    6420475
                        
                        Total paired reads                                                   :   14210006
                        Total unpaired reads                                                 :    7808110
                        
                        real	21m14.372s
                        user	9m54.612s
                        sys	0m19.421s
                        This used 5.5g of RAM on my machine, so you should be fine to use it without the --index option. For reference, the only issue was the missing pair information, which was one of my earlier suggestions in this thread, but it appears that modifying the headers and perhaps some other operations messed up the files for @safina. For the commands in the script, you can replace "pairfq" with

                        Code:
                        curl -sL git.io/pairfq_lite | perl -
                        and you'll never need to download any package or update it.
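As a quick sanity check on the pairfq report above, the counts can be verified with plain arithmetic: paired plus unpaired reads should equal the total for each file. The numbers are copied from the output; the variable names are just labels for this sketch.

```python
forward_total = 8492638
reverse_total = 13525478
forward_paired = 7105003
reverse_paired = 7105003
forward_unpaired = 1387635
reverse_unpaired = 6420475

checks = {
    "forward adds up": forward_paired + forward_unpaired == forward_total,
    "reverse adds up": reverse_paired + reverse_unpaired == reverse_total,
    "total paired": forward_paired + reverse_paired == 14210006,
    "total unpaired": forward_unpaired + reverse_unpaired == 7808110,
}
for name, ok in checks.items():
    print(name, ok)
```

All four checks hold, so no reads were lost between the filtered input and the paired and unpaired outputs.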

                        EDIT: Just my 2c, but I think fastx still has a place. It is stable, no need to update frequently, and is probably on most workstations. Also, it works very well in a Unix environment because of the single binaries that use one CPU, which allows you to use it on a cluster.
                        Last edited by SES; 04-08-2015, 09:12 AM.

                        Comment


                        • #72
                          Originally posted by Brian Bushnell View Post
                          Hi GenoMax, that's now fixed in the latest version (34.79).
                          Brian: repair.sh not working in v.34.79. The error message disappeared but the fixed files have stopped growing at about the same size ~1.1G. There is no error message.

                          java -ea -Xmx44123m -cp /path_to/bbmap-34.79/bbmap/current/ jgi.SplitPairsAndSingles rp in1=SRR1561197_1_filt.fastq in2=SRR1561197_2_filt.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
                          Executing jgi.SplitPairsAndSingles [rp, in1=SRR1561197_1_filt.fastq, in2=SRR1561197_2_filt.fastq, out1=fixed1.fq, out2=fixed2.fq, outsingle=single.fq]

                          Set INTERLEAVED to false
                          Started output stream.
                          I am going to kill the job now.

                          Comment


                          • #73
                            Hi GenoMax,

                            Thanks for informing me - it's now fixed (for real) in 34.83 and I validated it on SRR1561197 using the exact methodology in your post.

                            It was hanging because of a buffer filling up. It passed my tests on smaller files; the problem only occurred when read 1 and read 2 files had a different number of reads by at least 3000 or so, so it only manifested on large datasets.

                            Comment


                            • #74
                              It is working now. Thanks Brian.

                              Comment


                              • #75
                                Originally posted by GenoMax View Post
                                Tried the following.

                                Code:
                                $ fastq-dump -F --split-files ./SRR1561197.sra
                                @safina: Not sure why you are changing the fastq headers

                                Code:
                                $ fastq_quality_filter -i SRR1561197_1.fastq -q 28 -p 100 -Q33 -o SRR1561197_1_filt.fastq
                                @safina: Note the -Q33 option. This data is most certainly sanger fastq formatted so you need to add this option (it remains undocumented in fastx_toolkit). I used the latest fastx_toolkit.

                                I chose to use repair.sh from BBMap and did

                                Code:
                                $ repair.sh in1=SRR1561197_1_filt.fastq in2=SRR1561197_2_filt.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
                                Here is where things fell apart.

                                @Brian: I get the following error about a Gb into the filtered files.



                                Multiple possibilities:

                                1. Original sra file from SRA is corrupt
                                2. fastx_toolkit is messing up the files in the filter process
                                3. Not sure why repair.sh is asking to run itself

                                Should try BBDuk to see if that works instead of fastq_filter.

                                Thanks for this, but I have a question...
                                Why haven't you used the trim command? I need to trim the SRR1561197 reads from the start as well as from the end. After trimming I get an error in pairfq and it gives me empty files....

                                I used the fastx toolkit for trimming as well:

                                Code:
                                fastx_trimmer -f 14 -l 100 -o SRR1561197_1_filt_trim.fastq


                                And when I run pairfq after this, I get empty paired files and all reads end up in the unpaired file.
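For reference, `-f 14 -l 100` asks fastx_trimmer to keep bases 14 through 100 of each read, counting from 1. A minimal sketch of that slice is below, assuming 1-based inclusive coordinates; it only illustrates the coordinates and is not the tool's implementation, and the function name is made up.

```python
def trim_window(seq, qual, first=14, last=100):
    """Keep bases first..last (1-based, inclusive) of a read and its qualities."""
    return seq[first - 1:last], qual[first - 1:last]

# A made-up 101-base read: 13 bases removed at the start, 1 at the end.
seq = "A" * 13 + "C" * 87 + "G"
qual = "I" * 101
trimmed_seq, trimmed_qual = trim_window(seq, qual)
print(len(trimmed_seq), trimmed_seq[0], trimmed_seq[-1])
```

The slice only shortens reads and leaves their names alone, so in this sketch trimming by itself does not change how many reads there are or what they are called.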

                                Comment
