
  • #46
    This is just a guess but you appear to have an extra "-" in the index option at the end of the command?

    You can also try the repair.sh utility from the BBMap suite, which can achieve the same result: http://seqanswers.com/forums/showthr...t=41057&page=4

    Comment


    • #47
      Originally posted by safina View Post
      Hi. I'm not getting any error; in fact, the pairfq program is putting all my reads in single.fq, whereas 1-paired.fq and 2_paired.fq remain empty. The link to the script is below:


      The command was:

      $ pairfq makepairs -f s_1_1_trimmed.fq \
      -r s_1_2_trimmed.fq \
      -fp s_1_1_trimmed_p.fq \
      -rp s_1_2_trimmed_p.fq \
      -fs s_1_1_trimmed_s.fq \
      -rs s_1_2_trimmed_s.fq \
      --index
      Your command looks good; I'm sure this is an issue with the identifiers, similar to what was discussed earlier in the thread. If you scroll down to the "Expected Formats" section on the wiki homepage, you can see what is expected.

      This is a very common issue and you can add the info with the following commands:

      Code:
      pairfq addinfo -i s_1_1_trimmed.fq -o s_1_1_trimmed_info.fq -p 1
      pairfq addinfo -i s_1_2_trimmed.fq -o s_1_2_trimmed_info.fq -p 2
      Then, you can try the 'makepairs' command again with the files you just created. If that doesn't work, please show us what the sequence records look like, because there could be something else going on.
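
Before adding pair information, it can help to check what the headers already end with. The following is a minimal sketch, not part of pairfq: `header_suffix_tally` is a hypothetical helper name, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a pairfq command): tally how the header lines
# of a FASTQ file end. Assumes exactly 4 lines per record.
header_suffix_tally() {
    awk 'NR % 4 == 1 {
             id = $1                          # first whitespace-delimited token of the header
             if (id ~ /\/1$/)      n1++       # header ends in "/1"
             else if (id ~ /\/2$/) n2++       # header ends in "/2"
             else                  other++    # no recognizable pair suffix
         }
         END { printf "/1=%d /2=%d other=%d\n", n1, n2, other }' "$1"
}
```

Running `header_suffix_tally s_1_1_trimmed.fq` prints one summary line. If almost every header already ends in `/1` (or `/2` for the reverse file), adding the pair information again is probably not what is missing.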

      Comment


      • #48
        @safina had posted an example of reads in this post earlier in the thread: http://seqanswers.com/forums/showpos...3&postcount=38

        Comment


        • #49
          Originally posted by GenoMax View Post
          @safina had posted an example of reads in this post earlier in the thread: http://seqanswers.com/forums/showpos...3&postcount=38
          Ah, thanks, I missed that. The reads look normal to me, so I don't see a reason to add the pair info as I suggested above. Nothing jumps out at me as being problematic with the data or commands, but if multiple methods are failing then something is clearly wrong.

          safina, could you run pairfq with the "--stats" option and show us the output? If possible, try to run the command without the "--index" option because that may be the issue.

          Comment


          • #50
            Thanks for the response. I ran with --index because of the RAM error, as I have 8 GB of RAM on my Linux computer.

            The files look like this:

            ==> forward_sequences.fastq <==
            >SRR1561197.13.1/1
            TCAAAAGGAGAACTCAATAGGCTGAACAAGTTATCTTCTGGGATTGTAATGAGAGTTGCTTCACTGCTTTGGAAGAAGAAAGCTCAT
            +
            JJJJJIJJIIJJIJJJIJIIJJJJJJJJJJIIIIIJJJJJJJHIJJIGJJJJJJJGIJJJJJGIIHHHHHFFEFFDEEDEDCACDDD
            >SRR1561197.17.1/1
            TATACAAAGCTGTCAACTTGATCTTCATACTTCTCATAAAGGACTGGTAATGTGTGGGCAGCAACGAAACCAACATATAAAACAGTC
            +
            HHGJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJIJJJJIIIIIIJIIHCHHFFFDDEDDDDDDEEEDCDDCCA
            >SRR1561197.19.1/1
            GATCAACAGTACTGGAATGGCCATCCATCACAAGTTCAGCTAAAGCAGCTCCTGTTGCAGGACCGTTTAGAATACCCCAGCAACTGT

            ==> reverse_sequences.fastq <==
            >SRR1561197.4.2/2
            ATAAAGACAGATGAAGATGCAATACAAATCATAAATAAAACGCTTTAAATAGTTTGAGCAACCCAAGCGCATAAGAAATTTCAATCT
            +
            HIJJIHGJJJJJFECHEHIIJJIGIJJIIJJJIJJJDHHIJIJJIJEHIJJJGIIIGJJICAEHFFFFDDDDDDCDDDCCADEDDDC
            >SRR1561197.9.2/2
            AGATGTCTGTCCTCCAGAAGATGGCATTGCCTGAACGCAGGAGAGAATATAACAATCATATAGGTTTTCATTCTTGTTTCCAATATC
            +
            IIIJJJIJEFJIJGHGIJIEHJJJJIGIIJJJJFIJHHIJIJJJIIJIJIIJJJHHHHHFFFFFFFFDEDEEEEED@ACCCDCCCDB
            >SRR1561197.11.2/2
            CAATGCAATGTGATTATCCAAGCTCACAATCTTCCTCACCGATCTGGAGTCTTGGAGCTTGGCCGCGGATTTCTTTTCGACGCCGAG



            Here is the command I used with --stats:


            Code:
            ./pairfq makepairs -f forward_sequences.fastq -r reverse_sequences.fastq -fp f_paired_1.fastq -rp r_paired_2.fastq -fs f_single_1.fastq -rs r_single_2.fastq --stats


            Output:

            ========= pairfq version : 0.14.1 (completion time: mar 31 mar 2015, 09.45.47, CEST)
            Total forward reads (../../forward_sequences.fastq) : 8492638
            Total reverse reads (../../reverse_sequences.fastq) : 13525478
            Total forward paired reads (1_paired.fastq) : 0
            Total reverse paired reads (2_paired.fastq) : 0
            Total forward unpaired reads (single_1.fastq) : 8492638
            Total reverse unpaired reads (single_2.fastq) : 13525478

            Total paired reads : 0
            Total unpaired reads : 22018116


            It put all the reads in the unpaired files.

            Can anyone please help me with this?

            Comment


            • #51
              @safina: Are you able to find the corresponding read 2 for the IDs below in the second file?

              Code:
              $ grep -A 3 "SRR1561197.13.1/2" reverse_sequences.fastq
              $ grep -A 3 "SRR1561197.19.1/2" reverse_sequences.fastq

              Comment


              • #52
                I didn't understand what you are trying to say.

                Comment


                • #53
                  It is a test to check that the IDs are present in both files (i.e., that these files are a real pair).
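
The same check can be run over whole files instead of single IDs. What follows is a rough sketch under stated assumptions, not a confirmed diagnosis: `fastq_ids` and `shared_id_count` are hypothetical helper names, the files are assumed to hold four-line records, and the suffix patterns are guesses based on the headers shown in post #50. There the forward headers end in `.1/1` and the reverse headers in `.2/2`, so if only `/1` and `/2` are stripped before comparing, the remaining strings would still differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of pairfq or BBMap). Assumes 4-line FASTQ records.

# Print the unique read IDs of a FASTQ file with a trailing suffix removed.
#   $1 = FASTQ file
#   $2 = extended regex for the suffix to strip (default: "/1" or "/2")
fastq_ids() {
    local suffix_re="${2:-}"
    [ -n "$suffix_re" ] || suffix_re='/[12]$'
    awk -v suffix="$suffix_re" 'NR % 4 == 1 {
        id = substr($1, 2)      # drop the leading "@" (or ">")
        sub(suffix, "", id)     # strip the pair suffix
        print id
    }' "$1" | LC_ALL=C sort -u
}

# Count IDs that occur in both files after stripping the suffix.
#   $1 = forward file, $2 = reverse file, $3 = optional suffix regex
shared_id_count() {
    { fastq_ids "$1" "${3:-}"; fastq_ids "$2" "${3:-}"; } |
        LC_ALL=C sort | uniq -d | wc -l | tr -d ' '
}
```

For example, `shared_id_count forward_sequences.fastq reverse_sequences.fastq` counts the IDs shared after removing only `/1` and `/2`, while passing `'[.][12]/[12]$'` as a third argument also removes the `.1` or `.2` in front of it. A count of zero in the first case and a large count in the second would suggest that the headers, not the reads, are what prevents pairing.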

                  Have you tried to use "repair.sh" that I posted in #46 above?

                  It appears that this must be data from SRA/GEO. Why did you not use a trimming program that is paired-end aware? What program did you use for trimming (if these files have been trimmed)?

                  Comment


                  • #54
                    Originally posted by safina View Post
                    Thanks for the response. I ran with --index because of the RAM error, as I have 8 GB of RAM on my Linux computer.

                    The files look like this:

                    ==> forward_sequences.fastq <==
                    >SRR1561197.13.1/1
                    TCAAAAGGAGAACTCAATAGGCTGAACAAGTTATCTTCTGGGATTGTAATGAGAGTTGCTTCACTGCTTTGGAAGAAGAAAGCTCAT
                    +
                    JJJJJIJJIIJJIJJJIJIIJJJJJJJJJJIIIIIJJJJJJJHIJJIGJJJJJJJGIJJJJJGIIHHHHHFFEFFDEEDEDCACDDD
                    >SRR1561197.17.1/1
                    TATACAAAGCTGTCAACTTGATCTTCATACTTCTCATAAAGGACTGGTAATGTGTGGGCAGCAACGAAACCAACATATAAAACAGTC
                    +
                    HHGJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJIJJJJIIIIIIJIIHCHHFFFDDEDDDDDDEEEDCDDCCA
                    >SRR1561197.19.1/1
                    GATCAACAGTACTGGAATGGCCATCCATCACAAGTTCAGCTAAAGCAGCTCCTGTTGCAGGACCGTTTAGAATACCCCAGCAACTGT

                    ==> reverse_sequences.fastq <==
                    >SRR1561197.4.2/2
                    ATAAAGACAGATGAAGATGCAATACAAATCATAAATAAAACGCTTTAAATAGTTTGAGCAACCCAAGCGCATAAGAAATTTCAATCT
                    +
                    HIJJIHGJJJJJFECHEHIIJJIGIJJIIJJJIJJJDHHIJIJJIJEHIJJJGIIIGJJICAEHFFFFDDDDDDCDDDCCADEDDDC
                    >SRR1561197.9.2/2
                    AGATGTCTGTCCTCCAGAAGATGGCATTGCCTGAACGCAGGAGAGAATATAACAATCATATAGGTTTTCATTCTTGTTTCCAATATC
                    +
                    IIIJJJIJEFJIJGHGIJIEHJJJJIGIIJJJJFIJHHIJIJJJIIJIJIIJJJHHHHHFFFFFFFFDEDEEEEED@ACCCDCCCDB
                    >SRR1561197.11.2/2
                    CAATGCAATGTGATTATCCAAGCTCACAATCTTCCTCACCGATCTGGAGTCTTGGAGCTTGGCCGCGGATTTCTTTTCGACGCCGAG
                    This may be one issue, as these reads are not proper fastq (records should start with "@"). Because this will likely cause issues with any downstream program, I would fix the format and then re-pair the reads.

                    This should work:

                    Code:
                    sed 's/>SRR/@SRR/g' s_1_1_sequence.fq > s_1_1_sequence_fix.fq
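
A slightly more defensive variant touches only the header line of each record, so a quality string that happens to contain ">SRR" is left alone. This is a sketch assuming four-line records; `fix_fastq_headers` and `count_bad_headers` are hypothetical helper names, not part of any toolkit mentioned in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Both assume exactly 4 lines per FASTQ record.

# Rewrite a leading ">" to "@" on header lines only (lines 1, 5, 9, ...).
# Writes the corrected records to standard output.
fix_fastq_headers() {
    awk 'NR % 4 == 1 { sub(/^>/, "@") } { print }' "$1"
}

# Count header lines that do not start with "@".
count_bad_headers() {
    awk 'NR % 4 == 1 && $0 !~ /^@/ { bad++ } END { print bad + 0 }' "$1"
}
```

For example, `fix_fastq_headers forward_sequences.fastq > forward_sequences_fix.fastq` followed by `count_bad_headers forward_sequences_fix.fastq` should report 0 if the only problem was the leading character.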

                    Comment


                    • #55
                      Good catch, though in the prior post they were proper FASTQ (http://seqanswers.com/forums/showpos...3&postcount=38).

                      Comment


                      • #56
                        Originally posted by GenoMax View Post
                        Good catch, though in the prior post they were proper FASTQ (http://seqanswers.com/forums/showpos...3&postcount=38).
                        The IDs are different, so I think this is a different data set. Also, I think the same person asked this question on stackoverflow, where I answered it, and the question was marked as solved and the OP posted a comment saying it worked. Later, the solved mark was removed along with the previous comment, and a new comment was made saying it didn't work. It seems clear that this has to do with a different data set, one that is likely corrupted somehow, but we'll have to wait and see if there is a response.

                        Comment


                        • #57
                          Hello, I tried replacing the > signs with @ signs, but the problem remained the same. The data set is the same, but I tried to modify the headers using fastool, which is why the headers are different. It is still giving empty files when the program completes.

                          Comment


                          • #58
                            Originally posted by GenoMax View Post
                            Test to check that the ID's are present in both files (i.e. these files are a real pair).

                            Have you tried to use "repair.sh" that I posted in #46 above?

                            It appears that this must be data from SRA/GEO. Why did you not use a trimming program that was pair-end aware? What program did you use for trimming (if these files have been trimmed)?
                            Yes, I tried repair.sh, but it is also just making empty files. No result!

                            The FASTQ files were made from .sra files. The GenBank accession number of this SRA entry is SRP045880. It has four samples. The IDs for the samples I'm using are:
                            1. SRR1561197 http://www.ncbi.nlm.nih.gov/sra/SRX689551[accn] and
                            2. SRR1562087 http://www.ncbi.nlm.nih.gov/sra/SRX690236[accn]

                            I used the SRA Toolkit to convert .sra to .fastq format, then the FASTX-Toolkit for filtering and trimming.


                            Now I have provided the complete info. Can anyone tell me what I'm missing or what the issues are? I want to run Trinity on these reads to get the transcript assembly/unigenes.

                            I hope my problem is clear now.

                            Comment


                            • #59
                              Originally posted by SES View Post
                              The IDs are different, so I think this is a different data set. Also, I think the same person asked this question on stackoverflow, where I answered it, and the question was marked as solved and the OP posted a comment saying it worked. Later, the solved mark was removed along with the previous comment, and a new comment was made saying it didn't work. It seems clear that this has to do with a different data set, one that is likely corrupted somehow, but we'll have to wait and see if there is a response.

                              Yes, you are right, but it gave me empty files when the process completed. The errors I was facing were gone and it ran successfully, which is why I wrote that it worked. Later I saw that the files were empty, so I had to remove my comment saying it worked. The reads/data are the same, but the headers are different because I tried changing them, thinking the headers were causing my problems. I was unsuccessful with the different headers as well. That's why you found different headers in my post.

                              Comment


                              • #60
                                Originally posted by SES View Post
                                This may be one issue, as these reads are not proper fastq (records should start with "@"). Because this will likely cause issues with any downstream program, I would fix the format and then re-pair the reads.

                                This should work:

                                Code:
                                sed 's/>SRR/@SRR/g' s_1_1_sequence.fq > s_1_1_sequence_fix.fq
                                I tried with @SRR as well, but got the same results!

                                Comment
