
**GenoMax** · 03-30-2015, 03:10 AM

This is just a guess but you appear to have an extra "-" in the index option at the end of the command?

You can also try the repair.sh utility from the BBMap suite, which can achieve the same result: http://seqanswers.com/forums/showthr...t=41057&page=4

**SES** · 03-30-2015, 05:27 AM

Originally posted by safina

Hi. I'm not getting any error; in fact, the pairfq program is putting all my reads in single.fq, whereas 1-paired.fq and 2_paired.fq remain empty. The link to the script is below:

https://github.com/sestaton/Pairfq

The command was:

$ pairfq makepairs -f s_1_1_trimmed.fq \
-r s_1_2_trimmed.fq \
-fp s_1_1_trimmed_p.fq \
-rp s_1_2_trimmed_p.fq \
-fs s_1_1_trimmed_s.fq \
-rs s_1_2_trimmed_s.fq \
--index

Your command looks good, I'm sure this is an issue with the identifiers, similar to what was discussed above in the thread. If you scroll down to the "Expected Formats" section on the wiki homepage you can see what is expected.

This is a very common issue and you can add the info with the following commands:

Code:

pairfq addinfo -i s_1_1_trimmed.fq -o s_1_1_trimmed_info.fq -p 1
pairfq addinfo -i s_1_2_trimmed.fq -o s_1_2_trimmed_info.fq -p 2

Then, you can try the 'makepairs' command again with the files you just created. If that doesn't work, please show us what the sequence records look like, because there could be something else going on.

**GenoMax** · 03-30-2015, 05:32 AM

@safina had posted an example of reads in this post earlier in the thread: http://seqanswers.com/forums/showpos...3&postcount=38

**SES** · 03-30-2015, 05:54 AM

Originally posted by GenoMax

@safina had posted an example of reads in this post earlier in the thread: http://seqanswers.com/forums/showpos...3&postcount=38

Ah, thanks I missed that. The reads look normal to me, so I don't see a reason to add the pair info as I suggested above. Nothing jumps out at me as being problematic with the data or commands, but if multiple methods are failing then something is clearly wrong.

safina, could you run pairfq with the "--stats" option and show us the output? If possible, try to run the command without "--index", because that may be the issue.

**safina** · 03-31-2015, 12:11 AM

Thanks for the response. I used --index because of a RAM error, as I have 8 GB of RAM on my Linux computer.

The files look like this:

==> forward_sequences.fastq <==
>SRR1561197.13.1/1
TCAAAAGGAGAACTCAATAGGCTGAACAAGTTATCTTCTGGGATTGTAATGAGAGTTGCTTCACTGCTTTGGAAGAAGAAAGCTCAT
+
JJJJJIJJIIJJIJJJIJIIJJJJJJJJJJIIIIIJJJJJJJHIJJIGJJJJJJJGIJJJJJGIIHHHHHFFEFFDEEDEDCACDDD
>SRR1561197.17.1/1
TATACAAAGCTGTCAACTTGATCTTCATACTTCTCATAAAGGACTGGTAATGTGTGGGCAGCAACGAAACCAACATATAAAACAGTC
+
HHGJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJIJJJJIIIIIIJIIHCHHFFFDDEDDDDDDEEEDCDDCCA
>SRR1561197.19.1/1
GATCAACAGTACTGGAATGGCCATCCATCACAAGTTCAGCTAAAGCAGCTCCTGTTGCAGGACCGTTTAGAATACCCCAGCAACTGT

==> reverse_sequences.fastq <==
>SRR1561197.4.2/2
ATAAAGACAGATGAAGATGCAATACAAATCATAAATAAAACGCTTTAAATAGTTTGAGCAACCCAAGCGCATAAGAAATTTCAATCT
+
HIJJIHGJJJJJFECHEHIIJJIGIJJIIJJJIJJJDHHIJIJJIJEHIJJJGIIIGJJICAEHFFFFDDDDDDCDDDCCADEDDDC
>SRR1561197.9.2/2
AGATGTCTGTCCTCCAGAAGATGGCATTGCCTGAACGCAGGAGAGAATATAACAATCATATAGGTTTTCATTCTTGTTTCCAATATC
+
IIIJJJIJEFJIJGHGIJIEHJJJJIGIIJJJJFIJHHIJIJJJIIJIJIIJJJHHHHHFFFFFFFFDEDEEEEED@ACCCDCCCDB
>SRR1561197.11.2/2
CAATGCAATGTGATTATCCAAGCTCACAATCTTCCTCACCGATCTGGAGTCTTGGAGCTTGGCCGCGGATTTCTTTTCGACGCCGAG

This is the command I used with --stats:

Code:

./pairfq makepairs -f forward_sequences.fastq -r reverse_sequences.fastq -fp f_paired_1.fastq -rp r_paired_2.fastq -fs f_single_1.fastq -rs r_single_2.fastq --stats

Output:

========= pairfq version : 0.14.1 (completion time: mar 31 mar 2015, 09.45.47, CEST)
Total forward reads (../../forward_sequences.fastq) : 8492638
Total reverse reads (../../reverse_sequences.fastq) : 13525478
Total forward paired reads (1_paired.fastq) : 0
Total reverse paired reads (2_paired.fastq) : 0
Total forward unpaired reads (single_1.fastq) : 8492638
Total reverse unpaired reads (single_2.fastq) : 13525478

Total paired reads : 0
Total unpaired reads : 22018116

It put all the reads in the unpaired files.

Can anyone please help me with this?

**GenoMax** · 03-31-2015, 03:11 AM

@safina: Are you able to find the corresponding read 2 for the IDs below in the second file?

Code:

$ grep -A 3 "SRR1561197.13.1/2"  reverse_sequences.fastq

Code:

$ grep -A 3 "SRR1561197.19.1/2" reverse_sequences.fastq
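For anyone following along, the same test can be run over every read rather than two hand-picked IDs. Below is a minimal sketch, not part of pairfq or BBMap: `count_shared_ids` is a hypothetical helper name, and it assumes plain four-line FASTQ records and pair suffixes of the form /1 and /2, as in the sample headers posted above.

```shell
# Hypothetical helper: count reads in the second FASTQ file whose ID
# also occurs in the first file.
# Assumes 4-line records; strips the leading '@' or '>' and a trailing /1 or /2.
count_shared_ids() {
    awk '
        FNR % 4 == 1 {
            id = $1
            sub(/^[@>]/, "", id)      # drop the record marker
            sub(/\/[12]$/, "", id)    # drop the pair suffix
            if (NR == FNR) seen[id] = 1        # first file: remember the ID
            else if (id in seen) shared++      # second file: count matches
        }
        END { print shared + 0 }
    ' "$1" "$2"
}
```

Calling `count_shared_ids forward_sequences.fastq reverse_sequences.fastq` prints how many reverse read names also occur in the forward file. If it prints 0, compare the names by eye: in the sample above the forward headers end in `.1/1` and the reverse headers in `.2/2`, so the names may differ before the slash as well, though whether that is the cause here is only a guess.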

**safina** · 03-31-2015, 03:17 AM

I didn't get what you are trying to say.

**GenoMax** · 03-31-2015, 03:36 AM

It is a test to check that the IDs are present in both files (i.e. that these files are a real pair).

Have you tried to use "repair.sh" that I posted in #46 above?

It appears that this must be data from SRA/GEO. Why did you not use a trimming program that was paired-end aware? What program did you use for trimming (if these files have been trimmed)?
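To illustrate what "paired-end aware" means here: such a filter looks at both mates and keeps or drops them together, so the two output files never go out of sync. A rough sketch under stated assumptions: the two inputs are untrimmed and still in the same order, records are four lines each, and `filter_pairs_by_length` is a made-up helper, not a FASTX-Toolkit or BBMap command.

```shell
# Hypothetical helper: length-filter two FASTQ files that are still in sync,
# keeping a pair only when BOTH mates have at least min_len bases.
# Usage: filter_pairs_by_length fwd.fq rev.fq min_len fwd_out.fq rev_out.fq
filter_pairs_by_length() {
    : > "$4"; : > "$5"    # create empty outputs even if no pair passes
    awk -v rev="$2" -v min="$3" -v out1="$4" -v out2="$5" '
        {
            if ((getline mate < rev) <= 0) exit   # reverse file ended early
            slot = FNR % 4                        # 1=header 2=seq 3=plus 0=qual
            r1[slot] = $0
            r2[slot] = mate
            if (slot == 0 && length(r1[2]) >= min && length(r2[2]) >= min) {
                print r1[1] "\n" r1[2] "\n" r1[3] "\n" r1[0] > out1
                print r2[1] "\n" r2[2] "\n" r2[3] "\n" r2[0] > out2
            }
        }
    ' "$1"
}
```

Filtering each file on its own, by contrast, can drop a read from one file while its mate survives in the other, which is the situation a re-pairing tool then has to repair.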

**SES** · 03-31-2015, 06:26 AM

Originally posted by safina

Thanks for the response. I used --index because of a RAM error, as I have 8 GB of RAM on my Linux computer.

The files look like this:

==> forward_sequences.fastq <==
>SRR1561197.13.1/1
TCAAAAGGAGAACTCAATAGGCTGAACAAGTTATCTTCTGGGATTGTAATGAGAGTTGCTTCACTGCTTTGGAAGAAGAAAGCTCAT
+
JJJJJIJJIIJJIJJJIJIIJJJJJJJJJJIIIIIJJJJJJJHIJJIGJJJJJJJGIJJJJJGIIHHHHHFFEFFDEEDEDCACDDD
>SRR1561197.17.1/1
TATACAAAGCTGTCAACTTGATCTTCATACTTCTCATAAAGGACTGGTAATGTGTGGGCAGCAACGAAACCAACATATAAAACAGTC
+
HHGJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJIJJJJIIIIIIJIIHCHHFFFDDEDDDDDDEEEDCDDCCA
>SRR1561197.19.1/1
GATCAACAGTACTGGAATGGCCATCCATCACAAGTTCAGCTAAAGCAGCTCCTGTTGCAGGACCGTTTAGAATACCCCAGCAACTGT

==> reverse_sequences.fastq <==
>SRR1561197.4.2/2
ATAAAGACAGATGAAGATGCAATACAAATCATAAATAAAACGCTTTAAATAGTTTGAGCAACCCAAGCGCATAAGAAATTTCAATCT
+
HIJJIHGJJJJJFECHEHIIJJIGIJJIIJJJIJJJDHHIJIJJIJEHIJJJGIIIGJJICAEHFFFFDDDDDDCDDDCCADEDDDC
>SRR1561197.9.2/2
AGATGTCTGTCCTCCAGAAGATGGCATTGCCTGAACGCAGGAGAGAATATAACAATCATATAGGTTTTCATTCTTGTTTCCAATATC
+
IIIJJJIJEFJIJGHGIJIEHJJJJIGIIJJJJFIJHHIJIJJJIIJIJIIJJJHHHHHFFFFFFFFDEDEEEEED@ACCCDCCCDB
>SRR1561197.11.2/2
CAATGCAATGTGATTATCCAAGCTCACAATCTTCCTCACCGATCTGGAGTCTTGGAGCTTGGCCGCGGATTTCTTTTCGACGCCGAG

This may be one issue, as these reads are not proper fastq (records should start with "@"). Because this will likely cause issues with any downstream program, I would fix the format and then re-pair the reads.

This should work:

Code:

sed 's/>SRR/@SRR/g' s_1_1_sequence.fq > s_1_1_sequence_fix.fq
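One caveat with the sed one-liner: it rewrites ">SRR" wherever it occurs, which in principle could include a quality line. A slightly more careful sketch, assuming four-line records, touches only the header line of each record and then counts headers that still lack "@". Both function names are hypothetical.

```shell
# Hypothetical helper: replace a leading '>' with '@' on header lines only.
# Assumes 4-line FASTQ records; writes the fixed records to stdout.
fix_headers() {
    awk 'FNR % 4 == 1 { sub(/^>/, "@") } { print }' "$1"
}

# Hypothetical helper: count records whose header line does not start with '@'.
count_bad_headers() {
    awk 'FNR % 4 == 1 && $0 !~ /^@/ { bad++ } END { print bad + 0 }' "$1"
}
```

For example, `fix_headers forward_sequences.fastq > forward_fix.fastq` followed by `count_bad_headers forward_fix.fastq` should print 0 if every record now starts with "@"; the output file name is just a placeholder.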

**GenoMax** · 03-31-2015, 06:32 AM

Good catch, though in the prior post they were proper fastq (http://seqanswers.com/forums/showpos...3&postcount=38).

**SES** · 03-31-2015, 01:41 PM

Originally posted by GenoMax

Good catch, though in the prior post they were proper fastq (http://seqanswers.com/forums/showpos...3&postcount=38).

The IDs are different, so I think this is a different data set. Also, I think the same person asked this question on stackoverflow, where I answered it, and the question was marked as solved and the OP posted a comment saying it worked. Later, the solved mark was removed along with the previous comment, and a new comment was made saying it didn't work. It seems clear that this has to do with a different data set, one that is likely corrupted somehow, but we'll have to wait and see if there is a response.

**safina** · 03-31-2015, 10:26 PM

Hello, I tried replacing the > signs with @, but the problem remained the same. The data set is the same, but I tried to modify the headers using fastool, which is why the headers are changed. It is still giving empty files when the program completes.

**safina** · 03-31-2015, 10:49 PM

Originally posted by GenoMax

It is a test to check that the IDs are present in both files (i.e. that these files are a real pair).

Have you tried to use "repair.sh" that I posted in #46 above?

It appears that this must be data from SRA/GEO. Why did you not use a trimming program that was paired-end aware? What program did you use for trimming (if these files have been trimmed)?

Yes, I tried repair.sh, but it also just makes empty files. No result!

The fastq files were made from an .sra file. The GenBank accession number of this SRA is: SRP045880. It has four samples. The IDs for the samples I'm using are:
1. SRR1561197 http://www.ncbi.nlm.nih.gov/sra/SRX689551[accn] and
2. SRR1562087 http://www.ncbi.nlm.nih.gov/sra/SRX690236[accn]

I used the SRA Toolkit to convert .sra to .fastq format, then the FASTX-Toolkit for filtering and trimming.

Now I have provided the complete info. Can anyone tell me what I'm missing or what the issues are? I want to run Trinity on these reads to get the transcript assembly/unigenes.

I hope my problem is clear now.

**safina** · 03-31-2015, 10:54 PM

Originally posted by SES

The IDs are different, so I think this is a different data set. Also, I think the same person asked this question on stackoverflow, where I answered it, and the question was marked as solved and the OP posted a comment saying it worked. Later, the solved mark was removed along with the previous comment, and a new comment was made saying it didn't work. It seems clear that this has to do with a different data set, one that is likely corrupted somehow, but we'll have to wait and see if there is a response.

Yes, you are right, but it gave me empty files when the process completed. The errors I was facing were gone and it ran successfully, which is why I wrote that it worked. But later I saw that the files were empty, so I had to remove my comment saying it worked. The reads/data are the same, but the headers are changed because I tried changing them, thinking the headers were causing my problems. I was unsuccessful with different headers as well. That's why you found different headers in my post.

**safina** · 03-31-2015, 10:54 PM

Originally posted by SES

This may be one issue, as these reads are not proper fastq (records should start with "@"). Because this will likely cause issues with any downstream program, I would fix the format and then re-pair the reads.

This should work:

Code:

sed 's/>SRR/@SRR/g' s_1_1_sequence.fq > s_1_1_sequence_fix.fq

I tried with @SRR as well, but got the same results!
