SEQanswers

Old 12-17-2012, 03:16 PM   #21
azneto
Member
 
Location: Brazil

Join Date: Dec 2009
Posts: 24
Default

That's really good news bmtb!
Cheers
Adhemar
Old 07-12-2013, 06:48 AM   #22
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 58
Default

I've been trying to use the script provided, but I cannot get the regex to work. My headers look like this:

@HWI-ST965:305:C0MR9ACXX:6:1113:6758:31224 1:N:0:GTGAAA

and I'm using '^@(\S+)\s[1|2]\S+$'

I should also ask: does the script handle .fastq.gz, or does it only work on uncompressed files? Thank you.
Old 03-19-2014, 12:51 AM   #23
shatz
Junior Member
 
Location: Europe

Join Date: Jun 2013
Posts: 1
Default

Hello Azneto,
I would like to use your script to fix my mate-pairs, but I am having trouble with the default expression used to locate the ID. My headers look like this:
@SBS123:173:C2RGEACXX:7:2214:5915:84780 1:N:0:ACTTGA
Could you please recommend an expression that will work in this case?

I'm using zipped FASTQ files and I hope that is OK.

Thanks!
Old 03-19-2014, 06:13 AM   #24
azneto
Member
 
Location: Brazil

Join Date: Dec 2009
Posts: 24
Default

Hi bwubb,
The regex is correct.
You can test it by running:

grep -P '^@(\S+)\s[1|2]\S+$' yourSequenceFile.fastq

The script does not handle zipped files.
-Best
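
For anyone who would rather check the pattern outside of grep, here is a minimal Python sketch (not part of the script discussed in this thread). It applies the same regex to the two header lines posted above and prints what the first capture group picks up. I am assuming the captured group is what gets used as the shared read ID; the variable names are just for illustration.

```python
import re

# Default pattern quoted in this thread. Group 1 is assumed to be the
# read ID that both mates of a pair have in common.
HEADER_RE = re.compile(r'^@(\S+)\s[1|2]\S+$')

# Header lines copied from the posts above.
headers = [
    "@HWI-ST965:305:C0MR9ACXX:6:1113:6758:31224 1:N:0:GTGAAA",
    "@SBS123:173:C2RGEACXX:7:2214:5915:84780 1:N:0:ACTTGA",
]

read_ids = []
for header in headers:
    match = HEADER_RE.match(header)
    # Store None when the header does not match, so failures are visible.
    read_ids.append(match.group(1) if match else None)
    print(header, "->", read_ids[-1])
```

Both headers match. One side note: `[1|2]` is a character class, so it accepts a literal `|` as well as `1` or `2`. That is harmless for these headers, and `[12]` would be the tighter spelling.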
Old 03-19-2014, 06:16 AM   #25
azneto
Member
 
Location: Brazil

Join Date: Dec 2009
Posts: 24
Default

Hi shatz,
The default regex should work.
You can test it by running:

grep -P '@(\S+)\s[1|2]\S+$' yourSequenceFile.fastq

The script does not handle zipped files.
-Best
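
Since the script needs uncompressed input, one workaround is to decompress the files first. Below is a minimal sketch in Python, assuming the files are gzip-compressed. The function name `gunzip_fastq`, the file names, and the placeholder sequence and quality lines in the demo are all made up for illustration; only the header line comes from this thread.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_fastq(src_path, dst_path):
    """Stream-decompress a gzip file to dst_path without loading it all into memory."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


# Demo on a throwaway one-record file (sequence and quality are placeholders).
record = "@SBS123:173:C2RGEACXX:7:2214:5915:84780 1:N:0:ACTTGA\nACGT\n+\nIIII\n"
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "reads_1.fastq.gz")
plain_path = os.path.join(workdir, "reads_1.fastq")

with gzip.open(gz_path, "wt") as handle:
    handle.write(record)

gunzip_fastq(gz_path, plain_path)

with open(plain_path) as handle:
    restored = handle.read()
print(restored)
```

The decompressed file can then be passed to the script as usual. Keep in mind that the uncompressed copy can be several times larger than the .gz file on disk.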
Old 03-19-2014, 06:45 AM   #26
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275
Default

Quote:
Originally Posted by shatz View Post
Hello Azneto,
I would like to use your script to fix my mate-pairs, but I am having trouble with the default expression used to locate the ID. My headers look like this:
@SBS123:173:C2RGEACXX:7:2214:5915:84780 1:N:0:ACTTGA
Could you please recommend an expression that will work in this case?

I'm using zipped FASTQ files and I hope that is OK.

Thanks!
Another option would be to use Pairfq for this task, because it handles FASTA/FASTQ input, either compressed (bzip2/gzip) or uncompressed. The command you would want is makepairs. A disclaimer: I wrote it for a specific problem we were having with pairing very large numbers of sequences, so there are some dependencies, specifically for the "--index" option, which uses virtually no memory. The requirements are all explained in the documentation, and it has been tested on a number of operating systems. This may not be what you need, but it doesn't hurt to mention other options.
Old 02-02-2019, 06:47 AM   #27
kcritap
Junior Member
 
Location: Brazil

Join Date: Feb 2019
Posts: 1
Default How to use the -r option?

Quote:
Originally Posted by azneto View Post
Hi bmtb,
Sorry it took me so long to reply.
The version of the script you have uses 40x the size of the f1 file.
I've just attached a version that uses about 6x.
So, if you use the 35GB file as f1 you should be able to run it this time.
Please let me know if it worked.
Perl hashes are really memory-consuming structures and we're studying alternatives.
Best,
Adhemar
Hello, I'm trying to use this script on a set of paired-end reads. My read names have the following format:
Code:
@SN1054:328:HGF77BCX2:1:1104:1293:2046 1:N:0:GAGCTGAA
What do I need to pass to the -r option?
Old 02-03-2019, 04:56 PM   #28
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 501
Default

kcritap, are you trying to make sure that the paired-end reads are kept as pairs? I would use bbduk from bbtools for trimming by quality and adapter removal.

bbduk.sh in=R1.fq in2=R2.fq out=R1_trimmed.fq out2=R2_trimmed.fq qtrim=r removeifeitherbad=t
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 02-04-2019, 12:35 AM   #29
azneto
Member
 
Location: Brazil

Join Date: Dec 2009
Posts: 24
Default

Try this '^@(\S+)\s[1|2]\S+$'
Old 10-07-2019, 05:35 PM   #30
nelbn
Junior Member
 
Location: Brazil

Join Date: Oct 2019
Posts: 1
Default

Hello, azneto,

My read headers follow this pattern:

@J00160:133:HVYVWBBXX:3:1101:7476:1297 1:N:0:GAACGAAG+CTCCTTAC

I thought the second regex '^@(\\S+)\\s[1|2]\\S+\$' would work fine for me.
I'm trying to end up with two separate files, so I'm running the following:

perl mergeShuffledFastqSeqs.pl -f1 originals/SAMPLE-READ1.fastq -f2 originals/SAMPLE-READ2.fastq -r '^@(\S+)\s[1|2]\S+$' -o mergedsequences -t

After the run, I end up with two empty files (mergedsequences.1.fastq and mergedsequences.2.fastq) and one large file, mergedsequences.nomatch.fastq, containing all the sequences.

Am I doing something wrong? Any thoughts on what's happening?

I appreciate any help
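
Not an answer from the script's author, just a diagnostic sketch that might help narrow this down. The header posted above does match the regex on its own, so one thing worth ruling out is something invisible at the end of each header line. Windows-style line endings are one possibility (this is only a guess about these files): a trailing carriage return stops `\S+$` from matching. The Python sketch below counts how many header lines match and how many end in a carriage return. It assumes strict 4-line FASTQ records; `check_headers` is a made-up helper name, and the sequence and quality lines in the demo are placeholders.

```python
import io
import re

HEADER_RE = re.compile(r'^@(\S+)\s[1|2]\S+$')


def check_headers(handle, pattern=HEADER_RE, limit=1000):
    """Count header lines that match the pattern and those ending in a carriage return."""
    stats = {"headers": 0, "matched": 0, "carriage_returns": 0}
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:  # assumes strict 4-line FASTQ records
            continue
        if stats["headers"] >= limit:
            break
        header = line.rstrip("\n")  # strip only the newline, keep any '\r'
        stats["headers"] += 1
        stats["matched"] += bool(pattern.match(header))
        stats["carriage_returns"] += header.endswith("\r")
    return stats


# Demo: the same record with Unix and with Windows line endings.
unix_text = (
    "@J00160:133:HVYVWBBXX:3:1101:7476:1297 1:N:0:GAACGAAG+CTCCTTAC\n"
    "ACGT\n+\nIIII\n"
)
dos_text = unix_text.replace("\n", "\r\n")

unix_stats = check_headers(io.StringIO(unix_text))
dos_stats = check_headers(io.StringIO(dos_text))
print("unix:", unix_stats)
print("dos: ", dos_stats)
```

To run it on a real file, open it with `open(path, newline="")` so that Python does not silently strip the carriage returns. If `matched` is zero and `carriage_returns` is not, converting the line endings is worth a try; if every header matches, the cause is somewhere else.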

Tags
match reads, paired end read, program, uneven number
