SEQanswers

Old 08-09-2011, 06:53 AM   #1
ester
Member
 
Location: Israel

Join Date: Jun 2008
Posts: 10
Splitting NuGen barcodes from paired-end sequences

Hi all,

Does anybody know of software for splitting NuGen barcodes that supports PAIRED-END reads?

Thanks,

Ester
Old 08-09-2011, 07:48 AM   #2
axgraf
Junior Member
 
Location: Germany

Join Date: Apr 2011
Posts: 7

Hi Ester,
I wrote a small program for this.

The tool filters the reads by searching for the barcode in the first read only. If the barcode is found, it is removed and the read is written to the output. Note that the input files must be in the same order.
Small example:

java -Xmx4g -jar DemultiplexNUGEN.jar -i1 laneX_1.fastq laneY_1.fastq ... -i2 laneX_2.fastq laneY_2.fastq ... -b ATTG -o1 ATTG_demultiplex_1.fastq -o2 ATTG_demultiplex_2.fastq -s

Hope this helps. Please let me know whether it works.

Alex

Last edited by axgraf; 08-10-2011 at 07:18 AM.
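
For readers without access to the attached tool, the filtering described in this post can be sketched roughly as follows. This is a minimal illustration, not the code inside DemultiplexNUGEN.jar: the class and method names are hypothetical, and it assumes the barcode sits at the very start of read 1, that every FASTQ record is exactly four lines, and that both files list the pairs in the same order.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;

/** Hypothetical sketch; this is not the code inside DemultiplexNUGEN.jar. */
final class BarcodeSplitSketch {

    /**
     * Returns a copy of a 4-line FASTQ record with the leading barcode trimmed
     * from the bases and the qualities, or null if the read does not start with it.
     * Assumes a well-formed record (quality string as long as the sequence).
     */
    static String[] trimBarcode(String[] record, String barcode) {
        if (!record[1].startsWith(barcode)) {
            return null;
        }
        int n = barcode.length();
        return new String[] {record[0], record[1].substring(n), record[2], record[3].substring(n)};
    }

    /** Walks both inputs in lockstep and writes only the pairs whose read 1 carries the barcode. */
    static int split(BufferedReader in1, BufferedReader in2, String barcode,
                     BufferedWriter out1, BufferedWriter out2) throws IOException {
        int kept = 0;
        String[] r1;
        String[] r2;
        while ((r1 = readRecord(in1)) != null && (r2 = readRecord(in2)) != null) {
            String[] trimmed = trimBarcode(r1, barcode);
            if (trimmed != null) {
                writeRecord(out1, trimmed);
                writeRecord(out2, r2); // read 2 is passed through unchanged
                kept++;
            }
        }
        return kept;
    }

    /** Reads four lines; returns null at end of input or on a truncated final record. */
    private static String[] readRecord(BufferedReader in) throws IOException {
        String[] rec = new String[4];
        for (int i = 0; i < 4; i++) {
            rec[i] = in.readLine();
            if (rec[i] == null) {
                return null;
            }
        }
        return rec;
    }

    private static void writeRecord(BufferedWriter out, String[] rec) throws IOException {
        for (String line : rec) {
            out.write(line);
            out.write('\n');
        }
    }
}
```

Reading the two files in lockstep is what makes the "input files must be in order" requirement necessary: the sketch never searches for a mate, it simply takes the next record from each file.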
Old 08-10-2011, 05:47 AM   #3
ester
Member
 
Location: Israel

Join Date: Jun 2008
Posts: 10
Splitting NuGen barcodes from paired-end sequences

Hi Alex,

Thanks for your help.

I tried to run your program with the following command:

java -Xmx4g -jar DemultiplexNUGEN.jar -i1 s_7_1_sequence.txt -i2 s_7_2_sequence.txt -b ACCC -o1 test.1
-o2 test.2 -s

and got the following error:

de.genzentrum.lafuga.NotFastqFormatException: Read1 has not the same identifier as read2
at de.genzentrum.lafuga.trimmer.Demultiplex.iterateFastqPairedEnd(Demultiplex.java:96)
at de.genzentrum.lafuga.main.MainPairedEnd.main(MainPairedEnd.java:70)

The input files:

>head s_7_1_sequence.txt
@HWI-ST611_0176:7:1:1226:2054#0/1
NGTACTCGTCCACGTCGTTCTCAGAGAGAATATTCTCTCTCCACACATCAGCAGTTAAGGAGGATGTGAAGACAATCTTTTCAACACTATCGGTCTGAGC
+HWI-ST611_0176:7:1:1226:2054#0/1
BYWYW[ZZZZcccccc_cccccccccccccc_ccccccccccccccccc\ccc_ccc\cccc_\cccccVccac______YUcUc\^^^\^^XZ^[X\\\
@HWI-ST611_0176:7:1:1161:2111#0/1
GAGTAGGCCACGCNTTCACGGTTCGTATTCGTGCTGGAAATCAGAATCAAACGAGCTTTTACCCTTTTGTTCCACACGAGATTTCTGTTCTCGTTGAGCT
+HWI-ST611_0176:7:1:1161:2111#0/1
gggeggggggcccBccccccggggfdgeggdbdddgfgfgdgggggeefgegeggbeegedea[gfedaagZeed]]bb`eedfegXgggabaddYaeca
@HWI-ST611_0176:7:1:1197:2111#0/1
GAGCCGCCCGCTCTCTGCTTTCCAAGCCTTTGCGATCTGCTTAAGCAGCTTTGACACCAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC
arkady Melon_2011/data> head s_7_2_sequence.txt
@HWI-ST611_0176:7:1:1226:2054#0/2
CAAATGGTGGATTTGGAGGTTAGAGGAACAATTAATGTCGTCGAGGCTTGTGCTCAGACCGATAGTGTTGAAAAGATTGTCTTCACATCCTCCTTAACTG
+HWI-ST611_0176:7:1:1226:2054#0/2
gggggggegggggggggggggggdgggggggggggggggggggggggge^cd`cddfeeffbe`d`dddd]eee_XddacaddW[aca`cadcbeMdcbT
@HWI-ST611_0176:7:1:1161:2111#0/2
GGTGGGCCGATCCGGGCGGAAGACATTGTCAGGTGGGGAGTTTGGCTGGGGGCGGCACATCTGTTAAAAGATAACGCAGGTGTTCTAAGATGAGCTCAAC
+HWI-ST611_0176:7:1:1161:2111#0/2
fhdgbgggddfefffegfggfbggggddegeea^eedd^deeebecee^cadUXd\TV]`a[]bdfeeda\VadaabcdcK^V\E]U[TY]Ybbbdb[d\
@HWI-ST611_0176:7:1:1197:2111#0/2
GGTGTCAAAGCTGCTTAAGCAGATCGCAAAGGCTTGGAAAGCAGAGAGCGGGCGGCTCAGATCGGAAGGGCGTCGTGTAGGGAAAGAGGGGAGATTTCGG


Can you help with this?

Thanks again,
Ester
Old 08-10-2011, 07:17 AM   #4
axgraf
Junior Member
 
Location: Germany

Join Date: Apr 2011
Posts: 7

Hi Ester,
The tool compares the identifiers of the two reads and stopped because the names weren't the same.
I had overlooked the fact that paired-end reads can carry a "/1" and "/2" at the end of the identifier; these aren't present in our reads.

I changed the code so that it should now work for your files.

Alex
Attached Files
File Type: tar DemultiplexNUGEN.tar (140.0 KB, 28 views)
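
A check of the kind described in this post might look roughly like the sketch below. It is an assumption about the approach, not the code in the attached archive, and the class and method names are made up for illustration: the trailing "/1" or "/2" is dropped before the two identifiers are compared, so headers with and without the suffix are both accepted.

```java
/** Hypothetical sketch of a mate-aware identifier comparison. */
final class PairIdSketch {

    /** Drops a trailing "/1" or "/2" so that both mates share one name. */
    static String stripMateSuffix(String header) {
        if (header.endsWith("/1") || header.endsWith("/2")) {
            return header.substring(0, header.length() - 2);
        }
        return header;
    }

    /** True when the two header lines name the same read pair. */
    static boolean samePair(String header1, String header2) {
        return stripMateSuffix(header1).equals(stripMateSuffix(header2));
    }
}
```

With the headers from Ester's files, `samePair("@HWI-ST611_0176:7:1:1226:2054#0/1", "@HWI-ST611_0176:7:1:1226:2054#0/2")` compares the shared part `@HWI-ST611_0176:7:1:1226:2054#0` on both sides.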
Old 08-10-2011, 07:26 AM   #5
ester
Member
 
Location: Israel

Join Date: Jun 2008
Posts: 10

Hi Alex,

Still having problems:


java.lang.NullPointerException
at java.io.File.<init>(Unknown Source)
at de.genzentrum.lafuga.trimmer.Demultiplex.iterateFastqPairedEnd(Demultiplex.java:74)
at de.genzentrum.lafuga.main.MainPairedEnd.main(MainPairedEnd.java:70)


Thanks,

Ester
Old 08-10-2011, 08:02 AM   #6
axgraf
Junior Member
 
Location: Germany

Join Date: Apr 2011
Posts: 7

Did you use the same parameters as in your last post?
It seems to me that the -o2 switch was not set.

If I use the same parameters and the same sequences as in your last post, it runs successfully.

If you copied the command from your last post, the "-o2 test.2 -s"
line may have been left out.

That could have caused the NullPointerException.

Otherwise I need the exact parameters you used.

Alex
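
The stack trace above fits this explanation: java.io.File's constructor throws a NullPointerException when it is handed a null path, which is what an unset output switch would produce. A small up-front check can turn that into a readable message. The sketch below is only an illustration of the idea; the parsing logic and the class name are assumptions, and only the switch names (-i1, -i2, -b, -o1, -o2) are taken from the commands in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a required-switch check; not the tool's real parser. */
final class ArgCheckSketch {

    /** Groups the values that follow each "-switch" until the next switch starts. */
    static Map<String, List<String>> parse(String[] args) {
        Map<String, List<String>> opts = new LinkedHashMap<>();
        List<String> current = null;
        for (String arg : args) {
            if (arg.startsWith("-")) {
                current = new ArrayList<>();
                opts.put(arg, current);
            } else if (current != null) {
                current.add(arg);
            }
        }
        return opts;
    }

    /** Returns the required switches that are absent or were given no value. */
    static List<String> missing(Map<String, List<String>> opts, String... required) {
        List<String> absent = new ArrayList<>();
        for (String name : required) {
            List<String> values = opts.get(name);
            if (values == null || values.isEmpty()) {
                absent.add(name);
            }
        }
        return absent;
    }
}
```

Calling `missing(parse(args), "-i1", "-i2", "-b", "-o1", "-o2")` before opening any file lets the program stop with a message naming the missing switch instead of failing inside File.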
Old 08-10-2011, 09:26 AM   #7
ester
Member
 
Location: Israel

Join Date: Jun 2008
Posts: 10

Hi Alex,

You are right. It was my mistake.
The program runs, but the output file is missing the read name after the +:

arkady Melon_2011/data> more test.1
@HWI-ST611_0176:7:1:2764:2469#0/1
AGGAGTCCGGTATTGTTATTTATTGTCACTGCCTCCCCGTGTCAGGATTGGGTAGATCGGAAGAGCGGTTCTGCAGGAATGCCGAGACCGATACCG
+
gggfggggggdggggggggggggggggegeTedcdeggdfgccZegada`ecXabZX_``\`bMYY`aM^\ZX[S^dabXbBBBBBBBBBBBBBBB
@HWI-ST611_0176:7:1:5412:2350#0/1
CCGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATC
+
gggfg_gegggggegggdggfggggegggggeggaggd\eefcdbdd[edd`ddeX\\aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Thanks again,

Ester
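
For context, the FASTQ format allows the third line of a record to be either a bare "+" or "+" followed by the same name as the "@" line, so the output above is still valid FASTQ, although some pipelines expect the name to be repeated. A writer can support both styles with a small helper such as the following sketch (the class and method names are hypothetical, not taken from the tool):

```java
/** Hypothetical helper for the third line of a FASTQ record. */
final class PlusLineSketch {

    /** Builds "+" alone, or "+" followed by the read name taken from the "@" header. */
    static String plusLine(String header, boolean repeatName) {
        if (!header.startsWith("@")) {
            throw new IllegalArgumentException("FASTQ header must start with '@': " + header);
        }
        return repeatName ? "+" + header.substring(1) : "+";
    }
}
```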
Old 08-10-2011, 09:31 AM   #8
eslondon
Member
 
Location: London, UK

Join Date: Jul 2009
Posts: 21

You can also use novobarcode, part of Novoalign, to split reads into "buckets" based on barcodes.
__________________
--------------------------------------
Elia Stupka
Co-Director and Head of Unit
Center for Translational Genomics and Bioinformatics
San Raffaele Scientific Institute
Via Olgettina 58
20132 Milano
Italy
---------------------------------------
Old 08-11-2011, 02:14 AM   #9
axgraf
Junior Member
 
Location: Germany

Join Date: Apr 2011
Posts: 7

You are right.
Sorry about that. Until now this tool has only been used here at our institute.
I changed it.
I hope everything works now.

Alex
Attached Files
File Type: tar DemultiplexNUGEN.tar (150.0 KB, 35 views)
Old 08-14-2011, 12:51 AM   #10
ester
Member
 
Location: Israel

Join Date: Jun 2008
Posts: 10

Now it works fine.
Thanks a lot,
Ester
Old 10-03-2011, 10:36 AM   #11
senpeng
Member
 
Location: phoenix

Join Date: Sep 2011
Posts: 10

Quote:
Originally Posted by axgraf View Post
Hi Ester,
The tool compared the identifier of the reads and stopped because the names weren't the same.
I missed the fact, that paired-end reads could have
a "/1" and "/2" at the end of the identifier, which aren't present in our reads.

I changed the code, so that it should work for your files.

Alex

Dear Alex,
We ran into a similar problem. Our input is in the CASAVA 1.8 format, which differs slightly from the earlier one in where the "1" and "2" appear.
Our input looks like this:
@HWUSI-EAS174:6:FC:1:1:1153:945 1:Y:0:
GGGAGGTCGAGGCTGTAGTGAGCTGGGATCGTACCATTTCTCTCATTACGAGATCGGAAGAGCGTGGTGTTGGGACTGAGTGTAGATCTCGGTGGGCGGC
+
25+=70.6;1@@;,;A?=?:19)7;*+++5+?=;+.7;<)3>61*?=;:=BD?B@?222=?8+BB###################################
@HWUSI-EAS174:6:FC:1:1:1288:931 1:Y:0:
GAGGTCGGCTTGGAGTCAGAAAGCTCGGGGCATTGTCTCAGGTCTGTTGCTTCCTAGGAGTGTGAACGATGAGGAAGTTCCTGCATCGCTGAGGACTCAG
+
?+@=6;2;@==B;54;=;=+:785+--/77B?B?D#################################################################
@HWUSI-EAS174:6:FC:1:1:1305:938 1:Y:0:
GGGTTCGCTCGGTGAACTGCACGCCCTTTGAAATGTCTCCTCTCGATTTGGGTGTTTTACTTGATTTTTCTTATATCTTACATCTTTTCTTTAGTCTGTC
+
####################################################################################################
@HWUSI-EAS174:6:FC:1:1:1528:951 1:Y:0:
CGCAAGGACAAAAAACCAAATACTGCATGTTCTCAATCATAGGTGGGAATTGAACAATGAGAACACAGGGACACAGGAACACTCAGATCGGAAGAGCGTC
+
IIIHIHBBIGIIIIIIIIHHBIHIIIIEBIGIIIIDGGBGGGGDGGADGEIIIIGDGEGIHFGHI<IHDGE@HHBIFF@FIFBHHIEG@@HEDDEE>B>3
@HWUSI-EAS174:6:FC:1:1:1551:943 1:Y:0:
GGAGGCTGCTTTTAGGCCTACTATGGGTGTTAAATTTTTTACTCTCTCAAACACCGGGCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCG
+
ED=D4FEEE?8BB@B4FBFEE4BFD/:0:4B?;8B*45402921;+86=4CCE?EDDB+DA<ACAD@<GB<0><6?>:4??C>1?###############
@HWUSI-EAS174:6:FC:1:1:1588:935 1:Y:0:
CCGTGATAGTTTTTAGGTGTTAGACACCCCACCTTAAGCTTGTACCTGAAAGCTTTATCTCGTTATAAATAATTCACTGTAATTTAGGGGAGGTATGTCC
+
2+85::1:77)::1:=+9=@@32,@=3<;99@@@F=@4B8B?7C:B?CAB=??8E734282B==77241@##############################

So when I run the Java program, it still reports "Read1 has not the same identifier as read2". Could you please help me solve that?

Thanks so much
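
In CASAVA 1.8 headers the mate number is no longer a "/1" or "/2" suffix. It is the first field after a space, as in the "1:Y:0:" part of the headers above. One way to compare such reads is to build the pair key from the part before the space only. The sketch below is a hypothetical illustration, not a patch to DemultiplexNUGEN.jar; it assumes a single space separates the name from the description, and it still strips an old-style "/1" or "/2" so that both header styles work.

```java
/** Hypothetical sketch of a pair key that works for both header styles. */
final class CasavaIdSketch {

    /**
     * Returns the part of the header that both mates share: everything before
     * the first space (CASAVA 1.8), minus a trailing "/1" or "/2" (older style).
     */
    static String pairKey(String header) {
        int space = header.indexOf(' ');
        String name = space >= 0 ? header.substring(0, space) : header;
        if (name.endsWith("/1") || name.endsWith("/2")) {
            name = name.substring(0, name.length() - 2);
        }
        return name;
    }
}
```

The read 2 file is not shown in this thread; assuming its headers differ only after the space (for example "2:Y:0:"), both mates would yield the key `@HWUSI-EAS174:6:FC:1:1:1153:945`.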