SEQanswers

Old 10-17-2012, 10:47 AM   #1
newBioinfo
Member
 
Location: US

Join Date: Mar 2012
Posts: 36
Demultiplex Illumina reads

Hi Everyone,
I am kind of stuck with my Illumina data. I want to remove the barcodes from my reads. My read file looks like this:
@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 1:N:0:
NACAGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#1=DDDFFCADHHIIIIIIIIGIIIIIIIGIIIIIIIFHIIIIICFHH################################
#######################################################################
and my barcode files look like this:
@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 2:N:0:
NNNNNANACACA
+
############
When I use the FASTX-Toolkit to trim the barcodes, I get an error.
The command I am using is:
cat lane5_NoIndex_L005_R1_001.fastq | /u2/software/fastx/fastx_toolkit-0.0.13.2/bin/fastx_barcode_splitter.pl --bcfile lane5_NoIndex_L005_R2_001.fastq --bol --prefix x --suffix ".fastq"
The error I am getting is:
Error: bad barcode value (2:N:0 at barcode file (lane5_NoIndex_L005_R2_001.fastq) line 1
I think the reason is the 2:N:0 in the barcode header versus the 1:N:0 in the read header.
I am not sure how to fix this. If anyone has any idea, could you please help me?

Thanks!!!!
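
A note on the error above: as far as I know, the --bcfile option of fastx_barcode_splitter.pl expects a plain text file with one identifier and one barcode per line, not a FASTQ file, and the script looks for the barcode inside each read (--bol or --eol). Here the barcodes sit in a separate index-read file, so the reads have to be paired with their index records instead. Below is a minimal Python sketch of that pairing. It assumes both files are standard four-line FASTQ in the same order (wrapped sequence lines like those shown above would need to be joined first), it accepts exact barcode matches only, and the function names and the barcode_to_sample table are hypothetical.

```python
import io
from collections import defaultdict
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def demultiplex(reads_handle, index_handle, barcode_to_sample):
    """Group reads by the sequence of the matching index read (exact match only)."""
    bins = defaultdict(list)
    pairs = zip_longest(read_fastq(reads_handle), read_fastq(index_handle))
    for read, index in pairs:
        if read is None or index is None:
            raise ValueError("read file and index file have different lengths")
        # Both headers share the cluster ID before the first space.
        if read[0].split()[0] != index[0].split()[0]:
            raise ValueError("files are out of sync at " + read[0])
        sample = barcode_to_sample.get(index[1], "unmatched")
        bins[sample].append(read)
    return bins


# Tiny in-memory example; real data would be opened with open(path).
reads = io.StringIO(
    "@C1:1:1 1:N:0:\nACGTACGT\n+\nIIIIIIII\n"
    "@C1:1:2 1:N:0:\nTTTTACGT\n+\nIIIIIIII\n"
)
index = io.StringIO(
    "@C1:1:1 2:N:0:\nACACAC\n+\nIIIIII\n"
    "@C1:1:2 2:N:0:\nGGGGGG\n+\nIIIIII\n"
)
bins = demultiplex(reads, index, {"ACACAC": "sampleA"})
for sample, records in bins.items():
    print(sample, len(records))
```

Real index reads often contain N calls or sequencing errors, so a practical tool would allow a mismatch or two rather than requiring an exact match.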
Old 10-17-2012, 11:03 AM   #2
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381
Default

Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.
Old 10-17-2012, 11:07 AM   #3
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Why don't you ask your sequencing provider to demultiplex the data for you? It is not more work for them, as it is part of the fastq processing; you just need to provide some kind of sample description, a samplesheet.

All our customers are quite happy that this work has already been done when they get their data :-)

Sven
Old 10-17-2012, 01:25 PM   #4
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381
Default

BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq de-multiplexing.
I would do everything with a program where you set the parameters and know what is going in and what should come out.
Old 10-17-2012, 01:40 PM   #5
newBioinfo
Member
 
Location: US

Join Date: Mar 2012
Posts: 36
Default

Quote:
Originally Posted by JackieBadger View Post
Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.
Thanks JackieBadger,
I will try it!
Old 10-17-2012, 01:42 PM   #6
newBioinfo
Member
 
Location: US

Join Date: Mar 2012
Posts: 36
Default

Quote:
Originally Posted by JackieBadger View Post
BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq de-multiplexing.
I would do everything with a program where you set the parameters and know what is going in and what should come out.
Thanks JackieBadger,
I am planning to use QIIME, so hopefully I will not encounter such issues.



Thanks for the help!!!
Old 10-17-2012, 01:44 PM   #7
newBioinfo
Member
 
Location: US

Join Date: Mar 2012
Posts: 36
Default

Quote:
Originally Posted by sklages View Post
Why don't you ask your sequencing provider to demultiplex the data for you? It is not more work for them, as it is part of the fastq processing; you just need to provide some kind of sample description, a samplesheet.

All our customers are quite happy that this work has already been done when they get their data :-)

Sven
Thanks Sven,
It would be good if they demultiplexed the data before sending it, but in my case they did not.
Old 10-17-2012, 11:52 PM   #8
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by JackieBadger View Post
BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq de-multiplexing.
I would do everything with a program where you set the parameters and know what is going in and what should come out.
Hmm, it's pretty easy to check what the provider does if you also have the "Undetermined_indices" data files. MiSeq is another matter: the trimming issue is known, and the trimming should not be used (currently). You could also ask for some (demultiplexing) stats to see whether the results are "good" or as expected.

If you don't trust your sequencing provider at all, you should look for another one ;-)

What "significant errors" did you encounter in the MiSeq demultiplexing?
We are not multiplexing MiSeq libraries, so I am just curious :-)

Sven
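
One simple way to get the kind of demultiplexing stats mentioned above is to count how often each sequence occurs in the index-read (or undetermined) FASTQ file. The sketch below assumes a standard four-line FASTQ; the function name count_index_sequences is hypothetical and not part of any Illumina tool.

```python
import io
from collections import Counter


def count_index_sequences(index_handle):
    """Count how often each sequence occurs in a 4-line-per-record FASTQ."""
    counts = Counter()
    for line_number, line in enumerate(index_handle):
        if line_number % 4 == 1:  # the sequence is the second line of each record
            counts[line.strip()] += 1
    return counts


# Tiny in-memory example; real data would be opened with open(path).
example = io.StringIO(
    "@C1:1:1 2:N:0:\nACACAC\n+\nIIIIII\n"
    "@C1:1:2 2:N:0:\nACACAC\n+\nIIIIII\n"
    "@C1:1:3 2:N:0:\nNNNNNN\n+\n######\n"
)
counts = count_index_sequences(example)
for sequence, n in counts.most_common(10):
    print(sequence, n)
```

If the expected barcodes dominate the top of the list, the demultiplexing is probably behaving; a frequent unexpected sequence would be worth raising with the provider.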
Old 10-18-2012, 07:08 AM   #9
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is if an index read wasn't done.
Old 10-18-2012, 08:59 AM   #10
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by NextGenSeq View Post
Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is if an index read wasn't done.
No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
You cannot safely deduce from the naming whether an index read was run or not (at least for the "_NoIndex_" case).

Sven
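
Since the file name is not conclusive, one option is to look inside the files. The sketch below reports the read numbers found in the headers and the read lengths for the first records of a FASTQ file. It assumes the header comment follows the CASAVA 1.8 layout (read number, filter flag, control number, index sequence, separated by colons); the function name summarize_fastq is hypothetical.

```python
import io
from collections import Counter


def summarize_fastq(handle, max_records=1000):
    """Return (read numbers seen, read-length counts) for the first records."""
    read_numbers = set()
    lengths = Counter()
    for line_number, line in enumerate(handle):
        if line_number >= 4 * max_records:
            break
        position = line_number % 4
        if position == 0:
            # Header comment looks like "2:N:0:"; its first field is the read number.
            parts = line.split()
            if len(parts) > 1:
                read_numbers.add(parts[1].split(":")[0])
        elif position == 1:
            lengths[len(line.strip())] += 1
    return read_numbers, lengths


# The barcode record from the first post, as an in-memory example.
example = io.StringIO(
    "@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 2:N:0:\n"
    "NNNNNANACACA\n+\n############\n"
)
read_numbers, lengths = summarize_fastq(example)
print(read_numbers, dict(lengths))
```

A second file whose reads are all very short (12 bases in the record from the first post) is at least consistent with an index read having been run, even when the file name says NoIndex.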
Old 10-18-2012, 11:38 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

Quote:
Originally Posted by sklages View Post
No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
You cannot safely deduce from the naming whether an index read was run or not (at least for the "_NoIndex_" case).

Sven
Let us hope that is the case. If the facility did not run this as a multiplex sample then OP is out of luck. This run will have to be repeated.
Old 10-18-2012, 12:04 PM   #12
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by GenoMax View Post
Let us hope that is the case. If the facility did not run this as a multiplex sample then OP is out of luck. This run will have to be repeated.
Sure, you are absolutely right.

This problem might arise if the customer doesn't mention any indices that need to be demultiplexed in their "order" (whatever this order looks like), maybe assuming that this is relevant only for the post-processing and not for the sequencing run itself.
... and the sequencing core organizes their FCs with respect to read length and MP/no MP ...

We had a similar post a while ago, where the OP had hand-written a little note on the "order sheet" and, as a result, the sequencing facility didn't recognize it as "please do an index read, as my libraries have indices" ...

Sven
Old 12-04-2012, 02:26 PM   #13
LVAndrews
Member
 
Location: Flagstaff, AZ

Join Date: Sep 2012
Posts: 55
Default

Quote:
Originally Posted by newBioinfo View Post
Thanks JackieBadger,
I am planning to use QIIME, so hopefully I will not encounter such issues.



Thanks for the help!!!
If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.
Old 12-04-2012, 02:27 PM   #14
LVAndrews
Member
 
Location: Flagstaff, AZ

Join Date: Sep 2012
Posts: 55
Default

Quote:
Originally Posted by AKrohn View Post
If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.
Oops, for this application you want the split_libraries.py script instead.