SEQanswers

Old 07-31-2018, 01:50 AM   #1
bpbbentley
Junior Member
 
Location: Perth, Western Australia

Join Date: Jul 2018
Posts: 3
Demultiplexing FASTQ with custom indices

Hi all,

I'm fairly new to the realm of bioinformatics with large data sets, so apologies if I've missed something crucial here...

I've recently received some Illumina HiSeq2500 data in FASTQ format which haven't been demultiplexed. We've used custom i5 and i7 sequences in unique combinations for 96 samples. I was given the data in 8 FASTQ files, 2 per lane (4 lanes) with paired-ends. I've concatenated all of the forward and all of the reverse reads into 2 files for simplicity. I've been using the demuxbyname.sh method through BBMap - but I keep running into a couple of problems:
1. When I run demuxbyname.sh with a single string I only receive ~2500 reads in the output files. I've noticed that a lot of the index sequences in the FASTQ files contain N's - especially as the first base call (for i5 and i7).
2. This generally takes ~3hrs, but when I then attempt to run the script with an index.txt file containing multiple index combinations, the compute time increases exponentially.
Any help on either of these points is greatly appreciated!
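One way to see how widespread the N problem is would be to tally the index strings that actually occur in the read headers before demultiplexing. The following is only a minimal sketch: it assumes plain 4-line FASTQ records and that the index is the text after the last ':' in each header line, which may not match the real header layout. The function names are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def index_from_header(header: str) -> str:
    """Return the text after the last ':' in a header line (assumed to be the index)."""
    return header.strip().rsplit(":", 1)[-1]


def tally_indexes(lines: Iterable[str]) -> Tuple[Counter, int, int]:
    """Count each observed index string, the total reads, and the reads whose index has an N."""
    counts: Counter = Counter()
    total = 0
    with_n = 0
    for i, line in enumerate(lines):
        # Assumption: strict 4-line FASTQ records, so every 4th line is a header.
        if i % 4 != 0:
            continue
        idx = index_from_header(line)
        counts[idx] += 1
        total += 1
        if "N" in idx.upper():
            with_n += 1
    return counts, total, with_n
```

Passing an open file handle for one lane (for example `tally_indexes(open("all_lanes_1.fq"))`) and looking at `counts.most_common(20)` should show whether the expected index combinations appear at all, and `with_n / total` gives the fraction of reads that an exact match could never assign.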
Old 07-31-2018, 03:29 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,794

Before we get into specifics, can you ask your sequence provider to do this demultiplexing with Illumina's bcl2fastq program? (You can't do this yourself, since it requires access to the full data folder for the flowcell.) That should be trivial for them to do, and they should have done it in the first place unless you chose not to give them the sample ID/index combinations.

Can you tell us how you are running "demuxbyname.sh" (full command line)? You should run it like this: https://www.biostars.org/p/139395/#139409. You could start multiple runs (even 96, each with just one index combo) to speed things up.

There is also another package called deML that can be used for this.
Old 07-31-2018, 07:18 PM   #3
liorgalanti
Junior Member
 
Location: New York

Join Date: Jan 2018
Posts: 2

https://biosails.github.io/pheniqs/
Old 08-02-2018, 09:15 PM   #4
bpbbentley
Junior Member
 
Location: Perth, Western Australia

Join Date: Jul 2018
Posts: 3

Thanks for your feedback on this, it's much appreciated!

I've contacted BGI and they've said that they'll help me with the demultiplexing. I thought it was strange that they simply provided FASTQ files for each lane, especially as they contacted me early on and asked me to provide the index sequences...

I've run the command a few ways, this is ideally what I'm going for:

../sw/bbmap/demuxbyname.sh in=all_lanes_1.fq in2=all_lanes_2.fq out=demux_out/%_1.fq out2=demux_out/%_2.fq prefixmode=f substringmode=f names=index_names_s1.txt

However, I have also run it using single sequence strings, and on just 1 lane of data at a time. Thanks again for your help.
Old 08-03-2018, 05:16 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,794

Your indexes most likely look like Index1+Index2 (e.g. GGACTCCT+GCGATCTA) in the read headers. If so, that is how you need to include them in the names file, one combination per line. Is that how you are doing this?
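As an illustration of the one-combination-per-line layout, the sketch below joins index pairs with whatever separator appears in the read headers and writes them to a names file. It is a hedged example only: the helper names, the order of the two indexes, and the default "+" separator are assumptions, and the separator should be changed to whatever the headers really contain.

```python
from typing import Iterable, List, Tuple


def index_names(pairs: Iterable[Tuple[str, str]], sep: str = "+") -> List[str]:
    """Join each (index1, index2) pair with the separator used in the read headers."""
    return [f"{first.strip().upper()}{sep}{second.strip().upper()}" for first, second in pairs]


def write_names(path: str, names: Iterable[str]) -> None:
    """Write one index combination per line."""
    with open(path, "w") as fh:
        for name in names:
            fh.write(name + "\n")
```

For instance, `write_names("index_names_s1.txt", index_names([("GGACTCCT", "GCGATCTA")]))` would produce a file containing the single line `GGACTCCT+GCGATCTA`.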
Old 08-05-2018, 08:57 PM   #6
bpbbentley
Junior Member
 
Location: Perth, Western Australia

Join Date: Jul 2018
Posts: 3

Yep, my indexes are index1_index2 in the read header, and my .txt file reflects this. I get output files named after the index combinations, but these typically contain few or no reads...
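If the matching is exact, any read whose index contains an N cannot land in an output file, which would fit the nearly empty outputs. A mismatch-tolerant assignment is one possible workaround. The sketch below is not how demuxbyname.sh works; it is a hypothetical stand-alone matcher that treats N as an ordinary mismatch and assigns a read only when exactly one expected index is within `max_mismatches`. Whether one mismatch is safe depends on how different the 96 index combinations are from each other.

```python
from typing import Dict, Optional


def mismatches(observed: str, expected: str) -> int:
    """Count differing positions; sequences of unequal length are treated as fully mismatched."""
    if len(observed) != len(expected):
        return max(len(observed), len(expected))
    return sum(1 for o, e in zip(observed, expected) if o != e)


def assign_sample(
    observed: str, expected: Dict[str, str], max_mismatches: int = 1
) -> Optional[str]:
    """Return the sample whose index is uniquely closest to `observed`, else None."""
    best_name: Optional[str] = None
    best_dist = max_mismatches + 1
    tie = False
    for index, sample in expected.items():
        d = mismatches(observed.upper(), index.upper())
        if d < best_dist:
            best_name, best_dist, tie = sample, d, False
        elif d == best_dist and d <= max_mismatches:
            tie = True  # two expected indexes are equally close: ambiguous
    return None if tie else best_name
```

Here `expected` maps each full index string, written exactly as it appears in the headers (separator included), to a sample name, so `assign_sample("NGACTCCT_GCGATCTA", {"GGACTCCT_GCGATCTA": "sample_01"})` would return `"sample_01"`.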
All times are GMT -8.