SEQanswers

Old 01-13-2015, 02:37 AM   #1
Bacms
Junior Member
 
Location: Cambridge,UK

Join Date: Aug 2014
Posts: 6
Default Demultiplexing Illumina RNASeq paired reads

Hello everyone,

BGI normally provides us with demultiplexed reads, but this time we received our fastq files before demultiplexing. Can anyone recommend software to perform the demultiplexing? Also, where can I get the fastq files for the Illumina barcodes?

Thank you very much in advance.

Bruno
Old 01-13-2015, 03:48 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.
Old 01-13-2015, 05:08 AM   #3
gmarco
Member
 
Location: Spain

Join Date: Oct 2012
Posts: 36
Default

You should use bcl2fastq from Illumina to demultiplex your data. Download and use the version appropriate for the sequencing instrument that generated the data.
Old 01-13-2015, 05:12 AM   #4
Bacms
Junior Member
 
Location: Cambridge,UK

Join Date: Aug 2014
Posts: 6
Default

Quote:
Originally Posted by GenoMax View Post
What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.
This is HiSeq (2000, I believe, but I need to double-check) and I do see barcodes in the Fastq ID. Does that mean the data has effectively been demultiplexed and just needs to be split?
Here is the head of one of the files:
head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_1.fq
Code:
@FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1
NCCCAAACGCGCGTGACTTCACAATAATTAGCCCGTACCTGCTGGTTACGTGGCGGCACCGTGTACAATACCCTAGGCATCAGGGTTAGGCATGGTTACT
+
BP\ceeeegggggghiiiiiiiiihiiiiihiiiiiiiiiiiiiifgggggeeeccaccaccaacdcccccbccccbccccccbc[`accccccc`bccc
@FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/1
NCCCACCAAAACCGGAAAATGCAGGCCCTGTCGTCTCGCGTGAACATCGCGGCCAAGCCCCAGCGCGCTCAGCGCCTGGTGGTCCGCGCCGAGGAGGTTA
+
BP\ccecegggggiihhhiegghhhhihihgiihhiiihighfhiihfggecaacca_acccccZ]]]aaXb]]aX]ac]^_]bccccccc]_a___QW`
@FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/1
NAACCAGGCGAACGGTTGGCGTCGGGATTCGGGACGCAAGCATGGCGCTGACCAGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCCGAAGCT
head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_2.fq
Code:
@FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/2
CTCCGGTGTCAAGTAACCATGCCTAACCCTGATGCCTAGGGTATTGTACACGGTGCCGCCACGTAACCAGCAGGTACGGGCTAATTATTGTGAAGTCACG
+
_bbeeecegggggihiiiiiiiiiiiiiiiiiiiiiihhiicffhhhhhighieghhhhiggeeeecddccccccccccccccbbcdddcdcbdbbbbcc
@FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/2
CGGGGCGCAGGATCTTCACCAGCGAGCCGCGCTTGGGGCCGACCTCCTTCTTGGGGGCAGCCTTAACCTCCTCGGCGCGGACCACCAGGCGCTGAGCGCG
+
ab_ceeeef`geghhiiihhiiihiihhiigeeca`accccccccccccc]bbcacW[acccccbbccccccb__cccaaccc^aa[[_`accca^baac
@FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/2
CCTGGTCAGCGCCATGCTTGCGTCCCGAATCCCGACGCCAACCGTTCGCCTGGTTCAGATCGGAAGAGCGTCGTGTAGGGA

Last edited by Bacms; 01-13-2015 at 07:54 AM.
Old 01-14-2015, 03:17 AM   #5
dolphing
Junior Member
 
Location: China

Join Date: Dec 2010
Posts: 3
Default

The reads in the fastq file have the same barcode, which should have been demultiplexed.
Old 01-14-2015, 03:49 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@Bacms: Unless you have access to the original flowcell folder, de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?
Old 01-14-2015, 04:28 AM   #7
Bacms
Junior Member
 
Location: Cambridge,UK

Join Date: Aug 2014
Posts: 6
Default

Quote:
Originally Posted by GenoMax View Post
@Bacms: Unless you have access to the original flowcell folder, de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?
This is the only data we got from BGI. They normally do the demultiplexing, but this run came at the end of the agreement between BGI and our University, and apparently demultiplexing was not included in the cost of the contract, even though they had been doing it for a year. I wrote a quick Python script to look for the barcode sequence in the ID (perfect matching), and the diversity of barcodes in the sample is ridiculous. It includes some barcodes that Illumina provides but we did not use, so I suspect some cross-contamination with someone else's samples. I need to pull those sequences and see what they match.

The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
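
For readers following along, a minimal sketch of the kind of barcode tally described above. It assumes headers of the form `@...#BARCODE/1` as in the example reads, exactly 4 lines per FASTQ record, and perfect matching only. The function names and the toy records are hypothetical and are not the script that was actually used.

```python
import collections
import io

def barcode_from_header(header):
    """Return the barcode from a header such as '@name#GTGGCCAT/1'."""
    # The barcode sits between '#' and the '/1' or '/2' mate suffix.
    return header.rstrip().split("#", 1)[1].split("/", 1)[0]

def count_barcodes(handle):
    """Tally barcodes over a FASTQ stream, assuming 4 lines per record."""
    counts = collections.Counter()
    for i, line in enumerate(handle):
        if i % 4 == 0:  # first line of each record is the header
            counts[barcode_from_header(line)] += 1
    return counts

# Toy records (made up for illustration, not taken from the real files).
toy = io.StringIO(
    "@toy:1#GTGGCCAT/1\nACGT\n+\nIIII\n"
    "@toy:2#ATGTCAAT/1\nACGT\n+\nIIII\n"
    "@toy:3#GTGGCCAT/1\nACGT\n+\nIIII\n"
)
counts = count_barcodes(toy)
for barcode, n in counts.most_common():
    print(barcode, n)
```

On a real file, the handle would come from `open("140916_I607_FCC5618ACXX_L4_CHKPEI14080416_1.fq")`; sorting by count makes it easy to see which barcodes dominate and which are rare.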
Old 01-14-2015, 05:37 AM   #8
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

You will only get barcodes in the reads for those reads where the insert is short and you read into the Illumina adapter, and all the way through the first part of the adapter into the barcode.

If you trim your reads with something like Trimmomatic, the barcodes will be removed when Illumina adapter sequences are removed.

As for having a lot of different barcodes in the file, I think that as well as perfect matches to the barcode, the demultiplexing usually allows for a one-base mismatch to the barcode sequence, and at the end you are usually left with a small number of reads that don't match any of the barcodes because they have too many sequencing errors.
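
The one-mismatch idea can be sketched roughly as below. This is an illustration only, not how any particular demultiplexer is implemented: `hamming` and `assign_barcode` are hypothetical names, and the expected list simply reuses the three barcodes visible in the example headers as a stand-in for a real sample sheet.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, expected, max_mismatches=1):
    """Return the one expected barcode within max_mismatches, else None."""
    hits = [bc for bc in expected if hamming(observed, bc) <= max_mismatches]
    # Refuse ambiguous assignments where several barcodes are equally close.
    return hits[0] if len(hits) == 1 else None

# Stand-in for the barcodes actually used in the library prep.
expected = ["GTGGCCAT", "ATGTCAAT", "TCCCGAAT"]

exact = assign_barcode("GTGGCCAT", expected)    # perfect match
one_off = assign_barcode("GTGGCCAA", expected)  # last base differs
no_match = assign_barcode("AAAAAAAA", expected) # too far from everything
print(exact, one_off, no_match)
```

Reads for which this returns `None` would go to an "undetermined" bin. Allowing a mismatch is only safe if every pair of expected barcodes differs at more than two positions, which is worth checking before relying on it.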
Old 01-14-2015, 08:33 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by Bacms View Post
The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
In Illumina sequencing the barcode sequence is *never* part of the actual read (when the reads are pre-processed, which yours appear to be). Did you get files with generic names like lane1_undetermined*? What you could have is adapter contamination in the reads, which an appropriate trimming program can take care of.

If you have written a Python script to enumerate tags, then use it to separate the reads (4 lines per record) into separate files. Remember to keep R1 and R2 in the same order in the two files so the pairs do not get out of sync.

Note: If you have "not expected" barcodes present (after allowing for one error as Mastal pointed out) there may be some other issue going on.
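
A rough sketch of such a split, under several assumptions: headers look like `@name#BARCODE/1`, both files hold 4-line records in the same order, and only exact barcode matches are kept. All names here (`split_pairs`, the output file naming, the toy reads) are hypothetical. The sketch walks R1 and R2 in lockstep so mates are always written together, and it does not check that both files have the same number of records.

```python
import itertools
import os
import tempfile

def read_records(handle):
    """Yield FASTQ records as lists of 4 lines."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record

def read_id_and_barcode(header):
    """Split '@name#BARCODE/1' into ('@name', 'BARCODE')."""
    name, rest = header.rstrip().split("#", 1)
    return name, rest.split("/", 1)[0]

def split_pairs(r1_path, r2_path, out_dir, wanted):
    """Write each pair to <barcode>_1.fq and <barcode>_2.fq, keeping order."""
    outputs = {}
    try:
        with open(r1_path) as r1, open(r2_path) as r2:
            for rec1, rec2 in zip(read_records(r1), read_records(r2)):
                name1, barcode = read_id_and_barcode(rec1[0])
                name2, _ = read_id_and_barcode(rec2[0])
                if name1 != name2:
                    raise ValueError(f"mates out of sync: {name1} vs {name2}")
                # Anything not in the expected set goes to one shared bin.
                label = barcode if barcode in wanted else "undetermined"
                if label not in outputs:
                    outputs[label] = tuple(
                        open(os.path.join(out_dir, f"{label}_{mate}.fq"), "w")
                        for mate in (1, 2)
                    )
                outputs[label][0].writelines(rec1)
                outputs[label][1].writelines(rec2)
    finally:
        for pair in outputs.values():
            for fh in pair:
                fh.close()
    return sorted(outputs)

# Toy demonstration with made-up reads in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    r1_path = os.path.join(tmp, "toy_1.fq")
    r2_path = os.path.join(tmp, "toy_2.fq")
    with open(r1_path, "w") as fh:
        fh.write("@r1#GTGGCCAT/1\nACGT\n+\nIIII\n@r2#AAAAAAAA/1\nACGT\n+\nIIII\n")
    with open(r2_path, "w") as fh:
        fh.write("@r1#GTGGCCAT/2\nTTTT\n+\nIIII\n@r2#AAAAAAAA/2\nTTTT\n+\nIIII\n")
    labels = split_pairs(r1_path, r2_path, tmp, {"GTGGCCAT"})
    with open(os.path.join(tmp, "GTGGCCAT_2.fq")) as fh:
        kept_r2 = fh.read()
print(labels)
```

With many distinct barcodes, sending everything unexpected to a single "undetermined" pair of files also avoids running into the operating system's limit on open files.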

Last edited by GenoMax; 01-14-2015 at 08:47 AM.