**GenoMax** · 01-13-2015, 04:48 AM

What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.

**gmarco** · 01-13-2015, 06:08 AM

You should use bcl2fastq from Illumina to demultiplex your data. Download and use the version that matches the sequencing instrument that produced the data.

**Bacms** · 01-13-2015, 06:12 AM

Originally posted by GenoMax:

What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.

This is a HiSeq run (2000, I believe, but I need to double-check) and I do see barcodes in the Fastq ID. Does that mean the data has effectively been demultiplexed and just needs to be split?
Here is the head of one of the files:
head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_1.fq

Code:

@FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1
NCCCAAACGCGCGTGACTTCACAATAATTAGCCCGTACCTGCTGGTTACGTGGCGGCACCGTGTACAATACCCTAGGCATCAGGGTTAGGCATGGTTACT
+
BP\ceeeegggggghiiiiiiiiihiiiiihiiiiiiiiiiiiiifgggggeeeccaccaccaacdcccccbccccbccccccbc[`accccccc`bccc
@FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/1
NCCCACCAAAACCGGAAAATGCAGGCCCTGTCGTCTCGCGTGAACATCGCGGCCAAGCCCCAGCGCGCTCAGCGCCTGGTGGTCCGCGCCGAGGAGGTTA
+
BP\ccecegggggiihhhiegghhhhihihgiihhiiihighfhiihfggecaacca_acccccZ]]]aaXb]]aX]ac]^_]bccccccc]_a___QW`
@FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/1
NAACCAGGCGAACGGTTGGCGTCGGGATTCGGGACGCAAGCATGGCGCTGACCAGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCCGAAGCT

head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_2.fq

Code:

@FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/2
CTCCGGTGTCAAGTAACCATGCCTAACCCTGATGCCTAGGGTATTGTACACGGTGCCGCCACGTAACCAGCAGGTACGGGCTAATTATTGTGAAGTCACG
+
_bbeeecegggggihiiiiiiiiiiiiiiiiiiiiiihhiicffhhhhhighieghhhhiggeeeecddccccccccccccccbbcdddcdcbdbbbbcc
@FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/2
CGGGGCGCAGGATCTTCACCAGCGAGCCGCGCTTGGGGCCGACCTCCTTCTTGGGGGCAGCCTTAACCTCCTCGGCGCGGACCACCAGGCGCTGAGCGCG
+
ab_ceeeef`geghhiiihhiiihiihhiigeeca`accccccccccccc]bbcacW[acccccbbccccccb__cccaaccc^aa[[_`accca^baac
@FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/2
CCTGGTCAGCGCCATGCTTGCGTCCCGAATCCCGACGCCAACCGTTCGCCTGGTTCAGATCGGAAGAGCGTCGTGTAGGGA

**dolphing** · 01-14-2015, 04:17 AM

The reads in the fastq file have the same barcode, which should have been demultiplexed.

**GenoMax** · 01-14-2015, 04:49 AM

@Bacms: Unless you have access to the original flowcell folder, de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?

**Bacms** · 01-14-2015, 05:28 AM

Originally posted by GenoMax:

@Bacms: Unless you have access to the original flowcell folder de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?

This is the only data we got from BGI. They normally do the demultiplexing, but this run came at the end of the agreement between BGI and our University, and apparently demultiplexing was not included in the cost of the contract even though they had been doing it for a year. I wrote a quick Python script that looks for the barcode sequence in the read ID (perfect matching only). The diversity of barcodes in the sample is ridiculous, and it includes other Illumina barcodes that we did not use, so I suspect some cross-contamination with someone else's samples. I need to pull those sequences and see what they match to.
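For anyone following along, a tally like that can be sketched in a few lines. This is not the script mentioned above, just a minimal illustration that assumes the old-style header layout shown in the posted reads (`#<barcode>/<read>` at the end of the ID line). The function name and the shortened example records are made up for the illustration. It picks headers by line position rather than by a leading `@`, because quality strings can also start with `@`.

```python
import re
from collections import Counter

# Old-style Illumina header, as in the reads posted above:
#   @<flowcell>:<lane>:<tile>:<x>:<y>#<barcode>/<read number>
HEADER_RE = re.compile(r"^@\S+#([ACGTN]+)/[12]$")


def count_barcodes(lines):
    """Tally the barcode found in the header of each 4-line FASTQ record."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        match = HEADER_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Headers taken from the thread; sequences and qualities are shortened placeholders.
example = [
    "@FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1", "NCCC", "+", "hhhh",
    "@FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/1", "NCCC", "+", "hhhh",
    "@FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/1", "NAAC", "+", "@hhh",
]

for barcode, n in count_barcodes(example).most_common():
    print(barcode, n)
```

On a real file, pass the open file handle instead of the list, e.g. `count_barcodes(open("reads_1.fq"))`, where the file name is a placeholder.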

The main question is whether I also need to cut the barcode sequence from the sequence itself or not?

**mastal** · 01-14-2015, 06:37 AM

You will only get barcodes in the reads for those reads where the insert is short and you read into the Illumina adapter, and all the way through the first part of the adapter into the barcode.
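The third read in the first file posted earlier looks like exactly this case. A small sketch to check it, assuming TruSeq-style adapters (the read 1 adapter sequence below is the standard TruSeq one, and it appears verbatim in that read); the function name is made up for the illustration, and exact string matching is used only to keep it short:

```python
# Start of the TruSeq adapter that follows the insert in read 1 (assumed kit).
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def bases_after_adapter(read, adapter=ADAPTER_R1):
    """Return (insert length, bases following the adapter), or None if no exact hit."""
    start = read.find(adapter)
    if start == -1:
        return None
    return start, read[start + len(adapter):]


# Third read from the first file posted in this thread.
read = (
    "NAACCAGGCGAACGGTTGGCGTCGGGATTCGGGACGCAAGCATGGCGCTGACCAGG"
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
    "TCCCGAAGCT"
)
header_barcode = "TCCCGAAT"  # barcode from that read's ID line

insert_length, tail = bases_after_adapter(read)
print(insert_length, tail, header_barcode)
```

For this read the adapter starts after a 56-base insert, and the bases that follow it (TCCCGAAGCT) agree with the header barcode TCCCGAAT in the first seven positions. A real trimmer tolerates mismatches inside the adapter, which this sketch does not.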

If you trim your reads with something like Trimmomatic, the barcodes will be removed when Illumina adapter sequences are removed.

As for having a lot of different barcodes in the file, I think that as well as perfect matches to the barcode, the demultiplexing usually allows for a one-base mismatch to the barcode sequence, and at the end you are usually left with a small number of reads that don't match any of the barcodes because they have too many sequencing errors.
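The one-mismatch rule can be sketched as a Hamming-distance lookup. This is only an illustration of the idea, not what any particular demultiplexer does internally. The `expected` list below reuses the three barcodes visible in the posted reads as a stand-in for a real sample sheet, and the function names are made up.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, expected, max_mismatches=1):
    """Return the single expected barcode within max_mismatches, else None.

    Ambiguous hits (more than one candidate) are treated as unassigned.
    """
    hits = [
        bc for bc in expected
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Stand-in for the sample sheet; replace with the barcodes actually used.
expected = ["GTGGCCAT", "ATGTCAAT", "TCCCGAAT"]

print(assign_barcode("GTGGCCAT", expected))  # exact match
print(assign_barcode("GTGGCCAA", expected))  # one mismatch
print(assign_barcode("AAAAAAAA", expected))  # too far from everything
```

Allowing one mismatch is only safe when every pair of expected barcodes differs in at least three positions; otherwise a single error can land equally close to two samples, which is why ambiguous hits return `None` here.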

**GenoMax** · 01-14-2015, 09:33 AM

Originally posted by Bacms:

The main question is whether I also need to cut the barcode sequence from the sequence itself or not?

In Illumina sequencing, the barcode sequence is *never* part of the actual read (when the reads are pre-processed, which your reads appear to be). Did you get files with generic names like (lane1_undetermined*)? What you could have is adapter contamination in the reads. That can be taken care of by an appropriate trimming program.

Since you have already written a Python script to enumerate the tags, extend it to write the reads (4 lines per record) into a separate file for each barcode. Keep R1 and R2 in the same order in the two output files so that the pairs do not get out of sync.
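A minimal sketch of that splitting step, assuming the header layout shown earlier in the thread and that both input files list the pairs in the same order. All names here are hypothetical. Barcodes not in `expected` go to a single "undetermined" pair of files, which also keeps the number of open files small when the barcode diversity is high.

```python
import itertools
import os


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines."""
    while True:
        record = list(itertools.islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def barcode_of(header):
    """Extract the barcode from '@...#<barcode>/<read number>'."""
    return header.strip().rsplit("#", 1)[1].rsplit("/", 1)[0]


def split_pairs(r1_path, r2_path, out_dir, expected):
    """Write each read pair to <barcode>_1.fq / <barcode>_2.fq in out_dir."""
    handles = {}
    counts = {}
    try:
        with open(r1_path) as r1, open(r2_path) as r2:
            # Walking both files in lockstep keeps R1 and R2 in the same order.
            for rec1, rec2 in zip(read_records(r1), read_records(r2)):
                # IDs must agree apart from the trailing /1 or /2.
                if rec1[0].strip()[:-2] != rec2[0].strip()[:-2]:
                    raise ValueError("R1/R2 out of sync at " + rec1[0].strip())
                barcode = barcode_of(rec1[0])
                bucket = barcode if barcode in expected else "undetermined"
                if bucket not in handles:
                    handles[bucket] = tuple(
                        open(os.path.join(out_dir, f"{bucket}_{n}.fq"), "w")
                        for n in (1, 2)
                    )
                handles[bucket][0].writelines(rec1)
                handles[bucket][1].writelines(rec2)
                counts[bucket] = counts.get(bucket, 0) + 1
    finally:
        for pair in handles.values():
            for handle in pair:
                handle.close()
    return counts
```

A call would look like `split_pairs("reads_1.fq", "reads_2.fq", "demux", {"GTGGCCAT", "ATGTCAAT"})`, with placeholder file names and an existing output directory. This version matches barcodes exactly and stops silently at the end of the shorter file, so a mismatch-tolerant lookup and a record-count check are worth adding before trusting the output.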

Note: If you still have unexpected barcodes present (after allowing for one error, as mastal pointed out), there may be some other issue going on.

Demultiplexing Illumina RNASeq paired reads