**kmcarr** · 07-11-2012, 04:01 PM

Mouth,

The barcode actually read for each cluster is recorded in the definition line (defline) of the FASTQ file. Here is the defline format of the Illumina FASTQ produced by CASAVA 1.8.x:

Code:

@HWI-ST957:100:D0V52ACXX:6:1101:1221:2161 1:N:0:CGATGT

The barcode sequence at the end of the defline is the actual barcode read for that cluster.

Here is an example showing that you can see variation in the barcode recorded in the defline. It grabs the first 1000 deflines from a gzipped FASTQ file, splits each defline at ":" and takes the 10th field (the barcode), then sorts the barcodes and counts the occurrences of each unique one.

Code:

zgrep '^@HWI' CTRL1_CGATGT_L006_R1_001.fastq.gz | head -1000 | cut -d":" -f10 | sort | uniq -c
      1 CGATAT
      2 CGATGA
      1 CGATGG
    987 CGATGT
      2 CGCTGT
      1 CGGTGT
      6 TGATGT

You can see that 98.7% (987 of 1000) match the expected barcode, CGATGT, and a few have mismatches.
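The one-liner above depends on the instrument name (HWI) to find deflines. A minimal sketch of the same tally that does not, assuming a well-formed four-line-per-record FASTQ with CASAVA 1.8-style deflines; the function name barcode_counts is my own, not part of any tool:

```shell
# Tally the index sequence recorded in each FASTQ defline.
# Reads uncompressed FASTQ on stdin. Assumes four lines per record and
# that the barcode is the last ":"-separated field of the defline.
barcode_counts() {
    awk 'NR % 4 == 1 { n = split($0, f, ":"); print f[n] }' |
        sort | uniq -c | sort -rn
}

# Usage with the file name from the post (first 1000 records = 4000 lines):
# gzip -dc CTRL1_CGATGT_L006_R1_001.fastq.gz | head -4000 | barcode_counts
```

Taking every fourth line avoids accidentally matching a quality line that happens to start with "@".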

But be aware that if CASAVA demultiplexing was run with default settings, no mismatches are allowed in the barcode. You will only see differences between the barcode read and the configured barcode if you set up the CASAVA run (configureBclToFastq.pl) with --mismatches=1.
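To make "mismatch" concrete: it is the number of positions at which the observed barcode differs from the configured one. A small sketch, with hamming as a hypothetical helper name; this only illustrates the counting and is not how CASAVA itself is implemented:

```shell
# Count the positions at which two equal-length barcodes differ.
# Prints the count; exits non-zero if the lengths differ.
hamming() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (length(a) != length(b)) exit 1
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        print d
    }'
}

# Usage: compare an observed barcode against the configured one.
# hamming TGATGT CGATGT
```

Every non-matching barcode in the listing above differs from CGATGT at exactly one position, which is what you would expect from a run configured with --mismatches=1.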

**Mouth_Breather** · 07-11-2012, 04:33 PM

Hi thanks for the reply! I'm aware of the ability to grab the barcode sequence from the fastq files - I was not aware that it showed the actual variations for those barcodes that have 1 mismatch - thanks for that.

But I also want to see the barcodes for which mismatches are 2 and greater. Are those recorded somewhere other than the .bcl files?

**kmcarr** · 07-12-2012, 02:26 AM

Originally posted by Mouth_Breather:

But I also want to see the barcodes for which mismatches are 2 and greater. Are those recorded somewhere other than the .bcl files?

Those are the reads in the FASTQ files under the Undetermined_indices/Sample_laneX directories, where X is the lane number.
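A rough sketch of how you might summarise those reads, assuming the same CASAVA 1.8-style deflines. The function name undetermined_summary and the path in the usage comment are assumptions for illustration; substitute your own lane directory and the barcodes from your sample sheet:

```shell
# Summarise index sequences found in undetermined reads.
# $1: comma-separated list of configured barcodes.
# stdin: uncompressed FASTQ, four lines per record.
# Output: count, observed barcode, mismatches to the closest configured barcode.
undetermined_summary() {
    awk -v expected="$1" '
        BEGIN { n_exp = split(expected, exp_bc, ",") }
        NR % 4 == 1 { n = split($0, f, ":"); count[f[n]]++ }
        END {
            for (bc in count) {
                best = length(bc)
                for (j = 1; j <= n_exp; j++) {
                    d = 0
                    for (i = 1; i <= length(bc); i++)
                        if (substr(bc, i, 1) != substr(exp_bc[j], i, 1)) d++
                    if (d < best) best = d
                }
                print count[bc], bc, best
            }
        }' | sort -k1,1nr -k2,2
}

# Hypothetical usage (path and barcode list are placeholders):
# gzip -dc Undetermined_indices/Sample_lane6/*_R1_*.fastq.gz | undetermined_summary CGATGT,ACAGTG
```

The third column shows how far each observed index is from the nearest configured barcode, so the barcodes with 2 or more mismatches are the rows where it is 2 or higher.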

Getting a list of all index sequences