SEQanswers

Old 11-11-2011, 10:28 AM   #1
eilosei
Member
 
Location: New York

Join Date: Nov 2011
Posts: 19
Help! lost data in fastq file

I got some Illumina HiSeq results as FASTQ files from my facility, but the yield is a lot lower than expected.

For example, the summary file for one sample shows 225566 +/- 0 clusters (PF), so I expected around 225566*32 = 7 million reads. But the FASTQ file contains fewer than 0.5 million reads. One example read is

@HWI-1KL117_0134:6:2104:1422:2023#GNCAAT/1
NTTTTAATGAAAACACGGAAATTAAAAATTCTTGAAGGTGACATCCCTCCA

Because the sample has the index barcode "GCCAAT", is it possible that the facility selected reads with the bad barcode "GNCAAT" instead of "GCCAAT" when demultiplexing? If so, is there any way to rescue my sample?

Thanks in advance!
Old 11-11-2011, 11:04 AM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

Demultiplexing usually allows for one mismatch and sometimes indels, so it's not surprising that this read would have been considered as matching the barcode. Why don't you go through the whole fastq file and check for the barcodes that appear in the headers with something like

grep '@' file.fastq | cut -d '#' -f2 | sort | uniq -c

Maybe your summary was referring to the cluster density for the whole lane, and they had 12 multiplexed samples in it? That would fit approximately with 12*0.5M. 7 million reads for a lane is pretty low, though ...
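The same tally can be sketched in Python. This version reads only the first line of each four-line record, because a plain `grep '@'` can also pick up quality lines that happen to contain `@`. The function name `count_barcodes` is made up for illustration, and the sketch assumes headers of the form `@...#BARCODE/1`, as in the example read in the first post; the short sequences and qualities below are invented.

```python
from collections import Counter
import io


def count_barcodes(handle):
    """Tally the index sequence found between '#' and '/' in each FASTQ header."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        header = line.strip()
        if "#" not in header:
            continue  # header without an embedded barcode
        barcode = header.split("#", 1)[1].split("/", 1)[0]
        counts[barcode] += 1
    return counts


# Tiny in-memory example. Only the first header comes from the thread;
# the second record and all sequence/quality strings are invented.
example = io.StringIO(
    "@HWI-1KL117_0134:6:2104:1422:2023#GNCAAT/1\n"
    "NTTT\n"
    "+\n"
    "@BBB\n"  # a quality line starting with '@' is not counted as a header
    "@read2#GNCAAT/1\n"
    "ACGT\n"
    "+\n"
    "BBBB\n"
)
example_counts = count_barcodes(example)
print(example_counts.most_common())
```

For a real file, something like `with open("file.fastq") as fh: print(count_barcodes(fh).most_common())` should give the same kind of table as the shell pipeline.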

Last edited by kopi-o; 11-11-2011 at 01:26 PM. Reason: typo
Old 11-11-2011, 12:46 PM   #3
eilosei
Member
 
Location: New York

Join Date: Nov 2011
Posts: 19

Thank you for your quick reply. I tried the command

grep '@' 003_s_6_sequence.txt | cut -d '#' -f2 | sort |*uniq -c

but got this error:

-bash: *uniq: command not found

Besides, the entire FASTQ file only contains reads with the barcode "GNCAAT". As for the summary file, I have three samples in one lane, and each sample has its own summary file after demultiplexing. The PF cluster numbers are similar. So there should be around 10 million PF reads for each sample, which is still low for HiSeq.
Old 11-11-2011, 01:30 PM   #4
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

I'm sorry, there was a typo in my reply. The star '*' is not supposed to be there; the command is called uniq. I've removed it.

Anyway, there is no need for you to run the command if you already know that the entire fastq file contains the same barcode. Then it would appear that the facility has failed to extract the exact-match barcode, or that the second cycle in the index read step went so seriously awry that no base calls could be made.
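A quick way to see whether the mismatch always falls on the same index cycle is to count `N` calls per barcode position. The sketch below is only an illustration: `n_fraction_per_cycle` is a made-up helper, and the input list stands in for barcodes pulled from the FASTQ headers. On its own it cannot decide between the two explanations (that would need the lane's reads before demultiplexing), but an `N` fraction near 1.0 at a single cycle would point at that cycle.

```python
def n_fraction_per_cycle(barcodes):
    """Return, for each index cycle, the fraction of barcodes with 'N' there."""
    barcodes = list(barcodes)
    if not barcodes:
        return []
    length = len(barcodes[0])  # assumes all barcodes share the first one's length
    n_counts = [0] * length
    for barcode in barcodes:
        for cycle, base in enumerate(barcode[:length]):
            if base == "N":
                n_counts[cycle] += 1
    return [count / len(barcodes) for count in n_counts]


# Stand-in data: three reads carrying the barcode reported in this thread.
observed = ["GNCAAT", "GNCAAT", "GNCAAT"]
fractions = n_fraction_per_cycle(observed)
print(fractions)  # one value per index cycle
```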
Old 11-11-2011, 02:00 PM   #5
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

Also, just a crazy suggestion - why not just ask the facility? :-)
Old 11-11-2011, 02:03 PM   #6
eilosei
Member
 
Location: New York

Join Date: Nov 2011
Posts: 19

Now the command works, and all three of my samples from the lane have the same problem. Every read's barcode carries a mismatch "N" at the second position, e.g. CNATGT, ANAGTG, GNCAAT.

I will ask the facility to rerun the demultiplexing step. I hope it's just a computer problem and that I can get reads with perfect-match barcodes.
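For reference, here is a rough sketch of how one-mismatch demultiplexing would treat these barcodes, with `N` counted as a mismatch. Only "GCCAAT" is confirmed in this thread; "CGATGT" and "ACAGTG" are my guesses for the other two samples, chosen because they fit the observed CNATGT and ANAGTG. The helper names `hamming` and `assign_sample` are made up and are not taken from any facility's pipeline.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed, expected, max_mismatches=1):
    """Return the single expected barcode within max_mismatches, else None."""
    hits = [bc for bc in expected if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None  # ambiguous or no match -> None


# "GCCAAT" is from the thread; the other two expected barcodes are assumptions.
expected = ["GCCAAT", "CGATGT", "ACAGTG"]
assignments = {
    observed: assign_sample(observed, expected)
    for observed in ["GNCAAT", "CNATGT", "ANAGTG"]
}
print(assignments)
```

Under these assumptions each observed barcode is one mismatch away from exactly one expected barcode, so the reads would still be assigned unambiguously even with the `N` at the second position.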

Thanks!
Old 11-11-2011, 02:06 PM   #7
eilosei
Member
 
Location: New York

Join Date: Nov 2011
Posts: 19

Because the facility believes I gave them bad samples, even though they had said there was no problem based on the Bioanalyzer result!

Quote:
Originally Posted by kopi-o View Post
Also, just a crazy suggestion - why not just ask the facility? :-)
All times are GMT -8.