SEQanswers

Old 05-08-2012, 07:36 PM   #1
rnaeye
Member
 
Location: East Cost

Join Date: May 2011
Posts: 79
Illumina filtered reads vs unfiltered reads

Hi!
I was wondering if someone could tell me why some of my Illumina reads have either 1:N:18:ATCACG or 1:Y:18:ATCACG. The vast majority of reads do not have any of these flags. What does it mean?

My understanding is that some of the reads are filtered and they either pass or fail, and those are labeled as 1:N:18:ATCACG and 1:Y:18:ATCACG, respectively. Why are only some of the reads filtered? What does filtering do? What about reads that are not flagged with :Y: or :N:? Are those high-quality reads?

Can someone help me understand this puzzle? Thank you for your help.
Old 05-08-2012, 10:33 PM   #2
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

The Y/N indicates whether or not the read failed the pass-filter step, so the better reads are going to have an N. I believe the quality score cutoff is 2, however (I might be wrong here; it has been a while since I did this), so even reads that passed can be fairly poor in quality. The ATCACG is the barcode.

So if some read IDs have this information and others don't, you might ask some questions of whoever ran the CASAVA pipeline for you, as this doesn't seem like the standard way to go about it. They all should have this info. My guess is that they didn't handle the barcoding of your samples very well.
Old 05-09-2012, 04:26 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Quote:
Originally Posted by rnaeye
Hi!
I was wondering if someone could tell me why some of my Illumina reads have either 1:N:18:ATCACG or 1:Y:18:ATCACG. The vast majority of reads do not have any of these flags. What does it mean?
I would be very concerned if the majority of the sequence ID lines in your FASTQ files do NOT have this block of information attached; it should be present for every sequence ID. If the majority of your reads do not include this information, then the FASTQ file is malformed and you should talk with your sequencing provider. I have attached a PDF which describes in detail the format and meaning of the Illumina FASTQ sequence ID line (as well as the file naming convention) for files generated by CASAVA 1.8 and later.
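
For anyone who wants to check their own files, here is a minimal Python sketch that splits a CASAVA 1.8-style sequence ID line into its fields. It assumes the layout `@instrument:run:flowcell:lane:tile:x:y read:is_filtered:control_number:index`; the function name `parse_header` and the instrument, run, and flowcell values in the usage note are made up for illustration, so treat this as a sketch rather than a reference parser.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run_number: str
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read: int             # 1 or 2 (member of a pair)
    is_filtered: bool     # True for "Y" (failed filter), False for "N" (passed)
    control_number: int   # the "18" in 1:N:18:ATCACG
    index_sequence: str   # the barcode, e.g. ATCACG


def parse_header(line: str) -> IlluminaHeader:
    """Parse one CASAVA 1.8-style FASTQ sequence ID line (hypothetical helper)."""
    if not line.startswith("@"):
        raise ValueError("sequence ID lines start with '@'")
    # The ID has two blocks separated by a single space.
    location, _, comment = line[1:].strip().partition(" ")
    loc = location.split(":")
    com = comment.split(":")
    if len(loc) != 7 or len(com) != 4:
        raise ValueError(f"not a CASAVA 1.8-style header: {line!r}")
    if com[1] not in ("Y", "N"):
        raise ValueError(f"unexpected filter flag: {com[1]!r}")
    return IlluminaHeader(
        loc[0], loc[1], loc[2],
        int(loc[3]), int(loc[4]), int(loc[5]), int(loc[6]),
        int(com[0]), com[1] == "Y", int(com[2]), com[3],
    )
```

Calling `parse_header("@HWI-ST0001:42:C0ABCACXX:1:1101:1234:2000 1:N:18:ATCACG")` on this invented header gives a record with `is_filtered` set to `False` and `index_sequence` set to `"ATCACG"`. A line that raises `ValueError` is one that is missing the second block.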

Quote:
My understanding is that some of the reads are filtered and they either pass or fail, and those are labeled as 1:N:18:ATCACG and 1:Y:18:ATCACG, respectively. Why are only some of the reads filtered? What does filtering do? What about reads that are not flagged with :Y: or :N:? Are those high-quality reads?
Filtering is performed by the Illumina Real Time Analysis (RTA) software and happens during the run. The filter algorithm (Illumina calls it the chastity filter) examines, for each cluster at each cycle up to cycle 25, the ratio of the brightest signal intensity to the sum of the two brightest. If that ratio falls below the threshold (0.6 in Illumina's documentation) more than once during the first 25 cycles, the cluster (read) is designated as "failed filtering" and will have a "Y" in its FASTQ header line; otherwise it passes and will have an "N". Cycles beyond 25 have no impact on whether a read is considered passed or failed.
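
To make the rule concrete, here is a rough Python sketch of the pass/fail decision. It is not RTA's actual code. It assumes the chastity definition from Illumina's documentation (brightest intensity divided by the sum of the two brightest, threshold 0.6, at most one low cycle allowed in the first 25), and the function names and intensity values are invented for illustration.

```python
from typing import Sequence

# Assumed constants, taken from Illumina's description of the chastity filter.
CHASTITY_THRESHOLD = 0.6
FILTER_CYCLES = 25
MAX_LOW_CYCLES = 1


def chastity(intensities: Sequence[float]) -> float:
    """Brightest channel divided by the sum of the two brightest channels."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def filter_flag(cycles: Sequence[Sequence[float]]) -> str:
    """Return "Y" (failed filter) or "N" (passed), as written in the FASTQ header.

    `cycles` holds one tuple of four non-negative channel intensities per cycle.
    Only the first FILTER_CYCLES cycles are inspected.
    """
    low = sum(
        1 for c in cycles[:FILTER_CYCLES] if chastity(c) < CHASTITY_THRESHOLD
    )
    return "Y" if low > MAX_LOW_CYCLES else "N"
```

With a clean cycle such as `(100, 10, 5, 5)` the chastity is about 0.91, while a mixed cycle such as `(50, 45, 5, 5)` gives about 0.53. A cluster with two mixed cycles among its first 25 gets "Y"; the same two mixed cycles at cycle 26 or later leave it at "N".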
Attached Files
File Type: pdf IlluminaFASTQ_CASAVA_1.8.pdf (74.8 KB, 155 views)
Old 05-09-2012, 05:55 AM   #4
rnaeye
Member
 
Location: East Cost

Join Date: May 2011
Posts: 79

Hi Wallysb01 and kmcarr,

Thanks for the information. I checked the file again. You are right. All the sequences in the FASTQ file are associated with either "...Y..." or "...N..." flags.

I have noticed that it is the SAM file (Bowtie output) that lacks the flags on most sequences. The sequences without flags are aligned to the target genome (they have a chromosome name and position), and the sequences tagged with a 'Y' or 'N' flag are not perfectly aligned to the target genome.
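
One way to check this observation is to tally SAM records by whether the read name still carries the filter block and whether the record is marked unmapped (bit 0x4 of the FLAG column). The sketch below is only a quick check, not a validated tool; the function name `tally_sam` and the regular expression for the block are my own assumptions about what the names look like.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed shape of the trailing block, e.g. "1:N:18:ATCACG".
FILTER_BLOCK = re.compile(r"[12]:[YN]:\d+:[ACGTN]*$")


def tally_sam(lines: Iterable[str]) -> Counter:
    """Count SAM alignment records by (mapped status, presence of filter block)."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and SAM header lines (@HD, @SQ, @PG, ...).
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        mapped = "unmapped" if flag & 0x4 else "mapped"
        tagged = "with_block" if FILTER_BLOCK.search(qname) else "without_block"
        counts[(mapped, tagged)] += 1
    return counts
```

Running it as `with open("aligned.sam") as fh: print(tally_sam(fh))` (the file name is a placeholder) shows at a glance whether the flagged names are confined to the unmapped records.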