SEQanswers

Old 08-03-2017, 12:49 PM   #1
jomare1188
Junior Member
 
Location: Colombia

Join Date: Jul 2017
Posts: 5
NextSeq 550 FASTQ header

Hi, I'm working with paired-end (PE) NextSeq 550 data.

I know that my data contain 12 inline barcodes, each 5 to 8 bp long.

The FASTQ read headers show 57 different dual-index sequences (each index is 8 bp). The most prevalent, CGGGGGGG+TGCTTCCA, accounts for 98% of the total reads.

A typical header looks like this:
@NB501358:172:HTJY7BGX2:1:11101:17581:139461:N:0:CGGGGGGG+TGCTTCCA

Does anyone know what the 57 sequences mean?

Thanks
Old 08-03-2017, 03:46 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550

Are you sure about inline barcodes?

The example you posted above is standard Illumina index sequences for a dual indexed run. It is common to see a mishmash of combinatorial index sequences (besides those known to be there) in sequence data. They represent errors among other things. I am not sure why you received 57 combinations (it is as if they gave you the remaining data after they took what they wanted). Was this sample run with others that did not belong to you?
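
To see how the reads are spread across index combinations, you can tally the last field of each header. This is only a minimal Python sketch: it assumes the index pair is everything after the final ':' in the header (as in the example above), and the function names are made up for illustration.

```python
from collections import Counter


def index_pair(header):
    """Return the index field: the text after the last ':' in a header line."""
    return header.rstrip().rsplit(":", 1)[-1]


def tally_index_pairs(fastq_lines):
    """Count index pairs over an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # the first line of each 4-line record is the header
            counts[index_pair(line)] += 1
    return counts
```

Passing an open file handle (or `gzip.open(path, "rt")` for a compressed file) as `fastq_lines` and calling `.most_common()` on the result lists the combinations from most to least frequent.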
Old 08-04-2017, 10:17 AM   #3
jomare1188
Junior Member
 
Location: Colombia

Join Date: Jul 2017
Posts: 5

Thanks for your answer!

Yes, I'm sure about the inline barcodes.

Yes, this data was provided by an external lab, so it was most probably run together with other data.
Old 08-04-2017, 10:29 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550

If "CGGGGGGG+TGCTTCCA" is 98% of the data then you should remove and ignore the rest. You have no idea what is in there if your sample was part of a larger pool. If you are able to recognize the inline barcodes conclusively then you could go fishing in the remainder of the data.
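
Both steps can be roughed out in Python. Treat this as a sketch under assumptions rather than a tested pipeline: it assumes plain 4-line FASTQ records with the index pair after the last ':' of the header, the function names are hypothetical, and the barcode list is something you would supply yourself. Because the inline barcodes vary from 5 to 8 bp, the lookup tries longer barcodes first so that a short barcode that happens to be a prefix of a longer one does not win by accident.

```python
from itertools import islice


def records(lines):
    """Yield FASTQ records as lists of 4 lines; a trailing partial record is dropped."""
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if len(rec) < 4:
            return
        yield rec


def keep_index_pair(lines, wanted):
    """Yield only the lines of records whose header ends with the wanted index pair."""
    for rec in records(lines):
        if rec[0].rstrip().rsplit(":", 1)[-1] == wanted:
            yield from rec


def inline_barcode(seq, barcodes):
    """Return the known barcode that prefixes seq (longest first), or None."""
    for bc in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(bc):
            return bc
    return None
```

For example, `keep_index_pair(handle, "CGGGGGGG+TGCTTCCA")` yields the lines to write to a filtered file. For paired-end data the same filter has to be applied to both R1 and R2 so the two files stay in sync.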
All times are GMT -8.