Thread | Thread Starter | Forum | Replies | Last Post |
A first look at Illumina’s new NextSeq 500 | AllSeq | Vendor Forum | 111 | 03-12-2020 03:25 AM |
Nextseq 500 base calling | Paulfrobbins | Illumina/Solexa | 2 | 03-29-2015 07:38 PM |
A heads up for all NextSeq 500 users! | LizD | Illumina/Solexa | 10 | 02-08-2015 09:59 AM |
Questions about whole-exome sequencing on NextSeq 500 | newtoseq | Illumina/Solexa | 3 | 11-02-2014 08:26 PM |
no dual indices on NextSeq 500 (yet) | SeqNerd | Illumina/Solexa | 9 | 10-20-2014 12:06 PM |
#1
Junior Member
Location: St.Petersburg Join Date: Jan 2016
Posts: 3
Hello everyone,
I have some reads from a NextSeq 500 in FASTQ format with headers structured like this: @ERR1136327.6 NS500217:127:H72WTBGXX:2:11203:22066:4060/1. It doesn't match the common FASTQ header structure (CASAVA 1.8): @<instrument-name>:<run ID>:<flowcell ID>:<lane-number>:<tile-number>:<x-pos>:<y-pos> <read number>:<is filtered>:<control number>:<barcode sequence>. Nor does it fit the older standard, which looked like "@HWUSI-EAS100R:6:73:941:1973#0/1". Do you know what the items in this header mean? I'm especially intrigued by the last number after the slash. Thanks in advance.
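The three header layouts mentioned above can be told apart mechanically. Below is a minimal Python sketch, not an official parser: the regular expressions and field names are assumptions modelled on the CASAVA 1.8 layout and the older example quoted in the post, and the "archive" pattern simply describes the header in this file (an SRA/ENA run accession and spot number, then an Illumina-style name, then a /N suffix). What /N means in this file is the open question, so the sketch only captures it. `classify_header` and the pattern names are hypothetical.

```python
import re
from typing import Dict, Tuple

# CASAVA 1.8+ style, e.g. @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
CASAVA_18 = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):(?P<lane>\d+):"
    r"(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)\s+(?P<read>\d+):(?P<filtered>[YN]):"
    r"(?P<control>\d+):(?P<index>\S*)$"
)
# Older style, e.g. @HWUSI-EAS100R:6:73:941:1973#0/1 (instrument:lane:tile:x:y#index/read)
PRE_18 = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"#(?P<index>[^/\s]+)/(?P<read>\d+)$"
)
# Header seen in this file (assumed layout): accession.spot, Illumina-style name, /N
ARCHIVE = re.compile(
    r"^@(?P<accession>[SED]RR\d+)\.(?P<spot>\d+)\s+(?P<instrument>[^:\s]+):(?P<run>\d+):"
    r"(?P<flowcell>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)/(?P<read>\d+)$"
)


def classify_header(line: str) -> Tuple[str, Dict[str, str]]:
    """Return (layout name, named fields) for a FASTQ header, or ("unknown", {})."""
    line = line.rstrip("\r\n")
    for name, pattern in (("casava-1.8", CASAVA_18), ("pre-1.8", PRE_18), ("archive", ARCHIVE)):
        match = pattern.match(line)
        if match:
            return name, match.groupdict()
    return "unknown", {}
```

For the header in the question, `classify_header` reports the "archive" layout with accession ERR1136327, spot 6, instrument NS500217, flowcell H72WTBGXX, lane 2, tile 11203 and a trailing read field of 1.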
#2
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,088
Did you download this data from SRA (fastq-dump)?
If you use the option Quote:
BTW: NextSeq data requires processing with bcl2fastq v2.x, the successor to the older CASAVA/bcl2fastq v1.x releases.
#3
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
ERR1136327.6 is an identifier assigned by the nucleotide archives (SRA or ENA). I think the .6 is the read number.
I'm guessing NS500 means it's the NextSeq 500, so H72WTBGXX is probably the flow cell ID. Have a look at pages 62-64 of the NextSeq system guide for a description of the flow cell and the camera, swath, tile and lane numbers. https://support.illumina.com/content...5046563-01.pdf
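The tile field of the header (11203 in the question) can be split into the components that the system guide describes. This minimal sketch assumes a five-digit NextSeq layout of surface, swath, camera and a two-digit tile number; that layout is an assumption here, so check it against the guide pages cited above before relying on it. `NextSeqTile` and `decode_nextseq_tile` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NextSeqTile:
    surface: int  # assumed: 1 = top, 2 = bottom surface of the flow cell
    swath: int
    camera: int
    tile: int     # assumed: tile position within the swath for that camera


def decode_nextseq_tile(tile_id: str) -> NextSeqTile:
    """Split a five-digit NextSeq tile number such as '11203'.

    Assumed layout (verify against the system guide):
    digit 1 = surface, digit 2 = swath, digit 3 = camera, digits 4-5 = tile.
    """
    if len(tile_id) != 5 or not tile_id.isdigit():
        raise ValueError(f"expected a five-digit tile number, got {tile_id!r}")
    return NextSeqTile(
        surface=int(tile_id[0]),
        swath=int(tile_id[1]),
        camera=int(tile_id[2]),
        tile=int(tile_id[3:]),
    )
```

Under this assumed layout, `decode_nextseq_tile("11203")` gives surface 1, swath 1, camera 2, tile 3. Four-digit tile numbers from other instruments are rejected rather than guessed at.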
#4
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
Correction: I've been looking at the file, and H72WTBGXX is probably not the flow cell, as each read has a different set of numbers/letters in that part of the header.
#5
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,088
Here is a direct link to the FASTQ version of the file at EBI SRA: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/E...36327.fastq.gz
On taking a deeper look, something strange appears to be going on with this file. It looks like the data may come from more than one machine/flowcell. I see these three (what appear to be) machine IDs:
HSQ700642 M00282 NS500217
and these flowcell IDs:
H3LYMBGXX H3MKGBGXX H72GCBGXX H72W7BGXX H72WTBGXX H7BRNADXX H88PCADXX H8FU7ADXX H8JGMADXX
You should check with SRA and/or with the data submitters to confirm.
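A check like the one above, listing which machine and flowcell IDs occur in the file, can be sketched in Python. Assumptions: the file is gzipped FASTQ with exactly four lines per record; the Illumina-style read name is the first whitespace-separated header token containing at least two colons, which covers both the archive headers in this file and CASAVA 1.8 headers; and fields 0 and 2 of that name are the instrument and flowcell IDs. That last point holds for the CASAVA 1.8 layout but not for the older pre-1.8 layout. All function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Tuple


def illumina_name(header: str) -> str:
    """Return the Illumina-style read name from a FASTQ header line.

    Assumption: it is the first whitespace-separated token with at least
    two colons, e.g. 'NS500217:127:H72WTBGXX:...' in an archive header.
    """
    for token in header.lstrip("@").split():
        if token.count(":") >= 2:
            return token
    return ""


def tally_ids(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count instrument and flowcell IDs over 4-line FASTQ records."""
    instruments: Counter = Counter()
    flowcells: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        fields = illumina_name(line.strip()).split(":")
        if len(fields) >= 3:  # assumed layout: instrument:run:flowcell:...
            instruments[fields[0]] += 1
            flowcells[fields[2]] += 1
    return instruments, flowcells


def tally_fastq_gz(path: str) -> Tuple[Counter, Counter]:
    """Stream a gzipped FASTQ file and tally its instrument/flowcell IDs."""
    with gzip.open(path, "rt") as handle:
        return tally_ids(handle)
```

For example, `instruments, flowcells = tally_fastq_gz("reads.fastq.gz")` (the file name is a placeholder for the downloaded file) followed by `print(flowcells.most_common())` would list each flowcell ID with its read count. Streaming the file keeps memory use flat regardless of file size.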
#6
Junior Member
Location: St.Petersburg Join Date: Jan 2016
Posts: 3
Thank you for your answers! Just in case someone ends up in the same situation (which is rather unlikely): I wrote to the first author of this study. The study analysed ancient human DNA, which survives only as very short fragments, generally even shorter than a typical NextSeq 500 read. When such short fragments are sequenced from both ends, the two reads cover the same fragment (one as the reverse complement of the other), so the researchers merged them. This explains why the FASTQ headers end in /1, like the headers of the first read of a pair, even though the file is single and, as the author wrote, should be treated as single-end reads. More details about the EBI FASTQ header format can be found here: http://www.ebi.ac.uk/ena/submit/read-data-format.
Another strange thing in this story is that the author wrote that they never uploaded FASTQ files to the database, only BAM files. So EBI probably generated the FASTQ files automatically from the BAM files 0_o. This is weird, but could also partly explain the structure of the FASTQ headers.
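The merging described above, collapsing both reads of a fragment shorter than the read length into one sequence, can be illustrated with a naive overlap merge. This is a minimal sketch of the general technique, not the authors' pipeline: it assumes adapters have already been trimmed, ignores base qualities, and accepts the longest overlap between a suffix of R1 and a prefix of the reverse complement of R2 with at most 10% mismatches. Dedicated tools such as FLASH or leeHom handle quality scores and adapter read-through properly. `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical names.

```python
from typing import Optional

# Complement table for uppercase DNA bases; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge R1 with reverse-complemented R2 via their longest acceptable overlap.

    The overlap is a suffix of R1 compared against a prefix of revcomp(R2).
    Returns None when no overlap of at least min_overlap bases has a
    mismatch fraction <= max_mismatch_frac. Base qualities are ignored.
    """
    r2rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        r1_tail = r1[len(r1) - olen:]
        r2_head = r2rc[:olen]
        mismatches = sum(a != b for a, b in zip(r1_tail, r2_head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep R1's bases across the overlap, then append the rest of revcomp(R2).
            return r1 + r2rc[olen:]
    return None
```

Searching from the longest overlap downwards means a fully overlapping pair, the short ancient DNA case, comes back as a single copy of the fragment, while pairs from longer fragments that overlap only partially are still joined into one sequence.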
#7
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,088
Thanks for the explanation.
Did the authors say whether they actually "merged" data from three different Illumina sequencers (HiSeq, MiSeq and NextSeq) and multiple flowcells into one file (in addition to merging R1/R2 reads)? Based on the flowcell IDs, that appears to be the case. I have not seen data merged like this before. EBI makes FASTQ files available for most samples. People sometimes have issues with SRA archives, and this is a nice fallback for getting the reads directly.
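The inference in the question above, several instrument types and flowcell types in one file, can be reproduced from the ID lists in post #5. In this sketch the instrument-prefix table is a hypothetical heuristic based on the reading in this thread (NS for NextSeq, HSQ for HiSeq, M for MiSeq), not an Illumina naming specification. Grouping flowcell IDs by their last four characters only shows that they fall into distinct naming families; it makes no claim about which flowcell product each suffix denotes. `guess_platform` and `group_by_suffix` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical prefix table reflecting the reading in this thread; these are
# heuristics, not an official Illumina naming specification.
INSTRUMENT_PREFIXES = (("NS", "NextSeq"), ("HSQ", "HiSeq"), ("M", "MiSeq"))


def guess_platform(instrument_id: str) -> str:
    """Guess the sequencer family from an instrument ID prefix."""
    for prefix, platform in INSTRUMENT_PREFIXES:
        if instrument_id.startswith(prefix):
            return platform
    return "unknown"


def group_by_suffix(flowcells: Iterable[str], suffix_len: int = 4) -> Dict[str, List[str]]:
    """Group flowcell IDs by their trailing characters (e.g. 'BGXX' vs 'ADXX')."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for fc in flowcells:
        groups[fc[-suffix_len:]].append(fc)
    return dict(groups)
```

Applied to the nine flowcell IDs listed in post #5, `group_by_suffix` returns two groups: five IDs ending in BGXX and four ending in ADXX.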
#8
Junior Member
Location: St.Petersburg Join Date: Jan 2016
Posts: 3
You were right: these FASTQ files resulted from merging data from various runs made on different sequencers. So these files are entirely artificial, generated automatically from downstream processed files; they are not raw reads.