I recently downloaded some HMP FASTQ data and need help understanding the metadata. I filtered by "wgs_raw_seq_set" and selected "FASTQ" as the file type, so these should all be WGS metagenomic data.
The metadata is below:
Q1: Why does https://downloads.hmpdacc.org/ihmp/i...w/CSM67UEW.tar appear twice, each time with a different file_id, md5 and size?
Q2: What is the difference between CSM67UEW.tar, CSM67UEW_TR.tar and CSM67UEW_P.fastq.gz?
Bonus: CSM67UEW_P.fastq.gz is an interleaved paired-end FASTQ, whereas the two tar files contain the paired-end reads as separate files.
Bonus2: CSM67UEW.tar has 500k reads; the other two have 5M reads each.
Bonus3: CSM67UEW_P.fastq.gz and CSM67UEW_TR.tar correlate very well after Kraken profiling (r=0.9), but both correlate quite poorly with CSM67UEW.tar (r=0.7).
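For anyone who wants to reproduce the checks behind the observations above, here is a minimal Python sketch. It is not the exact procedure used for the numbers in this post. It assumes the tar archives have already been extracted to plain or gzipped FASTQ files with strict four-line records, and that each Kraken profile has been reduced to a taxon-to-abundance mapping. The function names (md5sum, count_fastq_records, profile_correlation) are hypothetical helpers, not part of any HMP or Kraken tooling.

```python
import gzip
import hashlib
import math


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, to compare against the md5 in the metadata."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_fastq_records(path):
    """Count FASTQ records, assuming exactly four lines per record.

    For an interleaved file this counts both mates, so the number of
    read pairs is half the returned value.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def profile_correlation(profile_a, profile_b):
    """Pearson r between two taxon -> abundance dicts.

    Taxa missing from one profile are treated as 0 (an assumption).
    Raises ZeroDivisionError if either profile is constant across taxa.
    """
    taxa = sorted(set(profile_a) | set(profile_b))
    xs = [profile_a.get(taxon, 0.0) for taxon in taxa]
    ys = [profile_b.get(taxon, 0.0) for taxon in taxa]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

When comparing read counts between the interleaved file and the split files, it matters whether records or pairs are being counted, since the interleaved file holds both mates in one file.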
Please help; there doesn't seem to be anyone at HMP to email.