SEQanswers

Old 02-16-2013, 08:47 PM   #1
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
MiSeq PE run output files

Hi,

Can anyone advise me why the two output files from a paired-end run differ in size? The file for read 2 is about twice as big as the one for read 1, so I thought it includes both read 1 and read 2. Yet the end strings of each read name (either 1:N:0:1 in read 1 or 2:N:0:1 in read 2) indicate this is not the case. So why the difference?
Old 02-17-2013, 11:31 AM   #2
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

Are you referring to the gzip compressed or the un-compressed fastq files?

I haven't seen a case where one read file is twice as large as the other, but I have seen differences in file size for the gzipped files. Part of this is most likely due to the compression algorithm being able to compress one file better than the other. It's also possible that with adapter trimming turned on, depending on the quality of your data, you could have longer reads for read 2 because the data quality dropped enough that Reporter couldn't properly identify the adapter and thus didn't trim it.

Either way, I wouldn't be concerned about it.
Old 02-17-2013, 08:21 PM   #3
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205

No, this is not compression, as the decompressed files also differ about twofold in size. The number of records, as counted using Biopieces, is the same, and clean&trim reduces the read 2 file to about the same size as read 1. Very likely read 2 has much longer records with lots of Ns, which is very surprising. I guess something is wrong with basecalling.
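
For anyone who wants to run the same check without Biopieces, here is a minimal Python sketch that counts records, total bases, mean read length and the fraction of N calls in a FASTQ file. It assumes plain four-line FASTQ records (no wrapped sequences); the function names are made up for illustration and the file paths are whatever you pass on the command line.

```python
import gzip
import sys


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it, "").rstrip("\r\n")
        next(it, "")  # the '+' separator line is ignored
        qual = next(it, "").rstrip("\r\n")
        yield header.rstrip("\r\n"), seq, qual


def fastq_stats(lines):
    """Return record count, base count, mean read length and N fraction."""
    records = bases = n_bases = 0
    for _, seq, _ in read_fastq(lines):
        records += 1
        bases += len(seq)
        n_bases += seq.upper().count("N")
    return {
        "records": records,
        "bases": bases,
        "mean_length": bases / records if records else 0.0,
        "n_fraction": n_bases / bases if bases else 0.0,
    }


def open_fastq(path):
    """Open a FASTQ file as text, gzip-compressed or not."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Usage: python fastq_stats.py read1.fastq.gz read2.fastq.gz
    for path in sys.argv[1:]:
        with open_fastq(path) as handle:
            print(path, fastq_stats(handle))
```

If both files report the same number of records but read 2 shows a much larger mean length or N fraction, that would point to untrimmed low-quality tails rather than extra reads.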
Old 02-19-2013, 08:58 PM   #4
kcchan
Senior Member
 
Location: USA

Join Date: Jul 2012
Posts: 184

Are all of the reads the same length? Did you disable adapter trimming? This may contribute to unequal file sizes.
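
One quick way to answer the read-length question is to tabulate how many reads there are of each length. The sketch below is only an illustration, assuming strict four-line FASTQ records with no blank lines; `length_histogram` is a made-up name, not part of any Illumina tool.

```python
from collections import Counter


def length_histogram(lines):
    """Count reads per sequence length in a four-line FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def print_histogram(counts):
    """Print one 'length<TAB>reads' row per observed length."""
    for length, reads in sorted(counts.items()):
        print(f"{length}\t{reads}")
```

A single row (for example, every read at the full cycle count) would suggest trimming was off for that file, while a spread of shorter lengths would suggest adapter trimming was applied.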
Old 02-20-2013, 07:19 AM   #5
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205

It looks like this is indeed a read quality issue, as quality filtering/adaptor removal levels the file sizes, although the read 2 file now always ends up smaller. For instance, the original 4.2 GB and 8 GB files shrink to 3.8 GB and 3.6 GB, and this is a common trend no matter what the library sizes or run lengths are. I am bugging Illumina Tech Support with that. For example, read quality peaks at 100-120 cycles and sharply declines after that, even with a library size around 600 bp.
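
To see where in the read the quality drops, the mean Phred score per cycle can be computed straight from the FASTQ quality lines. This is a rough sketch, assuming four-line records and Phred+33 encoding (the `offset` parameter and the function name are my own, for illustration only).

```python
def mean_quality_per_cycle(lines, offset=33):
    """Return a list with the mean Phred score at each cycle (read position)."""
    sums = []    # running sum of scores per cycle
    counts = []  # number of reads that reach each cycle
    for i, line in enumerate(lines):
        if i % 4 != 3:  # the fourth line of each record holds the qualities
            continue
        qual = line.rstrip("\r\n")
        if len(qual) > len(sums):
            extra = len(qual) - len(sums)
            sums.extend([0] * extra)
            counts.extend([0] * extra)
        for cycle, char in enumerate(qual):
            sums[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / n for total, n in zip(sums, counts)]
```

Running it on the read 1 and read 2 files separately and comparing the two lists would show whether read 2 falls off earlier than read 1.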
Old 02-20-2013, 08:41 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

Quote:
Originally Posted by yaximik View Post
I am bugging Illumina Tech Support with that. For example, read quality peaks at 100-120 cycles and sharply declines after that even with library size around 600 bp.
Is this a "low complexity"/amplicon-type library? If so, this is a known issue.

Are you running MCS v.2.1.1.13 on your MiSeq?
Old 02-20-2013, 01:06 PM   #7
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205

No, these were all genomic libraries. I only recently upgraded to v.2.1.13.0; the majority of runs were done with whatever the previous version was.
ILMN support thinks it is a matrix issue, so I was asked to do a full 2x250 bp PhiX run to get a kind of baseline output and then go from there. They also noticed that in the majority of runs the first nucleotide is C, which is consistent with the fact that most of the samples come from ancient archaeological material. From Pääbo's work on Neanderthals it is known that DNA breaks preferentially either after or before G.
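
The first-nucleotide observation is easy to check on any run. Below is a small sketch that reports the fraction of reads starting with each base, again assuming four-line FASTQ records; `first_base_composition` is a hypothetical helper name.

```python
from collections import Counter


def first_base_composition(lines):
    """Return the fraction of reads starting with each base."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of each record
            seq = line.strip()
            if seq:
                counts[seq[0].upper()] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {base: n / total for base, n in counts.items()}
```

A strong excess of one base at the first position means low diversity in the first cycle, which is worth mentioning when discussing the matrix question with support.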