**GenoMax** · 10-01-2012, 03:39 AM

That does look odd. Can you tell us where exactly you downloaded this data from? I can have a look to see if I can reproduce this.

**per_ngs** · 10-01-2012, 03:47 AM

I got the 2 x 75 LHCN (cycling and 7-day diff) FASTQ files from the following link:

http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeCaltechRnaSeq

Attached is the snapshot of the files that I got

Attached Files

ENCODEfiles.JPG (24.6 KB, 63 views)

**GenoMax** · 10-01-2012, 04:41 AM

I tried one of the files from the dataset and I am seeing this plot. The data appears to have been submitted on 05/05/2011. Based on that date, it is most likely in Illumina format.

Attached Files

fastqqc_img.PNG (50.3 KB, 91 views)

**per_ngs** · 10-01-2012, 06:28 AM

Originally posted by GenoMax:

I tried one of the files from the dataset and I am seeing this plot. The data appears to have been submitted on 05/05/2011. Based on that date, it is most likely in Illumina format.

Thanks for trying FastQC on one of the datasets. Other than the fact that your plot looks much better, I notice that the FastQC run that I did identified the encoding as Illumina 1.9, whereas your run identified it as Illumina 1.5. I ran FastQC on the same file that you did and got the same results! Attached is the plot.
Could my earlier results have happened because I combined the data from three runs and performed FastQC on the combined dataset? Or did you specify any parameters during the FastQC run?

Also, would it be better to keep the runs separate and also do the alignment etc. accordingly?

Thanks a lot for your help.
NGSnewbie

Attached Files

per_base_quality.png (11.4 KB, 83 views)

**GenoMax** · 10-01-2012, 08:04 AM

The only explanation seems to be that something happened when you combined the files (did you just "cat" them together?).

You could keep the lanes separate and then combine the results later.
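One way to check each file before concatenating is to look at the lowest ASCII code in its quality lines. The sketch below is a rough heuristic, not FastQC's own detection logic. The function name `guess_offset` and the cutoff of 59 are assumptions made for illustration, and it assumes plain four-line FASTQ records.

```python
from itertools import islice

def guess_offset(lines):
    """Guess the Phred offset (33 or 64) from an iterable of FASTQ lines.

    Heuristic only: characters below ';' (ASCII 59) cannot occur in the
    +64 encodings, so seeing one implies Phred+33. If none is seen, the
    file is assumed to be Phred+64, which can be wrong for a Phred+33
    file that happens to contain only high qualities.
    """
    lowest = None
    # Assumes strict 4-line records: the quality string is every 4th line.
    for qual in islice(lines, 3, None, 4):
        qual = qual.rstrip("\r\n")
        if qual:
            low = min(map(ord, qual))
            lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None  # no quality lines found
    return 33 if lowest < 59 else 64
```

Running this on each run's file separately (for example by passing an open file handle) should show whether the files disagree. A concatenated file that mixes both encodings would be guessed as Phred+33 as soon as a single low character appears.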

Originally posted by per_ngs:

Could my earlier results have happened because I combined the data from three runs and performed FastQC on the combined dataset? Or did you specify any parameters during the FastQC run?

Also, would it be better to keep the runs separate and also do the alignment etc. accordingly?

Thanks a lot for your help.
NGSnewbie

**pbluescript** · 10-01-2012, 11:27 AM

Illumina 1.3+ and 1.5+ use Phred+64, while Illumina 1.8+ (which FastQC reports as "Sanger / Illumina 1.9") uses Phred+33.
You can't combine them without adjusting the quality scores to match. You will have to treat each version separately or convert the quality scores.
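For the conversion route, the adjustment is just a shift of each quality character by 31 (64 minus 33). Here is a minimal sketch, assuming the input really is Phred+64 and uses four-line FASTQ records. The function names `phred64_to_phred33` and `convert_fastq` are made up for this example and do not come from any particular tool.

```python
def phred64_to_phred33(qual):
    """Shift a Phred+64 quality string down to Phred+33."""
    out = []
    for c in qual:
        code = ord(c)
        if code < 64:
            # A character below '@' cannot be a valid Phred+64 score.
            raise ValueError(f"{c!r} is not a valid Phred+64 character")
        out.append(chr(code - 31))  # 64 - 33 = 31
    return "".join(out)

def convert_fastq(lines):
    """Yield FASTQ lines with every quality line converted to Phred+33.

    Assumes strict 4-line records, so the quality string is the 4th
    line of each record; all other lines pass through unchanged.
    """
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield phred64_to_phred33(line.rstrip("\r\n")) + "\n"
        else:
            yield line
```

After converting the older files this way, all runs would share the Phred+33 encoding and could be concatenated. The `ValueError` is a deliberate guard: if it fires, the file was probably not Phred+64 to begin with.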

**per_ngs** · 10-01-2012, 08:35 PM

Thanks GenoMax and pbluescript. I will keep the data separate and process it that way.

FastQC with the ENCODE RNASeq data
