Old 01-12-2012, 04:13 PM   #5
Senior Member
Location: Sydney

Join Date: Feb 2011
Posts: 149

Originally Posted by RogerH View Post

Thanks for the reply. Yes, I'm using Illumina 100bp paired-end data.

My supervisor told me I should just try both trimmed and untrimmed, and then suggested that I use the untrimmed assembly for annotation. But I did fear that there might be a problem with that.

I used FastQC on my data, and there is a bit of a problem with the per-base ACGT content in the first 10 bp (due to the not-so-random "random" primers used in Illumina library preparation, I believe). And the Q value of the last 15-20 bases drops off considerably.

The problem is that I'm pressed for time, so before Christmas I decided to stop working on the assembly and go ahead with the annotation step (which takes a considerable amount of time, using Blast2go).
Sounds like you are doing exactly the same thing as me.
I am also using Blast2GO now, and I have around 240k transcripts, so this will probably take two weeks or more as I am doing it through the web interface.
I also have 100-nt paired-end Illumina reads, and the first 10 bases or so are like yours. It is indeed due to the not-so-random nature of the random hexamers used in the library prep. I trimmed off the first 12 for good measure, though trimming off 15 is not unusual.
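For what it's worth, the trimming I describe above (a fixed 5' crop for the hexamer bias plus a 3' quality trim) can be sketched in a few lines of Python. This is just an illustration, assuming Phred+33 FASTQ quality encoding and made-up threshold values; in practice you would use a dedicated tool like Trimmomatic or cutadapt rather than rolling your own:

```python
HEAD_CROP = 12   # bases dropped from the 5' end (hexamer/primer bias)
MIN_QUAL = 20    # 3' bases below this Phred score are trimmed off

def trim_read(seq, qual, head_crop=HEAD_CROP, min_qual=MIN_QUAL):
    """Trim one read: fixed 5' crop, then trim low-quality 3' bases.

    Assumes Phred+33 encoding, i.e. quality = ord(char) - 33.
    """
    seq, qual = seq[head_crop:], qual[head_crop:]
    # walk back from the 3' end while the quality is below threshold
    end = len(qual)
    while end > 0 and (ord(qual[end - 1]) - 33) < min_qual:
        end -= 1
    return seq[:end], qual[:end]

# Example: a 100-bp read where the last 15 bases are low quality ('#' = Q2).
# After cropping 12 from the 5' end and trimming the 3' tail, 73 bp remain.
seq, qual = trim_read("A" * 100, "I" * 85 + "#" * 15)
print(len(seq))  # 73
```

Trimmomatic expresses the same two steps as HEADCROP and TRAILING, so the logic carries over directly.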

Unfortunately I don't think the annotation of the untrimmed data would be reliable, particularly since you say the Q score at the 3' end also drops off a lot. I would recommend using the trimmed data.
Kennels