#21 |
Member
Location: Boston Join Date: Sep 2010
Posts: 36
Thanks for the code snippet -- very nice!
I just found a bug in the code: without chomping the newline character, the score will be wrong. Here is the new code:
cat fastq_file | perl -ne 'chomp;print;<STDIN>;<STDIN>;$_ = <STDIN>; chomp; $score=0; map{ $score += ord($_)-33} split(""); print " avg score: " .($score/length($_))."\n";'
I tested this with the following example:
@HWI-ST570:30
AGGTGGGGGGGGGTGGGGGTGTGGGGTGGGGTGGGTGGGTGTTGGGGGGATGGGGGTGTGAGGTGGGGGGGGGGGGGGGGGGGGGGGTTATGGTGT
+
################################################################################################
I also changed the above code to Phred+33, which works for Illumina 1.8+ and Sanger. For Illumina 1.5+, use ord($_)-64.
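For anyone who prefers a script to the one-liner, here is a minimal Python sketch of the same calculation. It assumes plain four-line FASTQ records (no wrapped sequence lines). The function name `mean_qualities` and its `offset` parameter are my own choices for illustration, not part of any library. Use `offset=33` for Sanger and Illumina 1.8+, and `offset=64` for the older Phred+64 encoding.

```python
import io


def mean_qualities(handle, offset=33):
    """Yield (header, mean Phred score) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        handle.readline()  # sequence line, not needed here
        handle.readline()  # '+' separator line
        # Strip the newline first; otherwise it is counted as a quality character.
        quality = handle.readline().rstrip("\r\n")
        if not quality:
            continue  # skip truncated or empty records
        total = sum(ord(char) - offset for char in quality)
        yield header.rstrip("\r\n"), total / len(quality)


# Tiny made-up record: '#' is ASCII 35, so every base scores 35 - 33 = 2.
demo = io.StringIO("@read1\nACGT\n+\n####\n")
for name, score in mean_qualities(demo):
    print(f"{name} avg score: {score}")  # prints "@read1 avg score: 2.0"
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` demo, for example `mean_qualities(open("reads.fq"))`, where `reads.fq` is a placeholder name.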
#22 |
Junior Member
Location: South Africa Join Date: Apr 2015
Posts: 5
I tried your code for getting quality averages; however, this is what it returned:
@M01232:82:000000000-AHHB7:1:2119:22801:25267 1:N:0:33 avg score: 635857.304635762
@M01232:82:000000000-AHHB7:1:2119:10835:25269 1:N:0:33 avg score: 635856.986754967
@M01232:82:000000000-AHHB7:1:2119:17952:25279 1:N:0:33 avg score: 635852.953642384
I think I have an idea of how the code arrived at these scores, but what am I doing wrong, and how can I fix it? I copied the code exactly and added my FASTQ file. Also, once this is fixed, is it possible to run multiple separate FASTQ files at a time instead of one at a time? Each of my files contains 2 million reads, and it takes incredibly long. Any help?
Auberi
#23 |
Junior Member
Location: South Africa Join Date: Apr 2015
Posts: 5
I didn't see the post prior to mine, and that seems to have fixed my bug, as I am working with Illumina 1.8+.
Sorry about that. Thanks
#24 |
Member
Location: Madrid, Spain Join Date: Mar 2014
Posts: 30
Sorry, this is a little off topic, but ...
I am trying to merge paired-end FASTQ files. So far I have only been able to do this with BBMerge, mergePairs.py, and custom software I developed myself. I consistently get low merge rates: about 20% when requiring overlap quality above 90%. Is this normal? I assumed the merge rate would be well above 50%. Thanks.
#25 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
![]() Quote:
#26 |
Member
Location: Madrid, Spain Join Date: Mar 2014
Posts: 30
Thanks. If I understand correctly, given a library and a particular gene to be studied, insert size should be known. OK?
#27 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
Whoever made the library should have a rough idea of the size selection they used when creating it, and from that you can work out the insert size range. It will be an approximate range though, not a fixed value.
#28 |
Member
Location: Madrid, Spain Join Date: Mar 2014
Posts: 30
Thanks a lot. You are being very helpful
#29 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
@bi_maniac: While you wait to hear back from people who made the libraries, you can estimate the insert size based on data at hand. See Brian's suggestion here: http://seqanswers.com/forums/showpos...13&postcount=2
#30 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
It would help if you could post the insert-size histogram from merging, by the way. If it is monotonically increasing and then suddenly drops to zero just before 2*(read length), then they mostly don't overlap. If there is a prominent bell-shaped peak well below 2*(read length), then the problem is likely quality. But at a 20% merge rate I assume they mostly don't overlap.
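As a rough illustration of that rule of thumb, here is a small Python sketch. It is only a heuristic under stated assumptions: the histogram is a non-empty dict of insert size to count, and the function name `describe_insert_histogram` and the `edge_window` cutoff of 10 bases are hypothetical choices of mine, not anything BBMerge defines.

```python
def describe_insert_histogram(hist, read_length, edge_window=10):
    """Summarise an insert-size histogram given as {insert_size: count}.

    Returns (mode, fraction of pairs near 2*read_length, short verdict).
    """
    max_insert = 2 * read_length  # longest insert that can still overlap
    total = sum(hist.values())
    mode = max(hist, key=hist.get)  # most common insert size
    # Pairs whose insert size sits within edge_window of the upper limit.
    near_edge = sum(
        count
        for size, count in hist.items()
        if max_insert - edge_window <= size <= max_insert
    )
    edge_fraction = near_edge / total
    if mode >= max_insert - edge_window:
        verdict = "peak at the upper limit: pairs probably mostly do not overlap"
    else:
        verdict = "peak below the upper limit: check read quality"
    return mode, edge_fraction, verdict


# Made-up counts for 150 bp reads: one rising histogram, one bell-shaped.
rising = {250: 10, 270: 20, 290: 40, 298: 80}
bell = {180: 10, 200: 50, 220: 10}
print(describe_insert_histogram(rising, read_length=150))
print(describe_insert_histogram(bell, read_length=150))
```

A real histogram will be noisier than these toy inputs, so treat the verdict as a prompt to look at the plot, not as a diagnosis.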
#31 |
Member
Location: Madrid, Spain Join Date: Mar 2014
Posts: 30
My histogram has two peaks, at 149-150 and at 200.
#Mean 166,846
#Median 151
#Mode 200
#STDev 28,597
#PercentOfPairs 16,089
#InsertSize Count
51 1
57 1
69 3
73 1
75 1
76 1
79 1
82 2
86 1
87 1
88 1
91 1
92 1
95 1
96 1
97 1
98 4
101 1
103 3
105 1
106 4
107 2
109 2
111 3
112 4
113 2
114 3
115 9
116 6
117 3
118 13
119 3
120 4
121 17
122 3
123 8
124 18
125 12
126 15
127 16
128 10
129 13
130 16
131 8
132 12
133 17
134 19
135 16
136 26
137 25
138 18
139 48
140 31
141 32
142 39
143 35
144 49
145 53
146 98
147 160
148 217
149 398
150 379
151 240
152 175
153 114
154 57
155 32
156 17
157 8
158 5
159 2
160 2
161 1
162 1
163 1
164 1
165 2
166 1
167 3
168 4
169 1
170 2
172 5
175 1
176 1
177 3
178 4
179 2
180 5
181 5
182 4
183 12
184 4
185 7
186 3
187 7
188 13
189 14
190 15
191 18
192 31
193 37
194 52
195 79
196 82
197 92
198 267
199 170
200 405
201 154
202 52
203 31
204 22
205 6
206 3
207 3
208 1
209 1
210 2
211 3
214 1
216 1
217 3
218 1
221 4
231 1
238 1
240 1
241 2
249 1
256 1
257 1
260 2
263 3
264 2
265 2
266 2
273 1
282 3
284 2
286 2
287 2
288 2
289 2
290 1
291 3
292 1
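If you want to check the peaks yourself, a short Python sketch like the one below can read such a histogram. It assumes the text has one "size count" pair per line, with summary lines starting with `#`. The names `parse_ihist` and `top_sizes` are hypothetical helpers of mine, and the sample rows are copied from the histogram above.

```python
def parse_ihist(text):
    """Split histogram text into ({header: value}, {insert_size: count})."""
    headers, counts = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if fields[0].startswith("#"):
            # Summary line such as "#Mode 200"; keep the value as text.
            headers[fields[0].lstrip("#")] = " ".join(fields[1:])
        else:
            counts[int(fields[0])] = int(fields[1])
    return headers, counts


def top_sizes(counts, n=3):
    """Return the n insert sizes with the highest counts, largest first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


# A few rows taken from the posted histogram.
sample = """#Mode 200
#InsertSize Count
149 398
150 379
151 240
198 267
200 405
"""
headers, counts = parse_ihist(sample)
print(headers["Mode"], top_sizes(counts))  # prints "200 [200, 149, 150]"
```

The header values are kept as strings on purpose, because the posted summary uses decimal commas (for example 166,846) that `float()` would reject.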
#32 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
That would be very strange, for random shearing. Is it an amplicon library? If so, my previous post is irrelevant, it was based on the assumption of random shearing. Note that BBMerge can merge reads with insert size longer than read length using kmer counting, but that won't work for amplicons, only random fragmentation with sufficient coverage (>5x or so). Sometimes you can increase the merge rate by quality trimming (flags "qtrim2=r trimq=12" in BBMerge, which will only trim if the initial merge attempt fails), and for generating an insert size histogram of very low quality reads, I generally use the "xloose" flag which makes it more sensitive (at the expense of false positive merges).
What's the read quality like? Can you post the per-base qscore histogram? (reformat.sh in=reads.fq qhist=qhist.txt). |
#33 |
Member
Location: Madrid, Spain Join Date: Mar 2014
Posts: 30
Dear Brian,
I will keep giving you more info as soon as I get it. Meanwhile, let me give you many thanks for your most valuable help. I would also like to ask for further help: do you know of any tutorial or book that I could read in order to learn these concepts? The bioinformatics books I have are too basic and do not cover these issues.
#34 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Sorry, I can't give you any advice there. Bioinformatics is evolving too rapidly now for books to stay relevant for very long, I think.
#35 |
Member
Location: Madrid, Spain Join Date: Mar 2014
Posts: 30
Hi Brian,
It is an amplicon library. The insert range is 300-370. I will send you the execution reports soon.
#36 |
Member
Location: Madrid, Spain Join Date: Mar 2014
Posts: 30
Hi, helpful people: this conversation continues here: http://seqanswers.com/forums/showthread.php?t=63930
Thanks a lot.
Last edited by bi_maniac; 11-02-2015 at 11:55 AM.