I'm trying to retrieve the average quality of each read to make graphs of read length vs. quality. I don't want to use fastx or fastqc; I want the data so I can make the graphs myself and adjust the scales. I retrieved the sequence lengths, which was trivial. I have the Phred qualities in a qual file, but I have no idea how to turn those numbers into an average. I tried numpy average, but it constantly wants something different, so before I give up, I wanted to ask a question here.
-
Originally posted by nanto: I'm trying to retrieve average quality of each read to make graphs of read length/quality. I don't want to use fastx, fastqc, I want data to make graphs myself, so I can adjust scales. I retrieved sequences length, this was trivial. I got phred qualities in qual file, I have no idea how to make those numbers an average.
-
Originally posted by nanto: No, it's not one sequence. I received a pretty new data set, and sequence quality might be affected by its length. I need some plots to show it. So I have to calculate the average for each sequence along with its corresponding length.
Coming back to your original question: to keep track of the qualities for different lengths, you'd just need to make a 2D dataset, something like a hash of arrays, where the hash key is the length and the array holds the set of average quality values for sequences with that length. Depending on how wide your range of lengths is, you might want to bin them rather than tracking every length separately.
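The hash-of-arrays idea could be sketched in Python roughly as below. This is only an illustration: the function name `group_by_length`, the `bin_width` parameter, and the example numbers are all made up here, and it assumes you already have one (length, average quality) pair per read.

```python
from collections import defaultdict


def group_by_length(records, bin_width=1):
    """Group average qualities by read length (hypothetical helper).

    records   -- iterable of (length, mean_quality) pairs, one per read
    bin_width -- 1 keeps every length separate; larger values bin lengths
    Returns a dict mapping the start of each length bin to a list of
    the average qualities that fell into that bin.
    """
    groups = defaultdict(list)
    for length, avg_q in records:
        # Round the length down to the start of its bin.
        bin_start = (length // bin_width) * bin_width
        groups[bin_start].append(avg_q)
    return dict(groups)


# Made-up example values, only to show the shape of the result.
example = [(98, 31.5), (102, 28.0), (105, 30.2), (251, 22.4)]
binned = group_by_length(example, bin_width=50)
```

Each list in the result is one subset of reads, so it can be handed to whatever plotting code you prefer, one box or point cloud per length bin.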
-
Funny thing is, I know what it will look like; I just have to do it to visualize the data.
The idea of length filtering and making per-base qualities is good; I think I will use boxplots for each subset.
As for the hash of arrays, I don't think that's necessary. My file is sorted, so every result is in a fixed position. When loading the data to produce the graph, each length will be at the same position as the corresponding sequence quality.
So what I need is a simple script in Python, Perl, whatever, which will read my qual file and write to a new file only the average value for each record, separated by \n.
But it's good to find some new ideas about what I can get from this data and how to show it so it will look at least interesting.
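A minimal Python sketch of the kind of script asked for here might look like the following. It assumes the qual file is in the usual FASTA-style layout (a `>` header line followed by whitespace-separated integer Phred scores, possibly wrapped over several lines); the function names `mean_qualities` and `write_means` and the demo records are invented for illustration. Note that it takes the plain arithmetic mean of the Phred scores, which is not the same as averaging the underlying error probabilities.

```python
import io
from statistics import mean


def mean_qualities(handle):
    """Yield (read_id, mean_quality) for each record in a .qual stream.

    Assumes FASTA-style records: a '>' header line, then one or more
    lines of whitespace-separated integer Phred scores.
    """
    read_id, scores = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if read_id is not None:
                yield read_id, (mean(scores) if scores else float("nan"))
            parts = line[1:].split()
            read_id, scores = (parts[0] if parts else ""), []
        else:
            scores.extend(int(tok) for tok in line.split())
    # Emit the last record in the file.
    if read_id is not None:
        yield read_id, (mean(scores) if scores else float("nan"))


def write_means(in_handle, out_handle):
    """Write one average per line, in the same order as the input."""
    for _, avg in mean_qualities(in_handle):
        out_handle.write(f"{avg:.2f}\n")


# Tiny in-memory demo with made-up records; real use would pass open files.
demo = io.StringIO(">read1 length=4\n40 40 30 30\n>read2\n20 20\n20\n")
out = io.StringIO()
write_means(demo, out)
```

Because the averages come out in input order, line N of the output lines up with record N of a sorted length file. For real files, something like `with open("reads.qual") as fin, open("means.txt", "w") as fout: write_means(fin, fout)` should work, where both file names are placeholders.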