SEQanswers

Old 11-23-2010, 11:38 PM   #141
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by zhangpanda View Post
Yes, it works! Thanks!
Cool. I'll make sure that I use the lower settings for future releases then.
Old 11-24-2010, 07:56 AM   #142
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

FastQC v0.7.1 has been released. This contains a much improved command line interface to the program which should make it easier to include it in analysis pipelines. It also adds a new command line option to manually define the format of an input sequence file rather than letting the program guess from the filename.

You can get the new version from:

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

[If you don't see the new version of any page hit shift+refresh to force our cache to update]
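
As an example of the new interface (the exact option names may vary between releases, so check the output of fastqc --help for your version), forcing the input format rather than relying on the filename might look something like this:

  fastqc --format fastq s_1_1_sequence.txt s_1_2_sequence.txt

which can be handy for files that are fastq inside but don't carry a .fastq extension.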
Old 11-29-2010, 02:55 AM   #143
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

FastQC v0.7.2 is now out at the same address as above. I've fixed a bug which affected libraries where there weren't any unique sequences. I've also added a new command line option to allow a user to specify a custom contaminants file rather than using the default systemwide one.
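
As a usage sketch (again, check fastqc --help for the exact option name in your copy), pointing the program at your own list might look like:

  fastqc --contaminants my_contaminants.txt s_1_1_sequence.txt

where my_contaminants.txt is just a placeholder for a file laid out like the bundled contaminant list (one name and one sequence per line, separated by tabs, with # for comment lines).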
Old 12-21-2010, 07:10 AM   #144
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Unhappy Fastq results

Hi,

I find this program really good, but I wish the help files were a bit more detailed. It is sometimes difficult to understand the results of each of the analyses.

I also like the idea of constructing some sort of database of example results (good, bad, etc.) so one would have comparisons to refer to.

I have a problem with the results of my analysis in the duplication level. As in the attached image clearly visible I have a very high number of duplication with more than 10 duplicated sequences.

As this is my first attempt at next-generation sequencing, I don't understand exactly what this means. In the report summary I have this list of percentages:

>>Sequence Duplication Levels fail
#Total Duplicate Percentage 67.8215
#Duplication Level Relative count
1 100.0
2 24.090619513028887
3 15.179389965349534
4 11.182932703513215
5 8.729431141911524
6 7.257951737961682
7 6.00867038550585
8 5.357614556303122
9 4.614882607952515
10+ 96.88767344655594
>>END_MODULE

If I add all the numbers together I get over 180% of duplicated reads.
Q: how can that be?

Q: What can be the reason for such a huge number of duplicated reads?

Q: Does it mean that my library is not good? Is there a tool to extract these duplicated sequences?

I want to mention that I am working with paired-end Illumina reads 76 bp long.

Can anyone tell me of a way to visualize these duplicated reads?

If I understood it correctly, with Picard I can find out how many duplicates I have, but is there a way of extracting them?

Thanks for any help

Assa
Attached Images
File Type: jpg duplication_levels.jpg (18.4 KB, 48 views)

Old 12-21-2010, 08:54 AM   #145
Howie Goodell
Member
 
Location: Boston, MA

Join Date: Feb 2010
Posts: 10
Default Re: Fastq results

Hi --

High duplication levels typically result from low DNA in the sample (or the fraction size-selected in library preparation) masked by extra PCR cycles. Since FastQC runs before alignment, it should actually under-estimate duplicates -- more will become apparent when fragments align together on the genome with allowance for sequencing errors. However, I don't know its algorithms. I agree with you about the terseness of the documentation: it wasn't clear to me, either, exactly how to interpret the proportions of sequences duplicated at each level.

In particular, I often see the same spike you did at 10 duplicates (using 0.7.0, BTW). Perhaps it's the effect of lumping together all sequences duplicated 10 or more times; perhaps a few sequences (like chrM -- see below) were greatly over-duplicated in the library, or perhaps it's a bug.

For a good visual overview of duplication impacts after alignment, I suggest posting BAM files in a web-accessible location and making UCSC custom tracks (bigDataUrl=myURL/myfile.bam visibility="squish"). Then look in your regions of interest for the following pattern: a sparse landscape dominated by "towers" of many reads at identical positions, without a lot of others nearby like you see in a true amplified region. (A confirmatory detail one biologist pointed out to me: are the sequences identical with just occasional random sequencing error differences, instead of expected proportions of multiple alleles at locations you know are heterozygous?)
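
For example, a custom track definition along these lines (server and file names are placeholders; the BAM needs to be coordinate-sorted and its .bam.bai index has to sit at the same URL):

  track type=bam name="mySample" bigDataUrl=http://myserver.example.org/myfile.bam visibility=squish

Paste that into the UCSC "add custom tracks" box for the matching assembly and the browser only fetches the regions you actually view.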

You can also use de-duplication tools such as the one in SAMtools. For single-read data, the Java version Picard is supposed to be superior. However, I don't fully trust these. I'm not sure their algorithms are foolproof -- visually I've seen unexpected effects. Certainly for dense clusters of reads in ChIPped/exon enhanced loci, they remove some real data and skew the results. However, undetected duplication will skew algorithms like MACS far worse. Where I suspect duplication, I give researchers both full and deduplicated data as well as custom tracks to help them assess what it means.
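
For reference, the sort of commands I mean (file names are placeholders, and option names should be double-checked against your installed samtools/Picard versions):

  samtools rmdup aligned.sorted.bam dedup.bam
  java -jar MarkDuplicates.jar INPUT=aligned.sorted.bam OUTPUT=dedup.bam METRICS_FILE=dup_metrics.txt REMOVE_DUPLICATES=true

samtools rmdup works on paired-end data by default (add -s for single-end reads); Picard's MarkDuplicates only flags duplicates unless REMOVE_DUPLICATES=true is set.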

One more note: skewed RNA or DNA sources can simulate the effects of PCR clonal duplication. The biggest example I know is that whole-RNA preparation methods often yield a significant (20-30%) proportion of mitochondrial RNA. Millions of reads mapped to the ~16 kb "chrM" look like duplication to FastQC and other deduplication programs, but they say nothing about duplication in the regions you care about.

Net, net: be aware of duplication and warn researchers if it might compromise their results -- a thankless task, but it's better they hear bad news immediately. Trying too hard to draw conclusions from inadequate data costs both time and credibility. Rather, work with the lab people to get better results next time.

Cheers!
Howie
Old 12-22-2010, 11:59 PM   #146
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by frymor View Post
If I add all the numbers together I get over 180% of duplicated reads.
Q: how can that be?
Because the percentages are relative to the number of unique sequences, so if you have more duplicated sequences than unique ones then you get totals >100%. We do it this way so that the plot still shows useful information even when you have a single sequence (say an adapter) which makes up a high proportion of your library. The overall figure for duplication levels in the library is given in the header of the plot.
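
As a rough illustration of the arithmetic only (this is not the code FastQC actually uses), on an uncompressed fastq file you could tabulate duplication levels like this:

  awk 'NR % 4 == 2' reads.fastq |   # take the sequence line of each record
    sort | uniq -c |                # count occurrences of each distinct sequence
    awk '{ lvl = ($1 >= 10) ? "10+" : $1; n[lvl]++ }
         END { for (l in n) printf "%s\t%.1f\n", l, 100 * n[l] / n[1] }'

Each level is expressed relative to the count at level 1, so level 1 is always 100 and the columns are not expected to sum to 100.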

Quote:
Originally Posted by frymor View Post
Q: What can be the reason for such a huge number of duplicated reads?
Howie seems to have covered this pretty well. Basically the answer will depend on the type of library you have. For ChIP libraries the answer is usually technical (PCR overamplification). For some other libraries (eg 4C) duplication is expected. For yet others (small RNA?) there may be a higher than usual level of duplication due to overrepresentation of certain genomic regions. In each case the shape of the plot will be different and you should be able to figure out the basic cause for your library.

Quote:
Originally Posted by frymor View Post
Q: Does it mean that my library is not good? Is there a tool to extract these duplicated sequences?
It means that you're not making the best use of the sequencing capacity you have because nearly 70% of the sequences you've generated are simply duplicates of something which was already in the library. It's also a warning that if the duplication is technical and biased then you may get artefacts in your analysis.

Whether you remove duplicates will depend on the type of library you're working with. If your intention is to map this data to a reference then you don't want to deduplicate until after you've done that since (as Howie pointed out) there will be duplicates which are missed at the sequence level due to sequencing errors artificially increasing diversity.

Quote:
Originally Posted by frymor View Post
Can anyone tell me of a way to visualize these duplicated reads?
If you load these into any data browser (after assembly or mapping) you'll see your duplicates as towers of reads with exactly the same position. In our downstream analysis package you can even quantitate the level of duplication and visualise it on a genome wide scale if you're really interested in it.
Old 01-03-2011, 06:31 AM   #147
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Quote:
Originally Posted by Howie Goodell View Post
For a good visual overview of duplication impacts after alignment, I suggest posting BAM files in a web-accessible location and making UCSC custom tracks (bigDataUrl=myURL/myfile.bam visibility="squish"). Then look in your regions of interest for the following pattern: a sparse landscape dominated by "towers" of many reads at identical positions, without a lot of others nearby like you see in a true amplified region. (A confirmatory detail one biologist pointed out to me: are the sequences identical with just occasional random sequencing error differences, instead of expected proportions of multiple alleles at locations you know are heterozygous?)
I have a basic question, but I think it is quite important for my analysis.
I am using two fastq files, one for each genotype. After the QC I ran bowtie on each of these files.
When I try to load the data into the UCSC genome browser, it always times out before finishing the upload.
Q: Is it possible to run bowtie separately for each chromosome, or is it better to do one run against the complete genome and then split the SAM/BAM files into single chromosomes?

Q: What about the downstream analysis (TopHat, Cufflinks)? Is it preferable to run them on separate chromosomes or on the complete genome? I am asking not only because of the file size but also because of the correctness of the analysis.

Quote:
Originally Posted by Howie Goodell View Post
You can also use de-duplication tools such as the one in SAMtools.
Do you mean the rmdup option?

Quote:
Originally Posted by Howie Goodell View Post
For single-read data, the Java version Picard is supposed to be superior.
Picard also supports paired-end data and I tried to run it on mine, but I can only mark the duplicates, not remove them.
I am still looking for a way of extracting these duplicates, so that I can calculate the true coverage of my library and check its quality.

Quote:
Originally Posted by simonandrews View Post
Basically the answer will depend on the type of library you have. For ChIP libraries the answer is usually technical (PCR overamplification). For some other libraries (eg 4C) duplication is expected. For yet others (small RNA?) there may be a higher than usual level of duplication due to overrepresentation of certain genomic regions. In each case the shape of the plot will be different and you should be able to figure out the basic cause for your library.
I am working with mRNA-Seq and am trying to look for differentially regulated genes between a wild-type and a mutant genotype. For that reason I expect reads from highly expressed genes to be seen more than once, and reads from genes with lower expression less often.
My problem is how to identify PCR amplification and to be able to distinguish it from genuinely high expression of the genes.

Quote:
Originally Posted by simonandrews View Post
Whether you remove duplicates will depend on the type of library you're working with. If your intention is to map this data to a reference then you don't want to deduplicate until after you've done that since (as Howie pointed out) there will be duplicates which are missed at the sequence level due to sequencing errors artificially increasing diversity.
If you load these into any data browser (after assembly or mapping) you'll see your duplicates as towers of reads with exactly the same position. In our downstream analysis package you can even quantitate the level of duplication and visualise it on a genome wide scale if you're really interested in it.
I have downloaded SeqMonk and looked at my data. The image I posted gives an overview of part of chromosome X from the D. melanogaster genome. As you can see, I don't have the sparse landscape behaviour Howie spoke about.

Maybe it is a very naive question, maybe even a bit silly, but I would like to know how the number of PCR cycles influences the read duplication numbers.
In my data set I have the expression profiles of Drosophila genes. I would expect reads to occur more often for highly expressed genes and therefore to find more duplicates at those positions.
Q: Does it make sense to extract the duplicated reads and then look for differentially regulated expression?

I hope it is not too much; I will be very grateful for your help.

Thanks
Assa
Attached Files
File Type: pdf SeqMonq.pdf (298.7 KB, 49 views)
Old 01-03-2011, 11:27 PM   #148
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by frymor View Post
I have downloaded SeqMonk and looked at my data. The image I posted gives an overview of part of chromosome X from the D. melanogaster genome. As you can see, I don't have the sparse landscape behaviour Howie spoke about.
Actually from the screenshot you posted you can't really tell whether you're seeing sparse data. You won't get completely isolated peaks on a chromosome level, but rather when you look at an individual exon you won't see smooth coverage over the whole exon but will see a small number of positions where most reads sit.

If you want to look at this quantitatively in SeqMonk then do Data > Quantitation > Coverage Depth Quantitation and then find the Max depth for Exact overlaps and Express as % of all reads. This will tell you what proportion of all of the reads in a given exon are coming from potential PCR duplicates. For all exons where you have a reasonable number of reads (say >30) you should be seeing values of only a few percent. For heavily duplicated experiments you can see values going way higher than that.


Quote:
Originally Posted by frymor View Post
Maybe it is a very naive question, maybe even a bit silly, but I would like to know how the number of PCR cycles influences the read duplication numbers.
We've tried to look at this in some of our data. Our conclusion is that it isn't just a simple case that the number of cycles determines duplication level. For some samples the problem seems to be the diversity in the starting material, ie that if you have too low an amount of starting material then the high duplication level is fixed almost immediately when you start your PCR, and reducing cycles won't help. For other samples you can increase duplication levels by adding more cycles but you need to go really over the top to completely bias an otherwise diverse sample. There may also be effects associated with the PCR conditions you use, but we haven't really gone into that.

Quote:
Originally Posted by frymor View Post
Q: Does it make sense to extract the duplicated reads and then look for differentially regulated expression?
We've certainly done that in the cases where we had a ridiculously high level of duplication, and we got sensible results. It's not something we'd routinely do for a sample which looked to be diverse though.
Old 01-04-2011, 12:52 AM   #149
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Thanks for the fast response
Quote:
Originally Posted by simonandrews View Post
Actually from the screenshot you posted you can't really tell whether you're seeing sparse data. You won't get completely isolated peaks on a chromosome level, but rather when you look at an individual exon you won't see smooth coverage over the whole exon but will see a small number of positions where most reads sit.

If you want to look at this quantitatively in SeqMonk then do Data > Quantitation > Coverage Depth Quantitation and then find the Max depth for Exact overlaps and Express as % of all reads. This will tell you what proportion of all of the reads in a given exon are coming from potential PCR duplicates. For all exons where you have a reasonable number of reads (say >30) you should be seeing values of only a few percent. For heavily duplicated experiments you can see values going way higher than that.
I have attached two screenshots I took from smaller portions of chromosomes X and 3R respectively.
I also quantified the data according to the description, after defining the probes using the feature probe generator with the design based around mRNA. This gave me ~167K probes.

It is clearly visible that, for the most part, only the reads with very low depth have a high percentage, while the groups of reads with deeper coverage usually have a very low percentage.
According to your description and to my understanding, it is therefore very likely that these reads are not PCR duplicates but true deep coverage of the mRNA.

Another question which comes up when looking at the data concerns the block I marked in yellow. What are these regions of reads? They are not genes, so they can't be mRNA or CDS.
Are these repeats which were mapped at random according to the bowtie/bwa/etc. settings and therefore to a presumably wrong place?

Are there any other suggestions as to what kind of reads these are?

I would like to mention again that I am working with paired-end RNA-seq obtained from polyA purification.
Q: Can these reads have something to do with polyA tail residues?
Q: Are these reads maybe ncRNA with a polyA tail, or rRNA which slipped through? Is there a way to test such a theory?
Attached Files
File Type: pdf Screen shot 1.pdf (149.2 KB, 63 views)
File Type: pdf Screen shot 2.pdf (187.7 KB, 23 views)

Old 01-10-2011, 09:16 PM   #150
Yilong Li
Member
 
Location: WTSI

Join Date: Dec 2010
Posts: 41
Default

Dear Simon,

Like many others, I want to thank you for your excellent program, which made its way right away into our NGS data analysis pipeline.

I would like to propose a slight improvement in the Per Base GC Content and Per Base Sequence Content plots. Would it be possible to add horizontal grid lines to those plots as well? It would make the visual interpretation of the plots easier.

Yilong
Old 01-10-2011, 11:35 PM   #151
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by Yilong Li View Post
Dear Simon,
I would like to propose a slight improvement in the Per Base GC Content and Per Base Sequence Content plots. Would it be possible to add horizontal grid lines to those plots as well? It would make the visual interpretation of the plots easier.
The next release of the program will feature some improvements to try to make the graphs easier to interpret. I'll add in horizontal grid lines from the y-axis points as well, as I can see that this would make life easier.
Old 01-21-2011, 12:39 AM   #152
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

FastQC v0.8.0 has been released. This features improved graphs (from some of the suggestions presented here). It also adds an option to analyse only mapped entries from BAM/SAM files which should make life easier for colorspace people, as well as adding an option to run multiple threads to process files in parallel.

You can get the new version from:

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

[If you don't see the new version of any page hit shift+refresh to force our cache to update]
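
For example (check fastqc --help for the exact spelling of the format names in your copy):

  fastqc --threads 4 --format bam_mapped sample1.bam sample2.bam

would process the two BAM files in parallel and only consider the entries flagged as mapped.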
Old 01-21-2011, 01:12 AM   #153
dnusol
Senior Member
 
Location: Spain

Join Date: Jul 2009
Posts: 133
Default quality control for paired-end analysis

Dear Simon,

I am also using your tool, which is great by the way. I am starting with my first paired-end experiment from Illumina. I am wondering whether I should look at each fastq file generated for each lane (_1 and _2) separately, or whether I should merge them into one single file and run the program on that. They come from the same lane, so I suspect you would want to have them merged to get one QC value for each lane, right?


Thanks,

Dave
Old 01-21-2011, 01:51 AM   #154
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by dnusol View Post
Dear Simon,

I am also using your tool, which is great by the way. I am starting with my first paired-end experiment from Illumina. I am wondering whether I should look at each fastq file generated for each lane (_1 and _2) separately, or whether I should merge them into one single file and run the program on that. They come from the same lane, so I suspect you would want to have them merged to get one QC value for each lane, right?
We analyse our first and second read data separately. Although they come from the same insert, there could easily be a problem which affected only the first or second read, and which would be difficult to spot if you concatenated the two files.
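
In practice that just means passing both files to the program and getting one report per file, e.g.:

  fastqc s_1_1_sequence.txt s_1_2_sequence.txt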

If you did want to combine them, it would be better to concatenate the individual reads so that read 2 followed on from read 1 in the same sequence; that way you wouldn't lose any information, but I'm not sure it's worth the hassle.

This can actually be a problem with BAM/SAM files containing paired-end data, where we currently can't separate out systematic biases which affect only one of the reads. I did look at adding an option to split these into separate reports, but didn't come up with a simple way to do this which didn't feel clunky.
Old 02-02-2011, 08:53 AM   #155
dnusol
Senior Member
 
Location: Spain

Join Date: Jul 2009
Posts: 133
Default

Dear Simon,

in this link
http://bioinfo-core.org/index.php/9t...8_October_2010

you showed an image of a run with quality dropping gradually and suggested trimming. I am having the same problem with the paired-end experiment I mentioned before, and instead of trimming I thought of filtering out reads that did not pass a quality threshold. But this filter would probably have to work on both s_N_1_sequence.txt and s_N_2_sequence.txt simultaneously, taking the pairing of the reads into account. Do you know of any tool that could do this? Or is it better to trim both files to the same length so that the pairing is not lost?

Thanks,

Dave
Old 02-02-2011, 09:47 AM   #156
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Whether and how you trim poor quality sequence is going to depend both on how poor your quality gets, and what you're going to do with the data. In general I don't filter poor quality reads unless I have a specific reason to. Many downstream applications can take the quality into account when they work so will effectively ignore the poor data for you. For some specific applications (bisulphite sequencing is the obvious example, but some SNP calling applications would be similar), removing poor quality data makes your life much easier as false calls can have a disastrous effect if left in in any quantity - but these are the exception rather than the rule.

In the run I linked to, the quality dropped to such an extent that the bases much past position 50 were effectively useless, so I wouldn't worry about losing useful information by removing them. If my run had only been 55 bases long I might well have kept everything.

Removing reads based on their average quality can be useful where you see a specific subset of poor quality data (FastQC will show you this in the per-read quality plot), but often you will see a general loss of quality over the length of the read, so you'd be throwing away good data in many cases. I'm not aware of a tool which would filter on paired-read quality, but some of the toolkits could probably be easily adapted to do it.
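
If you did want to roll your own, here is a rough, untested sketch of the idea in bash/awk. It assumes uncompressed fastq with four lines per record and Sanger (Phred+33) quality encoding, and keeps a pair only if both reads have a mean quality of at least 20:

  paste <(paste - - - - < s_1_1_sequence.txt) <(paste - - - - < s_1_2_sequence.txt) |
  awk -F'\t' -v MINQ=20 '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    function meanq(q,   i, s) {
      for (i = 1; i <= length(q); i++) s += ord[substr(q, i, 1)] - 33
      return s / length(q)
    }
    meanq($4) >= MINQ && meanq($8) >= MINQ {
      print $1 "\n" $2 "\n" $3 "\n" $4 > "filtered_1.fastq"   # read 1 of surviving pair
      print $5 "\n" $6 "\n" $7 "\n" $8 > "filtered_2.fastq"   # read 2 of surviving pair
    }'

The paste trick flattens each record onto one tab-separated line so the two files stay in step, and the surviving pairs are written out in the same order in both output files.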

You normally find that poor quality in one read will be reflected in its pair, but there's no intrinsic problem with having paired ends with different lengths if you had a problem at only one end (unless your downstream application doesn't handle this well).
Old 02-10-2011, 07:21 AM   #157
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

FastQC v0.9.0 has been released. The major change is that this version now has much better support for the analysis of datasets containing long and variable read lengths.

Many of the plots which were unusable before on this kind of data are now grouped into variable-sized bins, so the graphs can effectively summarise even the longest and most diverse datasets. This will primarily affect users of longer (>75bp) Illumina reads, but especially 454 or PacBio data.

To see what the changes look like I've put up an example report from a 454 and PacBio dataset on the fastqc project page.

You can get the new version from:

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

[If you don't see the new version of any page hit shift+refresh to force our cache to update]
Old 03-03-2011, 03:46 AM   #158
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78
Default

Hi Simon and others,

I have a question regarding thresholds for the per sequence quality scores.

We did a lot of whole exomes on our Illumina machines and for each sample I generate a QC report. I have incorporated some of the FastQC output as well, including the number of reads (as a percentage of the total number of reads) that have a mean quality of <27 (I sum the counts for scores < 27 from the per sequence quality scores module in the fastqc_data.txt file). I used 27 as a threshold because it's also used by the per sequence quality scores module.
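
In case it's useful, that calculation can be sketched like this (assuming the module sits between its ">>Per sequence quality scores" and ">>END_MODULE" markers as two tab-separated columns, quality then count):

  awk -F'\t' '
    /^>>Per sequence quality scores/ { m = 1; next }
    /^>>END_MODULE/                  { m = 0 }
    m && !/^#/                       { total += $2; if ($1 < 27) low += $2 }
    END { printf "%.2f\n", 100 * low / total }
  ' fastqc_data.txt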

But Simon why did you choose 27? What would be a reasonable error rate threshold to report in my situation? What do you and others say?

Cheers,
Bruins
(and, naturally, thanks again for the awesome tool!!)
Old 03-03-2011, 04:24 AM   #159
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by Bruins View Post
But Simon why did you choose 27? What would be a reasonable error rate threshold to report in my situation? What do you and others say?
The thresholds used in FastQC are somewhat arbitrary and are simply based on looking at a load of data coming out of our sequencing facility and public datasets. I've generally picked thresholds which equate to the points where I'd have been concerned if I saw data with those properties. They've mostly been based on Illumina data, but more recently I've run a lot of data from other platforms through and the results seemed mostly sensible there as well.

I suspect that for something like Phred scores there isn't going to be a single ideal answer for where to set a cutoff which would apply to all run variants on all platforms. Data which might appear to be of poor quality if it came from a short Illumina run would be fantastically good if it came from a PacBio for example.

I've tried to stress that the pass/warn/fail levels in FastQC are not absolute limits but are intended to be indicative of aspects of your data which you might want to investigate further (or at least be aware of) rather than a flag to say that if you see this you should throw it away.

To address your specific question - I've actually been thinking about the per-sequence quality plot lately. This is actually the plot where I'm least happy about the metrics used to evaluate the pass/warn/fail levels. At the moment we just find the peak in this graph and see where it lies, but I've seen too many datasets where this simple measure fails to spot what seem to be obvious problems in the data. In particular I would expect to see a particular profile of mean quality scores (looking like a skewed normal distribution), and I'd like to spot deviations from this distribution so that secondary peaks with lower quality would trigger a warning. Finding the right way to model and measure this is something I'm hoping to find some time to look at.

I'm *very* keen to hear feedback about both the metrics and the cutoffs used in the program and have adjusted things in response to previous suggestions. However it's often difficult to find a good balance where you have reasonably stringent quality criteria but you don't end up giving everyone the impression that their data is rubbish.
Old 03-21-2011, 05:34 AM   #160
Chuckytah
Member
 
Location: Barcelos, Braga, Portugal

Join Date: Mar 2011
Posts: 65
Default

Hello

I'm new here and I have started trying to use this program, but I get an error, which you can see here: http://img27.imageshack.us/f/semttulohl.png/

It says that my ID line didn't start with "@". Can I change the original file and put an "@" at the beginning?