![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Genome Size Estimation from PacBio Raw Reads | jpummil | Bioinformatics | 8 | 10-06-2019 09:15 AM |
Uploading PacBio raw data to ENA SRA | maubp | Pacific Biosciences | 13 | 03-11-2016 07:07 AM |
Import raw PacBio data from *.bax.h5 files | reubennowell | Pacific Biosciences | 4 | 10-12-2015 10:02 AM |
Raw counts of 12 column bed file (against multiple BAM) | swaraj | Bioinformatics | 8 | 09-27-2013 06:02 AM |
what is the file size for a 30X human genome sequencing file, raw and BAM? | RNA-seq | Illumina/Solexa | 2 | 04-15-2011 12:27 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Paris Join Date: Sep 2017
Posts: 5
|
![]()
I have just received data from my first PacBio sequencing operation and was unaware that the new output format was in .bam file for raw reads, as they are calling the 'better fastq'.
I am using a pipeline, beginning with canu, which requires a fastq file however when using both samtools and bamtools to generate a fastq file from the bam file, the quality row just contains exclamation marks samtools bam2fq data.bam > data.fastq bamtools convert -format fastq -in data.bam -out data.fastq e.g. @read1 ATGCATGCAGCTGATGCTAGCATGCTACTAGTCGATCGTAGCTAGTCGATCGATGCTAGCATCGATGCTAGCTAGTCGATGCTAGCTGCGTAGCTGATGATGCTAGTCGACTGATACGAT + !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Additional output files were a bam.pbi, an xml and a fasta file Does anyone know how to handle the raw read bam files in order to generate fastq files with the appropriate quality score? Last edited by anotherSAM; 09-06-2017 at 08:42 AM. Reason: grammar |
![]() |
![]() |
![]() |
#2 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
It's possible that the bam file does not contain quality scores. Try using samtools view to convert the bam to sam, and post the first couple reads from it...
|
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Berlin, DE Join Date: May 2008
Posts: 628
|
![]()
What platform? At least for Sequel data the subread-bam/fastq files do not have quality values associated to the bases. So there is a dummy value for each base, zero, "!".
|
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: Paris Join Date: Sep 2017
Posts: 5
|
![]() Code:
head -n1 test.sam m54072_170901_055052/5112430/0_2238 4 * 0 255 * * 0 0 TTCCGGGGATGGGGGGTCTTGGTATTGGACATCTATATGGTTCCTTTCCACTAAACTTGAGGCATCAGGCCTGTTTGGACCGGAGTACGTAAATTTCGTTTTCGTTATTTTCGATCCATGGCTCATTCTTCGTTGGCGCTTTTCTATCAAGAGATGAGGAACCAGCTCTACTCTATGTT !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! bc:B:S,1,1 bq:i:85 cx:i:10 ip:B:C,1,61,22,90,8,8,72,3,7,40,9,68,4,15,8,12,72,34,156,61,10,3,50,9,70,74,16,17,2,28,6,38,60,13,11,13,122,14,42,70,7,12,14,32,25,21,38,14,7,8,34,2,1,7,5,12,60,61,6,5,8,9,52,3,16,182,7,86,44,15,56 np:i:1 pw:B:C,7,5,13,13,2,11,2,5,6,10,3,6,47,13,24,41,7,6,7,6,39,34,12,44,5,11,18,21,8,15,14,7,7,2,6,5,8,30,28,3,4,2,22,15,12,13,4,15,2,12,10,7,8,17,5,9,2,22,26,24,7,13,11,6,9,6,6,10,3,11,3,3,14,6,4,17,10 qe:i:2238 qs:i:0 rq:f:0.8 sn:B:f,7.81349,14.5671,6.42793,10.4147 zm:i:5112430RG:Z:25f8c430 Sklages, This is with sequel data, however if it is true that the bam files do not have a quality score, where is it? From what I had read, the new bam output was meant to replace fastq. Genomax, I followed the link for installing the bam2fastx library tool, which considering no error messages occurred I thought had succeeded, but when I went to use the bam2fastq command, it was unavailable. Thank you all for your replies Last edited by GenoMax; 09-07-2017 at 05:57 AM. Reason: more clarity; added [CODE] tags |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
Did you check the local directory where you did the install for one called bam2fastx or something similar. The executables should be inside that directory, if the build was successful.
|
![]() |
![]() |
![]() |
#7 | |
Senior Member
Location: Berlin, DE Join Date: May 2008
Posts: 628
|
![]() Quote:
"The subread-bam/fastq files do not have quality values associated to the bases. In fact none of our SMRT Analysis tools use or need them (e.g. the much improved CCS2 algorithm doesn't need the qualities anymore), and because there is no "placeholder" quality value as far as I know (like N for bases), the qualities in the BAM files are set to the lowest value "!"." And as our current BAM files also have "!" values, I assume that this has not changed (yet). |
|
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
Interesting that PacBio chose to set the quality values to lowest setting rather than highest (or somewhere north of Q30).
|
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Berlin, DE Join Date: May 2008
Posts: 628
|
![]()
That what my first thought too ... it is not a very smart idea choosing basically a "junk value" :-)
|
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: Paris Join Date: Sep 2017
Posts: 5
|
![]()
Genomax, I managed to get the bam2fastx tools working but they provided the same information as samtools and bamtools.
But as we have come to understand this is of no fault of the tools and is actually due to the output. Sklages, So in this way the new bam files are only able to be analysed using SMRT software? |
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: Berlin, DE Join Date: May 2008
Posts: 628
|
![]()
It's probably a good idea to adjust the quality dummy values to something "more or less good", e.g. 30 or more before using third-party tools ..
|
![]() |
![]() |
![]() |
#12 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
You could use reformat.sh from BBMap suite to set a fake Q score of 30 for each base like this.
Code:
reformat.sh in=pbio_input.bam out=stdout.fa | reformat.sh in=stdin.fa out=new.fq.gz qfake=30 |
![]() |
![]() |
![]() |
#13 |
Junior Member
Location: Paris Join Date: Sep 2017
Posts: 5
|
![]()
I will most likely substitute the quality scores, but how are they written in PacBio data.
But it is coded scores in fastq so 30 corresponds to something like > This should work. It will mean canu is unable to trim reads and increase the real quality but I will also run it against illumina data to correct errors afterwards. Thank you all UPDATE: With substitution of a qscore of 30 for each base canu assembly has issues and produces a large number of contigs ~130. Possibly has issues in ranking overlap probabilities. Last edited by anotherSAM; 09-11-2017 at 03:20 AM. |
![]() |
![]() |
![]() |
#14 |
Junior Member
Location: Ghent, Belgium Join Date: Feb 2011
Posts: 2
|
![]()
Hi,
interesting approach, I might give that a try as well. I was just wondering why canu would have more trouble with all qscores put to 30 than when all were still 0 ? Also, canu accepts both fasta as fastq as input (or even a mixture) might it then not simply ignore any quality score present? |
![]() |
![]() |
![]() |
#15 |
David Eccles (gringer)
Location: Wellington, New Zealand Join Date: May 2011
Posts: 838
|
![]()
As fas as I know, canu also doesn't use quality scores for assembly. By default it does its own error correction based on an all-vs-all read mapping, and produces corrected FASTA files (i.e. without quality scores) as intermediate data files prior to assembly.
|
![]() |
![]() |
![]() |
#16 |
Senior Member
Location: Oxford, Ohio Join Date: Mar 2012
Posts: 253
|
![]()
Did anyone figure this out?
|
![]() |
![]() |
![]() |
#17 |
Senior Member
Location: San Francisco Join Date: Aug 2012
Posts: 322
|
![]()
What issue do you still have? The believe the '!' has been replaced with a more meaningful value in recent versions of software, but assembly and most other downstream tools still don't use quality values associated with bases in subreads.
|
![]() |
![]() |
![]() |
#18 |
Senior Member
Location: Oxford, Ohio Join Date: Mar 2012
Posts: 253
|
![]()
Check the GitHub for PacBio, they have a converter. Here's the URL: https://github.com/PacificBiosciences/bam2fastx
|
![]() |
![]() |
![]() |
Tags |
bam, fastq, pacbio, raw data |
Thread Tools | |
|
|