  • SAM file from alignment of SOLiD color-space reads using Bowtie

    Hi everyone. I am mapping color-space FASTA and QV files (sequenced on ABI SOLiD) from a human exome against the reference provided in the GATK resources (karyotype-sorted). I am using Bowtie for the mapping (indexing and mapping with the -C flag). My command line was:

    Code:
    bowtie -p 40 --best --strata -a --mapq 60 --chunkmbs 1000 human_ref38 -f -C my_file.csfasta -Q my_file.qual -S align.sam
    My main goal is to do a variant calling analysis using GATK.

    The results look good (I have seen that, in general, the number of mapped reads for color-space data is very low):

    Code:
    # reads processed: 9540575
    # reads with at least one reported alignment: 7585206 (79.50%)
    # reads that failed to align: 1955369 (20.50%)
    Reported 188945896 alignments to 1 output stream(s)
    My question is about the SAM output file. It has no alignment section below the @PG header line (that line is the end of my SAM file), so the mandatory fields such as QNAME, FLAG, RNAME, etc. are missing. Most importantly for me, it lacks the mapping quality (normally the fifth column).

    This is the beginning and end of my SAM file:

    Code:
    @HD	VN:1.0	SO:unsorted
    @SQ	SN:chr1	LN:248956422
    @SQ	SN:chr2	LN:242193529
    @SQ	SN:chr3	LN:198295559
    @SQ	SN:chr4	LN:190214555
    @SQ	SN:chr5	LN:181538259
    @SQ	SN:chr6	LN:170805979
    @SQ	SN:chr7	LN:159345973
    @SQ	SN:chr8	LN:145138636
    @SQ	SN:chr9	LN:138394717
    @SQ	SN:chr10	LN:133797422
    @SQ	SN:chr11	LN:135086622
    @SQ	SN:chr12	LN:133275309
    @SQ	SN:chr13	LN:114364328
    @SQ	SN:chr14	LN:107043718
    @SQ	SN:chr15	LN:101991189
    @SQ	SN:chr16	LN:90338345
    @SQ	SN:chr17	LN:83257441
    @SQ	SN:chr18	LN:80373285
    @SQ	SN:chr19	LN:58617616
    @SQ	SN:chr20	LN:64444167
    @SQ	SN:chr21	LN:46709983
    @SQ	SN:chr22	LN:50818468
    @SQ	SN:chrX	LN:156040895
    @SQ	SN:chrY	LN:57227415
    @SQ	SN:chrM	LN:16569
    @SQ	SN:chr1_KI270706v1_random	LN:175055
    End:
    Code:
    @SQ	SN:HLA-DRB1*13:01:01	LN:13935
    @SQ	SN:HLA-DRB1*13:02:01	LN:13941
    @SQ	SN:HLA-DRB1*14:05:01	LN:13933
    @SQ	SN:HLA-DRB1*14:54:01	LN:13936
    @SQ	SN:HLA-DRB1*15:01:01:01	LN:11080
    @SQ	SN:HLA-DRB1*15:01:01:02	LN:11571
    @SQ	SN:HLA-DRB1*15:01:01:03	LN:11056
    @SQ	SN:HLA-DRB1*15:01:01:04	LN:11056
    @SQ	SN:HLA-DRB1*15:02:01	LN:10313
    @SQ	SN:HLA-DRB1*15:03:01:01	LN:11567
    @SQ	SN:HLA-DRB1*15:03:01:02	LN:11569
    @SQ	SN:HLA-DRB1*16:02:01	LN:11005
    @PG	ID:Bowtie	VN:0.12.5	CL:"/home/alsalas/.linuxbrew/bin/bowtie/bowtie -p 40 --best --strata -a --mapq 60 --chunkmbs 1000 human_ref38 -f -C my_file.csfasta -Q my_file.qual -S align.sam"
    So, why can't I get these required fields in the SAM file? I even used the option --mapq 60 (normally Bowtie should report MAPQ as 0 or 255), but nothing is reported.

    I need the MAPQ values to do base quality score recalibration with GATK and other analyses. At the moment, GATK filters out all of my mapped reads and reports 100.00% of the total failing MappingQualityUnavailableFilter.
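
    Both points (whether the file contains any alignment records at all, and which MAPQ values they carry) can be checked with a small script. The following is only a sketch, assuming a plain-text SAM file; the function name `summarize_sam` and the inline sample records are made up for illustration. In the SAM specification a MAPQ of 255 means the mapping quality is unavailable, which, as far as I know, is the value that GATK's MappingQualityUnavailableFilter rejects.

```python
from collections import Counter

MAPQ_UNAVAILABLE = 255  # SAM spec: 255 means "mapping quality not available"


def summarize_sam(lines):
    """Count header lines and alignment records, and tally the MAPQ column."""
    n_header = 0
    n_records = 0
    mapq_counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # header lines (@HD, @SQ, @PG, ...)
            n_header += 1
            continue
        fields = line.split("\t")
        if len(fields) < 11:  # SAM requires 11 mandatory fields per record
            raise ValueError(f"alignment line has only {len(fields)} fields")
        n_records += 1
        mapq_counts[int(fields[4])] += 1  # MAPQ is the fifth column
    return n_header, n_records, mapq_counts


# Hypothetical two-record sample, not taken from the real align.sam
sample = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "@SQ\tSN:chr1\tLN:248956422\n",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t16\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\n",
]
n_header, n_records, mapq_counts = summarize_sam(sample)
print("header lines:", n_header)
print("alignment records:", n_records)
print("records with MAPQ 255:", mapq_counts[MAPQ_UNAVAILABLE])
```

    For a real file, an open file handle can be passed instead of the list (for example `with open("align.sam") as fh: summarize_sam(fh)`), so the file is streamed line by line. If every record turns out to carry 255, that would be consistent with the GATK message above.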

    Previously I converted the color-space FASTA and QV files to FASTQ, but in some cases this is not possible (the script fails with errors), and in other cases the resulting FASTQ is very unreliable.

    Maybe it is not possible to get the alignment fields in the Bowtie SAM file with this kind of data?

    Thanks in advance; any suggestion or comment is very welcome.
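
    The unreliability of a direct conversion follows from how color space works: each color encodes the transition between two adjacent bases, so a single miscalled color changes every base after it. The sketch below is a naive decoder written only to illustrate this (it is not the conversion script mentioned above). The function name `decode_colorspace` and the example reads are made up, and it assumes reads without missing color calls ('.').

```python
BASES = "ACGT"  # indices 0..3; a color is the XOR of two adjacent base indices


def decode_colorspace(cs_read):
    """Naively translate a color-space read such as 'T0123' into bases.

    The first character is the known primer base; each following digit
    (0-3) is the color of the transition to the next base.
    """
    prev = BASES.index(cs_read[0])
    decoded = []
    for color in cs_read[1:]:
        prev ^= int(color)  # next base = previous base XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)


good = decode_colorspace("T0123")
bad = decode_colorspace("T1123")  # same read with the first color miscalled
print(good, bad)
```

    Here the two reads differ in one color only, yet every decoded base differs. This is why color-space-aware alignment (Bowtie's -C mode) is generally preferred over converting to base space first.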


    Alexis.

  • #2
    The results look good (I have seen that, in general, the number of mapped reads for color-space data is very low):
    No surprise there; the Solid platform had terrible quality, which is why it is now extinct.

    Can you describe what you are trying to do, and why you are using Solid data to do it? I highly recommend not using Solid. In most cases, I think it's much more cost effective to sequence on Illumina and throw away Solid data, than analyze Solid data.



    • #3
      Originally posted by Brian Bushnell
      No surprise there; the Solid platform had terrible quality, which is why it is now extinct.

      Can you describe what you are trying to do, and why you are using Solid data to do it? I highly recommend not using Solid. In most cases, I think it's much more cost effective to sequence on Illumina and throw away Solid data, than analyze Solid data.
      Hi Brian. I agree. Honestly, I think that color-space SOLiD is not a very good option. In fact, there is not much software that can handle it, and the options for conversion to FASTQ (base space) are not very reliable.

      Well, I'm trying to identify variants (SNPs and indels, as VCF) associated with cancer in a human exome. My pipeline for this is:

      Code:
      Map to reference (Bowtie) --> Sam to Bam - statistics (Samtools - Picard) --> Post-alignment processing ( Remove duplicates - InDel realignment -  Base quality score recalibration) using Picard and GATK --> Variant calling (GATK) --> Annotate variants (ANNOVAR ?)
      The problem is that this requires the mapping quality, but the SAM file obtained after mapping with Bowtie lacks all alignment fields such as MAPQ or QNAME. So I wonder whether using color-space FASTA and QV files as input to Bowtie prevents fields like MAPQ from being written (I have no experience with SOLiD), or whether I'm doing something wrong.

      This data was given to me, and I do not have the opportunity to sequence again on another platform like Illumina (if I could, I would do it without a second thought).
      Thank you for your response, and any suggestions will be welcome.

      Alexis
      Last edited by Alexis1; 02-01-2017, 06:38 AM.
