  • Failure in running BWA on human fusion genome

    Dear all,

    I have used the fasta genome provided by NCBI. The headers of this file are:

    >chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38
    >chr2 AC:CM000664.2 gi:568336022 LN:242193529 rl:Chromosome M5:f98db672eb0993dcfdabafe2a882905c AS:GRCh38
    >chr3 AC:CM000665.2 gi:568336021 LN:198295559 rl:Chromosome M5:76635a41ea913a405ded820447d067b0 AS:GRCh38
    [...]
    >chrUn_GL000218v1 AC:GL000218.1 gi:224183305 LN:161147 rl:unplaced M5:1d708b54644c26c7e01c2dad5426d38c AS:GRCh38
    >chrEBV AC:AJ507799.2 gi:86261677 LN:171823 rl:decoy M5:6743bd63b3ff2b5b8985d8933c53290a SP:Human_herpesvirus_4 tp:circular

    I need to use a fusion genome built by concatenating this human genome with a virus genome assembled from selected virus sequences. The virus genome consists of a single header followed by one long stretch of nucleotides made by joining the individual virus sequences.

    I prepared the header for the virus genome as follows:

    >chrV AC:XXXXXXXX.1 gi:00000000 LN:370064105 rl:Chromosome M5:5aa5be7025d7baa666a8651e0909e4ce AS:1 SP:All_viruses tp:linear

    I made up the accession number (AC) XXXXXXXX.1 because there is no real entry for my made-up genome in GenBank or NCBI. Since the IDs given in the human genome are 8 characters long, I used a fake 8-letter entry, plus ".1" because this is the first time I am using this genome (maybe I should have used two letters and six digits?).

    Same for the GI number: the made-up genome is not recorded in GenBank, so I simply used a fake 8-digit number.

    LN is the length of the genome, which I treated as a real chromosome, and M5 is the md5sum I computed on the FASTA file. AS and SP are free-text fields (I assumed), and the genome is linear.

    I separated the fields with two spaces.
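For reference, the header fields described above can be generated rather than typed by hand. The following is a minimal Python sketch, not the method used in this post: the function names `length_and_md5` and `build_header` are hypothetical. It assumes M5 is computed the way the SAM specification describes for reference sequences (MD5 of the sequence in uppercase, with everything outside the printable ASCII range 33 to 126 removed), which generally differs from running md5sum on the whole FASTA file. The two-space separator mirrors the choice described above and is a parameter.

```python
import hashlib


def length_and_md5(sequence_lines):
    """Return (length, md5 hex digest) for the sequence lines of one FASTA record.

    Characters outside the printable ASCII range 33..126 (newlines, spaces,
    tabs) are dropped and the rest is uppercased before hashing.
    """
    seq = "".join(
        ch for line in sequence_lines for ch in line if 33 <= ord(ch) <= 126
    ).upper()
    return len(seq), hashlib.md5(seq.encode("ascii")).hexdigest()


def build_header(name, sequence_lines, extra_fields=(), sep="  "):
    """Assemble a header such as '>chrV  LN:...  M5:...  tp:linear'.

    'extra_fields' holds any further free-text tags (AC, AS, SP, tp, ...).
    """
    length, digest = length_and_md5(sequence_lines)
    return sep.join([f">{name}", f"LN:{length}", f"M5:{digest}", *extra_fields])


if __name__ == "__main__":
    # Toy sequence only; a real run would stream the lines of <virus.fa>.
    print(build_header("chrV", ["acgtnn\n", "ACGT\n"], extra_fields=("tp:linear",)))
```

Computing LN from the sequence itself also guards against a mismatch between the value written in the header and the actual number of bases.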

    I concatenated the human genome and the made-up virus genome with `cat <human.fa> <virus.fa> > <fusion.fa>`, then built the index for this genome and aligned the samples with:

    bwa index -a bwtsw <fusion.fa>
    bwa mem -t 8 -R <read_group> <fusion.fa> <R1.fq.gz> <R2.fq.gz> | \
    samtools sort -o <file_ALN-SRT.sam>

    However, I got this error message:

    [bns_restore_core] Parse error reading <fusion.fa>.amb

    and the SAM file is virtually empty:

    cat <file_ALN-SRT.sam>
    @HD VN:1.3 SO:coordinate
    May I ask what I got wrong? Aligning against either genome separately works fine, so I guess the problem must be in the headers.
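Because the error message names the `.amb` index file, one check that does not depend on the headers is whether indexing finished and left all of its output files in place. The sketch below is only an illustration with hypothetical helper names (`check_bwa_index`, `missing_or_empty`). It assumes `bwa index` writes its usual five files (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) next to the FASTA, and it checks only presence and size, not whether the contents are valid.

```python
from pathlib import Path

# Suffixes that `bwa index` normally appends to the FASTA file name.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def check_bwa_index(fasta_path):
    """Map each index suffix to its file size in bytes, or None if missing."""
    report = {}
    for suffix in BWA_INDEX_SUFFIXES:
        index_file = Path(str(fasta_path) + suffix)
        report[suffix] = index_file.stat().st_size if index_file.is_file() else None
    return report


def missing_or_empty(report):
    """List the suffixes whose file is absent or has zero length."""
    return [suffix for suffix, size in report.items() if not size]


if __name__ == "__main__":
    # "fusion.fa" stands in for the <fusion.fa> placeholder used in this post.
    report = check_bwa_index("fusion.fa")
    print(report)
    print("missing or empty:", missing_or_empty(report))
```

An empty or missing file here would point to an interrupted indexing run (for example a full disk) rather than to the header text.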

    I tried both sed -i 's/\s*$//g' (for spaces in the sequence) and sed -i 's/^[^>]\s*$//g' (for spaces in the header), each followed by bwa index, but the result was always the same.
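The whitespace clean-up attempted above can also be written as a small Python sketch, which makes it easier to see exactly what is changed. This is an illustration under stated assumptions, not a fix for the parse error: `clean_fasta_lines` and `duplicate_names` are hypothetical names. The first removes trailing whitespace (including carriage returns) and blank lines; the second reports sequence names that occur more than once, taking the name to be the first whitespace-delimited token after `>`.

```python
from collections import Counter


def clean_fasta_lines(lines):
    """Yield lines without trailing whitespace, skipping blank lines."""
    for line in lines:
        stripped = line.rstrip()
        if stripped:
            yield stripped


def duplicate_names(lines):
    """Return the sorted sequence names that appear in more than one header."""
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            names.append(tokens[0] if tokens else "")
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Toy input; a real run would iterate over the lines of <fusion.fa>.
    raw = [">chr1  LN:4 \n", "ACGT  \n", "\n", ">chrV\tLN:4\r\n", "acgt\n"]
    cleaned = list(clean_fasta_lines(raw))
    print(cleaned)
    print("duplicates:", duplicate_names(cleaned))
```

Checking for duplicate names is included because a concatenated file can silently end up with two records sharing one name; whether that is related to the error reported here is not established.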

    Any clues?

    Thank you
