  • ugolino
    Member
    • Oct 2011
    • 14

    convert sorted bam to sorted sam for htseq-count

    Hi there,

    I have a bowtie2 alignment of paired-end (PE), non-stranded RNA-seq reads from a bacterial species (run with option -k 1; 96.20% of pairs aligned concordantly exactly 1 time), and I would like to use htseq-count to get count data across genes. I am having trouble keeping the reads sorted after converting a sorted BAM to SAM format (htseq-count needs a sorted SAM for PE reads).

    These are my attempts and error messages:

    # sorting reads
    $ samtools sort myalignment.bam myalignment.sorted

    # convert back to sam
    $ samtools view -h myalignment.sorted.bam > out.sorted.sam

    # check (truncated output) - note the @HD line still says 'unsorted' (?)
    $ head out.sorted.sam
    @HD VN:1.0 SO:unsorted
    @SQ SN:NC_017656.1 LN:5212843
    @PG ID:bowtie2 PN:bowtie2 VN:2.0.0-beta7
    HWUSI-EAS1615L:13:FC64RB1AAXX:4:23:7604:14541_1 99 NC_017656.1 156 255 68M = 206 118 CAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCA IIIIIIEIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIHGIIIIGHIIHIHH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:68 YS:i:0 YT:Z:CP
    HWUSI-EAS1615L:13:FC64RB1AAXX:4:70:9040:11393_1 99 NC_017656.1 164 255 68M = 207 111 TAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATT IIIIFIIIIBHHHIFIIIIIIIIIHIGIHIIIIIFHIIIIBIHGHIGHIHHICIHCHF;FEDEFDEH< AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:68 YS:i:0 YT:Z:CP

    # tried htseq-count (truncated output)
    $ htseq-count -s no -t gene -i ID out.sorted.sam ../reference.gff
    4965 GFF lines processed.
    Warning: Read HWUSI-EAS1615L:13:FC64RB1AAXX:4:23:7604:14541_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
    Warning: Read HWUSI-EAS1615L:13:FC64RB1AAXX:4:70:9040:11393_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
    Warning: Read HWUSI-EAS1615L:13:FC64RB1AAXX:4:62:8731:7335_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)


    Your insight is much appreciated!
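
The warnings above say that a mate "could not be found", so one quick check is whether both mates of a pair actually carry the same read name in the SAM file. The following is only a sketch: `count_unpaired_names` is a hypothetical helper, and it assumes one alignment record per mate (as with -k 1) and tab-separated SAM with the read name in the first column. It holds all read names in memory, so it may be slow on very large files.

```shell
# Hypothetical helper: count read names that do not occur exactly twice.
# Reads SAM on stdin; @-prefixed header lines are ignored.
count_unpaired_names() {
    awk -F '\t' '
        /^@/ { next }                 # skip header lines
             { seen[$1]++ }           # tally records per read name
        END  {
            for (name in seen) if (seen[name] != 2) bad++
            print bad + 0             # 0 means every name occurs exactly twice
        }'
}

# Usage (file name taken from the post above):
# count_unpaired_names < out.sorted.sam
```

A result well above zero would suggest that the mates are named differently (or that some mates are missing), rather than that the sort order alone is at fault.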
  • ugolino
    Member
    • Oct 2011
    • 14

    #2
    I think I figured it out. By default, Bowtie2 writes the two mates of each pair next to each other, so its output is already grouped by read name. The offending part of the read names is the _1 / _2 at the end. Removing those suffixes (in vim) fixed the problem, and htseq-count now works without any sorting. Only after a couple of hours of staring at the reads to figure this out did I come across another thread that explains an identical issue. Feeling slow...

    thanks
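
The manual vim edit described in #2 could also be scripted. A minimal sketch, assuming the read name is the first tab-separated column and that the only suffixes to remove are a literal _1 or _2 at the end of the name; `strip_mate_suffix` and the output file name are made up for illustration:

```shell
# Hypothetical helper: strip a trailing _1 or _2 from the read name (column 1)
# of every alignment line. Reads SAM on stdin and writes SAM to stdout.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/  { print; next }                     # header lines pass through unchanged
               { sub(/_[12]$/, "", $1); print }    # edit only the read-name column
        '
}

# Usage (input name from the first post; output name is illustrative):
# strip_mate_suffix < out.sorted.sam > out.fixed.sam
```

Restricting the substitution to the first column avoids touching sequence, quality, or tag fields that happen to contain "_1" or "_2". Read names that legitimately end in _1 or _2 for other reasons would also be trimmed, so it is worth checking a few names first.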

    • ThePresident
      Member
      • Jun 2012
      • 72

      #3
      Just curious (since I've also done RNA-seq on some bacterial species): why did you choose paired-end sequencing for your study?

      TP

      • ugolino
        Member
        • Oct 2011
        • 14

        #4
        To align reads with greater confidence, as these strains have many phages and IS elements (some of which are chromosomal in certain strains and plasmid-borne in others), and their genomes have not been sequenced yet (so I also sequenced the genomes).
