Seqanswers Leaderboard Ad

**blancha** · 04-16-2014, 09:41 AM

Removing reads with any base pairs with a Phred score lower than 30 could definitely be a problem. There are often low quality reads at the end of the reads.

I would redo the trimming. You can also examine the quality of the reads before and after trimming with FASTQC.

You could also examine the BAM files in IGV. You could thus determine if the problem is with htseq-count or further upstream.

**dpryan** · 04-16-2014, 02:03 PM

Originally posted by Rivalyn View Post

I intend to analyze these sequences by first trimming the low quality base pairs (<30 Phred quality score),

This is crazy conservative, try just trimming off adapters and low quality (Phred scores <=5) from the ends of reads. You can do that with trimmomatic or trim_galore or a large number of other programs.

#sort the bam files by name
for f in *.bam; do samtools sort -n "$f" "${f%.*}".sorted; done
#convert bam to sam
for f in *sorted.bam; do samtools view "$f" > "${f%.*}".sam; done
#convert sam to read counts
for f in *.sam; do python -m HTSeq.scripts.count "$f" Mus_musculus.GRCm38.75.gtf > "${f%.*}".readcount.txt ; done

You can actually just:

Code:

for f in *.bam
do
    htseq-count -f bam -r pos "$f" Mus_musculus.GRCm38.75.gtf > "${f%.*}".readcount.txt
done

at least with the newer versions. BTW, remember that the defaults assume a stranded library, which you didn't mention having (so you might also try with "-s no")

Unfortunately, the according to HTSeq, I have no read counts anywhere. The output looks like this:

Did you align to mm10 from UCSC? This sort of thing normally happens when there's a chromosome name mismatch between your reference genome (the fasta file) and the annotation file you use (the GTF from Ensembl in your case). If you aligned to mm10 then the chromosome names in your alignments will be of the form: "chr1", "chr10", "chr11", ... . In the annotation from Ensembl, however, the same chromosomes would be: "1", "10", "11", ... . This is a common cause of problems and it's best to not mix the two (stick to Ensembl, the annotations are MUCH better). Just post the output of "samtools view -H blah.bam" on one of your alignments and we can confirm if this is the case. If that is the case then simply reheadering the alignments will solve the problem (I can show you how should the need arise).

**Rivalyn** · 04-17-2014, 03:13 AM

Thank you both for your prompt responses.

Code:

samtools view -H BirA_rep1_Galaxy.bam | head

@HD	VN:1.0	SO:coordinate
@SQ	SN:chr1	LN:195471971
@SQ	SN:chr10	LN:130694993
@SQ	SN:chr11	LN:122082543
@SQ	SN:chr12	LN:120129022
@SQ	SN:chr13	LN:120421639
@SQ	SN:chr14	LN:124902244
@SQ	SN:chr15	LN:104043685
@SQ	SN:chr16	LN:98207768
@SQ	SN:chr17	LN:94987271

Looks like dpryan was right about the GTF files. I ran the initial alignment on Galaxy and used their built-in reference, which was mm10. I'm told that one should avoid Galaxy and do as much as possible via the console, but since my Linux installation was borked at the time, I thought it would be better than nothing.

Because the QC trimming needs to be redone, I will rerun tophat rather than simply reheadering. However, I'm curious as to how you would go about reheadering should the need arise, so if you can explain how to do that, I would be grateful.

BTW, remember that the defaults assume a stranded library, which you didn't mention having (so you might also try with "-s no")

Forgive me, but what is a "stranded library"?

You could also examine the BAM files in IGV. You could thus determine if the problem is with htseq-count or further upstream.

IGV requires Java, and Java has been... difficult... ever since we got this computer. This is also why I haven't used Trimmomatic, but tried to do the QC trimming with the Fastx toolkit. Fastx has been a disaster, so I've put a significant amount of time into fixing (or trying to) my Java installation. I think I succeeded, but I won't know for sure until I install Trimmomatic and make sense of its documentation (which I will start on right after writing this).

**dpryan** · 04-17-2014, 03:25 AM

Which distro have you been using that java has given such issues?

Regarding stranded libraries, perhaps you know them as "directional" or "strand-specific". In short, the sequencing libraries are created in was such that the strand of origin ("+" or "-") of each sequenced fragment is preserved. Some sequencing facilities will use such a protocol be default but others (ours is one example) won't unless you specifically ask for it. There are also 2 basic types of strand-specific libraries, the first where the orientation of read1's alignment denotes the strand and the second where the opposite is true...so if you used the former but specified the latter then you'd get a lot of aberrant 0 counts.

For reheadering the file, the easiest route would be to simply:

Code:

samtools view -H BirA_rep1_Galaxy.bam > foo.header

Then edit "foo.header" to remove the "chr" prefix and rename "M" to "MT". That's normally sufficient. You can then simply:

Code:

samtools reheader foo.header BirA_rep1_Galaxy.bam > BirA_rep1_Galaxy.Ensembl.bam

**dpryan** · 04-17-2014, 03:26 AM

I should note that you should never change the order of chromosomes when reheadering...that will completely bork the alignments (you'll end up changing what chromosome everything aligns to!).

**Rivalyn** · 04-17-2014, 06:15 AM

Thanks for explaining reheadering. I'll archive that for future reference.

I am reasonably certain that my reads are strand-specific. Is there any way to check that easily? How do I tell what kind of strand-specific library I have?

Which distro have you been using that java has given such issues?

I'm using Ubuntu 13.10. I don't want to get too off topic here, but we had a real mess trying to dual boot Ubuntu and Windows earlier this month. I've had a long standing problem with making files executable. chmod 755, chmod +x etc all run without errors but don't actually have any impact on the executable status of the file in question. After a lot of googling, it looks like somehow the partition that contains the Ubuntu system files was mounted as NTFS instead of... whatever the Linux file system is called. IF this is really the problem, then I'm surprised Ubuntu runs at all.

No idea if this is also causing the issues I'm having with java, but it's my best guess at the moment. I've had a lot of things behave oddly ever since we set up our dual-boot.

I gather that the solution is normally to unmount and remount the afflicted partition as the right kind of file system with some arcane Linux console wizardry. But somehow I don't think it's a good idea to unmount the partition that contains all the Linux system files. That seems like it would be... bad.

**dpryan** · 04-17-2014, 06:22 AM

The easiest way is to simply too at things in IGV. If you color reads according to being read#1 or read#2 (there's an option for that) and look at a random gene or two then it should be immediately apparent what sort of library type you have.

Yeah, dual booting can screw things up nicely. The default fs is ext4 (at least on my ubuntu work station) and I too am surprised that it ran at all with everything being ntfs. You might ply a local IT person with copious amounts of booze to make the issues disappear...

Topics	Statistics	Last Post
A Closer Look at the Enigmatic Genomes of Oikopleura dioica by seqadmin Started by seqadmin, Yesterday, 06:35 AM	0 responses 15 views 0 likes	Last Post by seqadmin Yesterday, 06:35 AM
Advanced Epigenome Editing Platform Explores Gene Regulation Mechanisms by seqadmin Started by seqadmin, 05-09-2024, 02:46 PM	0 responses 21 views 0 likes	Last Post by seqadmin 05-09-2024, 02:46 PM
Telomere Maintenance by PARP1: A New Perspective in Cancer Research by seqadmin Started by seqadmin, 05-07-2024, 06:57 AM	0 responses 18 views 0 likes	Last Post by seqadmin 05-07-2024, 06:57 AM
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, 05-06-2024, 07:17 AM	0 responses 19 views 0 likes	Last Post by seqadmin 05-06-2024, 07:17 AM

Seqanswers Leaderboard Ad

Announcement

Problem with HTSeq in RIPseq pipeline

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News