Yes, it did work. I then split the FASTQ files per barcode using the FASTX barcode splitter. However, that still did not solve my problem of too few reads being aligned after running TopHat. Also, the FASTQ files I obtained from FASTX and from CASAVA were totally different!
-
Well, my barcodes are not Illumina but NuGEN. I ran CASAVA normally with the added --use-bases-mask parameter. It did not complain and generated FASTQ files. When I ran TopHat with these files, it somehow could not align most of the reads. The final read count of the SAM files was in the thousands, or even less in some cases.
I then generated one FASTQ file per lane through CASAVA, ignoring the barcodes, and used the FASTX barcode splitter to split that file according to the barcodes.
For any given sample, the FASTQ file generated this way did not match the one generated by CASAVA (in terms of both the number of lines and the contents).
TopHat also does a better job with these files than with the previous ones, but the line counts of the SAM files are still not in the millions. I am not sure of my results at this point.
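Since raw line counts can mislead (a FASTQ record spans four lines, and a SAM file carries header lines plus possibly several alignment lines per read), a rough way to put the two on the same footing is to count reads rather than lines. The sketch below is only an illustration: the function names are made up here, it assumes unwrapped four-line FASTQ records, and it assumes the SAM file holds aligned records only.

```shell
# Count reads in a FASTQ file, assuming each record is exactly 4 lines
# (no wrapped sequence or quality lines). A non-integer result hints at
# a truncated or malformed file.
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Count distinct read names in a SAM file. Header lines start with '@'
# and are skipped; a read reported at several loci is counted once.
# Assumption: unmapped reads are not written to this file.
count_sam_reads() {
    awk '!/^@/ { print $1 }' "$1" | sort -u | awk 'END { print NR }'
}
```

Running both on the input FASTQ and the resulting SAM file for one sample (file names are up to you) gives the number of input reads and the number of reads with at least one alignment, which is a fairer comparison than line counts.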
-
I ran the barcode splitter as follows:
cat combined.fastq | fastx_barcode_splitter.pl --bcfile ../barcode1.txt --bol --mismatches 1 --prefix "lane1" --suffix ".fastq"
It creates separate FASTQ files, but the barcodes are retained in the reads. So I first removed them (the first 4 bases) using:
fastx_trimmer -i fastqfile -o trim_fastqfile -f 5 -l 50 -Q 33
Then I ran TopHat on the trimmed FASTQ files.
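Because the split uses --bol and the trim removes the first 4 bases, both steps rely on the barcode really sitting at the start of each read. One possible sanity check, sketched below with a hypothetical helper name and assuming unwrapped four-line FASTQ records, is to tally the leading bases of all reads and see whether the expected barcodes dominate.

```shell
# Tally the first N bases (default 4) of every read in a FASTQ file and
# list the prefixes from most to least frequent.
# Assumes 4-line records, so sequence lines are lines 2, 6, 10, ...
tally_read_prefixes() {
    awk -v n="${2:-4}" 'NR % 4 == 2 { print substr($0, 1, n) }' "$1" |
        sort | uniq -c | sort -rn
}
```

Called on the combined per-lane file before splitting, the top entries would be expected to match the sequences in the barcode file. If they do not, the barcode may not be where --bol and the trimmer assume it is.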