Hi,
I have a Trinotate annotation report saved as a CSV file from Excel. The annotation report has 15 columns for every entry. There are ~700,000 entries of annotation info.
I also have a list of Trinity transcript IDs (1 column, ~14,000 rows).
Some of the Trinity transcript IDs in my single-column list match the transcript IDs in my Trinotate annotation report. I would like to extract the entries from my Trinotate annotation report whose transcript IDs match my separate list of Trinity transcript IDs (or remove the ones that don't - whichever is easier).
The Trinity transcript IDs are found in the second column of my Trinotate report, and the transcript IDs in the separate list are in the first and only column. In other words, I want to remove all Trinotate annotation report entries whose "transcript_id" in column 2 does not match an ID in my separate list of Trinity transcript IDs.
I'm trying to filter these in Excel, but my laptop may soon catch fire because the worksheet is too big. I'm sure a simple one-liner using awk or sed could accomplish this; I'm just not very savvy with text-processing languages yet (but I'm working on it!). Any help with this would be much appreciated! Thanks for considering my post.
M
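For reference, here is a rough Python sketch of the filtering described above, in case it helps to pin down what I'm after. It rests on several assumptions that may not hold for the real files: the report is comma-delimited with a single header row, the ID list is plain text with one ID per line and no header, and the IDs match exactly (same case, no extra characters). The function name `filter_report` and its parameters are made up for illustration.

```python
import csv


def filter_report(report_path, ids_path, out_path,
                  id_column=1, delimiter=",", keep_matching=True):
    """Copy rows of the report whose ID column is (or is not) in the ID list.

    id_column is zero-based, so 1 means the second column ("transcript_id").
    Returns the number of data rows written (the header is not counted).
    """
    # Load the ~14,000 IDs into a set so each lookup is constant time.
    with open(ids_path, newline="") as fh:
        wanted = {line.strip() for line in fh if line.strip()}

    kept = 0
    with open(report_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)

        # Assumption: the first row is a header and is always copied through.
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)

        # Stream the report row by row; nothing large is held in memory.
        for row in reader:
            if len(row) <= id_column:
                continue  # skip blank or truncated rows
            if (row[id_column].strip() in wanted) == keep_matching:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling something like `filter_report("report.csv", "ids.txt", "filtered.csv")` (file names are placeholders) would keep only the matching entries, and passing `keep_matching=False` would keep the non-matching ones instead. If the report turns out to be tab-delimited, `delimiter="\t"` should cover it.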