R/annotations.R
artmsLeaveOnlyUniprotEntryID.Rd
A reference UniProt FASTA database includes several UniProt identifiers for every protein. If the regular expression available in MaxQuant is not activated, the full ID is used in the Proteins, Lead Protein, and Leading Razor Protein columns. This function keeps only the Entry ID.
For example, values in a Protein column like this:
sp|P12345|Entry_name;sp|P54321|Entry_name2
will be replaced by
`P12345;P54321`
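The replacement described above can be sketched in base R with a single regular expression. This is only an illustration of the idea, not the artMS implementation: the helper name `extract_entry_ids` and the pattern are assumptions made for this example, and the pattern assumes identifiers of the form `db|ACCESSION|ENTRY_NAME` separated by semicolons.

```r
# Hypothetical helper (not part of artMS): keep only the accession
# from each "db|ACCESSION|ENTRY_NAME" token in a character vector.
extract_entry_ids <- function(ids) {
  # [a-z]{2}   two-letter database prefix, e.g. "sp" or "tr"
  # ([^|;]+)   the accession (Entry ID), captured as group 1
  # [^;]*      the entry name, up to the next ";" separator
  gsub("[a-z]{2}\\|([^|;]+)\\|[^;]*", "\\1", ids)
}

extract_entry_ids("sp|P12345|Entry_name;sp|P54321|Entry_name2")
# [1] "P12345;P54321"
```

Values that do not match the assumed pattern are returned unchanged, so the sketch is safe to apply to a column that already contains bare Entry IDs.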
artmsLeaveOnlyUniprotEntryID(x, columnid)
Argument | Description |
---|---|
x | (data.frame) that contains the column named by `columnid` |
columnid | (char) Column name with the full UniProt IDs |
(data.frame) with only Entry IDs.
    # Example of data frame with full uniprot ids and sequences
    p <- c("sp|A6NIE6|RN3P2_HUMAN;sp|Q9NYV6|RRN3_HUMAN",
           "sp|A7E2V4|ZSWM8_HUMAN",
           "sp|A5A6H4|ROA1_PANTR;sp|P09651|ROA1_HUMAN;sp|Q32P51|RA1L2_HUMAN",
           "sp|A0FGR8|ESYT2_HUMAN")
    s <- c("ALENDFFNSPPRK",
           "GWGSPGRPK",
           "SSGPYGGGGQYFAK",
           "VLVALASEELAK")
    evidence <- data.frame(Proteins = p, Sequences = s, stringsAsFactors = FALSE)

    # Replace the Proteins column with only Entry ids
    evidence <- artmsLeaveOnlyUniprotEntryID(x = evidence, columnid = "Proteins")