artMS is a Bioconductor package that provides a set of tools for the analysis and integration of large-scale proteomics (mass-spectrometry-based) datasets obtained using the popular proteomics software MaxQuant.
The functions available in artMS can be grouped into the following categories:
For a graphical overview, check the slides presented at the 2019 SUMS-RAS (Stanford University Mass Spectrometry - Research Application Symposium).
Before you begin, ensure that your system is running R version >= 3.6; otherwise, the installation of artMS won’t work. You can check the R version running on your system by executing the function:
If the outcome is >= 3.6.0, congratulations! You can move forward. If it is not, you need to install the latest version of R on your system.
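If you prefer a programmatic check (for example, at the top of an analysis script), a minimal base R sketch using getRversion() could look like this. It is not part of artMS; the 3.6.0 threshold is simply the minimum stated above:

```r
# Compare the running R version against the minimum required by artMS (3.6.0)
required <- "3.6.0"
r_ok <- getRversion() >= required

if (r_ok) {
  message("R ", getRversion(), " detected: you can move forward.")
} else {
  message("R ", getRversion(), " detected: please install R >= ", required, ".")
}
```

getRversion() returns a version object, so the comparison is done component by component (3.10 > 3.6) rather than as plain text.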
Two options to install artMS:
(Why Bioconductor? Here you can find a nice summary of many good reasons).
R (>= 3.6) version running on your system, follow these steps:
Once installed, the package can be loaded and attached to your current workspace as follows:
artMS performs the different analyses taking as input the following files:
Check below to find out more about generating the input files.
artmsQuantification() requires a large number of arguments, especially those related to the statistical package MSstats. To facilitate the task of providing all those arguments, the function artmsQuantification() takes a config file (in yaml format) for the customization of the parameters for quantification (using MSstats) and other operations, including QC analyses, charts, and annotations.
A configuration file template can be generated by running
Check below to learn the details of the configuration file.
Generate the input files: check the input files section for details.
Quality Control: if you are interested in performing only quality control analysis, run the following functions:
Relative Quantification: fill in the configuration file and run the following function:
Analysis of Quantifications: perform annotations, clustering, PCA, and enrichment analyses by running the function:
Miscellaneous functions: check below to discover more useful functions provided by artMS.
artMS also enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView. This means that the metabolites do not need to have an ID, as the m/z and retention time will be used as identifiers. Typical workflow:
Run QC on the metabolomics dataset:
artmsQuantification() (notice that a few options must be changed in the config file before running the function)
Please keep in mind that most of the functions available in artMS don’t work for metabolomics data due to annotation issues (protein/gene IDs are the primary IDs for most of the functions). Check the metabolomics section to find out more.
Three basic (tab-delimited) files are required to perform the full set of operations:
The output of the quantitative proteomics software package MaxQuant. It combines all the information about the identified peptides.
Tab-delimited file generated by the user. It summarizes the experimental design of the evidence file.
artMS merges the keys file with evidence.txt by the “RawFile” column. Each RawFile corresponds to a unique individual experiment: technical replicate / biological replicate / condition / run.
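artMS performs this merge internally. The toy sketch below only illustrates the idea with base R merge(), assuming both tables share a column literally named RawFile (the evidence file exported by MaxQuant may name it differently), so treat all column names and values as hypothetical:

```r
# Toy evidence table: one row per identified peptide feature (hypothetical values)
evidence <- data.frame(
  RawFile   = c("run01", "run01", "run02", "run03"),
  Sequence  = c("PEPTIDEK", "SAMPLER", "PEPTIDEK", "SAMPLER"),
  Intensity = c(1.2e6, 3.4e5, 1.1e6, 2.9e5),
  stringsAsFactors = FALSE
)

# Toy keys table: one row per RawFile, describing the experimental design
keys <- data.frame(
  RawFile      = c("run01", "run02", "run03"),
  Condition    = c("WT", "WT", "MUT"),
  BioReplicate = c("WT-1", "WT-2", "MUT-1"),
  stringsAsFactors = FALSE
)

# Every evidence row picks up the design information of its RawFile
annotated <- merge(evidence, keys, by = "RawFile")
print(annotated)
```

In this sketch, merge() performs an inner join, so any RawFile missing from the keys table would silently drop its evidence rows.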
For any basic label-free proteomics experiment, the keys file must contain the following columns and rules:
'L' for label-free experiments ('H' will be used for SILAC experiments; see below)
Condition name, and add as a suffix a dash (-) plus the biological replicate number. For example, if condition H1N1_06H has two biological replicates, name them H1N1_06H-1 and H1N1_06H-2.
For more examples, check the artMS data object
Tip: it is recommended to use Microsoft Excel (OpenOffice Calc, or similar) to generate the keys file. Do not forget to choose the Tab Delimited Text (.txt) format when saving the file (use the “Save As” option).
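If you prefer to build the keys file in R rather than in a spreadsheet, a minimal sketch could look like this. It assumes the column set RawFile, IsotopeLabelType, Condition, BioReplicate, and Run (compare it against the keys example data shipped with artMS before relying on it). The raw file names and the MOCK_06H condition are hypothetical:

```r
# Hypothetical label-free design: 2 conditions x 2 biological replicates
conditions <- rep(c("H1N1_06H", "MOCK_06H"), each = 2)
replicate  <- rep(1:2, times = 2)

keys <- data.frame(
  RawFile          = sprintf("sample_%02d", seq_along(conditions)),
  IsotopeLabelType = "L",                                   # 'L' for label-free
  Condition        = conditions,
  BioReplicate     = paste0(conditions, "-", replicate),    # Condition + '-' + replicate number
  Run              = seq_along(conditions),
  stringsAsFactors = FALSE
)

# Save as tab-delimited text, without quotes or row names
keys_path <- file.path(tempdir(), "keys.txt")
write.table(keys, keys_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the round trip
keys_check <- read.delim(keys_path, stringsAsFactors = FALSE)
print(keys_check)
```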
The comparisons between conditions that the user wants to quantify.
WT_A549) relative to two additional experimental conditions with drugs (WT_DRUG_A and WT_DRUG_B), but also changes in protein abundance between WT_DRUG_A and WT_DRUG_B, the contrast file would look like this:
```
WT_DRUG_A-WT_A549
WT_DRUG_B-WT_A549
WT_DRUG_A-WT_DRUG_B
```
Conditions are separated by a dash (-), and only one dash symbol is allowed, i.e., only one comparison per line.
As a result of the quantification, a protein that is more abundant in the condition on the left (numerator) takes a positive log2FC, and a protein that is more abundant in the condition on the right (denominator) takes a negative log2FC.
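These rules are easy to check before running the quantification. The helper below is a hypothetical sketch, not an artMS function: it verifies that each line holds exactly one dash and that both sides are condition names taken from the keys file. The last lines illustrate the sign convention as a conceptual log2 ratio of made-up abundances:

```r
# Hypothetical contrast checker (not part of artMS)
check_contrasts <- function(lines, conditions) {
  vapply(lines, function(line) {
    n_dashes <- nchar(gsub("[^-]", "", line))        # count '-' characters
    sides <- strsplit(line, "-", fixed = TRUE)[[1]]
    n_dashes == 1 && length(sides) == 2 && all(sides %in% conditions)
  }, logical(1), USE.NAMES = FALSE)
}

conditions <- c("WT_A549", "WT_DRUG_A", "WT_DRUG_B")
contrasts  <- c("WT_DRUG_A-WT_A549",
                "WT_DRUG_B-WT_A549",
                "WT_DRUG_A-WT_DRUG_B",
                "WT_DRUG_A-1-WT_A549-1")   # BioReplicates: not allowed
res <- check_contrasts(contrasts, conditions)
res  # TRUE TRUE TRUE FALSE

# Sign convention, conceptually: log2FC = log2(left / right)
left  <- 800   # made-up abundance in WT_DRUG_A (numerator)
right <- 200   # made-up abundance in WT_A549 (denominator)
lfc <- log2(left / right)
lfc  # 2: positive, more abundant on the left
```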
Example of wrong comparisons
Only condition names are allowed. Individual Bioreplicates cannot be compared. For example, this is wrong:
The configuration file (in yaml format) contains a variety of options available for the QC, quantification, and annotations performed by artMS.
To generate a sample configuration file, go to the project folder (setwd("/path/to/your/working/folder/")) and execute:
Then open the my_config.yaml file with your favorite editor (RStudio, for example). Note: although the configuration file might look complex, the default options work very well.
The configuration (yaml) file contains the following sections:
```yaml
files :
  evidence : /path/to/the/evidence.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  summary: /path/to/the/summary.txt # Optional
  output : /path/to/the/results_folder/ph-results.txt
```
Path/name of the required files. It is recommended to create a new folder in your project folder (for example, results_folder). The results file name (e.g., ph-results.txt) will be used as a prefix for the several output files.
```yaml
qc:
  basic: 1 # 1 = yes; 0 = no
  extended: 1 # 1 = yes; 0 = no
  extendedSummary: 0 # 1 = yes; 0 = no
```
Select to perform both ‘basic’ and ‘extended’ quality control based on the evidence.txt file, or ‘extendedSummary’ based on the summary.txt file. Read below to find out more about the details of each type of analysis.
```yaml
data:
  enabled : 1 # 1 = yes; 0 = no
  fractions:
    enabled : 0 # 1 for protein fractionation
  silac:
    enabled : 0 # 1 for SILAC experiments
  filters:
    enabled : 1
    contaminants : 1
    protein_groups : remove # remove, keep
    modifications : AB # PH, UB, AC, AB, APMS
  sample_plots : 1 # correlation plots
```
Let’s break it down:
enabled : 1: pre-processes the data provided in the files section.
enabled : 0: won’t process the data (a pre-generated MSstats file will be expected).
fractions: multiple fractionation or separation methods are often combined in proteomics to improve signal-to-noise and proteome coverage and to reduce interference between peptides in quantitative proteomics.
enabled : 1: for fractionated datasets. See Special case: Protein Fractionation below for details.
enabled : 0: no fractions.
enabled : 1: check if the files belong to a SILAC experiment. See Special case: SILAC below for details.
enabled : 0: no SILAC experiment (default).
enabled : 1: enables filtering (this section).
contaminants : 1: removes contaminants (CON__ and REV__ labeled by MaxQuant).
protein_groups : remove: choose whether to remove or keep protein groups (options: remove, keep).
modifications : AB: the type of proteomics experiment: AB (protein abundance), APMS (affinity purification mass spectrometry), or PH, UB, AC for posttranslational modifications.
sample_plots : 1: generates correlation plots.
```yaml
msstats :
  enabled : 1
  msstats_input : # `-mss.txt` file or blank (default)
  profilePlots : none
  normalization_method : equalizeMedians
  normalization_reference : # blank (default) if equalizeMedians
  summaryMethod : TMP
  censoredInt : NA
  cutoffCensored : minFeature
  MBimpute : 1
  feature_subset: all
```
Let’s break it down:
enabled : 1: to run MSstats.
msstats_input : leave it blank if MSstats will be run (previous enabled : 1). But if MSstats was already run and the evidence-mss.txt file is available, then choose enabled : 0 and provide the path to that file here.
profilePlots : choose one of the following options:
before: plot before normalization
after: plot after normalization
before-after: recommended, although computationally expensive
none: no normalization plots
normalization_method : available options:
0: no normalization (not recommended)
globalStandards: if selected, specify the reference protein in normalization_reference
normalization_reference : UniProt ID if globalStandards is chosen as the normalization_method
summaryMethod : “TMP” (default) means Tukey’s median polish, which is a robust estimation method; “linear” uses a linear mixed model; “logOfSum” computes log2(sum of intensities) per run.
censoredInt : NA (default): missing values are censored or at random. ‘NA’ assumes that all ‘NA’s in the ‘Intensity’ column are censored.
0: uses zero intensities as censored intensities. In this case, NA intensities are missing at random. The output from Skyline should use 0. Null assumes that all NA intensities are randomly missing.
cutoffCensored : cutoff value for censoring, only used with censoredInt : NA or 0. Default is ‘minFeature’, which uses the minimum value for each feature.
minFeatureNRun: uses the smaller of the minimum value of the corresponding feature and the minimum value of the corresponding run.
minRun: uses the minimum value for each run.
MBimpute : only used with censoredInt : NA or 0. TRUE (default) imputes ‘NA’ or ‘0’ (depending on the censoredInt option) by an accelerated failure model.
FALSE: uses the values assigned by cutoffCensored.
feature_subset : highQuality: this option seems to be buggy right now.
Check MSstats documentation to find out more about every option.
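To build some intuition for summaryMethod : TMP, the sketch below applies base R’s stats::medpolish() to a toy feature-by-run matrix of log2 intensities for a single protein. Taking the overall effect plus each run (column) effect as the run-level summary mirrors the idea behind Tukey’s median polish summarization; it is an illustration only, not the MSstats implementation:

```r
# Toy log2 intensities for one protein: rows = features (peptide ions), cols = runs
log2_int <- outer(c(0, 1, 2), c(10, 11, 12), "+")
dimnames(log2_int) <- list(paste0("feature", 1:3), paste0("run", 1:3))

# Tukey's median polish: x ~ overall + row effect + column effect + residual
fit <- medpolish(log2_int, na.rm = TRUE, trace.iter = FALSE)

# Run-level protein summary: overall effect plus the effect of each run
run_summary <- fit$overall + fit$col
print(run_summary)
```

Because medians are used at every sweep, a single aberrant feature has limited influence on the run summaries, which is what makes the method robust.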
```yaml
enabled : 1 # if 0, won't process anything on this section
annotate :
  enabled: 1
  species : HUMAN
plots:
  volcano: 1
  heatmap: 1
  LFC : -1 1 # Range of minimal log2fc
  FDR : 0.05 # adjusted p-value, false discovery rate
  heatmap_cluster_cols : 0
  heatmap_display : log2FC # log2FC or pvalue
```
Extra actions to perform based on the MSstats results, including annotations and plots (heatmaps and volcano plots). Let’s break it down:
enabled : 1 (default) enables this section; 0 turns it off.
annotate : enabled: 1 (default) will generate a -results-annotated.txt file that includes
Protein.Name (only for supported species)
species: the supported species are: HUMAN, MOUSE, ANOPHELES, ARABIDOPSIS, BOVINE, WORM, CANINE, FLY, ZEBRAFISH, ECOLI_STRAIN_K12, ECOLI_STRAIN_SAKAI, CHICKEN, RHESUS, MALARIA, CHIMP, RAT, YEAST, PIG, XENOPUS
plots : options for additional plots
LFC : log2 fold change cutoff (minimal negative and positive values)
FDR : false discovery rate cutoff for significance (recommended: 0.05)
heatmap : correlation plots
heatmap_cluster_cols : 1 performs clustering of columns; 0 (default) doesn’t
heatmap_display : choose to display either log2FC or pvalue
To handle protein fractionation experiments, two options must be activated:
keys.txt: the keys file must contain an additional column named “FractionKey” with the information about fractions. For example:
config.yaml: enable fractions in the configuration file as follows:
```yaml
fractions:
  enabled : 1 # 1 for protein fractions, 0 otherwise
```
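As a sketch of what a fractionated keys table could look like, the code below adds a FractionKey column so that several RawFiles (one per fraction) share the same BioReplicate. Apart from FractionKey, the column set is an assumption (other columns, such as Run, are omitted for brevity), and the file names and conditions are hypothetical; check the values artMS expects before using it:

```r
# Hypothetical design: 2 conditions x 1 biological replicate x 3 fractions each
design <- expand.grid(fraction = 1:3, condition = c("WT", "MUT"),
                      stringsAsFactors = FALSE)

keys_fractions <- data.frame(
  RawFile          = sprintf("%s_F%d", design$condition, design$fraction),
  IsotopeLabelType = "L",
  Condition        = design$condition,
  BioReplicate     = paste0(design$condition, "-1"),
  FractionKey      = design$fraction,      # fraction number of each RawFile
  stringsAsFactors = FALSE
)
print(keys_fractions)
```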
One of the most widely used techniques that enable relative protein quantification is stable isotope labeling by amino acids in cell culture (SILAC). The keys.txt file can capture the typical SILAC experiment. The following example shows a SILAC experiment with two conditions, two biological replicates, and two technical replicates:
The silac option must also be activated in the yaml file as follows:
```yaml
silac:
  enabled : 1 # 1 for SILAC experiments
```
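In a SILAC experiment each raw file contains both the light and the heavy channel. Assuming the keys table represents this with one row per RawFile and IsotopeLabelType, a design with two conditions, two biological replicates, and two technical replicates could be sketched as follows. The assignment of conditions to the H/L channels, the column set, and the file names are hypothetical:

```r
# Hypothetical SILAC design: CondA labeled heavy (H), CondB light (L)
design    <- expand.grid(techrep = 1:2, biorep = 1:2)
raw_files <- sprintf("silac_b%d_t%d", design$biorep, design$techrep)

heavy <- data.frame(RawFile = raw_files, IsotopeLabelType = "H",
                    Condition = "CondA",
                    BioReplicate = paste0("CondA-", design$biorep),
                    stringsAsFactors = FALSE)
light <- data.frame(RawFile = raw_files, IsotopeLabelType = "L",
                    Condition = "CondB",
                    BioReplicate = paste0("CondB-", design$biorep),
                    stringsAsFactors = FALSE)

# Each RawFile contributes one heavy row and one light row
keys_silac <- rbind(heavy, light)
keys_silac <- keys_silac[order(keys_silac$RawFile, keys_silac$IsotopeLabelType), ]
print(keys_silac, row.names = FALSE)
```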
artMS provides 3 functions to perform QC analyses.
The basic quality control analysis takes as input both the evidence.txt and keys.txt files and generates several QC plots exploring different aspects of the MS data. Run it as follows:
```r
artmsQualityControlEvidenceBasic(
  evidence_file = artms_data_ph_evidence,
  keys_file = artms_data_ph_keys,
  prot_exp = "PH")
```
REV, reversed sequences used by MaxQuant to estimate the FDR); box plots of MS intensity values per biological replicate and condition; bar plots of total intensity (excluding contaminants) by biological replicate and condition; bar plots of total feature counts by biological replicate and condition.
AC) an extra PDF file will be generated with stats related to the selected modification, including: bar plots of peptide counts and intensities, broken down by PTM/other categories; bar plots of the total MS intensity by PTM/other categories.
Check ?artmsQualityControlEvidenceBasic to find out more options. Remember: by default, all the plots are printed to a PDF file (controlled by the printPDF argument).
```r
artmsQualityControlEvidenceBasic(
  evidence_file = artms_data_ph_evidence,
  keys_file = artms_data_ph_keys,
  prot_exp = "PH",
  plotPTMSTATS = TRUE,
  plotINTDIST = FALSE,
  plotREPRO = FALSE,
  plotCORMAT = FALSE,
  plotINTMISC = FALSE,
  printPDF = FALSE,
  verbose = FALSE)
```
It takes as input the evidence.txt and keys.txt files as follows:
```r
artmsQualityControlEvidenceExtended(
  evidence_file = artms_data_ph_evidence,
  keys_file = artms_data_ph_keys)
```
and generates the following QC plots:
plotCS (qcExtended_evidence.qcplot.ChargeState): charge state distribution of PSMs confidently identified in each BioReplicate.
plotME (qcExtended_evidence.qcplot.MassError.pdf): Distribution of precursor error for all PSMs confidently identified in each BioReplicate.
plotMOCD (qcExtended_evidence.qcplot.MZ.pdf): Distribution of precursor mass-over-charge for all PSMs confidently identified in each BioReplicate.
plotPEPDETECT (qcExtended_evidence.qcplot.PeptideDetection.pdf): frequency of peptide detection across BioReplicates by condition, showing the percentage of peptides detected once, twice, thrice, and so on (based on the number of bioreplicates for each condition).
plotPEPTOVERLAP (qcExtended_evidence.qcplot.PeptidesOverlap.pdf): peptide overlaps across bioreplicates (page 1) and conditions (page 2)
plotPROTOVERLAP (qcExtended_evidence.qcplot.ProteinOverlap.pdf): protein overlap across bioreplicates (page 1) and conditions (page 2)
plotTYPE (qcExtended_evidence.qcplot.Type.pdf): identification type. MaxQuant classifies each peptide identification into different categories (e.g., MSMS, MULTI-MSMS, MULTI-SECPEP). This plot shows the distribution of identification type in each BioReplicate
For example, to generate only the plotPCA, plotTYPE, and plotPEPTIDES plots:
```r
artmsQualityControlEvidenceExtended(
  evidence_file = artms_data_ph_evidence,
  keys_file = artms_data_ph_keys,
  plotPCA = TRUE, plotTYPE = TRUE, plotPEPTIDES = TRUE,
  plotPSM = FALSE, plotIONS = FALSE, plotPEPTOVERLAP = FALSE,
  plotPROTEINS = FALSE, plotPROTOVERLAP = FALSE, plotPIO = FALSE,
  plotCS = FALSE, plotME = FALSE, plotMOCD = FALSE,
  plotPEPICV = FALSE, plotPEPDETECT = FALSE, plotPROTICV = FALSE,
  plotPROTDETECT = FALSE, plotIDoverlap = FALSE, plotSP = FALSE,
  printPDF = FALSE, verbose = FALSE)
```
It requires two files: the keys.txt file and the summary.txt file. As described in MaxQuant’s table.pdf, the summary file contains summary information for all the raw files processed in a single MaxQuant run, including statistics on peak detection. The QC analysis of this file provides a quick overview of the quality of every RawFile based on summary.txt. Run it as follows:
artmsQualityControlSummaryExtended(summary_file = "summary.txt", keys_file = artms_data_ph_keys)
It generates the following plots:
plotMS1SCANS (.qcplot.MS1scans.pdf): MS1 scan counts. Page 1 shows the number of MS1 scans in each BioReplicate. If replicates are present, page 2 shows the mean number of MS1 scans per condition, with error bars showing the standard error of the mean. If the fractions option is TRUE, each fraction is stacked on the individual bar graphs.
plotMS2 (.qcplot.MS2scans.pdf): MS2 scan counts. Page 1 shows the number of MS2 scans in each BioReplicate. If replicates are present, page 2 shows the mean number of MS2 scans per condition, with error bars showing the standard error of the mean. If the fractions option is TRUE, each fraction is stacked on the individual bar graphs.
plotMSMS (.qcplot.MSMS.pdf): MS2 identification rate (%). Page 1 shows the fraction of MS2 scans confidently identified in each BioReplicate. If replicates are present, page 2 shows the mean rate of MS2 scans confidently identified per condition, with error bars showing the standard error of the mean. If the fractions option is TRUE, each fraction is stacked on the individual bar graphs.
plotISOTOPE (.qcplot.Isotope.pdf): isotope pattern counts. Page 1 shows the number of isotope patterns with charge greater than 1 in each BioReplicate. If replicates are present, page 2 shows the mean number of isotope patterns with charge greater than 1 per condition, with error bars showing the standard error of the mean. If the fractions option is TRUE, each fraction is stacked on the individual bar graphs.
The relative quantification is a fundamental step in the analysis of MS data.
artMS facilitates and simplifies the analysis using MSstats, a fantastic statistical package for the relative quantification of Mass-Spectrometry based proteomics.
All the options and parameters required to run a relative quantification analysis using MSstats (in addition to other options) are summarized in artMS through a configuration file in .yaml format. Check the input-files section to find out more about each of the options.
Different types of proteomics experiments can be quantified, including changes in global protein abundance (AB), affinity purification mass spectrometry (APMS), and different types of posttranslational modifications, including phosphorylation (PH), ubiquitination (UB), and acetylation (AC).
artMS also enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView. This means that artMS does not require an ID for the metabolites, as the m/z and retention time will be combined and used as identifiers.
The quantification of changes in protein abundance between different conditions requires filling in the following sections of the config file:
```yaml
files:
  evidence : /path/to/the/evidence.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  output : /path/to/the/output/results_ptmGlobal/results.txt
. . .
data:
  . . .
  filters:
    modifications : AB
```
The remaining options can be left unmodified (i.e., run with the default parameters). Then run the following:
Warning: This quantification is only possible for experiments that have used methods to enrich phosphopeptides, ubiquitinated, or acetylated peptides prior to the mass spectrometry analysis.
The global phosphorylation, ubiquitination, or acetylation quantification analysis calculates changes in phosphorylation, ubiquitination, or acetylation at the protein level. This means that all the modified peptides are used to quantify changes in protein phosphorylation, ubiquitination, or acetylation between different conditions. The site-specific analysis (explained next) quantifies changes at the peptide level, i.e., each modified peptide is quantified independently between the different conditions.
Only two sections of the default configuration file need to be filled in:
```yaml
files:
  evidence : /path/to/the/evidence.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  output : /path/to/the/output/results_ptmGlobal/results.txt
. . .
data:
  . . .
  filters:
    modifications : PH # Use "UB" for ubiquitination, "AC" for acetylation
```
The remaining options can be left unmodified.
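If you keep several configuration files (for example, one for AB and one for PH), a small hypothetical helper like the one below can switch the modifications value in a copy of a template without opening an editor. It works line by line on plain text (no YAML parser), so it assumes the key appears exactly once, in the key : value layout shown above. For a real file, read the template with readLines() and save the edited copy with writeLines():

```r
# Hypothetical helper (not part of artMS): replace the value of a single
# 'key : value' line, keeping its indentation and any trailing comment
set_config_value <- function(lines, key, value) {
  pattern <- paste0("^(\\s*", key, "\\s*:\\s*)[^#]*?(\\s*(#.*)?)$")
  hits <- grep(pattern, lines, perl = TRUE)
  if (length(hits) != 1) {
    stop("expected exactly one '", key, "' line, found ", length(hits))
  }
  lines[hits] <- sub(pattern, paste0("\\1", value, "\\2"), lines[hits], perl = TRUE)
  lines
}

template <- c("files:",
              "  evidence : /path/to/the/evidence.txt",
              "data:",
              "  filters:",
              "    modifications : AB # PH, UB, AC, AB, APMS")

ph_config <- set_config_value(template, "modifications", "PH")
writeLines(ph_config)
```

The helper refuses ambiguous keys (such as enabled, which appears in several sections) instead of silently editing the wrong line.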
Once the configuration yaml file is ready, run the following command:
Warning: This quantification is only possible for experiments that have used methods to enrich phosphopeptides or ubiquitinated peptides prior to the mass spectrometry analysis.
The site-specific analysis quantifies changes at the modified peptide level. This means that changes in every modified (PH, UB, or AC) peptide of a given protein will be quantified individually. The caveat is that the proportion of missing values is expected to increase relative to the global analysis. The site-specific and global PTM analyses are usually highly correlated, because typically only one or two peptides drive the overall PTM changes of each protein.
To run a site/peptide specific analysis follow these steps:
Leading razor protein, Leading protein, or Proteins) and re-annotates it to incorporate the ptm-site/peptide-specific information. By default, this function converts the column Leading razor protein. This step is computationally expensive, which means that it might take several minutes to finish (depending on the size of the fasta database, the evidence file, computing power, etc.).
It also requires the same reference proteome (fasta sequence database) used for the MaxQuant search.
```r
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt",
  ref_proteome_file = "/path/to/the/reference_proteome.fasta",
  output_file = "/path/to/the/output/ph-sites-evidence.txt",
  mod_type = "PH")
```
As a result, the IDs in the “Leading razor protein” column will contain site/peptide-specific notation. For example:
```r
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt",
  ref_proteome_file = "/path/to/the/reference_proteome.fasta",
  output_file = "/path/to/the/output/ub-sites-evidence.txt",
  mod_type = "UB")
```
```r
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt",
  ref_proteome_file = "/path/to/the/reference_proteome.fasta",
  output_file = "/path/to/the/output/ac-sites-evidence.txt",
  mod_type = "AC")
```
Tip: How to re-annotate all the Protein columns on the same file.
artmsProtein2SiteConversion doesn’t allow overwriting the evidence.txt file for security reasons (you don’t want to lose the evidence file if something goes wrong). To overwrite the evidence file, the argument overwrite_evidence must be turned on (overwrite_evidence = TRUE).
If the column_name argument is not used, artmsProtein2SiteConversion converts the Leading razor protein column, which is used in the quantification step when protein_groups : remove is selected (default). However, if protein_groups : keep is used, artMS will use the Proteins column. To convert the Proteins column to the site/peptide-specific notation, add the argument column_name = "Proteins".
To annotate both columns of the same file, first generate the “site-evidence.txt” file, and then use this same output file as the evidence_file and activate overwrite_evidence = TRUE.
In summary, to annotate both the “Leading razor protein” and Proteins columns, follow these steps:
```r
# Convert 'Leading razor protein' evidence's file column
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt", # ORIGINAL
  column_name = "Leading razor protein",
  ref_proteome_file = "/path/to/the/reference_proteome.fasta",
  output_file = "/path/to/the/phsites-evidence.txt", # SITES VERSION
  mod_type = "PH")

# Convert 'Proteins' evidence's file column
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/phsites-evidence.txt", # <- USE SITES VERSION
  column_name = "Proteins",
  overwrite_evidence = TRUE, # <--- TURN ON
  ref_proteome_file = "/path/to/the/reference_proteome.fasta",
  output_file = "/path/to/the/phsites-evidence.txt", # <- SITES VERSION
  mod_type = "PH")
```
ubsites_config.yaml) as explained above, but using the “new” sites-evidence.txt file instead of the original evidence.txt file:
```yaml
files:
  evidence : /path/to/the/evidence-site.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  output : /path/to/the/output/results_ptmSITES/sites-results.txt # <- this one
. . .
data:
  . . .
  filters:
    modifications : PH # <- Don't forget this one.
```
Once the new yaml file has been created, execute:
The files generated after successfully running artmsQuantification are (based on the MSstats documentation):
Protein: Protein ID
Label: comparison (from contrast.txt)
log2FC: log2 fold change
SE: standard error
Tvalue: test statistic of the Student test
DF: degrees of freedom of the Student test
pvalue: raw p-values
adj.pvalue: p-values adjusted among all the proteins in the specific comparison using the approach by Benjamini and Hochberg
issue: shows if there is any issue for inference in corresponding protein and comparison, for example, OneConditionMissing or CompleteMissing.
MissingPercentage: percentage of random and censored missing values in the corresponding run and protein out of the total number of features in the corresponding protein.
ImputationPercentage: percentage of imputation
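The adj.pvalue column follows the Benjamini-Hochberg procedure. As a small base R illustration on toy p-values (not artMS output), the manual step-up computation below gives the same result as stats::p.adjust(method = "BH"):

```r
# Toy raw p-values for four proteins within one comparison
pvalue <- c(0.01, 0.04, 0.03, 0.20)

# Manual Benjamini-Hochberg: p * m / rank, then a running minimum from the top
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)            # largest p-value first
  ranks <- m:1                                 # rank of each sorted value
  adj <- pmin(1, cummin(p[o] * m / ranks))     # enforce monotonicity, cap at 1
  adj[order(o)]                                # back to the original order
}

adj_manual <- bh_manual(pvalue)
adj_stats  <- p.adjust(pvalue, method = "BH")
print(rbind(pvalue, adj_manual, adj_stats))
```

The running minimum is what keeps adjusted p-values monotone: here 0.04 and 0.03 both end up at about 0.0533.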
pvalue = NA. For example, if for the comparison Condition A - Condition B one protein is completely missing in condition B, then log2FC = Inf and adj.pvalue = 0.
pvalue will all be
results.txt but 3 more columns of annotations, i.e.,
Comprehensive analysis of the quantification outputs obtained from the function artmsQuantification() (check the relative quantification section to find out more). It includes:
It takes as input two files generated in the previous quantification step (artmsQuantification()):
-results.txt: MSstats quantification results
-results_ModelQC.txt: MSstats normalized abundance values. It will be used to extract details about reproducibility.
To run this analysis:
Then run the following function (e.g., for a protein abundance “AB” experiment):
artmsAnalysisQuantifications(log2fc_file = "ab-results.txt", modelqc_file = "ab-results_ModelQC.txt", species = "human", output_dir = "AnalysisQuantifications")
A few comments on the available options for artmsAnalysisQuantifications():
isPTM: two options:
"noptm": use for protein abundance (AB), affinity purification mass spectrometry (APMS), and global analysis of posttranslational modifications (PH, UB, AC).
"ptmsites": use for site-specific PTM analysis.
species: this downstream analysis supports (for now)
outliers: outliers can be kept (default) or could be removed from the abundance data. Options:
keep: keeps the outliers
iqr: removes any outlier outside +/- 6 x the interquartile range from the mean (recommended)
std: removes any outlier outside +/- 6 x the standard deviation from the mean
enrich: if TRUE, it will perform enrichment analysis using gProfileR.
If enrich = TRUE, the user can provide a background gene list (add the file path as well).
mnbr: Minimal Number of Biological Replicates for imputation. Missing values will be imputed. This argument specifies the minimal number of biological replicates in which a protein must be found in at least one of the conditions. For example, mnbr = 2 indicates that only proteins found in at least two biological replicates will be imputed. CAUTION: mnbr also adds the constraint that any protein must be identified in at least mnbr biological replicates of the same condition, or it will be filtered out. That is, if mnbr = 2, a protein found in two conditions but only in one biological replicate in each of them will be removed.
l2fc_thres: log2fc cutoff for enrichment analysis (absolute value). For example, if it is set to 1, it will consider log2fc > +1 and log2fc < -1 as significant.
ipval: select whether pvalue or adjpvalue is used for the analysis. The default option is adjpvalue (multiple testing correction). But if the number of biological replicates for a given experiment is too low (for example, n = 2), then pvalue might be the better choice.
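The keep, iqr, and std outlier options can be pictured with the toy sketch below, which follows the wording above literally (values farther than 6 x IQR or 6 x SD from the mean are dropped). The helper is hypothetical and illustrates the rule only; for instance, whether artMS applies it per protein, per condition, or across the whole dataset is not specified here:

```r
# Hypothetical helper mirroring the 'keep' / 'iqr' / 'std' options described above
remove_outliers <- function(x, method = c("keep", "iqr", "std"), k = 6) {
  method <- match.arg(method)
  if (method == "keep") return(x)
  spread <- if (method == "iqr") IQR(x, na.rm = TRUE) else sd(x, na.rm = TRUE)
  center <- mean(x, na.rm = TRUE)
  x[is.na(x) | abs(x - center) <= k * spread]   # keep NAs and values within the band
}

# Toy log2 intensities: 30 well-behaved values plus one extreme value
x <- c(rep(c(19, 20, 21), 10), 60)

length(remove_outliers(x, "keep"))
length(remove_outliers(x, "iqr"))
length(remove_outliers(x, "std"))
```

With this toy vector, the iqr rule drops the value 60 while the std rule keeps it, because the outlier itself inflates the standard deviation but barely moves the interquartile range.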
Notes on Imputation: artMS imputes missing values by default. The mnbr argument can be used to specify the minimal number of biological replicates, in the condition where the protein is found, required to impute the missing values in the other condition (default: 2). That is, if one protein is completely missing in one condition but found in at least 2 biological replicates of another condition, the MS intensity value will be imputed for that protein in the missing condition and the log2FC value recalculated. The missing values will be imputed by randomly sampling the range within the lowest 5 MS intensities.
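The mnbr filter and the low-intensity imputation can be sketched together in base R. Everything here is hypothetical: the protein x BioReplicate matrix, the helper names, the simple log2FC recomputation, and especially the reading of “randomly sampling the range within the lowest 5 MS intensities” as a uniform draw between the lowest and the fifth-lowest observed intensity. It illustrates the rules and does not reproduce artMS exactly:

```r
set.seed(1)  # reproducible toy example

# Hypothetical intensities: rows = proteins, columns = BioReplicates (NA = missing)
intensity <- rbind(
  P1 = c(WT_1 = 2.0e6, WT_2 = 2.4e6, MUT_1 = NA,    MUT_2 = NA),    # missing in MUT
  P2 = c(WT_1 = 5.0e5, WT_2 = NA,    MUT_1 = 4.0e5, MUT_2 = NA),    # 1 rep per condition
  P3 = c(WT_1 = 3.0e5, WT_2 = 9.0e4, MUT_1 = 1.2e5, MUT_2 = 8.0e4)
)
condition <- c("WT", "WT", "MUT", "MUT")

# mnbr filter: keep proteins detected in >= mnbr replicates of at least one condition
filter_mnbr <- function(mat, condition, mnbr = 2) {
  groups <- split(seq_along(condition), condition)
  counts <- do.call(cbind, lapply(groups, function(cols)
    rowSums(!is.na(mat[, cols, drop = FALSE]))))
  mat[apply(counts, 1, max) >= mnbr, , drop = FALSE]
}

# Imputation: uniform draws within the range of the 'lowest' smallest intensities
impute_low <- function(n, intensities, lowest = 5) {
  low <- head(sort(intensities), lowest)
  runif(n, min = min(low), max = max(low))
}

kept <- filter_mnbr(intensity, condition, mnbr = 2)   # P2 is removed

# P1 is completely missing in MUT: impute, then recompute a simple log2FC (WT-MUT)
mut_imputed <- impute_low(2, intensity[!is.na(intensity)])
ilog2fc_P1  <- log2(mean(kept["P1", c("WT_1", "WT_2")])) - log2(mean(mut_imputed))
```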
Summary file
Reminder: for any given relative quantification, for example WT-Mutant:
proteins with log2fc > 0 are more abundant in the condition on the left / numerator (WT)
proteins with log2fc < 0 are more abundant in the condition on the right / denominator (Mutant)
The summary Excel file (results-summary.xlsx) gathers several tabs:
log2fcImputed: includes quantitative results.
yes/no) indicates whether the iLog2FC value has been imputed according to the mnbr criterion (see above)
wide_iLog2fc: log2fc values (including imputed values) in wide format, i.e., each row is a unique protein/ptmsite. The columns show the values for each of the comparisons.
wide_iPvalue: same as before, but for pvalues (including imputed)
enrichALL: enrichment analysis using gProfileR for all the proteins changing significantly in any direction (abs(log2fc) > 0 and pvalue < 0.05)
enrich-MACpos: enrichment of only the positive significant changes (log2fc > 1, pvalue < 0.05)
enrich-MACneg: enrichment of only the negative significant changes (log2fc < -1, pvalue < 0.05)
enMACallCorum, enMACposCorum, enMACnegCorum: same as above but only for protein complex enrichment analysis (based on CORUM)
results-log2fc-long.txt: same as the log2fcImputed tab from the summary file
results-log2fc-wide.txt: wide version (i.e., each row is an individual protein) of pvalues and adj.pvalues for each comparison
Gene enrichment analysis: enrichment analysis is only supported for human and mouse. Check the gProfileR documentation to find out more about the details:
results-enrich-MAC-allsignificants.txt: all significant changes (abs(log2fc) > 1 & pvalue < 0.05)
results-enrich-MAC-positives.txt: only positive significant changes (log2fc > 1 & pvalue < 0.05)
results-enrich-MAC-negatives.txt: only negative significant changes (log2fc < -1 & pvalue < 0.05)
Protein Complex Enrichment analysis (based on CORUM)
Based on relative abundance
Based on significant changes
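The all/positive/negative splits used for the enrichment outputs above come down to a few logical filters. This toy sketch applies the thresholds quoted above (log2fc beyond ±1 and pvalue < 0.05) to hypothetical data; the column names are assumptions, not the exact headers of the artMS output:

```r
# Hypothetical long-format results: one row per protein and comparison
res <- data.frame(
  Gene   = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5"),
  Label  = "WT-Mutant",
  log2fc = c(2.3, -1.8, 0.4, 1.6, -2.5),
  pvalue = c(0.001, 0.010, 0.002, 0.200, 0.030),
  stringsAsFactors = FALSE
)

sig       <- res$pvalue < 0.05
all_sig   <- res[sig & abs(res$log2fc) > 1, ]   # any direction
positives <- res[sig & res$log2fc > 1, ]        # more abundant in WT (left)
negatives <- res[sig & res$log2fc < -1, ]       # more abundant in Mutant (right)
```

GENE3 is significant but its change is too small, and GENE4 changes enough but is not significant, so neither reaches any of the three sets.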
artMS also provides a number of very handy functions.
Takes the given columnid (of UniProt IDs) from the input data.frame and maps the gene symbol, name, and Entrez ID (source: Bioconductor annotation packages).
```r
# This example adds annotations to the evidence file available in
# artMS, based on the column 'Proteins'.
evidence_anno <- artmsAnnotationUniprot(x = artms_data_ph_evidence,
                                        columnid = 'Proteins',
                                        species = 'human')
```
Taking as input the evidence file, it summarizes and returns the average intensity, average retention time, and average calibrated retention time for each protein. If a list of proteins is provided, only those proteins will be summarized and returned. Check ?artmsAvgIntensityRT to find out more options.
artmsAvgIntensityRT(evidence_file = '/path/to/the/evidence.txt')
Changes a given column name in the input data.frame
artms_data_ph_evidence <- artmsChangeColumnName( dataset = artms_data_ph_evidence, oldname = "Phospho..STY.", newname = "PH_STY")
Protein abundance dot plots for each unique UniProt ID. It can take a long time.
Enrichment analysis based on a data.frame with Label and Protein columns (i.e., typical MSstats results)
```r
# The data must be annotated (Protein and Gene columns)
data_annotated <- artmsAnnotationUniprot(
  x = artms_data_ph_msstats_results,
  columnid = "Protein",
  species = "human")

# And then the enrichment
enrich_set <- artmsEnrichLog2fc(
  dataset = data_annotated,
  species = "human",
  background = unique(data_annotated$Gene),
  verbose = FALSE)
```
Function that simplifies enrichment analysis using gProfileR
```r
# Annotate the MSstats results to get the Gene name
data_annotated <- artmsAnnotationUniprot(
  x = artms_data_ph_msstats_results,
  columnid = "Protein",
  species = "human")

# Filter the list of genes with a log2fc > 2
filtered_data <- unique(data_annotated$Gene[which(data_annotated$log2FC > 2)])

# And perform enrichment analysis
data_annotated_enrich <- artmsEnrichProfiler(
  x = filtered_data,
  categorySource = c('KEGG'),
  species = "hsapiens",
  background = unique(data_annotated$Gene))
```
Converts the MaxQuant evidence file to the 3 files required by SAINTexpress. Choose one of the following quantitative MS metrics:
Converts the MaxQuant evidence file to the files required by SAINTq. The user can filter based on either peptides with spectral counts (use msspc) or all the peptides (use all) for the analysis. The quantitative metric can also be chosen (either MS intensity or spectral counts).
It generates the Phosfate input file from the imputedL2fcExtended.txt file resulting from running artmsAnalysisQuantifications() on a ph-site quantification (see above). Notice that the only species supported by PHOTON is human.
It generates the Photon input file from the imputedL2fcExtended.txt file resulting from running artmsAnalysisQuantifications() on a ph-site quantification (see above). Please notice that the only species supported by PHOTON is human.
Remove contaminants and erroneously identified ‘reverse’ sequences by MaxQuant, in addition to empty protein ids
evidencefiltered <- artmsFilterEvidenceContaminants(x = artms_data_ph_evidence)
Generates an extended, detailed ph-site file in which every line is a ph site instead of a peptide. Therefore, if a peptide carries multiple ph sites, it is broken down into one line per site.
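The peptide-to-site expansion described above can be sketched in base R. The `sites` column and the `Protein_Site` naming (e.g. `P12345_S10`) are hypothetical choices for this illustration; the artMS output format may differ. Each peptide row is repeated once per site, so the peptide's intensity is carried over to every site it contains.

```r
# Toy peptide-level table: one row per peptide, with a hypothetical
# "sites" column listing all ph sites on the peptide (";"-separated)
peptides <- data.frame(
  Protein   = c("P12345", "Q67890"),
  Sequence  = c("AS(ph)DT(ph)K", "GGY(ph)R"),
  sites     = c("S10;T12", "Y55"),
  Intensity = c(1.5e6, 3.2e5),
  stringsAsFactors = FALSE
)

# Split the site lists and count how many sites each peptide carries
site_list <- strsplit(peptides$sites, ";", fixed = TRUE)
n_sites   <- lengths(site_list)

# Repeat each peptide row once per site, then attach the individual site ID
expanded <- peptides[rep(seq_len(nrow(peptides)), n_sites),
                     c("Protein", "Sequence", "Intensity")]
expanded$Site <- paste(expanded$Protein, unlist(site_list), sep = "_")
rownames(expanded) <- NULL
```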
artMS enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView. This means that the metabolites do not need to have an id in order to perform the quantification, as the m/z and retention time will be used as identifiers.
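To make the "m/z and retention time as identifiers" idea concrete, the sketch below builds a feature key from rounded m/z and retention time and reshapes toy intensities into a feature-by-sample matrix. The `mz_rt` key format, the rounding precision and all values are assumptions for illustration; the identifier artMS builds (which may also include charge) can differ.

```r
# Toy aligned features: m/z, retention time (min), sample and intensity
features <- data.frame(
  mz        = c(180.0634, 180.0634, 146.0459),
  rt        = c(2.31, 2.31, 5.87),
  sample    = c("S1", "S2", "S1"),
  intensity = c(1.2e5, 9.8e4, 4.4e4),
  stringsAsFactors = FALSE
)

# Unidentified metabolites are keyed by rounded m/z and retention time
features$feature_id <- paste(round(features$mz, 4), round(features$rt, 2), sep = "_")

# Feature-by-sample intensity matrix (NA where a feature was not detected)
wide <- tapply(features$intensity,
               list(features$feature_id, features$sample), sum)
```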
MarkerView is ABSciex software that supports the files (.wiff) generated by the Analyst software used to run our mass spectrometer (ABSciex TripleTOF 5600+). It also supports .t2d files generated by the Applied Biosystems 4700/4800 MALDI-TOF.
MarkerView is used to align mass spectrometry data from several samples for comparison. Using its import feature, .wiff files (as well as .t2d MALDI-TOF files and tab-delimited .txt mass spectra data in mass-intensity format) are loaded for retention time alignment. Once the data files are selected, a series of windows appears in which peak finding, alignment, and filtering options can be set. These options include minimum spectral peak width, minimum retention time peak width, retention time and mass tolerance, and the ability to filter out peaks that do not appear in more than a user-selected number of samples.
The alignment file is further processed and formatted to perform QC and relative quantification using the following steps:
Pre-process the MarkerView .txt file to generate an “evidence-like” file by running:
Perform quality control analysis on the metabolomics data by running:
artmsQualityControlMetabolomics(evidence_file = "metabolomics-evidence.txt", keys_file = "metabolomics-keys.txt")
It generates the following plots:
plotINTDIST.pdf: box-dot and jitter plots of biological replicates based on MS (raw) intensity values.
plotREPRO.pdf: correlation dot plots for all combinations of biological replicates of each condition, based on the MS intensity values of features (mz_rt+charge).
plotCORMAT.pdf: up to 3 pdf files, for technical replicates, biological replicates, and conditions. Each pdf file contains:
plotINTMISC.pdf: several pages, including bar plots of the total sum of intensities and of the total feature counts per biological replicate and per condition (feature counts are separated into the categories INT: has an intensity value, and NOINT: no intensity value), and box plots of MS intensity values per biological replicate and condition.
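The numbers behind a replicate reproducibility plot such as plotREPRO.pdf can be sketched as pairwise correlations of log2 intensities over all replicate combinations. The data below are simulated (a shared log2 signal plus independent noise per replicate), and using Pearson correlation on complete pairs is an assumption of this sketch, not necessarily what artMS computes.

```r
set.seed(1)
# Simulated log2 intensities for 50 features in three biological replicates
base <- rnorm(50, mean = 20, sd = 2)
log2_int <- cbind(
  rep1 = base + rnorm(50, sd = 0.3),
  rep2 = base + rnorm(50, sd = 0.3),
  rep3 = base + rnorm(50, sd = 0.3)
)

# All pairwise combinations of replicates (one column per pair)
pairs_idx <- combn(colnames(log2_int), 2)

# Pearson correlation for each pair, ignoring features missing in either replicate
pair_cor <- apply(pairs_idx, 2, function(p)
  cor(log2_int[, p[1]], log2_int[, p[2]], use = "complete.obs"))
names(pair_cor) <- apply(pairs_idx, 2, paste, collapse = " vs ")
```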
The relative quantification is performed using MSstats. It requires a configuration file (yaml format; see above). A template can be generated by running:
artmsWriteConfigYamlFile(config_file_name = "metab_config.yaml")
The relative quantification is then performed by running:
The artMS package provides the following testing datasets:
Phosphoproteomics dataset: example dataset consisting of two head and neck cancer cell lines (conditions "HSC6"), 2 biological replicates each). The number of peptides was reduced to 1/8 due to Bioconductor limitations on data size.
artms_data_ph_msstats_results: results of running artmsQuantification() on the reduced version
The full data set (2 conditions, 4 biological replicates) can be found at the following URLs:
url_evidence <- 'http://kroganlab.ucsf.edu/artms/ph/evidence.txt'
url_keys <- 'http://kroganlab.ucsf.edu/artms/ph/keys.txt'
Protein Complexes dataset: downloaded (2017-08-01) from the CORUM database and further enriched with annotations of mouse mitochondrial complexes not available in CORUM. It is used for complex enrichment calculations.
Pathogens Uniprot IDs:
artms_data_pathogen_LPN: Legionella pneumophila philadelphia (downloaded 2017-07-17)
artms_data_pathogen_TB: Mycobacterium tuberculosis strain ATCC 35801 / TMC 107 / Erdman (downloaded 2018-04-01)
Check the individual help pages (e.g., ?artms_data_ph_evidence) to find out more about them.