diff --git a/doc/sphinx/source/api/esmvaltool.diag_scripts.perfmetrics.rst b/doc/sphinx/source/api/esmvaltool.diag_scripts.perfmetrics.rst
new file mode 100644
index 0000000000..a4936689bc
--- /dev/null
+++ b/doc/sphinx/source/api/esmvaltool.diag_scripts.perfmetrics.rst
@@ -0,0 +1,20 @@
+.. _api.esmvaltool.diag_scripts.perfmetrics:
+
+Performance Metrics
+===================
+
+This module contains reusable diagnostic and plot scripts for performance metrics.
+
+
+Examples
+--------
+
+* :ref:`recipe_perfmetrics_python `
+
+
+Diagnostic scripts
+------------------
+.. toctree::
+   :maxdepth: 1
+
+   esmvaltool.diag_scripts.perfmetrics/portrait_plot.rst
diff --git a/doc/sphinx/source/api/esmvaltool.diag_scripts.perfmetrics/portrait_plot.rst b/doc/sphinx/source/api/esmvaltool.diag_scripts.perfmetrics/portrait_plot.rst
new file mode 100644
index 0000000000..3679d7015b
--- /dev/null
+++ b/doc/sphinx/source/api/esmvaltool.diag_scripts.perfmetrics/portrait_plot.rst
@@ -0,0 +1,6 @@
+.. _api.esmvaltool.diag_scripts.perfmetrics.portrait_plot:
+
+Plot performance metrics of multiple datasets vs up to four references
+======================================================================
+
+.. automodule:: esmvaltool.diag_scripts.perfmetrics.portrait_plot
diff --git a/doc/sphinx/source/api/esmvaltool.rst b/doc/sphinx/source/api/esmvaltool.rst
index b080b81ac8..f3553d7056 100644
--- a/doc/sphinx/source/api/esmvaltool.rst
+++ b/doc/sphinx/source/api/esmvaltool.rst
@@ -29,3 +29,4 @@ Diagnostic Scripts
    esmvaltool.diag_scripts.ocean
    esmvaltool.diag_scripts.psyplot_diag
    esmvaltool.diag_scripts.seaborn_diag
+   esmvaltool.diag_scripts.perfmetrics
diff --git a/doc/sphinx/source/recipes/index.rst b/doc/sphinx/source/recipes/index.rst
index edcc48977a..53ca7e08fb 100644
--- a/doc/sphinx/source/recipes/index.rst
+++ b/doc/sphinx/source/recipes/index.rst
@@ -69,6 +69,7 @@ Climate metrics
    :maxdepth: 1

    recipe_perfmetrics
+   recipe_perfmetrics_python
    recipe_smpi

 Future projections
diff --git a/doc/sphinx/source/recipes/recipe_perfmetrics.rst b/doc/sphinx/source/recipes/recipe_perfmetrics.rst
index 067b65af85..25a32eb6d8 100644
--- a/doc/sphinx/source/recipes/recipe_perfmetrics.rst
+++ b/doc/sphinx/source/recipes/recipe_perfmetrics.rst
@@ -3,12 +3,26 @@
 Performance metrics for essential climate parameters
 ====================================================

+.. note::
+
+   We are working on a reimplementation of the
+   :ref:`performance metrics in Python `.
+
 Overview
 --------
-The goal is to create a standard recipe for the calculation of performance metrics to quantify the ability of the models to reproduce the climatological mean annual cycle for selected "Essential Climate Variables" (ECVs) plus some additional corresponding diagnostics and plots to better understand and interpret the results.
+The goal is to create a standard recipe for the calculation of performance
+metrics to quantify the ability of the models to reproduce the climatological
+mean annual cycle for selected "Essential Climate Variables" (ECVs) plus some
+additional corresponding diagnostics and plots to better understand and
+interpret the results.
+
+The recipe can be used to calculate performance metrics at different vertical
+levels (e.g., 5, 30, 200, 850 hPa as in
+`Gleckler et al. (2008) `_) and in
+different regions. As an additional reference, we consider
+`Righi et al. (2015) `_.

-The recipe can be used to calculate performance metrics at different vertical levels (e.g., 5, 30, 200, 850 hPa as in `Gleckler et al. (2008) `_ and in different regions.
As an additional reference, we consider `Righi et al. (2015) `_. Available recipes and diagnostics ----------------------------------- @@ -21,12 +35,19 @@ Recipes are stored in recipes/ Diagnostics are stored in diag_scripts/perfmetrics/ -* main.ncl: calculates and (optionally) plots annual/seasonal cycles, zonal means, lat-lon fields and time-lat-lon fields. The calculated fields can also be plotted as difference w.r.t. a given reference dataset. main.ncl also calculates RMSD, bias and taylor metrics. Input data have to be regridded to a common grid in the preprocessor. Each plot type is created by a separated routine, as detailed below. +* main.ncl: calculates and (optionally) plots annual/seasonal cycles, zonal + means, lat-lon fields and time-lat-lon fields. The calculated fields can also + be plotted as difference w.r.t. a given reference dataset. main.ncl also + calculates RMSD, bias and taylor metrics. Input data have to be regridded to + a common grid in the preprocessor. Each plot type is created by a separated + routine, as detailed below. * cycle.ncl: creates an annual/seasonal cycle plot. * zonal.ncl: creates a zonal (lat-pressure) plot. * latlon.ncl: creates a lat-lon plot. -* cycle_latlon.ncl: precalculates the metrics for a time-lat-lon field, with different options for normalization. -* collect.ncl: collects and plots the metrics previously calculated by cycle_latlon.ncl. +* cycle_latlon.ncl: precalculates the metrics for a time-lat-lon field, with + different options for normalization. +* collect.ncl: collects and plots the metrics previously calculated by + cycle_latlon.ncl. User settings in recipe ----------------------- @@ -37,9 +58,12 @@ User settings in recipe *Required settings (scripts)* - * plot_type: cycle (time), zonal (plev, lat), latlon (lat, lon), cycle_latlon (time, lat, lon), cycle_zonal (time, plev, lat) + * plot_type: cycle (time), zonal (plev, lat), latlon (lat, lon), cycle_latlon + (time, lat, lon), cycle_zonal (time, plev, lat) * time_avg: type of time average (monthlyclim, seasonalclim, annualclim) - * region: selected region (global, trop, nhext, shext, nhtrop, shtrop, nh, sh, nhmidlat, shmidlat, nhpolar, shpolar, eq) + * region: selected region (global, trop, nhext, shext, nhtrop, shtrop, nh, + sh, nhmidlat, shmidlat, nhpolar, shpolar, eq) + *Optional settings (scripts)* @@ -51,9 +75,12 @@ User settings in recipe * projection: map projection for plot_type latlon (default: CylindricalEquidistant) * plot_diff: draws difference plots (default: False) * calc_grading: calculates grading metrics (default: False) - * stippling: uses stippling to mark statistically significant differences (default: False = mask out non-significant differences in gray) - * show_global_avg: diplays the global avaerage of the input field as string at the top-right of lat-lon plots (default: False) - * annots: choose the annotation style, e.g. ```alias``` which would display the alias of the dataset as title (applies to plot_type zonal and cycle_zonal) + * stippling: uses stippling to mark statistically significant differences + (default: False = mask out non-significant differences in gray) + * show_global_avg: displays the global avaerage of the input field as string + at the top-right of lat-lon plots (default: False) + * annots: choose the annotation style, e.g. 
```alias``` which would display
+    the alias of the dataset as title (applies to plot_type zonal and cycle_zonal)
   * metric: chosen grading metric(s) (if calc_grading is True)
   * normalization: metric normalization (for RMSD and BIAS metrics only)
   * abs_levs: list of contour levels for absolute plot
@@ -114,8 +141,8 @@ User settings in recipe
   *Optional settings (scripts)*

-  * label_lo: adds lower triange for values outside range
-  * label_hi: adds upper triange for values outside range
+  * label_lo: adds lower triangle for values outside range
+  * label_hi: adds upper triangle for values outside range
   * cm_interval: min and max color of the color table
   * cm_reverse: reverses the color table
   * sort: sorts datasets in alphabetic order (excluding MMM)
@@ -157,14 +184,21 @@ Variables
 Observations and reformat scripts
 ---------------------------------
-The following list shows the currently used observational data sets for this recipe with their variable names and the reference to their respective reformat scripts in parentheses. Please note that obs4MIPs data can be used directly without any reformating. For non-obs4MIPs data use `esmvaltool data info DATASET` or see headers of cmorization scripts (in `/esmvaltool/cmorizers/data/formatters/datasets/
-`_) for downloading and processing instructions.
+The following list shows the currently used observational data sets for this
+recipe with their variable names and the reference to their respective reformat
+scripts in parentheses. Please note that obs4MIPs data can be used directly
+without any reformatting. For non-obs4MIPs data use `esmvaltool data info DATASET`
+or see headers of cmorization scripts (in `/esmvaltool/cmorizers/data/formatters/datasets/
+`_)
+for downloading and processing instructions.
+
 #. recipe_perfmetrics_CMIP5.yml

   * AIRS (hus - obs4MIPs)
   * CERES-EBAF (rlut, rlutcs, rsut, rsutcs - obs4MIPs)
   * ERA-Interim (tas, ta, ua, va, zg, hus - esmvaltool/cmorizers/data/formatters/datasets/era-interim.py)
-  * ESACCI-AEROSOL (od550aer, od870aer, od550abs, od550lt1aer - esmvaltool/cmorizers/data/formatters/datasets/esacci-aerosol.ncl)
+  * ESACCI-AEROSOL (od550aer, od870aer, od550abs, od550lt1aer -
+    esmvaltool/cmorizers/data/formatters/datasets/esacci-aerosol.ncl)
   * ESACCI-CLOUD (clt - esmvaltool/cmorizers/data/formatters/datasets/esacci-cloud.ncl)
   * ESACCI-OZONE (toz - esmvaltool/cmorizers/data/formatters/datasets/esacci-ozone.ncl)
   * ESACCI-SOILMOISTURE (sm - esmvaltool/cmorizers/data/formatters/datasets/esacci_soilmoisture.ncl)
@@ -190,9 +224,13 @@ The following list shows the currently used observational data sets for this rec
 References
 ----------
-* Gleckler, P. J., K. E. Taylor, and C. Doutriaux, Performance metrics for climate models, J. Geophys. Res., 113, D06104, doi: 10.1029/2007JD008972 (2008).
+* Gleckler, P. J., K. E. Taylor, and C. Doutriaux, Performance metrics for climate models, J.
+  Geophys. Res., 113, D06104, doi: 10.1029/2007JD008972 (2008).
+
+* Righi, M., Eyring, V., Klinger, C., Frank, F., Gottschaldt, K.-D., Jöckel, P.,
+  and Cionni, I.: Quantitative evaluation of ozone and selected climate parameters in a set of EMAC simulations,
+  Geosci. Model Dev., 8, 733, doi: 10.5194/gmd-8-733-2015 (2015).
-* Righi, M., Eyring, V., Klinger, C., Frank, F., Gottschaldt, K.-D., Jöckel, P., and Cionni, I.: Quantitative evaluation of oone and selected climate parameters in a set of EMAC simulations, Geosci. Model Dev., 8, 733, doi: 10.5194/gmd-8-733-2015 (2015).

 Example plots
 -------------
@@ -200,17 +238,24 @@
 .. figure:: /recipes/figures/perfmetrics/perfmetrics_fig_1.png
    :width: 90%

-   Annual cycle of globally averaged temperature at 850 hPa (time period 1980-2005) for different CMIP5 models (historical simulation) (thin colored lines) in comparison to ERA-Interim (thick yellow line) and NCEP-NCAR-R1 (thick black dashed line) reanalysis data.
+   Annual cycle of globally averaged temperature at 850 hPa (time period 1980-2005)
+   for different CMIP5 models (historical simulation) (thin colored lines) in comparison to
+   ERA-Interim (thick yellow line) and NCEP-NCAR-R1 (thick black dashed line) reanalysis data.

 .. figure:: /recipes/figures/perfmetrics/perfmetrics_fig_2.png
    :width: 90%

-   Taylor diagram of globally averaged temperature at 850 hPa (ta) and longwave cloud radiative effect (lwcre) for different CMIP5 models (historical simulation, 1980-2005). Reference data (REF) are ERA-Interim for temperature (1980-2005) and CERES-EBAF (2001-2012) for longwave cloud radiative effect.
+   Taylor diagram of globally averaged temperature at 850 hPa (ta) and longwave cloud
+   radiative effect (lwcre) for different CMIP5 models (historical simulation, 1980-2005).
+   Reference data (REF) are ERA-Interim for temperature (1980-2005) and CERES-EBAF (2001-2012)
+   for longwave cloud radiative effect.

 .. figure:: /recipes/figures/perfmetrics/perfmetrics_fig_3.png
    :width: 90%

-   Difference in annual mean of zonally averaged temperature (time period 1980-2005) between the CMIP5 model MPI-ESM-MR (historical simulation) and ERA-Interim. Stippled areas indicdate differences that are statistically significant at a 95% confidence level.
+   Difference in annual mean of zonally averaged temperature (time period 1980-2005) between the
+   CMIP5 model MPI-ESM-MR (historical simulation) and ERA-Interim. Stippled areas indicate
+   differences that are statistically significant at a 95% confidence level.

 .. figure:: /recipes/figures/perfmetrics/perfmetrics_fig_4.png
    :width: 90%
@@ -221,4 +266,9 @@ Example plots
    :width: 90%
    :align: center

-   Relative space-time root-mean-square deviation (RMSD) calculated from the climatological seasonal cycle of CMIP5 simulations. A relative performance is displayed, with blue shading indicating better and red shading indicating worse performance than the median of all model results. A diagonal split of a grid square shows the relative error with respect to the reference data set (lower right triangle) and the alternative data set (upper left triangle). White boxes are used when data are not available for a given model and variable.
+   Relative space-time root-mean-square deviation (RMSD) calculated from the climatological
+   seasonal cycle of CMIP5 simulations. A relative performance is displayed, with blue shading
+   indicating better and red shading indicating worse performance than the median of all model results.
+   A diagonal split of a grid square shows the relative error with respect to the reference data set
+   (lower right triangle) and the alternative data set (upper left triangle).
+   White boxes are used when data are not available for a given model and variable.
diff --git a/doc/sphinx/source/recipes/recipe_perfmetrics_python.rst b/doc/sphinx/source/recipes/recipe_perfmetrics_python.rst
new file mode 100644
index 0000000000..8f3f7f173d
--- /dev/null
+++ b/doc/sphinx/source/recipes/recipe_perfmetrics_python.rst
@@ -0,0 +1,146 @@
+.. _recipe_perfmetrics_python:
+
+Performance metrics for essential climate parameters in Python
+==============================================================
+
+.. note::
+
+   This recipe uses Python diagnostics to reproduce parts of the evaluation
+   done in the
+   :ref:`original recipe based on NCL diagnostics `.
+   It aims for a complete replacement of all involved NCL diagnostics. So
+   far, only portrait plots (including performance metrics) are supported.
+
+Overview
+--------
+
+The goal is to create a standard recipe for the calculation of performance
+metrics to quantify the ability of the models to reproduce the climatological
+mean annual cycle for selected "Essential Climate Variables" (ECVs) plus some
+additional corresponding diagnostics and plots to better understand and
+interpret the results.
+
+The recipe can be used to calculate performance metrics at different vertical
+levels (e.g., 5, 30, 200, 850 hPa as in
+`Gleckler et al. (2008) `_) and in different regions.
+As an additional reference, we consider `Righi et al. (2015) `_.
+
+
+Available recipes and diagnostics
+---------------------------------
+
+Recipes are stored in esmvaltool/recipes/
+
+  * recipe_perfmetrics_python.yml
+  * recipe_perfmetrics_CMIP5_python.yml
+
+Diagnostics are stored in esmvaltool/diag_scripts/perfmetrics/
+
+  * portrait_plot.py: Plot metrics for any variable for multiple datasets and
+    up to four references.
+
+
+User settings in recipe
+-----------------------
+
+#. Script perfmetrics/portrait_plot.py
+
+   This plot expects a scalar value in each input file and at most one input
+   file for each subset of metadata that belongs to a cell or part of a cell
+   in the figure.
+   By default, cells are plotted for combinations of `short_name`,
+   `dataset`, `project` and `split`, where `split` is an optional
+   extra_facet for variables.
+   However, all this can be customized using the `x_by`,
+   `y_by`, `group_by` and `split_by` script settings.
+   For a complete and detailed list of settings see the
+   :ref:`API documentation `.
+   While this allows very flexible use for any kind of data, there are some
+   limitations as well: the grouping (separate plots in the figure) and the
+   normalization are always applied along the x-axis.
+   With default settings this means normalizing all metrics for each variable
+   and grouping all datasets by project.
+
+   To plot distance metrics like RMSE, Pearson's R or bias, the
+   :func:`distance_metrics ` preprocessor or
+   custom diagnostics can be used.
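+
+   As an illustration, the `scripts` section of a diagnostic using this plot
+   script could be configured as in the following sketch; the option values
+   shown here are placeholders based on the example recipes, not required
+   choices:
+
+   .. code-block:: yaml
+
+      scripts:
+        portrait:
+          script: perfmetrics/portrait_plot.py
+          # any metadata key can be used for the axes and the grouping
+          x_by: dataset
+          y_by: variable_group
+          group_by: project
+          # normalize each metric along the x-axis (default)
+          normalize: centered_median
+          plot_kwargs:
+            vmin: -0.5
+            vmax: 0.5
+          cbar_kwargs:
+            label: Relative RMSE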
+
+
+Variables
+---------
+
+.. note::
+
+   The recipe generally works for any variable that is preprocessed correctly.
+   To use different preprocessors or reference datasets, it can be useful
+   to create different variable groups and link them with the same extra_facet,
+   e.g. `variable_name`. See the recipes and the sketch below for examples.
+   The variables needed to produce the example figures are listed below.
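+
+For instance, `recipe_perfmetrics_CMIP5_python.yml` evaluates `tas` against two
+references by defining two variable groups that share the same `y_label` facet
+(used with `y_by: y_label` in the script settings) and differ only in their
+`split` label and reference dataset. A condensed sketch of this pattern, with
+facets and datasets taken from that recipe and freely replaceable, looks like:
+
+.. code-block:: yaml
+
+   variables:
+     tas: &tas_settings
+       mip: Amon
+       preprocessor: rmse
+       y_label: tas_Glob   # shared facet used as y coordinate
+       split: Ref          # labels the first reference
+       additional_datasets:
+         - {dataset: ERA-Interim, project: OBS6, type: reanaly,
+            version: 1, tier: 3, reference_for_metric: true}
+     tas_alt:
+       <<: *tas_settings
+       short_name: tas
+       split: Alt          # second reference, plotted as an overlay
+       additional_datasets:
+         - {dataset: NCEP-NCAR-R1, project: OBS6, type: reanaly,
+            version: 1, tier: 2, reference_for_metric: true}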
+
+
+#. recipe_perfmetrics_CMIP5.yml
+
+   * clt (atmos, monthly mean, longitude latitude time)
+   * hus (atmos, monthly mean, longitude latitude lev time)
+   * od550aer, od870aer, od550abs, od550lt1aer (aero, monthly mean, longitude latitude time)
+   * pr (atmos, monthly mean, longitude latitude time)
+   * rlut, rlutcs, rsut, rsutcs (atmos, monthly mean, longitude latitude time)
+   * sm (land, monthly mean, longitude latitude time)
+   * ta (atmos, monthly mean, longitude latitude lev time)
+   * tas (atmos, monthly mean, longitude latitude time)
+   * toz (atmos, monthly mean, longitude latitude time)
+   * ts (atmos, monthly mean, longitude latitude time)
+   * ua (atmos, monthly mean, longitude latitude lev time)
+   * va (atmos, monthly mean, longitude latitude lev time)
+   * zg (atmos, monthly mean, longitude latitude lev time)
+
+
+Observations and reformat scripts
+---------------------------------
+
+The following list shows the currently used observational data sets for this
+recipe with their variable names and the reference to their respective reformat
+scripts in parentheses. Please note that obs4MIPs data can be used directly
+without any reformatting. For non-obs4MIPs data use `esmvaltool data info DATASET`
+or see headers of cmorization scripts (in `/esmvaltool/cmorizers/data/formatters/datasets/
+`_) for downloading and processing instructions.
+
+#. recipe_perfmetrics_CMIP5.yml
+
+   * AIRS (hus - obs4MIPs)
+   * CERES-EBAF (rlut, rlutcs, rsut, rsutcs - obs4MIPs)
+   * ERA-Interim (tas, ta, ua, va, zg, hus - esmvaltool/cmorizers/data/formatters/datasets/era-interim.py)
+   * ESACCI-AEROSOL (od550aer, od870aer, od550abs, od550lt1aer - esmvaltool/cmorizers/data/formatters/datasets/esacci-aerosol.ncl)
+   * ESACCI-CLOUD (clt - esmvaltool/cmorizers/data/formatters/datasets/esacci-cloud.ncl)
+   * ESACCI-OZONE (toz - esmvaltool/cmorizers/data/formatters/datasets/esacci-ozone.ncl)
+   * ESACCI-SOILMOISTURE (sm - esmvaltool/cmorizers/data/formatters/datasets/esacci_soilmoisture.ncl)
+   * ESACCI-SST (ts - esmvaltool/cmorizers/data/formatters/datasets/esacci-sst.py)
+   * GPCP-SG (pr - obs4MIPs)
+   * HadISST (ts - esmvaltool/cmorizers/data/formatters/datasets/hadisst.ncl)
+   * MODIS (od550aer - esmvaltool/cmorizers/data/formatters/datasets/modis.ncl)
+   * NCEP-NCAR-R1 (tas, ta, ua, va, zg - esmvaltool/cmorizers/data/formatters/datasets/ncep_ncar_r1.py)
+   * NIWA-BS (toz - esmvaltool/cmorizers/data/formatters/datasets/niwa_bs.ncl)
+   * PATMOS-x (clt - esmvaltool/cmorizers/data/formatters/datasets/patmos_x.ncl)
+
+
+References
+----------
+
+* Gleckler, P. J., K. E. Taylor, and C. Doutriaux, Performance metrics for climate models, J.
+  Geophys. Res., 113, D06104, doi: 10.1029/2007JD008972 (2008).
+
+* Righi, M., Eyring, V., Klinger, C., Frank, F., Gottschaldt, K.-D., Jöckel, P.,
+  and Cionni, I.: Quantitative evaluation of ozone and selected climate parameters in a set of EMAC simulations,
+  Geosci. Model Dev., 8, 733, doi: 10.5194/gmd-8-733-2015 (2015).
+
+
+Example plots
+-------------
+
+.. _fig_perfmetrics_python_portrait_plot:
+
+.. figure:: /recipes/figures/perfmetrics/perfmetrics_fig_5_python.png
+   :width: 90%
+   :align: center
+
+   Relative space-time root-mean-square deviation (RMSD) calculated from the climatological
+   seasonal cycle of CMIP5 simulations. A relative performance is displayed, with blue shading
+   indicating better and red shading indicating worse performance than the median of all model results.
+   A diagonal split of a grid square shows the relative error with respect to the reference data set
+   (lower right triangle) and the alternative data set (upper left triangle).
diff --git a/esmvaltool/config-references.yml b/esmvaltool/config-references.yml
index 285ace740e..76491b1611 100644
--- a/esmvaltool/config-references.yml
+++ b/esmvaltool/config-references.yml
@@ -24,6 +24,11 @@ authors:
     institute: DLR, Germany
     email: bjoern.broetz@dlr.de
     orcid:
+  cammarano_diego:
+    name: Cammarano, Diego
+    institute: DLR, Germany
+    email: diego.cammarano@dlr.de
+    github: diegokam
   debeire_kevin:
     name: Debeire, Kevin
     institute: DLR, Germany
diff --git a/esmvaltool/diag_scripts/perfmetrics/portrait_plot.py b/esmvaltool/diag_scripts/perfmetrics/portrait_plot.py
new file mode 100644
index 0000000000..c9f67e0cd6
--- /dev/null
+++ b/esmvaltool/diag_scripts/perfmetrics/portrait_plot.py
@@ -0,0 +1,513 @@
+"""Overview plot for performance metrics.
+
+Description
+-----------
+This diagnostic provides plot functionalities for performance metrics.
+The multi model overview heatmap might be useful for different
+tasks and therefore this diagnostic tries to be as flexible as possible.
+X and Y axes, grouping parameter and splits for each rectangle can be
+configured in the recipe. All *_by parameters can be set to any metadata
+key. To split by 'reference' this key needs to be set as extra_facet in the recipe.
+
+Author
+------
+Lukas Ruhe (Universität Bremen, Germany)
+Diego Cammarano
+
+Configuration parameters through recipe:
+----------------------------------------
+normalize: str or None, optional
+    ('mean', 'median', 'centered_mean', 'centered_median', None).
+    Subtract median/mean if centered. Divide by median/mean if not None.
+    By default 'centered_median'.
+distance_metric: str or None, optional
+    A method for the distance_metric preprocessor can be set, to apply it to
+    the input data along all axes before plotting. If set to None, the input
+    is expected to contain scalar values for each input file. By default, None.
+x_by: str, optional
+    Metadata key for x coordinate.
+    By default 'alias'.
+y_by: str, optional
+    Metadata key for y coordinate.
+    By default 'variable_group'.
+group_by: str, optional
+    Metadata key for grouping.
+    Grouping is always applied in x direction. Can be set to None to skip
+    grouping into subplots.
+    By default 'project'.
+split_by: str, optional
+    The rectangles can be split into 2-4 triangles. This is used to show
+    metrics for different references. In this case there is no need to change
+    this parameter. Multiple variables can be set in the recipe with `split`
+    assigned as extra_facet to label the different references. Data without
+    a split assigned will be plotted as main rectangles; this can be changed
+    by setting the default_split parameter.
+    By default 'split'.
+default_split: str, optional
+    Data labeled with this string will be used as main rectangles. All other
+    splits will be plotted as overlays. This can be used to choose the base
+    reference, while all references are labeled for the legend.
+    By default None.
+plot_legend: bool, optional
+    If True, a legend is plotted when multiple splits are given.
+    By default True.
+legend: dict, optional
+    Customize if, how and where the legend is plotted. The 'best' position
+    and size of the legend depend on multiple parameters of the figure
+    (i.e. lengths of labels, aspect ratio of the plots...).
And might require + manual adjustment of `x`, `y` and `size` to fit the figure layout. + Keys (each optional) that will be handled are: + position: str or None, optional + Position of the legend. Can be 'right' or 'left'. Or set to None to + disable plotting the legend. By default 'right'. + x_offset: float, optional + Manually adjust horizontal position to save space or fix overlap. + Number given in Inches. By default 0. + y_offset: float, optional + Manually adjust vertical position to save space or fix overlap. + Number given in Inches. By default 0. + size: float, optional + Size of the legend in Inches. By default 0.3. +plot_kwargs: dict, optional + Dictionary that gets passed as kwargs to `matplotlib.pyplot.imshow()`. + Colormaps will be converted to 11 discrete steps automatically. Default + colormap RdYlBu_r and limits vmin=-0.5, vmax=0.5 can be changed using + keywords like: cmap, vmin, vmax. + By default {}. +cbar_kwargs: dict, optional + Dictionary that gets passed to `matplotlib.pyplot.colorbar()`. + E.g. label, ticks... + By default {}. +plot_properties: dict, optional + Dictionary that gets passed to `matplotlib.axes.Axes.set()`. + Subplots can be widely customized. For a full list of + properties see: + https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set.html + E.g. xlabel, ylabel, yticklabels, xmargin... + By default {}. +nan_color: str or None, optional + Matplotlib named color or hexcode for NaN values. If set to None, + no triagnles are plotted for NaN values. + By default 'white'. +figsize: list(float), optional + [width, height] of the figure in inches. The final figure will be saved with + bbox_inches="tight", which can change the resulting aspect ratio. + By default [5, 3]. +dpi: int, optional + Dots per inch for the figure. By default 300. +""" + +import itertools +import logging +from pathlib import Path + +import iris +import matplotlib as mpl +import matplotlib.pyplot as plt +import numpy as np +import xarray as xr +from esmvalcore import preprocessor as pp +from matplotlib import patches +from mpl_toolkits.axes_grid1 import ImageGrid + +from esmvaltool.diag_scripts.shared import ( + get_diagnostic_filename, + get_plot_filename, + group_metadata, + run_diagnostic, + select_metadata, +) + +log = logging.getLogger(__name__) + + +def unify_limits(grid): + """Set same limits for all subplots.""" + vmin, vmax = np.inf, -np.inf + images = [ax.get_images()[0] for ax in grid] + for img in images: + vmin = min(vmin, img.get_clim()[0]) + vmax = max(vmax, img.get_clim()[1]) + for img in images: + img.set_clim(vmin, vmax) + + +def plot_matrix(data, row_labels, col_labels, axe, plot_kwargs): + """Create an image for given data.""" + img = axe.imshow(data, **plot_kwargs) + # Show all ticks and label them with the respective list entries. + axe.set_xticks(np.arange(data.shape[1]), labels=col_labels) + axe.set_yticks(np.arange(data.shape[0]), labels=row_labels) + # Rotate the tick labels and set their alignment. + plt.setp( + axe.get_xticklabels(), + rotation=90, + ha="right", + va="center", + rotation_mode="anchor", + ) + # Turn spines off and create white grid. + # ax.spines[:].set_visible(False) + axe.set_xticks(np.arange(data.shape[1] + 1) - 0.5, minor=True) + axe.set_yticks(np.arange(data.shape[0] + 1) - 0.5, minor=True) + axe.grid(which="minor", color="black", linestyle="-", linewidth=1) + axe.tick_params(which="both", bottom=False, left=False) + return img + + +def remove_reference(metas): + """Remove reference for metric from list of metadata. 
+ + NOTE: list() creates a copy with same references to allow removing in place + """ + for meta in list(metas): + if meta.get("reference_for_metric", False): + metas.remove(meta) + + +def add_split_none(cfg, metas): + """List of metadata with split=None if no split is given.""" + for meta in metas: + if cfg["split_by"] not in meta: + meta[cfg["split_by"]] = None + + +def open_file(metadata, **selection): + """Try to find a single file for selection and return data. + + If multiple files are found, raise an error. If no file is found, + return np.nan. + """ + metas = select_metadata(metadata, **selection) + if len(metas) > 1: + raise ValueError(f"Multiple files found for {selection}") + if len(metas) < 1: + log.debug("No Metadata found for %s", selection) + return np.nan + log.warning("Metadata found for %s", selection) + das = xr.open_dataset(metas[0]["filename"]) + varname = list(das.data_vars.keys())[0] + return das[varname].values.item() + # iris.load_cube(metas[0]["filename"]).data + + +def load_data(cfg, metas): + """Load all nc files from metadata into xarray dataset. + + The dataset contains all relevant information for the plot. Coord + names are metadata keys, ordered as x, y, group, split. The default + reference is None, or if all references are named the first from the + list. + """ + coords = { # order matters: x, y, group, split + cfg["x_by"]: list(group_metadata(metas, cfg["x_by"]).keys()), + cfg["y_by"]: list(group_metadata(metas, cfg["y_by"]).keys()), + cfg["group_by"]: list(group_metadata(metas, cfg["group_by"]).keys()), + cfg["split_by"]: list(group_metadata(metas, cfg["split_by"]).keys()), + } + shape = [len(coord) for coord in coords.values()] + var_data = xr.DataArray(np.full(shape, np.nan), dims=list(coords.keys())) + data = xr.Dataset({"var": var_data}, coords=coords) + # loop over each cell (coord combination) and load data if existing + for coord_tuple in itertools.product(*coords.values()): + selection = dict(zip(coords.keys(), coord_tuple)) + data['var'].loc[selection] = open_file(metas, **selection) + # data[coord_tuple] = (list(coords.keys(), value)) + if None in data.coords[cfg["split_by"]].values: + cfg.update({"default_split": None}) + else: + cfg.update({"default_split": data.coords[cfg["split_by"]].values[0]}) + log.debug("using %s as default split", cfg["default_split"]) + log.debug("Loaded Data:") + log.debug(data) + return data + + +def split_legend(cfg, grid, data): + """Create legend for references, based on split coordinate in the dataset. + + Mpl handles axes positions in relative figure coordinates. To anchor the + legend to the origin of the first graph (bottom left) with fixed size, + without messing up the layout for changing figure sizes a few extra steps + are required. + NOTE: maybe `mpl_toolkits.axes_grid1.axes_divider.AxesDivider` simplifies + this a bit by using `append_axes`. 
+ """ + grid[0].get_figure().canvas.draw() # set axes position in figure + size = cfg["legend"].get("size", 0.5) # rect width in physical size (inch) + fig_size = grid[0].get_figure().get_size_inches() # physical figure size + ax_size = (size / fig_size[0], size / fig_size[1]) # legend (fig coords) + gaps = [0.3 / fig_size[0], 0.3 / fig_size[1]] # margins (fig coords) + # anchor legend on origin of first plot or colorbar + anchor = grid[0].get_position().bounds # relative figure coordinates + if cfg["legend"].get("position", "right") == "right": + cbar_x = grid.cbar_axes[0].get_position().bounds[0] + gaps[0] *= 0.8 # compensate colorbar padding + anchor = (cbar_x + gaps[0] + cfg["legend"]["x_offset"], + anchor[1] - gaps[1] - ax_size[1] + cfg["legend"]["y_offset"]) + else: + anchor = (anchor[0] - gaps[0] - ax_size[0] + cfg["legend"]["x_offset"], + anchor[1] - gaps[1] - ax_size[1] + cfg["legend"]["y_offset"]) + # create legend as empty imshow like axes in figure coordinates + axes = {"main": grid[0].get_figure().add_axes([*anchor, *ax_size])} + axes["main"].imshow(np.zeros((1, 1))) # same axes properties as main plot + axes["main"].set_xticks([]) + axes["main"].set_yticks([]) + axes["twiny"], axes["twinx"] = [axes["main"].twiny(), axes["main"].twinx()] + axes["twinx"].set_yticks([]) + axes["twiny"].set_xticks([]) + label_at = [ # order matches get_triangle_nodes (halves and quarters) + axes["main"].set_ylabel, # left + axes["twinx"].set_ylabel, # right + axes["main"].set_xlabel, # bottom + axes["twiny"].set_xlabel, # top + ] + for i, label in enumerate(data.coords[cfg["split_by"]].values): + axes["main"].add_patch( + patches.Polygon(get_triangle_nodes( + i, len(data.coords[cfg["split_by"]].values)), + closed=True, + facecolor=["#bbb", "#ccc", "#ddd", "#eee"][i], + edgecolor="black", + linewidth=0.5, + fill=True)) + label_at[i](label) + + +def overlay_reference(cfg, axe, data, triangle): + """Create triangular overlays for given data and axes.""" + # use same colors as in main plot + cmap = axe.get_images()[0].get_cmap() + norm = axe.get_images()[0].norm + if cfg["nan_color"] is not None: + cmap.set_bad(cfg["nan_color"]) + for i, j in itertools.product(*map(range, data.shape)): + if np.isnan(data[i, j]) and cfg["nan_color"] is None: + continue + color = cmap(norm(data[i, j])) + edges = [(e[0] + j, e[1] + i) for e in triangle] + patch = patches.Polygon( + edges, + closed=True, + facecolor=color, + edgecolor="black", + linewidth=0.5, + fill=True, + ) + axe.add_patch(patch) + + +def plot_group(cfg, axe, data, title=None): + """Create matrix for one subplot in ax using plt.imshow. + + by default split None is used, if all splits are named the first is + used. Other splits will be added by overlaying triangles. + """ + split = data.sel({cfg["split_by"]: cfg["default_split"]}) + print(f"Plotting group {title}") + print(split) + plot_matrix( + split.values.T, # 2d numpy array + split.coords[cfg["y_by"]].values, # y_labels + split.coords[cfg["x_by"]].values, # x_labels + axe, + cfg["plot_kwargs"], + ) + if title is not None: + axe.set_title(title) + axe.set(**cfg["axes_properties"]) + + +def get_triangle_nodes(position, total_count=2): + """Return list of nodes with relative x, y coordinates. + + The nodes of the triangle are given as list of three tuples. Each tuples + contains relative coordinates (-0.5 to +0.5). For total of <= 2 a top left + (position=0) and bottom right (position=1) rectangle is returned. + For higher counts (3 or 4) one quartile is returned for each position. 
+ NOTE: Order matters. Ensure axis labels for the legend match when changing. + """ + if total_count < 3: + halves = [ + [(0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)], # top left + [(0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)], # bottom right + ] + return halves[position] + quarters = [ + [(-0.5, -0.5), (0, 0), (-0.5, 0.5)], # left + [(0.5, -0.5), (0, 0), (0.5, 0.5)], # right + [(-0.5, 0.5), (0, 0), (0.5, 0.5)], # bottom + [(-0.5, -0.5), (0, 0), (0.5, -0.5)], # top + ] + return quarters[position] + + +def plot_overlays(cfg, grid, data): + """Call overlay_reference for each split in data and each group in grid.""" + split_count = data.shape[3] + group_count = data.shape[2] + for i in range(group_count): + if split_count < 2: + log.debug("No additional splits for overlay.") + break + if split_count > 4: + log.warning("Too many splits for overlay, only 3 will be plotted.") + group_data = data.isel({cfg["group_by"]: i}) + group_data = group_data.dropna(cfg["x_by"], how="all") + for sss in range(split_count): + split = group_data.isel({cfg["split_by"]: sss}) + split_label = split.coords[cfg["split_by"]].values.item() + if split_label == cfg["default_split"]: + log.debug("Skipping default split for overlay.") + continue + nodes = get_triangle_nodes(sss, split_count) + overlay_reference(cfg, grid[i], split.values.T, nodes) + + +def plot(cfg, data): + """Create figure with subplots for each group. + + sets same color range and overlays additional references based on + the content of data (xr.DataArray) + """ + fig = plt.figure(1, cfg.get("figsize", (5.5, 3.5))) + group_count = len(data.coords[cfg["group_by"]]) + grid = ImageGrid( + fig, + 111, # similar to subplot(111) + cbar_mode="single", + cbar_location="right", + cbar_pad=0.1, + cbar_size=0.2, + nrows_ncols=(1, group_count), + axes_pad=0.1, + ) + # remap colorbar to 10 discrete steps + cmap = mpl.cm.get_cmap(cfg.get("cmap", "RdYlBu_r"), 10) + cfg["plot_kwargs"]["cmap"] = cmap + for i in range(group_count): + group = data.isel({cfg["group_by"]: i}) + group = group.dropna(cfg["x_by"], how="all") + title = None + if group_count > 1: + title = group.coords[cfg["group_by"]].values.item() + plot_group(cfg, grid[i], group, title=title) + # use same colorrange and colorbar for all subplots: + unify_limits(grid) + # set cb of first image as single cb for the figure + grid.cbar_axes[0].colorbar(grid[0].get_images()[0], **cfg["cbar_kwargs"]) + if data.shape[3] > 1: + plot_overlays(cfg, grid, data) + if cfg["plot_legend"] and data.shape[3] > 1: + split_legend(cfg, grid, data) + basename = "portrait_plot" + fname = get_plot_filename(basename, cfg) + plt.savefig(fname, bbox_inches="tight", dpi=cfg["dpi"]) + log.info("Figure saved:") + log.info(fname) + + +def normalize(array, method, dims): + """Divide and shift values along dims depending on method.""" + shift = 0 + norm = 1 + if "mean" in method: + norm = array.mean(dim=dims) + elif "median" in method: + norm = array.median(dim=dims) + if "centered" in method: + shift = norm + normalized = (array - shift) / norm + return normalized + + +def apply_distance_metric(cfg, metas): + """Optionally apply preproc method. + + reference_for_metric facet required. + """ + if not cfg["distance_metric"]: + return + for y_metas in group_metadata(metas, cfg["y_by"]).values(): + try: # TODO: add select_single_metadata to shared? 
+ reference = select_metadata(y_metas, reference_for_metric=True)[0] + except IndexError as exc: + raise IndexError("No reference found for metric.") from exc + ref_cube = iris.load_cube(reference["filename"]) + for meta in y_metas: + if meta.get("reference_for_metric", False): + continue # skip distance to itself + cube = iris.load_cube(meta["filename"]) + distance = pp.distance_metric([cube], + reference=ref_cube, + metric=cfg["distance_metric"]) + basename = f"{Path(meta['filename']).stem}" + basename += f"{cfg['distance_metric']}" + fname = get_diagnostic_filename(basename, cfg) + iris.save(distance, fname) + log.info("Distance metric saved: %s", fname) + # TODO: adjust all relevant meta data + meta["filename"] = fname + + +def set_defaults(cfg): + """Set default values for most important config parameters.""" + cfg.setdefault("normalize", "centered_median") + cfg.setdefault("x_by", "alias") + cfg.setdefault("y_by", "variable_group") + cfg.setdefault("group_by", "project") + cfg.setdefault("split_by", "split") # extra facet + cfg.setdefault("default_split", None) + cfg.setdefault("cbar_kwargs", {}) + cfg.setdefault("axes_properties", {}) + cfg.setdefault("nan_color", 'white') + cfg.setdefault("figsize", (7.5, 3.5)) + cfg.setdefault("dpi", 300) + cfg.setdefault("plot_legend", True) + cfg.setdefault("plot_kwargs", {}) + cfg["plot_kwargs"].setdefault("cmap", "RdYlBu_r") + cfg["plot_kwargs"].setdefault("vmin", -0.5) + cfg["plot_kwargs"].setdefault("vmax", 0.5) + cfg.setdefault("legend", {}) + cfg["legend"].setdefault("x_offset", 0) + cfg["legend"].setdefault("y_offset", 0) + cfg["legend"].setdefault("size", 0.3) + + +def sort_data(cfg, dataset): + """Sort the dataset along by custom or alphabetical order.""" + # custom order: dsimport xarray as xr + # import pandas as pd + # order = ['value3', 'value1', 'value2'] # replace by custom order + # ds[cfg['y_by']] = pd.Categorical(ds[cfg['y_by']], categories=order, + # ordered=True) + # ds = ds.sortby('y_by') + # sort alphabetically (caseinsensitive) + dataset = dataset.sortby([ + dataset[cfg["x_by"]].str.lower(), dataset[cfg["y_by"]].str.lower(), + dataset[cfg["group_by"]].str.lower(), + dataset[cfg["split_by"]].str.lower() + ]) + # apply custom orders if given: + # if cfg.get("x_order"): + # dataset = dataset.reindex({cfg["x_by"]: cfg["x_order"]}) + return dataset + + +def main(cfg): + """Run the diagnostic.""" + set_defaults(cfg) + metas = list(cfg["input_data"].values()) + remove_reference(metas) + add_split_none(cfg, metas) + dataset = load_data(cfg, metas) + dataset = sort_data(cfg, dataset) + if cfg["normalize"] is not None: + dataset["var"] = normalize(dataset["var"], cfg["normalize"], + [cfg["x_by"], cfg["group_by"]]) + plot(cfg, dataset["var"]) + + +if __name__ == "__main__": + with run_diagnostic() as config: + main(config) diff --git a/esmvaltool/recipes/recipe_perfmetrics_CMIP5_python.yml b/esmvaltool/recipes/recipe_perfmetrics_CMIP5_python.yml new file mode 100644 index 0000000000..6ad21ee5f5 --- /dev/null +++ b/esmvaltool/recipes/recipe_perfmetrics_CMIP5_python.yml @@ -0,0 +1,276 @@ +# ESMValTool +# recipe_perfmetrics_CMIP5.yml +--- +documentation: + title: Performance metrics for essential climate variables in CMIP5 + + description: | + Recipe for plotting the performance metrics for the CMIP5 datasets, + including the standard ECVs as in Gleckler et al., and some additional + variables (like ozone, sea-ice, aerosol...) 
+ + authors: + - winterstein_franziska + - righi_mattia + - eyring_veronika + - ruhe_lukas + + maintainer: + - ruhe_lukas + + references: + - gleckler08jgr + + projects: + - esmval + - embrace + - crescendo + - c3s-magic + - cmug + +preprocessors: + ppNOLEV1: + regrid: + target_grid: reference_dataset + scheme: linear + mask_fillvalues: + threshold_fraction: 0.95 + # multi_model_statistics: + # span: overlap + # statistics: [mean, median] + # exclude: [reference_dataset] + + rmse: &rmse + custom_order: true + regrid: + # target_grid: reference_dataset + scheme: linear + target_grid: 3x3 + # scheme: nearest + regrid_time: + calendar: standard + frequency: mon + mask_fillvalues: + threshold_fraction: 0.95 + # multi_model_statistics: + # span: overlap + # statistics: [mean, median] + # exclude: [reference_dataset, alternative_dataset] + distance_metric: + metric: rmse + + pp500: + # <<: *rmse + custom_order: true + regrid: + target_grid: 3x3 + scheme: linear + extract_levels: + levels: 50000 + scheme: linear + regrid_time: + calendar: standard + frequency: mon + mask_fillvalues: + threshold_fraction: 0.95 + distance_metric: + metric: rmse + +diagnostics: + perfmetrics: + description: Near-surface air temperature + themes: + - phys + realms: + - atmos + variables: + tas: &var_default + preprocessor: rmse + reference_dataset: ERA-Interim + # alternative_dataset: NCEP-NCAR-R1 + mip: Amon + split: Ref + y_label: tas_Glob + project: CMIP5 + exp: historical + ensemble: r1i1p1 + start_year: 2000 + end_year: 2002 + additional_datasets: + - {dataset: ERA-Interim, project: OBS6, type: reanaly, + version: 1, tier: 3, reference_for_metric: true} + tas_alt: + <<: *var_default + short_name: tas + y_label: tas_Glob + split: Alt + additional_datasets: + - {dataset: NCEP-NCAR-R1, project: OBS6, type: reanaly, + version: 1, tier: 2, reference_for_metric: true} + pr: + <<: *var_default + y_label: pr_Glob + reference_dataset: GPCP-V2.2 + additional_datasets: + - {dataset: GPCP-V2.2, project: obs4MIPs, level: L3, tier: 1, reference_for_metric: true} + + # swcre: + # <<: *var_default + # derive: true + # force_derivation: false + # y_label: swcre_Glob + # additional_datasets: + # - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, + # tier: 1, reference_for_metric: true} + + # rlut: + # <<: *var_default + # y_label: rlut_Glob + # additional_datasets: + # - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, + # tier: 1, reference_for_metric: true, start_year: 2000, end_year: 2002} 34 vs 36 month dataa.. 2 months missing in obs?? 
+ + zg: + <<: *var_default + y_label: zg_Glob-500 + preprocessor: pp500 + additional_datasets: + - {dataset: ERA-Interim, project: OBS6, type: reanaly, + version: 1, tier: 3, reference_for_metric: true} + + additional_datasets: + - {dataset: ACCESS1-0} + - {dataset: ACCESS1-3} + - {dataset: bcc-csm1-1} + - {dataset: bcc-csm1-1-m} + - {dataset: BNU-ESM} + - {dataset: CanCM4} + - {dataset: CanESM2} + - {dataset: CCSM4} + - {dataset: CESM1-BGC} + - {dataset: CESM1-CAM5} + - {dataset: CESM1-FASTCHEM} + - {dataset: CESM1-WACCM} + - {dataset: CMCC-CESM} + - {dataset: CMCC-CM} + - {dataset: CMCC-CMS} + - {dataset: CNRM-CM5} + # - {dataset: CNRM-CM5-2} # not in example plot + - {dataset: CSIRO-Mk3-6-0} + - {dataset: EC-EARTH, ensemble: r6i1p1} + - {dataset: FGOALS-g2} + # - {dataset: FGOALS-s2} + - {dataset: FIO-ESM} + - {dataset: GFDL-CM2p1} + - {dataset: GFDL-CM3} + - {dataset: GFDL-ESM2G} + - {dataset: GFDL-ESM2M} + - {dataset: GISS-E2-H, ensemble: r1i1p2} + # - {dataset: GISS-E2-H-CC} # not in example plot + - {dataset: GISS-E2-R, ensemble: r1i1p2} + # - {dataset: GISS-E2-R-CC} # not in example plot + - {dataset: HadCM3} + - {dataset: HadGEM2-AO} + - {dataset: HadGEM2-CC} + - {dataset: HadGEM2-ES} + - {dataset: inmcm4} + - {dataset: IPSL-CM5A-LR} + - {dataset: IPSL-CM5A-MR} + - {dataset: IPSL-CM5B-LR} + - {dataset: MIROC4h} + - {dataset: MIROC5} + - {dataset: MIROC-ESM} + - {dataset: MIROC-ESM-CHEM} + - {dataset: MPI-ESM-LR} + # - {dataset: MPI-ESM-MR} # not in example plot + - {dataset: MPI-ESM-P} + - {dataset: MRI-CGCM3} + - {dataset: MRI-ESM1} + - {dataset: NorESM1-M} + - {dataset: NorESM1-ME} + # - {dataset: ERA-Interim, project: OBS6, type: reanaly, + # version: 1, tier: 3, reference_for_metric: true} + # - {dataset: NCEP-NCAR-R1, project: OBS6, type: reanaly, + # version: 1, tier: 2} + scripts: + portrait: + script: perfmetrics/portrait_plot.py + y_by: y_label + x_by: dataset + plot_kwargs: + vmin: -0.5 + vmax: 0.5 + normalize: centered_median # default + default_split: Alt + + ### pr: PRECIPITATION ####################################################### + # pr: + # description: Precipitation + # themes: + # - phys + # realms: + # - atmos + # variables: + # pr: + # preprocessor: ppNOLEV1 + # reference_dataset: GPCP-V2.2 + # mip: Amon + # project: CMIP5 + # exp: historical + # ensemble: r1i1p1 + # start_year: 2000 + # end_year: 2002 + # additional_datasets: + # - {dataset: ACCESS1-0} + # - {dataset: ACCESS1-3} + # - {dataset: bcc-csm1-1} + # - {dataset: bcc-csm1-1-m} + # - {dataset: BNU-ESM} + # - {dataset: CanCM4} + # - {dataset: CanESM2} + # - {dataset: CCSM4} + # - {dataset: CESM1-BGC} + # - {dataset: CESM1-CAM5} + # - {dataset: CESM1-CAM5-1-FV2} + # - {dataset: CESM1-FASTCHEM} + # - {dataset: CESM1-WACCM} + # - {dataset: CMCC-CESM} + # - {dataset: CMCC-CM} + # - {dataset: CMCC-CMS} + # - {dataset: CNRM-CM5} + # - {dataset: CNRM-CM5-2} + # - {dataset: CSIRO-Mk3-6-0} + # - {dataset: EC-EARTH, ensemble: r6i1p1} + # - {dataset: FGOALS-g2} + # - {dataset: FIO-ESM} + # - {dataset: GFDL-CM2p1} + # - {dataset: GFDL-CM3} + # - {dataset: GFDL-ESM2G} + # - {dataset: GFDL-ESM2M} + # - {dataset: GISS-E2-H, ensemble: r1i1p2} + # - {dataset: GISS-E2-H-CC} + # - {dataset: GISS-E2-R, ensemble: r1i1p2} + # - {dataset: GISS-E2-R-CC} + # - {dataset: HadCM3} + # - {dataset: HadGEM2-AO} + # - {dataset: HadGEM2-CC} + # - {dataset: HadGEM2-ES} + # - {dataset: inmcm4} + # - {dataset: IPSL-CM5A-LR} + # - {dataset: IPSL-CM5A-MR} + # - {dataset: IPSL-CM5B-LR} + # - {dataset: MIROC4h} + # - {dataset: MIROC5} + # 
- {dataset: MIROC-ESM} + # - {dataset: MIROC-ESM-CHEM} + # - {dataset: MPI-ESM-LR} + # - {dataset: MPI-ESM-MR} + # - {dataset: MPI-ESM-P} + # - {dataset: MRI-CGCM3} + # - {dataset: MRI-ESM1} + # - {dataset: NorESM1-M} + # - {dataset: NorESM1-ME} + # - {dataset: GPCP-V2.2, project: obs4MIPs, level: L3, tier: 1} + # scripts: + # grading: + # <<: *grading_settings diff --git a/esmvaltool/recipes/recipe_perfmetrics_python.yml b/esmvaltool/recipes/recipe_perfmetrics_python.yml new file mode 100644 index 0000000000..1d044a2829 --- /dev/null +++ b/esmvaltool/recipes/recipe_perfmetrics_python.yml @@ -0,0 +1,243 @@ +# ESMValTool +# +--- +documentation: + title: Performance metrics plots. + description: > + Compare performance of model simulations to a reference dataset. + authors: + - ruhe_lukas + # - cammarano_diego + maintainer: + - ruhe_lukas + references: + - eyring21ipcc + + +preprocessors: + default: &default_preproc + custom_order: true + regrid: + target_grid: 3x3 + scheme: nearest + # for icon: + # scheme: + # reference: esmf_regrid.schemes:ESMFAreaWeighted + regrid_time: + calendar: standard + frequency: mon + distance_metric: + metric: pearsonr + rmse: + <<: *default_preproc + distance_metric: + metric: rmse + + +cmip6_default: &cmip6 + grid: gn + ensemble: r1i1p1f1 + project: CMIP6 + timerange: '1990/1992' + + +cmip6_examples: &cmip6_examples + - {<<: *cmip6, dataset: MRI-ESM2-0} + - {<<: *cmip6, dataset: NESM3} + - {<<: *cmip6, dataset: NorCPM1, institute: NCC, ensemble: r10i1p1f1} + - {<<: *cmip6, dataset: NorESM2-LM, institute: NCC} + - {<<: *cmip6, dataset: NorESM2-MM, institute: NCC} + - {<<: *cmip6, dataset: SAM0-UNICON} + - {<<: *cmip6, dataset: TaiESM1} + - {<<: *cmip6, dataset: UKESM1-0-LL, ensemble: r1i1p1f2} + +cmip6_remaining: &cmip6_remaining + - {<<: *cmip6, dataset: ACCESS-CM2} + - {<<: *cmip6, dataset: ACCESS-ESM1-5, institute: CSIRO} + - {<<: *cmip6, dataset: AWI-CM-1-1-MR} + - {<<: *cmip6, dataset: AWI-ESM-1-1-LR} + - {<<: *cmip6, dataset: CESM2-FV2, institute: NCAR} + - {<<: *cmip6, dataset: CESM2-WACCM-FV2, institute: NCAR} + - {<<: *cmip6, dataset: CESM2-WACCM, institute: NCAR} + - {<<: *cmip6, dataset: CIESM, grid: gr} + - {<<: *cmip6, dataset: CMCC-CM2-HR4} + - {<<: *cmip6, dataset: CMCC-CM2-SR5} + - {<<: *cmip6, dataset: CMCC-ESM2} + - {<<: *cmip6, dataset: CNRM-CM6-1-HR, ensemble: r1i1p1f2, grid: gr} + - {<<: *cmip6, dataset: CNRM-CM6-1, ensemble: r1i1p1f2, grid: gr} + - {<<: *cmip6, dataset: CNRM-ESM2-1, ensemble: r1i1p1f2, grid: gr} + - {<<: *cmip6, dataset: E3SM-1-0, grid: gr} + - {<<: *cmip6, dataset: E3SM-1-1-ECA, institute: E3SM-Project, grid: gr} + - {<<: *cmip6, dataset: E3SM-1-1, institute: E3SM-Project, grid: gr} + - {<<: *cmip6, dataset: EC-Earth3-AerChem, grid: gr} + - {<<: *cmip6, dataset: EC-Earth3-CC, grid: gr} + - {<<: *cmip6, dataset: EC-Earth3-Veg-LR, grid: gr} + - {<<: *cmip6, dataset: EC-Earth3-Veg, grid: gr} + - {<<: *cmip6, dataset: EC-Earth3, grid: gr} + - {<<: *cmip6, dataset: FGOALS-f3-L, grid: gr} + - {<<: *cmip6, dataset: FGOALS-g3} + - {<<: *cmip6, dataset: GFDL-CM4, grid: gr1} + - {<<: *cmip6, dataset: GFDL-ESM4, grid: gr1} + - {<<: *cmip6, dataset: GISS-E2-1-G} + - {<<: *cmip6, dataset: GISS-E2-1-H} + - {<<: *cmip6, dataset: HadGEM3-GC31-LL, ensemble: r1i1p1f3} + - {<<: *cmip6, dataset: HadGEM3-GC31-MM, ensemble: r1i1p1f3} + - {<<: *cmip6, dataset: IITM-ESM} + - {<<: *cmip6, dataset: INM-CM4-8, grid: gr1} + - {<<: *cmip6, dataset: INM-CM5-0, grid: gr1} + - {<<: *cmip6, dataset: IPSL-CM6A-LR, grid: gr} + - {<<: *cmip6, dataset: 
KIOST-ESM, grid: gr1} + - {<<: *cmip6, dataset: MIROC-ES2L, ensemble: r1i1p1f2} + - {<<: *cmip6, dataset: MIROC6} + - {<<: *cmip6, dataset: MPI-ESM-1-2-HAM} + - {<<: *cmip6, dataset: MPI-ESM1-2-HR} + - {<<: *cmip6, dataset: MPI-ESM1-2-LR} + +datasets: + *cmip6_examples + # ICON + # - {project: ICON, dataset: ICON, exp: icon-2.6.1_atm_amip_R2B4_r1v1i1p1l1f1, timerange: '19900101/19930101'} + + # # grid wrong? missing data + # # - {<<: *cmip6, dataset: MCM-UA-1-0} + # # - {<<: *cmip6, dataset: GISS-E2-1-G-CC} + # # - {<<: *cmip6, dataset: IPSL-CM5A2-INCA, grid: gr} + # # - {<<: *cmip6, dataset: KACE-1-0-G, grid: gr} + + +diagnostics: + simple: + variables: &simple_variables + pr: &var_default + preprocessor: default + mip: Amon + exp: historical + additional_datasets: &ref + # - {dataset: ERA5, project: native6, type: reanaly, version: v1, tier: 3, reference_for_metric: true} + - {<<: *cmip6, dataset: BCC-ESM1, reference_for_metric: true, timerange: 1990/1992} + tas: + <<: *var_default + ps: + <<: *var_default + clt: + <<: *var_default + scripts: + perfmetrics: + script: perfmetrics/portrait_plot.py + y_by: variable_group + + complex: + description: > + A more complex example with extra variables to support different reference datasets, + groups for datasets and some plot customization. + variables: + pr: + <<: *var_default + y_label: Precipitation + pr_vs_access: &var_default_ref2 + <<: *var_default + short_name: pr + split: "ACCESS" + y_label: Precipitation + additional_datasets: &ref_access + - {<<: *cmip6, dataset: ACCESS-ESM1-5, institute: CSIRO, reference_for_metric: true} + pr_vs_era: + <<: *var_default + short_name: pr + split: "ERA5" + y_label: Precipitation + additional_datasets: &ref_era + - {dataset: ERA5, project: native6, type: reanaly, version: v1, + tier: 3, timerange: 1990/1992, reference_for_metric: true} + tas: + <<: *var_default + y_label: Temperature + # additional_datasets: + # - {dataset: NCEP-NCAR-R1, project: OBS6, type: reanaly, version: 1, tier: 2} + tas_vs_access: + short_name: tas + <<: *var_default_ref2 + y_label: Temperature + rlut: + <<: *var_default + y_label: "LW radiation out" + + ps: + <<: *var_default + y_label: "Surface pressure" + # sfcWind: + # <<: *var_default + clt: + <<: *var_default + y_label: "Cloud cover" + clt_vs_esacci: + <<: *var_default + short_name: clt + split: "ESACCI" + y_label: "Cloud cover" + additional_datasets: + - {reference_for_metric: true, dataset: ESACCI-CLOUD, project: OBS, + type: sat, version: AVHRR-fv3.0, tier: 2, timerange: '1990/1992'} + psl: + <<: *var_default + y_label: "Sea level pressure" + + + additional_datasets: + # - {dataset: ERA5, project: native6, type: reanaly, + # version: v1, tier: 3, reference_for_metric: true} + - {<<: *cmip6, dataset: MPI-ESM-MR, type: exp, project: CMIP5, exp: historical, ensemble: r1i1p1} + + scripts: + perfmetrics: + script: perfmetrics/portrait_plot.py + x_by: dataset + y_by: y_label + # split_by is the 'split' extra facet by default (set in variables) + group_by: project + additional_datasets: + - {<<: *cmip6, dataset: MPI-ESM-MR, type: exp, project: CMIP5, exp: historical, ensemble: r1i1p1} + plot_kwargs: + vmin: 0.5 + vmax: 1.0 + cbar_kwargs: + label: "Pearson correlation coefficient" + ticks: [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] + extend: both + cmap: "Reds" + metrics: + variables: + pr: + <<: *var_default + y_label: "Precipitation" + pr_rmse: + short_name: ps + split: "RMSE" + y_label: "Precipitation" + <<: *var_default + preprocessor: rmse + tas: + <<: *var_default + y_label: 
"Temperature" + tas_rmse: + short_name: tas + split: "RMSE" + y_label: "Temperature" # extra_facet to share y tick + <<: *var_default + preprocessor: rmse + clt: + <<: *var_default + y_label: "Cloud cover" + clt_rmse: + short_name: clt + y_label: "Cloud cover" + split: "RMSE" + <<: *var_default + preprocessor: rmse + scripts: + perfmetrics: + script: perfmetrics/portrait_plot.py + y_by: y_label + plot_kwargs: + vmin: 0 + vmax: 1.0