Protocol for extending the variables pane #560

dfalbel · 2024-10-02T13:59:24Z

This PR is a proposal of a mechanism that allows package authors to implement custom behavior for objects in the variables pane.

A description of how the protocol works is available here: https://github.com/posit-dev/ark/blob/daf9348affc0f1b190928a2fb0cad5bfadaf263c/doc/variables-pane-extending.md

See also #561 for the custom inspect behavior.

Motivation

The main motivation is to allow package authors to improve the display of objects whose internal R data structure does not show useful information to users. For example, reticulate R objects are environments containing a flag and an external pointer (x <- reticulate::py_eval("1", convert = FALSE)):

Or torch tensors (x <- torch::torch_randn(10, 10)):

Or xml2 documents (x <- xml2::read_xml("<foo> <bar> text <baz/> </bar> </foo>")):

How it's implemented

Package authors can implement a set of S3-like methods, each customizing a specific part of the UI.
Methods are discovered by ark when packages are loaded and lazily stored in an environment. Package authors don't need to export these methods in their packages. See here for the description of methods.

When responding to events from the variables pane, we try to apply the custom method first. If the method works correctly we use its value; otherwise we fall back to the default implementation.
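The fallback behaviour described above can be sketched in Rust roughly as follows. This is a minimal, std-only illustration and not Ark's actual code: `CustomMethod`, `display_value`, `default_display_value` and the example registry are hypothetical names, objects are modelled as plain strings, and a misbehaving package method is modelled as an `Err`.

```rust
use std::collections::HashMap;

/// A package-provided method: takes a textual stand-in for the object and
/// either produces a display value or fails. (Hypothetical signature.)
type CustomMethod = fn(&str) -> Result<String, String>;

/// Built-in behaviour used when no method is registered or the method fails.
fn default_display_value(object: &str) -> String {
    format!("<{object}>")
}

/// Try the custom method registered for `class`; fall back to the default
/// implementation when there is none or when it returns an error.
fn display_value(methods: &HashMap<String, CustomMethod>, class: &str, object: &str) -> String {
    match methods.get(class) {
        Some(method) => match method(object) {
            Ok(value) => value,
            Err(_) => default_display_value(object),
        },
        None => default_display_value(object),
    }
}

/// A well-behaved method, as a torch-like package might provide.
fn tensor_method(_object: &str) -> Result<String, String> {
    Ok("Float [1:10, 1:10]".to_string())
}

/// A method that always fails, to exercise the fallback path.
fn broken_method(_object: &str) -> Result<String, String> {
    Err("method failed".to_string())
}

/// Registry with one working and one failing method (illustrative only).
fn example_registry() -> HashMap<String, CustomMethod> {
    let mut methods: HashMap<String, CustomMethod> = HashMap::new();
    methods.insert("torch_tensor".to_string(), tensor_method);
    methods.insert("broken".to_string(), broken_method);
    methods
}
```

With this registry, `display_value(&example_registry(), "torch_tensor", "x")` uses the custom method, while the `"broken"` class and any unregistered class both end up with the default representation.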

Methods are explicitly implemented for Ark, so it's expected that package authors will take extra care to make sure they work correctly with Positron.

Comparison to RStudio

RStudio uses the utils::str S3 generic in a few situations to obtain a value's description; see the usages of ValueFromStr in e.g. https://github.com/rstudio/rstudio/blob/2175496934a54e21979461a1c67eb13ad5048cf2/src/cpp/session/modules/SessionEnvironment.R#L122

RStudio wraps these calls in many guardrails, though, because str methods are commonly implemented and package authors might not be aware of bugs that only surface when the methods are called from RStudio.

lionel-

That seems like a good idea!

Before reviewing the implementation, I'd just like to discuss a potential simplification of the extension mechanism. What if we had a single generic ps_variable_proxy() that would be in charge of converting an object to a base type representation (data frame, vector, list, environment) that Ark would be able to display in the usual way? This is consistent with how genericity works in the tidyverse via vctrs proxies.

The display type could be set to the first element of the class vector, but if that's too inconvenient we could have an additional method for it as proposed here.

Laziness for deeply/infinitely nested objects:

Could be supported via environments. The only downside is that environments have the restriction of uniquely named elements, but that seems reasonable for object internals since that's consistent with struct-like types?
Or we could allow list-proxies containing classed objects that themselves have a proxy method.

What do you think?

dfalbel · 2024-10-03T11:48:11Z

I like the idea of a single ps_variable_proxy.

I think I'd still want to customize the display value, though. For a torch_tensor, for example, the display value should be something like Float [1:10, 1:10]. The closest proxy would be a length-1 character vector with that text, which would still be displayed with quotes around it: "Float [1:10, 1:10]".

I'm fine with display type = class(x)[1], we could remove display_type.

So the proxy would replace has_children(), get_children() and get_child_at().

My main concern with a single proxy implementation is performance. The way the variables pane is currently implemented, we don't keep any state after the representations are sent to the front-end, and we only get back a vector of access keys from the front-end when we want to expand a node.

Consider the second option, where variable_proxy() returns a list-proxy containing classed objects that themselves implement variable_proxy(). Suppose that A is an object that implements variable_proxy() and that its children are also objects that implement variable_proxy(). (For instance, A could be a reticulate object representing a Python dictionary and B a dictionary contained in A.)

When we click on B to get its children, we'll receive a request from the front-end containing [A, B].
To compute the children of B, we'll need to:

ps_variable_proxy(A)
find B there and then
ps_variable_proxy(B)

Root
└── A
    ├── B
    │   ├── B1
    │   └── B2
    └── C

If A is large, building its variable_proxy() can be time consuming, and we'll need to rebuild it for every nested child one expands.

The main reason for having two separate methods, get_children() and get_child_at(), is that there might be an efficient way of finding B in A without building the entire list of possible children.
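To make the cost difference concrete, here is a small std-only Rust sketch of resolving an access-key path such as [A, B] in the two ways discussed. It is a toy model under stated assumptions, not Ark's implementation: `Node`, `Stats` and the `resolve_*` functions are hypothetical, and the "cost" of a proxy is simply the number of child representations that get built.

```rust
use std::cell::Cell;
use std::collections::BTreeMap;

/// A toy object tree standing in for nested R objects.
struct Node {
    children: BTreeMap<String, Node>,
}

/// Counts how many child representations were materialised.
struct Stats {
    built: Cell<usize>,
}

impl Stats {
    fn new() -> Self {
        Stats { built: Cell::new(0) }
    }
}

/// Analogue of `get_children()`: materialise every child of `node`.
fn get_children<'a>(node: &'a Node, stats: &Stats) -> Vec<(&'a str, &'a Node)> {
    stats.built.set(stats.built.get() + node.children.len());
    node.children.iter().map(|(k, v)| (k.as_str(), v)).collect()
}

/// Analogue of `get_child_at()`: fetch one child without building the rest.
fn get_child_at<'a>(node: &'a Node, key: &str, stats: &Stats) -> Option<&'a Node> {
    let child = node.children.get(key)?;
    stats.built.set(stats.built.get() + 1);
    Some(child)
}

/// Resolve a path by listing all children at every level.
fn resolve_via_children<'a>(root: &'a Node, path: &[&str], stats: &Stats) -> Option<&'a Node> {
    let mut current = root;
    for key in path {
        current = get_children(current, stats)
            .into_iter()
            .find(|(name, _)| name == key)
            .map(|(_, node)| node)?;
    }
    Some(current)
}

/// Resolve a path with one direct lookup per level.
fn resolve_via_child_at<'a>(root: &'a Node, path: &[&str], stats: &Stats) -> Option<&'a Node> {
    let mut current = root;
    for key in path {
        current = get_child_at(current, key, stats)?;
    }
    Some(current)
}

fn leaf() -> Node {
    Node { children: BTreeMap::new() }
}

/// Root -> A -> {B -> {B1, B2}, C}, plus extra siblings to make A "large".
fn example_tree(extra_children_of_a: usize) -> Node {
    let mut b = leaf();
    b.children.insert("B1".to_string(), leaf());
    b.children.insert("B2".to_string(), leaf());
    let mut a = leaf();
    a.children.insert("B".to_string(), b);
    a.children.insert("C".to_string(), leaf());
    for i in 0..extra_children_of_a {
        a.children.insert(format!("extra{i}"), leaf());
    }
    let mut root = leaf();
    root.children.insert("A".to_string(), a);
    root
}
```

In this model, resolving `["A", "B"]` on `example_tree(1000)` with `resolve_via_children` builds every child of A on the way to B, whereas `resolve_via_child_at` builds one representation per path element.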

IMO this will be a rare situation though. Except for reticulate, most use cases I can think of would only want to show a nice display value. So maybe it's fine to let package authors guarantee that variable_proxy() is fast enough, even if they need to implement some kind of caching, etc. Or maybe we want to keep some state in the variables pane so subsequent requests don't need to recompute the children.

Let me know what you think!

lionel- · 2024-10-03T12:10:49Z

hmm if performance is an issue I think we should be able to store the proxies in our current_bindings? These would only be updated if the objects change across top-level calls.

t-kalinowski · 2024-10-03T12:16:05Z

I think we can add support for a ps_variable_proxy, which could be in addition to the methods proposed in the PR, as an alternative API. However, I would not want to require that the proxy approach is the only supported API.

The 'base-atomic-proxy' approach creates difficulties for objects that cannot be easily materialized as a base atomic. This is particularly problematic for objects that would be much larger in memory when materialized as a base atomic type, and for objects that have no suitable base-atomic equivalent.

For example, large sparse arrays or arrays with smaller dtypes pose challenges. It's not uncommon to have a single-digit GB-sized NumPy array with a 'float32' dtype. Materializing that as a 'base-atomic-proxy' would mean allocating a double-digit GB double array every time the variables pane is updated. Similarly, if a user has a 10 GB array of int8s, we'd be materializing a 40 GB R integer array with each variables pane update.
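The arithmetic behind these figures can be checked with a few lines of Rust. The helper below is purely illustrative (`materialized_bytes` and `GB` are names made up for this sketch); it relies only on the element sizes involved: 1 byte for int8, 4 bytes for float32 and for R integers, 8 bytes for R doubles.

```rust
/// One gigabyte, using decimal units for simplicity.
const GB: u64 = 1_000_000_000;

/// Bytes needed once every element of an array occupying `source_bytes`
/// is widened from `source_elem_bytes` to `target_elem_bytes` per element.
fn materialized_bytes(source_bytes: u64, source_elem_bytes: u64, target_elem_bytes: u64) -> u64 {
    (source_bytes / source_elem_bytes) * target_elem_bytes
}
```

For instance, `materialized_bytes(10 * GB, 1, 4)` gives `40 * GB` for the int8 case, and a 5 GB float32 array widened to doubles (`materialized_bytes(5 * GB, 4, 8)`) comes out at `10 * GB`.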

(We encounter these same limitations of the vctrs-proxy approach in other places, like rstudio/reticulate#1481)

Many objects also lack a suitable base atomic equivalent. This opens up tricky questions. For instance:

Python AST nodes: import ast; n = ast.Name('x'). Would this need to be presented as a string or an R symbol?
Python futures: fut = concurrent.futures.ThreadPoolExecutor().submit(fn). Would reticulate have to construct an R {coro} or {future} proxy?
Lazy-loaded arrays like https://www.bioconductor.org/packages/release/bioc/html/DelayedArray.html: How would the correct "true" dimensions of such an array be communicated without materializing a full proxy?

lionel- · 2024-10-03T12:45:29Z

In my mind the variable pane should never contain GBs of data and any materialised data should be truncated - to support large data we'd need a similar approach to the data viewer and then the proxy becomes a slice proxy.

I see your points regarding flexibility of data types in the context of interop. I think this aligns with Daniel's request to keep a display value extension point?

t-kalinowski · 2024-10-03T12:52:44Z

I think this aligns with Daniel's request to keep a display value extension point?

I'm probably misunderstanding, but I don't think it does. A code example might help.

With the proxy approach, what method(s) would reticulate implement to display a NumPy array?

dfalbel · 2024-10-03T13:06:32Z

I think for a numpy array, reticulate would only implement display_value and a proxy method that returns anything that's scalar atomic, so the arrow \/ to list children doesn't appear for it in the variables pane (equivalent to current has_children() = FALSE).

Storing proxies in current_bindings is probably possible, but the implementation will be quite tricky, because an object implementing variable_proxy() could be nested at any level inside an object that is pointed to by one of the current_bindings.
I can take a look at what that could look like.

lionel- · 2024-10-03T14:18:34Z

@dfalbel Feel free to take a look but I think you've both convinced me that the proxy approach would not be a simplification.

dfalbel · 2024-10-03T17:04:38Z

Ok, I didn't get to it, but happy to experiment if you think it's worth it. I set up an example package here: https://github.com/dfalbel/testVariablesPaneExtension/tree/main/R with an example that works for R6 classes and some examples of what I'd like to make it work for torch.

lionel-

This is a really nice PR!
Approved but I think @DavisVaughan should also take a look.

I believe this is the very first API exposed from Ark to R packages (if we exclude rstudioapi support) so we should discuss what that should look like. I like the setup you've created to register methods but I wonder if:

.ps.register_ark_method("ark_variable_display_value", ...)

should have the opposite namespacing:

.ark.register_method("positron_variable_display_value", ...)

The variable pane is currently a positron-only feature so should be namespaced in positron_
I'm imagining the register_method() function will be common to both ark and positron features, so should be namespaced as such.

We probably should meet and discuss how to expose this API.

crates/ark/src/interface.rs

lionel- · 2024-10-09T07:50:34Z

crates/ark/src/variables/methods.rs

+use crate::modules::ARK_ENVS;
+
+#[derive(Debug, PartialEq, EnumString, EnumIter, IntoStaticStr, Display, Eq, Hash, Clone)]
+pub enum ArkGenerics {


Should be in its own module one level above?

Or renamed to ArkVariableGenerics.

Do you have a preference? It feels like this could be used elsewhere, so we could move to its own module above.

I've moved it one level above in ff887ff. Happy to revert if you prefer making it specific to the variables pane.

lionel- · 2024-10-09T07:57:36Z

crates/ark/src/variables/methods.rs

+    fn parse_method(name: &String) -> Option<(Self, String)> {
+        for method in ArkGenerics::iter() {
+            let method_str: &str = method.clone().into();
+            if name.starts_with::<&str>(method_str) {
+                if let Some((_, class)) = name.split_once(".") {
+                    return Some((method, class.to_string()));
+                }
+            }
+        }
+        None
+    }
+}
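For readers who want to experiment with the name-parsing step outside of Ark, here is a std-only sketch of the same idea. It is not the code from this PR: the `GENERICS` list is a hypothetical stand-in for the `ArkGenerics` enum (the second entry in particular is an assumed name), and this variant uses `strip_prefix` so that the dot must come directly after the generic name.

```rust
/// Hypothetical generic names; the real set lives in the `ArkGenerics` enum.
const GENERICS: [&str; 2] = ["ark_variable_display_value", "ark_variable_display_type"];

/// Split a method name of the form `generic.class` into its two parts.
/// Returns `None` when the name does not start with a known generic
/// followed by a dot and a non-empty class name.
fn parse_method(name: &str) -> Option<(&'static str, String)> {
    for generic in GENERICS {
        // Require the generic name, then a dot, then the class.
        let class = name
            .strip_prefix(generic)
            .and_then(|rest| rest.strip_prefix('.'));
        if let Some(class) = class {
            if !class.is_empty() {
                return Some((generic, class.to_string()));
            }
        }
    }
    None
}
```

Class names that themselves contain dots, such as `data.frame`, are kept whole because only the first dot after the generic name is consumed.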


lionel- · 2024-10-09T07:58:43Z

crates/ark/src/variables/methods.rs

+            Rf_lang3(
+                r_symbol!(":::"),
+                r_symbol!(package),
+                r_symbol!(format!("{generic}.{class}")),
+            )


You can replace this with RCall which works similarly to RFunction (the latter is implemented in terms of the former).

But why not pass a function to register_method()? To avoid inlining objects in R calls? If so needs a comment, the implicit evaluation is surprising.

IIRC, the motivation for storing a call pkgname:::methodname instead of the closure object was to ensure that:

Everything worked seamlessly with devtools::load_all() and similar functions (no risk of calling an outdated method from a stale cache);

An escape hatch remained available for monkey-patching a method in the package namespace (e.g., to enable workarounds for misbehaving methods).

I updated to use RCall in aadef2b
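The difference between storing a `pkg:::method` call and storing the closure can be modelled in plain Rust. The sketch below is an analogy only, with hypothetical names throughout: a `HashMap` plays the role of the package namespace, `ByName` resolves the method on every dispatch (like evaluating the stored call), and `ByValue` captures the function at registration time (like storing the closure).

```rust
use std::collections::HashMap;

type Method = fn() -> &'static str;

/// Stand-in for a package namespace: function name -> current definition.
type Namespace = HashMap<String, Method>;

/// Registry entry that stores the *name* of the method and resolves it
/// against the namespace on every dispatch.
struct ByName {
    method_name: String,
}

impl ByName {
    fn dispatch(&self, ns: &Namespace) -> Option<&'static str> {
        ns.get(&self.method_name).map(|f| f())
    }
}

/// Registry entry that captures the function itself at registration time.
struct ByValue {
    method: Method,
}

impl ByValue {
    fn dispatch(&self) -> &'static str {
        (self.method)()
    }
}

fn old_impl() -> &'static str {
    "old"
}

fn new_impl() -> &'static str {
    "new"
}

/// Register a method, then redefine it in the namespace (as reloading the
/// package or monkey-patching would), and report what each entry calls.
fn reload_scenario() -> (Option<&'static str>, &'static str) {
    let key = "ark_variable_display_value.foo".to_string();
    let mut ns: Namespace = HashMap::new();
    ns.insert(key.clone(), old_impl);

    let by_name = ByName { method_name: key.clone() };
    let by_value = ByValue { method: ns[&key] };

    // The package is reloaded: the namespace now holds a new definition.
    ns.insert(key, new_impl);

    (by_name.dispatch(&ns), by_value.dispatch())
}
```

After the redefinition, the name-based entry picks up the new implementation while the value-based entry still calls the stale one, which is the situation the stored call is meant to avoid.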

lionel- · 2024-10-09T08:02:43Z

crates/ark/src/variables/methods.rs

+        }
+    }
+
+    pub fn register_method(generic: Self, class: &str, method: RObject) -> anyhow::Result<()> {


Any reason not to take a consuming self (as opposed to &self)?

My reasoning was that a static method would make it clearer that we're actually modifying some global state and not just an object's state. But it looks weird indeed; fixed in 16d4ab8

lionel- · 2024-10-09T08:28:21Z

crates/ark/src/variables/methods.rs

+        T: TryFrom<RObject>,
+        <T as TryFrom<harp::RObject>>::Error: std::fmt::Debug,
+    {
+        if !r_is_object(x) {


Since this is rather generic infrastructure, we might want to allow registering methods for base types in the future (only we would be allowed to)?

crates/ark/src/variables/variable.rs

lionel- · 2024-10-09T08:48:04Z

crates/ark/src/variables/variable.rs

+    let kind: Option<String> = ArkGenerics::VariableKind.try_dispatch(value, vec![])?;
+    match kind {
+        None => Ok(None),
+        // Enum reflection is not beautiful, we want to parse a VariableKind from it's


I don't understand this note.

This was more like a rant :P. It's hard to go from the string representation to an enum value in Rust. Here we had to create a JSON value and then read it with serde.
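For comparison, a std-only way of going from a string to an enum value is a hand-written `FromStr` implementation. The sketch below is not what the PR does (the PR goes through JSON and serde), and the `VariableKind` variants shown are an illustrative subset chosen for this example rather than the real definition.

```rust
use std::str::FromStr;

/// Illustrative subset of variable kinds (assumed for this sketch).
#[derive(Debug, PartialEq)]
enum VariableKind {
    Table,
    Map,
    Other,
}

impl FromStr for VariableKind {
    type Err = String;

    /// Map the lowercase string form to the matching variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "table" => Ok(VariableKind::Table),
            "map" => Ok(VariableKind::Map),
            "other" => Ok(VariableKind::Other),
            unknown => Err(format!("unknown variable kind: {unknown}")),
        }
    }
}
```

With this in place, `"table".parse::<VariableKind>()` yields `Ok(VariableKind::Table)`. The trade-off is that every variant has to be listed by hand, which is what deriving the conversion (via serde or strum) avoids.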

dfalbel · 2024-10-09T13:36:10Z

Thanks @lionel-

I agree that we should have the opposite namespacing and just changed here: a21dc37

I'm happy to chat further on how to expose this API. Currently we don't necessarily need to expose .ark_register_method() as we already automatically find methods in packages, but it's nice for interactive debugging and for testing purposes.

dfalbel and others added 18 commits October 2, 2024 11:40

Add methods registration and dispatching

97ee312

Dispatch on methods before relying on defaults

cc1285b

add more supported methods

7926192

Use a strum enum to make less error prone method dispatching

90ee3ab

Favor let Some(x) instead of match

501b343

use let Ok() instead

6d4f7e8

Move into an R based impl

ba1c37d

minor revision

94a4db2

typo

96783ba

Make a method instead

2856256

Add some unit tests

1390024

Add some documentation

891e9e4

Revise 'Extending Variables Pane` doc

5ac78f0

Additional note

92e8c30

fix tests

d3210b8

Find namespaced method calls like foo:::bar

e85af96

Remove has_ark_method

a2f876c

use r_task

8676851

dfalbel force-pushed the feature/extend-variables branch from 47a46d8 to 8676851 Compare October 2, 2024 14:42

dfalbel mentioned this pull request Oct 2, 2024

Extending the variables pane - Inspect #561

Open

dfalbel requested review from lionel- and DavisVaughan October 2, 2024 17:47

lionel- reviewed Oct 3, 2024

lionel- approved these changes Oct 9, 2024

dfalbel and others added 11 commits October 9, 2024 08:15

Use if let Err construct

8551bf3

Use is_string helper

d3565fe

USe cls for consistency

c889216

Improve readability

63976ad

Co-authored-by: Lionel Henry <[email protected]>

Simplify function

9ba6a7e

Co-authored-by: Lionel Henry <[email protected]>

Simplify note

648b76c

Use &self instead

16d4ab8

Use RCall instead

aadef2b

Use RArgument

51111c0

Add note on why we store a call

27b7f02

Use different namespacing. .ps -> .ark and ark_variables -> `positron_variables`.

a21dc37

Move methods.rs to it's own module one level above.

ff887ff
