Build a meta-feature (evaluation) engine in Python #2

Open
PGijsbers opened this issue Sep 2, 2024 · 2 comments

@PGijsbers

The evaluation engine is a server component that handles multiple tasks. It is currently implemented in Java, and we want to rebuild it in Python, split into one component per function, so that it is easier to maintain and more accessible to new contributors. One of its tasks is calculating meta-features over tabular datasets.

The engine should take tabular datasets and calculate a set of meta-features for them. Meta-features with an existing name should produce identical results, and as many of the currently available meta-features as possible should remain available. We probably want to work with PyMFE.
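
A minimal sketch of how the "identical results" requirement could be checked, assuming the values from the current engine and from the new engine are both available as plain name-to-value dictionaries. The function name `compare_meta_features` and the tolerances are hypothetical, and the example values are made up for illustration.

```python
import math


def compare_meta_features(old, new, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two name -> value mappings of meta-features.

    Returns (mismatched, missing): names whose values differ beyond the
    tolerance, and names present in `old` but absent from `new`.
    """
    mismatched, missing = [], []
    for name, old_value in old.items():
        if name not in new:
            missing.append(name)
            continue
        new_value = new[name]
        # Treat NaN on both sides as a match: neither engine produced a value.
        both_nan = math.isnan(old_value) and math.isnan(new_value)
        close = math.isclose(old_value, new_value, rel_tol=rel_tol, abs_tol=abs_tol)
        if not both_nan and not close:
            mismatched.append(name)
    return sorted(mismatched), sorted(missing)


# Made-up values, for illustration only.
old = {"NumberOfInstances": 150.0, "NumberOfFeatures": 5.0, "ClassEntropy": 1.584962500721156}
new = {"NumberOfInstances": 150.0, "ClassEntropy": 1.58}
mismatched, missing = compare_meta_features(old, new)
print(mismatched)  # ['ClassEntropy']
print(missing)     # ['NumberOfFeatures']
```

A tolerance is used rather than exact equality because two implementations of the same floating-point computation rarely agree to the last bit; whether that is acceptable for "identical" is a decision for the maintainers.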

@PGijsbers

@joaquinvanschoren you are assigned and the issue is listed as "In Progress". Could you write down what progress there is, if any? Then please unassign yourself (assuming you are not working on this).

@joaquinvanschoren removed their assignment on Sep 6, 2024

@joaquinvanschoren

@NathanFCarvalho worked on this from March to June. He wrote a script that computes meta-features with PyMFE and works on almost all datasets (tested on about 5000 datasets, though it is slow on the very large ones). It is only a script because PyMFE does most of the work.
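
For readers unfamiliar with PyMFE, a rough sketch of what such a script boils down to. This is not the code from the linked repository. The helper names `to_named_dict` and `compute_with_pymfe` are hypothetical, the choice of groups is an assumption, and the sketch assumes the default summary functions so that every extracted value is a scalar.

```python
import math


def to_named_dict(names, values):
    """Pair up PyMFE-style parallel lists, dropping values that are NaN."""
    return {
        name: float(value)
        for name, value in zip(names, values)
        if not math.isnan(float(value))
    }


def compute_with_pymfe(X, y, groups=("general", "statistical")):
    """Extract meta-features for one dataset (requires `pip install pymfe`).

    X is a 2-D array of features and y the target column.
    """
    # Imported lazily so the helper above can be used without PyMFE installed.
    from pymfe.mfe import MFE

    mfe = MFE(groups=list(groups))
    mfe.fit(X, y)
    # extract() returns two parallel lists: meta-feature names and their values.
    names, values = mfe.extract()
    return to_named_dict(names, values)


# Made-up extraction output, for illustration only.
print(to_named_dict(["nr_inst", "nr_attr", "cor.mean"], [150, 4, float("nan")]))
# {'nr_inst': 150.0, 'nr_attr': 4.0}
```

Dropping NaN values is one possible policy for meta-features that could not be computed on a given dataset; storing them explicitly as missing would be an equally valid choice.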

All code and documentation is here:
https://github.com/NathanFCarvalho/OpenML_Metafeature_Extraction

The remaining task would be to store the computed meta-features in OpenML, and to rework the code so it can run as a cronjob.
Side note: PyMFE uses different names for the meta-features, and they can be quite cryptic. Nathan made a mapping to more understandable names. However, these are not 100% the same as the existing meta-features. We need to decide whether we want to keep the old meta-features, or exclusively use the new ones for consistency.
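
To illustrate the naming question, a small sketch of applying such a mapping. The mapping below is hypothetical and is not Nathan's actual mapping; `rename_meta_features` is an illustrative name and the values are made up. Names without an entry in the mapping are reported separately, so it stays visible which meta-features have no existing counterpart.

```python
def rename_meta_features(features, mapping):
    """Rename meta-features according to `mapping`.

    Returns (renamed, unmapped): a dict keyed by the mapped names, and a
    sorted list of the original names that have no entry in `mapping`.
    """
    renamed, unmapped = {}, []
    for name, value in features.items():
        if name in mapping:
            renamed[mapping[name]] = value
        else:
            unmapped.append(name)
    return renamed, sorted(unmapped)


# Hypothetical mapping and made-up values, for illustration only.
mapping = {"nr_inst": "NumberOfInstances", "nr_attr": "NumberOfFeatures"}
features = {"nr_inst": 150.0, "nr_attr": 4.0, "attr_to_inst": 0.0267}

renamed, unmapped = rename_meta_features(features, mapping)
print(renamed)   # {'NumberOfInstances': 150.0, 'NumberOfFeatures': 4.0}
print(unmapped)  # ['attr_to_inst']
```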

I unassigned myself since I have a lot on my plate already, but this should be a very doable and well-contained task.

Projects
Status: In Progress