
Add support to multilabel #340

Closed
glemaitre opened this issue Sep 3, 2017 · 25 comments

@glemaitre
Member

We should add support for multilabel y when it can be converted back to multiclass, i.e. when each row of the indicator matrix sums to one.

@chkoar
Member

chkoar commented Sep 4, 2017

Are we talking about multilabel or multioutput/multiclass?

@glemaitre
Member Author

Those are always confusing. An example will speak for itself (but it should be a multilabel case encoding a multiclass problem):

[[0 0 1]
 [1 0 0]
 [0 1 0]]

is a multilabel-indicator type encoding the following:

[[2]
 [0]
 [1]]
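
For reference, a minimal round-trip sketch with plain NumPy (the variable names are just illustrative, not part of imbalanced-learn's API):

import numpy as np

y_indicator = np.array([[0, 0, 1],
                        [1, 0, 0],
                        [0, 1, 0]])

# Every row sums to one, so the indicator matrix encodes exactly one class per sample
assert (y_indicator.sum(axis=1) == 1).all()

# Indicator -> multiclass: argmax recovers the class index
y_multiclass = y_indicator.argmax(axis=1)      # array([2, 0, 1])

# Multiclass -> indicator: one-hot encode again
y_back = np.eye(3, dtype=int)[y_multiclass]
assert (y_back == y_indicator).all()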

@chkoar
Member

chkoar commented Sep 4, 2017

I wouldn't call it multilabel. It is a binarized version of the target, right?
I am -1 for adding that logic inside the algorithms. We could use the LabelBinarizer for that, no?
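
For what it's worth, a minimal sketch of that idea with scikit-learn's LabelBinarizer (variable names are illustrative):

from sklearn.preprocessing import LabelBinarizer

y = [2, 0, 1]
lb = LabelBinarizer()
y_bin = lb.fit_transform(y)               # the indicator matrix shown above
y_original = lb.inverse_transform(y_bin)  # back to array([2, 0, 1])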

@massich
Contributor

massich commented Sep 4, 2017

@chkoar I think that @glemaitre is referring to providing the same support for y as scikit-learn does (see here).

@MarcoNiemann

MarcoNiemann commented Nov 9, 2017

Well, shouldn't multi-label be:

[[0,1,1],
 [1,0,0],
 [0,1,0],
 [1,0,1],
 [1,0,1],
 ...]

Because the version mentioned by @glemaitre appears, as stated by @chkoar, to be a binarized version of a multi-class problem. The difference between multi-class and multi-label is that multi-class only allows assigning a single class to each target instance, whereas multi-label allows an arbitrary number of class assignments per instance.

For an implementation, one might consider the label powerset transformation of multi-label data into a multi-class data set. E.g., for the data set above one might apply the following transformation:

[[1],
 [2],
 [3],
 [4],
 [4],
 ...]

For anyone searching for a quick and dirty solution, I have had some success with the following:

from skmultilearn.problem_transformation import LabelPowerset
from imblearn.over_sampling import RandomOverSampler

# Import a dataset with X and multi-label y

lp = LabelPowerset()
ros = RandomOverSampler(random_state=42)

# Applies the above stated multi-label (ML) to multi-class (MC) transformation.
yt = lp.transform(y)

X_resampled, y_resampled = ros.fit_sample(X, yt)

# Inverts the ML-MC transformation to recreate the ML set
y_resampled = lp.inverse_transform(y_resampled)

(The skmultilearn package is used for convenience's sake, to avoid writing a custom transformation!)

@glemaitre
Member Author

imblearn accepts one-vs-all encoding by default from now on.

@j-greer

j-greer commented Jul 19, 2018

@MarcoNiemann your solution works well when the imbalance occurs across the rows (i-th dimension) of y rather than the columns (j-th).

Expanding upon your example:

[
[0,1,1],
[0,1,1],
[1,1,1],
[1,1,1],
[1,1,1],
[1,1,1],

 ...
]

Can be considered imbalanced along rows but take the following example:

[
[0,0,1],
[1,0,0],
[1,0,0],
[1,1,0],
[1,1,0],
 ...
]

This is imbalanced in the sense that the third label column (y_i3) is mostly zero. Do you know of a way of addressing this type of imbalance problem using imbalanced-learn? @glemaitre
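
As a side note, a quick way to quantify this column-wise imbalance (just a sketch with plain NumPy, nothing imbalanced-learn specific):

import numpy as np

y = np.array([[0, 0, 1],
              [1, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 1, 0]])

# Fraction of positive samples per label (column): the third label is much rarer
print(y.mean(axis=0))   # [0.8 0.4 0.2]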

@rjurney

rjurney commented Jul 31, 2019

@glemaitre This seems to be an unsolved problem in the Python space. Support for this would be amazing.

@glemaitre
Member Author

glemaitre commented Aug 6, 2019

@rjurney The issue is that the literature does not address this problem, so I am not really sure how we could go forward. It would be nice to have an overview of the full literature; it has been a while since I last looked at it.

@HabeebullahEbrahemi

# Just correcting the import for my case (Python 3.7):
from skmultilearn.problem_transform import LabelPowerset

@daanvdn

daanvdn commented Oct 17, 2019

@glemaitre, I found the article below that proposes MLSMOTE, an adaptation of SMOTE to multi-label problems:

Charte, Francisco, et al. "MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation." Knowledge-Based Systems 89 (2015): 385-397.

There is also an (open-source) Java implementation on GitHub: https://github.com/tsoumakas/mulan/blob/master/mulan/src/main/java/mulan/sampling/MLSMOTE.java
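
For anyone curious, here is a rough Python sketch of the core MLSMOTE idea (a simplification of the paper's pseudocode, not a tested or reference implementation; names and defaults are illustrative):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def mlsmote_sketch(X, y, n_neighbors=5, random_state=None):
    rng = np.random.default_rng(random_state)
    X, y = np.asarray(X, dtype=float), np.asarray(y)

    # Minority labels: those whose imbalance ratio (IRLbl) exceeds the mean (MeanIR)
    counts = y.sum(axis=0).astype(float)
    irlbl = counts.max() / np.maximum(counts, 1)
    minority_labels = np.flatnonzero(irlbl > irlbl.mean())

    new_X, new_y = [], []
    for label in minority_labels:
        min_idx = np.flatnonzero(y[:, label] == 1)
        if min_idx.size <= 1:
            continue
        k = min(n_neighbors, min_idx.size - 1)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X[min_idx])
        neigh = min_idx[nn.kneighbors(X[min_idx], return_distance=False)[:, 1:]]
        for seed, seed_neigh in zip(min_idx, neigh):
            # SMOTE-style interpolation between the seed sample and a random neighbour
            ref = X[rng.choice(seed_neigh)]
            new_X.append(X[seed] + rng.random() * (ref - X[seed]))
            # Label set by majority vote over the seed and its neighbours (simplified ranking)
            votes = y[seed_neigh].sum(axis=0) + y[seed]
            new_y.append((votes > (len(seed_neigh) + 1) / 2).astype(y.dtype))

    if not new_X:
        return X, y
    return np.vstack([X, new_X]), np.vstack([y, new_y])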

@aamin21

aamin21 commented Oct 17, 2019

Any update on this? Stuck on this one.

@woolr

woolr commented Jan 9, 2020

@daanvdn do you know if anyone has implemented this in Python?

@daanvdn

daanvdn commented Jan 10, 2020 via email

@alfredsasko

@daanvdn, @glemaitre I read the article referenced by @daanvdn. The researchers claim that MLSMOTE is superior on highly imbalanced multi-label datasets compared to other popular algorithms like BR, RAkEL, and CLR. They also provide pseudocode for the algorithm. I am trying to implement it in my project; once I succeed I will share the code with you.

@t-lini

t-lini commented Mar 25, 2020

It might be worth also considering ML-ROS and ML-RUS as multilabel random over- and undersampling methods respectively, which were introduced by the authors of the article referenced by @daanvdn in an article prior to MLSMOTE, see:
F. Charte, A.J. Rivera, M.J. del Jesus, F. Herrera, Addressing imbalance in multilabel classification: measures and random resampling algorithms, Neurocomputing 163(9) (2015) 3–16, http://dx.doi.org/10.1016/j.neucom.2014.08.091.
These algorithms might be a good choice if you do not want to or cannot use synthetic resampling methods. Implementations in Java are also available in the MULAN package:
https://github.com/tsoumakas/mulan/blob/master/mulan/src/main/java/mulan/sampling/MultiLabelRandomOverSampling.java
https://github.com/tsoumakas/mulan/blob/master/mulan/src/main/java/mulan/sampling/MutilLabelRandomUnderSampling.java
I will try to implement these methods in Python.
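
Roughly, the ML-ROS idea I have in mind looks like this (just a sketch based on my reading of the paper; the per-label budget split and stopping rule are simplified, not the reference implementation):

import numpy as np

def ml_ros_sketch(X, y, sample_fraction=0.25, random_state=None):
    rng = np.random.default_rng(random_state)
    X, y = np.asarray(X), np.asarray(y)

    # IRLbl per label and MeanIR, as defined by Charte et al.
    counts = y.sum(axis=0).astype(float)
    irlbl = counts.max() / np.maximum(counts, 1)
    minority_labels = np.flatnonzero(irlbl > irlbl.mean())

    # Cloning budget: a fraction of the dataset size, split evenly across minority labels
    budget = int(len(X) * sample_fraction)
    per_label = max(budget // max(len(minority_labels), 1), 1)

    clone_idx = []
    for label in minority_labels:
        candidates = np.flatnonzero(y[:, label] == 1)
        if candidates.size:
            clone_idx.extend(rng.choice(candidates, size=per_label, replace=True))

    clone_idx = np.asarray(clone_idx, dtype=int)
    return np.vstack([X, X[clone_idx]]), np.vstack([y, y[clone_idx]])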

@chkoar
Member

chkoar commented Mar 25, 2020

I will try to implement these methods in Python.

That would be a great addition

@SimonErm

SimonErm commented May 5, 2020

I have tried to implement MLSMOTE in Python, but since I am not an experienced Python programmer, it consists of a lot of Stack Overflow solutions and ugly code. As far as the logic is concerned, it should be correct.
https://gist.github.com/SimonErm/b06c236cafdeb79fdf7adb90aef04fec

@chkoar
Member

chkoar commented May 5, 2020

@SimonErm I encourage you to add docstrings, write comments with your intention wherever you think it is appropriate, write some tests and open a PR in draft mode, so we could discuss your code in the PR.

@Vishnux0pa

@SimonErm I tried your code and it works, but it generates a seemingly random number of samples, i.e. I can't specify how many samples I need. Is there a way to do that? Also, it would be good if you could share the paper.

@SimonErm

@Vishnux0pa That's because the number of generated samples is driven by the imbalance ratio of each label, which is described in the paper. You can find a reference in the description of the PR. It's the same one mentioned by @daanvdn:

I found the article below that proposes MLSMOTE, an adaptation of SMOTE to multi-label problems:

Charte, Francisco, et al. "MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation." Knowledge-Based Systems 89 (2015): 385-397.

There is also an (open-source) java implementation on github: https://github.com/tsoumakas/mulan/blob/master/mulan/src/main/java/mulan/sampling/MLSMOTE.java
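
To make that concrete, this is roughly how the paper decides which labels receive new samples (a small sketch of the IRLbl/MeanIR rule; the total number of generated samples follows from it, which is why it cannot be specified directly):

import numpy as np

def minority_labels(y):
    # IRLbl: count of the most frequent label divided by each label's count
    counts = np.asarray(y).sum(axis=0).astype(float)
    irlbl = counts.max() / np.maximum(counts, 1)
    return np.flatnonzero(irlbl > irlbl.mean())   # labels MLSMOTE would oversample

y = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]])
print(minority_labels(y))   # [2] -> only the rarest label triggers synthetic samples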

@xelandar

xelandar commented Aug 30, 2020

As far as I can see, another implementation of MLSMOTE can be found here (via this Medium article). I haven't tested it yet, but thought it would be good to share it in this relevant thread.

@chkoar
Member

chkoar commented Aug 30, 2020

@xelandar there is already a PR here, but it hasn't been reviewed yet, probably due to lack of time.

@balvisio

I have created a new PR that implements MLSMOTE: #927.

@imaspol

imaspol commented Aug 30, 2024

Hi, it would be great to have a version of classification_report_imbalanced for multilabel imbalanced data. Do you plan to implement it?
