[MRG] ENH: safe-level SMOTE #626

Open · wants to merge 5 commits into base: master
18 changes: 16 additions & 2 deletions doc/over_sampling.rst
@@ -152,8 +152,9 @@ nearest neighbors class. Those variants are presented in the figure below.
:align: center


-The :class:`BorderlineSMOTE` [HWB2005]_, :class:`SVMSMOTE` [NCK2009]_, and
-:class:`KMeansSMOTE` [LDB2017]_ offer some variant of the SMOTE algorithm::
+The :class:`BorderlineSMOTE` [HWB2005]_, :class:`SVMSMOTE` [NCK2009]_,
+:class:`KMeansSMOTE` [LDB2017]_, and :class:`SafeLevelSMOTE` [BSL2009]_
+offer some variants of the SMOTE algorithm::

>>> from imblearn.over_sampling import BorderlineSMOTE
>>> X_resampled, y_resampled = BorderlineSMOTE().fit_resample(X, y)
@@ -213,6 +214,14 @@ other extra interpolation.
Imbalanced Learning Based on K-Means and SMOTE"
https://arxiv.org/abs/1711.00837

[BSL2009] C. Bunkhumpornpat, K. Sinapiromsaran, C. Lursinsap,
"Safe-level-SMOTE: Safe-level-synthetic minority over-sampling
technique for handling the class imbalanced problem," In:
Theeramunkong T., Kijsirikul B., Cercone N., Ho TB. (eds)
Advances in Knowledge Discovery and Data Mining. PAKDD 2009.
Lecture Notes in Computer Science, vol 5476. Springer, Berlin,
Heidelberg, 475-482, 2009.

Mathematical formulation
========================

@@ -274,6 +283,11 @@ parameter ``m_neighbors`` to decide if a sample is in danger, safe, or noise.
method before applying SMOTE. The clustering will group samples together and
generate new samples depending on the cluster density.
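
The cluster-density idea can be sketched as follows. This is a minimal
illustration only, not the :class:`KMeansSMOTE` implementation;
``kmeans_then_interpolate`` and all of its parameters are hypothetical names
chosen for this example:

```python
# Sketch of "cluster first, then interpolate inside minority-dense clusters".
# Hypothetical helper for illustration; not the KMeansSMOTE implementation.
import numpy as np
from sklearn.cluster import KMeans


def kmeans_then_interpolate(X, y, minority_class, n_clusters=8,
                            density_threshold=0.5, n_samples=10,
                            random_state=0):
    rng = np.random.RandomState(random_state)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X)

    # Keep only clusters where the minority class dominates and at least
    # two minority samples are available for interpolation.
    eligible = []
    for c in range(n_clusters):
        members = X[(labels == c) & (y == minority_class)]
        minority_ratio = np.mean(y[labels == c] == minority_class)
        if len(members) >= 2 and minority_ratio >= density_threshold:
            eligible.append(members)

    synthetic = []
    while len(synthetic) < n_samples and eligible:
        members = eligible[rng.randint(len(eligible))]
        # SMOTE-style interpolation between two minority samples of the
        # same (dense) cluster.
        a, b = members[rng.choice(len(members), 2, replace=False)]
        synthetic.append(a + rng.uniform() * (b - a))
    return np.asarray(synthetic)
```

Because interpolation only happens between members of the same
minority-dense cluster, the sketch avoids generating samples in regions
dominated by the majority class.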

**SafeLevel** SMOTE --- cf. :class:`SafeLevelSMOTE` --- uses the safe level
(the number of positive instances among a sample's nearest neighbors) to
position each synthetic instance. Compared with regular SMOTE, the new
instance is placed closer to whichever of the two parent instances has the
larger safe level.
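
The safe-level weighting can be sketched as follows, based on the gap-selection
cases described in [BSL2009]_. This is an illustrative sketch, not the code
added by this PR; ``safe_level_sample`` is a hypothetical helper:

```python
# Sketch of the Safe-level-SMOTE gap selection (Bunkhumpornpat et al., 2009).
# Hypothetical helper for illustration; not this PR's implementation.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def safe_level_sample(X, y, minority_class, k=5, n_samples=10, random_state=0):
    rng = np.random.RandomState(random_state)
    X_min = X[y == minority_class]
    # k + 1 neighbors over the whole set: the first neighbor is the
    # query point itself and is dropped below.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)

    synthetic = []
    while len(synthetic) < n_samples:
        p = X_min[rng.randint(len(X_min))]
        idx = nn.kneighbors([p], return_distance=False)[0][1:]
        sl_p = np.sum(y[idx] == minority_class)   # safe level of p
        n = X[idx[rng.randint(k)]]                # one random neighbor of p
        idx_n = nn.kneighbors([n], return_distance=False)[0][1:]
        sl_n = np.sum(y[idx_n] == minority_class)  # safe level of n

        if sl_p == 0 and sl_n == 0:
            continue                   # both look like noise: skip
        if sl_n == 0:
            gap = 0.0                  # p safe, n noise: duplicate p
        else:
            ratio = sl_p / sl_n
            if ratio == 1:             # equally safe: interpolate anywhere
                gap = rng.uniform(0, 1)
            elif ratio > 1:            # p safer: stay close to p
                gap = rng.uniform(0, 1 / ratio)
            else:                      # n safer: stay close to n
                gap = rng.uniform(1 - ratio, 1)
        synthetic.append(p + gap * (n - p))
    return np.asarray(synthetic)
```

The gap cases are what distinguish this from plain SMOTE: instead of drawing
the interpolation weight uniformly from :math:`[0, 1]`, the interval is
shrunk toward the parent with the larger safe level.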

ADASYN works similarly to the regular SMOTE. However, the number of
samples generated for each :math:`x_i` is proportional to the number of samples
which are not from the same class as :math:`x_i` in a given
2 changes: 2 additions & 0 deletions imblearn/over_sampling/__init__.py
@@ -10,6 +10,7 @@
from ._smote import KMeansSMOTE
from ._smote import SVMSMOTE
from ._smote import SMOTENC
from ._smote import SafeLevelSMOTE

__all__ = [
"ADASYN",
@@ -19,4 +20,5 @@
"BorderlineSMOTE",
"SVMSMOTE",
"SMOTENC",
"SafeLevelSMOTE",
]