[ENH] Let fill_empty support filling NaN values with the mean, median, or mode #1044

Open
Zeroto521 opened this issue Mar 16, 2022 · 3 comments

Comments

@Zeroto521
Member

Zeroto521 commented Mar 16, 2022

Brief Description

As the title says.
For some data, such as GDP, filling NaN values with 0 isn't a good idea,
because most GDP values are in the millions.
In such cases we would rather fill NaN with the mean value than with 0.

API

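from typing import Any

import pandas as pd

def fill_empty(
    df: pd.DataFrame,
    column_names: list[str | int],
    value: Any = None,
    method: str | None = None,  # None is a valid default, so the hint is optional
) -> pd.DataFrame:
    ...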
  1. At least one of value and method must not be None.
  2. method must be one of 'mean', 'median', or 'mode'.
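
A minimal sketch of how the two rules above might be implemented with plain pandas. The name fill_empty_sketch is hypothetical and this is not pyjanitor's code. The issue only says that value and method must not both be None; rejecting the case where both are given is an extra assumption made here.

```python
from __future__ import annotations

from typing import Any

import pandas as pd


def fill_empty_sketch(
    df: pd.DataFrame,
    column_names: list[str | int],
    value: Any = None,
    method: str | None = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for the proposed API; not pyjanitor's code."""
    # Rule 1: both None is an error. Assumption: both given is rejected too.
    if (value is None) == (method is None):
        raise ValueError("Pass exactly one of `value` or `method`.")
    # Rule 2: only the three statistics named in the issue are accepted.
    if method is not None and method not in {"mean", "median", "mode"}:
        raise ValueError("`method` must be 'mean', 'median', or 'mode'.")

    out = df.copy()  # leave the caller's DataFrame untouched
    for col in column_names:
        if method is None:
            fill = value
        elif method == "mode":
            # Series.mode() returns a Series (ties are possible); take the first.
            fill = out[col].mode().iloc[0]
        else:
            fill = getattr(out[col], method)()  # Series.mean() or Series.median()
        out[col] = out[col].fillna(fill)
    return out


df = pd.Series([2, 2, None, 0, 4], name="nan-col").to_frame()
filled = fill_empty_sketch(df, ["nan-col"], method="mean")
```

Here the mean is computed per column and NaN values are skipped, which is the pandas default for Series.mean().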

Example

import pandas as pd
import janitor  # noqa

# create a DataFrame
df = pd.Series([2, 2, None, 0, 4], name="nan-col").to_frame()
#    nan-col
# 0      2.0
# 1      2.0
# 2      NaN
# 3      0.0
# 4      4.0

# fill NaN with mean value
df.fill_empty(["nan-col"], method="mean")
#    nan-col
# 0      2.0
# 1      2.0
# 2      2.0
# 3      0.0
# 4      4.0
@samukweku
Collaborator

@Zeroto521 impute covers this use case; at this point, I wonder if it is okay to deprecate one of these functions, so that we have just one that covers NA filling? @pyjanitor-devs/core-devs

@thatlittleboy
Contributor

@samukweku I'm okay with deprecating either impute or fill_empty. It seems that impute covers not only the "mean/mode/.." use case but also imputing with a constant value, which is fill_empty's current functionality?

I'd be inclined to keep impute over fill_empty (at least within the DS/ML community, impute is a commonly used term; I'm not sure about the broader data world).

@samukweku
Collaborator

Yeah, impute is a wrapper around fillna, with the added benefit of statistical imputation.
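
One way the consolidation discussed in this thread could look, as a rough sketch only: a single fillna-based function handles both constant and statistical fills, and the other name becomes a thin alias that emits a DeprecationWarning. The names impute_sketch and fill_empty_shim and their parameters are hypothetical and do not reflect pyjanitor's actual signatures or deprecation tooling.

```python
import warnings

import pandas as pd


def impute_sketch(df, column_name, value=None, statistic=None):
    """Hypothetical unified filler: a thin layer over Series.fillna."""
    # This sketch only handles statistics that reduce to a single scalar.
    if statistic is not None and statistic not in ("mean", "median"):
        raise ValueError("statistic must be 'mean' or 'median' in this sketch")
    out = df.copy()
    col = out[column_name]
    fill = value if statistic is None else getattr(col, statistic)()
    out[column_name] = col.fillna(fill)
    return out


def fill_empty_shim(df, column_names, value):
    """Hypothetical deprecated alias that forwards constant fills."""
    warnings.warn(
        "fill_empty is deprecated; use impute instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    for name in column_names:
        df = impute_sketch(df, name, value=value)
    return df


df = pd.DataFrame({"gdp": [2.0, None, 4.0]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    zero_filled = fill_empty_shim(df, ["gdp"], value=0)
mean_filled = impute_sketch(df, "gdp", statistic="mean")
```

Keeping the old name as a warning alias lets existing callers keep working for a release or two while they migrate.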
