-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Huggingface Integration #916
base: master
Are you sure you want to change the base?
Conversation
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I start reviewing this in earnest, I would need at least the following 2 pieces of information to be added to the PR:
- Documentation: (it is absolutely fine to have a bullet point list of items that link to the main HF docs)
- Tests
I believe both of these were present in the previous PR.
Hi @Wauplin and @NielsRogge - this PR looks good from my end. Do you have any feedback? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there 👋 Thanks for the ping!
I left a few comments from an outsider point of view. I do think that the CLI should be more opinionated (understand "have less options and decide things for the user") otherwise we pretty much end up with a CLI close to what huggingface-cli upload
and huggingface-cli download
do.
from pathlib import Path | ||
from GANDLF.utils import get_git_hash | ||
|
||
readme_template = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a simple copy of the model card template found here? If yes, I can suggest to either:
- directly reuse the template from
huggingface_hub
(i.e.ModelCard.from_template(card_data)
without thetemplate_str
). - or define your own template but in this case you should only put the relevant fields and descriptions for your library (instead of having all fields as empty)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Wauplin for making me aware of this ,I will definitely go through it and make required changes as you mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Wauplin,
We had an internal discussion on what would be the best way for us to showcase potential model uploaders with a specific set of required options for the model card. Thus far, we have landed on using a custom model card. The reason to have all the fields present is provide the ability for a user to put in more information than what we require.
Here, we have put the string "REQUIRED_FOR_GANDLF"
for the fields that are explicitly needed for the user to populate, and the rest have been left as present in the template.
In the code, we plan to add 2 checks:
- If
"REQUIRED_FOR_GANDLF"
is found, we present an error to the user saying that this field needs to be populated with appropriate information. - The
Repository
key should always behttps://github.com/mlcommons/GaNDLF
.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a sensible idea to me yes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant, thanks for the confirmation! We'll get on it right away. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sarthakpati @Wauplin so how can we test this file if we propose the upload functionality as we only have entry points tests, do we have to mention a specific directory there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you can leverage one of the existing training tests to test the upload. I would recommend this one, since this would only upload a single model.
Ensure you put an appropriate description for it (such as Unit testing model
or something) to make it clear for anyone viewing it. Is there a way to update an existing model, @Wauplin?
tags += [git_hash] | ||
|
||
card_data = ModelCardData(library_name="GaNDLF", tags=tags) | ||
card = ModelCard.from_template(card_data, template_str=readme_template) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above about template_str
def download_from_hub( | ||
repo_id: str, | ||
revision: Union[str, None] = None, | ||
cache_dir: Union[str, None] = None, | ||
local_dir: Union[str, None] = None, | ||
force_download: bool = False, | ||
token: Union[str, None] = None, | ||
): | ||
snapshot_download( | ||
repo_id=repo_id, | ||
revision=revision, | ||
cache_dir=cache_dir, | ||
local_dir=local_dir, | ||
force_download=force_download, | ||
token=token, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this alias is really needed. I would simply call snapshot_download
in other places in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the alias is not needed and that snapshot_download
could be used by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Wauplin actually this alias is use for the alignment of hugging face downloading feature with Gandlf command line design pattern ,change it to by default may abruptly conflict the command line argument
Co-authored-by: Lucain <[email protected]>
@pranayasinghcsmpl some lint fixes (unused variables and whatnot) will be needed for this PR. Thanks for taking care of it! Thanks for your comments and suggestions, @Wauplin! |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #916 +/- ##
==========================================
+ Coverage 94.58% 94.61% +0.03%
==========================================
Files 161 164 +3
Lines 9567 9701 +134
==========================================
+ Hits 9049 9179 +130
- Misses 518 522 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Support ticket generated with Codacy to explore the coverage issue. |
Codacy folks suggested not to use coverage reporter for anything coming in from other forks 🙄 Anyway, we should be good to go from my end. @Wauplin is this PR good to merge for you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integration looks good yes :) I left a few comments but nothing blocking on my side.
Thanks for the iterations!
def download_from_hub( | ||
repo_id: str, | ||
revision: Union[str, None] = None, | ||
cache_dir: Union[str, None] = None, | ||
local_dir: Union[str, None] = None, | ||
force_download: bool = False, | ||
token: Union[str, None] = None, | ||
): | ||
snapshot_download( | ||
repo_id=repo_id, | ||
revision=revision, | ||
cache_dir=cache_dir, | ||
local_dir=local_dir, | ||
force_download=force_download, | ||
token=token, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the alias is not needed and that snapshot_download
could be used by default
@click.option( | ||
"--hf-template", | ||
"-hft", | ||
help="Adding the template path for the model card it is Required during Uploaing a model", | ||
type=click.Path(exists=True, file_okay=True, dir_okay=False), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe default it to hugging_face.md
to reduce friction? Users are free to provide another template if they want but having one by default should reduce friction and help grow usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have to provide a default path for the hugging face template so that when ever the user want to upload a ,model by default that template will be uploaded ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I suggest to do yes. There is already a hugging_face.md template in this PR so I was suggesting to reuse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks for you suggestion @Wauplin we will do the same
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Lucain <[email protected]>
Fixes #727
Proposed Changes
Checklist
CONTRIBUTING
guide has been followed.typing
is used to provide type hints, including and not limited to usingOptional
if a variable has a pre-defined value).pip install
step is needed for PR to be functional), please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].