Quantized Chunking #356

Open

nighting0le01 opened this issue Sep 14, 2024 · 0 comments
nighting0le01 commented Sep 14, 2024

🐞Describing the bug
I used the `bisect_model()` function to split a quantized model into 2 chunks. I tried with coremltools 7.1 and 7.0, with reference to this line in the file:

`if (cumulative_size_in_mb > half_size and not op.op_type.startswith("const")`

but was facing the same issue.
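For context, my reading of the split-location logic around that quoted line is roughly the following (a paraphrased sketch, not the verbatim implementation; reading const weights via `op.outputs[0].val` is my assumption about the MIL API):

```python
# Paraphrased sketch of the split-location logic, NOT the verbatim source.
# Assumes each "const" op exposes its weights as a numpy array on its single
# output var (op.outputs[0].val).
import numpy as np

def sketch_split_location(prog):
    ops = list(prog.functions["main"].operations)

    def const_size_mb(op):
        if op.op_type == "const" and isinstance(op.outputs[0].val, np.ndarray):
            arr = op.outputs[0].val
            return arr.size * arr.itemsize / (1024 * 1024)
        return 0.0

    total_mb = sum(const_size_mb(op) for op in ops)
    half_mb = total_mb / 2.0

    cumulative_mb = 0.0
    for idx, op in enumerate(ops):
        cumulative_mb += const_size_mb(op)
        # The incision op must not be a "const"/"constexpr_*" op; for a
        # palettized model the weights sit behind constexpr_* ops, which this
        # check skips over.
        if cumulative_mb > half_mb and not op.op_type.startswith("const"):
            return idx, cumulative_mb, total_mb
```

Whatever the root cause, the numbers in the snippets below show the split point landing where nearly all of the weight mass is already in the first chunk.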


```python
prog = _load_prog_from_mlmodel(model)

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(prog)
main_block = prog.functions["main"]

print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")  # 152.67 MB
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")  # 0.42 MB
print(f"index={op_idx}/{len(main_block.operations)}")  # 587/2720

prog_chunk1 = _make_first_chunk_prog(prog, op_idx)
prog_chunk2 = _make_second_chunk_prog(_load_prog_from_mlmodel(model), op_idx)
```
System environment (please complete the following information):

- coremltools version: 8.0b2

Here is the code to reproduce (run with coremltools 7.0/7.1). I know that with 8.0b2 the chunking has moved into coremltools itself, but I think it has the same issue when chunking a quantized or palettized model.
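For reference, my understanding is that the 8.0b2 entry point would be invoked roughly like this (a sketch based on the coremltools 8.0 docs; the output directory is a placeholder):

```python
import coremltools as ct

# Sketch of the coremltools 8.0b2 API (per its docs); "./chunks/" is a
# placeholder output directory. This should exercise the same size-based
# bisection as the stable-diffusion script below.
ct.models.utils.bisect_model(
    "MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage",
    "./chunks/",
    merge_chunks_to_pipeline=False,  # keep the two chunks as separate models
)
```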

The model is a simple MobileNet that can be downloaded from the coremltools tutorial: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit


```python
import coremltools as ct
from python_coreml_stable_diffusion.chunk_mlprogram import (
    _load_prog_from_mlmodel,
    _get_op_idx_split_location,
    _make_second_chunk_prog,
    _make_first_chunk_prog,
)

# Link to get the model:
# https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
model = ct.models.MLModel('MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage')
prog = _load_prog_from_mlmodel(model)

# Load the MIL Program from the MLModel (loaded a second time here, hence the
# two pairs of load messages in the output below)
prog = _load_prog_from_mlmodel(model)

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(prog)
main_block = prog.functions["main"]
incision_op = main_block.operations[op_idx]

print(f"op_idx = {op_idx}")
print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")
```
```
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
op_idx = 187
First  chunk size = 1.68 MB
Second chunk size = 0.15 MB
```
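For debugging, a quick tally like the following (hypothetical; I have not run this) could show which op types actually hold the weight mass in the palettized model, and whether the size accounting sees them:

```python
# Hypothetical diagnostic: sum the weight size held by each op type, to see
# how a palettized model's weights are distributed across const/constexpr ops.
# Assumes constant values are exposed on output vars via .val as numpy arrays.
from collections import defaultdict
import numpy as np

size_by_type_mb = defaultdict(float)
for op in prog.functions["main"].operations:
    for out in op.outputs:
        if isinstance(out.val, np.ndarray):
            size_by_type_mb[op.op_type] += out.val.size * out.val.itemsize / (1024 * 1024)

for op_type, mb in sorted(size_by_type_mb.items(), key=lambda kv: -kv[1]):
    print(f"{op_type:30s} {mb:8.2f} MB")
```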
nighting0le01 changed the title from "chunking large models" to "Quantized Chunking" on Sep 14, 2024