Add blob file type in schema.proto #241

Open
wants to merge 3 commits into
base: master

mzhl1111

Related Issue milvus-io/milvus#27578

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mzhl1111
To complete the pull request process, please assign czs007 after the PR has been reviewed.
You can assign the PR to them by writing /assign @czs007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot

Welcome @mzhl1111! It looks like this is your first PR to milvus-io/milvus-proto 🎉

@mergify mergify bot added the needs-dco DCO is missing in this pull request. label Jan 25, 2024
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch from 187106e to 28dc941 Compare January 25, 2024 02:52
@mergify mergify bot added dco-passed DCO check passed. and removed needs-dco DCO is missing in this pull request. labels Jan 25, 2024
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch 2 times, most recently from f3ec612 to 3a1aefb Compare February 11, 2024 04:15
@mergify mergify bot added the ci-passed label Feb 12, 2024
message FieldData {
  DataType type = 1;
  string field_name = 2;
  oneof field {
    ScalarField scalars = 3;
    VectorField vectors = 4;
    ExtraField extras = 5;
Collaborator

We cannot change the field order in proto files due to compatibility issues.
Could blobs also be a scalar data type?

Author

Sure.
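
One likely reading of the compatibility concern above: protobuf serializes each field under a key built from its field number and wire type, never its name, so renumbering or reordering existing numbers changes what older readers decode. A minimal Python sketch of that key encoding, using the field numbers from the FieldData excerpt (the helper names are illustrative and not part of any Milvus or protobuf API):

```python
# Minimal sketch: how a protobuf field key is encoded on the wire.
# Only the field number and wire type are serialized, never the field name.

LENGTH_DELIMITED = 2  # wire type used for strings, bytes, and embedded messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Return the key that prefixes a field's payload on the wire."""
    return encode_varint((field_number << 3) | wire_type)


# Field numbers taken from the FieldData excerpt in this thread.
scalars_key = field_key(3, LENGTH_DELIMITED)
vectors_key = field_key(4, LENGTH_DELIMITED)
```

Because an old reader matches on these keys, giving an existing number a new meaning would make it misinterpret the payload, whereas adding a field under a previously unused number leaves existing keys untouched.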

@@ -65,6 +67,8 @@ message FieldSchema {
  bool is_dynamic = 12; // mark whether this field is the dynamic field
  bool is_partition_key = 13; // enable logic partitions
  bool is_clustering_key = 14;

  bool is_blob_saved = 15; // for blob type, mark if the blob is saved
Collaborator

What does is_blob_saved mean?

Author

It marks whether the blob file is stored in S3.

@mergify mergify bot added needs-dco DCO is missing in this pull request. and removed dco-passed DCO check passed. ci-passed labels Feb 16, 2024
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch from 45900bc to b0f941f Compare February 16, 2024 19:24
@mergify mergify bot added dco-passed DCO check passed. and removed needs-dco DCO is missing in this pull request. labels Feb 16, 2024
@mergify mergify bot added needs-dco DCO is missing in this pull request. and removed dco-passed DCO check passed. labels Feb 17, 2024
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch 2 times, most recently from 6364bcf to 5f0d8f6 Compare February 17, 2024 02:35
@mergify mergify bot added dco-passed DCO check passed. and removed needs-dco DCO is missing in this pull request. labels Feb 17, 2024
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch from f23b5a1 to abbe5bb Compare February 17, 2024 20:44
@mergify mergify bot added the ci-passed label Feb 19, 2024
@xiaofan-luan
Collaborator

Hi @mzhl1111,

Would you mind having a quick sync on this PR?
Feel free to contact me at [email protected]; I'd like to have a 30-minute quick sync.

My major concerns:

  1. We don't need a special RPC to insert blobs. A blob is simply another data type of the collection, so you can use insert to write it and search/query to retrieve it.
  2. There is not much we need to change on the proto side except for schema.proto.

The most important points for this PR are:

  1. At the proxy, the data is written directly into object storage, and Milvus itself only maintains a reference link.
  2. On search, if a blob field is specified as an output field, the file link is returned to the proxy, and the proxy retrieves the data directly from object storage.

With this PR, we can store images and doc chunks in S3, which saves a great amount of memory (and maybe network traffic).

From the user's perspective, there is no blob-specific RPC.
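
The flow described in this comment could be sketched roughly as follows. This is a toy Python illustration under stated assumptions, not Milvus code: `InMemoryObjectStore` stands in for S3-like storage, `ProxySketch` and its `insert`/`query` methods are hypothetical names, and the key layout `blobs/<field>/<id>` is invented for the example.

```python
import uuid
from typing import Any, Dict, List


class InMemoryObjectStore:
    """Stand-in for S3-like object storage (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class ProxySketch:
    """Hypothetical proxy: blob payloads go to object storage, rows keep a link."""

    def __init__(self, store: InMemoryObjectStore, blob_fields: List[str]) -> None:
        self.store = store
        self.blob_fields = set(blob_fields)
        self.rows: Dict[int, Dict[str, Any]] = {}  # stands in for the collection

    def insert(self, pk: int, row: Dict[str, Any]) -> None:
        stored = dict(row)
        for name in self.blob_fields & row.keys():
            key = f"blobs/{name}/{uuid.uuid4().hex}"  # assumed key layout
            self.store.put(key, row[name])  # write the payload to object storage
            stored[name] = key              # keep only the reference link
        self.rows[pk] = stored

    def query(self, pk: int, output_fields: List[str]) -> Dict[str, Any]:
        stored = self.rows[pk]
        result: Dict[str, Any] = {}
        for name in output_fields:
            value = stored[name]
            if name in self.blob_fields:
                value = self.store.get(value)  # resolve the link at the proxy
            result[name] = value
        return result


store = InMemoryObjectStore()
proxy = ProxySketch(store, blob_fields=["image"])
proxy.insert(1, {"title": "cat", "image": b"\x89PNG"})
```

In this sketch the caller uses the same `insert` and `query` entry points for blob and non-blob fields, and the object store is only read when a blob field appears in `output_fields`.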

@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch from abbe5bb to 5b23feb Compare July 11, 2024 07:07
@mergify mergify bot added needs-dco DCO is missing in this pull request. and removed dco-passed DCO check passed. ci-passed labels Jul 11, 2024
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch from 5b23feb to 45520d7 Compare July 11, 2024 07:13
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch from 4f9614f to e934264 Compare July 22, 2024 19:34
@mergify mergify bot added dco-passed DCO check passed. and removed needs-dco DCO is missing in this pull request. labels Jul 22, 2024
@mzhl1111 mzhl1111 force-pushed the Feature-Support-Blob-datatype-#27578 branch from 9463551 to 2f7de84 Compare July 23, 2024 18:50
Labels
dco-passed DCO check passed. size/S
3 participants