[Feature]: Support Blob datatype #27578

Open
1 task done
xiaofan-luan opened this issue Oct 9, 2023 · 6 comments
Labels
good first issue Good for newcomers hacktoberfest Issues picked by hacktoberfest kind/feature Issues related to feature request from users

Comments

@xiaofan-luan
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Users sometimes want to store large blobs in Milvus, for example to store an image or a PDF and retrieve it later.

To support blobs, Milvus does not store the data itself; it stores only the object storage path.

The proxy writes the blob to S3 under the correct path, and the actual field in Milvus is a string that contains the S3 path.

Users cannot query or filter on a blob field, but a blob field can be specified as an output field.

On search/query, if a blob field is specified as an output field, the proxy is responsible for retrieving the data and returning it to the user.

Currently, blobs are limited to 16 MB.
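
The flow described above can be sketched roughly as follows. This is an in-memory illustration only, not Milvus code: `InMemoryObjectStore`, `BlobProxy`, the `blobs/<collection>/<field>/<id>` path layout, and treating the size limit as 16 * 1024 * 1024 bytes are all assumptions made for the example.

```python
import uuid

MAX_BLOB_BYTES = 16 * 1024 * 1024  # assumption: the size limit read as 16 MiB


class InMemoryObjectStore:
    """Stand-in for S3: maps object paths to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, path, data):
        self._objects[path] = bytes(data)

    def get(self, path):
        return self._objects[path]


class BlobProxy:
    """Hypothetical proxy: blobs go to object storage, rows keep only the path."""

    def __init__(self, store):
        self.store = store
        self.rows = {}  # primary key -> stored row; stand-in for a collection

    def insert(self, collection, pk, row, blob_fields):
        # Check every blob before writing anything, so a rejected row
        # leaves no objects behind.
        for field in blob_fields:
            if len(row[field]) > MAX_BLOB_BYTES:
                raise ValueError(f"blob field {field!r} exceeds {MAX_BLOB_BYTES} bytes")
        stored = dict(row)
        for field in blob_fields:
            path = f"blobs/{collection}/{field}/{uuid.uuid4().hex}"
            self.store.put(path, row[field])
            stored[field] = path  # the stored field is a plain string path
        self.rows[pk] = stored

    def query(self, pk, output_fields, blob_fields):
        # Resolve blob paths back to bytes only for requested output fields.
        row = self.rows[pk]
        result = {}
        for field in output_fields:
            value = row[field]
            if field in blob_fields:
                value = self.store.get(value)
            result[field] = value
        return result
```

In this sketch, `proxy.insert("docs", 1, {"title": "a", "pdf": data}, blob_fields={"pdf"})` leaves a path string in the stored row, and `proxy.query(1, ["pdf"], {"pdf"})` returns the original bytes.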

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

To be discussed:

If a user asks for a large topk, returning all the blobs to the client may exceed the gRPC message size limit.
How should we handle this? Should we return the list of file paths to the client and let the client retrieve the actual blobs through a separate RPC?
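
One way to picture the second option: the search response carries only blob paths, and the client pulls the blobs in follow-up calls, each kept under a byte budget. This is a sketch of the idea only. `plan_batches`, `fetch_blobs_in_batches`, the `fetch` callback, the `sizes` map, and the 4 MiB budget (gRPC's usual default cap on received messages) are assumptions for illustration, not anything Milvus defines.

```python
from typing import Callable, Dict, Iterator, List, Tuple

GRPC_BUDGET_BYTES = 4 * 1024 * 1024  # assumption: common gRPC default receive limit


def plan_batches(paths: List[str], sizes: Dict[str, int], budget: int) -> List[List[str]]:
    """Group paths so each group's total blob size stays within the budget.

    A blob larger than the budget ends up in a batch of its own; a real
    client would need to stream or chunk such a blob.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path in paths:
        size = sizes[path]
        if current and used + size > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        batches.append(current)
    return batches


def fetch_blobs_in_batches(
    paths: List[str],
    sizes: Dict[str, int],
    fetch: Callable[[List[str]], List[bytes]],
    budget: int = GRPC_BUDGET_BYTES,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, data) pairs, calling fetch once per batch (one RPC each)."""
    for batch in plan_batches(paths, sizes, budget):
        for path, data in zip(batch, fetch(batch)):
            yield path, data
```

The trade-off is extra round trips in exchange for bounded response sizes; the search call itself stays small no matter how large topk is.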

@xiaofan-luan xiaofan-luan added good first issue Good for newcomers kind/feature Issues related to feature request from users hacktoberfest Issues picked by hacktoberfest labels Oct 9, 2023
@mzhl1111

Hi, I am currently a student and want to start my first PR as an open-source contributor. Can you assign this to me?

@xiaofan-luan
Contributor Author

Sure my man.

/assign @MZHL111
The overall goal is to add a new data type in Milvus to store large data chunks (roughly 100KB - 10MB).

Let me know what your plan is and I would be glad to help.

@sre-ci-robot
Contributor

@xiaofan-luan: GitHub didn't allow me to assign the following users: MZHL111.

Note that only milvus-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

@mzhl1111

@xiaofan-luan Thank you :). Do you know which module or file I should start with to get familiar with the code base?

@xiaofan-luan
Contributor Author

To add a new data type, you will have to

  1. change the proto definition https://github.com/milvus-io/milvus-proto
  2. add the DDL definition
  3. implement insert handling in the proxy node
  4. change the WAL serdes
  5. add datanode/querynode processing code
  6. change the storage layer to support the new type
  7. add support for retrieving the blob data type

@mzhl1111

@xiaofan-luan
Hi xiaofan, could you please help review the proto definition PR (milvus-io/milvus-proto#241)?


3 participants