Store package names in arrow metadata #122

Open
wants to merge 10 commits into
base: main


ericphanson
Member

for more informative unknown schema errors

closes #46

Here I chose to only store this information when the schema was defined in a package (rather than in Main or in a script or something), since only in that case can we reliably suggest loading that package.
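
As a rough illustration of the "defined in a package" check described above, here is a minimal sketch. The helper name `provider_info` is hypothetical and this is not the PR's `schema_provider` implementation; it only assumes that `pathof` returns `nothing` for modules that were not loaded from a package and that `pkgversion` is available on Julia 1.9 and later.

```julia
# Hypothetical helper illustrating the "only when defined in a package" rule.
# This is a sketch, not the PR's `schema_provider` implementation.
function provider_info(m::Module)
    root = Base.moduleroot(m)
    # `pathof` returns `nothing` for modules that were not loaded from a package,
    # such as `Main` or modules defined in a script.
    pathof(root) === nothing && return (; name=nothing, version=nothing)
    # `pkgversion` only exists on Julia 1.9 and later; report `nothing` before that.
    version = isdefined(Base, :pkgversion) ? Base.pkgversion(root) : nothing
    return (; name=nameof(root), version=version)
end

provider_info(Main)  # name and version are both `nothing`
```

Returning a named tuple in both branches keeps the return shape the same whether or not a provider package exists.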

@ericphanson
Member Author

good call @kleinschmidt, I've switched this to use the schema-version instead of the schema name (defined in the @version call rather than the @schema call)

@omus
Member

omus commented Aug 29, 2024

@ericphanson you probably also want to include the version of the package, since older or newer versions may be incompatible.

# Let's test some more error printing while we're here; if we did not have the VersionNumber
# (e.g. since the table was generated on Julia pre-1.9), we should still print a reasonable message:
err = Legolas.UnknownSchemaVersionError(Legolas.SchemaVersion("test-provider-pkg.foo", 1), :TestProviderPkg, missing)
@test contains(sprint(Base.showerror, err), "TestProviderPkg")
Member

We should use Arrow.Table to load the table without the package to ensure that the advice we give is accurate

Member

@omus omus left a comment

Some minor code suggestions. Please bump the version to 0.5.22

Comment on lines +28 to +29
v = Legolas.extract_metadata(table, Legolas.LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY)
@test v == "TestProviderPkg"
Member

Add a test for the version key?

Member

agreed

# Check if this module was defined in a package.
# If not, return `nothing`
path = pathof(rootmodule)
path === nothing && return (; name=nothing, version=nothing)
Member

Comment above makes it sound like this should be:

Suggested change
- path === nothing && return (; name=nothing, version=nothing)
+ path === nothing && return nothing

Member

needs to be consistent with the return value described in the docstring of schema_provider though, which is as it is here...

Comment on lines 230 to 251
schema_metadata = LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY => identifier(sv)
provider_name, provider_version = schema_provider(sv)
provider_name_metadata = LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY => string(provider_name)
provider_version_metadata = LEGOLAS_SCHEMA_PROVIDER_VERSION_METADATA_KEY => string(provider_version)
if isnothing(metadata)
    metadata = (schema_metadata,)
    if !isnothing(provider_name)
        metadata = (metadata..., provider_name_metadata)
        if !isnothing(provider_version)
            metadata = (metadata..., provider_version_metadata)
        end
    end
else
    metadata = Set(metadata)
    push!(metadata, schema_metadata)
    if !isnothing(provider_name)
        push!(metadata, provider_name_metadata)
        if !isnothing(provider_version)
            push!(metadata, provider_version_metadata)
        end
    end
end
Member

A cleaner implementation. Assumes that schema_provider returns nothing when there is no provider. Additionally, note that Arrow.getmetadata returns an ImmutableDict{String,String}:

Suggested change
- schema_metadata = LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY => identifier(sv)
- provider_name, provider_version = schema_provider(sv)
- provider_name_metadata = LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY => string(provider_name)
- provider_version_metadata = LEGOLAS_SCHEMA_PROVIDER_VERSION_METADATA_KEY => string(provider_version)
- if isnothing(metadata)
-     metadata = (schema_metadata,)
-     if !isnothing(provider_name)
-         metadata = (metadata..., provider_name_metadata)
-         if !isnothing(provider_version)
-             metadata = (metadata..., provider_version_metadata)
-         end
-     end
- else
-     metadata = Set(metadata)
-     push!(metadata, schema_metadata)
-     if !isnothing(provider_name)
-         push!(metadata, provider_name_metadata)
-         if !isnothing(provider_version)
-             push!(metadata, provider_version_metadata)
-         end
-     end
- end
+ metadata = Set{Pair{String,String}}(isnothing(metadata) ? [] : metadata)
+ push!(metadata, LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY => identifier(sv))
+ provider = schema_provider(sv)
+ if !isnothing(provider)
+     push!(metadata, LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY => string(provider.name))
+     push!(metadata, LEGOLAS_SCHEMA_PROVIDER_VERSION_METADATA_KEY => string(provider.version))
+ end
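
For anyone who wants to try the single-path merge outside Legolas, here is a minimal, self-contained sketch. The key constants and the function name `merge_metadata` are hypothetical stand-ins, not Legolas identifiers. It assumes the provider is either `nothing` or a named tuple with `name` and `version` fields, and that existing metadata iterates as `Pair{String,String}` (as an `ImmutableDict{String,String}` does). As one possible design choice, it skips the version entry when the version is unknown.

```julia
# Hypothetical stand-ins for the Legolas metadata key constants.
const EXAMPLE_QUALIFIED_KEY = "example_schema_qualified"
const EXAMPLE_PROVIDER_NAME_KEY = "example_schema_provider_name"
const EXAMPLE_PROVIDER_VERSION_KEY = "example_schema_provider_version"

# Merge user metadata with the schema identifier and optional provider info.
# `metadata` may be `nothing` or any iterable of `Pair{String,String}`.
function merge_metadata(metadata, identifier::String, provider)
    existing = isnothing(metadata) ? Pair{String,String}[] : metadata
    merged = Set{Pair{String,String}}(existing)
    push!(merged, EXAMPLE_QUALIFIED_KEY => identifier)
    if !isnothing(provider)
        push!(merged, EXAMPLE_PROVIDER_NAME_KEY => string(provider.name))
        # Skip the version entry when it is unknown (e.g. no `pkgversion` available).
        if !isnothing(provider.version)
            push!(merged, EXAMPLE_PROVIDER_VERSION_KEY => string(provider.version))
        end
    end
    return merged
end

existing = Base.ImmutableDict("a" => "b")
merged = merge_metadata(existing, "test-provider-pkg.foo@1",
                        (; name=:TestProviderPkg, version=v"0.1.0"))
```

Converting to a `Set` up front means the `nothing` and non-`nothing` metadata cases share one code path.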

Member

I think I agree with this, and it's easier for me to follow at least

Member

@kleinschmidt kleinschmidt left a comment

mostly agree with @omus review and have left comments where I don't. needs a version bump.

Successfully merging this pull request may close these issues.

Feature request: include package name or package url in arrow metadata
3 participants