Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix decode corner case where result isn't promoted #163

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

omus
Copy link
Member

@omus omus commented May 3, 2024

I was confused as the element type wasn't updated when I decoded these results:

julia> using Onda

julia> T = Int16
Int16

julia> channels = ["a", "b"]
2-element Vector{String}:
 "a"
 "b"

julia> info = SamplesInfoV2(; sensor_type="eeg",
                            channels,
                            sample_unit="unit",
                            sample_resolution_in_unit=1,
                            sample_offset_in_unit=0,
                            sample_type=T,
                            sample_rate=200)
SamplesInfoV2: (sensor_type = "eeg", channels = ["a", "b"], sample_unit = "unit", sample_resolution_in_unit = 1.0, sample_offset_in_unit = 0.0, sample_type = "int16", sample_rate = 200.0)

julia> samples = Samples(rand(-T(20):T(20), length(channels), 15), info, true)
Samples (00:00:00.075000000):
  info.sensor_type: "eeg"
  info.channels: ["a", "b"]
  info.sample_unit: "unit"
  info.sample_resolution_in_unit: 1.0
  info.sample_offset_in_unit: 0.0
  sample_type(info): Int16
  info.sample_rate: 200.0 Hz
  encoded: true
  data:
2×15 Matrix{Int16}:
 0   -1  19   9  -15  -19  -16  -14  -17  12  11  4  -16  4  9
 5  -16   7  -2    1  -13    7  -13  -11  19  13  8  -15  3  2

julia> Onda.decode(samples, Float64)
Samples (00:00:00.075000000):
  info.sensor_type: "eeg"
  info.channels: ["a", "b"]
  info.sample_unit: "unit"
  info.sample_resolution_in_unit: 1.0
  info.sample_offset_in_unit: 0.0
  sample_type(info): Int16
  info.sample_rate: 200.0 Hz
  encoded: false
  data:
2×15 Matrix{Int16}:
 0   -1  19   9  -15  -19  -16  -14  -17  12  11  4  -16  4  9
 5  -16   7  -2    1  -13    7  -13  -11  19  13  8  -15  3  2

Fixes: #141

src/samples.jl Outdated
@@ -371,13 +371,15 @@ If:

sample_data isa AbstractArray &&
sample_resolution_in_unit == 1 &&
sample_offset_in_unit == 0
sample_offset_in_unit == 0 &&
eltype(sample_data) == typeof(sample_resolution_in_unit) == typeof(sample_offset_in_unit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when called with Samples, sample_resolution_in_unit and sample_offset_in_unit will always be Float64, so this effectively means decode will always return Float64 values. this is bad for both "categorical" type signals (where the signal data actually is an integer which represent an index) and for Float32 signals which a lot of ML models use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this guard is here in teh first place is that it used to be possible (in Onda 0.14 I think) to use integer 0/1 for the offset/resolution in SamplesInfo:

Onda.jl/src/signals.jl

Lines 66 to 67 in 2e33a6e

sample_resolution_in_unit::LPCM_SAMPLE_TYPE_UNION = convert_number_to_lpcm_sample_type(sample_resolution_in_unit),
sample_offset_in_unit::LPCM_SAMPLE_TYPE_UNION = convert_number_to_lpcm_sample_type(sample_offset_in_unit),

That changed with the switch to SamplesInfoV2:

Onda.jl/src/signals.jl

Lines 80 to 81 in 36a7be9

sample_resolution_in_unit::Float64
sample_offset_in_unit::Float64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are categorical signals encoded at all? It seems like they should use encoded=false which would address that concern. I agree that Float32 shouldn't be converted to a Float64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encoded is not a property of the stored signal, it's specific to in-memory Samples objects, so there's no way for a signal to be encoded or decoded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely annoying...

@omus
Copy link
Member Author

omus commented May 3, 2024

I've revised the change to now work with categorical signals and Float32 signals. At this point the change should be non-breaking. Callers of decode(sample_resolution_in_unit, sample_offset_in_unit, sample_data) have a slight behavior change but I would definitely call it a bugfix.

I'll mention that there may be a pre-existing issue with Float32 signals being converted to Float64 as the only scenario in which they would remain as Float32 when decoding is when the resolution was 1 and the offset was 0. This behavior is retained in this PR and we may want to change it such that a decoded Float32 signal returns a Float32 signal.

@omus omus requested a review from kleinschmidt May 3, 2024 20:49
test/samples.jl Outdated
Comment on lines 148 to 149
decoded = decode(samples, Float32)
@test eltype(decoded.data) === Int16
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to have it such that decode automatically converts the element type on data which doesn't match the specified type. I think I've done enough here so I'll leave that for a future contributor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the remaining work to do on #141

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the remaining concerns are just about this portion so I'll get this done after all

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I still don't totally see what this gets us over the decode! approach from #141. Even with this, we'll still hit the issue where if the samples are already decoded you wont' get the expected eltype...

I guess the main advantage is that decode(::Samples, ::Type{T}) is no longer (as much of) a lie, but you're still not guaranteed to get T (since we're doing a promote).

The output of decode(::Samples, ::Type{T}) woudl still change in important ways with this PR: the return value no longer aliases the input on the 0/1 case, and the eltype may be different from teh status quo. Whether that's fixing a bug or is actual breakage is kind of a judgment call; given that the current behavior is documented, that in internal discussions at Beacon this has been generally found to be intentional, and that Onda is deep in our tech stack, I'd lean towards being conservative and still calling this breaking even if the current behavior is potentially confusing.

src/samples.jl Outdated
Comment on lines 399 to 403
If `samples.encoded` is `true`, return a `Samples` instance that wraps

decode(convert(T, samples.info.sample_resolution_in_unit),
convert(T, samples.info.sample_offset_in_unit),
samples.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this detail is still important here, since it affects the promotion that happens...

@omus
Copy link
Member Author

omus commented May 4, 2024

I guess I still don't totally see what this gets us over the decode! approach from #141. Even with this, we'll still hit the issue where if the samples are already decoded you wont' get the expected eltype...

I guess the main advantage is that decode(::Samples, ::Type{T}) is no longer (as much of) a lie, but you're still not guaranteed to get T (since we're doing a promote).

I've updated the PR one more time and now decode(::Samples, ::Type{T}) always returns samples with the data element type T which properly fixes #141.

The output of decode(::Samples, ::Type{T}) woudl still change in important ways with this PR: the return value no longer aliases the input on the 0/1 case, and the eltype may be different from teh status quo. Whether that's fixing a bug or is actual breakage is kind of a judgment call; given that the current behavior is documented, that in internal discussions at Beacon this has been generally found to be intentional, and that Onda is deep in our tech stack, I'd lean towards being conservative and still calling this breaking even if the current behavior is potentially confusing.

I did a search through the Beacon tech stack on usages of Onda.decode and only found one usage of decode(::Samples, ::Type{T}). In that case decode(samples, Float32) is used and a comment states they explicitly want Float32 elements so I suspect the new behavior would be preferred if the loaded data actually used Float64 elements.

If we decide to go the breaking route there's a few more things I would change to make the extra work worth it. I'm in favor of making this non-breaking and rolling it back if need be.

Comment on lines +60 to +75
@testset "categorical samples" begin
# Signal must use a resolution of one and a offset of zero
info = SamplesInfoV2(; sensor_type="eeg",
channels=["a", "b"],
sample_unit="unit",
sample_resolution_in_unit=1,
sample_offset_in_unit=0,
sample_type=Int16,
sample_rate=200)
samples = Samples(rand(-Int16(20):Int16(20), 2, 5), info, false)

# Requires decoding
encoded = Samples(samples.data, samples.info, true)
decoded = decode(encoded)
@test eltype(decoded.data) === Int16
@test decoded.data === samples.data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regardless of whether we decide to change the behavior, I do think that having an explicit test for this behavior is important since it's something we are relying on for a number of Beacon-internal things.

@omus omus requested a review from jrevels May 7, 2024 13:39
@omus
Copy link
Member Author

omus commented May 7, 2024

@jrevels could we get your review on this change as you are the original author of this code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Onda.decode(samples, ::Type{T}) does not always return Samples{T}
2 participants