Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about iolists #36

Open
ckampfe opened this issue Dec 14, 2023 · 3 comments
Open

Question about iolists #36

ckampfe opened this issue Dec 14, 2023 · 3 comments

Comments

@ckampfe
Copy link

ckampfe commented Dec 14, 2023

Hey all, thanks for the library. I don't have an issue to report, just a question.

I see that the signature of encode, specifically for a {:binary, data} payload requires that data be a binary, via the is_binary guard in the is_friendly_frame guard:
https://github.com/elixir-mint/mint_web_socket/blob/main/lib/mint/web_socket/frame.ex#L67

Is this requirement for the data to be strictly a binary rather than iodata broadly due to something in the websocket protocol itself? I don't know the inner workings of the websocket protocol and I'm accustomed to using iolists via gentcp, etc., for performance, so I'm just curious if it would ever be possible to send an iolist in this library (or any others).

Thank you!

@the-mikedavis
Copy link
Collaborator

If I remember correctly, iolists don't work well for websocket because iodata must be binaries:

%% erlang.erl in the OTP codebase
%% ..
-type binary() :: <<_:_*8>>.
-type bitstring() :: <<_:_*1>>.
%% ..
-type byte() :: 0..255.
%% ..
-type iodata() :: iolist() | binary().
-type iolist() :: maybe_improper_list(byte() | binary() | iolist(), binary() | []).
%% ..

while the frames are bitstrings:

<<
encode_fin(frame)::bitstring,
reserved(frame)::bitstring,
encode_opcode(frame)::bitstring,
masked?::size(1),
encoded_payload_length::bitstring,
mask || <<>>::binary,
apply_mask(payload, mask)::bitstring
>>

We already send bitstrings through gen_tcp:send/2/ssl:send/2 (indirectly through Mint.HTTP.stream_request_body/3) which says it accepts iodata() rather than iodata() | bitstring() so I wonder if we can trick gen_tcp/ssl into sending an iodata made out of bitstrings? It might not be worth it anyways since we need to convert the payload to a binary in order to mask the data anyways

@ckampfe
Copy link
Author

ckampfe commented Dec 18, 2023

@the-mikedavis Thanks for the explanation, that makes sense.

So, if the payload has to be a binary in order to mask its data then it seems like it probably won't work to use iolists, as whatever payload you pass has to become a binary regardless.

Unless there is some case where it's faster to serialize your payload to iolists with whatever encoding you happen to be using (in my case, CBOR) and then have MintWebSocket call :erlang.iolist_to_binary/1 on it, instead of serializing your payload to a binary directly, and MintWebSocket just sends this payload as it does currently.

I'm not confident that the former would yield higher throughput than the latter, even if it did generate less intermediate garbage or have some other kind of benefit like easier payload construction, etc. Even if there was such a benefit, I think it would be workload dependent, as you'd still have to call :erlang.iolist_to_binary on the payload every single time, regardless. For what it's worth, this is exactly what :gun does, which is why I was also investigating MintWebSocket to see if another approach was possible.

This is an aside but I just want to explain my motivation for opening this issue at all: my application does a fair amount of websocket traffic with CBOR-encoded payloads on pretty limited embedded hardware, to the point where CBOR encoding is a large portion of our benchmarked CPU. The Elixir cbor library encodes by building up binaries, but I was able to rewrite it to encode to iolists, which increased its throughput enough for me to be interested. I was attempting to test it on our production hardware when I discovered that :gun is just calling :erlang.iolist_to_binary/1, negating the throughput increase of faster CBOR encoding. Thinking this was a :gun-specific issue, I tried MintWebSocket, and started getting an error telling me that it couldn't encode an iolist, so here we are.

In any case thanks for the response and for the excellent library, we may still end up using MintWebSocket!

@the-mikedavis
Copy link
Collaborator

the-mikedavis commented Dec 18, 2023

Ah actually I just realized the frames are always binaries:

frame part number of bits
fin 1
reserved 3
opcode 4
masked 1
payload length 7 or 23 or 71
mask 0 or 32
payload always divisible by 8

(so all frames will have bits divisible by 8)

So the only part of switching to iodata that is tricky is adding a new version of the masking function that will work on iodata. Here's what we have for binaries now:

# Mask the payload by bytewise XOR-ing the payload bytes against the mask
# bytes (where the mask bytes repeat).
# This is an "involution" function: applying the mask will mask
# the data and applying the mask again will unmask it.
def apply_mask(payload, mask, acc \\ <<>>)
def apply_mask(payload, nil, _acc), do: payload
# n=4 is the happy path
# n=3..1 catches cases where the remaining byte_size/1 of the payload is shorter
# than the mask
for n <- 4..1 do
def apply_mask(
<<part_key::integer-size(8)-unit(unquote(n)), payload_rest::binary>>,
<<mask_key::integer-size(8)-unit(unquote(n)), _::binary>> = mask,
acc
) do
apply_mask(
payload_rest,
mask,
<<acc::binary, :erlang.bxor(mask_key, part_key)::integer-size(8)-unit(unquote(n))>>
)
end
end
def apply_mask(<<>>, _mask, acc), do: acc

That function could also be rewritten to return iodata - we could build the acc as a list and reverse it at the end instead of appending to a binary.

EDIT: I have some of my thoughts on implementing this here: https://github.com/elixir-mint/mint_web_socket/compare/iodata. It just needs the masking implementation for iodata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants