Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parser functions for all types #10

Open
alexanderkjall opened this issue Apr 29, 2018 · 6 comments
Open

Implement parser functions for all types #10

alexanderkjall opened this issue Apr 29, 2018 · 6 comments

Comments

@alexanderkjall
Copy link
Collaborator

We need to be able to parse the data that the database returns into java types.

@davecramer
Copy link
Member

davecramer commented Apr 29, 2018 via email

@alexanderkjall
Copy link
Collaborator Author

I was a bit imprecise. I have just implemented that we send describe messages to the server to ask for the return types of queries.

The different types can then be encoded as text or in binary by the server and the default seems to be text. There might be significant performance gain to be had by using binary, specially for some types as bytea, but that can be a future improvement.

We then get sent the datarows from the server, I store the columns in an Map<String, Object> and that map can then be queried by the Collector that the user has supplied through the ResultMap.get method.

I would like to have as few memory allocations as possible through this process, but I don't know what class the user wants to represent the column as when parsing the DataRow message, so thought was to first parse it to an appropriate type to store in the Map<String, Object> map (postgresql int4 -> java Integer for example) and then have a second set of transformations in the get() method.

Another method would be to convert everything to strings, store the original OID from the db and do all the conversions when the user asks for them in get method. This will save an conversion if the user doesn't want the Class that we parse the result into but requires the driver to keep more state about each column and it will be harder to implement the usage of binary in the future.

@cretz
Copy link

cretz commented Apr 30, 2018

There might be significant performance gain to be had by using binary

Not sure the gain would be that significant, but I concur binary should be implemented after text (if at all). I recommend storing the row data as a two dimensional byte array (or you can just keep the entire byte buffer and store offsets, but you have to traverse anyways) and the row metadata as a separate object and query it as you need, with the columns in two forms: a Column[] which lets you get the columns by index and more information on them than just their name, and a Map<String, Integer> which is a map of column names (lower cased to provide case-insensitive lookups) to indices. Sometimes you'll need more than the column name to do the type of conversion the user asks, so don't eagerly just make a Map<String, Object>, wait for them to ask.

Another method would be to convert everything to strings

Don't do this eagerly. Not everyone wants all of their columns.

@cretz
Copy link

cretz commented Apr 30, 2018

Also, if you want to save yourself some time or just need some inspiration (no credit wanted/needed), you can take a peek at Converters.java that has some things like date formats instead of hand typing, DataType.java that has a bunch of parsers implemented for non-standard types, and QueryTest.java which has a bunch of interesting values that are good to test with (tests params, output, arrays, multidimensional arrays, etc too).

@alexanderkjall
Copy link
Collaborator Author

That not all columns will be read by the user is a very good point. I'll refactor the code so that the conversions happen when the get method is called. I think I'll try to store it as the original byte sequence with offsets, there might even be some upside to implement the CharSequence interface to help with the parsing.

@davecramer
Copy link
Member

Binary has a significant performance benefit, I would argue you should prioritize binary and leave text for things like timestamps which have historically been difficult to deal with in binary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants