I want to summarize the problems I see with the current implementation and some ideas for overcoming them.
Behind the scenes data downloading
In the current implementation, data is loaded invisibly to the user. Moreover, it is not only loaded invisibly; it is also downloaded invisibly.
This leads to the following issues:
Unpredictable timing of the first geolocate call: it can take anywhere from milliseconds (the actual lookup) to seconds or even minutes (when the data has to be loaded).
Uncontrollable behaviour: the user can't choose whether to load data from an existing file, whether to update the database, or even which database to use.
It is hard to switch from IPv4 to IPv6: the application would have to load both databases behind the scenes and then somehow choose which one to use. See Add Support for IPv6 #21
It is hard to change the localization: once again, all necessary files would have to be loaded behind the scenes, which creates additional memory pressure. See Localization #22
It is hard to switch from CSV to MaxMindDB, because it is not clear which database to use. See Add Maxminddb support #26
All of these problems can be solved by exposing the following methods to the user:
load: it should accept various parameters and modes. The user can choose between local and internet data loading, and between different database formats and localizations.
update!: it should accept parameters similar to load, but it should validate the current state of the database and update it if a new version is available.
geolocate should be changed to geolocate(::DB, ::IP). For convenience, a getindex method can be added so that db[IP] works the same as geolocate.
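To make the proposal more concrete, here is a minimal sketch of what an explicit API could look like. It is not the package's implementation. The `DB` layout, the in-memory `rows` argument of `load`, and the choice to return `nothing` on a miss are all assumptions made for illustration. `update!` is left out because its behaviour depends on how versions are tracked.

```julia
using Sockets  # stdlib; provides IPv4 and the ip"..." literal

# Hypothetical database handle: parallel vectors sorted by range start.
struct DB
    starts::Vector{UInt32}
    stops::Vector{UInt32}
    countries::Vector{String}
end

# Explicit load. In this sketch the source is an in-memory list of
# (first_ip, last_ip, country) tuples; a real `load` would read a local
# file or download one, depending on its arguments.
function load(rows)
    sorted = sort(collect(rows); by = r -> UInt32(IPv4(r[1])))
    return DB([UInt32(IPv4(r[1])) for r in sorted],
              [UInt32(IPv4(r[2])) for r in sorted],
              [String(r[3]) for r in sorted])
end

# The lookup takes the database explicitly, so its cost is only the search.
function geolocate(db::DB, ip::IPv4)
    key = UInt32(ip)
    i = searchsortedlast(db.starts, key)  # last range starting at or before key
    (i == 0 || key > db.stops[i]) && return nothing
    return db.countries[i]
end

# Convenience: db[ip] behaves like geolocate(db, ip).
Base.getindex(db::DB, ip::IPv4) = geolocate(db, ip)

# Made-up sample data, not real geolocation records.
db = load([("192.168.0.0", "192.168.255.255", "BB"),
           ("10.0.0.0", "10.0.0.255", "AA")])
```

With this shape, the expensive step (`load`) happens exactly where the user calls it, and the same program can hold several `DB` values (for example one for IPv4 and one for IPv6) without any hidden global state.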
Loaded Data structure and results
In the current implementation, a DataFrame is used as the storage format, and a Dict{String, Any} is used as the return format for queries.
This leads to the following issues:
DataFrame is type-unstable by construction, so improper use can lead to unnecessary allocations and overall slowness.
Row construction is rather slow.
The output is type-unstable, which slows down the calling application.
Possible solution:
Use a StructArray or a Vector of GeoResult structs for storage.
Return a GeoResult, which should be concretely typed and have a fixed number of fields. Use sentinel values instead of missing data.
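As a rough illustration of the second point, a concretely typed result with sentinel values might look like the sketch below. The field names and the sentinels (`""` for unknown strings, `NaN` for unknown coordinates) are assumptions, not a decided layout.

```julia
# Hypothetical result type: every field has a concrete type.
struct GeoResult
    country_code::String   # "" means "unknown"
    city::String           # "" means "unknown"
    latitude::Float64      # NaN means "unknown"
    longitude::Float64     # NaN means "unknown"
end

# Helper that checks the coordinate sentinels.
hascoords(r::GeoResult) = !isnan(r.latitude) && !isnan(r.longitude)

# A plain Vector of GeoResult has a concrete element type,
# unlike the values of a Dict{String, Any}.
results = GeoResult[
    GeoResult("AA", "Sample City", 12.5, -45.0),  # made-up values
    GeoResult("BB", "", NaN, NaN),                # coordinates unknown
]

storage_is_concrete = isconcretetype(eltype(results))
dict_is_concrete = isconcretetype(valtype(Dict{String, Any}))
```

Because every field type is known at compile time, code that reads `r.latitude` can be specialized by the compiler, whereas a value pulled out of a `Dict{String, Any}` has to be dispatched on at run time.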
The next version is 0.6, which should include a number of breaking changes. The main idea is to make the code type-stable and remove the Dict{String, Any} construction, which takes ~95% of the time.
Version 0.7 should target improving the initial load:
1. Improve the speed of the load function.
2. Store the DB in binary form in order to provide a "lazy" load.
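One possible way to get a "lazy" load from a binary file is memory mapping. The sketch below assumes a hypothetical fixed-size record type (`Range4`) and a trivial file layout (an `Int64` record count followed by the raw records). None of this is a decided format.

```julia
using Mmap  # stdlib

# Hypothetical fixed-size record; all fields are plain bits, so a vector of
# them can be written to disk and mapped back without any parsing.
struct Range4
    start::UInt32
    stop::UInt32
    country::NTuple{2, UInt8}   # two ASCII letters
end

# Made-up sample records, not real geolocation data.
records = [
    Range4(0x0a000000, 0x0a0000ff, (UInt8('A'), UInt8('A'))),
    Range4(0xc0a80000, 0xc0a8ffff, (UInt8('B'), UInt8('B'))),
]

# Write the binary form once.
path = tempname()
open(path, "w") do io
    write(io, Int64(length(records)))  # header: record count
    write(io, records)                 # raw bytes of the isbits records
end

# "Lazy" load: map the file instead of reading it. The operating system
# pages the data in when it is first accessed.
io = open(path, "r")
n = read(io, Int64)
lazy = Mmap.mmap(io, Vector{Range4}, (Int(n),))  # starts at the current position
close(io)
```

The mapped vector can be searched like any other `Vector`, so the lookup code does not need to know whether the data was parsed from CSV or mapped from disk. The raw layout depends on the machine's byte order and struct padding, so a real format would need to pin those down.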
Version 0.8 may include support for the MaxMind binary database format and conversion utilities. This one is questionable, since the MaxMind binary file is much slower than native Julia structures. In that case, it can be added as a post-1.0 feature.