
Rewrite using Polars #56

Open
1 of 2 tasks
blaylockbk opened this issue May 8, 2024 · 2 comments · May be fixed by #58

blaylockbk (Owner) commented May 8, 2024

I'm inclined to rewrite this using Polars. I love Polars!

  • Load data into Polars DataFrames.
  • Timeseries data will be returned in one DataFrame rather than a list of DataFrames. (Use a categorical dtype for columns like STID, TIMEZONE, etc.)

blaylockbk changed the title from "Write using Polars" to "Rewrite using Polars" on Jun 13, 2024

blaylockbk (Owner, Author) commented

Note to self:

Saving a JSON copy of the returned data for 18 stations (all variables, 1 month) takes ~25 MB on disk. Organizing the same data into a Polars DataFrame and saving it to Parquet takes 131 KB.

blaylockbk (Owner, Author) commented Sep 6, 2024

I'm making great progress on this. I need a to-do list:

Code

  • Data will be provided in long format by default. Add an optional argument to pivot the data to wide format.
  • Add an optional argument to return data as a pandas DataFrame (for users who prefer pandas, though I'm fully on the Polars bandwagon; I don't use pandas anymore).
  • Add an optional with_latency argument.
  • Add basic summary plotting; seaborn will be an optional dependency.

Docs

  • Rewrite the docs with new examples
  • Examples of rolling/resample windows
  • Examples of pivoting from long to wide format
  • Examples of plotting with seaborn
  • Show users how to save to Parquet (and the benefits of doing so)
  • Rewrite the README

GitHub

  • Explain the big overhaul to users. The entire package is a breaking change; it's practically a new package. Reasons: improved maintainability, I wanted to learn Polars, I'm learning class inheritance, and a long-format DataFrame makes more sense to me.

blaylockbk linked a pull request on Sep 7, 2024 that will close this issue