Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization with caching #10

Open
pettermahlen opened this issue Jun 11, 2015 · 5 comments
Open

Optimization with caching #10

pettermahlen opened this issue Jun 11, 2015 · 5 comments

Comments

@pettermahlen
Copy link
Member

Instead of loading all the classes in all the artifacts, let's do this:

  • create a cache directory somewhere
  • for each jar file we find, compare the last modified timestamp with the timestamp of the corresponding cache file. If cache file is newer, use that. If cache file is older, checksum of file with cached file checksum metadata. If the checksum matches, use it and update timestamp. Otherwise, rebuild the cache file.

The cache file is a sparkey file that maps class name to raw class-file data. It also contains special things like jar_checksum

Then we need a special lazy artifact object that only parses the class-file data when we need the information, which should be when we first discover that the class is needed in runtime, i.e. when we run the reachability analysis.

@pettermahlen
Copy link
Member Author

From @liljencrantz:

  1. Do we really need the higher performance already? Caching brings with it a host of complexity and bugs. It makes the codebase larger and more unwieldly, making rewrites harder. I'd prefer that we only add caching if and when we get to a point where the performance without caching is unacceptable and there is no way to fix that that doesn't involve caching. It's far from clear to me that we're there already.
  2. If we for some reason need to add caching, I think it's super critical that we version the cache contents not only on jar age but also on validator version, so that when we add new features to the validator we can detect that the cache data is old and throw it out.

@pettermahlen
Copy link
Member Author

From @spkrka:

I have a couple of arguments for caching:

  • I think people will run it less frequently if it's slow, and it already feels unnecessarily slow for me.
  • Most dependencies never change, so the cache hit rate should be very good
  • We can isolate the caching part so that it's invisible to the rest of the code, so it shouldn't add complexity

Yes, we should cache on both code version and artifact version.

@pettermahlen
Copy link
Member Author

From @mattnworb:

The cache file is a sparkey file that maps class name to raw class-file data.

I think we should figure out if the costly part of "loading" rt.jar is 1) decompressing or zip file entries in the jar, 2) reading the bytes off disk (in which case I don't think a sparkey file would help), 3) the parsing work that asm does once it is handed an InputStream for the class to load, 4) and/or what we do once we have the ClassNode.

Of course it could be any combination of these four.

My guess would be that 3 is the costliest part, in which case a cache of class name to parsed ClassNode and/or Artifact seems more advantageous to me.

I think it's super critical that we version the cache contents not only on jar age but also on validator version, so that when we add new features to the validator we can detect that the cache data is old and throw it out.

+1

@pettermahlen
Copy link
Member Author

From @mattnworb:

It would also be useful to have a flag in the plugin for enableArtifactCache so the behavior can be disabled.

@pettermahlen
Copy link
Member Author

From @pettermahlen:

I don't think we should use the last modified timestamp; that might mean we don't refresh caches when we should, since you can't trust those timestamps. Otherwise, I think this could potentially be a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant