PackageObjectFactory's Class.forName usage

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

PackageObjectFactory's Class.forName usage

Seth Pellegrino
Hello Checkstyle Developers,

Our team is using checkstyle across a fairly intensely modularized project (over 200 POMs and counting). Our build performance is nothing to write home about, as you might expect, but I was surprised to discover that checkstyle was accounting for about a third of our build time on my machine.

My interest piqued, I landed in PackageObjectFactory. We're leveraging ~60 checks across most of our modules, and since the maven-checkstyle-plugin creates a new Checker for every module, we ended up there a lot. If I may summarize the logic in this class, it seems to be behaving like so:

For each suffix in ("", "Check"):
 For each package in ("", ...):
   Call Class.forName

where there's usually a half-dozen (or more) packages that we'll end up scanning through in the worst case.

This algorithm seems to me like an abuse of the class loading infrastructure; misses are rarely cached in class loaders so for each non-hit we'll end up scanning the classpath. So, on average, every time we load a check (the hot path through that class), we expect to scan the class path (num_packages * 3/2) times.

As a short-term solution, I've implemented caching (critically, with negative caching) in the PackageObjectFactory, and I've flipped the load order to look for Check classes first (as this is the more common case). These changes are available on bitbucket: https://bitbucket.org/sethp_jive/checkstyle/commits/fa747d132c52f584c7a85ffadd7fc041c449e80e

As a longer-term solution, it would be better to introduce an authoritative mapping between check name and fully-qualified class name. Then PackageObjectLoader would be greatly simplified in code complexity as well as runtime profile – it would merely expand the requested module name if such a mapping exists and then do a single Class.forName call. That way, class loader's caching of existing classes gets leveraged to the maximal extent, and the error case is the only case that we regularly re-scan the whole class path from beginning to end.

Please take a look and let me know what you think.

Thanks,
Seth
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PackageObjectFactory's Class.forName usage

Lars Kühne-3
Hi Seth,

thanks for investigating this. I understand your problem and had a quick look at your patch.

I like the reversal of the "" vs "Check" logic, probably the overwhelming majority of our users never write their own checks.  

However, I'm not sure the static cache is the right solution to the problem you discovered. Actually I would expect to find problems within IDEs, such as the eclipse checkstyle plugin. For example, would it work if multiple projects within an IDE are configured to use different sets of available checks?

As an alternative solution, could we declare our own checks in META-INF/services and try to use the java.util.ServiceLoader to load them before we attempt to call Class.forName? That way we might be able to eliminate the original problem of abusing the class loader infrastructure. It would be great if you could investigate that and compare the runtime effect with your current solution.
 
Regards,
Lars

On Wed, Jun 5, 2013 at 8:09 PM, Seth Pellegrino <[hidden email]> wrote:
Hello Checkstyle Developers,

Our team is using checkstyle across a fairly intensely modularized project (over 200 POMs and counting). Our build performance is nothing to write home about, as you might expect, but I was surprised to discover that checkstyle was accounting for about a third of our build time on my machine.

My interest piqued, I landed in PackageObjectFactory. We're leveraging ~60 checks across most of our modules, and since the maven-checkstyle-plugin creates a new Checker for every module, we ended up there a lot. If I may summarize the logic in this class, it seems to be behaving like so:

For each suffix in ("", "Check"):
 For each package in ("", ...):
   Call Class.forName

where there's usually a half-dozen (or more) packages that we'll end up scanning through in the worst case.

This algorithm seems to me like an abuse of the class loading infrastructure; misses are rarely cached in class loaders so for each non-hit we'll end up scanning the classpath. So, on average, every time we load a check (the hot path through that class), we expect to scan the class path (num_packages * 3/2) times.

As a short-term solution, I've implemented caching (critically, with negative caching) in the PackageObjectFactory, and I've flipped the load order to look for Check classes first (as this is the more common case). These changes are available on bitbucket: https://bitbucket.org/sethp_jive/checkstyle/commits/fa747d132c52f584c7a85ffadd7fc041c449e80e

As a longer-term solution, it would be better to introduce an authoritative mapping between check name and fully-qualified class name. Then PackageObjectLoader would be greatly simplified in code complexity as well as runtime profile – it would merely expand the requested module name if such a mapping exists and then do a single Class.forName call. That way, class loader's caching of existing classes gets leveraged to the maximal extent, and the error case is the only case that we regularly re-scan the whole class path from beginning to end.

Please take a look and let me know what you think.

Thanks,
Seth
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel


------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent
caught up. So what steps can you take to put your SQL databases under
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PackageObjectFactory's Class.forName usage

Seth Pellegrino
Hey Lars,

Thanks for your response. I'm not sure I understand the scenario where the eclipse plugin would break, as the cache is the cache is static per module class loader and only String->Class (which is usually a negative sentinel). Concretely, we only cache the fact that "TreeWalker" is not a defined class in loader L, or "com.puppycrawl.tools.checkstyle.TreeWalker" is loaded for L at memory address 0xdef. The project's configurations drive the keys that we would look up, but the cache's answers will be consistent unless class loader itself is not idempotent, e.g. if asking the same instance "Does class a.b.c.D exist?" returns yes at some point after it returned no (or vice versa), I expect this behavior would produce a broken class loader in just about any application, checkstyle or no, and so I had thought it an acceptable behavior to be wrong in this case. That said, I know little of how the IDE plugins work – would you expound the scenario that concerns you?

As for alternative solutions, I had considered the SPI infrastructure when implementing my solution, but I decided against it because it split responsibility for naming a check into two places, one obvious and one not, and imposed a performance but not correctness penalty for missing the non-obvious definition. Additionally, it felt like an abuse of the service provider metaphor to implement "services" which would only ever have a single provider.

My proposal solution was be to provide an authoritative mapping between check name and class, similar to requiring a FQN for modules but in a backwards-compatible way, e.g.

<module name="TreeWalker">
  <module name="AvoidStarImport"/>
</module>
 
<alias name="TreeWalker" class="com.puppycrawl.tools.checkstyle.TreeWalker"/>
<alias name="AvoidStarImport" class="com.puppycrawl.tools.checkstyle.checks.imports.AvoidStarImport"/>

where we would check the alias before falling back the legacy infrastructure.

This approach at least avoids the abuse of java.util.ServiceLoader where we require it to return a single-item iterator, but still suffers from the same split responsibility problem unless we convince ourselves it's OK to fail fast and not perform the legacy behavior. This will break users who had written <module name="checks.imports.AvoidStarImport"/> , but perhaps this is an acceptable loss? In either case, I suspect the negative caching behavior will prove valuable as long as we support the package searching functionality.

Seth


On Aug 4, 2013, at 2:50 PM, Lars Kühne <[hidden email]>
 wrote:

Hi Seth,

thanks for investigating this. I understand your problem and had a quick look at your patch.

I like the reversal of the "" vs "Check" logic, probably the overwhelming majority of our users never write their own checks.  

However, I'm not sure the static cache is the right solution to the problem you discovered. Actually I would expect to find problems within IDEs, such as the eclipse checkstyle plugin. For example, would it work if multiple projects within an IDE are configured to use different sets of available checks?

As an alternative solution, could we declare our own checks in META-INF/services and try to use the java.util.ServiceLoader to load them before we attempt to call Class.forName? That way we might be able to eliminate the original problem of abusing the class loader infrastructure. It would be great if you could investigate that and compare the runtime effect with your current solution.
 
Regards,
Lars

On Wed, Jun 5, 2013 at 8:09 PM, Seth Pellegrino <[hidden email]> wrote:
Hello Checkstyle Developers,

Our team is using checkstyle across a fairly intensely modularized project (over 200 POMs and counting). Our build performance is nothing to write home about, as you might expect, but I was surprised to discover that checkstyle was accounting for about a third of our build time on my machine.

My interest piqued, I landed in PackageObjectFactory. We're leveraging ~60 checks across most of our modules, and since the maven-checkstyle-plugin creates a new Checker for every module, we ended up there a lot. If I may summarize the logic in this class, it seems to be behaving like so:

For each suffix in ("", "Check"):
 For each package in ("", ...):
   Call Class.forName

where there's usually a half-dozen (or more) packages that we'll end up scanning through in the worst case.

This algorithm seems to me like an abuse of the class loading infrastructure; misses are rarely cached in class loaders so for each non-hit we'll end up scanning the classpath. So, on average, every time we load a check (the hot path through that class), we expect to scan the class path (num_packages * 3/2) times.

As a short-term solution, I've implemented caching (critically, with negative caching) in the PackageObjectFactory, and I've flipped the load order to look for Check classes first (as this is the more common case). These changes are available on bitbucket: https://bitbucket.org/sethp_jive/checkstyle/commits/fa747d132c52f584c7a85ffadd7fc041c449e80e

As a longer-term solution, it would be better to introduce an authoritative mapping between check name and fully-qualified class name. Then PackageObjectLoader would be greatly simplified in code complexity as well as runtime profile – it would merely expand the requested module name if such a mapping exists and then do a single Class.forName call. That way, class loader's caching of existing classes gets leveraged to the maximal extent, and the error case is the only case that we regularly re-scan the whole class path from beginning to end.

Please take a look and let me know what you think.

Thanks,
Seth
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel

------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent
caught up. So what steps can you take to put your SQL databases under
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel


------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent
caught up. So what steps can you take to put your SQL databases under
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PackageObjectFactory's Class.forName usage

Lars Kühne-3
Hi Seth,

On Mon, Aug 5, 2013 at 9:09 PM, Seth Pellegrino wrote:
I'm not sure I understand the scenario where the eclipse plugin would break, as the cache is the cache is static per module class loader and only String->Class (which is usually a negative sentinel). Concretely, we only cache the fact that "TreeWalker" is not a defined class in loader L, or "com.puppycrawl.tools.checkstyle.TreeWalker" is loaded for L at memory address 0xdef. The project's configurations drive the keys that we would look up, but the cache's answers will be consistent unless class loader itself is not idempotent, e.g. if asking the same instance "Does class a.b.c.D exist?" returns yes at some point after it returned no (or vice versa), I expect this behavior would produce a broken class loader in just about any application, checkstyle or no, and so I had thought it an acceptable behavior to be wrong in this case.

OK, thanks. The "static final Map" probably just made brain shut down, so I missed that you were using the classloader as a lookup key. I'll have to take a second look at your patch.

 
That said, I know little of how the IDE plugins work – would you expound the scenario that concerns you?


I am not familiar with the internal operation of eclipse-cs either, and reading your explanation above, it might just work correctly. To be sure we should really give a heads up the the eclipse-cs project before we change our code.

 
As for alternative solutions, I had considered the SPI infrastructure when implementing my solution, but I decided against it because it split responsibility for naming a check into two places, one obvious and one not, and imposed a performance but not correctness penalty for missing the non-obvious definition. Additionally, it felt like an abuse of the service provider metaphor to implement "services" which would only ever have a single provider.

No, not single provider services. What I meant was to use some interface like "Configurable" as a service class and to list all concrete implementations in META-INF/service/com.puppycrawl.tools.checkstye.api.Configurable. Instead of your static cache we could then use ServiceLoader.load(Configurable.class, classloader).iterator() to obtain all implementations and build a non-static cache from there without throwing tons of ClassNotFoundExceptions. For each class packageName.SomeCheck that we hit in the iterator, we could build up multiple cache keys for that class, like "packageName.SomeCheck", "SomeCheck" and "Some". If I understand you correctly, this is pretty close to the longer term solution you proposed in your original email, it just builds up the alias mappings in code, not declaratively.

Lars


------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PackageObjectFactory's Class.forName usage

Seth Pellegrino
Hey Lars,

> I am not familiar with the internal operation of eclipse-cs either, and reading your explanation above, it might just work correctly. To be sure we should really give a heads up the the eclipse-cs project before we change our code.

Good idea – I've sent them a message. It looks like they've overridden the PackageObjectFactory themselves, so they will be insulated from these changes, but may run into trouble later.

> OK, thanks. The "static final Map" probably just made brain shut down, so I missed that you were using the classloader as a lookup key. I'll have to take a second look at your patch.

Thanks – as I mentioned, even if we do implement a better long-term solution I think these changes will be useful as long as we care to maintain backwards compatability.


> No, not single provider services. What I meant was to use some interface like "Configurable" as a service class and to list all concrete implementations in META-INF/service/com.puppycrawl.tools.checkstye.api.Configurable. Instead of your static cache we could then use ServiceLoader.load(Configurable.class, classloader).iterator() to obtain all implementations and build a non-static cache from there without throwing tons of ClassNotFoundExceptions. For each class packageName.SomeCheck that we hit in the iterator, we could build up multiple cache keys for that class, like "packageName.SomeCheck", "SomeCheck" and "Some". If I understand you correctly, this is pretty close to the longer term solution you proposed in your original email, it just builds up the alias mappings in code, not declaratively.

Ah, now I understand. That's a nice improvement on my longer-term proposal, thanks for the feedback!

Seth
------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Checkstyle-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/checkstyle-devel
Loading...