mercredi 31 mai 2017

VM profiling in Pharo

The Squeak VMProfiler is now available headless in Pharo.
It allows you to profile an expression and get detailed statistics about where the time was spent on the VM side (JIT, interpreter, GC).

It is still in development so feel free to use it and report any bugs. Suggestions and comments are welcome !

Loading the profiler


First, you need to load the project in a Pharo 6 image.
Make sure you are using one of the latest VM, as the profiler uses plugins that were not in the older ones. You can download the latest VM+image using this script :


wget -O- get.pharo.org/60+vmLatest | bash


Then, you can either load the VMProfiler from the calatog browser (World Menu>Tools>Catalog browser) and install the stable version :

Or use the script below :
Gofer new
url: 'http://www.smalltalkhub.com/mc/Pharo/MetaRepoForPharo60/main';
package: 'ConfigurationOfVMProfiler';
load.

((Smalltalk at: #ConfigurationOfVMProfiler) project version: #stable) load

Using the profiler


Profiling an expression


When the configuration is loaded, a workspace should open :

String streamContents: [ :s | VMProfiler headlessSpyOn: [ 1 to: 10000000 do: [ :i | Object new ] ] reportOn: s ]

It basically shows you how to use the profiler : 
  • you spy on the expression you want to profile (here, the expression is  [ 1 to: 10000000 do: [ :i | Object new ] ])
  • the profiling report is then stored as a string (here, s
  • You have to inspect the string to actually read the report
Try and inspect the example from the workspace : you will see the profiling report displayed on screen.

If you want to test it with your own expression, then just type :


 String streamContents: [ :s | VMProfiler headlessSpyOn: [ myExpression ] reportOn: s 

and inspect it. As the profiler is cadenced at 1.3 GHz, make sure your expression runs for quite a long time to actually get a large amount of samples, otherwise the results may not be significant.


Analyzing the report


The report is composed of different parts :  

1- General informations about the profiling
  • VM used and its path 
  • Memory-related data (size of the eden, number of stack pages, code size) 
  • Duration of the profiling, the sampling frequency 
  • Total number of samples (a sample = an instruction pointer) gathered during the profiling

2- The profiling report in itself

This part provides the computed profiling statistics, based on the number of samples and their locations.
It is subdivided in 2 parts :
  • the time spent in the methods generated by the JIT compiler.
For instance, the Behavior>>new method is the one in which we spent the most time. Indeed, over 195 samples, 61 refer to this method
  • the time spent in the c code of the VM (interpreter and GC)
For instance, the copyAndForward method is the one in which we spent the most time. Indeed, over 17 samples, 6 refer to this method.


3- Statistics about the memory


This part deals with the state of the memory after the profiling. It helps detecting potential memory leaks.


4- Statistics about the Garbage collection(s)

This part lists the operations performed by the garbage collector during the profiling. It can help identifying where to tune the GC, especially when a large (and potentially abnormal) amount of time is spent in one of these operations.
 
  • full : the number of full GC performed and their total execution time.
  • scavenges : the number of scavenges (eden to survivor area) performed and their cumulated execution time.
  • tenures : the number of tenures (promotion to old space) performed.
  • root table : the number of root table (old objects pointing at young objects) overflows

5- Statistics about Compiled Code Compactions

This part deals with the number of compiled code compactions and their cumulated execution time. This kind of compaction occurs when the Cog method zone is full. A high (and time consuming) number of compactions may show that you need to increase the Cog Method zone size.


6- Statistics about Events

Like the previous parts, this part identifies other events that can occur during the profiling and can help you identify if something is wrong.
For instance, the size or the number of stack pages can be modified if a high amount of stack overflows is observed.

 

Next steps


As mentioned in my GSoC proposal, there are still plenty of things to do !

First, in the next few weeks, the profiling should get more detailed as the in-development JIT and its speculative inlining will require the mapping of the profiling samples to a range of bytecode to actually get a significant profiling.

Then, I will add a UI to the Pharo VMProfiler.

dimanche 14 mai 2017

Accepted as a Google Summer of Code 2017 student !


It turns out I will be spending my summer coding for the Pharo Consortium this year, which is pretty good news !

I will work on improving the VMProfiler ; here is a small abstract of my proposal :

Improving the VM Profiler

The current VM profiler is a sampling profiler cadenced at 1.3GHz tracking down where the time is spent in the C code of the VM (for the interpreter and the GC) and in the machine code zone (for the code generated by the JIT). The VM profiler cannot track down precisely where the time is spent when executing the code generated by the JIT. It can track down in which methods the time is spent, but it cannot track down in which part of those methods the time is spent.

Goals

  • Port the VM Profiler to Pharo (it only works in Squeak for the moment)
By making sure the UI stays independent and by patching all the uses of the library OSProcess to make it work in both Squeak and Pharo
  • Get a more detailed profiling
By using the JIT API to map machine code instruction pointer to bytecode program counter, to be able in the end to show in which bytecode range of methods the time is spent instead of only in which methods the time is spent.

Expect to find here some detailed posts related to profiling and the things I will work on!