Difference between revisions of "Profiling"

m
(4 intermediate revisions by one other user not shown)
Line 3: Line 3:
 
For those who don't know what is a code profiler, it is a tool that measure effectiveness of the code and outputs statistics about the CPU usage: number of times each function was called, how much time it took to execute, and where was it called from.  
 
For those who don't know what is a code profiler, it is a tool that measure effectiveness of the code and outputs statistics about the CPU usage: number of times each function was called, how much time it took to execute, and where was it called from.  
  
Gideros rpvoides two means of enabling profiling:
+
Gideros provides two means of enabling profiling:
- By launching a profiling session directly through 'Profile' menu item or toolbar icon from Gideros Studio
+
* By launching a profiling session directly through 'Profile' menu item or toolbar icon from Gideros Studio
- By calling Core.profiler* API from lua
+
* By calling Core.profiler* API from lua
  
 
In the former case, Gideros Studio will display the profiling results when the stop button is hit in Gideros Studio toolbar. In the latter, you have to handle profiling data collection and display by yourself.
 
In the former case, Gideros Studio will display the profiling results when the stop button is hit in Gideros Studio toolbar. In the latter, you have to handle profiling data collection and display by yourself.
Note that even if you started profiling through Gideros Studio, you can still use Core.profilier* API within your code.
+
 
 +
'''Note that even if you started profiling through Gideros Studio, you can still use Core.profilier* API within your code.'''
 
Gideros profiling API available from lua is:
 
Gideros profiling API available from lua is:
- Core.profilerStart(), which instructs gideros start collecting profiling data
+
* Core.profilerStart(), which instructs gideros start collecting profiling data
- Core.profilerStop(), which pause collecting
+
* Core.profilerStop(), which pause collecting
- Core.profilerReset(), which clears collected data
+
* Core.profilerReset(), which clears collected data
- Core.profilerReport(), which returns a table containing collected profiling data
+
* Core.profilerReport(), which returns a table containing collected profiling data
 +
 
  
Enough said about the internals, let's see how profiling can help us!
+
;Enough said about the internals, let's see how profiling can help us!
  
 
So suppose you are running the following code:
 
So suppose you are running the following code:
  
 
<source lang="lua">
 
<source lang="lua">
local function myFunction1(a)
+
local function myFunction(a)
 
local function c(a) return math.cos(a*math.pi/180) end
 
local function c(a) return math.cos(a*math.pi/180) end
 
return c(a)+c(a+1)
 
return c(a)+c(a+1)
Line 28: Line 30:
 
and happen to use it very often. It is far from being optimized (on purpose, right) and turns out to be slow, but how can we do better ?
 
and happen to use it very often. It is far from being optimized (on purpose, right) and turns out to be slow, but how can we do better ?
  
Lets ask the profiler how does it run, say we want to run it 1000000 times:
+
Lets ask the profiler how does it run, say we want to run it 100000 times:
  
 
<source lang="lua">
 
<source lang="lua">
for i=1,1000000 do myFunction(i) end
+
for i=1,100000 do myFunction(i) end
 
</source>
 
</source>
  
 
This code outputs:
 
This code outputs:
 +
<table><tr class="theader">
 +
<th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr>
 +
<tr class="current"><td><a id="f1">[1]</a></td><td>  423</td><td>  0%</td><td>    1</td><td>callFile</td><td>=[C] 00421900(callFile)</td></tr>
 +
<tr class="callee"><td></td><td>  423</td><td>100%</td><td>    1</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
<tr class="caller"><td></td><td>  423</td><td>100%</td><td>    1</td><td><a href="#f1">callFile</a></td><td>=[C] 00421900(callFile)</td></tr>
 +
<tr class="current"><td><a id="f2">[2]</a></td><td>  423</td><td>  4%</td><td>    1</td><td>Unknown</td><td>@main.lua:0:033FAC64</td></tr>
 +
<tr class="callee"><td></td><td>  405</td><td> 96%</td><td>100000</td><td><a href="#f3">myFunction</a></td><td>@main.lua:26:033FD87C</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
<tr class="caller"><td></td><td>  405</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr>
 +
<tr class="current"><td><a id="f3">[3]</a></td><td>  405</td><td> 35%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FD87C</td></tr>
 +
<tr class="callee"><td></td><td>  265</td><td> 65%</td><td>200000</td><td><a href="#f4">c</a></td><td>@main.lua:27:033FC328</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
<tr class="caller"><td></td><td>  265</td><td>100%</td><td>200000</td><td><a href="#f3">myFunction</a></td><td>@main.lua:26:033FD87C</td></tr>
 +
<tr class="current"><td><a id="f4">[4]</a></td><td>  265</td><td> 60%</td><td>200000</td><td>c</td><td>@main.lua:27:033FC328</td></tr>
 +
<tr class="callee"><td></td><td>  107</td><td> 40%</td><td>200000</td><td><a href="#f5">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
<tr class="caller"><td></td><td>  107</td><td>100%</td><td>200000</td><td><a href="#f4">c</a></td><td>@main.lua:27:033FC328</td></tr>
 +
<tr class="current"><td><a id="f5">[5]</a></td><td>  107</td><td>100%</td><td>200000</td><td>cos</td><td>=[C] 6E416FE0(cos)</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
<tr class="current"><td><a id="f6">[6]</a></td><td>    2</td><td>100%</td><td>  313</td><td>gideros_enterFrame</td><td>=[C] 00423050(gideros_enterFrame)</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
<tr class="current"><td><a id="f7">[7]</a></td><td>    0</td><td>100%</td><td>    1</td><td>broadcastEvent</td><td>=[C] 00423340(broadcastEvent)</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
</table>
  
< [ 6]   3768 100% 1000000 <a href="/profile/main">@main</a>.lua:22
+
This is a list of all functions ran by the code sorted by CPU usage (highest first). The first chunk is about a function called ''callFile''. It is an internal Gideros Function which loads each lua file. In our case it was used to launch main.lua, which corresponds to block [2]. But let's focus on our function ''myFunction'', we can see it is covered by block [3]:
= [  1]   3768 36% 1000000 myFunction <a href="/profile/main">@main</a>.lua:1
+
<table><tr class="theader">
> [  2]   2424 64% 2000000 c <a href="/profile/main">@main</a>.lua:2
+
<th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr>
--------------------------------
+
<tr class="caller"><td></td><td405</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr>
< [  1]   2424 100% 2000000 myFunction <a href="/profile/main">@main</a>.lua:1
+
<tr class="current"><td><a id="f3">[3]</a></td><td>   405</td><td> 35%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FD87C</td></tr>
= [  2]   2424 63% 2000000 c <a href="/profile/main">@main</a>.lua:2
+
<tr class="callee"><td></td><td265</td><td> 65%</td><td>200000</td><td><a href="#f4">c</a></td><td>@main.lua:27:033FC328</td></tr>
> [  3]   892 37% 2000000 cos =[C] 6E416930
+
</table>
--------------------------------
+
So the function ''myFunction'' took, as expected, most of the processing time. Figures tell us that it was called 100000 times and consumed 405ms on CPU. We also learn that it was called from an unnamed function located at main.lua:0, which is the toplevel lua code of main.lua file, and that all the time (100%) spent in myFunction was due to being called by main.lua.
< [  2]   892 100% 2000000 c <a href="/profile/main">@main</a>.lua:2
+
More interesting, we discover that 65% of the time (265ms) was spent in calls to function ''c'' located at main.lua:27, and that c was called 200000 times, twice per call.
= [  3]   892 100% 2000000 cos =[C] 6E416930
 
--------------------------------
 
= [  4]     0 -1%     0 =[C] 00456FA0
 
--------------------------------
 
= [  5]     0 -1%     0 stop <a href="/profile/profiler">@profiler</a>.lua:7
 
--------------------------------
 
= [  6]     0 -1%     0 <a href="/profile/main">@main</a>.lua:22
 
> [  1]   3768   1% 1000000 myFunction <a href="/profile/main">@main</a>.lua:1
 
--------------------------------
 
= [  7]     0 -1%     0 <a href="/profile/profiler">@profiler</a>.lua:2
 
--------------------------------
 
= [  8]     0 -1%     0 profilerStop =[C] 004536A0
 
--------------------------------
 
 
 
This is a list of all functions ran by the code sorted by CPU usage (highest first). The first chunk is:
 
 
 
< [  6]   3768 100% 1000000 <a href="/profile/main">@main</a>.lua:22 (This is the function that called this one)
 
= [  1]   3768 36% 1000000 myFunction <a href="/profile/main">@main</a>.lua:1 (This is the considered function)
 
> [  2]   2424 64% 2000000 c <a href="/profile/main">@main</a>.lua:2 (This is the function that was called by this function)
 
 
 
So the function 'myFunction' took most of the time. Figures tell us that it was called 1000000 times and consumed 3768ms on CPU. We also learn that it was called from function [6], located at main.lua:22, and that all the time (100%) spent in myFunction was due to being called by main.lua:22.
 
More interesting, we discover that 64% of the time (2424ms) was spent in calls to function [2], named 'c' and located at main.lua:2, and that c was called 2000000 times (twice per call).
 
  
The second chunk, about function [2], that is c@main.lua:2 shows us that 37% of the time was spent in math.cos(), and 63% in the function itself.
+
The next chunk is about function ''c'': 40% of the time of ''c'' was spent in math.cos(), and 60% in the function itself.
  
We now that calling a function is expensive, and that 'c' function is rather simple. Lets try to inline it:
+
We now that calling a function is expensive, and that ''c'' function is rather simple. Lets try to inline it:
  
 
<source lang="lua">
 
<source lang="lua">
local function myFunction2(a)
+
local function myFunction(a)
 
return math.cos(a*math.pi/180)+math.cos((a+1)*math.pi/180)
 
return math.cos(a*math.pi/180)+math.cos((a+1)*math.pi/180)
 
end
 
end
Line 79: Line 84:
  
 
Now profiler report looks like this: (truncated)
 
Now profiler report looks like this: (truncated)
 +
<table><tr class="theader">
 +
<th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr>
 +
<tr class="caller"><td></td><td>    65</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FC4FC</td></tr>
 +
<tr class="current"><td><a id="f3">[3]</a></td><td>    65</td><td> 73%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FB670</td></tr>
 +
<tr class="callee"><td></td><td>    17</td><td> 27%</td><td>200000</td><td><a href="#f4">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr>
 +
<tr class="sep"><td colspan=6>&nbsp;</td></tr>
 +
<tr class="caller"><td></td><td>    17</td><td>100%</td><td>200000</td><td><a href="#f3">myFunction</a></td><td>@main.lua:26:033FB670</td></tr>
 +
<tr class="current"><td><a id="f4">[4]</a></td><td>    17</td><td>100%</td><td>200000</td><td>cos</td><td>=[C] 6E416FE0(cos)</td></tr>
 +
</table>
  
< [  5]   912 100% 1000000 <a href="/profile/main">@main</a>.lua:22
+
''myFunction'' is already faster: 65ms now where we had 405ms before.
= [  1]   912 76% 1000000 myFunction <a href="/profile/main">@main</a>.lua:6
 
> [  2]   215 24% 2000000 cos =[C] 6E416930
 
--------------------------------
 
< [  1]   215 100% 2000000 myFunction <a href="/profile/main">@main</a>.lua:6
 
= [  2]   215 100% 2000000 cos =[C] 6E416930
 
 
 
myFunction is already faster: 912ms now where we had 3768ms before.
 
 
We still call math.cos two times. Lets make it (and math.pi) local.
 
We still call math.cos two times. Lets make it (and math.pi) local.
  
 
<source lang="lua">
 
<source lang="lua">
local function myFunction3(a)
+
local function myFunction(a)
 
local cos=math.cos
 
local cos=math.cos
 
local pi=math.pi
 
local pi=math.pi
Line 100: Line 107:
 
and profiler report:
 
and profiler report:
  
< [  6]   798 100% 1000000 <a href="/profile/main">@main</a>.lua:22
+
<table><tr class="theader">
= [ 1]   798 73% 1000000 myFunction <a href="/profile/main">@main</a>.lua:10
+
<th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr>
> [  2]   216 27% 2000000 cos =[C] 6E416930
+
<tr class="caller"><td></td><td>   60</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr>
 +
<tr class="current"><td><a id="f3">[3]</a></td><td>   60</td><td> 71%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FD1E8</td></tr>
 +
<tr class="callee"><td></td><td>    17</td><td> 29%</td><td>200000</td><td><a href="#f4">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr>
 +
</table>
  
We saved time in myFunction, while cos() took the same time.
+
We saved time in ''myFunction'', while ''cos()'' took the same time.
  
 
What if we use new gideros deg to rad operator instead of math.pi and 180 constant ?
 
What if we use new gideros deg to rad operator instead of math.pi and 180 constant ?
  
 
<source lang="lua">
 
<source lang="lua">
local function myFunction4(a)
+
local function myFunction(a)
 
local cos=math.cos
 
local cos=math.cos
 
return cos(^<a)+cos(^<(a+1))
 
return cos(^<a)+cos(^<(a+1))
Line 117: Line 127:
 
and profiler report:
 
and profiler report:
  
< [  5]   741 100% 1000000 <a href="/profile/main">@main</a>.lua:22
+
<table><tr class="theader">
= [ 1]   741 71% 1000000 myFunction <a href="/profile/main">@main</a>.lua:16
+
<th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr>
> [  2]   216 29% 2000000 cos =[C] 6E416930
+
<tr class="caller"><td></td><td>   57</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FC4FC</td></tr>
 +
<tr class="current"><td><a id="f3">[3]</a></td><td>   57</td><td> 69%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FB9DC</td></tr>
 +
<tr class="callee"><td></td><td>    17</td><td> 31%</td><td>200000</td><td><a href="#f4">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr>
 +
</table>
  
 
We saved a few more precious milliseconds!
 
We saved a few more precious milliseconds!
  
 
As you can see, the profiler is an useful tool to spot CPU bottlenecks in your app, and some times a few minor changes in the code can make big differences: avoid table access, avoid function calls, avoid duplicate computations.
 
As you can see, the profiler is an useful tool to spot CPU bottlenecks in your app, and some times a few minor changes in the code can make big differences: avoid table access, avoid function calls, avoid duplicate computations.
 +
 +
 +
'''PREV.''': [[Event_system]]<br/>

Revision as of 11:25, 7 January 2020

Gideros comes with an integrated native lua profiler.

For those who don't know what is a code profiler, it is a tool that measure effectiveness of the code and outputs statistics about the CPU usage: number of times each function was called, how much time it took to execute, and where was it called from.

Gideros provides two means of enabling profiling:

  • By launching a profiling session directly through 'Profile' menu item or toolbar icon from Gideros Studio
  • By calling Core.profiler* API from lua

In the former case, Gideros Studio will display the profiling results when the stop button is hit in Gideros Studio toolbar. In the latter, you have to handle profiling data collection and display by yourself.

Note that even if you started profiling through Gideros Studio, you can still use Core.profilier* API within your code. Gideros profiling API available from lua is:

  • Core.profilerStart(), which instructs gideros start collecting profiling data
  • Core.profilerStop(), which pause collecting
  • Core.profilerReset(), which clears collected data
  • Core.profilerReport(), which returns a table containing collected profiling data


Enough said about the internals, let's see how profiling can help us!

So suppose you are running the following code:

local function myFunction(a)
	local function c(a) return math.cos(a*math.pi/180) end
	return c(a)+c(a+1)
end

and happen to use it very often. It is far from being optimized (on purpose, right) and turns out to be slow, but how can we do better ?

Lets ask the profiler how does it run, say we want to run it 100000 times:

for i=1,100000 do myFunction(i) end

This code outputs:

#TimeRatioCountFunctionLocation
<a id="f1">[1]</a> 423 0% 1callFile=[C] 00421900(callFile)
423100% 1<a href="#f2">Unknown</a>@main.lua:0:033FAC64
 
423100% 1<a href="#f1">callFile</a>=[C] 00421900(callFile)
<a id="f2">[2]</a> 423 4% 1Unknown@main.lua:0:033FAC64
405 96%100000<a href="#f3">myFunction</a>@main.lua:26:033FD87C
 
405100%100000<a href="#f2">Unknown</a>@main.lua:0:033FAC64
<a id="f3">[3]</a> 405 35%100000myFunction@main.lua:26:033FD87C
265 65%200000<a href="#f4">c</a>@main.lua:27:033FC328
 
265100%200000<a href="#f3">myFunction</a>@main.lua:26:033FD87C
<a id="f4">[4]</a> 265 60%200000c@main.lua:27:033FC328
107 40%200000<a href="#f5">cos</a>=[C] 6E416FE0(cos)
 
107100%200000<a href="#f4">c</a>@main.lua:27:033FC328
<a id="f5">[5]</a> 107100%200000cos=[C] 6E416FE0(cos)
 
<a id="f6">[6]</a> 2100% 313gideros_enterFrame=[C] 00423050(gideros_enterFrame)
 
<a id="f7">[7]</a> 0100% 1broadcastEvent=[C] 00423340(broadcastEvent)
 

This is a list of all functions ran by the code sorted by CPU usage (highest first). The first chunk is about a function called callFile. It is an internal Gideros Function which loads each lua file. In our case it was used to launch main.lua, which corresponds to block [2]. But let's focus on our function myFunction, we can see it is covered by block [3]:

#TimeRatioCountFunctionLocation
405100%100000<a href="#f2">Unknown</a>@main.lua:0:033FAC64
<a id="f3">[3]</a> 405 35%100000myFunction@main.lua:26:033FD87C
265 65%200000<a href="#f4">c</a>@main.lua:27:033FC328

So the function myFunction took, as expected, most of the processing time. Figures tell us that it was called 100000 times and consumed 405ms on CPU. We also learn that it was called from an unnamed function located at main.lua:0, which is the toplevel lua code of main.lua file, and that all the time (100%) spent in myFunction was due to being called by main.lua. More interesting, we discover that 65% of the time (265ms) was spent in calls to function c located at main.lua:27, and that c was called 200000 times, twice per call.

The next chunk is about function c: 40% of the time of c was spent in math.cos(), and 60% in the function itself.

We now that calling a function is expensive, and that c function is rather simple. Lets try to inline it:

local function myFunction(a)
	return math.cos(a*math.pi/180)+math.cos((a+1)*math.pi/180)
end

Now profiler report looks like this: (truncated)

#TimeRatioCountFunctionLocation
65100%100000<a href="#f2">Unknown</a>@main.lua:0:033FC4FC
<a id="f3">[3]</a> 65 73%100000myFunction@main.lua:26:033FB670
17 27%200000<a href="#f4">cos</a>=[C] 6E416FE0(cos)
 
17100%200000<a href="#f3">myFunction</a>@main.lua:26:033FB670
<a id="f4">[4]</a> 17100%200000cos=[C] 6E416FE0(cos)

myFunction is already faster: 65ms now where we had 405ms before. We still call math.cos two times. Lets make it (and math.pi) local.

local function myFunction(a)
	local cos=math.cos
	local pi=math.pi
	return cos(a*pi/180)+cos((a+1)*pi/180)
end

and profiler report:

#TimeRatioCountFunctionLocation
60100%100000<a href="#f2">Unknown</a>@main.lua:0:033FAC64
<a id="f3">[3]</a> 60 71%100000myFunction@main.lua:26:033FD1E8
17 29%200000<a href="#f4">cos</a>=[C] 6E416FE0(cos)

We saved time in myFunction, while cos() took the same time.

What if we use new gideros deg to rad operator instead of math.pi and 180 constant ?

local function myFunction(a)
	local cos=math.cos
	return cos(^<a)+cos(^<(a+1))
end

and profiler report:

#TimeRatioCountFunctionLocation
57100%100000<a href="#f2">Unknown</a>@main.lua:0:033FC4FC
<a id="f3">[3]</a> 57 69%100000myFunction@main.lua:26:033FB9DC
17 31%200000<a href="#f4">cos</a>=[C] 6E416FE0(cos)

We saved a few more precious milliseconds!

As you can see, the profiler is an useful tool to spot CPU bottlenecks in your app, and some times a few minor changes in the code can make big differences: avoid table access, avoid function calls, avoid duplicate computations.


PREV.: Event_system