Difference between revisions of "Profiling"
(update?) |
m (formatting) |
||
Line 1: | Line 1: | ||
− | Gideros | + | The Ultimate Guide to Gideros Studio |
− | For those who don't know what is a code profiler, it is a tool that | + | __TOC__ |
+ | |||
+ | == Profiling == | ||
+ | Gideros comes with an integrated native Lua profiler. | ||
+ | |||
+ | For those who don't know what is a code profiler, it is a tool that measures code effectiveness and outputs statistics about the CPU usage: number of times each function was called, how much time it took to execute, where it was called from. | ||
Gideros provides two means of enabling profiling: | Gideros provides two means of enabling profiling: | ||
− | * | + | *by launching a profiling session directly through 'Profile' menu item or toolbar icon from Gideros Studio |
− | * | + | *by calling Core.profiler* API from Lua |
In the former case, Gideros Studio will display the profiling results when the stop button is hit in Gideros Studio toolbar. In the latter, you have to handle profiling data collection and display by yourself. | In the former case, Gideros Studio will display the profiling results when the stop button is hit in Gideros Studio toolbar. In the latter, you have to handle profiling data collection and display by yourself. | ||
− | '''Note that even if you started profiling through Gideros Studio, you can still use Core. | + | '''Note that even if you started profiling through Gideros Studio, you can still use Core.profiler* API within your code.''' |
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | Available Gideros profiling API: | |
+ | *Core.profilerStart(), which instructs Gideros to start collecting profiling data | ||
+ | *Core.profilerStop(), which pause collecting | ||
+ | *Core.profilerReset(), which clears collected data | ||
+ | *Core.profilerReport(), which returns a table containing collected profiling data | ||
− | + | == Example == | |
+ | Let's see how profiling can help us, suppose you are running the following code: | ||
<source lang="lua"> | <source lang="lua"> | ||
local function myFunction(a) | local function myFunction(a) | ||
Line 26: | Line 31: | ||
</source> | </source> | ||
− | and happen to use it very often. It is far from being optimized (on purpose, right) and turns out to be slow | + | and happen to use it very often. It is far from being optimized (on purpose, right) and turns out to be slow. How can we do better? |
− | + | Let's ask the profiler how does the code perform, say we want to run it 100,000 times: | |
<source lang="lua"> | <source lang="lua"> | ||
for i=1,100000 do myFunction(i) end | for i=1,100000 do myFunction(i) end | ||
Line 34: | Line 39: | ||
This code outputs: | This code outputs: | ||
− | <table><tr class="theader"> | + | <table> |
− | <th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr> | + | <tr class="theader"> |
+ | <th class="fnum">#</th> | ||
+ | <th class="ftime">Time</th> | ||
+ | <th class="fpct">Ratio</th> | ||
+ | <th class="fcount">Count</th> | ||
+ | <th class="fname">Function</th> | ||
+ | <th class="floc">Location</th> | ||
+ | </tr> | ||
<tr class="current"><td><a id="f1">[1]</a></td><td> 423</td><td> 0%</td><td> 1</td><td>callFile</td><td>=[C] 00421900(callFile)</td></tr> | <tr class="current"><td><a id="f1">[1]</a></td><td> 423</td><td> 0%</td><td> 1</td><td>callFile</td><td>=[C] 00421900(callFile)</td></tr> | ||
<tr class="callee"><td></td><td> 423</td><td>100%</td><td> 1</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr> | <tr class="callee"><td></td><td> 423</td><td>100%</td><td> 1</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr> | ||
Line 60: | Line 72: | ||
</table> | </table> | ||
− | This is a list of all functions ran by the code sorted by CPU usage (highest first). The first chunk is about a function called ''callFile''. It is an internal Gideros Function which loads each lua file. In our case it was used to launch main.lua, which corresponds to block [2]. | + | |
− | <table><tr class="theader"> | + | This is a list of all functions ran by the code sorted by CPU usage (highest first). The first chunk is about a function called ''callFile''. It is an internal Gideros Function which loads each lua file. In our case it was used to launch main.lua, which corresponds to block [2]. |
− | <th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr> | + | |
+ | Let's focus on our function ''myFunction'', we can see it is covered by block [3]: | ||
+ | <table> | ||
+ | <tr class="theader"> | ||
+ | <th class="fnum">#</th> | ||
+ | <th class="ftime">Time</th> | ||
+ | <th class="fpct">Ratio</th> | ||
+ | <th class="fcount">Count</th> | ||
+ | <th class="fname">Function</th> | ||
+ | <th class="floc">Location</th> | ||
+ | </tr> | ||
<tr class="caller"><td></td><td> 405</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr> | <tr class="caller"><td></td><td> 405</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr> | ||
<tr class="current"><td><a id="f3">[3]</a></td><td> 405</td><td> 35%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FD87C</td></tr> | <tr class="current"><td><a id="f3">[3]</a></td><td> 405</td><td> 35%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FD87C</td></tr> | ||
<tr class="callee"><td></td><td> 265</td><td> 65%</td><td>200000</td><td><a href="#f4">c</a></td><td>@main.lua:27:033FC328</td></tr> | <tr class="callee"><td></td><td> 265</td><td> 65%</td><td>200000</td><td><a href="#f4">c</a></td><td>@main.lua:27:033FC328</td></tr> | ||
</table> | </table> | ||
− | So the function ''myFunction'' took, as expected, most of the processing time. Figures tell us that it was called | + | |
− | More | + | |
+ | So the function ''myFunction'' took, as expected, most of the processing time. Figures tell us that it was called 100,000 times and consumed 405ms on CPU. We also learn that it was called from an unnamed function located at main.lua:0, which is the toplevel lua code of main.lua file, and that all the time (100%) spent in myFunction was due to being called by main.lua. | ||
+ | |||
+ | More interestingly, we discover that 65% of the time (265ms) was spent in calls to function ''c'' located at main.lua:27, and that c was called 200,000 times, twice per call. | ||
The next chunk is about function ''c'': 40% of the time of ''c'' was spent in math.cos(), and 60% in the function itself. | The next chunk is about function ''c'': 40% of the time of ''c'' was spent in math.cos(), and 60% in the function itself. | ||
Line 80: | Line 105: | ||
Now profiler report looks like this: (truncated) | Now profiler report looks like this: (truncated) | ||
− | <table><tr class="theader"> | + | <table> |
− | <th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr> | + | <tr class="theader"> |
+ | <th class="fnum">#</th> | ||
+ | <th class="ftime">Time</th> | ||
+ | <th class="fpct">Ratio</th> | ||
+ | <th class="fcount">Count</th> | ||
+ | <th class="fname">Function</th> | ||
+ | <th class="floc">Location</th> | ||
+ | </tr> | ||
<tr class="caller"><td></td><td> 65</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FC4FC</td></tr> | <tr class="caller"><td></td><td> 65</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FC4FC</td></tr> | ||
<tr class="current"><td><a id="f3">[3]</a></td><td> 65</td><td> 73%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FB670</td></tr> | <tr class="current"><td><a id="f3">[3]</a></td><td> 65</td><td> 73%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FB670</td></tr> | ||
Line 89: | Line 121: | ||
<tr class="current"><td><a id="f4">[4]</a></td><td> 17</td><td>100%</td><td>200000</td><td>cos</td><td>=[C] 6E416FE0(cos)</td></tr> | <tr class="current"><td><a id="f4">[4]</a></td><td> 17</td><td>100%</td><td>200000</td><td>cos</td><td>=[C] 6E416FE0(cos)</td></tr> | ||
</table> | </table> | ||
+ | |||
''myFunction'' is already faster: 65ms now where we had 405ms before. | ''myFunction'' is already faster: 65ms now where we had 405ms before. | ||
+ | |||
We still call math.cos two times. Lets make it (and math.pi) local. | We still call math.cos two times. Lets make it (and math.pi) local. | ||
<source lang="lua"> | <source lang="lua"> | ||
Line 100: | Line 134: | ||
</source> | </source> | ||
− | + | Now the profiler report: | |
− | <table><tr class="theader"> | + | <table> |
− | <th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr> | + | <tr class="theader"> |
+ | <th class="fnum">#</th> | ||
+ | <th class="ftime">Time</th> | ||
+ | <th class="fpct">Ratio</th> | ||
+ | <th class="fcount">Count</th> | ||
+ | <th class="fname">Function</th> | ||
+ | <th class="floc">Location</th> | ||
+ | </tr> | ||
<tr class="caller"><td></td><td> 60</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr> | <tr class="caller"><td></td><td> 60</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FAC64</td></tr> | ||
<tr class="current"><td><a id="f3">[3]</a></td><td> 60</td><td> 71%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FD1E8</td></tr> | <tr class="current"><td><a id="f3">[3]</a></td><td> 60</td><td> 71%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FD1E8</td></tr> | ||
<tr class="callee"><td></td><td> 17</td><td> 29%</td><td>200000</td><td><a href="#f4">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr> | <tr class="callee"><td></td><td> 17</td><td> 29%</td><td>200000</td><td><a href="#f4">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr> | ||
</table> | </table> | ||
+ | |||
We saved time in ''myFunction'', while ''cos()'' took the same time. | We saved time in ''myFunction'', while ''cos()'' took the same time. | ||
− | What if we use | + | What if we use Gideros deg to rad operator instead of math.pi and 180 constant? |
<source lang="lua"> | <source lang="lua"> | ||
local function myFunction(a) | local function myFunction(a) | ||
Line 118: | Line 160: | ||
</source> | </source> | ||
− | + | Profiler report: | |
− | <table><tr class="theader"> | + | <table> |
− | <th class="fnum">#</th><th class="ftime">Time</th><th class="fpct">Ratio</th><th class="fcount">Count</th><th class="fname">Function</th><th class="floc">Location</th></tr> | + | <tr class="theader"> |
+ | <th class="fnum">#</th> | ||
+ | <th class="ftime">Time</th> | ||
+ | <th class="fpct">Ratio</th> | ||
+ | <th class="fcount">Count</th> | ||
+ | <th class="fname">Function</th> | ||
+ | <th class="floc">Location</th> | ||
+ | </tr> | ||
<tr class="caller"><td></td><td> 57</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FC4FC</td></tr> | <tr class="caller"><td></td><td> 57</td><td>100%</td><td>100000</td><td><a href="#f2">Unknown</a></td><td>@main.lua:0:033FC4FC</td></tr> | ||
<tr class="current"><td><a id="f3">[3]</a></td><td> 57</td><td> 69%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FB9DC</td></tr> | <tr class="current"><td><a id="f3">[3]</a></td><td> 57</td><td> 69%</td><td>100000</td><td>myFunction</td><td>@main.lua:26:033FB9DC</td></tr> | ||
<tr class="callee"><td></td><td> 17</td><td> 31%</td><td>200000</td><td><a href="#f4">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr> | <tr class="callee"><td></td><td> 17</td><td> 31%</td><td>200000</td><td><a href="#f4">cos</a></td><td>=[C] 6E416FE0(cos)</td></tr> | ||
</table> | </table> | ||
+ | |||
We saved a few more precious milliseconds! | We saved a few more precious milliseconds! | ||
− | As you can see, the profiler is | + | == Conclusion == |
+ | As you can see, the profiler is a useful tool to spot CPU bottlenecks in your app. | ||
+ | |||
+ | Sometimes a few minor changes in the code can make a big difference. In general, avoid table access, avoid function calls, avoid duplicate computations. | ||
'''PREV.''': [[File_system]]<br/> | '''PREV.''': [[File_system]]<br/> | ||
+ | '''END''' |
Revision as of 18:41, 10 May 2020
The Ultimate Guide to Gideros Studio
Profiling
Gideros comes with an integrated native Lua profiler.
For those who don't know what is a code profiler, it is a tool that measures code effectiveness and outputs statistics about the CPU usage: number of times each function was called, how much time it took to execute, where it was called from.
Gideros provides two means of enabling profiling:
- by launching a profiling session directly through 'Profile' menu item or toolbar icon from Gideros Studio
- by calling Core.profiler* API from Lua
In the former case, Gideros Studio will display the profiling results when the stop button is hit in Gideros Studio toolbar. In the latter, you have to handle profiling data collection and display by yourself.
Note that even if you started profiling through Gideros Studio, you can still use Core.profiler* API within your code.
Available Gideros profiling API:
- Core.profilerStart(), which instructs Gideros to start collecting profiling data
- Core.profilerStop(), which pause collecting
- Core.profilerReset(), which clears collected data
- Core.profilerReport(), which returns a table containing collected profiling data
Example
Let's see how profiling can help us, suppose you are running the following code:
local function myFunction(a)
local function c(a) return math.cos(a*math.pi/180) end
return c(a)+c(a+1)
end
and happen to use it very often. It is far from being optimized (on purpose, right) and turns out to be slow. How can we do better?
Let's ask the profiler how does the code perform, say we want to run it 100,000 times:
for i=1,100000 do myFunction(i) end
This code outputs:
# | Time | Ratio | Count | Function | Location |
---|---|---|---|---|---|
<a id="f1">[1]</a> | 423 | 0% | 1 | callFile | =[C] 00421900(callFile) |
423 | 100% | 1 | <a href="#f2">Unknown</a> | @main.lua:0:033FAC64 | |
423 | 100% | 1 | <a href="#f1">callFile</a> | =[C] 00421900(callFile) | |
<a id="f2">[2]</a> | 423 | 4% | 1 | Unknown | @main.lua:0:033FAC64 |
405 | 96% | 100000 | <a href="#f3">myFunction</a> | @main.lua:26:033FD87C | |
405 | 100% | 100000 | <a href="#f2">Unknown</a> | @main.lua:0:033FAC64 | |
<a id="f3">[3]</a> | 405 | 35% | 100000 | myFunction | @main.lua:26:033FD87C |
265 | 65% | 200000 | <a href="#f4">c</a> | @main.lua:27:033FC328 | |
265 | 100% | 200000 | <a href="#f3">myFunction</a> | @main.lua:26:033FD87C | |
<a id="f4">[4]</a> | 265 | 60% | 200000 | c | @main.lua:27:033FC328 |
107 | 40% | 200000 | <a href="#f5">cos</a> | =[C] 6E416FE0(cos) | |
107 | 100% | 200000 | <a href="#f4">c</a> | @main.lua:27:033FC328 | |
<a id="f5">[5]</a> | 107 | 100% | 200000 | cos | =[C] 6E416FE0(cos) |
<a id="f6">[6]</a> | 2 | 100% | 313 | gideros_enterFrame | =[C] 00423050(gideros_enterFrame) |
<a id="f7">[7]</a> | 0 | 100% | 1 | broadcastEvent | =[C] 00423340(broadcastEvent) |
This is a list of all functions ran by the code sorted by CPU usage (highest first). The first chunk is about a function called callFile. It is an internal Gideros Function which loads each lua file. In our case it was used to launch main.lua, which corresponds to block [2].
Let's focus on our function myFunction, we can see it is covered by block [3]:
# | Time | Ratio | Count | Function | Location |
---|---|---|---|---|---|
405 | 100% | 100000 | <a href="#f2">Unknown</a> | @main.lua:0:033FAC64 | |
<a id="f3">[3]</a> | 405 | 35% | 100000 | myFunction | @main.lua:26:033FD87C |
265 | 65% | 200000 | <a href="#f4">c</a> | @main.lua:27:033FC328 |
So the function myFunction took, as expected, most of the processing time. Figures tell us that it was called 100,000 times and consumed 405ms on CPU. We also learn that it was called from an unnamed function located at main.lua:0, which is the toplevel lua code of main.lua file, and that all the time (100%) spent in myFunction was due to being called by main.lua.
More interestingly, we discover that 65% of the time (265ms) was spent in calls to function c located at main.lua:27, and that c was called 200,000 times, twice per call.
The next chunk is about function c: 40% of the time of c was spent in math.cos(), and 60% in the function itself.
We know that calling a function is expensive, and that c function is rather simple. Lets try to inline it:
local function myFunction(a)
return math.cos(a*math.pi/180)+math.cos((a+1)*math.pi/180)
end
Now profiler report looks like this: (truncated)
# | Time | Ratio | Count | Function | Location |
---|---|---|---|---|---|
65 | 100% | 100000 | <a href="#f2">Unknown</a> | @main.lua:0:033FC4FC | |
<a id="f3">[3]</a> | 65 | 73% | 100000 | myFunction | @main.lua:26:033FB670 |
17 | 27% | 200000 | <a href="#f4">cos</a> | =[C] 6E416FE0(cos) | |
17 | 100% | 200000 | <a href="#f3">myFunction</a> | @main.lua:26:033FB670 | |
<a id="f4">[4]</a> | 17 | 100% | 200000 | cos | =[C] 6E416FE0(cos) |
myFunction is already faster: 65ms now where we had 405ms before.
We still call math.cos two times. Lets make it (and math.pi) local.
local function myFunction(a)
local cos=math.cos
local pi=math.pi
return cos(a*pi/180)+cos((a+1)*pi/180)
end
Now the profiler report:
# | Time | Ratio | Count | Function | Location |
---|---|---|---|---|---|
60 | 100% | 100000 | <a href="#f2">Unknown</a> | @main.lua:0:033FAC64 | |
<a id="f3">[3]</a> | 60 | 71% | 100000 | myFunction | @main.lua:26:033FD1E8 |
17 | 29% | 200000 | <a href="#f4">cos</a> | =[C] 6E416FE0(cos) |
We saved time in myFunction, while cos() took the same time.
What if we use Gideros deg to rad operator instead of math.pi and 180 constant?
local function myFunction(a)
local cos=math.cos
return cos(^<a)+cos(^<(a+1))
end
Profiler report:
# | Time | Ratio | Count | Function | Location |
---|---|---|---|---|---|
57 | 100% | 100000 | <a href="#f2">Unknown</a> | @main.lua:0:033FC4FC | |
<a id="f3">[3]</a> | 57 | 69% | 100000 | myFunction | @main.lua:26:033FB9DC |
17 | 31% | 200000 | <a href="#f4">cos</a> | =[C] 6E416FE0(cos) |
We saved a few more precious milliseconds!
Conclusion
As you can see, the profiler is a useful tool to spot CPU bottlenecks in your app.
Sometimes a few minor changes in the code can make a big difference. In general, avoid table access, avoid function calls, avoid duplicate computations.
PREV.: File_system
END