Finding out why your Linux computer performs the way it does has been a hard task. Sure, there is Oprofile, and even ‘perf’ in recent kernels. There is LatencyTOP to find out where latencies happen.
But all of these tools are rather limited when the software stack that has the performance issue is more complex than a single program. The tool that comes closest to being useful is ‘bootchart’, but that has a rather limited resolution.
To solve this, I have been working on a new tool, called Timechart, based on ‘perf’, that aims to show on a system level what is going on, at various levels of detail. In fact, one of the design ideas behind timechart is that the output should be “infinitely zoomable”; that is, if you want to know more details about something, you should be able to zoom in to get these details.
The rest of this blog post describes some aspects of timechart, using real life examples and screenshots. However, it is really hard to show the power of timechart on such a static page; to get a real feeling of what timechart can show, you really ought to try it out yourself.
Timechart basics
The output of the timechart program looks like this:

At the very top, a very short reminder of what each color means is given.
The second part of the output is the per-cpu information. In the case of this trace, taken on my Penryn based dual core laptop, there are two horizontal bars, one for each core.
The third part of the output is the per-process information, the grey bars in the lower 2/3rd of the image. Each significant process has its own bar.
In the example above, bash was very busy (as indicated by the blue color), and thus also kept one of the CPUs busy at all times (also in blue in the CPU part of the output). Around the 4 second mark, the firefox program had some activity (although it was never really quiet) and the X server later on became active as well.
In the current zoom level it is a bit hard to see, but the yellow pieces in the process bars indicate processes that are waiting for the scheduler to give them a timeslice, while red areas would have indicated that the respective processes are waiting for disk IO.
This view gives us a high level idea of what is going on, but it does not look very detailed. The design philosophy behind timechart is that the output is “infinitely zoomable”. By using the zoom function of your SVG reader, you can zoom into an area of the output that you are interested in. For example, let’s look at the Xorg activity to find out what is going on:

Here you can see Xorg starting out sleeping (gray), but at some point activity happens (as indicated by the little green lines; these green lines show communication). Once the communication happens (X11 draw requests), you can see that Xorg alternates between running code (blue) and waiting for the CPU scheduler to give it a timeslice (yellow). If you look carefully you see a few “2”s in this zoom level already; these denote the cpu number Xorg is executing on.
At this zoom level, there are still activities that are so detailed that they don’t make much sense yet… so of course, we can zoom in even further.
Rather than showing the output of a further zoom, I welcome you to take your own look around using this example output of timechart.
Scheduler delays
By now, everyone who has followed Linux kernel development in the last few weeks has seen the controversy surrounding the CPU scheduler, with the BFS-versus-mainline debate.
The timechart tool visualizes the delays caused by the scheduler using yellow areas in the per process time display. For example, in the image below, you can see that after some scheduler delay, the process runs for a short time, as indicated by the blue area. After a little while, the process is waiting for (disk) IO, likely because it’s reading a file that’s not yet in the page cache. Once the IO is complete (as indicated by the little red dots), the process now ends up waiting for the CPU scheduler (yellow) to execute the program (blue). In the trace below, the program then runs briefly before hitting an IO delay again; in fact there are several “run, wait for IO, wait for scheduler” sequences in the trace below. As it happens, this trace is from the “make” program during a kernel compile.
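To make the “run, wait for IO, wait for scheduler” pattern concrete, here is a minimal Python sketch that summarizes such a sequence for one process. It is an illustration only: the state names, the `EXAMPLE_TRACE` durations and both helper functions are made up for this example and are not part of timechart or perf; only the colors come from the description above.

```python
from collections import defaultdict

# Colors as described in this post; the state names are labels invented
# for this sketch, not identifiers used by timechart itself.
STATE_COLORS = {
    "running": "blue",
    "waiting_for_cpu": "yellow",
    "waiting_for_io": "red",
    "sleeping": "gray",
}

# Hypothetical per-process trace: (state, duration in microseconds).
EXAMPLE_TRACE = [
    ("waiting_for_cpu", 400),
    ("running", 1000),
    ("waiting_for_io", 6000),
    ("waiting_for_cpu", 500),
    ("running", 800),
    ("waiting_for_io", 5000),
    ("waiting_for_cpu", 300),
    ("running", 600),
]


def time_per_state(intervals):
    """Add up how long the process spent in each state."""
    totals = defaultdict(int)
    for state, duration_us in intervals:
        if state not in STATE_COLORS:
            raise ValueError(f"unknown state: {state}")
        totals[state] += duration_us
    return dict(totals)


def count_io_cycles(intervals):
    """Count 'run, wait for IO, wait for scheduler' sequences."""
    states = [state for state, _ in intervals]
    pattern = ["running", "waiting_for_io", "waiting_for_cpu"]
    return sum(
        1 for i in range(len(states) - 2) if states[i:i + 3] == pattern
    )
```

With the made-up trace above, `count_io_cycles(EXAMPLE_TRACE)` finds two such cycles, and `time_per_state(EXAMPLE_TRACE)` shows that the time spent waiting for IO dwarfs the time spent actually running, which is the kind of conclusion the colored bars let you draw by eye.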

Communication cost
One of the “hidden killers” of performance is the cost due to communication latencies. The timechart tool tries to visualize these using thin green lines between processes that communicate.
For example, in the picture below a “cc1” process (top) is communicating with an “as” process (bottom), as part of a kernel compile. The first three green lines show “cc1” communicating with (in this case, sending data to) the “as” process, which, after each communication, needs to wait for the process scheduler (yellow area) before it gets to act on the data. The next three lines show the inverse communication, where “as” wakes up “cc1”, which had been waiting for a short time (as indicated by the short grey area the first and third time; the second time the wait was so short that the grey area is invisible). And again, the yellow areas indicate that “cc1” first has to wait for a time slice before it gets to actually execute (blue).

Another real-world example where I’ve used timechart to find communication latencies is measuring why starting the “hal” program is so slow; it turns out that there are seemingly hundreds of dbus round trips involved in that between various components, and timechart shows this beautifully. (I’ve left the diagram out of this blog post to keep the size reasonable).
Power Management tuning
timechart can also be used as part of the tuning of some of Linux’ power management. Controlling the CPU frequency is perhaps an obscure area of the kernel where only a handful of people are actively working… but as it happens I care about this area and timechart is a very useful tool here. With timechart, we can see when frequency changes happen, what led up to that change and then what the impact of said change is. For example, take the portion of a timechart below:

You can see here that the “claws-mail” program started running on the CPU for some time, when the CPU frequency (indicated with the yellow thick line) was about 1/3rd of the maximum frequency. Pretty soon after that, the cpu frequency gets increased by the kernel to the maximum, and when “claws-mail” is done, the frequency is lowered in two steps. As you can see in the graph, the middle step of this frequency reduction was totally unneeded. This highlights an area of optimization… expect patches for this on the Linux Kernel Mailing list soon.
As with most things in timechart, zooming in will give you additional information:

In the zoomed-in graph above you can see that the actual frequency is displayed inside the yellow lines, and you can also see in more detail how much cpu utilization there actually was that led to the decrease from 2.0 GHz to 1.6 GHz.
Inkscape
I recommend the use of the inkscape program to view the output of timechart. I know there are many SVG readers, but I have found that several of them aren’t capable of coping with the high level of detail that timechart provides.
It almost seems that some SVG capable programs have an O(N^2) algorithm, and with timechart, the N gets very high. Others don’t allow you to zoom in far enough to get all the detail that timechart can provide.
That doesn’t mean I’m entirely happy with inkscape; it looks like version 0.46 has at least some performance problems, and I’d also like to have the maximum zoom level increased so that it’s possible to zoom in even further.
Naming contest
Several people I talked to don’t like the name “timechart”. Unfortunately, nobody has come up with a better name for the tool yet.
If anyone has an idea for a better name, please let me know…
Very pretty – and what’s wrong with ‘timechart’? It’s pretty descriptive.
Are the green comms lines purely showing timing or are you following the connectivity? It’s pretty hard on a desktop system with dbus, and X stuff flying all over to find out who is waiting for who.
Dave
The green lines connect who is talking to whom. The “flying all over” problem is a tough one, so I show a full line connecting 2 tasks only if they are adjacent, otherwise I show two “half” lines on either side.
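The full-line versus half-line rule can be sketched in a few lines of Python. This is not the code timechart uses; it assumes “adjacent” means neighbouring rows in the chart, and the function name, `row_height` and the stub length of half a row are all hypothetical choices for illustration.

```python
def communication_segments(row_from, row_to, row_height=10.0):
    """Return the vertical (y_start, y_end) segments to draw for one event.

    Adjacent rows get a single full line; rows that are further apart get
    two short "half" lines, one at each task, so that long lines do not
    cut across every bar in between.
    """
    if row_from == row_to:
        raise ValueError("a task does not communicate with itself here")

    y_from = row_from * row_height
    y_to = row_to * row_height

    if abs(row_from - row_to) == 1:
        return [(y_from, y_to)]

    # Assumption: each stub is half a row long and points at the peer.
    direction = 1 if row_to > row_from else -1
    stub = row_height / 2
    return [
        (y_from, y_from + direction * stub),
        (y_to - direction * stub, y_to),
    ]
```

For example, `communication_segments(2, 3)` yields one segment spanning both rows, while `communication_segments(1, 5)` yields two short stubs, one leaving row 1 and one arriving at row 5.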
I suggest “Penguinoscopy”.
How about ‘lifeline’?
Hi Arjan,
Looks like another nice tool. Last week I was doing something similar – using an oscilloscope. I have two XScale boards talking over ethernet, and throughput is not as high as I’d like, but finding where the time goes is hard. So we wired up a 4-channel scope to the ethernet data enable signals and a GPIO at each end, and what to blame for the slowness quickly became apparent.
One of the best features is that many events are superimposed on the scope screen, with persistence, and you can see the variation and different behaviour modes. You might benefit from somehow folding your tool’s traces so that multiple events are superimposed, somehow.
Could you briefly explain what hooks you use? I considered connecting up a GPIO so that it’s asserted when the processor is idle, or when it’s running process $x, or when it’s in the kernel, but finding all the right places to add that was beyond me.
Cheers, Phil.
Where do I find the tool?
You’re not kidding about SVG. After letting Illustrator CS2 chew on the file for 15 minutes on my Mac (through Rosetta)… it crashed. Couldn’t open the file at all.
Nice tool, hope it can help solving http://bugzilla.kernel.org/show_bug.cgi?id=12309
where do i find the source?
How about “delatency” ?
Eyal.
[...] van de Ven introduces a new tool, called “timechart” on his weblog. Timechart is meant to help visualize and diagnose [...]
The sources were posted as patches to the linux-kernel mailing list. At this point, there are some kernel infrastructure enhancements that timechart needs, so I can’t just provide an “easy download” kind of link…
Nice work! Looks very useful.
Is it possible to see the same information at thread level instead of process level?
What about “timspect” (for timing inspector) or “laspect” (for latency inspector)?
How does this compare to LTTng, and what are the future plans for timechart compared to LTTng?
How about “ChronoGraph” ?
Dunc.
[...] #timechart – another #profiling tool for #Linux http://blog.fenrus.org/?p=5 [...]
What’s the problem with using LTTng and its Gtk based[1] LTTv viewer?
http://lttng.org/
It gives “infinitely zoomable” view of the whole system and copes fine with large data sets.
[1] Not probably interesting to kernel devs, but there’s also an Eclipse version of it in development which is shown in the first presentation here:
http://ltt.polymtl.ca/tracingwiki/index.php/TracingMiniSummit2009
http://eclipse.org/linuxtools/
I have to agree Timechart is a bad name. At first glance I thought it was a personal time management prog and did not read the post.
Just make it more descriptive / specific. How about CPUtimechart or processtimechart. That would remove the personal time management connotations.
have you tried Xara LX as an svg viewer?
River System seems like a good name as it shows the flow, turbulence, and branches of the system or maybe just River.
Nice tool. Gives enough graphical information about scheduling that I really need. Will definitely try it out tomorrow.
Tnx and keep up the good work.
Opera (very good), Firefox (no SMIL), and KHTML/Webkit-based browsers like Safari and Konqueror can do SVG as well. You might want to add an appropriate Content-Type image/svg+xml to your Apache, see also http://wiki.svg.org/SVGZ.
For broadest compatibility, you might want to specify a DOCTYPE at the start of the file (after the XML declaration, which in this case can have standalone=”yes” and is not really needed).
If you set width and height to 100% and specify the design space with a viewBox attribute, browsers will automatically zoom to fill the window at any size.
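As a rough illustration of the viewBox suggestion, the Python sketch below post-processes an SVG file using only the standard library. It assumes the root `<svg>` element carries plain or `px`-suffixed `width` and `height` attributes; whether timechart’s output looks like that is an assumption, and `make_scalable` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def _pixels(value):
    """Parse a plain or 'px'-suffixed length such as '1000' or '1000px'."""
    return float(value.strip().replace("px", ""))


def make_scalable(svg_text):
    """Return the SVG with a viewBox and width/height set to 100%."""
    # Keep the SVG namespace as the default (unprefixed) one on output.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)

    # Record the original design space before overwriting width/height.
    if root.get("viewBox") is None:
        width, height = root.get("width"), root.get("height")
        if width is None or height is None:
            raise ValueError("need width and height to derive a viewBox")
        root.set("viewBox", f"0 0 {_pixels(width):g} {_pixels(height):g}")

    root.set("width", "100%")
    root.set("height", "100%")
    return ET.tostring(root, encoding="unicode")
```

Reading the file, passing its text through `make_scalable` and writing the result back out should leave the drawing itself untouched; only the root element’s sizing attributes change.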
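[...] Timechart: Zooming in on the operating system. Posted by 雪山の小兔 – Category: LINUX, Software. This site is published under CC; reposting is welcome, please credit: reposted from the Osss.cn open source community. Link to this article: Timechart: Zooming in on the operating system. Intel developer Arjan van de Ven is working on a tool called Timechart, which records the performance of the Linux operating system in detail in the form of charts. Van de Ven, who previously worked on the power-saving tool Powertop, hopes to improve the efficiency of tools such as Oprofile, LatencyTOP and Timechart’s Perf. The new Timechart will provide graphical results and hints, and its analysis covers all processes in the system. Timechart uses the SVG vector format to present graphical results to users, and the developer recommends using Inkscape to view the output. More details can be found on Van de Ven’s blog. So far, there is no version available for download. [...]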
Obviously it should be called “showtime”.
Some of the names I could think of,
Perfchart
PerfDchart (Performance Diagnosis chart)
Systemchart
Monperfchart (Monitor Performance chart)
Perfanachart (Performance analysis chart)
Sysperfchart (System Performance chart)
Sysinfochart (System Info chart)
Sysdiagchart (System Diagnosis chart)
Thnx.
Seems pretty obvious to me. Call it what it is — “timegraph”.
Excellent tool. Now we need a svg shell front end to display a rolling segment of this with zoom buttons. Watching this in realtime would pay off.
Just a quick update: Linus has just merged the timechart patches, so if you run a git snapshot of his tree as of today, just go to the tools/perf directory and type “make”. Then you can do “perf timechart record” and “perf timechart” to get your own svg file….
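For anyone who wants to script those two steps, here is a small Python wrapper. The two perf invocations are the ones named above; passing a workload command after “record” and the helper names are assumptions made for this sketch, so check the usage output of your perf build before relying on them.

```python
import shlex
import subprocess


def timechart_commands(workload):
    """Build the two perf invocations: record a trace, then render it.

    `workload` is the command to trace, as a list of arguments. Passing a
    workload after "record" is an assumption of this sketch.
    """
    record = ["perf", "timechart", "record", *workload]
    render = ["perf", "timechart"]
    return [record, render]


def run_timechart(workload):
    """Run both steps in order, stopping at the first failure."""
    for command in timechart_commands(workload):
        print("+", shlex.join(command))
        subprocess.run(command, check=True)
```

Calling `run_timechart(["sleep", "5"])` from a directory where the freshly built perf binary is on your PATH would record five seconds of activity and then render it; recording system-wide scheduler events typically needs root privileges.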
I second the “showtime” proposition, but I guess that now that Linus has accepted the patches it’s too late…
Just a question – Paul Fox has been trying to port over Dtrace for Linux for some time now (ftp://crisp.dynalias.com/pub/release/website/dtrace).
Wouldn’t it have been better for Linux to complete that work and build Timechart on top of that?
Since the Dtrace license is incompatible with the license of the Linux kernel, the Dtrace port for Linux will never go into the mainline kernel. Basing technology on something like this, rather than the perf infrastructure which is already in the mainline kernel… does not strike me as a good idea.
’showtime’ likely gets in trouble wrt trademarks…
Unfortunately, none of the web browsers can actually zoom and pan freely into the image, making them not very useful for viewing the timechart output.
Xara LX crashed… better luck next time
Awesome! Sysadmins around the world will love you!
[...] about the interesting problem of cache misses); bootchart; the kernel function trace; the LLTng, Timechart, an finally the famous [...]
[...] a tool that spits out a graphical view of the time your system is spending doing various things. http://blog.fenrus.org/?p=5. Perf events is in Fedora 12 and so is ftrace. The timechart thing is only in mainline now. Slides [...]
I wonder how much faster it would be to render using a GIS file format such as ESRI shapefile for example rather than SVG.
GIS software is generally optimized to render large amounts of vector data (using spatial indexing internally to render faster, I assume).
I could not find an SVG to SHP converter (there might be one, I only searched for 2 minutes), but perhaps timechart could have different backends to output in SVG or SHP?
qgis can be used on Linux to render SHP files.
hm, ‘perf timechart record’ just gives me the usage information.
Recording it in other ways seems to be working.
though creating the graph keeps my cpu fully loaded for about an hour or so.
Maybe perf needs to be profiled in first place? ;-P
[...] such as the “Timechart” tool, which converts what was recorded with [...] record” into an SVG graphics file. Timechart is also similar to bootchart, but the kernel [...]
I could not find an SVG to SHP converter either. Where can I find such a converter?
Why will the Dtrace port for Linux never go into the mainline kernel? Could someone explain the reason, please?