TSC Frequency For All: Better Profiling and Benchmarking

Have you ever tried using LLVM’s X-Ray profiling tools to make some flame graphs, but gotten obscure errors like:

==65892==Unable to determine CPU frequency for TSC accounting.

==65892==Unable to determine CPU frequency.

Or worse, have you profiled every function in an application, only to find the sum of all function runtimes accounted for ~15 minutes of a 20-minute run? Where did those five minutes go!?

Well, we’ve run into both situations, so we built a solution in the form of a Linux kernel module called tsc_freq_khz. We’ll take a quick but deep dive into x86 timers to explain the problem, but first, let’s get to the goods. The tsc_freq_khz module enhances the performance of profiling and benchmarking tools like X-Ray in virtualized environments. No more “Unable to determine CPU frequency” errors!

The tsc_freq_khz module also makes these tools more accurate on newer Intel processors. X-Ray uses the x86 Time Stamp Counter (TSC), a low-latency monotonically increasing clock, to measure event duration, and assumes that TSC frequency is equivalent to maximum clock speed. This assumption is wrong on newer CPUs, and leads to improper time measurement. The tsc_freq_khz module provides a way to read the actual TSC frequency. No more missing minutes in your profiling data!

The tsc_freq_khz module works by exporting the Linux kernel’s internal value of the x86 Time Stamp Counter (TSC) frequency, tsc_khz, to userspace via sysfs. Programs can query this value by reading /sys/devices/system/cpu/cpu0/tsc_freq_khz.

Several open-source projects, like X-Ray and Abseil, already check for the presence of tsc_freq_khz, but until now, that file was only present on Google’s production kernels. The tsc_freq_khz module enables TSC frequency export to userspace for everyone else.

The trouble with timestamps

Before we explain what we did and why this works, let’s do a quick introduction to timestamps on the x86 architecture.

An x86 machine has at least six different ways to measure time:

Each method has unique and subtle flaws that make it completely unworkable for certain applications. This cornucopia of timers is what happens when you maintain 30 years of backwards compatibility and never remove a feature because someone, somewhere, is depending on it.

Despite its many flaws, the TSC is useful for benchmarking and profiling. It has extremely low latency, because the circuitry is directly on the CPU, and it is accessible directly from user-mode applications. Very useful profiling tools, like X-Ray, rely on the TSC for accurate measurements.

However, TSC measures time in ticks, which are not comparable across different processors and are therefore largely meaningless. For profiling and benchmarking, we want to measure time in comparable units, like nanoseconds.

From ticks to nanoseconds

So how many nanoseconds are in a tick? The basic formula for converting some duration of ticks to nanoseconds is simple:

time_in_nanoseconds = (tsc_count_end - tsc_count_start) * tsc_frequency

Unfortunately, determining TSC frequency is difficult, and in some cases, e.g., cloud-based or other virtualized environments, impossible. Until now.

The core issue is that the Linux kernel does not provide a way for applications to know the TSC frequency, although the Linux perf utility does attempt to calculate TSC frequency via Intel PT. There are reasonable arguments for not exposing the value directly, because TSC frequency is fairly obscure, and there are cases where the value is completely meaningless. Until recently, the processor’s maximum clock speed was also an accurate approximation of TSC frequency.

However, this is no longer true. Using maximum clock speed as the TSC frequency will give the wrong results on newer Intel CPUs. This is the cause of the missing minutes when profiling.

Additionally, the maximum clock speed, accessible in Linux via /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq, is not available in cloud-based or other virtualized environments. This is the cause of “Unable to determine CPU Frequency” errors.

cpuinfo_max_freq is populated by the cpufreq driver used for frequency scaling—that is, making the CPU run faster or slower depending on power-saving settings. Naturally, frequency scaling is not typically permitted in virtualized environments, because each physical CPU is shared with multiple virtual tenants. Hence, this value is not present.

A hint from Google

Timestamp measurement code in X-Ray refers to a mystery sysfs entry named /sys/devices/system/cpu/cpu0/tsc_freq_khz, which sounds like it provides exactly what we need: TSC frequency in kilohertz. Unfortunately, there are absolutely no references to that file in the Linux kernel source code. What’s going on?

More searching reveals the following hint in the Abseil source code:

  // Google's production kernel has a patch to export the TSC
  // frequency through sysfs. If the kernel is exporting the TSC
  // frequency use that. There are issues where cpuinfo_max_freq
  // cannot be relied on because the BIOS may be exporting an invalid
  // p-state (on x86) or p-states may be used to put the processor in
  // a new mode (turbo mode). Essentially, those frequencies cannot
  // always be relied upon. The same reasons apply to /proc/cpuinfo as
  // well.
  if (ReadLongFromFile("/sys/devices/system/cpu/cpu0/tsc_freq_khz", &freq)) {
    return freq * 1e3;  // Value is kHz.
  }

Jackpot! The comment tells us that:

  • Google runs a custom Linux kernel in production.
  • Google is aware that cpuinfo_max_freq should not be used for benchmarking.
  • The kernel’s calculation of TSC frequency is sane, and
  • Google’s internal kernels have a patch to export TSC frequency via sysfs, but the patch has not been upstreamed to the main kernel tree.

TSC frequency For all!

Why should only Google kernels have access to the TSC frequency? Since the Linux kernel is open source, we can write our own kernel module that does the same thing. Fortunately, the Linux kernel already computes the TSC frequency during boot and stores it in the tsc_khz variable. All we had to do was export the variable via sysfs.

Our kernel module, tsc_freq_khz, does exactly this. It creates a sysfs entry that reads the tsc_khz variable defined by the kernel and exports it via /sys/devices/system/cpu/cpu0/tsc_freq_khz. The module is extremely simple but also quite useful.

Follow the build instructions on Github and test the module by inserting it into your kernel:

$ sudo insmod ./tsc_freq_khz.ko
$ dmesg | grep tsc_freq_khz
[14045.345025] tsc_freq_khz: starting driver
[14045.345026] tsc_freq_khz: registering with sysfs
[14045.345028] tsc_freq_khz: successfully registered

The file at /sys/devices/system/cpu/cpu0/tsc_freq_khz should now be populated. (The values on your system will be different.)

$ cat /sys/devices/system/cpu/cpu0/tsc_freq_khz
2712020

Warning: Please do not use this code in a real production system. While it shouldn’t cause problems, it does make some assumptions, e.g., that CPU0 exists. There are also no sanity checks to warn if the TSC value is nonsensical or unreliable.

Conclusion

Seemingly simple problems like this one can lead us down a fascinating rabbit hole. In time, Google’s internal patches may make it into the Linux kernel, or kernel developers may create their own official patch to export TSC frequency to userspace. Meanwhile, tsc_freq_khz can be a stopgap for those who want accurate measurements with LLVM’s excellent X-Ray profiling tools.

Enjoyed reading this? Well, we thrive on interesting issues and challenging problems. We’d love to work with you to solve your security or software engineering challenges — please contact us.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s