Thursday, July 5, 2012

Another One Bites the Dust

Today I decided I was going to hop onto a game for a bit, or at least that was the plan.  I stopped my current Xmonad instance and ran my startx-sli script (thank you Nvidia, for making TwinView still hate SLI in Linux nearly a decade after you introduced the feature), only to have my X server hard crash.  It still responded to some basic cursor movements but no new windows would draw.  Being the tinkerer that I am, I decided to see what updates I had run recently, assuming of course that I had broken it.

Well, I did have a few updates in the pipeline that needed a few recompilations, one of them being an updated udev (xorg needed to be rebuilt).  However, as I soon discovered, rebuilding did not fix things.  I could have sworn I had been running in SLI on the version 302.xx drivers (the module loaded in the now defunct config).  I scoured dmesg and Xorg.0.log and found some cryptic NVRM error codes.  Thinking this was strange, I googled around a bit and found nothing relevant to the problem.

So I decided to run the nvidia-bug-report.sh utility that ships with the drivers.  That's when I saw this dreadful message, which had been grepped out of /var/log/messages and gzipped to be sent to Nvidia:

Jun 27 23:29:09 eggsbenedict kernel: [291206.148151] NVRM: GPU at 0000:02:00.0 has fallen off the bus.


And that's when I realized I hadn't actually used the other card in well over a week (I had been doing work with both monitors instead).  I'm pretty sure that Linux watched my card die and didn't even crash the X server; it just kept going without me noticing.  Hoping against all odds that my suspicions were wrong, I rebooted into Windows 7, only to get stuck after the splash screen, at which point it became apparent that the card was finally dead.

Dead 8800 GTX

Rig with a new vacant PCI-Ex slot



Sunday, June 3, 2012

Intel TurboBoost and Linux

My most recent rig, a Core i7-3930K, supports the very useful TurboBoost feature from Intel.  This essentially looks at the utilization of each core, bringing the less utilized cores to a deep C-state when possible.  This effectively creates more thermal headroom to overclock the more active cores to higher frequencies (somewhere around 3.8 GHz).

To do this, I had to configure my kernel (it is running Gentoo Linux, by the way) to allow for CPU frequency scaling and to utilize P-States.  I also had to enable the TurboBoost feature in the BIOS (not even sure that name is appropriate for modern era motherboards).

Doing all of this was fairly straightforward.  What wasn't straightforward, however, was grabbing the CPU clock frequency at any given interval.  Part of the problem is that the TurboBoost feature doesn't exactly correspond to clock frequency in the traditional ACPI sense.  The clock frequency measured through the traditional procfs interfaces and CPU power tools will display the standard (i.e. stock) clock frequency (and below, if you use the corresponding governor).  I wanted to A.) ensure that TurboBoost was actually working and that the performance gains I was experiencing weren't in fact a placebo, and B.) have a mechanism to show this information every 2.5 seconds in xmobar.

In recent years there have been posts on LKML mentioning a utility made for just this.  The utility is turbostat (and cpupower, to an extent).  Initial blog posts and LWN posts indicated that it was hosted on Gitorious, but a more thorough search showed that the source code for these utilities is now distributed with the kernel source tree.  The code can be found under tools/power/{x86/turbostat,cpupower}.  After compiling these utilities I was able to observe (unfortunately only as root) that my CPU both supported and was utilizing TurboBoost.

The output of turbostat looks a little something like this:


cor CPU    %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
          0.36 2.44 3.20  23.14   0.52   0.00  75.97   0.00   0.00   0.00   0.00
  0   0   1.04 1.99 3.20   1.10   0.99   0.02  96.85   0.00   0.00   0.00   0.00
  0   6   0.12 2.31 3.20   2.01   0.99   0.02  96.85   0.00   0.00   0.00   0.00
  1   1   0.49 1.43 3.20   1.03   0.88   0.00  97.60   0.00   0.00   0.00   0.00
  1   7   0.14 1.70 3.20   1.38   0.88   0.00  97.60   0.00   0.00   0.00   0.00
  2   2   0.74 2.98 3.20  55.78   0.25   0.00  43.22   0.00   0.00   0.00   0.00
  2   8   0.07 1.82 3.20  56.46   0.25   0.00  43.22   0.00   0.00   0.00   0.00
  3   3   0.21 1.77 3.20   0.21   0.02   0.00  99.56   0.00   0.00   0.00   0.00
  3   9   0.06 2.17 3.20   0.35   0.02   0.00  99.56   0.00   0.00   0.00   0.00
  4   4   0.53 3.16 3.20  79.42   0.38   0.00  19.66   0.00   0.00   0.00   0.00
  4  10   0.57 3.56 3.20  79.38   0.38   0.00  19.66   0.00   0.00   0.00   0.00
  5   5   0.12 1.77 3.20   0.32   0.62   0.00  98.93   0.00   0.00   0.00   0.00
  5  11   0.18 2.20 3.20   0.27   0.62   0.00  98.93   0.00   0.00   0.00   0.00

Each row shows the corresponding averages for the time spent in a given C-state.  Passing the -s parameter instead gave a summary of all of the cores (and only printed the column headers once).  There were a couple of problems with parsing this, however.

1.) You needed to be root to execute it.
2.) It did not terminate on its own; it had to be stopped with a signal such as a keyboard interrupt.
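Setting those two problems aside, pulling the frequency out of a summary line is simple, since the summary row has no core or CPU columns and GHz is therefore the second whitespace-separated field.  Here is a small sketch of just that step, working on a string so it can be tried without root.  The helper name parse_summary_ghz is hypothetical, and the column layout is assumed to match the output shown above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the GHz field (second whitespace-separated column) of a turbostat
 * summary line into out.  Returns 0 on success, -1 if the line has fewer
 * than two fields or the field does not fit in out.
 */
int parse_summary_ghz(const char *line, char *out, size_t outlen)
{
    char field[16];

    if (outlen == 0)
        return -1;

    /* %*s skips the %c0 column; %15s bounds the read to field's size. */
    if (sscanf(line, "%*s %15s", field) != 1)
        return -1;

    if (strlen(field) >= outlen)
        return -1;

    strcpy(out, field);
    return 0;
}
```

Feeding it the summary row from the output above (the one starting with 0.36) yields the string 2.44, while an empty line or a line with a single field is rejected.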

While there may have been a way to fork an instance off and grab only two lines with bash, FIFOs, and shell scripts, I took the path of least resistance and wrote a quick C application meant to be run with setuid privileges.  This allowed me to open a pipe and close it off, effectively terminating the continuous output of turbostat (I'd rather have had it work more like iostat and friends, which accept an argument specifying how many times to print).

Here is the source to my "turboinfo" program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

void usage(char *progname)
{
    /* no options are accepted yet (the getopt string below is empty) */
    fprintf(stderr,"Usage: %s\n",progname);
    exit(1);
}

int main(int argc, char *argv[])
{
    int arg;
    size_t alloc = 1000;
    uid_t userid = getuid();

    while ((arg = getopt(argc,argv,"")) != -1) {
        switch(arg) {
            default:
                usage(argv[0]);
                break;
        }
    }

    //const char *basecmd = "sudo turbostat -s -i 1 2>&1";
    char ghz[8];
    char *junkbuffer = malloc(sizeof(char)*1000);
    const char *basecmd = "turbostat -s -i 1 2>&1";

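    /* popen() runs the command through the shell, and some shells (bash,
       for one) drop privileges when the real and effective UIDs differ,
       so become root outright before opening the pipe */
    seteuid(0);
    setuid(0);
    FILE *turboPipe = popen(basecmd,"r");
    /* with an effective UID of 0, setuid() resets the real, effective and
       saved UIDs together, so this gives up root for good */
    setuid(userid);
    if (turboPipe == NULL) {
        perror("popen");
        free(junkbuffer);
        return 1;
    }

    /* ignore first line (the column headers) */
    getline(&junkbuffer,&alloc,turboPipe);
    /* GHz is the second field; %7s keeps the read inside ghz[8] */
    if (fscanf(turboPipe,"%*s %7s",ghz) != 1)
        ghz[0] = '\0';
    free(junkbuffer);

    fprintf(stdout,"%s\n",ghz);
    pclose(turboPipe);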

    return 0;
}
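One thing worth being careful about in a setuid-root program like this is giving root back once the pipe is open.  On Linux, calling setuid() while the effective UID is 0 changes the real, effective and saved UIDs together, so a single call is enough to drop root permanently.  The helper below is a sketch of that idea with a sanity check added.  The name drop_privileges is hypothetical, and group IDs and supplementary groups are deliberately not handled.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Permanently switch the process to uid.
 * Returns 0 on success, -1 if the switch failed or could be undone.
 */
int drop_privileges(uid_t uid)
{
    /* When the effective UID is 0 this sets real, effective and saved UIDs. */
    if (setuid(uid) != 0)
        return -1;

    /* Confirm that both the real and effective UIDs are what we asked for. */
    if (getuid() != uid || geteuid() != uid)
        return -1;

    /* If root can still be regained, the drop was not permanent. */
    if (uid != 0 && setuid(0) == 0)
        return -1;

    return 0;
}
```

In turboinfo this would be called as drop_privileges(userid) right after popen(), with a non-zero return treated as a reason to bail out.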

After that it was a matter of chown'ing the binary to belong to root and giving it the setuid bit.  I then adjusted my xmobarrc accordingly:

Config { font = "-*-terminus-*-r-*-*-*-*-*-*-*-*-*-u"
       , bgColor = "black"
       , fgColor = "grey"
       , position = Top
       , lowerOnStart = True
       , commands = [ Run Weather "KCVG" ["-t","<tempF>F","-L","54","-H","80","--normal","green","--high","red","--low","white"] 36000
                , Run Com "uname" ["-r"] "kern" 36000
                , Run Date "%m/%d/%y %H:%M" "date" 100
                , Run Com "sh" ["~/bin/mpd.sh"] "mpd" 25
                , Run Com "~/bin/turboinfo" [""] "cpu" 25
                , Run StdinReader
       ]
       , sepChar = "%"
       , alignSep = "}{"
       , template = "%StdinReader%}{[<fc=#08FFE2>%mpd%</fc>]|%kern%|%cpu%|%KCVG%|<fc=#ee9a00>%date%</fc>"
       }

Here is the end result:
Xmobar displaying clock frequency post TurboBoost for single threaded loads


Edit: An alternative to this approach may have been to write a shell script, forked off when startx runs, that dumps the turbostat output to a FIFO created via mkfifo.  The "info" script executed by xmobar would then pull from the FIFO at an interval somewhat close to the update interval passed to turbostat.  I prefer the on-demand approach, despite it requiring a lower-level implementation.