
A report from OSCON 2007

August 2, 2007

This article was contributed by Donnie Berkholz

O'Reilly's annual OSCON in Portland, Ore., is perhaps the only major conference in North America that spans the entire spectrum of open-source communities. This makes it a great opportunity to learn from people who may be encountering the same sorts of problems in a vastly different environment. Other events such as FOSDEM or LCA already provide this kind of environment, but for those of us who are US-based, it's helpful to have one with a lower travel budget. I highly recommend giving a talk if you're going so you get in free, though, since registration costs hover around US$1000 and up. It's clearly not a nonprofit conference.

Numerous groups met before the main part of the conference, among them a gathering of people involved in running a variety of free/open-source projects. At this foundations summit, most of the discussion centered on the issues facing nonprofits, such as trademarks, fundraising and bookkeeping. But as at a full conference, the "hallway track" was the most useful part. As the number of people grows, the discussion gets slower and slower, but meeting the people involved with other foundations is invaluable. The summit ended Tuesday, and the next day, the exhibit hall and regular sessions began.

In his session, Arjan van de Ven talked about efforts to reduce power use, focusing on a few main problems to avoid in your code. The first, not surprisingly, was polling. There is no excuse for polling, with the advent of things like inotify. He said, "Frequent polling causes spattergroit."

His second enemy was timers. It costs power to keep moving your CPU in and out of idle states, so you want to group timer events together rather than having them scattered randomly through time by a number of programs. On the kernel side, you can use round_jiffies() or round_jiffies_relative(), and in userland, you can use glib's g_timeout_add_seconds() instead of g_timeout_add(). Some work is underway to add this functionality to glibc as well. You don't want the entire Internet doing this at the same time, however, so each computer must group its events at a slightly different time.
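The grouping idea can be illustrated with a minimal sketch. It is not how round_jiffies() or glib implement it; the function names, the hostname-derived offset and the 0.25-second cap are all assumptions made for illustration. Deadlines are rounded up to the next whole second so that timers on one machine fire together, and a small, stable per-machine skew keeps different machines from waking at the same instant.

```python
import math
import zlib


def host_skew(hostname: str, max_skew: float = 0.25) -> float:
    """Derive a stable per-machine offset in [0, max_skew) seconds.

    Hypothetical helper: hashing the hostname is just one way to get a
    value that differs between machines but stays constant on each one.
    """
    bucket = zlib.crc32(hostname.encode("utf-8")) % 1000
    return bucket / 1000.0 * max_skew


def round_deadline(deadline: float, skew: float = 0.0) -> float:
    """Round a deadline (in seconds) up to the next whole second, plus skew.

    Timers that would have expired at scattered moments within the same
    second now expire together, so the CPU wakes once instead of many times.
    """
    return math.ceil(deadline) + skew
```

With a skew of 0.125, deadlines of 10.2 and 10.9 seconds both land on 11.125, so the two timers share a single wakeup.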

Arjan's final enemy was disk I/O. Since disks have moving parts, they consume a lot of power (at least until solid-state disks grow more common). High-speed links such as SATA and SCSI also eat power when not in power-saving mode. Gotchas here include opening files, even when in cache, because of the access time update (use the O_NOATIME flag to open() when possible), and looking for files or directories that don't exist (even when using inotify, this always goes to disk).
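As a rough sketch of the O_NOATIME suggestion, the helper below tries the flag first and falls back to a plain open. The function name is hypothetical. The fallback is there because, on Linux, open() with O_NOATIME fails with EPERM unless the caller owns the file or is suitably privileged.

```python
import os


def open_noatime(path: str) -> int:
    """Open path read-only, asking the kernel not to update its access time.

    Returns a raw file descriptor; the caller is responsible for os.close().
    """
    # os.O_NOATIME is Linux-specific; use 0 where the constant is missing.
    noatime = getattr(os, "O_NOATIME", 0)
    try:
        return os.open(path, os.O_RDONLY | noatime)
    except PermissionError:
        # Not the file's owner: retry without the flag rather than failing.
        return os.open(path, os.O_RDONLY)
```

Usage is the same as a normal os.open(): read from the returned descriptor and close it when done.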

A special case of this is media playback. The key is avoiding constant spinups of DVDs as well as hard drives by using large buffers — Arjan suggested 20 minutes of video or a minute of audio. Also, decode in large batches so you can be idle longer.
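To get a feel for what those buffer lengths mean in memory, here is a back-of-the-envelope calculation. The bitrates are assumptions chosen for illustration (roughly 5 Mbit/s for DVD video and 128 kbit/s for compressed audio), not figures from the talk.

```python
def buffer_bytes(bitrate_bits_per_s: int, seconds: int) -> int:
    """Bytes needed to hold `seconds` of media at a constant bitrate."""
    return bitrate_bits_per_s * seconds // 8


# Assumed average bitrates, for illustration only.
VIDEO_BITRATE = 5_000_000   # about 5 Mbit/s
AUDIO_BITRATE = 128_000     # 128 kbit/s

video_buffer = buffer_bytes(VIDEO_BITRATE, 20 * 60)   # 20 minutes of video
audio_buffer = buffer_bytes(AUDIO_BITRATE, 60)        # 1 minute of audio
```

Under these assumptions, 20 minutes of video needs 750,000,000 bytes (about 750 MB), while a minute of audio needs only 960,000 bytes, so the video figure is practical only on machines with plenty of free RAM.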

Tools such as powertop and strace are key in tracking down the culprits. Powertop can tell you where to look, and strace can tell you more about what any programs are doing. Near the end, Arjan showed a graph of how tuning and recent fixes dropped a Fedora 7 default installation from a power consumption of 21W down to about 15.5W. That just a few fixes dropped it by so much shows how broken things were, but we're now on the right track. A good goal is to aim for 50 or fewer wakeups a second, because getting below that level generally doesn't gain you much more.

A man with the job title "Disruptive Innovator" gave a talk with about 550 slides in 45 minutes. Rolf Skyberg of eBay applied Maslow's hierarchy of needs to technology to try to explain how users behave. The first level is survival, the second is security, and the third is belonging. Computer programs apparently haven't managed to get any higher up on the scale yet. In terms of programs, survival means the program runs without segfaults; security means the program is useful; and belonging means the program is pretty. The more energy users spend finding the basics (help, logging in, etc.), the less they have to spend doing something useful. But one thing worth remembering is that people using a program may have higher needs than you expected. For example, the iPod isn't just useful, it's pretty. And people really care about that prettiness despite the lack of features like an FM transmitter, a recorder, etc. that many other, less popular MP3 players have.

Luke Kanies talked about Puppet, a server automation tool he wrote in Ruby. It's a replacement for earlier popular tools such as cfengine. He really promoted the architecture, because any component in the entire system can be replaced and reused separately. Puppet's made of three main layers: server, networking and client. The server layer contains a compiler, a file server, a certificate authority and a report handler. The networking is XMLRPC over HTTPS. The client layer includes a resource abstraction layer, transactions and a resource server. Each of these individual components can be ripped out and replaced if you don't like it. You could change the configuration language, use a different method of communication, or whatever else your heart desires.

The resource abstraction layer contrasts the most with other tools such as cfengine. It abstracts all the concepts like "install a package," "add a user," "add a group" and so forth so you can run Puppet on any Linux or other Unix-like OS and retain a simple configuration file without OS-specific details. The layer supports about 10 different distributions and other operating systems, and it's not difficult to add more.
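The idea behind such an abstraction layer can be sketched in a few lines. This is not Puppet's code or API; the table, the platform keys and the function name are hypothetical, and only the package-manager commands themselves are real. The caller asks for "install a package" and the layer picks the OS-specific command.

```python
# Hypothetical mapping from platform name to its package-install command.
INSTALL_COMMANDS = {
    "debian": ["apt-get", "install", "-y"],
    "fedora": ["yum", "install", "-y"],
    "gentoo": ["emerge"],
}


def install_command(platform: str, package: str) -> list:
    """Return the command line that installs `package` on `platform`.

    The caller never mentions apt-get, yum or emerge; supporting another
    OS means adding one entry to the table above.
    """
    try:
        prefix = INSTALL_COMMANDS[platform]
    except KeyError:
        raise ValueError(f"no package provider for platform {platform!r}")
    return prefix + [package]
```

A configuration written against install_command() stays the same across distributions, which is the property the article attributes to the resource abstraction layer.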

Work is underway to create a library of Puppet config files (or recipes) to reduce all the duplication, and that should greatly ease adoption of Puppet. Puppet seems like a well-thought-out and extensible tool, so it will be interesting to watch where it goes.

Clinton Nixon talked about dealing with legacy PHP code, but many of the points are generally applicable to refactoring any code. His three primary suggestions were to separate the controller and the view, even if you don't have a solid MVC architecture; to call methods instead of including code that runs from the include file; and to get rid of global variables.

His rules for view code were that control structures, printing, and display-specific, unnested functions were allowed, but assignment and other function calls were prohibited. He suggested beginning by drawing a line at the top of the code and adding a comment that says "view code below here," then gradually migrating controller code above the line until you can move it to a separate file. For loops, encapsulate the variables in an object. Once you've gotten to this point, you may find duplicated views that you can factor out.

Untangling a web of included files is a process of figuring out the inputs and outputs, wrapping the entire file in a method, then refactoring. The nice part about this style of refactoring is that the code always works. There's never a point where you check in the code and it's broken.
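The talk was about PHP, but the same step can be sketched in Python. The example is hypothetical: imagine an included file that read a global cart and tax rate and left its result in a global total. Wrapping the file's body in a function turns those hidden globals into explicit inputs and a return value.

```python
def compute_total_cents(cart: list, tax_percent: int) -> int:
    """Former body of an included file, now wrapped in a function.

    Inputs that used to be read from globals are parameters, and the
    value that used to be left in a global is returned instead.
    """
    subtotal = sum(item["price_cents"] * item["qty"] for item in cart)
    tax = subtotal * tax_percent // 100
    return subtotal + tax


# The old include site becomes an ordinary call with explicit arguments.
total_cents = compute_total_cents([{"price_cents": 1000, "qty": 2}], 10)
```

Because the call site passes in what the included code used to pick up implicitly, the program behaves the same at each step, which matches the point that the code always works during this kind of refactoring.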

Finally, he recommended two books: Working Effectively with Legacy Code, by Michael Feathers, and Refactoring, by Martin Fowler. Although the Fowler book is a classic, he recommended the newer book by Feathers because it's more approachable.

At the close of the sessions Thursday, Dave Jones gave his now-infamous "User Space Sucks" talk. Since most people have gotten the basic idea of this talk, I'm only going to mention the new information. Dave re-ran his tests a week ago on Fedora 7 to look at disk I/O during the bootstrap process, and he found that it had actually gotten even worse since FC6. Counts of stat(), open() and exec() calls had either increased or stayed the same. But the problem has grown harder, because the offenders no longer stand out in the same way as the originals.

OSCON always provides some entertaining and educational talks, provided you've got a way to get into them. But its free content isn't too shabby either. The exhibit hall, all of the BOFs and parties (of which there are many), and the accompanying OSCAMP (like FooCamp, BarCamp, etc.) and FOSCON (mostly about Ruby) are all gratis. It stands nearly alone in the U.S. as a conference that spans all of the open-source world, although a niche certainly exists for a lower-margin meeting like FOSDEM or LCA on this side of the ocean.



A report from OSCON 2007

Posted Aug 3, 2007 0:03 UTC (Fri) by jwb (guest, #15467) [Link] (24 responses)

Looking for files or directories that don't exist? This is 99% or more of what glibc does behind your back. Look at this strace of cat(1):

open("/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION", O_RDONLY) = -1 ENOENT
open("/usr/lib/locale/en_US.utf8/LC_MEASUREMENT", O_RDONLY) = 3
open("/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE", O_RDONLY) = -1 ENOENT
open("/usr/lib/locale/en_US.utf8/LC_TELEPHONE", O_RDONLY) = 3

and on and on and on. Executing a subprocess is a veritable ENOENT bonanza as the linker/loader looks in all sorts of improbable places for bits of your program. If looking in the wrong directory for files that don't exist is a power consumption problem, we have a ton of work ahead of us.
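To see how much of a trace is made up of failed lookups, a small script can tally the results. This is a minimal sketch that assumes strace's usual "name(args) = result ERRNO" line shape; the regular expression and function name are illustrative, and the sample lines are the four quoted above.

```python
import re
from collections import Counter

# Matches lines such as: open("/path", O_RDONLY) = -1 ENOENT
LINE_RE = re.compile(r'^(\w+)\(.*\)\s*=\s*(-?\d+)(?:\s+(\w+))?')


def summarize(trace_lines):
    """Count (syscall, outcome) pairs; outcome is "ok" or the errno name."""
    counts = Counter()
    for line in trace_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip signals, unfinished calls, blank lines, etc.
        syscall, result, errno_name = match.groups()
        outcome = errno_name if int(result) < 0 else "ok"
        counts[(syscall, outcome)] += 1
    return counts


sample = [
    'open("/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION", O_RDONLY) = -1 ENOENT',
    'open("/usr/lib/locale/en_US.utf8/LC_MEASUREMENT", O_RDONLY) = 3',
    'open("/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE", O_RDONLY) = -1 ENOENT',
    'open("/usr/lib/locale/en_US.utf8/LC_TELEPHONE", O_RDONLY) = 3',
]
counts = summarize(sample)
```

For the four sample lines, half of the open() calls fail with ENOENT.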

A report from OSCON 2007

Posted Aug 3, 2007 1:15 UTC (Fri) by dberkholz (guest, #23346) [Link] (1 responses)

I don't think it would be as problematic upon application startup as during runtime (for example, looking to see whether a file's appeared every 100ms).

Importance of application startup

Posted Aug 3, 2007 3:34 UTC (Fri) by jreiser (subscriber, #11027) [Link]

One dose of 'powertop' or other profiler will find the "sore thumb" applications or usages that poll; but these are rare. Over half of all processes run for 0.1 CPU seconds or less, and for these the "startup" consumes the vast majority of resources. A shell with builtin busybox (mv, ln, cp, rm, date, ls, ...) can be a significant savings. The general case would be a learning shell that could identify and remember the state after startup of a large class of frequently-run applications.

A report from OSCON 2007

Posted Aug 3, 2007 3:05 UTC (Fri) by joey (guest, #328) [Link] (5 responses)

I think the question is, why is your cat looking for locales at all, if it's just catting a file. Mine only looks for locales if I run it with --help or something.

A report from OSCON 2007

Posted Aug 3, 2007 4:48 UTC (Fri) by carenas (subscriber, #46541) [Link]

LANG=C

would get rid of all those extra "locale" file lookups

A report from OSCON 2007

Posted Aug 3, 2007 7:36 UTC (Fri) by tzafrir (subscriber, #11501) [Link] (3 responses)

Actually the help message is surely something that needs translation. Hence there is a point in looking at LC_MESSAGES for the translation just for running cat --help .

Error messages cat spits out in case it can't find your file can also be translated.

A report from OSCON 2007

Posted Aug 3, 2007 17:56 UTC (Fri) by ajross (guest, #4563) [Link] (2 responses)

strace -eopen cat /dev/null

It reads the (non-existent) locale files even when no output is generated.

A report from OSCON 2007

Posted Aug 4, 2007 12:10 UTC (Sat) by tzafrir (subscriber, #11501) [Link] (1 responses)

right, so you want to complicate the code of every program:

if (open failed) {
    get_translations();
    error message
}

print_help_message() {
    get_translations();
    help message
}

A report from OSCON 2007

Posted Aug 4, 2007 20:36 UTC (Sat) by madscientist (subscriber, #16861) [Link]

That's not required. Every time a program needs to translate a message, it calls a function to do the translation. It should not be too difficult to modify the code for "lazy evaluation", where the locale catalog is not set up until the first time a message needs to be translated, and hide this inside the i18n libraries.
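A minimal sketch of that lazy-evaluation idea follows. It is not glibc's or gettext's implementation; the class and its loader callback are hypothetical. The catalog is loaded on the first call that actually needs a translation, so a program that never prints a message never touches the locale files.

```python
class LazyCatalog:
    """Message catalog that defers loading until the first translation."""

    def __init__(self, loader):
        self._loader = loader      # callable returning a {msgid: text} dict
        self._messages = None      # nothing is loaded at construction time
        self.load_count = 0        # kept only to demonstrate the behavior

    def translate(self, msgid: str) -> str:
        if self._messages is None:
            # First message needed: pay the lookup cost now, exactly once.
            self._messages = self._loader()
            self.load_count += 1
        # Fall back to the untranslated string when there is no entry.
        return self._messages.get(msgid, msgid)


catalog = LazyCatalog(lambda: {"No such file": "Fichier introuvable"})
```

Application code keeps calling catalog.translate() wherever it would have translated a message; the laziness stays hidden inside the library, as the comment suggests.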

A report from OSCON 2007

Posted Aug 3, 2007 4:15 UTC (Fri) by Nick (guest, #15060) [Link] (1 responses)

Trying to open file names that don't exist should not always go to disk.
The kernel has negative dentry caching, so only the first such access
would hit the disk (no different from opening a file that does exist).

A report from OSCON 2007

Posted Aug 3, 2007 4:56 UTC (Fri) by arjan (subscriber, #36785) [Link]

that's indeed true in theory. In practice it seems that the lifespan of negative dentries is so short that you still go to disk ;(

A report from OSCON 2007

Posted Aug 3, 2007 14:49 UTC (Fri) by drepper (subscriber, #5153) [Link] (12 responses)

The open calls only happen if you or your distribution provider (or both) make mistakes or you are using truly ancient code. On Fedora and RHEL there is exactly one single open call

open("/usr/lib/locale/locale-archive", O_RDONLY) = 3

In fact, the individual files don't even exist anymore.

So, stop complaining if you cannot even keep your machine running correctly.

A report from OSCON 2007

Posted Aug 3, 2007 18:33 UTC (Fri) by jwb (guest, #15467) [Link] (3 responses)

I can see that your reputation for being a flaming jerk is well deserved. Thanks for your practically useless reply and commercial endorsement of your employer's product.

A report from OSCON 2007

Posted Aug 9, 2007 2:25 UTC (Thu) by lysse (guest, #3190) [Link] (1 responses)

Maybe drepper was out of line, but I have to say that from my perspective, this response was completely out of proportion.

A report from OSCON 2007

Posted Aug 9, 2007 8:47 UTC (Thu) by nix (subscriber, #2304) [Link]

Well, it is notable that Ulrich gave no information at all about how to create it (although a one line response would have given all that was needed) and that there is essentially no documentation anywhere that describes it, either.

`Flaming jerk' is over the top. `Unhelpful' or `uninformative' would be more accurate.

A report from OSCON 2007

Posted Aug 10, 2007 11:37 UTC (Fri) by liljencrantz (guest, #28458) [Link]

Pot. Kettle. Black.

A report from OSCON 2007

Posted Aug 3, 2007 21:30 UTC (Fri) by tetromino (subscriber, #33846) [Link] (7 responses)

Ulrich, that's the first time I've ever heard of /usr/lib/locale/locale-archive. None of the distributions I use (Ubuntu and Gentoo) have it. No manpage on any of my systems mentions it. Is there some documentation somewhere for what this file is, what format is it in, what it is used for, how is it written or generated, and how distros (the ones that do not have the main glibc maintainer on their staff) can use it?

A report from OSCON 2007

Posted Aug 3, 2007 23:03 UTC (Fri) by johnkarp (guest, #39285) [Link] (5 responses)

I fixed the issue on my gentoo box by modifying /usr/sbin/locale-gen (it's
a script). One of the strings inside has "--no-archive", which I removed.

I reran locale-gen, and the archive was created. The number
of "cat /dev/null" syscalls dropped by 68%, in particular the number
of 'open' calls went from 31 -> 4.

I filed a gentoo bug:
http://bugs.gentoo.org/show_bug.cgi?id=187658

A report from OSCON 2007

Posted Aug 4, 2007 0:14 UTC (Sat) by jwb (guest, #15467) [Link] (3 responses)

On Ubuntu, this is the rationale given for using --no-archive:

"The rationale is that with an archive it is not possible to mix system and user defined locales by setting LOCPATH."

Presumably this complaint was raised by some user who wanted to do exactly that thing. On Ubuntu, this can be changed in /etc/belocs/locale-gen.conf

A report from OSCON 2007

Posted Aug 5, 2007 16:51 UTC (Sun) by jbailey (subscriber, #16890) [Link] (2 responses)

Right. In Ubuntu we don't use the locales from the upstream glibc tree on the grounds that they're quite frequently wrong and (justifiably) hard to get updated. Since we already have communities of folks through Launchpad who are doing the translations for the software, we trust them to tell us the locale-specific needs that they have.

There are a few ways that we could fix this better and work on having multiple locale archives or some such, but we just haven't gotten there yet.

Ideally this could be resolved

Posted Aug 9, 2007 2:21 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

If the translations can be pushed upstream, everyone can benefit and you can use the cache.

A report from OSCON 2007

Posted Aug 9, 2007 8:34 UTC (Thu) by nhippi (guest, #34640) [Link]

> In Ubuntu we don't use the locales from the upstream glibc tree on the grounds that they're quite frequently wrong and (justifiably) hard to get updated. Since we already have communities of folks through Launchpad who are doing the translations for the software, we trust them to tell us the locale-specific needs that they have.

This sounds a lot like you are saying you are trading "high quality and working with upstream" with "whatever we get from users without verifying".

Or maybe let me put it the other way:

If the upstream really is "frequently wrong", despite being "(justifiably) hard to get updated", how come this "(justification)" does not apply to updates via launchpad? What magic makes launchpad updates so much more high quality that it's worth forking locales?

> There are a few ways that we could fix this better

Please do. Ubuntu is getting a reputation of doing quick hacks to hide the problem instead of actually fixing the issue.

re: bitten by: http://lkml.org/lkml/2007/1/4/232

A report from OSCON 2007

Posted Aug 9, 2007 3:37 UTC (Thu) by jdub (guest, #27) [Link]

I updated an existing Ubuntu bug with information from this thread, and some strace logs of 'cat /dev/null' with and without a locale-archive.

https://bugs.launchpad.net/bugs/55906

Thanks for raising this!

A report from OSCON 2007

Posted Aug 4, 2007 13:40 UTC (Sat) by csamuel (✭ supporter ✭, #2624) [Link]

You can create the archive yourself by doing (as root):

locale-gen --archive

A report from OSCON 2007

Posted Aug 15, 2007 7:04 UTC (Wed) by set (guest, #4788) [Link]

hmmm... strace of cat for me produces less than 25 lines of output
and only 3 files opened, none of them locale. Probably because I
run gentoo and hate on nls....

A report from OSCON 2007

Posted Aug 3, 2007 16:48 UTC (Fri) by Hanno (guest, #41730) [Link] (9 responses)

> Arjan suggested 20 minutes of video or a minute of audio.

Mixup?

A report from OSCON 2007

Posted Aug 3, 2007 17:19 UTC (Fri) by arjan (subscriber, #36785) [Link] (5 responses)

video tends to come from a cd/dvd (one of those rotating energy slurping thingies) while audio tends to come from the harddisk (or usb). For cd/dvd the "spin the disk up" is a much higher cost so you tend to want to do it much less frequent than for in core storage...

A report from OSCON 2007

Posted Aug 4, 2007 13:56 UTC (Sat) by man_ls (guest, #15091) [Link] (2 responses)

How can you store 20 minutes of video? On DVD this can amount to more than 700 MB, is it practical to cache that much information?

A report from OSCON 2007

Posted Aug 4, 2007 20:21 UTC (Sat) by arjan (subscriber, #36785) [Link] (1 responses)

that depends on how much ram you have (free)....

(and on the actual datarate you have)

A report from OSCON 2007

Posted Aug 5, 2007 1:05 UTC (Sun) by drag (guest, #31333) [Link]

well the problem I have is that it should discriminate based on disk type, not on format.

As often as not I am reading audio from a cdrom and video from the harddrive. :)

A report from OSCON 2007

Posted Aug 9, 2007 12:08 UTC (Thu) by arafel (guest, #18557) [Link]

What you say is true, but it's also true that if you're prepared to buffer 20 minutes of video, buffering the whole audio track (whether MP3 or raw audio) should be possible. What was the reason for saying 1 minute, rather than just 'all of it'...?

A report from OSCON 2007

Posted Aug 14, 2007 8:01 UTC (Tue) by nim-nim (subscriber, #34454) [Link]

However, most optical drives make a loud noise during spin-up or at maximum spin speed. This is highly counterproductive when reading audio or video (and a huge power drain besides).

It's much better to spin the media at a slow constant rate rather than stop and restart it all the time. Unless you can buffer long periods (at least 15-30 min of content)

video buffering

Posted Aug 5, 2007 5:54 UTC (Sun) by akanaber (subscriber, #23265) [Link] (2 responses)

> > Arjan suggested 20 minutes of video or a minute of audio.
> Mixup?

I'd guess it was "20 seconds of video".

video buffering

Posted Aug 5, 2007 6:05 UTC (Sun) by akanaber (subscriber, #23265) [Link] (1 responses)

> I'd guess it was "20 seconds of video".

Whoops. I just noticed that some of the comments above were from arjan. Never mind, and sorry for the noise.

I was just a bit surprised at the idea of buffering 20 minutes of DVD video. That's certainly a large buffer.

video buffering

Posted Aug 11, 2007 0:43 UTC (Sat) by giraffedata (guest, #1954) [Link]

I can't understand how there could possibly be a rule of thumb for this. Besides the fact that it's far from reliable to assume video is on optical storage and audio is on magnetic storage, there seems to be a huge assumption about how scarce memory is on the system.

If buffering 700MB means you have to push out of memory other stuff so that you then have to spin up a disk and get it back multiple times, it probably doesn't make sense.

But if there is 1400MB of memory essentially unoccupied, stopping at 20 minutes of buffering doesn't make much sense.


Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds