Poking at toybox today, porting over my old bunzip2 engine. I'm using an old snapshot of it and bringing it up to date from there, and wow, it produces a lot of warnings under gcc 4.1. All of which are crap, of course. (When I feed the compiler -funsigned-char, telling me that "char" and "unsigned char" differ in signedness is STUPID. And under those circumstances, warning that an array index is "char" when you'd happily accept unsigned char is just as pointless.)
But that's not the worst of it. By no means. Those I can work around (removing all "unsigned" declarations from char variables, and -Wno-char-subscripts, respectively). But for some of these, gcc's continuing lack of orthogonality makes it impossible to fix one thing without breaking another. (It's too bad the FSF controls such an important piece of software as gcc; they're really, really bad at maintaining software.)
In 4.1.1 they glue together warnings they can competently produce ("is used uninitialized") with warnings they can't competently produce ("may be used uninitialized", which it regularly spits out about perfectly valid code). The only way to get rid of the spurious "may be used uninitialized, but we're not actually sure" warnings is to feed it -Wno-uninitialized, which chops out the "this is definitely used uninitialized" warnings too, and those are real.
The Linux kernel guys have been struggling with this one since 4.1 came out, and it's not easy to fix. You can't just run the output through grep -v because then you get "lib/bunzip.c: In function ‘read_bunzip_data’:" lines hanging by themselves in the output, and there's nothing in there you can easily hook on. You have to know where they occur, so you need something that looks at multiple lines and figures out which ones are superfluous based on their position.
The solution I've come up with so far is:
gcc $BLAH 2>&1 | sed -n -e '/may be used uninitialized/{s/.*/\n/;h;b};1{x;b};: print;x;/\n/b thing;p;: thing;${x;p}' >&2
Which is a bit rickety for my tastes. I can also patch gcc to rip out broken code, in this case gcc/tree-ssa.c, stub out the function warn_uninitialized_phi(). But patching gcc to remove broken code is like trying to waterproof the ocean. GCC has some very useful bits surrounded by layer upon layer of sheer crap...
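For what it's worth, the same multi-line filtering can be sketched in awk, holding each "In function" header line until a warning that survives the filter shows up (this is an illustrative alternative, not what the build actually uses; the pattern matched is gcc 4.1's warning text):

```shell
# Drop gcc's spurious "may be used uninitialized" warnings, plus any
# "In function" header line left with no surviving warnings under it.
# Sketch only.
filter_maybe_uninit() {
  awk '
    /may be used uninitialized/ { next }    # spurious: drop it
    /: In function /  { held = $0; next }   # buffer the context header
    {
      if (held != "") { print held; held = "" }
      print                                 # surviving diagnostic
    }'
}
# usage: gcc $BLAH 2>&1 | filter_maybe_uninit >&2
```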
Hey, there is a way to turn down the backlight on my dell thingy. The function key plus cursor down. Good to know. (We're on the car trip back from Eric and Cathy's...)
So I took the squashfs patch out of the kernel so I could build it on armv4l again. (I know the bug is in gcc, but the squashfs patch is triggering it.) And this gave me an armv4l root filesystem I could in theory chroot into.
The problem is, qemu's application support hasn't got any built in chroot functionality that I'm aware of. If I chroot into the arm filesystem before running qemu-arm, then qemu (and the shared libraries it links against) need to be in the new root filesystem. I tried building a static version of busybox chroot for qemu, but it says it can't exec the program I feed it. I don't know if this is a problem with the arm filesystem or qemu.
What I really need to do is package it up into an ext2 image, but to do that I need genext2fs, and since I've been meaning to add mke2fs and genext2fs to toybox anyway, this seems like a good time to take a stab at that.
At Eric and Cathy's in Malvern. Spent the day with them, lunch at Magnolia Cafe (not the one in Austin, this is a bookstore in... Exton, I think) where I got Peter and the Starcatchers. Just finished it, and since Eric and Cathy are off at the game store with Garrett and Fade's in game, I've finally fired up my laptop to upload the work I did yesterday in the car and maybe poke at my email a bit.
And of course something I wanted to test is if the new #includes work. The new Firmware Linux pages use server side includes, which I can't view from my laptop because file:// won't do SSI. But I forgot, the router Eric has is this brain-damaged thing that won't show me landley.net from the inside. (Apparently asking for the public-side IP and expecting that to be port-forwarded back to one of the machines behind it is asking too much of its little brain. It forwards ssh just fine, but when the wireless side asks for http from this machine, obviously it wants to talk to the administrative CGI interface thing. Talking to the web pages our static IP serves only works from _outside_ the firewall.)
The fix? Dial out to a machine at work with ssh -X where I fire up konqueror and have the display happen here on my laptop through X11 forwarding. (It's astounding how much of modern technology relies on exactly that sort of insane workaround...)
By the way, if you ever need a chunk of technobabble, this gets spit out in the middle of the gcc build, right before it starts eating insane quantities of memory:
Automaton `athlon_fp'
15522 NDFA states, 99908 NDFA arcs
15522 DFA states, 99908 DFA arcs
463 minimal DFA states, 3038 minimal DFA arcs
273 all insns
21 insn equivalence classes
3057 transition comb vector els, 9723 trans table els: use comb vect
3057 state alts comb vector els, 9723 state alts table els: use comb vect
9723 min delay table els, compression factor 1
17533 all allocated states, 102661 all allocated arcs
32710 all allocated alternative states
6256 all transition comb vector els, 16780 all trans table els
6256 all state alts comb vector els, 16780 all state alts table els
16780 all min delay table els
0 locked states
num transformation: 0.009999, building DFA: 8.076774
DFA minimization: 0.386941, making insn equivalence: 0.000999
all automaton generation: 8.586694, output: 0.057990
I believe I understand about 5% of that. I suspect they output it more because they think it looks cool than for any actual diagnostic purpose. Oh well.
So there's still some cleanup to do in the native build environment. The dynamic linker is still linking stuff with the loader pointed at /tools, and ldd is installed in /tools/usr. I've got an environment variable for that, but what's a good default?
Longish car ride with Garrett and Fade, off to visit Eric. Redid the website and overhauled chunks of documentation. I should probably poke at toybox again for a bit, but not quite yet...
Merry day after the day after christmas!
The 64-bit paper Eric and I spent ~6 months working on made slashdot, which I suppose is good. The comments are universally unhelpful (even the highly moderated ones seem to be from people who didn't bother to read the actual article), but it's resulted in a lot of email to Eric, which he sometimes forwards to me when it's about a part I wrote. (Considering I wrote over 2/3 of it, this has come up more than once.) Apparently, my web page is in a bad enough state that people can't find my email address on it. I should fix that.
The whole point of the paper (from my perspective) was to establish that an important transition is coming up, it will occur on a hard deadline, it has historically been a winner-take-all situation, and that there are certain requirements to be a candidate. I just wanted to showcase the problem. Eric's more of a completist than I am, he wants to present a complete solution and wrap a bow around it and go "here". The problem is, if people are inclined to argue about the solution, I don't want them to dismiss the importance (or even existence) of the problem.
This one isn't even really a problem, it's an opportunity. It only becomes a problem if it's an opportunity we're failing to exploit and handing over to someone else who will.
Sigh.
Beating on the uClibc build some more. I could just use sed on the libc.so linker script to rip the paths out, but I want to figure out why they're there. Looking at the makefile it looks like they shouldn't be there, and yet the result of the build has them. Weird.
Strangely, when I extract the uClibc snapshot and build it by myself, the lib/libc.so it creates doesn't have this. Is it being added during the install? Yes, yes it is. The breakage was added in svn 11654, apparently intentionally. I wonder why? Ok, whip up a quick patch to remove it...
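The sed approach mentioned above would look something like this (assuming the installed libc.so is a GROUP() linker script, which is what uClibc produces; the /tools paths in the example are illustrative):

```shell
# Strip directory components out of a libc.so linker script, so
# absolute paths like /tools/lib/libc.so.0 become bare sonames the
# linker resolves via its normal search path. (Illustrative sketch.)
strip_script_paths() {
  sed 's@/[^ ]*/@@g'
}
```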
Sigh. I keep finding new and exciting ways to break FSF software. They're such brittle programs. This time, I broke bash. (Add a third line to config.cache to fix it.)
Ok, now I have a native build environment with a toolchain that works, but always static links the executables. Well, it's an improvement... It's not copying libc.so. Try the patch again...
Ha! And there was much rejoicing.
Merry day after christmas!
So the current Firmware Linux version is building a complete i686 uClibc native development environment, which I can chroot into and get a shell prompt for (chroot build/mini-native-i686 /tools/bin/env PATH=/tools/bin /tools/bin/sh) and everything's linked against uClibc properly. But for some reason when I build "hello world", gcc is trying to link against libc.so.0 at the absolute path to where I built the thing in my home directory. How is it finding this path?
Oh wonderful. All of the uClibc shared libraries in the native build environment are linked to the build-time absolute path of any other libraries they depend on. How the heck did that happen? (Remember how I wanted to get a copy of ldd working for the cross-compiler? Yeah, now you know why...) I assume the only reason they work to chroot with is that they're already loaded when it goes to look for them, so it doesn't need to instantiate a new copy, and thus doesn't notice that they're _not_there_ in the chroot environment. (Anybody who thinks a chroot jail provides anything approaching isolation for security purposes is woefully misinformed.)
Ok, so how did the native environment get screwed up during the build? I'd guess that the wrapper providing rpath and rpath-link with absolute paths is a bad idea. That's behavior from the wrapper I inherited, and I have no idea what it was trying to accomplish. (What do those _do_? Time to read documentation...)
Ok, at least part of it is that RUNTIME_PREFIX and DEV_PREFIX aren't just install targets but get wired into the build. Hmmm... Testing, it seems that the string fed to RUNTIME_PREFIX winds up in the .so files. Because it's linking against the dynamic linker (ld-uClibc.so.0) which is a particular piece of brain damage TLS came up with. That's kind of evil and nasty.
Merry christmas again!
So gcc's ./configure is using an undocumented feature of bash. Gee, what a surprise. Apparently ${CC-cc} is a synonym for ${CC:-cc}. Good to know.
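(It turns out the colon-less form is actually POSIX parameter expansion rather than a bashism: without the colon the default only kicks in when the variable is unset, with the colon it also kicks in when the variable is set but empty. Demonstration:)

```shell
# ${CC-cc} vs ${CC:-cc}: identical unless CC is set to the empty
# string, in which case "-" keeps the empty value and ":-" doesn't.
unset CC
echo "unset: [${CC-cc}] [${CC:-cc}]"   # both substitute: [cc] [cc]
CC=""
echo "empty: [${CC-cc}] [${CC:-cc}]"   # only :- substitutes: [] [cc]
```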
Ha! I figured out the correct way to lie to gcc! (If you want to get any piece of FSF software to work properly, you have to lie to it to bypass its stupid built-in assumptions. This is why I made busybox sed respond to --version with "this is not gcc 4.0"; it made binutils ./configure work. The hard part is figuring out _how_ to lie to it so it does what you want.)
In the case of cross-compiling gcc from i686 to i686, all you have to do is make the tuples different: if the target is i686-unknown-linux-gnu then make the build machine something like "i686-walrus-linux-gnu". Then it can't tell they're the same, and thus it cross-compiles.
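In configure terms the lie looks something like this (the walrus tuple is the one from above; the prefix and other flags here are illustrative, not the exact command the build runs):

```shell
# Make --build differ from --target so gcc's ./configure believes
# it's cross compiling, even though both sides are really i686.
./configure --build=i686-walrus-linux-gnu \
            --host=i686-unknown-linux-gnu \
            --target=i686-unknown-linux-gnu \
            --prefix=/tools --disable-nls
```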
On a totally unrelated note: I've discovered that the whole dark text on white background vs white text on dark background thing boils down to how bright or dark the room you're in is. In a brightly lit room, dark text on a white background is easier to read. In a dark room, a white background means you're staring into a bright light, just like the classic interrogation technique. (Pop quiz: when they tie the gumshoe to a chair, what do they stick in his face? A desk lamp. Why? Try using kmail with the lights off for five minutes.)
I mention this because this long car trip seems like an excellent time to catch up on linux-kernel postings, except that the sun went down at 5 pm and it's dark in the car, and the white background of kmail gets really unpleasant after a minute or two. Konsole has settings->schema, but there doesn't seem to be anything like that for kmail. (I could set a dozen different colors manually, and then set them back when I'm in a bright room again, but that's just silly. And if there's a way to turn down the backlight on my dell laptop, I dunno what it is.)
Just another little way the Linux desktop experience continues to suck. News to nobody...
Merry christmas everybody. I'm off to Tracy's christmas party in Michigan. Back in a couple days...
(And since I have the chargy thing from the last trip to michigan, I can use my laptop in the car! Probably won't be able to update the web version until I get back, but this is one more reason I really love Mercurial. I can check stuff in during a 5 hour car trip without web access. Yes, this is more common for me than you'd think. Subversion really cramped my style...)
Renamed the "build/temp" directory to "build/temp-$ARCH", which means I can build for different architectures at the same time, using the same scripts and the same source tarballs. (The various destination directories were already labeled with the architecture, but not the working directory.)
Do I have enough memory to do two builds at once? My laptop has 512 megs and a 750 meg swap file on top of that, but building current glibc is a serious pig. I think as long as the two builds don't try to run genattrtab at the same time, it'll probably work out. We'll see.
Heard back from the guy the GCC 4.1.1 bug is assigned to. There's no patch out there I can apply. (Debian's bug tracker is wrong. Big surprise.) As to whether or not there will _be_ a 4.1 patch, or whether I should either wait for 4.2 (which already has this fixed) or revert to 4.0 (which didn't have this problem), he said "The 4.1 branch is certainly in an uncertain state..." and left it at that.
So I'm beating on i686. Making an x86_glibc->x86_uClibc cross-compiler wasn't so hard, but making a _native_ compiler (x86_uClibc->x86_uClibc) is being a pain because binutils and gcc are nuts. (They're both x86, therefore you're not cross compiling, therefore we won't use the toolchain you told us to but will instead use the host gcc.)
So I've added force_cross_compile and block_host_toolchain functions to include.sh. (Which turn out to be of dubious utility, but at least help in debugging the darn problems...)
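The block_host_toolchain idea amounts to something like this sketch (the function name matches, but the body here is illustrative, not the actual include.sh code):

```shell
# Put poisoned host-tool stubs first in $PATH so any package that
# ignores $CC and reaches for the host gcc dies loudly instead of
# silently producing host binaries. Illustrative sketch; assumes
# $BUILD points at a scratch directory.
block_host_toolchain() {
  mkdir -p "$BUILD/poison" &&
  for tool in gcc cc ld as ar; do
    printf '#!/bin/sh\necho "poisoned host %s called" >&2\nexit 1\n' \
      "$tool" > "$BUILD/poison/$tool" &&
    chmod +x "$BUILD/poison/$tool" || return 1
  done
  PATH="$BUILD/poison:$PATH"
}
```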
Squashfs 3.1's most recent patch (for 2.6.18) won't apply to the 2.6.19 kernel. (Well, the patch applies but then the build breaks, because struct inode changed...) The sourceforge site doesn't have an updated patch, but the CLFS guys have one.
My download script doesn't know how to deal with something like that. It downloads source tarballs, not patches to them, and it would have to override the horrible name wget wants to save the file under (which is everything after the last slash). I'm not that interested in teaching it because patches are by their nature small and transitory. This is big as patches go, but I think I'm still comfortable sucking it into my build.
Oh great:
  CC      fs/squashfs/inode.o
fs/squashfs/inode.c: In function 'get_cached_fragment':
fs/squashfs/inode.c:516: fatal error: internal consistency failure
compilation terminated.
I have no _idea_ how to fix that one. I think it's a gcc internal error. (Line 516 is the closing curly bracket of a function...) CTRL-ALT-Google and... Yup, it's a gcc bug. Debian's bugzilla says "fixed upstream" and points at that gcc entry. But if there's an actual _patch_ attached to that entry, I can't find it... But I can email the guy the bug's assigned to...
Screw it. I've been meaning to poke at a couple other architectures, and since gcc 4.1.1 is simply broken on arm...
Things I think Joe (the new CEO at timesys) should probably read:
Joel Spolsky's article Camels and Rubber Duckies, on pricing software and market segmentation. Important insight: it makes no sense to price software between $1000 and $75,000, because once you're up past an individual manager's purchasing authority, devoting a full-time salesbeing to shepherd the purchase through the approvals process costs $50,000. This is a big hole, and you're either above it, or below it. Montavista is above it. Timesys is below it. (In reality, you can sometimes go up maybe as far as $3000. It depends on the customer, and where the individual manager's purchasing authority is. Think of how much server hardware costs, then aim around there.)
Joel Spolsky's article Amazon vs Ben&Jerry's, on planning your growth curve. I don't believe timesys is an Amazon style company, I strongly believe we're Ben&Jerry's all the way. (This may not be what our venture capital backers want to hear.)
I wrote a series of seven articles for The Motley Fool long ago about the three waves of corporate development. We are a wave 1 company, and need to be treated (and understood) as such. The articles are How a start-up evolves, How companies grow up, Berkshire Hathaway's sustainability, How Xerox forfeited the PC war, The power of "business commandos", Microsoft's split personality, and The Innovator's Dilemma. And here's some follow-up analysis somebody did of the above series.
And just for fun, a couple of couple quick "state of the industry" articles. World Domination 201 is an article I co-authored with Eric Raymond about how the upcoming switch to 64-bit processors (in 2008) inevitably obsoletes the Win32 API the same way DOS was left behind on 16-bit hardware, and how this transition affects the industry. (For TimeSys this means we'd better support x86-64 as a host platform pronto.) And to tie it back to Joel's articles, How Microsoft lost the API war is an article from two years ago that's still quite relevant.
So now that mini-native.sh is building, I'm starting on the packaging stage so QEMU can run it. I'm trying to figure out if packaging counts as part of stage 2 or if I need to insert one between 2 and 3. (The script that runs within the emulator to drive the native build environment is definitely a separate stage, but is the packaging part of the cross compiling or separate? Dunno.)
I suppose I could add it to mini-native.sh and just add a --nopackage command line option. Except I need to upgrade include.sh's command line parsing a bit if I'm going to start including more than one option per stage. Hmmm...
I wonder how much this new kvm thing could accelerate non-x86 qemu?
Fought against make and bash for most of the day. Both eventually surrendered, with the assistance of the CLFS-1.0.0 docs and much head scratching. Had to track down some weird config options. (Ok, bash doesn't actually _have_ a "--screw-readline", but close. This is a minimal build environment, it hasn't got the curses, termcap, or terminfo libraries. Deal with it.)
Using make 3.81 because it's there (and bloated, and evil, and broken, and it needed me to enable GLIBC_GLOB support in uClibc, one more deviation from defconfig). And bash 2.05b because the newer stuff is silly pointless bloat. (Of course bash is already pretty bloated; the need for --without-bash-malloc to disable its own built-in malloc implementation and force it to use the libc one is silly. The fact that said malloc implementation no longer compiles under gcc 4.x, due to those guys deciding that a perfectly legal construct is now an error, is hysterical.)
Free Software Foundation: see entry for "sirius cybernetics corporation marketing department".
I need to replace _all_ this crap. But right now, I want to get a working build environment with what's there before I start substituting components. And that's actually pretty close. I think all the packages I need have now built. Tomorrow I need to package the result up and run it under qemu.
And Garrett gave his two weeks notice today. It's official: I'm the last survivor of the department I was hired into.
I need to start a todo for FWL, of stuff I need to do but not right now. Fix "make utils" in uClibc, add miniconfig to the kernel and uClibc, make an x86 version, finish toybox and replace busybox with it...
Discovered the Bruce Perens Does Not Speak For Me page. (Yes, I still hold a grudge.) Had trouble signing up (it sent me an email with a link to click on, and when I did I got a page saying "unknown key"), but I still approve of the concept.
Added color to the build so when I run download/cross-compiler/mini-native all at once I can see which script it's on without having to run "grep === out.txt". I should make a master build.sh to run 'em all, but I don't want to complicate things...
Vaguely happy with the cross-compiler again, and mini-native.sh is coming along reasonably well. Still need to tune the wrapper to deal with A) native builds where /lib and /usr/include are relevant, B) compilers with no prefixes, C) C++. And probably more. Plus I need to add make and bash to the package list, and then package the result up into an ext3 filesystem image. The wrapper stuff is new to me, the rest I've done before...
Somehow, I strained the last knuckle of my left pinky. The most trivial possible injury, you say? It hurts to type A or hit the left shift key. Sigh.
Fighting with the gcc native build for arm. I have to provide three platform designations (build=686, host=arm, target=arm) and yet it wants me to provide an existing cross compiler as part of its config. Ok, so why did it want "host" then? I think I'm doing a canadian cross without meaning to, but I'm going to make the darn thing _work_.
What the heck is gcrt1.o? Huh, apparently the wrapper script is generating that. Weird. (Ah. Bug in the wrapper, enabling profiling support when it shouldn't. I know that oprofile support was yanked from uClibc ages ago, and yes this means the wrapper theoretically supports glibc as well as uClibc. I just don't care very much. :)
I should probably organize mini-native to build a bootable kernel with a static busybox in initramfs, and then add uClibc to that so it can be dynamically linked, and then pile binutils/gcc/make on top of that. Right now it's doing things in a sub-optimal order. (Or at least less flexible than it could be.) I also have the leftover issue that cross-compiler.sh tests against qemu, but doesn't build it. (Now mini-native.sh is building it, but that's not where the test is.)
Hmmm... In both cases I sort of have a "lean" and "full" version of each script. The lean cross-compiler doesn't include qemu or a bootable kernel. The lean version of mini-native doesn't build gcc/binutils/make. Probably I should shuffle things to build that way, if I can think of a good way to say this on the command line. (I don't want _four_ scripts, and don't really want two command line arguments to each script, either. Although an optional "--lean" argument isn't too bad...)
I should not have to define _nine_ environment variables to get gcc to cross-compile. Sigh.
Reading an interview with Eric Raymond that Google brought up while I was looking for something else (I was curious because it mentioned my name). About "Aunt Tillie": I must say I'm really starting to hate the old bat, and if I could write software to exclude _just_her_, personally, without affecting anyone else, I'd do it. Not a very successful metaphor, at all.
Ok, I implemented a --short option to the build and moved the qemu build into cross-compiler. I'm pondering moving the linux kernel build in there too, since it already extracts the kernel tarball and installs headers from it to make uClibc work. The question is, should I then re-build uClibc in mini-native, or just copy the headers and libraries out of the cross compiler toolchain?
Interestingly, this brings up something I hadn't noticed before: uClibc's "make utils" is broken for cross compiling. If I build ldd with the cross compiler it builds an arm executable I can't run on the host. If I build ldd with the native compiler it gets the arm compiler's arguments (and dies complaining that little-endian is an unknown option). How do I build a cross ldd for uClibc? I think I need to ask the list...
And Software Suspend failed to resume my laptop again.
Saw "Casino Royale" with Fade. The first half was good, but it went way downhill. (Perhaps I'm watching the wrong movie franchise when it annoys me that bond girls are a variety of red shirt, but in previous movies they at least had a _chance_ to survive.) But far more annoying was the 15 minutes of unpleasant suspense after the main plotline's been resolved for the second time (what is this, Bond vs Freddy Kruger?) and Bond has Found Obviously Doomed Happiness(tm). It wasn't "how will this turn out" suspense, it was like those comedies where something unbearably embarrassing's happened to the protagonist and you just have to sit there and wait for it to be over. There was no question of "will this fall apart", because we _know_ the actor has signed a multi-movie deal. We even had a pretty good idea she was going to die (nothing new: Sean Connery's Bond even got married). The only real question was whether she would betray him first. (Ok, mostly the question was how, but I thought they might try to get clever.) And finding out was slower than the end of Return of the King, which had at least earned our indulgence by that point. (Asking why she's dead but he's not even unconscious when they went underwater together is nit-picking; by that point I just wanted the darn thing to _end_ already.) Yes, at the very end he's finally pissed off enough to have the James Bond Themesong, but who cares? (Ah, at last a villain who's earned a good thrashing. Please don't actually show any of it.)
Once you're on to your _sixth_ Main Bad Guy for the movie (jungle rebels, guy chased from the pit fight into the embassy, husband of corpsette #1, the big evil banker, betraying assistant guy at the hospital, and finally the guy shot in the knee at the end) they all start to run together.
I remember writing at length about the importance of a good villain to a movie (I was using the Batman series as an example so it was probably back when Batman Begins came out), but it must have been private email because Google isn't finding it. (My attempts did turn up this and this and this, which just goes to show I've spent a lot of time on the internet and that for some of it I was extremely sleep deprived and overcaffeinated. So nothing new there...)
Got the last week's work checked in to the Firmware Linux repository, finally. Tends to get a bit tangled up if I leave it that long, but I hate to check in something that doesn't work.
Spent the day at the office rather than telecommuting, and got very little done as a result. (Ok, I figured out how to delete stale files without using "find", by using "date -r" and the shell's wildcard expansion instead.)
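Something like this sketch, anyway (it assumes GNU date, where -r prints a file's mtime and -d parses a date string; the 7-day cutoff in the usage line is just an example):

```shell
# Delete regular files older than a cutoff without using find:
# wildcard expansion walks the current directory, date -r reads
# each file's mtime. (Illustrative sketch; GNU date assumed.)
delete_older_than() {
  cutoff="$1"   # seconds since the epoch
  for f in *; do
    [ -f "$f" ] || continue
    [ "$(date -r "$f" +%s)" -lt "$cutoff" ] && rm -f "$f"
  done
  return 0
}
# usage: delete_older_than "$(date -d "7 days ago" +%s)"
```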
The bus ride home was even less fun than usual. Let's see: I walked through a huge cloud of marijuana smoke along the sidewalk, which I remain allergic to. I was reminded that the bathrooms at the McDonalds near my bus stop require you to make a purchase to get a token to use them, adding that touch of charm to the community. Wait for the kind of bus I needed to show up: 20 minutes. Number of other people waiting for the bus who spit on the sidewalk during this time: 3.
I believe I have upgraded Pittsburgh from "hate" to "despise".
Fade has friends visiting, with a toddler, who is fascinated by the kittens and keeps trying to eat the little Penguicon 5.0 foam penguin she tore off my backpack.
And there are once again mailing lists for toybox and FWL, thanks to David Mandala. Check the respective web pages for subscription info.
I want to find whoever designed "find" and beat them severely. The user interface is just horrible. Today's gripe: you can "find -newer" but not "find -older". They seem to think you can just use \! -newer (escaping the ! so the shell doesn't try to interpret it), but on a filesystem with 1 second granularity, try calling "touch" on a marker right before checking your files (calling touch on each selected file), then finding all the ones older than your marker: anything stamped in the same second isn't newer, so it shows up even though it isn't older either. ! > is not the same as <, because >= and <= exist. Except to the people who designed "find", who apparently had no interest in making a general purpose utility, just a bundle of special cases.
*smack*
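The >= problem is easy to demonstrate: a file is never -newer than itself, so ! -newer matches the marker, and anything else stamped in the same second, right along with the genuinely older files (the filenames here are made up for the demo):

```shell
# "! -newer marker" is >= inverted to <=, not <: the marker itself
# and any file sharing its timestamp still match.
cd "$(mktemp -d)"
touch -t 200601010000 older
touch -t 200601020000 marker same-second
find . -type f ! -newer marker
# lists ./older, and also ./marker and ./same-second, which is the
# whole problem
```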
I need this to make the download script clean out unrecognized files from the sources/packages directory when you upgrade to a new version. Implementing this in shell is being stroppy.
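The shape I'm fighting toward is roughly this (a sketch that sidesteps find entirely; the function name and manifest file are illustrative, not the actual download.sh code):

```shell
# Remove files from a directory that aren't listed (one filename
# per line) in a manifest of known packages. Illustrative sketch.
cleanup_unrecognized() {
  dir="$1" manifest="$2"
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # empty directory: glob didn't expand
    grep -qxF "$(basename "$f")" "$manifest" || rm -f "$f"
  done
  return 0
}
```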
Spent an annoyingly large chunk of the day tracking down the uClibc bug. Turns out the fix is to remove weak_function from __pthread_mutex_unlock() in uClibc_pthread.h. I have no idea _why_ this fixes it, but it does. (When the weak_function is there, the call to __uclibc_mutex_unlock() segfaults. But when it's _not_ there, the call to __pthread_mutex_unlock() in the uClibc one gets forwarded to __pthread_return_0(), which was the point.) I'm confused.
Here's hoping the uClibc mailing list can sort it out...
So ever since Erik got his cake the uClibc repository has been in flux as they check everything in that should go in the 0.9.29 release. In FWL I've been using the Nov 28 snapshot of uClibc, but last night's was finally a decent stopping point after all the thread locking changes, and worth testing. So I upgraded.
Alas, armv4l doesn't work anymore, segfaults on exit from "hello world". I narrowed down the change to something between svn 16820 and svn 16827, but nothing in between those compiles. I pinged the list, and maybe they'll have some idea.
I should add a readlink() call on the argv[0] checking in the wrapper script. That way if somebody makes a gcc symlink pointing to the cross compiler, it'll still be able to figure out where everything lives.
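In shell, that looks something like the sketch below (readlink here is the coreutils command; the loop handles chains of symlinks and relative link targets, and the function name is made up for illustration):

```shell
# Follow symlinks on $0 so a "gcc" symlink pointing at the wrapper
# still resolves paths relative to where the wrapper really lives.
resolve_argv0() {
  self="$1"
  while [ -L "$self" ]; do
    link="$(readlink "$self")"
    case "$link" in
      /*) self="$link" ;;                      # absolute target
      *)  self="$(dirname "$self")/$link" ;;   # relative target
    esac
  done
  echo "$self"
}
# usage: topdir="$(dirname "$(resolve_argv0 "$0")")/.."
```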
Reading the Cross Linux From Scratch snapshot I have on my laptop (the 1.0.0 x86_64 book, I think). Time to cross compile a native compiler, and since I can never remember why this is _any_ different than a canadian cross (it shouldn't be), I'm going back to the docs.
Wow they apply a lot of patches to stuff. I wonder if they're needed? I'm always impressed how _bad_ gcc is. Its code generation backend is great (thanks to cygnus and codeweavers and so on), but the front end is a steaming pile of garbage (which I'm working around with a wrapper script to override _everything_, and CLFS is applying lots of patches to), and its build system is insane.
One of the reasons I zapped the separate symlinks directories yesterday is my naming wasn't quite consistent. The script was cross-compiler, the working directory was cross-compiler, but the build was build-cross. (Of course now I've added -$ARCH to the working directory, but that's not a bad thing.) This means that the working directory for mini-native.sh should be "build/mini-native-$ARCH", which is nice and straightforward.
Hmmm... CLFS is exporting environment variables (CC, CXX, AR, AS, RANLIB, LD, and STRIP) for the various cross tools. Apparently the binutils build only needs CC and AR (although failure to die is no guarantee it built properly, or that this works for all architectures).
More cleanup on firmware. Making an include.sh to put shell functions in, since mini-native.sh will need things like dotprogress.
I could just make it one big shell script, but I want to be able to run (and debug) the sections independently. If a cross-compiler is useful by itself, then building it by itself makes sense. Downloading the source is conceptually separate from building the source. (In previous iterations of FWL, I tried to get fancy with if statements, which was a learning experience in why not to do that.)
Ok, made an include.sh and shoveled lots of build.sh and download.sh into it. Renamed build.sh to cross-compiler.sh, and started a mini-native.sh. And now I've got a funky question: how should I handle the source symlink directories?
Right now there's sources/build-cross which contains version-independent symlinks for each package the cross-compiler build uses. And there's another directory, sources/build-native, that contains symlinks to additional packages needed by mini-native.sh. The problem is, mini-native also needs the packages out of build-cross. The setupfor() function looks in the appropriate directory using the "STAGE" variable, so if I keep them separate I have to tell mini-native that its stage is sometimes build-cross and sometimes build-native.
I could just have one big directory of version-independent symlinks. It's nice to have them split out so you can see what stage is using what packages, but right now there's implicit knowledge (build-native uses build-cross packages too), and duplicating the symlinks probably isn't an improvement.
And there are two packages in a nebulous halfway zone between build-cross and build-native: qemu and linux-kernel. I use qemu application emulation to check that the cross compiler can build a target "hello world" that runs, but I'm not building it as part of the cross compiler build at the moment. And when extracting the linux-kernel source, it's fairly easy to build a kernel at that point (the source is right there, it's 3 or so extra lines of script).
The thing is, the cross compiler doesn't actually need qemu, or the linux kernel. The mini-native build needs both. But it's easy for the cross-compiler build to make them. But then the cross-compiler tarball is bigger, and contains unnecessary stuff...
Decisions, decisions...
Ok, all the symlinks are now in sources/build-links. I'm doing redundant package extraction at the moment, but I can do something about that in the future by fiddling with setupfor in include.sh.
Driving back from Michigan. It was good to see Penguicon's other co-founder, Tracy, who's buried in work getting a nursing degree.
It's a 5 hour drive each way, and Garrett's driving, but I only managed about a half hour of work on my laptop on the way up, because the little 95 watt inverter I got is useless. (My laptop pulls more than that just charging the battery when it's switched off). On the way back, I got one of the types of inverters David had for the trip to Ottawa Linux Symposium: this sucker's rated for 500 watts! Seems to be holding up quite well. :)
Firmware Linux proceeds apace. The cross toolchain building script is now building a kernel I can boot under qemu, all the way up to the point where it panics trying to mount the root filesystem (which is to be expected because I didn't give it one). I'm not _entirely_ certain that's its job, of course. That's probably the job of script #2, the one that builds the minimal native build environment to run under qemu.
Yes, I am a geek: I start numbering from 0. Script #3 will be the one that runs inside qemu, but what exactly it should do is vague enough at this point that I'll hold off until I'm there. (Bootstrapping gentoo from stage 1 is one option. So is building linux from scratch. The distcc acceleration bit goes there too.)
These source tarballs are a bit expensive to extract (the kernel one is flipping huge). If the extracted packages are still around, it can save a lot of time. (Deleting them and then re-extracting them again is kind of silly.) On the other hand, I don't want mini-native.sh to depend on cross-tools.sh just having run. It needs the output (the cross_compiler directory), but shouldn't rely on anything in the temp directory. It's a classic speed vs. disk space trade-off. Matter of opinion, really.
I'm also trying to make it so the cross tools are reusable. (Currently it's a c-only toolchain, not c++, but I can expand it later.) I suppose if I really care about this I can try to download .gz files instead of .bz2 files (which are a lot faster to extract), or just have the sources directory be kept uncompressed all the time. I'm not patching any of the source yet (I used the wrapper script to beat gcc into submission instead). And if I kept uncompressed source around I'd have to be more careful about building everything out of tree...
Basically, I'm stopping to do some cleanup before tackling mini-native.sh.
Off to Michigan, for a Penguicon board meeting. Garrett (the uClibc++ guy) and I are going in his new car. It's shiny.
The press release finally made it out at the end of Friday: Larry Quit. I don't know why it took a week to figure out how to phrase these two words, nor why they've been treating it like it's bad news. I was kind of enthused when it happened, and if the company's looking for a new CEO David Mandala would be _perfect_. (Not that I could even mention this to David while the gag order about Larry leaving remained in effect. And I have yet to talk to the new temporary CEO.) I was also worried that Laurie would turn this into yet another power grab. (If she becomes the new CEO, I'm quitting that day.)
Unfortunately, talking about this on freenode's #timesys channel (where the ex-employees hang out), after I _thought_ the gag order was lifted, resulted in Al chewing me out. It was the nicest, politest two hour reprimand I've ever received, but he made it quite clear that Laurie is more important to the future of the company than I am. (After all, she's now the voice of TimeSys, and what I want to know after reading that is who is the _junior_ director of developer exchange services.) I've also been told there's more going on behind the scenes than I know about, and I shouldn't speculate about things they're intentionally keeping us engineers in the dark on. I should just shut up and focus on my code.
Sigh. I was enthused up until that point. I really was. I thought maybe we could get David back, or maybe my friend Mark would want a job here when he graduates in May. I was thinking I'd still _be_ here in May. That was my mistake, I started caring again. Ever since David left I've felt I was working at a company that fundamentally didn't value engineers, and ever since Piggy and Greg left (the last of the old guard), my work has focused on leaving the company in the best possible position I could arrange on my way out. Tuesday's meeting contained some announcements I thought were very good news, something I could work with and build on, but it's been made clear I can never talk about it. The gag order will never be totally lifted, and only Laurie is allowed to talk to anyone outside the company. (Ah pervasive secrecy, the hallmark of any good open source endeavor. Not.)
So now I'm back to wondering why I'm still here. I suppose I should at least _meet_ the new temporary CEO before deciding to head back to Austin. I made an appointment, and he may have time for me this coming Tuesday.
At any rate, I've unsubscribed from the #timesys channel on freenode, and I'm not going to mention work here in my blog anymore after today. (I don't want to get Al in trouble with Laurie.) Maybe Garrett will still blog about it. It's been made clear to me that it's not my problem anymore, so I'll shut up and focus on my code.
Laurie still hasn't gotten the press release out.
The 2.6.19 kernel build hangs, calling "git". I have no idea _why_ it's calling git trying to build a stable release, and I have no idea why ubuntu decided to install git (I use mercurial), but "rm /usr/bin/git" (as root) unhung the build. Pinged the kernel list about it to see if they have any ideas...
And I've unsubscribed from the new firmware list, after two people chewed me out over how vitally important it was that I enable "Reply-to" munging. (I wouldn't mind so much except one of them owns the list server, and he went on about it for 15 minutes after I established that I have my reasons for _not_ wanting to do it.) I've been on linux-kernel since 1998, where I learned my mailing list habits. They don't do that. Neither do the kernel sub-lists I'm on (like uml-devel), which are hosted on sourceforge. The busybox and uClibc mailing lists don't do that. The mercurial mailing list doesn't do that. The dropbear mailing list doesn't do that. My idea of what's normal for a list is _not_ doing that.
Screw it. I have my development blog here, and I have an email address I can receive replies at. Have fun on that list (which is still there). Go ahead and add reply-to headers. Makes no difference to me at this point, I won't be reading it.
Pondering various toybox design issues. The help system: where should help text come from? Right now I'm trying to phrase the Config help text in such a way that I can script something to harvest that and turn it into man pages, but not everything has a config option. (For example, toysh has builtin commands "cd" and "exit", and I'm thinking of adding "help".) Ideally I just want one place to put help text, but then again ideally I want just one place to add applets.
Right now, adding an applet is a three-step process: 1) drop your C file (named the same as the applet) in the toys directory, 2) add your applet to toys/toylist.h, 3) add your applet to toys/Config.in. Ideally I'd like it so you just drop your C file in there, but there's a little more information I need about each file. Making this work would involve some kind of markup at the top of each C file, and running grep (and sort) against the lot of them. Not impossible, just haven't worked out quite how I want to go about it yet. (The big question is "what would this markup look like?" Hmmm...)
In toylist.h you need the NEWTOY() line, and you potentially need to declare your applet structure to go in the union of global structures (although that part's optional). Everything in Config.in is optional, but if you have it you need to define one or more config symbols, with type information (generally but not always bool), dependency information, and two flavors of help text for each symbol (single line and multiline). A default value wouldn't hurt but I'm not that tied to it; to me the interesting default states are allyes, allno, and "maximum sane config" which is allyes minus a few specific symbols.
The tricky part is when one applet has more than one config option, such as df having a sub-option to enable -P and -k. Right now, they're just separate config options and I haven't written anything to glue their help text together, but each line that starts with a "usage: " is intended to get glued together into a man page. (Interactive help isn't _quite_ the same as a man page, but close enough I can fake it.)
Hey, there's a fun new way for software suspend to fail to resume. Soft lockup detected on CPU0! (You've been suspended for 2 hours you twit, who enabled the software watchdog? Thanks kubuntu!)
Erik got his cake! Woot!
Strange goings on at TimeSys. There was an all hands meeting on Tuesday (while I was telecommuting), about which I can't say anything until the press release comes out. I've been refreshing the website all day, but no go...
The secondary DNS has updated, and the new mailing list for Firmware Linux is working. I added a link for it to the bottom of the Firmware Linux web page. Need to reorganize that site like I did busybox.net.
Beat on the wrapper extensively today to get uClibc to build under it. Lots of head scratching, and going "hmmm... so how is this _supposed_ to work?" Implemented the interception of several -print-* commands (and yes, those have one dash, despite being longopts; the ugliness of gcc knows no bounds). The library part builds now. The utils (like ldd) don't, but I'm not quite sure how they _ever_ built. I'm not sure those actually cross-compile, it's saying "-nostdlib" and then only supplying _one_ of the two library directories needed. (You need the system library directory, and the gcc library directory.)...
Oh, of course. It needs those symlinks to the kernel source. Right... Ok, fingers crossed, that might actually be a toolchain. Currently only does armv4l little endian, and I haven't got soft-float working yet, but it's a start.
I also have a mailing list for firmware linux, courtesy of J_Man from the #cross-lfs channel on freenode. Waiting for the secondary dns to update and for a bit of testing before I add it to the web page...
Banging on the uClibc wrapper script some more. I've put the ugliness of the wrapper aside for a moment to contemplate the ugliness of gcc.
I don't mind gcc having its own lib directory, separate from the system lib directory. I can add an extra -L to get crtbegin.o and libgcc.a and so on, and the name cross-compiler/lib/gcc/armv4l-unknown-linux-gnu/4.1.1 is kind of ugly (ending with two non-constant names) but I can cope with that. I can also see having a separate include directory for stdarg.h and stddef.h and headers like that which come from the compiler, not from the system libraries. But having that include directory be cross-compiler/lib/gcc/armv4l-unknown-linux-gnu/4.1.1/include is just WRONG. cross-compiler/lib is a library directory. /usr/include isn't under /usr/lib. Ick!
So the question is whether to update the wrapper script to point at the ugly include directory, or just move the darn include directory somewhere sane, which means I don't have to hardwire the target tuple and gcc version into the wrapper binary, which would be very nice. (I also have no idea what the cross-compiler/lib/gcc/armv4l-unknown-linux-gnu/4.1.1/install_tools directory is for, possibly that can just be deleted...)
(Crazy guy on the bus ride home, who I didn't say a word to, spent ten minutes screaming into his cell phone about how I'd insulted him and he was going to punch me. My mistake was in A) looking directly at him for maybe 3 seconds, B) getting up and moving to another seat. Having gone to college in Camden, New Jersey, I knew the proper response to that kind of lunatic: Ignore them, and don't make eye contact again. Did I mention I hate Pittsburgh? Luckily, it has the Te Cafe, which is open for another 2 1/2 hours, and I have a pot of Assam Borengajule, which is nice and highly caffeinated tea. Where was I?)
Ok, so shuffle the lib and include directory into cross-compiler/gcc/lib and cross-compiler/gcc/include, delete the install_tools thing as probably useless... And bang on the wrapper script yet more (zap xstrcat(), use asprintf(), fix more paths)...
HA! Got the toolchain plus wrapper to build a "hello world" that runs under qemu-arm! Go me! (Only the statically linked one runs, the dynamically linked one segfaults qemu. But I think this is a qemu problem, I had another arm toolchain do that too. System emulation gets a lot more love than application emulation.)
So now that the toolchain works for the simple case, step 1 is to make sure that build.sh can reproduce this if I run it again, and step 2 is to go through the rest of the build and find all the strange places it fails. (Which are likely because I've just tested static linking, not creation of shared libraries and all sorts of other weird things. Plus I'll probably need a linker wrapper too, although I think that can just be a shell script...)
And it doesn't rebuild, but for an interesting reason. When I patched the wrapper build snippet into build.sh, I added it right after the gcc build. Which means the uclibc build is trying to build using the wrapper script. This is both a slight chicken and egg problem (not much of one, though) and a test of things like library creation that I haven't tried getting the wrapper to do yet.
You know why I'm pissed off about GPLv3? Because the FSF is abusing the "lifeboat clause" of GPLv2. The whole point of the "or later" thing is that if the GPLv2 ever had a court decision against it proving it unenforceable, the FSF could fix whatever problems showed up. Well, no problems showed up, the GPL turns out to be every bit as enforceable as we could possibly have hoped for. SCO and Wallace lost their copyright claims spectacularly, gpl-violations.org is positively cleaning up in Germany, and GPLv2 may even give us a bit of patent protection (see Novell's massive backpedaling in its Microsoft deal).
The problem isn't that GPLv2 is unenforceable, it's that the FSF unilaterally decided that GPLv2 isn't good enough, and hijacked the lifeboat to pursue a brand new political agenda. Fighting off Tivo and Xbox? I honestly don't care. I didn't sign up for that. All I ever asked for was "can I have patches for the changes you made", because this is pragmatically useful to me: it means the darn codebase doesn't fork so much. I didn't ask for the root password to your server so I could run my code it, and I don't mind if you burn it into ROM. Mod chips exist for a reason, and debating the legality of mod chips (or software patents, or decss, or retroactive copyright extension) is a totally separate issue, and this is the wrong way to address any of that mess.
And yet the FSF STILL has the gall to say "you can't do GPLv2 only, what if it's declared unenforceable"? Hey dude: the lifeboat sank. That was your fault. Get over it, and SHUT UP ABOUT IT. It doesn't MATTER what's in GPLv3, you shouldn't have issued it because GPLv2 ain't broke, and doesn't need to be fixed. You're pulling a Darth Vader: "I am altering the bargain, pray I don't alter it any further." I'm not buying it.
I like GPLv2. Doesn't mean I have to like the FSF. (I also like the Cathedral and the Bazaar but don't agree with Eric on libertarianism, like the constitution without owning slaves like more than half its authors did, like Ride of the Valkyries without agreeing with Wagner's politics, and don't particularly care where the original Volkswagen Bug came from: it's a cool car.)
And yes, this is why I quit BusyBox.
P.S. Notice how much code out there just says "the GPL" without specifying a version, since version 2 was the only one that mattered? Well according to section 9, people can pick any version of the GPL ever released by the FSF, including not just GPLv3 but the long-abandoned GPLv1. The FSF is saying that people explicitly bought into it, but they're not just relying on the people who blindly cut-and-pasted the recommended boilerplate, they're also relying on the fact that an awful lot of people just plain didn't READ the thing. I say GPLv2 explicitly now, so section 9 will not apply to my code. But then intellectual property law is a hobby of mine.
P.P.S. To add insult to injury, the FSF is attempting to legislate in an area it obviously knows _nothing_ about. For example, to re-flash a current 2-meg linksys you need a screwdriver, a specialized piece of hardware to connect to a JTAG port, and if you want a serial console you'll probably need a soldering iron. This isn't due to DRM, this is due to them being _cheap_. And they're being _nice_ by leaving the JTAG port wired up; if they left it out reflashing these things would be a _real_ pain. Compared to that, a mail-order mod chip really isn't a significant jump, and whether or not it's legal to mod-chip an Xbox is a lot more interesting (and important) a legal question than anything to do with GPLv3 trying to launch a _boycott_. "You can't run GPLv3 software on the Xbox!" Gee, I'm sure Microsoft is just crushed by this. _IDIOTS_
I could go on for a while (how GPLv3 actually makes forking worse, comparing GPLv3 to CDDL, why the FSF is lying about the kernel, the horrible timing of the whole thing), but I'll stop now.
Actually made a commit to my tcc fork today, for the first time in 6 weeks. (It just removed some unnecessary code, but still. It's a change to a project I thought I'd abandoned. Maybe I'll do more of them.)
Spent the rest of the day fighting with soft-float in gcc. I have no idea _where_ the actual implementation lives for the soft float functions gcc sprinkles calls to into the programs. I thought it was in libgcc.a but linking against that didn't help. Jim Gifford seems to think they live in glibc. Except that uClibc has a config option that requires building with soft-float, and how could it do that if the implementation of soft-float is in glibc? Makes no sense...
Yay, "Spurious interrupt in atkbd.c". When my laptop fails to resume, it's usually either that or it just goes through the normal bootup sequence without noticing that it was suspended rather than shut down. Can we say "software suspend is mature technology"? No, I mean with a straight face...
Yeah, my fault for still using the 6.06 Ubuntu instead of the 6.10 one...
Switched my firmware test build from x86_64 to armv4, among other things because QEMU can do both application and system emulation for arm, but under x86_64 it can only do system emulation. (The final build doesn't care, but it makes debugging easier.)
The uClibc build is barfing, unable to find various floating point functions that are in libgcc.a, because it's not linking against that. Yes, by default the paths are _so_ screwed up you can't even link uClibc. (It seems to compile fine, it's just the final link that fails.) Which means I need the wrapper script earlier than I thought, but ok.
Banging on that now. It continues to be evil.
The 2.6.19 kernel is out! Woot.
So George Budreau gave me some suggestions about firmware linux. He played with the 0.8.x releases (including the year or so of development in 2005 I never really got released properly). And this has led me to pondering two things: 1) file layout, 2) starting a darn mailing list already.
Taught firmware/downloads.sh to make symlinks. Now maybe people can actually reproduce my builds based on the hg repository. (Not that I'd expect the result to _work_ yet, but I'm getting there. Possibly this weekend.)
Attacking the gcc-uClibc.c wrapper script. It's... evil. It has hardwired gcc version numbers in the paths it generates. It has its own "basename()" function because CYGWIN hasn't got one. After a massive #ifdefectomy I still need to supply GCC_BIN, TARGET_DIR, EXTRAGCCFLAGS, and DYNAMIC_LINKER to get it to build. Still, I've got it down under 600 lines and it builds, so it should be manageable. Still needs a heck of a cleanup pass after I get it working, though.
I get a loooooot of spam.
One of the early architects of the current build system (Brian) was willing to carve one day a week from his schedule to come back to work for us. (Actually I think the arrangement was that we'd pay his current employer for one day a week of his time.) Unfortunately, it was decided that one day a week wasn't good enough, and he was pressured for more, so now Brian won't be coming back to work for us after all. Sigh.
Spent yesterday evening helping Beth move stuff off the 6th floor. (If a system administrator's job description can include wielding a windex bottle and heavy lifting, a programmer can volunteer to help out.) I have no idea why we have nine boxes of envelopes with the company return address preprinted on them. (Big, heavy boxes.) The "stolen" copier turns out to have wheels, so it's now in a cube on the fourth floor. (Its monthly lease payments haven't been made in over a year, but we haven't arranged to return it to the company we were leasing it from either. They stopped maintaining it after we stopped paying, and it's long since out of toner or whatever is wrong with it. It's a large doorstop that Beth doesn't have the authority to deal with, and the CFO has so far refused to take any interest in.)
Oh, and marketing wants root access to the new public-facing engineering server. (I'm amazed the request took this long.) The original idea for that box was something all the engineers had root access to and could put whatever we felt like on. Unfortunately, this idea didn't survive contact with the CEO, who acted outright incredulous at the idea of us engineers using our own judgement about what was and wasn't a good idea to publish publicly. Since then, just about everybody who was going to use the box has left the company, so it's sort of a moot point now. Personally, I just put stuff on my own domain (or sourceforge, or busybox.net) when I want to make it public.
Adding a download script so FWL can populate its sources directory automatically, and I'm fighting with wget. "Connecting to unc.dl.sourceforge.net... Connection timed out. Retrying". It retries, in a loop, _FOREVER_. (Possibly the busybox version isn't this brain-dead. I'll guarantee you the toybox version won't be. But I'm starting with what's on ubuntu.)
Oh, and the sourceforge mirror system continues to suck rocks for this sort of thing. Just sayin.
I'm using a uClibc nightly snapshot because 0.9.28 is over a year old, and I'm using a Linux 2.6.19 release candidate because 2.6.18 got the header stuff wrong (and didn't include unifdef in the build). Other than that, everything's a release version with no patches (so far).
And Mercurial won't let me add a symlink to the repository. Lovely. That screws up my plan for sources/build-cross. Guess I'll have to generate them in the download script as well, probably with a sed rule looking for a dash followed by a digit. Hmmm...
Jim the database guy gave his notice today.
When I first started at timesys, Manas asked if Firmware Linux was something timesys should be interested in. At the time, the answer was "no". But things change, and now I'm trying to bring FWL up to speed so we can actually use it (or at least properly evaluate it as a build system). Starting with digging up the old mercurial repository and giving it a real web page.
So, NOFORK annotation of applets, interacting with Ctrl-Z. Hmmm.
NOFORK is for toysh, so toybox can run built-in commands rapidly in succession without having to spawn child processes. Just call the appropriate command_main() as a function. For some applets this is mandatory; they have to run in the same process context as the parent, because they modify process state (such as cd, set, read, exit...). For others it's a speed hack, because fork is slow at the best of times (essentially for cache reasons, stemming from the page table manipulation that gives us copy on write). For toybox, we're trying to use vfork() instead to be nommu friendly, and _that_ freezes the parent until the child does an exec() or an _exit(). Requiring that we find and exec() ourselves to unfreeze the parent means builtins can't be any faster than external programs. (Plus, execing the currently running program is nontrivial because exec has to happen by path and what _is_ the path to your current executable? Ok, I already wrote code to handle this one as best I know how, see toys/which.c:which_in_path(), but that code can break if you run something at a relative path that's not in $PATH and then chdir, and if you chroot your executable may not be visible within your process's current root filesystem at all.)
It's quite possible I'm using the "royal we" in the above paragraph, since nobody but me is working on toybox yet. Oh well.
Anyway, using nofork gets tricky when the applet blocks for input or for output. At your shell command line, go "read i" and then hit ctrl-z to suspend it. Nothing happens, right? That's because read isn't a child process of the shell, it sets an environment variable in the shell, so it has to run in the shell's process. Classic example of the need for nofork, and also of the side effects of nofork. You can ctrl-c out of a blocked nofork applet (the shell just needs a signal handler that can longjmp to a cleanup routine), but can't suspend it because there's no easy way to resume.
So if echo becomes a nofork, and you "echo blah | thingy_that_can_block", you can't ctrl-z to suspend it. Lots of other applets (like df, which... most of them if they're written cleanly enough and have CFG_TOYS_FREE enabled) don't actually _need_ to fork. So maybe I need a _second_ category, CANFORK, that says "only fork if the output of this goes to something that can block", which is a bit strange to try to think of a test for. (Does that mean it's not a tty, or just that it's redirected with ">" or "|"? What if your entire shell invocation has a ">" around it at a higher level? Back to "not a tty", although it's possible to kill -STOP an xterm and if there's no tty involved anywhere, where would the ctrl-z come from? And how does that differ from kill -STOP from an unrelated process? Because it doesn't potentially have to drop you back to a command shell, so the whole shell can suspend. Ok...)
If I was using real fork() instead of vfork() I could cheat slightly in CANFORK commands by doing a fork halfway through the run (probably _not_ from the ctrl-z signal handler, but you never know), suspending the child process, and adding the new child to the suspended process list, and letting the shell continue.
Of course suspending pipes is a whole can of worms because nobody remembers to handle -EINTR and short writes/reads, which SIGSTOP tends to generate. If you don't get it right, piping things into and out of tar can produce corrupted archives. This is an area I have some experience in.
So what's the worst case scenario of making nofork's unsuspendability unpleasant? Can you "source" a script and pipe the output somewhere? Let's see what bash does... Yes, turns out you can. Ok, so if I do a `yes "echo hello" > test.sh`, let it run for a dozen megabytes or so, and then `source test.sh | less` what happens? Huh. It blocks for a while before I see any output in less; I think the script has to complete before less gets to run. (Possibly it's cheating and writing it to a temp file, because pipes shouldn't absorb that much data between processes.) Can I ctrl-z during that delay? Yup, but it seems to abort the "source" if I do... (Of course, "what does bash do" is merely a frame of reference. It's not at all the same as "what's a good idea to do".)
Ha! Got the mercurial web browser up and running. That was a bit of a pain. Got toybox and tinycc hooked up to it, but not firmware linux yet.
Yay thanksgiving holiday. Time to bang on code!
So, today's technical problem is "should I have two string list formats in toybox". A linked list of strings can work two ways in C: each node in the list is a pair of pointers (one to the next node, one to the string), or each node in the list is a pointer followed by a char array containing the string data. Currently I have both: in lib/lib.h the two pointers are "struct arg_list" and the pointer plus array is "struct string_list".
The advantage of string_list is you have one allocation per node so you fragment memory less. The advantage of arg_list is sometimes you already have a string preallocated (such as your command line arguments) and copying it wastes memory (potentially huge amounts thereof if somebody does something crazy like df -t "$(cat war_and_peace)".) (I'm sure there are non-artificial examples.)
However, there's an inherent downside of having two ways of representing the same data. It's extra complexity. I'm not sure the advantage is worth it.
On the other hand, if collapsed to just one it would have to be arg_list, and right now arg_list is only used for things that the list collector didn't allocate. There would still need to be a distinction made between "the strings this points to were allocated by me, and I should free 'em" and "I didn't allocate these strings, and shouldn't free them". Right now, that distinction is something that breaks obviously if you do it wrong, and that's a good thing...
In other news I'm going through my various "I should add support for this" notes, and one of 'em is support for device-mapper. This is HORRIBLY documented. Google is unhelpful, the kernel's Documentation/device-mapper doesn't say anything about how to use it from outside the kernel, I can't get dmcrypt to compile (it wants /bin/cpp and Ubuntu hasn't got one), and dmsetup source code is a bit hard to track down... (Have I repeated that I hate packages.qa.debian.org for not pointing at the upstream version of packages? It's first place in google rankings for most source searches, but it's a dead end. A mirror's nice but I want the site this came from. Where's the authoritative source? Debian, you ain't it.) Ok, found dmsetup.c... And it's a mess. A small, clean mess if you can imagine such a thing. What does this code DO? What are the communications it has with the kernel?
Right, 'find /usr/include -name "dm*"' led me (eventually) to /usr/include/linux/dm-ioctl.h, which is something. Off to read that...
Setting up proper Mercurial cgi repository web browser thingies for toybox, FWL, and my dormant tinycc fork (which I might resurrect if anybody shows any interest and Fabrice continues to neglect the project).
This is precisely the sort of thing I'm bad at. I pick up new tools very slowly. I have a knack for making all the wrong assumptions, breaking things in new and interesting ways, and generally not getting what the author meant if they weren't explicit about it. (This is the downside of being able to look at a problem and see three different approaches to solving it off the top of my head. The fact that my current glasses are ~4 years out of date and I thus read in 30 second bursts before I have to look away for a bit may be a contributing factor here.)
I've been putting off getting new glasses because I've been meaning to get laser surgery. Weird Al had it, so it's probably a mature technology by now (I.E. the sharks have good aim and plenty of practice). And TimeSys even provides health insurance...
Actually making the hgwebdir.cgi work properly is turning out to be a pain, but the problem isn't mercurial (yet), it's Apache. I _really_ need to set up my own server. Eric loves running "standard" utilities like bind, sendmail, and apache 2.0. I'm really not that enthused by any of those...
Toybox's option parsing support sucks much less now. About a quarter of it is still unimplemented and I suspect the darker corners don't just misbehave but outright segfault, but the main codepath is actually parsing all the options required by the "df" command now. Woot! (Not that df is _using_ them right, but that's a separate problem. Possibly.)
I have lots of test command lines listed in a todo file. I should grab the test harness shell script I wrote for busybox (or at least the last version of it that was all my code) and start turning some of these into actual runnable regression tests. (It stopped being all my code when Mike Frysinger decided to "clean it up for better portability". He meant well, but I was _intentionally_ varying spacing and using different function definition styles because one of the things I wanted to test was bbsh. Now toysh. :)
I have a burst of weird little side items to do now, commands like setsid and watch that are easy to do but don't really advance my first major goal for toybox, which is to replace BusyBox in Firmware Linux. I should get FWL together enough to do a build so I can see what commands are actually _used_ in the build. I used to know, a couple years ago. There are a lot of them, but I mostly remember the ones I had to fix because the BusyBox versions had a bug (sed, awk, sort, tar, gzip...). This doesn't give me the list of ones like "echo" and "ls" that were already there and worked well enough already. That's the todo list I should be focusing on...
Either way, once the number of commands (this is not java, so they're not "applets"!) gets larger I'll have to scale my build infrastructure a bit more. Right now all of toys/*.c gets built and then the results for anything configured out get discarded by --gc-sections. I should really filter that _before_ the build to speed things up (and to not _suck_ on build environments that don't support --gc-sections, like most gcc 3.x toolchains or statically linking against the ever-defective glibc).
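A rough sketch of the kind of pre-build filtering I mean, assuming a kconfig-style .config where each enabled command FOO shows up as CONFIG_FOO=y with a matching toys/foo.c (illustrative names, not the actual toybox build):

```shell
# Hypothetical sketch: derive the list of toys/*.c files to compile
# from the .config file instead of building everything and relying on
# --gc-sections to throw the unused parts away.
cat > .config <<'EOF'
CONFIG_DF=y
CONFIG_PWD=y
# CONFIG_CATV is not set
EOF

# Enabled symbols become file names: CONFIG_DF=y -> toys/df.c
FILES=$(sed -n 's/^CONFIG_\([A-Z_]*\)=y$/toys\/\1.c/p' .config | tr 'A-Z' 'a-z')
echo $FILES
# gcc -Os main.c lib/*.c $FILES -o toybox
#   ...and --gc-sections becomes an optimization, not a requirement.
```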
I also need to figure out a rational way to sort the command list in menuconfig. Right now it's one big list, alphabetical. I should probably group the filters, the daemons, the noforks (shell builtins)...
Also pondering that I have a dozen or so _big_ projects queued up (shell, fdisk, mke2fs, web server, name server, dhcp client and server, finish bzip2 compression support, redo of sed, maybe a gzip rewrite, figure out if I want to write an AWK or make puppy eyes at Dmitry Zakharov... And more.) Each of those is what, a month's worth of work by itself?
So I'm integrating spamassassin into filchmail, and improving the error checking while I'm at it. (How do you figure out that output redirected with ">" filled up the disk? It's not a "$?" test. I vaguely remember reading something about this somewhere, possibly it's a bash extension...)
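For what it's worth, a quick experiment (assuming Linux's /dev/full device, which fails every write with ENOSPC) suggests "$?" does catch it for commands that check their own write() results, which bash's echo builtin does; commands that never check their writes would presumably be the case "$?" can't see:

```shell
# Experiment: can "$?" see a disk-full failure through ">"?
echo "does this output survive?" > /dev/full 2>/dev/null
status=$?
# bash's echo builtin checks its write() result, so this reports failure.
if [ "$status" -ne 0 ]; then
  echo "write failure detected, exit status $status"
fi
```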
And what a difference it made: of the 600 new spams I've received in the past four hours, "spamassassin --mbox -L" caught 32 of them as spam. Stunning. It even let through the ones that consist of entirely high ascii characters that render as funky graphics characters (and yes, kmail can render Japanese and such, this is _gibberish_). Apparently, without the network tests it's kind of pathetic, and with the network tests it's very very slow. Still, no false positives yet, so that's something.
And Ubuntu finally decided to fail its resume (with the most common complaint: an endless series of panic messages about a spurious interrupt due to something touching hardware directly, probably X11), which means my laptop has rebooted. As long as I've lost all the open windows anyway, it's time to upgrade to 6.10... Maybe tomorrow.
So, the new build system, and the meeting with Larry yesterday. What I proposed to Larry is roughly based on the approach I've been working on for Firmware Linux since forever (see the writeup I did on the 14th, or for that matter January 10), except for TimeSys I recommended that once we get a minimal native build environment running under QEMU, we use that to build Stage 1 of Gentoo Embedded. I'd worked out some of the details of this with Piggy just before he left, and bounced it off Al and Garrett Tuesday morning before the meeting with Larry. (I have lots of practice discussing technical things while otherwise extremely incoherent. Sleep deprivation in the name of geekdom's almost a hobby with me.) They all helped fine-tune bits of it. (I hadn't decided between Gentoo and Ubuntu before Piggy spoke up, but he had several convincing reasons.)
The current build system is based on TSRPM, which is actually 3 different sets of functionality glued together into one big tangled heap: 1) a bundle of heuristics to shove under package builds (a bit like fakeroot) to make them cross compile against their will (an inherently hard problem), 2) a cross-compile toolchain builder, which is really a different problem from #1, 3) a converter to turn RPMs into other RPMs, or into ipkg, or tarballs... Even though TimeSys GPLed TSRPM there was zero uptake outside of TimeSys. Everybody I talked to heard "RPM" and ran screaming, which to be honest was my initial reaction too. On top of everything, TSRPM requires versions of RPM that nobody has, releases put out by the old RPM maintainer after Red Hat fired him. My laptop has Ubuntu on it, and I've never gotten TSRPM to work on that at all. Did I mention the author/maintainer of TSRPM left TimeSys months ago, and now has a full-time job at another company?
The rest of the old build system is primarily in-house black magic too. Our build cluster system is homegrown. (Piggy wanted to open source it, but didn't manage to before he left. Personally I think most of the build system could be replaced with distcc and ccache and the result would be a net improvement.) Porting each new version of Fedora to platforms like ARM and PPC which Red Hat doesn't support is rather a lot of work too, although that work's already been done for FC5 (by people like Piggy, Greg, Walt, Sam, Chris... What does that list of people have in common? All now working at other companies).
The main advantage of the _new_ build system (other than being simpler in absolute terms -- don't underestimate that) is that it replaces all our in-house black magic with stuff that's got existing active open source communities out there. (And where the open source project isn't good enough for what we need yet, like CLFS, we can start with what they've got and push stuff back upstream until the external project _is_ good enough for what we need.) That means next time we hit a bottleneck and haven't got the resources to get something done when we need it, there are people out there other than us who know this stuff. We can contract out, or recruit expertise, or in extreme cases file bug reports and make puppy eyes on the appropriate mailing list.
Right now if we recruit somebody to help us, we have to stop and train them first, and that's Brooks' Law staring you right in the face. Just releasing your code doesn't fix this, you need an actual active community you can pull from to find people who already know this stuff because that's what they were doing anyway, now they're just doing it for _you_. It's not good enough to just release code and documentation.
Strangely, losing most of our engineering staff may have been a positive step for TimeSys if it means we _must_ embrace open source development the rest of the way. We'll see. Kill or cure, but I think we can pull it off. With FC5 shipped, we're in good shape for 6 months or so. We've got Brian back part-time (paid his new company for some of his time each week), presumably he can keep the current build system running for a while.
Anyway, the new build system has four parts. 1) A cross toolchain built similarly to how CLFS does it (although we need far more than they support yet, but we can fix that one :), 2) use that to cross-compile a minimal build environment, somewhere between the Linux From Scratch /tools directory and what Firmware Linux does, enough to bootstrap our way up to Gentoo Stage 1, 3) Run the minimal build environment under QEMU system emulation to natively compile everything else (optionally using distcc as an accelerator as described on the 14th), 4) build Gentoo (and/or Gentoo Embedded) from Stage 1 natively under emulation. Optionally, 5) make the appropriate binary packages for everything (convert the gentoo portage binary packages into RPM, .deb, .ipkg...)
The ironic part is I was working on something like this before TimeSys hired me (well, steps 1-3 of that, anyway, following LFS instead of Gentoo beyond that point), and work keeping me busy with other things is the main reason I haven't finished it yet. (Ok, the BusyBox maintainership sucking up all my free time for many moons was a big part of it too.) The best part of this job is still that they pay me to work on stuff I was already doing anyway. :)
Larry also agreed that uClibc should be a heck of a lot more prominent at TimeSys. (That was left over from the last meeting I had with him, the week before. Apparently, I'm not the only person who told him this: Ashish did too, presumably before he quit?) Todo item #1 is to get some uClibc toolchain tarballs up on the website so people don't have to hunt down the appropriate bits. Right now you have to install a reference distro to get a glibc cross-compiler, and then add one or more uClibc support RPMs to that, which is just nuts. Instead we should have a tarball that gives you the toolchain when you extract it, which is sort of what crossdev tried to do (but what they had up broke a few months after everybody who worked on it but me (Sam, Manas, and Chris Faylor) quit). (Memo: don't make the download directory point into a live build repository, or somebody who re-runs the build with different parameters can replace all your Linux executables with cygwin executables without realizing it. Yes, I complained about this to Sam back when he did it. No, I don't know how to fix it. No, we haven't got a system administrator.)
That doesn't mean we didn't _hire_ a system administrator. We hired an excellent system administrator, Beth, she's just not allowed to do system administration ever since she got transferred under Laurie, who handed her a bottle of Windex and turned her into a janitor. These days Beth's tasks include a mandatory 3 hours/day clearing out the 6th floor so we can sublet the thing now that everybody from there moved down to the 4th. For the move to the 4th floor she did things like clear out the refrigerator. (She also tells me she found an intern from CMU who would do system administration stuff for us for free, and Laurie vetoed it. I forget the reason. We're all amazed Beth hasn't just quit yet, but she's still being _paid_ as a system administrator...)
Where was I? Tarball for uClibc, right. And after my "aha!" moment about the uClibc wrapper last week, I even know how to make it relocatable now. (Actually implementing it is something I haven't had time to tackle yet, due to being in Minnesota. 4-year-olds are time consuming to visit. So are 7-year-olds and 13-month-olds, but it's really the 4-year-olds that have the market cornered in this case.) A relocation wrapper for uClibc toolchains is something else I can feed upstream.
And now thanksgiving. Kelly and Steve are coming to visit from Maryland, meaning the 4 day hackathon I thought I'd have has turned into maybe 2 days, but hey...
Sunday morning Fade felt vaguely ill, and had lots of headaches. Yesterday evening she threw up a lot. Last night after I went to bed I felt ill and had to get up to visit the bathroom every half hour, all night long. (Feeling like I was having a heart attack each time. Last time I felt like that was when I had food poisoning.) Around 5 am Monday morning, I finally threw up (a lot). The _fun_ part is that Monday's the day we'd scheduled the drive back to Pittsburgh. (I had to work Tuesday, and so did Fade.) Because nothing says "time to take a car trip Google maps estimates at 14 hours if you don't stop" like needing a potty break every 20 minutes and worrying if a diet soda is going to make you throw up.
We'd planned to leave bright and early, and finally got on the road about... 1PM maybe? (I don't remember, I was kind of out of it. Fade drove first since she had half a day's headstart on recovering from the thing.) We tried to bypass Chicago this time because it sucked so badly on the way out with a toll every 30 feet, and wound up getting funneled back into the thing anyway and caught in an hour of stop and go traffic on a toll road at 2 am, because "construction" that didn't involve any actual work being done had it funneled down to 1 lane for no obvious reason. Apparently organized crime in that place didn't stop after Al Capone, it just became less competent.
We finally got home just before 10 am, so I could shower and change and catch a bus in to work having sort of slept a bit in the car while Fade was driving, but not much. (Fade's game store job didn't start until noon, but her webmistress gig had stuff for her to do that she'd wanted to get done before then. Couldn't use laptops in the car because the little cigarette lighter to AC outlet thing we'd bought for the trip didn't work. The previous one had one too many sodas spilled in it.)
The kittens remain highly cute, and Garrett had kept all the food dishes full. The adult cats all wanted lots of reassurance, including Dragon, but I had to foist most of this on Fade since I had to go to work.
The big item for the day (the reason I couldn't just telecommute) was that Larry the CEO wanted to talk to me. I managed to be coherent enough to recommend a new build system strategy to him (and the reasons for it), and he seemed to think it's worth a try. What I proposed is roughly based on the approach I've been working on for Firmware Linux since forever (see the writeup I did on the 14th, or for that matter January 10), except for TimeSys I recommended that once we get a minimal native build environment running under QEMU, we use that to build Stage 1 of Gentoo Embedded. I'd worked out some of the details of this with Piggy just before he left, and bounced it off Al and Garrett Tuesday morning before the meeting with Larry. (I have lots of practice discussing technical things while otherwise extremely incoherent. Sleep deprivation in the name of geekdom's almost a hobby with me.) They all helped fine-tune bits of it. (I hadn't decided between Gentoo and Ubuntu before Piggy spoke up, but he had several convincing reasons. I already miss Piggy.)
Just a _little_ out of it at the end of the day. Speaking of which, Little (the cat) is still alive! She's 17 years old now and has turned black with age (which I didn't know cats did) but despite having multiple major health problems her entire life she's still hanging in there. This is the cat my sister and brother found as a lost kitten in the forest when we lived at 6 White Eagle in New Jersey, the one with 7 claws on each paw for a cumulative total of 28 claws, who caught a full-grown _deer_ when she was a kitten. (Kiggy. Purr purr.) Now an insulin-dependent diabetic (2 injections a day, yes my sister's as much of a pushover for cats as I am) on top of the asthma, food sensitivity that makes her barf up everything except venison (the irony was noted long ago: she wanted it, she got it, although she also got really bored with it after a while), and these days geriatric arthritis. Little continues to bemuse veterinarians, who first called in their students because you don't get a lot of double-duoclawed cats in Minnesota, and later because the various combinations of health problems she came down with (and yet managed to survive anyway, out of sheer stubbornness as far as I can tell) made interesting learning experiences. 17's pretty respectable for any cat, let alone one who wasn't expected to be long for the world a decade and a half ago....
While we were all out for breakfast this morning, the cat knocked over a glass of milk one of my nephews had left on the table. The only thing between that glass of milk and the edge of the table? My laptop. Everywhere else on the table, dry.
Grrr.
Luckily it was off. I unplugged it when I found it and wiped off what I could, and when my sister got home from the grocery store I borrowed a screwdriver and unscrewed the various things on the bottom that had milk at the edges and got the milk I could find out of the inside (the fan and heat sink seem to have taken the brunt of it; some huge wads of cat hair in there). Gave it a few hours to dry off, and now I'm trying it. It booted, seems to be working so far...
Visiting my sister Kris in Minnesota. My father's visiting this weekend too, so Fade gets to meet lots of my family. I also get to see my nephews (who I haven't seen since Sean wasn't walking yet and his fourth birthday was Thursday) and niece (who's 13 months and this is the first time I've seen her). Google maps said it was a 14 hour drive to get here, but we left Thursday evening around 8 and didn't get in until after noon Friday. Looks like I'll be taking Monday off from work too, which is not a big surprise.
Naturally I spent a couple hours thumping on my laptop after everybody else went to bed Saturday night. (I'm a geek.) Finally got the options parsing code checked in, and hooked up to various applets. It's still got some glaring bugs, and large incomplete sections, but the basic infrastructure's there and can handle "df -a". (But not "df /dev/hda" yet; shouldn't be hard to track down what I screwed up when I'm more awake, though.)
There are probably a dozen simple apps I can do now that I have option parsing code. That was one of the big missing infrastructure bottlenecks.
As always, "hg clone http://landley.net/code/toybox" oughta do it. I get thursday through sunday off next week for thanksgiving, and this time I'm staying home for it. Might be able to get a new release out (0.1.0 probably).
I expect the kittens will have doubled in size by the time we get home. (Garrett's feeding the cats while we're gone. We largely have to trust Dragon to mother them properly whether we're there or not, but we need to pet them a lot (for socialization purposes of course) when we get back. And take lots of pictures.)
So I finally got toybox's get_optflags() to the point where it compiles, and now I'm hooking it up so I can test it, and what I decided to do was put the option strings into the toy_list[] structure for each applet. (This is the one defined by the NEWTOY() macro in toys/toylist.h, and is very roughly analogous to BusyBox's "applet".) This way toybox_main() can call get_optflags() to automatically parse the command line options for each applet, and then on exit we return to main to free everything allocated by that (such as toys.optargs). It's all nicely done in one place so each individual applet doesn't have to duplicate this code. (I was already planning to make error_exit() perform an interceptable longjmp() back to toys_main(), so this works nicely.)
The problem is that options can have arguments, ala "mount -o blah". BusyBox passed varargs to getopt_ulflags() to handle this, but I don't want to go there. Instead, I'm taking advantage of the fact that all the global variables each toy uses are defined in a common structure. (The global "toy" is a union of those structures. Yeah, it's a bit confusing that there's both a "toy" and a "toys". The plural is a structure containing information presented to all applets, the singular has just your applet's data. I'm pondering renaming that because it's a bit subtle for my tastes, but I haven't come up with a better name yet and the compiler catches it if you switch 'em.)
The way I'm taking advantage of "toy.appname" is to declare the variables containing the arguments in order, and have get_optflags() fill in the structure directly. Yes, I have to make sure the structure is packed, but every data type arguments are saved in is processor word size anyway (4 bytes on 32 bit machines, 8 bytes on 64 bit machines). If it's inserting padding in that, something is wrong. And manually advancing through structure fields is no uglier than parsing varargs would have been.
Dragon had five kittens last night. (They are _so_cute_, Fade has pictures.)
Anybody want a cat? (In about 6 weeks.)
Garrett and I helped Piggy clean out his desk this evening. (He gave his two weeks notice monday, but is on vacation all next week.) I won't be in town for his farewell lunch on Friday because I'm heading to my sister's place in Minnesota thursday night.
I realized something: I can stop fighting with the darn gcc paths if I bypass them entirely. Fighting with gcc's path logic is an amazing pain because it not only gets everything wrong but has layers of conflicting brokenness, but the old uClibc wrapper script deals with most of this for uClibc already. It tells gcc -nostdinc and -nostdlib and then supplies all the paths it needs manually. I can just upgrade the _wrapper_ to be properly relocatable. (The reason the uClibc developers abandoned the wrapper is you have to build gcc from source anyway to get versions of things like libgcc_s that don't leak references to glibc. But I'm doing that anyway, so this solves a different problem.) If I wanted to make the wrapper work for glibc I'd have to modify it a lot, but if I just want it to work with uClibc that's a known and solved problem, I just have to dust it off and update it for current uClibc.
I've known for a while that the only way to fix the gcc path logic is to rip it out and replace it, because it's horked beyond belief. My mistake was trying to do it by patching gcc. That's a neverending time sinkhole, and since the gcc folks will never take a patch from someone who won't sign any copyrights over to the FSF (let alone take a patch that's licensed GPLv2 only), maintaining a big out-of-tree patch wasn't something I was looking forward to. (And now you know how it _got_ into this state. The original cathedral Eric compared Linux against in CaTB was the FSF's development model.)
But maintaining a version of the _wrapper_ that works for Firmware Linux, that I can do.
Of course I don't get to work on it _this_ weekend either. Last weekend, at Eric's working on the 64-bit paper. This weekend, at my sister's in Minnesota. Maybe thanksgiving weekend I'll finally have some time to bang on this...
So my approach to cross compiling in Firmware Linux has always been very different from what TimeSys does. They use TSRPM, a bundle of heuristics that works a bit like "fakeroot", but instead of pretending to be root it selectively pretends to be the target environment. It's very complicated (and unfortunately built around RPM source package builds instead of normal tarball builds). The fundamental problem is that cross-compiling is hard.
My approach to cross compiling (in Firmware Linux) is to avoid as much of it as possible, and native compile under emulation instead. This makes the whole host compiler vs native compiler thing _go_away_, means that you _can_ run the binaries you build during things like ./configure, and if you probe the system to determine whether you're big-endian or whether "long" is 64 bits, you get the right answer.
To make this work, I cross-compile a minimal build environment (currently seven packages), and then run that build environment under an emulator, and use that to natively build a new system. In FWL 0.8 my cross-compiling was glibc to uClibc, and I cross-compiled just enough of a uClibc environment to run it under User Mode Linux and build the rest natively. For 1.0, I need to be able to build a minimal system for each type of interesting target supported by QEMU (x86, x86-64, arm, PPC, and mips) and run it under QEMU to build a final system.
I've been poking at this ever since QEMU got powerful enough to replace User Mode Linux in my build system in 2005. Cross Linux From Scratch finally having a 1.0 release makes this _much_ easier, although I've gotten sidetracked by lots of things. Relocatable toolchain, currently.
The big problem is that building under emulation is slow: UML builds things at about half the speed of a native build, and QEMU is more like 1/3 the speed. But it turns out, that's pretty easy to get around too.
How do you make compiling under emulation work at the speed of a native compiler? Simple: use distcc. Run the distcc daemon outside the emulator, and have the distcc client running in the emulator dial out through the virtual network to run the cross-compiler we made earlier (the thing we built the minimal build environment with before firing up the emulator). The compiler under the emulator is still preprocessing all the source code and running make and ./configure, so all the funky bits that cross-compiling tends to screw up are done natively under the emulator (this also means you don't get the full speed back, although if you want to get fancy you could always put distcc with the cross-compiler on more than one machine), but the CPU-intensive compilation bits are done on the host system outside the emulator.
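The wiring for that looks roughly like this. Every path, name, and address here is an assumption for illustration, not taken from an actual setup; 10.0.2.2 is QEMU's default user-mode-network address for the host.

```shell
# On the host: serve the cross-compiler to the emulated network. distccd
# runs whatever compiler name the client asked for, found via its PATH,
# so the cross tools go first:
#   PATH=/opt/cross/bin:$PATH distccd --daemon \
#       --listen 10.0.2.2 --allow 10.0.2.0/24 --jobs 3
#
# Inside the emulator: point the distcc client back out at the host.
DISTCC_HOSTS="10.0.2.2"
export DISTCC_HOSTS
# make -j3 CC="distcc armv5l-cc"
#   ./configure and preprocessing still run natively in the emulator;
#   only the CPU-heavy compile steps get shipped out to the host.
echo "client will dial out to: $DISTCC_HOSTS"
```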
I've still got to get a cross-compiler to build the minimal environment, though. I've gotten sidetracked into getting an x86-64 toolchain relocatable, which is hard because gcc's path logic sucks rocks, and I haven't had time to work on this for weeks now...
After a long weekend spent at Eric's house, we finally managed to get the 64 bit paper up.
Finished might be too strong a word (there's one more unfinished section Eric's working on), but it's nice to finally have something to point to.
I've mentioned the gimp is a piece of garbage, right?
So the next set of pictures of the coke zero and pepsi one cans came in, and I want to rotate, resize, and crop them. I've been fighting with the gimp for an hour trying to make it do this. It pops up about five useless windows when you start it, and closing some of these makes the program exit (with no warning). The picture I opened has a menu, eventually I found how to rotate the sucker ninety degrees to the left (it's not under "edit", it's "image->transform". There's a totally separate "rotate" mode under "tools->transform" which doesn't give you the simple "ninety degrees to the left" option but instead you have to drag with the touchpad, but luckily I didn't have to use that). Then resize. First find out how to view at the real size the people using the web page will see (view->zoom) but that's not actual resizing, that's under tools->transform->scale except that when it was done I had the small image in the upper left hand corner of a big checkered area. "Ah, it created an un-asked-for layer!" Except that image->flatten turned the checkerboard into a big white area that was still part of the image, which it INSISTED ON SAVING. Ok, select and crop, I want a rectangular selection... Note that you can't do this through the selection menu...
It went on like this for an hour before I gave up on the gimp and went to install imagemagick. I couldn't find it on the command line (im
In case you ever wonder why Linux on the desktop is nowhere... It's because we suck at this sort of thing. Badly. The kind of suck that takes serious effort to achieve.
Heading to Eric's place for the weekend to work on the 64 bit paper and finally get it out. Oh yeah, this puts me in the perfect mood...
While working on command line parsing (now about 3/4 done) I made a design decision that integers or counters it returns should be longs, because that's always the same size as a pointer, so I can zero them all in the first pass without worrying about type. This is guaranteed by the LP64 model, and I added a comment to that effect with the URL, which I'd also added to something in lib/functions.c (the integer parsing code). And then I went "ok, I need to write some kind of readme for this to go in", which turned into a design document, which is now on the web page.
So there's a 3 hour tangent. You wonder why programming goes so slowly sometimes? That's why...
And now it's dinner time.
Got to see the colo facility today. Spent most of the day at it, because the corporate website was down. Beth, Garrett, and I went to "the cage" to bang on the old server a bit (which has had apache processes sticking in D state on and off for days; reboot makes it go away for a while but it comes back). A kernel upgrade didn't help, so it's time to swap out hardware. Wheee.
Replaced the old server with the two new servers (with the whole heartbeat failover thing). This has been planned for a while but which we just haven't had the manpower nor the senior management approval to pull off until it became a crisis. (Besides, it would have meant taking the website down and risking that the new one might not work the same way initially, and the mantra from management has been "five nines" for a while. The reply from engineering has been "nine fives" for several months now, and we're pretty comfortable with the achievability of that one given our current manpower levels. :)
Funky little subtleties in options parsing. The longopts structure needs a str member for the name of the longopt it's looking for, but should it be a "char *str" pointing into the source string, or a "char str[0]" at the end of the struct with a dynamically allocated size which we copy to? The first approach has the overhead of the pointer (and a length argument), but the second approach has a redundant copy of the data, and extra code to calculate the size of the allocation, and I probably have to call strlen to do the strncmp() unless I want to add a library function to "match start of string" which java and python seem to have, but libc doesn't.
Either would work fine, it's a question of which is better. Those are the ones that really make me stop and go "hmmm" for half an hour at a time...
And Greg gave his two weeks notice today. I didn't think Greg would be the next one to quit because one of the other engineers already has an offer (but is waiting to hear back from another company before deciding which one to leave for), but it's not exactly a surprise at this point either.
Greg was TimeSys' last remaining toolchain developer. I've been coming up to speed (and have already been told I'm inheriting Greg's responsibilities), but I generally don't mess with glibc on non-x86 platforms (I do uClibc instead), I don't build toolchains with tsrpm (I make them with shell scripts), and I'm not even familiar with half as many hardware platforms as he is. Spent chunks of today trying to get a brain dump from him, and more Wednesday...
That brings the number of engineers in Pittsburgh down to 6. Although they just hired somebody to redo the website in PHP, against the unanimous advice of what's left of engineering. (Because what would we know about it?)
Implementing get_optflags(). It's rather involved.
Today's fun detail is that "cpio -0oHnewc" should be "cpio -0 -o -H newc" but "tar xjfCv blah.tbz dirname files/" should be "tar -x -j -f blah.tbz -C dirname -v files/". So parsing in dash mode behaves differently. Right.
The bigger question is how to represent longopts. I know the general semantics I want: "a(long)". I can't use - as the separator because some longopts have dashes in them, and I'm mostly restricting these to characters that have special meaning for the shell and thus would be clumsy to use as options anyway. (I could add an escape syntax to the opt string parsing if necessary, but I hope it doesn't become so.)
The problem is, some long options don't have an associated character. So I put them at the front of the list. For example, "(long1)(long2):a(longfora)(altlongfora):" which means --long2 and -a take an argument, but --long1 does not. --longfora and --altlongfora are synonyms for -a. The question is, do --long1 and --long2 set the same bit the way --longfora and --altlongfora do? The answer I'm going with is "no": if you want to group long options you associate them with a short option (even if it's something crazy like '\x7f' or some such).
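A rough sketch of parsing that syntax (a hypothetical helper, not the real get_optflags()): a "(name)" attaches to the most recent short option, or stands alone if none has appeared yet, and ':' marks the most recent standalone option as taking an argument.

```c
#include <stdlib.h>
#include <string.h>

/* One parsed option: a short character, a standalone long name, or a
 * synonym (a long name sharing a short option's character). */
struct opt { char c; char *longname; int has_arg; };

/* Walk a string like "(long1)(long2):a(longfora)(altlongfora):" and
 * fill in the table. Returns the number of entries produced. */
int parse_optstr(char *str, struct opt *out, int max)
{
    int n = 0, last = -1;   /* last: the option the next ':' applies to */

    while (*str && n < max) {
        if (*str == '(') {
            char *end = strchr(++str, ')');
            int len = end - str;

            out[n].longname = malloc(len + 1);
            memcpy(out[n].longname, str, len);
            out[n].longname[len] = 0;
            out[n].c = (last >= 0) ? out[last].c : 0;
            out[n].has_arg = 0;
            if (!out[n].c) last = n;   /* standalone long: ':' target */
            n++;
            str = end + 1;
        } else if (*str == ':') {
            if (last >= 0) out[last].has_arg = 1;
            str++;
        } else {
            out[n].c = *str++;
            out[n].longname = 0;
            out[n].has_arg = 0;
            last = n++;
        }
    }
    return n;
}
```

Running this over the example string gives --long1 without an argument, --long2 with one, and -a (plus its two synonyms) with one, which matches the semantics above.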
Implementation is seldom the hard bit. Figuring out exactly what I want the code to _do_, and why, is generally the hard bit.
Implemented "pwd" and consolidated the toybox command list into a single #include file (toys/toylist.h). Adding a new command should now involve touching three files: adding it to toys/toylist.h, adding it to toys/Config.in, and adding the actual toys/toyname.c file to implement its functionality by providing toyname_main().
My current big hurdle is implementing get_optflags(). Just about all applets need this, and it's nontrivial to do it right. But once I've got that (or before if I don't mind going back and doing a cleanup pass) I can add lots of small, simple applets that are fifteen minutes each to implement (ala pwd).
Writing a new index.html for the toybox web page. The web page is the logical place to have my big applet lists (to-do, in progress, and done). I also need to write up design documentation.
One problem this highlights is the fragmentation between my web directory, my toybox work directory, and the toybox web page. My web directory is ~/www on my laptop. My toybox directory is ~/toybox/toybox (with the first dir full of temporary files, and the second containing the .hg repository). Under my .hg repository is a www directory containing bits of web page for toybox, but that's not the current one. The current toybox web page is in "~/www/code/toybox". On the one hand, it's nice to have the web page in the project's repository, so changes to the web page are versioned and in sync with the project. On the other hand, the "downloads" directory contains files that shouldn't be in the repository, I don't particularly want my working directory (full of various changes I haven't checked in, many of which don't even build and some of which are temporary in nature and I'm going to revert them) synced to the website, and I like having the ~/toybox directory as a nearby dumping ground for todo items and temporary files.
I might be able to symlink "~/toybox/toybox/www" to "~/www/code/toybox" and then just not check in lots of the files. I should probably also symlink "~/toybox/toybox/.hg" to "~/www/code/toybox/.hg" (assuming this doesn't give mercurial fits). That would cut down several copying steps when I publish stuff, then I could just rsync the www directory up to the server and life would be good. I wonder if it would actually work?
Well, the symlink of the .hg directory works...
The reason it takes me so long to make any changes to gcc is every time I start tracing through the code to figure out how something happens, I encounter huge patches of unnecessary crap and start ripping them out. They never remove anything, they just add more workarounds, and then when the complexity becomes overwhelming they add ways to override the bits that don't work, and then ways to override that. In gcc.c alone you've got sysroot and multilib, hardwired crap like STANDARD_STARTFILE_PREFIX2 and standard_exec_prefix, throwing yet more crap in spec files, and of course environment variables galore (COMPILER_PATH, LPATH, GCC_EXEC_PREFIX...), plus the various ./configure --exec-prefix options. They have whole classes of different _mechanisms_ for specifying where stuff might live, and rather than try to get it right in one place they make big paths which they append to and check all sorts of places. Just because gcc installed a file doesn't mean it has the _foggiest_ idea where to find that file.
You could literally delete over half the code in gcc.c without removing anything useful. And don't get me started on the steaming pile of crap that is libiberty...
The problem is, I don't want to maintain a big patch against gcc. Even if it mostly just removes stuff, this codebase changes and any patch will get out of date rapidly. And _my_ patches would never get merged upstream (even if I was willing to sign over my copyrights to the FSF, which I'm not, and even if I was willing to license it under "GPLv2 or later" rather than just GPLv2, which I'm not).
I did an xabspath() which works a little like realpath() except we don't depend on a nonstandard extension to allocate memory for us, and it doesn't stat() the path components. (When blah is a symlink, what blah/.. resolves to is a bad thing to rely on anyway.) Instead it adds cwd if necessary and parses . and .. itself.
So I need find_in_path(), to parse a colon separated path to find a file, but I need two very different syntaxes for it. The first use is things like "which" and execvp, that look at $PATH to find executable files. The second is gcc's #include search paths, which don't care about the executable bit. The bigger difference is that if the file we're looking for is itself a path (like "../fruitbasket"), the "which" style looks for it in the current directory and bypasses the colon-separated path search altogether, while the #include style treats it as relative to each path component. I could almost implement the "which" behavior (find_exec_in_path()) as a wrapper around the "#include" behavior (find_in_path()), except that if the executable bit isn't set on a file we have to look at the rest of the $PATH components. So I either need an optional test for the executable bit in the #include behavior, or a way to resume a search, or a way to return a list of results, or a callback...
Went with a hybrid: put the optional test in a function, passed in a flag to control it. This code is useful both for toybox and for beating gcc about the head and shoulders to get it relocatable.
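Roughly like this (a sketch with made-up names, not the actual toybox function):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Walk a colon separated path looking for filename. With need_exec
 * set ("which" style), require the executable bit and keep trying
 * later components when the test fails; a filename containing '/'
 * bypasses the search entirely. Without it (#include style), return
 * the first readable match relative to each path component. */
char *find_in_path(char *path, char *filename, int need_exec)
{
    char *cursor = path;

    if (need_exec && strchr(filename, '/'))
        return access(filename, X_OK) ? NULL : strdup(filename);

    while (cursor && *cursor) {
        char *next = strchr(cursor, ':');
        int len = next ? next - cursor : (int)strlen(cursor);
        char *full = malloc(len + strlen(filename) + 2);

        sprintf(full, "%.*s/%s", len, cursor, filename);
        if (!access(full, need_exec ? X_OK : R_OK)) return full;
        free(full);
        cursor = next ? next + 1 : NULL;
    }
    return NULL;
}
```

The "resume the search" problem goes away because the executable test happens inside the loop, so a non-executable match in an early $PATH component just falls through to the later ones.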
Speaking of which, yesterday I downloaded the codesourcery coldfire toolchain to play with qemu's new coldfire support, and guess what? That toolchain is relocatable.
I'd be getting more done this evening, but the SuSE/Microsoft partnership announcement is leaving me boggling a lot...
Fun in the server room this evening. Piggy and Garrett and I were all working late, and we decided to go down the street to Wendy's for dinner. When Piggy and I were ready to head out we didn't see Garrett so I checked in the server room for him. He wasn't in there (lights were off), but I stopped for a moment anyway because something was wrong. Took a moment to place it. Ordinarily, you open the door and get an arctic blast, but this time it was over 80 degrees in there, and climbing. The air conditioning was running full blast, and accomplishing nothing.
So we spent an extra half-hour shutting down half the machines, arranging fans, and notifying management before we could go to dinner. Luckily, most of the customer-facing stuff got moved to the colo facility over the past couple months (and this is _why_ we wanted that to happen).
If I hadn't just happened to check the server room we probably wouldn't have noticed until morning, by which time all the boxes would have been pretty thoroughly cooked...
I've bumped into some strange bug in Mercurial where "hg diff" refuses to see that the file in my current working directory has changed. Maybe it's because I have a random month-old snapshot of mercurial installed on my machine instead of a release version? Upgrade to the current snapshot... Nope, no change.
Darn it. I want to work on toybox tonight. I don't want to debug Mercurial. Even if this is just a strange documentation issue and I did something wrong, I have no way of figuring out what it is and fixing it. I can't work on toybox tonight because Mercurial is broken for me. That's annoying. (This is the second time I've hit this bug, but last time I had a backup of the repository.)
Ok, most of an hour later I've figured out that somehow, when I added the kconfig directory I created a second head. Dunno how. Run "hg manifest" on -r 9 and -r 10. Yeah, all the files from 9 aren't in 10, and vice versa. Beats me.
Merging them back together was darn nontrivial, because I hadn't noticed the new head and had thus done more work in the directory since then, and only noticed when I couldn't check in any of the results. It insisted the files weren't changed because the new head was only tracking the files I'd added in revision 9, not any of the files that existed before. (The log would show the old versions of course, but "hg manifest" and then "hg heads" showed my repository had somehow developed zaphod beeblebrox syndrome. Right.)
When you do an hg merge and the files in two heads are different, it'll use the 3-way merge tool, or pull up an editor to let you merge the two heads manually. But if the difference is between one of the heads and the contents of the current directory, it can't cope. It just complains about the file being different and aborts, even if the other head doesn't have that file. So what I had to do was run hg merge, move the one file it complained about to a backup name, and then run hg merge again to identify the next file. (Of course it would only complain about one file at a time.) Repeat that until it ran out of files to complain about, commit this (even though nothing really changed, you have to commit to make the two heads go away now that they're merged). And NOW move the backup files over the ones in the working directory, and start committing the new development you did on the bus ride home.
Not the world's most user-friendly tool.
Happy halloween. I doubt anybody's going to make it up to our door to get the Wendy's Frosty coupons we got to hand out (well that way we're not tempted to eat them), because the entire stairwell reeks of pot (which is presumably what the downstairs neighbors are handing out).
I'm banging on the Linux kernel's kconfig, applying the changes I made for BusyBox to the 2.6.19-rc3 version so I can submit a patch back to mainline. In general, this makes kconfig more useful to non-kernel projects. I dunno if they'll want them or not, but I should maintain this patch anyway.
My patch also lets you build it out of tree (two #includes were missing), which makes it easier to just grab the kconfig directory and drop it into other projects' makefiles. To build it, you go:
#!/bin/sh
KCONFIG=linux-2.6.19-rc3/scripts/kconfig

# Create symlinks for shipped files so gcc can find/understand them.
(cd $KCONFIG; for i in *.c_shipped; do ln -s $i `echo $i | sed 's@_shipped$@@'`; done)

# Build mconf and conf
gcc $KCONFIG/{mconf.c,zconf.tab.c,lxdialog/*.c} "-DCURSES_LOC=<curses.h>" -lcurses -Os -s -o mconf
gcc $KCONFIG/{conf.c,zconf.tab.c} -Os -s -o conf
And now it's evening and I'm back at home trying to stay away from the front door (which positively reeks of pot; I've been coughing at length). I'm adding menuconfig support to toybox and trying to figure out how the help entries should go: I can make them man page snippets and generate the man pages from the Config file help text, or I can document what adding each help entry does ala "Add terminal control to bbsh. This is necessary for interactive use, so the shell isn't killed by CTRL-C." but then I'll have to have a _second_ set of help text.
Hmmm...
Darn it. A few days ago I spelled "toybox" in ascii with the coke zero and pepsi one cans I have lying around, and took a picture with Piggy's digital camera. Got the pictures back today and double-checking it I found out I spelled out "xoybox". Sigh...
Try, try again. On the bright side, I've dusted off some knowledge of the gimp, which means I once again want to punch its designers in the face repeatedly for being _STUPID_. Dug around, found how to rotate something after five minutes of searching, and then tried to crop and it KEPT WANTING TO ROTATE every time I clicked on the image. It took longer to figure out how to get it to _stop_ rotating than it did to start, despite the fact that after I've said "I like the rotate it's got now" the first time, rotating it _again_ can only undo that rotate. These operations do not stack. Argh!
Marvelous engine, the gimp. Lovely plumage. Pity the user interface is nailed to the perch. Oh well. (I know it's mostly used as a batch image editing program, triggered from image files on the command line, usually in a shell script. It shows. Trying to use it as a photoshop clone has been an exercise in frustration for years now, not because the engine can't do what I'm asking of it but because the GUI wasn't designed by users of that kind of program. It's like driving a car designed by people who don't drive. They're experts in internal combustion and it gets marvelous fuel efficiency and has wonderful acceleration, but it's totally impossible to steer the thing.)
This has not been a very productive weekend for me. The downstairs neighbor has taken to smoking huge quantities of weed, which I'm allergic to. It seeps up through the floorboards or something so the place stinks of it after a while. This is better than his regular 3 am screaming matches ("take the baby" repeated rather a lot, because, according to the scream, she had no place to stay after he threw her out, and it was too cold for a baby to spend the night in the car.) I gather it was his baby. (Did I mention I hate Pittsburgh? We're in the nicest part of town we could find...) I went downstairs to talk to him, he's a big black guy with dreadlocks. Apparently, some stereotypes actually do exist. Sigh.
Plus my main concern about moving back up north was seasonal affective disorder. When it starts getting dark outside at 4pm, I tend to hibernate. Whole lotta napping going on the past few weeks, and I still feel tired. (I also might have a sinus infection, hard to tell with my downstairs neighbor regularly triggering an allergy.)
Still, the Te Cafe is nice. I'm very glad to have found that, I've been going to it a lot recently. For one thing, it's not full of marijuana smoke.
So anyway, df needed two completely different control loops: one for filesystems listed on the command line and one for the no arguments version which lists a filtered version of everything in /proc/mounts. The code that reads /proc/mounts into a linked list also adds the contents of stat() and statvfs() to that list (which isn't that big in either case).
In the show-all case, I can zap duplicates when listing all filesystems by comparing st_dev. It turns out I don't have to do anything fancier than this because in the case of undermounts, I get the currently active filesystem when I stat the path so it'll show up as a duplicate anyway.
In the command line case I have to stat each path from argv (since it may not even _exist_, and may be a subdirectory or even a file if it does), and once I've done that I can compare that st_dev with the ones in the mounts list to find the filesystem to display (taking the _last_ match to skip --bind mounts; remember, the list is in reverse order).
The question is how to get the show all case to recognize bind mounts and show the oldest one. Right now, that's getting filtered as a duplicate, but I think what I _want_ to do for each st_dev is show the oldest filesystem with the same device. Hmmm...
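The keep-the-oldest filter itself is simple enough once the list is in mount order; something like this (a sketch with a made-up struct, not the df code itself):

```c
#include <sys/types.h>

/* Given mount entries in the order they appear in /proc/mounts
 * (oldest first), show only the first entry for each st_dev, so
 * duplicates and --bind mounts collapse to the oldest mount of that
 * device. Returns how many entries survive. */
struct mount_entry { dev_t dev; char *dir; int show; };

int filter_oldest(struct mount_entry *m, int count)
{
    int i, j, kept = 0;

    for (i = 0; i < count; i++) {
        m[i].show = 1;
        for (j = 0; j < i; j++)
            if (m[j].dev == m[i].dev) m[i].show = 0;
        kept += m[i].show;
    }
    return kept;
}
```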
I added -Wall to the toybox build and fixed a dozen things. I seriously need to give it a real build and configuration infrastructure. Plus I need to implement some non-stub command line option parsing code.
I'm trying to implement a df command that uses /proc/mounts, and hiding overmounted filesystems turns out to be a real pain.
My general theory is to traverse /proc/mounts backwards, and skip any filesystems that are equal to or under the mount point of a later mount. The first reason this doesn't work is mount --move doesn't reorder the entries in /proc/mounts.
For example, in ubuntu 6.06, /proc and /dev are mounted under initramfs, then the new root (/dev/hda1) is mounted afterwards, and during switch_root the /proc and /dev entries are "mount --move"d into the new root. This means that the entries for "/proc" and "/dev" come before the active "/" in /proc/mounts, but they're still visible under "/". (Ordinarily, mounting a filesystem over another one hides sub-mounts in the old overmounted filesystem. For example: "mkdir wombat; mount -t tmpfs wombat wombat; cd wombat; mkdir under; mount -t tmpfs under under; cd under; touch thingy; cd ../..; mount -t tmpfs another wombat; mkdir wombat/under; ls -l wombat/under".)
Also, it's a pain to detect --bind mounts; there's just not enough information in /proc/mounts. Detecting duplicate mounts seems simpler (if the block device field for both mounts is "/dev/hda1", it's a duplicate), except what if you have two tmpfs mounts both named "wombat"? False positive on the match...
My second theory was to solve all this with statvfs, which has an f_fsid field that should uniquely identify each mounted filesystem. Then all I'd have to do is try each entry in reverse /proc/mounts order, and only keep the entries that have an f_fsid field we haven't seen before. (I could still break this with mount --move, because the active entry at a given mount point doesn't have to be the last one in /proc/mounts, but now it requires two moves to screw up a given mount point, and only applies to direct overmounts rather than sub-mounts.)
The problem with _that_ is that statvfs() returns an f_fsid of 0 for every filesystem on my ubuntu, and this has been a known bug for four years now. Same for statfs().
So now I'm poking around at a normal "stat", to see if I can do something intelligent with st_dev, st_ino, st_rdev... I don't have high hopes for this...
Huh. It turns out st_dev is what f_fsid was supposed to be. Yes, even /proc and instances of tmpfs have a unique st_dev. Go figure...
Reading RFC 2131 which is the one on dhcp. I realize that servers out in the wild violate the spec left and right, but it strikes me as a better starting point than slapping wireshark on my laptop and snooping a live transaction with the coffee shop's wireless router. (That comes after I've implemented the spec. :)
There are now links to individual entries in this blog. Yet another use of python...
The new kubuntu is out. Downloading many iso images. Maybe _this_ one will install on my new x86-64 server. (With vnc, I wouldn't need a monitor after the initial install, even moreso than ssh. Plus it's way easier to use a graphical client like adept to install stuff than try to beat dpkg or apt-get into submission from the command line.)
Banging on toybox df, because it's there. Figuring out how to hide undermounts, and reconcile the SUSv3 command line options with the gnu ones (-t, for example)...
I _really_ need to write a dhcp client that isn't crazy. Piggy and I got the isc dhcp client statically linked and embedded in something at work today, and _wow_ is that a mess. It's fielding _four_ packets in a standard transaction (discover, offer, request, ack, with the possibility of NAK in there too), plus it has to make a few calls to ifconfig and route, and maybe write an /etc/resolv.conf file. Yes, it _can_ use a script for this, but it really shouldn't require one. Why is this hard?
Got some work in on toybox on the way home, checked in a small change and it's up in the hg repository on the web page. I should do a decent web page for that.
Deleting environment variables leaks memory. This is apparently a design flaw in all Unix-like systems. (Wow.) I'm fixing it in toybox, though, and might propagate the fix back up to uClibc. (If glibc wants to pick it up from there, that's their problem.)
Fundamentally, the environment variable list is just the next argument to main() after argv[]. This optional third argument (envp) is another NULL-terminated array of char pointers, each of which is a "name=value" string. (There's no length for this one ala argc, but then argc isn't needed either since the last element of argv is always a NULL pointer.) The startup code that runs before main() copies envp into the global char ** variable "environ". (See "man 7 environ" and marvel that you can finally put a name to something useful that the 4k of overhead you can't get rid of in all your programs actually does.) Functions like getenv() and putenv() use environ, and you can iterate through it yourself if you like (and if you don't care about thread safety, which you don't have to if you just don't use threads).
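To make that concrete, getenv() is basically just a walk over that array; a minimal version (mine, not any particular libc's) looks like:

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;   /* filled in by the startup code before main() */

/* Find "name=value" in environ and return a pointer just past the
 * '=' (which is also what the real getenv() returns: a pointer into
 * the environment string itself, not a copy). */
char *my_getenv(char *name)
{
    char **ev;
    int len = strlen(name);

    for (ev = environ; *ev; ev++)
        if (!strncmp(*ev, name, len) && (*ev)[len] == '=')
            return *ev + len + 1;
    return NULL;
}
```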
The tricky part is who owns what memory? A process's initial environment variables are copied into its address space by exec() the same way its command line arguments are. This memory is not part of the heap managed by malloc(), and thus you shouldn't call free() on it.
Adding or replacing environment variables later in your program involves mallocing a new "name=value" string and sticking it in environ[] somewhere. If the current environ[] array isn't long enough, you allocate a new one, copy the old data into it, and update environ to point to the new array.
Of course, this leaks memory left and right.
The original environ array wasn't on our program's heap, so it's not our job to free it. But if we expand environ[] and then expand it again, we leak memory unless we free the first array we allocated when replacing it with the second. So we need to know _when_ to free stuff. In this case, there's a traditional solution: have a second global variable (it's static to libc so it doesn't matter what it's called, but let's call it old_environ) that starts out NULL and otherwise points to the environ we need to free. So you can malloc the new one, copy the old one, free(old_environ), and set old_environ=environ. You can even be fancy and use realloc(old_environ, newsize), since when it's NULL you get a fresh allocation anyway (although you still need to memcpy the data yourself in that case).
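In code, the trick looks something like this (a toy model using its own globals rather than patching libc's):

```c
#include <stdlib.h>
#include <string.h>

static char **fake_environ;   /* stands in for libc's environ */
static char **old_environ;    /* NULL until we've allocated one ourselves */

/* Grow the environment array from oldcount to newcount entries (plus
 * the NULL terminator). The first time through, the existing array
 * came from exec() and isn't ours to free, so memcpy it into a fresh
 * allocation; after that, realloc() what we allocated last time. */
void grow_environ(int oldcount, int newcount)
{
    char **bigger = realloc(old_environ, (newcount + 1) * sizeof(char *));

    if (!old_environ)
        memcpy(bigger, fake_environ, (oldcount + 1) * sizeof(char *));
    old_environ = fake_environ = bigger;
}
```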
But there's a bigger problem. Each "name=value" string is also a chunk of memory. And currently, they never get freed. putenv() inserts the string you pass into environ verbatim, so you can free it yourself if you want to, but setenv() creates a copy, and neither replacing that copy nor calling unsetenv() will free it. It just leaks.
There are two problems here. The first is that environ[] mixes strings from two sources, one of which we didn't allocate and thus can't free. The other is that getenv() returns a pointer into the string environ[] points to (not to the start of the string either, but right after the '=') and we don't know when those stop being used.
Currently, there _are_ no possible environment variable semantics that allow you to repeatedly update environment variables without leaking memory. My "fix" is to create a new way to do this that doesn't leak, by adding some extra tracking to setenv/putenv/unsetenv, and adding a new "envfree()" that will actually free the environment variable when appropriate, rather than leaking it.
We have to distinguish the environment variables we can't free (the ones that came from exec) from the ones we can (the ones we added). Every program starts with a fixed number of non-freeable variables allocated by exec() that live outside of your heap. All the others are allocated with malloc() and we can free 'em. (At least in the setenv() case. Passing a constant string to putenv() and then calling envfree() instead of unsetenv() is "pilot error". Don't do that.)
So the first thing to do is maintain a count of the number of unfreeable entries. This is initialized to the number of entries in environ before we do our first modification of the environment (and we know which modification is our first because old_environ is NULL). Whenever we add a new environment variable, put it at the end of the array (this is already the case). Whenever we remove an environment variable whose index in environ is less than this count, decrement the count. Whenever we remove an environment variable that's greater than or equal to this count, it's something we added and thus something we could potentially free(). (Corner case: whenever we replace an environment variable that's less than this count, remove the old one and then add the new one to the end rather than updating in place. We can always do this if we care more about small code size than speed.)
This keeps track of when you _can_ free an environment variable, but doesn't say when you should. Just doing this blindly screws up existing programs, because what if somebody did a getenv() and kept the pointer around after doing an unsetenv()? So we create a new function, envfree(name), which acts like unsetenv(name) but doesn't leak memory. Then it's the caller's job to get the usage right. (I.E. don't keep pointers from getenv() past the corresponding envfree(), and if you putenv() something that can't be freed use unsetenv() instead of envfree() to get rid of it. Or just always use setenv() which creates a copy already.)
I'm implementing this for toybox's library. When I'm done I can adapt it to uClibc and submit it there, but I'll have to keep the toybox version so toysh works with current shared libraries.
There's a trick to moving large live data sets from one server to another with minimal downtime. (Here "live" means in the process of being constantly updated, like a database or mail archive.) The trick is to run rsync to copy the old data to the new server while it's still live, creating an approximate copy that has the bulk of the data right (albeit with some updates missing and some version skew between files), and then taking the old one down, running rsync again (much faster since the bulk of the data's copied and it just has to fix up the differences), and then putting the new one up.
Wrote up some design notes for the new design of Firmware Linux. This one is QEMU-based and can cross-compile to non-x86, as opposed to the old UML-based x86-only version.
I also dug up my old UML howto and put it in the new writings/docs directory, and copied the cross-compiling howto I wrote for timesys' crossdev site in there (with a link to the original, but it seriously needs to be expanded).
It's interesting to note that of the four people who worked on http://crossdev.timesys.com (me, Manas Saskena, Sam Robb, and Chris Faylor) I'm the only one left...
Back to thumping on the relocatable toolchains. (The "hit the standard build with sed repeatedly" approach, not the tinygcc approach. I'll do that later, it's highly nontrivial unraveling gcc's outright pathological configuration and build process.)
There's some sort of Maintainability Event Horizon gcc seems to have fallen over. Bypassing the stupid things gcc is doing and simply overriding the result with the correct answer after the fact is much easier than removing the brain-damage, cleaning up the mess, and implementing something sane. Of course it's not _right_, but it's certainly easier. This is a symptom of horribly bad design. The main problem is if I do an actual clean-up, it'll hit the standard FSF politics (of which Debian is a mere symptom) and the patch will never go in.
Yes, Debian used to be the official Linux distribution of the FSF. (That's why it says "GNU/Linux" instead of just Linux like everybody else.) But like gcc and glibc it had a falling out with the FSF and the project distanced itself from them. (In the case of Cygnus' forks of gcc and glibc, the original FSF project stagnated until it was declared dead, and the fork inherited the name to save face on the part of the FSF.)
In the case of gcc, the file "gcc/gcc.c" is not just the front-end wrapper thing. It defines a new programming language ("The Specs Language"), implements an interpreter for it, and then proceeds to define all of gcc's command line options in this new language (hardwired into the program). This is the "spec file" nonsense I commented on earlier...
This weekend, I need to get BusyBox 1.2.2 out, and need to get toybox 0.0.1 up. And to get toybox up, I need to update my web page.
Long ago (back at WebOffice) I cloned the guts of the rsync engine in Python, but it was too slow (could only do about 100k/second on a 400 mhz machine). Now I'm once again cloning the guts of rsync, but this is so I can put it in toybox. There is much pondering of mathematical notation and head scratching going on...
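The clever bit of rsync is its rolling weak checksum: sliding the window one byte updates the sum from just the byte leaving and the byte entering. A sketch of the published algorithm (not whatever toybox ends up with):

```c
/* Weak checksum over a window: two 16-bit running sums packed into
 * one word, per the rsync algorithm. */
unsigned rollsum(unsigned char *data, int len)
{
    unsigned a = 0, b = 0;
    int i;

    for (i = 0; i < len; i++) {
        a += data[i];
        b += a;
    }
    return (a & 0xffff) | (b << 16);
}

/* Slide a len-byte window right by one: subtract the departing byte,
 * add the arriving one. O(1) instead of recomputing the window. */
unsigned roll(unsigned sum, int len, unsigned char out, unsigned char in)
{
    unsigned a = (sum & 0xffff) - out + in;
    unsigned b = (sum >> 16) - len * out + a;

    return (a & 0xffff) | (b << 16);
}
```

Since all the arithmetic is mod 2^16 in each half, the wraparound from unsigned subtraction comes out in the wash, which is what makes this safe to do with plain unsigned ints.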
Went through the BusyBox applet list and categorized them. The ones I wrote entirely, the ones I wrote most of, the ones I could trivially rewrite, and then the ones that would be work to reproduce (categorized into groups). I have a much bigger toybox todo list now.
A recruiter emailed me out of the blue today. Spent the fifteen minutes to update my resume (something I needed to do anyway), but I'm still not really that interested in moving to California. It's crowded and expensive...
And I pinged my ex-manager David to see if he had any contacts I could use to find customers for the new company I might start. He wasn't against the idea, but wants to wait until I'm not actually working at TimeSys anymore before talking about anything concrete (so he can't be accused of recruiting me away from TimeSys). If I decide to do this right, I also need to ping Eric Raymond, my contacts at CELF and the SFLC, maybe the uClinux guys...
Yesterday at dinner I talked to Fade about starting a new company. She's not exactly enthusiastic, but is willing to go along with it. Now I just have to decide if I really want to do this, and if so when?
I spoke to Mark on the phone today about starting my own business. He's the first guy I'd hire, as soon as I could afford an employee. He's a good programmer, smarter than I am, energetic, outright awesome at paperwork, and our skillsets complement each other quite well. And we've worked together before: Linucon wasn't quite just the two of us (Stu helped a lot), but the main reason I couldn't chair Linucon 2 was Mark didn't have time for it.
Mark's focused on graduating, in May. He has no plans after that. I have until then to get a business together that can hire him. Hmmm...
I suppose it's about time I caught up with the group: all my coworkers have contingency plans for what to do after TimeSys implodes. Yes, that's a "when", not an "if". Since I started at TimeSys, over half the engineering staff has left (Brian, Gary, Bill, Manas, Pragnesh, Sam, Walt, Chris, Ashish, David, Christian, Tony... and Ravi just left on a five week "vacation" back to India. I'm probably missing a few.) In addition to me, this leaves Garrett, Piggy, Beth, Jeremiah, Joseph, Macej, and Greg. That's it. (Outside of Engineering we lost Sandra the human resources lady and Miranda the accountant. As far as I can tell, human resources and accounting are either no longer being done, or have been outsourced.)
Of the remaining engineering staff (other than me), I know of three actively looking for a new job (as in they've already updated their resume and sent out copies to places), two are "networking" (which is essentially looking for a new job without sending out resumes yet), and the other two I don't know well enough to ask.
I've largely ignored this up until now, but it's become increasingly difficult. I know five people who have gone to Larry the CEO about this. Three of them are no longer with the company. My previous boss (David) had a 13 page list of things needing fixing (alas, I didn't get to read it). David tried to give it to Larry the CEO as a PowerPoint presentation, which Larry the CEO cut off on page 1 insisting nothing was wrong and that he didn't want to hear it. I mentioned David leaving on October 3. (Garrett inherited his code.)
The remaining engineers (and our manager, Al, who is a nice guy but as far as I can tell doesn't code) had a meeting with Larry the CEO on Monday which we thought would be about the morale problem, but which instead confirmed that Larry continues to live on a river in Egypt.
Meeting with Larry is always tricky, since just about everything that's wrong with the company can be traced back to him if you try. I brought up one specific and relatively innocuous problem (a gratuitous multi-week delay in the Mercurial migration), and Larry cut me off and angrily blamed the problem on my ex-manager David.
I was in the meeting where Larry caused this problem, directly ordering me and Piggy to go talk to another group to confirm they had nothing to do with it and wouldn't be impacted by it (they didn't, and we already knew this, but we had the meeting anyway), then write up a report on the meeting and wait for Larry's approval to proceed. This approval was supposed to come the next week (why would it take a week? That's what Larry said.) But the approval never came, unless you want to count Monday's meeting. This is not something I heard secondhand: I was there. I heard Larry speak. How do you tell your CEO to his face that he's lying about something you personally witnessed?
That's the point at which I started making post-TimeSys plans.
The depth of stupidity in gcc is awe inspiring. I'm trying to build some of the "gcc/c-*.c" files, and they want to #include "options.h", which is a generated file. The comment at the top of the file says "This file is auto-generated by opts.sh", except the gcc source code doesn't contain a file named "opts.sh". What it _does_ contain is gcc/opth-gen.awk which is spitting out the blatantly incorrect comment.
It's not that the code has changed a lot over the years. That's normal, I expect that. It's that it's never been cleaned up. They go in and make changes that render other parts of the code pointless and stupid and actively wrong, and they never change any of the other parts to match. And I can't think of a better example than opth-gen.awk generating a file and writing the comment "This file is auto-generated by opts.sh" into it. If a human being ever looked at that line in the awk script, they either didn't bother to fix it, or didn't have the authority.
It's ALL like this. This is what I'm up against in trying to come up with a clean subset of this CRAP. It's a giant mass of scar tissue, hideously overcomplicated and working at cross-purposes.
By the way, opth-gen.awk consumes output from opt-gather.awk. Guess what's at the start of opt-gather.awk? Right after the full page of gratuitous GPL boilerplate, the first function is a BUBBLE SORT IMPLEMENTED IN AWK. You can't make this stuff up. Did anyone ever stop to go "is this a good idea?" No, they did not.
There was an article in the Pittsburgh newspaper about start-ups eschewing venture capital, and instead trying to become profitable with much smaller initial investments from savings, friends and family, credit cards. I knew about this already, but it's good to be reminded of it.
And yeah, between the venture capital panel on the 14th and the "you don't need venture capital" article in the business section, I can read the tea leaves. I should start a business after TimeSys implodes.
Updating my home page. It's been around three years since I've touched the thing, and it was stale then. If I'm going to revive Firmware Linux, host toybox, and do tinygcc, it should be on a page that qualifies as "not dead".
Also made a checkin to toybox, the first one in a couple weeks. Now toysh is caught up with bbsh. Woot.
Fabrice showed up and made a checkin to tcc, the first one since February. Unfortunately, this means my fork is out of sync with cvs. If the maintainer's taking an interest in the project again, I should stay out of his way. Oh well, I've been poking at tinygcc anyway. Not much to show for it. (I did print out around 600 pages of documentation on gcc internals, though. That'll take a while to dent...)
Today was Christian's last day. He has a tendency to use "fish" as his standard metasyntactic variable (temporary filename, etc), and I did indeed get to say "so long, and thanks for all the fish" to him on the way out. Eight of us left...
The depth of the stupidity of gcc is just impressive. In the gcc 4.1 source code, go look at "libiberty/make-relative-prefix.c". Starting around line 200 there's a function that finds the current gcc executable (by examining argv[0] and searching $PATH if necessary). This explicitly exists to allow gcc to be relocatable.
But in obvious places like add_standard_paths() in gcc/c-incpath.c, it doesn't call make_relative_prefix(). Instead there's code in the ./configure files, and libtool, and even a check in gcc/gcc.c to abort the build if you don't supply absolute paths so they can hardwire them into the binary.
So they went to the trouble to create infrastructure for something, and then went out of their way to disable that infrastructure elsewhere. Dealing with the FSF is like watching Sun Microsystems: everybody is busy undoing everybody else's work.
This is actually fairly easy for me to fix, but I'm going to have to maintain my own patch, because the GCC developers don't accept patches. Oh well, that just means I can license it GPLv2 only. :)
Remind me to write a rant about orthogonality and use make_relative_prefix() as an example. And reinventing the wheel (save_string/strndup).
Was gcc written by people totally unfamiliar with the concept of a compiler? I'm not talking about the guts of the thing. The math-inspired bits parsing C into an intermediate representation, running optimizer passes over it, and spitting out assembly. That gets a lot of debugging.
No, what amazes me is that "gcc --print-search-dirs" doesn't tell you where it's looking for #include files. There are only two interesting search paths the compiler has: where it looks for #include files and where it looks for libraries. Source code comes from the files named on the command line (generally in the current directory), and output files go into the current directory (or where -o says). That's it, that's all that's interesting.
When the compiler tries to exec another tool (like ld) it should use $PATH just like any other program would. Not that gcc does this, because it's special. Short-bus special. Feed it "--print-prog-name=ld" and boggle at the stupidity. It should not be messing with this. It should execvp like everybody else. I don't _care_ if it doesn't work on Solaris. Fix Solaris.
And don't start on spec files. Spec files were introduced to try to overcome some of the gaping design flaws in gcc, but the spec file 4.1 actually _uses_ is hardwired into the source code because they found out it didn't really work.
In other news, libmudflap and libssp continue to be horked (they don't like cross-compiling), but they're easy enough to disable once you figure out how.
Sigh. You wonder why buildroot and crosstool and CLFS spent so much effort building toolchains, like it's black magic? It's not black magic, it's just that the complexity of gcc's internals (like the optimizer pass) scared away everybody who knows basic engineering (like using $PATH instead of reinventing the darn wheel). The solution is to RIP STUFF OUT until you reach bits that work.
Spent most of the day at Carnegie Mellon. My co-worker Beth dragged me to a panel about Venture Capital, where Joel the Venture Capitalist Who Owns TimeSys was one of the panelists. We pondered telling him TimeSys is imploding, but wound up not doing so. (It's his fault for leaving Larry in charge after Sandra told him the same thing a few months ago.) Finally got back to banging on toolchains late in the evening. (After watching Highlander episodes from Netflix: wow that series is worse than I remembered, or at least the acting is in season one.)
Previously, I got about halfway through playing whack-a-mole with the darn non-absolute paths. There's plenty of code in gcc to turn relative paths into absolute paths relative to where the gcc executable is, and there are comments that the code was made fully relocatable (so you could take a compiler and its support files, tar them up, drop them in another directory, add them to the path, and it would just work). This would let people install a cross-compiler in their home directory without needing root privileges. Good thing.
Unfortunately, gcc is written by a large number of people who are not only unaware of each other's agendas, but don't really seem to like each other. So this stuff was badly sabotaged, and fixing it up involves ripping lots of stuff out. About halfway through, I noticed the "sysroot" support, and after a bit of misleading googling thought "maybe this has something to do with making a relocatable toolchain". Turned out to be a blind alley: if you feed it a sysroot it can't find your host system's header files to build against anymore. So today I'm back to the whack-a-mole approach.
I intend to get tcc to do all this _right_, but at the moment tcc doesn't build lots of the packages I need (like an unmodified Linux kernel, and qemu, or busybox thanks to the lack of dead code elimination in the optimizer), and it also doesn't have many targets yet (I'm trying to get x86-64 to work, which isn't a platform tcc can output code for yet). So I'm getting gcc to work first, then worrying about tcc afterwards.
Ok, the _reason_ my mailserver has been down is Eric finally got his DSL replaced with a fiber connection. (He's happy. It's fast!) Unfortunately, it has a new IP address. Bye-bye 66.92.53.140, hello new one I haven't got memorized yet (71.162.243.5).
So my domain is parked on a server called "grelber" in Eric's basement, which also does the thyrsus.com domain. I've been meaning to move it to the new x86-64 server I'm setting up on my cable modem, but after having a static IP in Austin for a couple years and never doing anything with it I decided to get the server ready before upping my cable bill this time. The server isn't together yet, and it'll take another day or two to squeeze a static IP out of the cable company after I get that together.
So for now, getting my domain to point to grelber's new address is the immediate solution. First, I update my email sending and receiving scripts (which I hardwired the IP address into after a few too many experiences with flaky coffee shop DNS). Grab my email: and I have 15 new messages (including spam). I usually have a few hundred, and not having checked email in a day and a half I'd expect well over a thousand. Apparently I last checked my email shortly before the IP switch, and these trickled in before that.
Grelber has my domain's DNS, email server, web server, and an ssh shell account with a directory I backup stuff to. The only one I can access purely by IP is the ssh, and doing so shows that Eric missed updating my zone file; trivial to fix with vi, and a quick "killall -HUP named" later, doing a "dig @71.162.243.5 landley.net" shows the new address, including with -t mx so mail can flow. Except it's not that simple.
The next problem is that I have to tell the upstream .com servers where my new nameserver is before anybody will look at what that nameserver contains. I only have one nameserver because my domain only has one machine. My mailserver, web server, and my domain's ssh account are all on the same IP address, and if that machine goes down, there's nothing for the dns server to point at anyway. So the DNS server might as well be on that machine as well. This confuses the _heck_ out of the DNS infrastructure (which _insists_ on redundancy, even if what it's pointing at isn't redundant). Over the years I've mostly hit it with a lead pipe in the right places to get it to shut up, but it's cropped up again.
According to whois the domain registrar for landley.net is ns.com (it used to be tucows, but they got sold). Grubbing around on my hard drive found my login information for their web page, and after struggling with the _horrible_ user interface for a bit (when you list domains it says click on the domain you want to edit it. This is wrong; none of them are clickable, the way you edit a domain's nameserver is to back up to the previous page and select "add nameserver". Remember: you specified the domain when you logged _in_.) It says that my nameserver is grelber.thyrsus.com, which points to the old IP (66.92.53.140). Ok, edit nameserver: it'll let me change the name. I don't want to change the name, I want to change the IP. Ok, add nameserver: if I type an IP in here it says it's not a domain name. It wants me to list nameservers BY DOMAIN NAME. Ok, delete the old nameserver, add the new nameserver, "grelber.thyrsus.com", and... It fetches the old IP for it, not the new one. This is just brilliant. Where is it getting the old IP from? Does it have it cached?
Ok, let's drill down from the root servers. "dig ." lists a root server (but just a name, not an IP address), "dig @A.ROOT-SERVERS.NET com" lists the 13 .com servers, and "dig @a.gtld-servers.net thyrsus.com" is saying that thyrsus has two name servers, and for the one under thyrsus.com it's handing out a cached copy of the old IP, with a 2 day timeout. Maybe this is where ns.com is getting it?
The reason Eric's email is working is he has two nameservers, and it's failing over to the second. So, tell ns.com one of the other names under thyrsus.com: snark.thyrsus.com is a CNAME, it doesn't like that. The actual A record is on thyrsus.com, but ns.com says that doesn't resolve (perhaps it's confused by the zone delegation at the same level?). Ok, how about the alternate name on the SOA: root.thyrsus.com... Nope, it won't take that either.
As far as I can tell, ns.com is refusing to fail over to the second nameserver. Did I mention how totally broken it is for the nameserver infrastructure to keep track of nameservers BY NAME? Try sticking "nameserver landley.net" in your /etc/resolv.conf and see how far you get. The point of DNS is to turn names into numbers, not to turn names into OTHER NAMES. They had to bolt on an "additional section" to the protocol to make this work at ALL, and that passes in cached information that you have no way to tell it to invalidate and refresh. (You've just got to wait for it to time out.) But somehow don't consider this a design flaw. Sigh.
Right. My co-worker Piggy synced my domain to a completely unrelated nameserver in Chicago, and ns.com managed to find THAT one and use it. And I'm already getting spam again.
I still need to get the new server up and running...
My mailserver is down. Ordinarily this is the case for 5 minutes, but today it's been down all day. Annoying.
It's definitely toolchain week here. I've now gotten a cross toolchain for x86-64 probably working based on the Cross Linux From Scratch instructions, with a couple of caveats.
First, I say "probably" because although it built a "hello world" that claims to be an x86-64 binary and won't run on my laptop, it turns out QEMU doesn't currently have an x86_64 application emulation mode. So I can't test it.
Secondly, the paths are all horked. I had to use strace to figure out where it was looking for ld64-uClibc.so.0 and create a symlink so it could find the sucker. But more to the point, long ago a guy at cygnus (now at codeweavers) made most of gcc work with relative paths (relative to wherever the gcc executable lives), and that infrastructure is still there. So even though the makefile and ./configure and a few checks in the source code barf if you don't give it absolute paths, all you have to do to make the code work is REMOVE THE CHECKS. Yes, that's deeply stupid. Welcome to a project maintained by the Free Software Foundation.
So I've backed up and I'm ripping the paths a new one. This isn't a complete fix, this is just a whack on the head with a hammer. All _sorts_ of things are wrong with this package. (For example, gcc's tendency to look in a dozen different places for things it itself installed is hilarious. This is clearly a project that has no idea what it's doing. Why binutils and gcc are two different packages is another open question.)
Huh, turns out that binutils has grown an absolute path too. And fixing that is nontrivial, because binutils uses libtool. Did I mention that libtool is this hideous useless thing that serves no purpose on Linux but to screw things up? Literally. The point of libtool is to make building shared libraries work the same on lots of different platforms (like AIX and Solaris), and it does this by applying the limitations of each to all of them. It creates "equality" by making them _all_ suck. Again, deeply stupid. In this case, libtool insists that ranlib must have an absolute path (which is wrong). Did I also mention that libtool, like configure, is an enormous ugly shell script?
Memo to self: looking at process_options() in gcc/toplev.c, and it's bedtime.
I'm not maintaining tcc. I did convert the tcc cvs to mercurial (tailor apparently works), applied some of the outstanding patches to it, and anybody who wants to can clone the tree with mercurial (hg clone http-static://landley.net/code/tinycc) or grab the tarball.
But that's not the same as maintaining it. The marvelous thing about mercurial is anybody who wants to can clone my tree and apply their own patches, and trivially re-sync the trees. (With modern distributed source control, the revision history can branch and come back together. It doesn't have to be linear. This is cool.)
The server saga continues. Ubuntu server is evil; there's _way_ too much Debian in it.
So with the SATA power adapter, the new x86-64 server's hardware is finally happy. (Modulo having one too few power outlets in the area so I have to unplug Fade's computer's speakers, and still having to borrow Fade's monitor to poke at the thing.) The software, however, is being evil.
So kubuntu-x86-64 doesn't actually seem to boot on the thing. I don't know why, it's something wrong with the bootloader on the cd which refuses to hand off control to any kernels. The very first time I tried to use the CD I managed to get memtest86 to run, but I can't even reproduce that anymore. It refuses to load any kernel. Since I refuse to touch anything with Gnome on it, I've fallen back to Ubuntu Server.
That at least installed, and is sitting on the drive right now, being useless. Why useless? Because it didn't actually install anything useful like sshd or gcc. And without gcc I can't even compile other stuff from source, so I have to use their package management system. Unfortunately, all the decent wrappers that hide the horrors of dpkg are GUI based, and from the command line I've only got dpkg, dselect, and apt-get.
The first, dpkg, won't download anything. I moved on to apt-get but can't get it to list the available packages. (I can get dpkg to show me what's already installed, but not what's available to install.) This was _after_ fishing around in /etc/apt to rip out the line that said "first check the CD that the install said was done and which went back in the pile of rewritables." It's ON THE FTP SERVER. Go get it from there! Grrr.
If I could have gotten apt-get to install sshd I could at least fiddle with it from my laptop (I can leave Fade's speakers unplugged for a few hours, but she does kind of need her monitor), but apparently it's not called "sshd" and I can't get a list. (No biggie, I prefer dropbear anyway.) So I really need a compiler.
"apt-get install gcc-4.1" managed to claim that the repository is broken (this package is referred to by other packages, but doesn't exist!), but just plain "apt-get install gcc" installed 4.0. Ok, I'll live. But what it did _not_ install was any of the header files like stdlib.h. (Obviously a compiler has no need of that.) I vaguely remembered (from previous use of adept on kubuntu) that it's in a glibc-devel package, but that's not the name of it, nor is it glibc-dev, devel-glibc, or dev-glibc. Right, time for dselect.
Ok, whoever created dselect needs to be harmed. Step 1, "I'm about to rewrite your config file, replacing ubuntu's servers with debian's servers." NO, you are not, manage to abort that in time, moving on. Step 2, hit enter a few times hoping to get to a list of packages, but instead it goes "I'm about to upgrade lots of the existing packages you just installed." Alright, fine. I didn't ask for this, but I'll live. When it's done with that, go _back_ into the darn menu and hunt around for a package list. When I find it, it's an _evil_ package list that's almost completely illegible and starts by telling me how excited it is about new arrivals, like packages for supporting the HFS filesystem. I do not own a macintosh, go away. Cursoring around isn't helping me find whatever glibc-dev is called, so I think for a minute and go "the bastards who think this is a good UI probably took vi as a model" and sure enough, forward slash brings up search. Feed it "glibc" and it gives me a documentation package (glibc-doc). Right, progress! Forward slash and enter to bring up the next one... and that is apparently _not_ how you do next here. Right, forward slash type glibc again, hit enter... It brings up glibc-doc again. Three times in a row. Brilliant. Give up on search, page down a lot and look through the actual entries. Eventually find a cluster of development shared libraries, and the one I want is "libc6-devel". Sigh. Ok, cursor over that one and hit enter: AND IT STARTS INSTALLING THE HFS PACKAGE! The one it was so excited about earlier, which I NEVER SELECTED. It's not installing the one I was on, it's installing _three_ other random packages that I have to remove now, and it's lost my place so I'd have to go find it again.
Yank server's plug from wall. Plug Fade's speakers/monitor back in to her computer, (and her keyboard, which was handy). Drag laptop to coffee shop, rant a bit into development log, then pull up Cross Linux From Scratch and go BUILD A SYSTEM FROM SOURCE for the new server.
I hate Debian. At any point, if Ubuntu starts acting too much like Debian, I rip that bit out. I'll try again with kubuntu 6.10, since I want to be able to bring up a remote VNC console for the server anyway so having kde on it (not normally running) isn't actually a _bad_ thing. It's got a 250 gig hard drive, I can afford the space for KDE.
(Hey, and more fun: I can't seem to install even the server x86-64 image on qemu-system-x86_64. It hangs at the bootloader. Wheee. Pinging the list. Not much fun cross-compiling my own system if I can't get a reference "known good" system working under the emulator...)
Ah, telecommuting again. I can actually get stuff done today. Yay!
Almost done with BusyBox 1.2.2, then I can get out of Denis' way.
Apparently, SuSE is thinking of dropping Reiserfs in favor of ext3. Cool. The author later clarified that this isn't likely to happen before SuSE 10.3, and that it has nothing to do with Hans Reiser's continual flamewars on the kernel list (I last participated in this particular flamewar four years ago, and apparently nothing's changed since then), or Hans' current legal troubles. It's that Reiser3 has been abandoned by namesys, and Reiser4 is a totally different filesystem they're not interested in moving to.
Grabbed the SATA power adapter on the way in this morning, and Fade took it home with her when she met me for lunch. Once I get the server installed, I have to configure sshd, httpd, dnsd, and smtpd to move my domain. And to replace the linksys I have to configure dhcpd and firewall rules, plus dig up a second ethernet card and a hub. Then call the cable modem company and get a static IP. Then migrate the domain...
But first? Geek night.
Or not. Dealing with the Pittsburgh bus system is bad enough on my morning commute, but when the bus doesn't come for 20 minutes while it's raining and I have to transfer to a second bus to get where I'm going and it's an optional trip? I should have just driven downtown, except that there's no parking anywhere in Pittsburgh. (This place said it had parking, but I don't actually believe them.)
It's evening, I can work on toybox. Making decent progress getting the erstwhile bbsh adapted. (It's not quite "ported over", more cutting and pasting snippets around a different core, and backfilling library functions as I find I need them.)
I really need to get busybox 1.2.2 out today, just to be done with it. Right...
Saga of the server: the first order arrived on thursday, and I found out newegg's idea of a "barebones" x86-64 system does not include a processor. Ordered one, and it showed up tuesday. Installed that, but all my spare cdrom drives and keyboards were in storage. Brought that home today, borrowed Fade's monitor, burned an Ubuntu x86-64 server install CD (for some reason the kubuntu CD I tried burning first is unhappy; the bootloader comes up but it refuses to load any kernel), ran through the install, and it hung with a blank screen after hardware detection. Figured out why: /sys/block doesn't have any hard drives. Rummage around online to confirm that SATA is not powered through the data cable (even though it's roughly the same PHY as USB 2.0, and they're all using the transceiver technology developed for gigabit ethernet). Nope, it needs a power connector which needs a power adapter, and if the drive came with one of those it probably got thrown out in the box. Right. Off to the corner computer store in the morning to pick up a SATA power adapter.
Toybox lurches forward, slowly. I may be getting more done on the CLFS/QEMU update to Firmware Linux. Currently trying to bolt 2.6.18 kernel headers onto that, I may need a 2.6.19-pre snapshot.
Here's a news article on that sinkhole next to the Timesys building that I never posted when it happened a few months back. The street's long since recovered (although when they filled it in and put in a new street, they put a manhole cover where the sinkhole used to be, and had people down it for a couple weeks afterwards repairing various things that had gone wonky). Our front elevator eventually dried out, and has been back in service for weeks now. The incident seems entirely forgotten now (welcome to Pittsburgh: the plumbing leaks, deal with it), but I was clearing old emails marked "todo" and the link was in one of them.
Once again, the bus was standing room only when I got on, and about another 20 people got on after me. Don't ask me how. The white line behind the driver means _nothing_ in this town; there were a half-dozen people who couldn't get back behind it and the bus continued on, spending a good 10 minutes behind a garbage truck that would stop every 50 feet to pick up trash. The frightening part is I'm getting used to it.
Aaaand my boss (David Mandala) quit. Darn it. The department saw this coming a couple weeks ago, we just hoped we were wrong. Christian quit too. Let's see, since I arrived Brian, Gary, Sam, Pragnesh, Walt... There's a downside to the economy finally unfreezing a bit after six years of Bush.
Hmmm... Do I need a multiplexer applet in toybox? It's nice for testing, but the standalone shell pretty much handles this already. On the other hand, removing the multiplexer doesn't really save any space since the functionality still has to be there for argv[0] instead of argv[1]...
Alas, when I got to the coffee shop at 3pm there was a sign saying it would close at 4 for Yom Kippur. Didn't even pull out my laptop, just read my book. They finally kicked me out at 4:30 and I wandered to the hoagie place (which also has wireless), but on Sundays they close at 6. I'm still not much interested in 61C since I figured out they intentionally removed all but 2 outlets to discourage people with laptops from hanging around, so I came home.
Alas, it's hard to work here, due to four cats. At the chair at my desk Dragon claws or bites me every 30 seconds, demanding attention. Peejee steals my chair whenever I get up from it. I retreat to the bathroom and Aubrey goes "oh, my turn", and meows at the door if I lock her out. (George is happily lying on the couch watching all this.)
Anyway, grabbed my first drop of bbsh.c as a template for toysh.c. This may seem a bit odd since I was previously working on a _third_ drop of bbsh, but since I'm designing toybox differently I have to adapt my old BusyBox code fairly extensively, plus there's a lot of infrastructure I have to recreate anyway. So I might as well get the tiny one going first, and look at the diff between drop 1 and drop 2, and then drop 3.
In some ways, the new project is as much like nash as it is like BusyBox. Nash is a minimalist shell with things like mount bolted onto it (part of Red Hat's mkinitrd package). With toybox, the first interesting applet is a shell and although the other stuff can be called without going through the shell, I'm likely to use the shell a lot testing it. I'm building in standalone shell mode from the ground up and I'm putting into practice my earlier comments about nofork annotation of applets, so there's none of BusyBox's strangeness with duplicate "echo" implementations and such. Here it's not a question of cleaning up apps, but designing and labeling them properly from the start. I don't need the weird mini-applets of lash, either: just repurpose the main applet infrastructure with some nofork applets (such as "cd" and "exit") which are also selected by CFG_SH (which is always defined, 0/1 ala ENABLE).
The three main things remaining before I can get toysh drop 1 up and running are error_exit (need to teach it to take multiple arguments ala printf), get_optflags (a totally new implementation I've had in mind for a while, which doesn't call getopt() and doesn't permute the command line arguments), and exec_toy (need to work out how "." and "exec" fit in with the nofork annotation).
The project has developed two subdirectories: "toys" where applets reside, and "lib" where code shared between applets goes. At the top are a few files like "main.c" and "toys.h", plus the Makefile, LICENSE, and todo.txt. There's a lib/lib.h that's included from toys.h.
I would dearly love to wander off into a cave (well, coffee shop) for a couple weeks to get this project properly launched, but I doubt TimeSys would be up for that. Oh well.
Resigned as BusyBox maintainer. I just can't do it any more. I'm putting together 1.2.2, and then focusing on Firmware Linux and Toybox and doing a proper miniconfig shrinker (as C code rather than a shell script), and maybe poking at qemu and tcc a bit... And of course catching up on everything _else_ I'm supposed to do at TimeSys.
Server still hasn't shown up. Maybe today.
Still taking a break from BusyBox. I suspect in another week or three Denis will have made it unrecognizable and I'll just hand maintainership over to him. He has better technical judgement than Bernhard, but the downside is he'll happily add 20 lines of code to fix a minor defect, and the end result is going to be a very bloated BusyBox. (I despise shared SVN. I don't get to decide what to apply.)
In the meantime, I've started a new project, "toybox". It started with a single executable that could do different things when called by different names (which gzip was doing back in 1993, long before BusyBox even existed). Now I'm adding "df" to it, since I was in the middle of writing a new one anyway.
According to the tracking web page, the parts for the new server have made it to Pennsylvania. Should arrive by the time I get home from work. I should bug the cable modem people about upgrading to a pro account.
Welcome to my nervous breakdown.
Apparently yesterday's teleconference was just me, Erik, and the SFLC lawyers, without Bruce actually being in it. On the face of it, a good thing. Unfortunately, the outcome was the last straw. Yes, shipping the next version of the project GPLv2 only is allowed. Yes, adding license notices to each individual file I change to show that the new derived work is GPLv2 only is allowed too. But removing the old (now incorrect) license notices is not allowed. What the...?
It turns out that GPLv2 is sloppily written in places. Section 1 says you must keep license notices intact when you redistribute verbatim copies. And even though that section seems intended to apply _only_ to verbatim redistribution, section 2 (about derived works) incorporates section 1 by reference, so you can't change the license notices in a derived work either unless you get permission from the original author. So if you drop a dual license that mentions the GPL, you have to leave the old dual license notice with an addendum saying that the old notice is not correct for the current file.
That's REALLY stupid. I don't personally think it's actually enforceable. The lawyers themselves compared it to jaywalking in terms of severity. But I no longer care.
The problem is my stress level from having to deal with Bruce for the past couple weeks. Yes, this is a technicality that doesn't actually _mean_ anything, and the lawyers confirmed that Bruce is wrong in substance. But during the call I got a mental image of Bruce pointing and laughing, and I just can't stand working on any code base he's ever had anything to do with anymore. He's still taking credit for the success of a project he hasn't touched in ten years, dictating terms to people who are doing real work when he is not, and the end result is that it's just no fun working on BusyBox anymore. He took my favorite hobby and crapped all over it, as is his wont.
Maybe I'll feel better about it in a few days, but I doubt it. The lawyers didn't quite say in so many words that the only way to fix the license notices is to start over from scratch, but it got me thinking. I can probably recreate all the parts of busybox I actually use in about a year, and it might not even take that long if I re-use bits of my code and Erik's code that I've verified nobody else touched (so no other clearances needed). And I have a list of a half-dozen global cleanups I've been meaning to do that I can just get _right_ in the new project from the start.
Considering how often I rewrite things anyway (I've rewritten mount.c three times now), it's awfully tempting...
In other news, I tracked down how to get a cable modem setup like I had in Austin for approximately the same price, and Fade ordered a 64 bit server last night. We can move fadeaccompli.net and landley.net onto that server, and if I keep working on my new toybox project I can throw it on there too. (All a mercurial server really needs is http...)
So, about half an hour ago I finally got to sit down in a coffee shop with my laptop for the first time in weeks, and since then I've had three phone calls. No phone calls all day before that. Three different phone calls since I sat down and tried to concentrate.
I'm going to go read a book.
Why won't Bruce go away? Now there's a teleconference with the SFLC on Monday to try to placate him still more. Why do I have to be involved? I'm happy ignoring the prima donna, but the SFLC apparently still considers him relevant. Go figure.
I suppose this is his way of preventing any GPLv2-only software from getting written: throw continuous temper tantrums to suck up the spare time of people who would otherwise actually be accomplishing something he doesn't want to happen. (He's not affecting the project's direction, but he's certainly slowing it down...)
In other news, switching argc and argv over to globals _is_ making the thing bigger. Annoying, that. Make allnoconfig grew 38 bytes, only 8 of which I can easily account for. I'll have to look at the assembly output, but not tonight...
The darn bus keeps finding new ways to prevent me from working during my hour-plus commute each way. Today, the bus has a driver that seems to think about 5 g's of acceleration is the correct amount for each stop. It's also unclear whether or not this bus actually has shock absorbers.
I love my job. I despise the commute.
My blackfin board arrived! (It was waiting for me on my chair at work.) Technically it says it's a coldfire, but it's a piece of nommu test hardware. Cool! I can test out the serial terminal size probing thingy we've been discussing on the list. (I've been teaching bbsh to do it, but haven't had a test thing since normal xterms and ssh and such get this right anyway, only serial consoles and such don't. Yay test environment.)
Of course, in order to test _that_ I'll need to get a nommu Linux running on the hardware (actually, it may have one), and a cross-compiler that can build BusyBox for it (it has CDs... The documentation talks about using it from Windows 98 or XP, great...)
Got BusyBox's license.html and LICENSE files updated for GPLv2 only, and got the thumbs up on the wording from the SFLC guys. (Bruce Perens can go hang.) I also added a copyright notice to BusyBox itself, which the SFLC guys suggested because it makes license enforcement easier and potentially ups the penalties. (It's also an extra 128 bytes. Need to take a look at reclaiming that somehow.)
The people sending me the blackfin board apparently have my old address (my fault, sorry) and mail forwarding doesn't work for UPS or Fed Ex. This would explain why the board hasn't arrived. I pointed them at my work address (which isn't going anywhere, and I still don't quite remember the new apartment's address off the top of my head).
Finally, I'm getting back to work on bbsh. A neat trick was suggested on the mailing list of using ansi terminal size querying sequences if you don't otherwise know your terminal size, and the obvious place to put that is bbsh's interactive mode startup code. (Set $COLUMNS and $LINES from that.)
Arrrrr! The pirate's favorite consonant!
Fade rescued me from dealing with another Bruce eruption and dragged me out to see Little Miss Sunshine. Not exactly a relaxing movie to watch, but reasonably entertaining subtle dark comedy.
Pondering adding dmcrypt to busybox. Having to configure it at work, and it's waaaaay more complicated than it has any real reason to be. (It wants chunks of e2fsprogs in order to build. Why?)
Sigh. Didn't get a single useful thing done this weekend. Instead I spent it conducting a forensic analysis of the BusyBox code, to either prove that BusyBox no longer contains any code copyrighted by Bruce Perens, or else identify and remove any such code. The current snapshot is here, and I need about one more day to finish it, but I have work tomorrow.
Bruce Perens cropped up on the mailing list after about ten years of silence, and managed to work through my goodwill in a half-dozen emails. By Friday, I had a reply to him that I saved into my drafts folder and sat on for over a day to cool off. And then I started on the above analysis.
This week I learned the words "schlemiel" and "schlimazel". (No, there's no standard spelling for these, it seems. Different alphabet.) The best definition I heard for schlemiel is "a person so inept even inanimate objects pick on them", although the classic definition is apparently someone who goes into a fancy restaurant and makes the waiter spill a bowl of soup. The schlimazel is the guy the soup lands on. In short, a schlemiel is someone who can't help but make a mess out of any situation they involve themselves in.
This forensic analysis is my way of declining to be Bruce's schlimazel.
Yes, I'm merging GPLv2 only code into BusyBox, and I don't care what Bruce says. It's looking like the next release (December) will be BusyBox 2.0.0 instead of 1.3.0, what with the simplified license, the new unified shell, scripts/individual, possibly the new build infrastructure... Have to ask Erik what he thinks about bumping the major number.
For those of you I just confused: Erik Andersen made BusyBox what it is today. Bruce had declared BusyBox finished and abandoned it 2 years before Erik started looking around for an embedded command line package and found out he had to make one. If BusyBox hadn't existed, Erik would have created it. I have great respect for Erik, because he's earned it.
Bruce meanwhile let the project rot without finding a successor, either because he didn't care or because he didn't see any further potential in it. In the seven years Erik maintained BusyBox, Bruce didn't post to the mailing list even once. Bruce's web page still points to Lineo as the home of the project, which hasn't been the case since 2001, and he apparently only knew the project still existed because, according to oldnews.html, Erik asked him for permission to start a BusyBox website. And after almost a full decade (1996 to 2006), he shows up out of nowhere and starts dictating terms? How DARE he take credit for what Erik did in his absence?
Even I've already been working on BusyBox twice as long as Bruce ever did. I've written far more original code for the project, integrated more outside code, and put far more total time into the project. Reviewing the history, the idea for one binary that had different functions with different names apparently came from gzip (one of the projects BusyBox sucked in); Bruce didn't even invent _that_.
It's quite possible Bruce didn't know that BusyBox has regularly rewritten existing code over the years in a continuous attempt to come up with improved versions. Heck, I've rewritten mount.c three times now. He came in _thinking_ BusyBox still used his code. But guess what?
We still use chunks of the deflate and inflate engines from gzip-1.2.4, but he didn't write those and he didn't even trim them down for BusyBox. (Other people did; he was Debian maintainer at the time, before he managed to piss them off, and had lots of volunteers.) A chunk of mt.c (the magnetic tape drive control command) is unchanged from Erik's first release of BusyBox, apparently because the sucker hasn't been significantly worked on since 2001 and should probably be removed anyway. Three lines of terminal control code in more.c are that old (I can't prove they _weren't_ added after Bruce's tenure), three lines of mount.c seem pretty much _coincidentally_ the same, and I can't prove where chunks of stack_machine() in dc.c came from. Oh, and we use the same usage message for "sync". But so far? That's all the even potentially Bruce-contaminated code I've found. I'll post a more complete list (and possibly commit a patch, if necessary) after I finish the analysis.
And after that, maybe I can get back to bbsh.
Shortly after midnight I got the OK from Erik Andersen to simplify the BusyBox license to GPLv2 only. And did so.
So, pipes and redirects. I should probably move either pipes/redirects or quoting out to a separate file and get one working before tackling the other, but they're both needed to have a working shell. Yesterday I found out about the <<< and <<- syntax while reading a history of the Bourne shell in Unix v7, and added it to my to-implement list.
Redirects come with a filename attached, and I'm pondering just opening the file immediately and saving the filehandle rather than doing an xstrndup() on the filename. Unfortunately, the process context is wrong: the filehandle needs to exist in the child process, not in the parent. Having parse_pipeline() open the files rather than run_pipeline() doesn't inherently bother me too much (after all, a synonym for "touch file" is "> file"), but it leaves pending state to be cleaned up in _each_fork_, which sucks. (Memory allocations go away when you exec, but open filehandles you can't mark close-on-exec do not.)
I also don't want to modify the original string we're called with, so I can use it to display job status, so I can't just stick a null terminator into it and use the original. (Unless job control, wildcards, and environment variables are all disabled, then I could do so. Wildcards and environment variables because substituting those can make the string bigger.)
Hmmm... It sounds like what I want to do is parse the string into argv[0] and then check for redirects at the end of argv at the start of run(). Hmmm... Within a pipe segment you can't have anything after redirects except more redirects, so all I need is an index of the first redirect. Hmmm... "touch one.txt two.txt three.txt; cat < *.txt" gives an error message, "ambiguous redirect". Bash has so _many_ special cases, I'm guessing they're noticing that the wildcard expansion is for a redirect... which you would _have_ to do in order to avoid being screwed up by a file named ">>", of course. Blah.
I've been previously warned that glob(3) is a pain, and wanted to only have one instance of it if I could get away with it. I guess I'll have to investigate the pain more closely before deciding how to proceed. Possibly I can have some kind of wrapper for it in libbb that gives useful results if it doesn't want to naturally...
Of course if I just do an open() rather than fopen(), most of the filehandles I'm opening should be marked close-on-exec, but doing that to the redirect filehandles would defeat the purpose here (they have to survive the exec). So I'd need to track them and close them after the fork, which means the parent winds up cleaning up after each child by closing files after fork() anyway.
Okay, quote parsing is a pain in the eyeballs.
When parsing quotes you have to track the starting character. " ends with ", ' ends with ', ` ends with `, and so on. Except that sometimes they can nest, meaning you get "` `", so you need a quote stack. Ok, there's only about a half-dozen of them, no problem, right?
With $() you can nest the suckers arbitrarily deep, ala "$(echo "$(echo "$(echo "hello")")")", which means that a finite quote parsing stack introduces an arbitrary limit. So far, I've avoided arbitrary limits. But I also don't want to allocate a linked list to hold ONE BYTE, nor do I want to deal with recursion at quite that point in the function (it would be _awkward_ to structure the function(s) that way).
I could declare a global char array of 1024 bytes or so to handle my stack, but that's still an arbitrary limit. I suppose I could realloc() the thing to avoid the limit, but I wince at every allocation thinking how it affects reliability on nommu systems.
Hmmm... Thinking about it, not _every_ quote can nest ("'" is a literal ') and the ones that can involve context switches. Parsing $( ) _is_ a function call, so recursion can handle that one naturally. What about ` (which can nest if you escape interior ` occurrences)? Let's see...
Ok, "echo `echo \`echo \\`hello\\`\``" is a syntax error, but "echo `echo \`echo \`hello\`\``" works. Boggle. Hmmm... But "echo `echo \`echo \`echo \`hello\`\`\``" says that hello is not found, so it's not actually nesting... Ah! "echo `echo \`echo \\\`echo hello world\\\`\``"
Okay... That's still recursively parsed, although I need to stare at it more to work out the ordering of operations. (echo `echo \`echo "\\\`echo hello world\\\`"\`` : ok, in retrospect I should have expected 'hello world' out of that because I know ` works inside ".) And $() is slightly less evilly recursively parsed...
Lunchtime.
It has been pointed out to me that pumpkin is a fruit, not a vegetable. You have been warned.
The Pittsburgh LUG (or at least bits of it) is meeting at TimeSys tonight. I haven't managed to get out to any of the local Pittsburgh events like Geek Night, but apparently they're coming to me.
I find it amusing that the open source people are the ones who pay close attention to licensing, while the proprietary people try to avoid reading them. (Always fun to point out to a Windows person that the license to a piece of software they're using expired over a year ago.)
Terminal control. Was it invented by crazed weasels during a drunken binge, or did it just sort of coagulate over time without any specific malice involved?
I'm trying to learn this stuff. I've brushed against it before enough to make it work, but what the heck _is_ a process group ID, or a session ID? Learning it involves reading the man pages for setpgid, tcgetpgrp, setsid (the system call, not the executable), what sending SIGTTIN actually means, and then what handlers I need for what other signals.
I want to understand what all this _means_, and why each step is needed. I have the source code to at least five shells (the four in BusyBox, plus bash), and the Linux kernel. This could take a while.
What I want to make work is that when the shell runs, it should get control of the terminal (which seems to mean that it gets input and ctrl-c). When a child app runs, _it_ gets the input and ctrl-c, and when that child app exits the shell wakes up again. I'm slightly fuzzy as to why this is brain surgery.
I'm amused by the number of console windows I have open, _all_ of which have something important to the current mess in them (shell/bbsh.c, shell/lash.c, bash's jobs.c, the setpgrp(2) and setsid(2) man pages, the kernel's kernel/sys.c, kernel/exit.c, and include/linux/sched.h...). This is not counting the wikipedia entry that Google brought up, which led to an article on the subject that looks good.
It's coming up on 5 am, I should go to bed...
I don't know why I bother to suspend my laptop anymore. For some reason, the new Ubuntu decides about half the time to do a normal bootup instead of a resume, even though the suspend seemed quite successful. Sigh.
Trying to get the new and fluffier bbsh prototype to actually do something useful, plus making sure that switching all the features off puts it down near the size of the previous minimalistic version. I should make a test suite for bbsh.
Now I need to rewrite "date". Wheee...
Somewhat under the weather today. Went home early from work and collapsed in bed for three hours, then got up and fixed a sed bug. Same old, same old...
Denis Vlasenko has SVN access to BusyBox now, and is doing good things with it. (And breaking the build, but this happens...) I should be banging on udhcpc, but I'm banging on bbsh. Trying to work out how I want to handle pipelines.
blah ; blah | (blah ; blah) | while read blah; do blah; blah
Ah! Now I remember one of the big projects I had to do to make bbsh work right. Fix the llist_t stuff so it can handle arbitrary structures that have a next pointer as their first member. (Because allocating two chunks of memory per list node is silly.) Except if I do that, what's left for llist.c to do? It's now simple enough that it might be better to just code it inline. Hmmm... Have to try it and see...
Here's a fun one: "sleep 5; echo hello", then hit ctrl-z to interrupt the sleep. The echo hello part happens immediately after the suspend. "sleep 5 | echo hello" the hello happens immediately, regardless of the suspend. But "(sleep 5; echo hello)", the hello outputs on resume. None of this is exactly surprising, but it is fiddly.
Ok, I got to the point where it builds, runs, but doesn't work. (With all features disabled it's about 300 bytes smaller than before, which is probably a symptom of the not-workingness. With all features enabled it's 500 bytes bigger.) And it's coming up on 4:30 am. Bedtime...
Got a drop of secure boot (the Big Project at work) delivered, and today increased the project's truck number to 2. (I.E. Piggy can now reproduce what I did, so now 2 people have to get hit by a truck for the project to be unmaintainable. Woot.)
Still need to get it using dhcp rather than a static IP. The one built into busybox... has issues. The ISC one is over 400k. Painful either way. I'd much rather fix busybox's udhcpc, but what it needs is a complete rewrite and that's a larger time commitment than I've got right now...
Spent the evening writing a stripped down summary of the 64-bit paper Eric Raymond and I have been working on for months, bounced it off PJ and LWN to see what they think.
Ok, shell design time: how to remember context in loops. You can't lseek backwards because someone can "cat filename | sh" or just enter a loop from the command line. And you can't keep the parsed shell context: "for i in one two three; do echo $i & done", you need to re-parse and potentially fork off background tasks. This ties in with function definition, you basically have to create a temporary function for everything until the end of the loop.
Right. What else defines a function block like that? while;do;done, if;then;else;fi, (;), {;}... And the contents of a function are effectively one big quoted string, because dequoting involves environment variable substitution (only way to handle "$(echo "blah")"), and the environment variables can change each time through a loop or with each function call.
So parsing this stuff and running it has to be two different passes. And I need a flag for "unterminated parse", and a way to store half-finished data. Hmmm...
Kelly and Steve are visiting town for the Mensa regional gathering over the 3-day weekend, and they dragged me and Fade along. My Mensa membership expired 5 years ago, and I've remembered why. I am INTENSELY bored. ConFluence was a lot more interesting (and had a better con suite, denser panel track, etc). I've spent a couple hours reading "Catch-22", which I picked up for 50 cents from the used book sale in the game room.
I brought my laptop because I expected to be bored, but I haven't found a decent place to use it yet. I can't quite concentrate in the game room, and the lack of ventilation in here is making me cranky and very sleepy. (Also, Catch-22 is a surprisingly tiring book to read, with the stacked tangents and nested clauses and sudden topic changes in the middle of a sentence. 90 pages in I'm ready for a nap.)
Fade's having a blast, though. She found a knitting group. Yes, knitting Mensans. (It makes more sense than the panel on astrology.)
I'm at least 15 years younger than the average age here, why do I feel like I need a nap?
It's the weird little corner cases that makes bbsh take so much time to design. These are not questions that normally come up in casual usage for most people. For example, what do I do about embedded NUL bytes in the command line? Let's see what bash does:
echo `echo -e "woot\0thing"` | od -t x1
0000000 77 6f 6f 74 74 68 69 6e 67 0a
If I just pipe the output of the inner echo to od, I get the embedded NUL, so it's going across pipes properly (which is obvious, since you can pipe binary data). So an argument is a normal C string, but a chunk of pipe data is a byte array with a length. Embedded NUL bytes are filtered out of arguments, yet the argument isn't truncated at the NUL byte.
Trying to get the svn->hg script finished involves hitting all these strange little details in things like sed, where applying a range to a command will match _multiple_ ranges, and EOF counts as a universal ending condition. So trying to use sed -ne '/^$/,/ ---*/p' to print just the comments part of one of the svn commit files I made also tacks on the last chunk of the file, from the last blank line to EOF. What you have to do is sed -ne '/^$/,${p;/ ---*/q}' instead.
That was the _easy_ one. Now I need to figure out a way around the one second timestamp granularity in the filesystem that's preventing mercurial from detecting all the changes I'm making. Probably I should extract the list of changed files and feed them to "touch" (since I've got the timestamp of the commit, and have been feeding that in anyway), but the problem is I'm trying to handle filenames as if they might have spaces in them, yet the list of filenames is produced by sed, so newlines break filenames but spaces don't. Try getting bash to respect that. (It's not easy.) Which is why I wasn't assembling a list of filenames but instead telling mercurial to look at the directory and save the changes.
On the other hand, I'm already assembling a list of filenames to delete (because otherwise I wind up with zero length files, which is not the same as deleted files), so I might as well assemble a list to add, and a list of the modified ones. And rename info is in there too as "(from /trunk/busybox/*"). Unfortunately, although the svn delete info seems reliable, the modify info does not. Look at svn 9 for example. The patch doesn't have a "busybox.stable" directory, but the changed file list... Ah, hang on, that's a _branch_. Right. I can filter for that...
(A while later...)
So here I am at the end of the day. You can handle filenames with spaces in them via "sed 'get_filename_list' | while read i; do echo $i; done", but guess where it goes wrong? The natural thing to do in the while loop is assemble an array of entries, but it's the recipient of a pipe which means it's a child process, which means none of its environment variables get set in the parent process! Wheee...
I'm going to bed now. This is like the fifth deeply esoteric problem of the day that took a while of boggling at just to understand, and I'm tired. Yes, writing shell scripts is always like this...
Yesterday afternoon Fade dragged me off to visit Christian and Robin, and I found out that my best Super Smash Bros character is "Princess Peach". (Go figure.)
Today I didn't get a darn thing done until 8 pm, and now I'm at the coffee shop trying to make up for lost time. Let's see...
Darn it, I can't xargs a shell function.
This week, the priority has been the secure boot thing. I've felt monumentally uninspired trying to work on it, but am attempting to bash my way past that. (Sometimes, alas, I have to be an adult. Right.)
However, what I want to _blog_ about is the fun stuff. :)
Read lots and lots of shell related stuff preparing to finally get bbsh.c in. I printed out the lash source code and read it through, twice. One of the ways lash manages to be so small is by being really crippled: for example argument boundaries aren't really tracked, so if a filename has spaces in it and you use * it won't work right. (It also has _no_ flow control statements; there's a little infrastructure for that but it's not hooked up to anything.)
I already have lots of CONFIG options I can break even the lash level of functionality into: Terminal Control, Wildcards, pipes/redirect, (under which is source and here documents), job control, environment variables (under which is local variables and synthetic variables)... Possibly quote parsing is another, I need another pass over the lash code.
And yesterday I printed the bash man page (all 97 pages of it). I'm up to page 9. (No I'm not looking at the bash source code, why would I do that?) This is going to be a looooong read.
I've also started redoing my svn->mercurial script to work on my directory of patch files rather than directly against the repository. Why? So I can fix up things like the rename svn missed, or the mess Mike made out of the uClibc directory. The svn history is broken enough that I have to adjust it by hand in places for it to make _any_ sense. (Yeah, I've wanted to make a generic svn->mercurial converter with more brains, but running it repeatedly against the live svn repository is kind of impolite, and svn is so slow it takes _hours_ to finish, so if it breaks around revision 14000 you're kind of screwed from a debugging perspective. :)
Another fun thing I'm doing is writing a script to track the relative size of "defconfig" in each release. Basically "svn update -r $next", mv busybox_unstripped busybox_old, make distclean, make defconfig, make bloatcheck > log. In a loop. It's related to the mercurial script since it's another thing you can do on your laptop iterating over the patch list...
Fade's birthday yesterday, didn't even turn my computer on. We went to Ikea, which is exhausting.
Bernhard merged a lot of stuff from his tree into the main BusyBox SVN, and managed to do so in a way that broke my downloads/patches generation script. Sigh. (I'm not mad at Bernhard, I'm mad at SVN. I actually need _more_ developers like Bernhard, but I also need to figure out a way to work with them that isn't so frustrating...)
I've poked through enough to understand what's going on at the start of sysklogd/logger.c. It could use a rewrite.
Possibly it can share code with the "kill" command that has to do a similar lookup of name->number with a fallback to itoa() if the name already is a number. (See libbb/u_signal_names.c for that.) The existing kill code has a funky fallback to "SIG" and is hardwired as to which table to use, so it needs some tweaking.
The mount code also does something similar: an array of struct {char *name; int value;}. In the mount code's case the value is a long, but it should be happy with an int (they're identical on 32 bit systems anyway, and the 64 bit systems shouldn't make up new mount APIs any time soon). Of course, does mount really need to be able to understand "--16384"? Also, that code loops using sizeof(signals) instead of an explicit NULL terminator in the array, and handles the match inside the loop with a break at the end (rather than passing back a "not found" value and testing for it; it's always funky using in-band signaling and figuring out what a safely invalid value is...).
In other areas: I'm trying to get pxe to load a Linux kernel. No, I don't want to use syslinux. We're handing out root access through the network to essentially random strangers, and when they're done with the machine we want to get it back to a known state. Step 1 is to power cycle the machine: the board should netboot and reformat its hard drive. The interesting bit is since they had root access, what if they reflashed the bios? Well the one on the motherboard has a physical write protect jumper, but anything else we don't trust, so we can't have a second PXE bios on the network card. (I don't _think_ the hard drive's bios is flashable.) The built-in bios has PXE boot capability, but it's 16 bit assembly code and maxes out at a 523k image.
So, I stripped a 2.6.17 Linux kernel down to about 480k (and can probably get it smaller with Matt Mackall's -tiny patches, but it's a start). If I can make it small enough I can feed that an initramfs with dhcpc, bbsh, and wget and have that fetch a bigger kernel from an http server into rootfs and kexec it.
The problem is, when the pxe thing grabs that and runs it, it goes "this is the old floppy bootsector which is obsolete and was removed after 2.4, use a real bootloader". Aha, it's jumping to the _start_ of the image, not the __start entry point that bootloaders use. Right, how do I trim off the garbage bootsector or patch in a jump? What QEMU does when loading via -kernel is replace the first 512 bytes of the image with its own boot sector, but that has a jump to an absolute address (not a relative jump), and QEMU and the PXE loader aren't loading the kernels at the same address.
My problem is that I can't reproduce the failure (load this image and jump to the start of it) under QEMU. QEMU is being "smart", and I can't get the dumb behavior out of it. And although the real hardware is net accessible (well, once I ssh through TimeSys' firewall, anyway) the reset switch on it isn't. It would have to be wired into the board farm in order to hook it up to an automatic reset switch toggler, and at the moment there's no room. (The new board farm that will have extra room is what we're prototyping this for in the first place, but we have to move other hardware out to make space and then install extra racks, run more cables...)
Fun...
Ok, autodocifier.pl (and the resulting perl dependency) are GOING AWAY NOW. BUSYBOX DOES NOT NEED TO DEFINE TROFF MACROS FOR DAISY-WHEEL PRINTERS!
Right.
In other news, I got the first BusyBox Weekly News up. And only 2 days late. Woot.
I spent yesterday packing, hauling boxes (and furniture and mattresses and such), going down stairs, going up stairs. Apparently today I have Rigor Mortis. But Fade and I are now in the new apartment and the cats are confused. (Well, currently just Fade, I'm at work. Presumably the cats are still confused.)
(Much later...)
Ok, I think I've wasted about as much time as I care to trying to get glibc to build in this cross-compiler. I don't really _want_ a cross toolchain that links against glibc, I just wanted to test out a known good build of the compiler and linker, but this library is just a piece of _garbage_, and I think enough separate things have broken now (despite CLFS's workarounds) that I'll just slap uClibc in place and see what I need to modify to get it all to line up.
Wow, gcc 4.1.1 has a few real thinkos in it (well, it's gcc), but several of the things that CLFS is going out of its way to work around seem to have actually been _fixed_. Needless to say, this is not what I expected out of a gcc upgrade. (I expected a better optimizer and random new breakage.)
Tomorrow, Fade and I rent a truck to move everything to the new apartment. Most of our stuff is in storage, but it's still going to require taking a day off from work. Might get some BusyBox stuff done in the evening, but don't bet on it.
Today, I catch up on BusyBox mailing list backlog and poke at FWL some more. I have to get the cross-compiler to work before I get it relocatable, but there's this huge temptation to fix things out of sequence in a way that's likely to break stuff I haven't proven works in the first place yet. It's the whole "don't pick at it" thing...
This probably means I should even build glibc and make sure I can get something to cross-compile against that (to make sure binutils and gcc check out) before retooling it for uClibc, because that's what CLFS does. So I have to configure a package I have no intention of using for anything, and there's a big temptation to skip this step, but skipping two or three steps at once leads to loooooong debugging sessions, and I haven't got time right now...
P.S. "make gcc-all" is much, much slower than "make quickstrap". And the whole genattrtab/genrecog thing is far and away the biggest memory hog of the entire build process, probably by an order of magnitude. And the generated .c file from this is a bit of a crawling horror. Then there's building for 10 minutes before noticing that the correct version of "ar" for the cross-compiler isn't in the path...
Of course the community (including me) is never going to pitch in and clean this up ourselves, because in order to get a patch accepted we have to sign over our copyrights on a physical piece of paper for the FSF to keep on file. Of the 8 gazillion open source projects I could spend my time on, I'm much more attracted to the ones that aren't pedantic bureaucrats requiring me to fill out _forms_ to participate.
I should take another look at TCC. But not this week...
Fade gets back from sword camp this evening. Woot.
Banging on Firmware Linux again. I've downloaded Cross Linux From Scratch 1.0.0rc3 and I'm going through the instructions to build x86_64 (since QEMU emulates that well enough to run the x86_64 version of ubuntu). The first goal is to use straight CLFS to get a cross-compiler I can use to build a statically linked x86-64 BusyBox I can run under QEMU's application emulation. (And, of course, a script that builds this cross-compiler reproducibly.)
Unfortunately, that's just a start. I want to make it use uClibc instead of glibc. Luckily, I also have buildroot to crib from, although untangling the nested makefiles isn't fun. However, I also want to make it relocatable instead of full of absolute paths, by which I mean: "here's a tarball containing a compiler, extract it anywhere, add the bin directory to your path and call gcc. It can find its headers and ld and so on relative to where gcc itself is."
Chris Faylor at TimeSys got one almost relocatable, modulo five things (three of which I can fix after the fact with sed, and two of which are library linking issues that should be relatively easy to fix in the build.) I need to ask exactly what he did to the sucker. (Possibly extensive source modification was involved, but I dunno yet. I do know the binutils 2.17 build is very insistent that --prefix is an absolute path. Sigh.)
So it's 9:50 pm, and I've just woken up from a nap. I want to go somewhere and spend about 4 hours programming. Unfortunately, this is Pittsburgh, so everything's either already closed or about to.
Eat n' Park is open, but they don't have electrical outlets. (I can do without internet access if I download stuff before heading out, but not without outlets. I need a bigger battery in my laptop.)
The busybox .config infrastructure, stolen from linux-kernel, is incredibly frustrating.
I have CONFIG_NITPICK. When I disable that, I want to hide a lot of configuration choices that are too obscure or have too small an impact on the resulting binary to bother most people with. People who really want to tweak their config can switch on CONFIG_NITPICK, but most people won't care.
This does NOT mean that I want all the CONFIG_NITPICK guarded symbols to be switched _off_ when they're not visible. I want them to have the default values. There's a specifier for "default value". I want it to USE it.
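What I want is for the prompt's _visibility_ to depend on NITPICK while the default still applies, i.e. something shaped roughly like this (a hypothetical Config.in fragment with made-up symbol names, not anything in the tree):

```
config NITPICK
	bool "Show obscure configuration options"
	default n

config FEATURE_OBSCURE_KNOB
	bool "Some obscure knob" if NITPICK
	default y
```

In theory a conditional prompt like that means the symbol keeps its stated default when the question isn't shown, instead of getting forced to n the way "depends on NITPICK" would do it. Whether the menuconfig code we inherited actually honors that is another question.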
I'm going to have to do surgery on menuconfig to get it to what I want again.
Sigh...
So scripts/individual now builds 198 busybox applets. 40 more to go, not counting the e2fsprogs stuff.
Spent most of the day banging on Firmware Linux documentation. I'm essentially starting Firmware Linux over, albeit based on everything I've learned so far and heavily cribbing from the previous version. I've created a mercurial repository and checked in design.html and index.html, and I'm taking the old stage 1 build script (1.1-tools-build.sh) and slowly converting it into a stage 1 build that doesn't expect to run under UML (I.E. it isn't root and it can't expect to execute binaries linked against uClibc, for reasons explained in the design documentation). (Which is very much "this is what I did and why", not "this is how the universe should look". This is documentation not legislation, and subject to change without notice as I figure out how naive/misguided/just plain stupid my previous ideas were.)
As usual when working on FWL, my todo list has gotten longer. I need to write a unifdef.c to submit to linux-kernel so "make headers_install" doesn't require a random environmental dependency (and we filter out the CONFIG_ symbols so they don't pollute the standard headers' namespace). I need to make a gene2fs (was planning on it anyway). This is really what's driving bbsh (not _just_ my annoyance at the existing code)....
Heard back from Fade again. Sword camp's working her to exhaustion and she's enjoying it immensely. I'm sorry I couldn't go, but I'm having fun working myself to exhaustion here. (And playing World of Warcraft now that her computer's free, although my old level 59 mage is no more. Started a new character, a priest named Gooficus.)
Poked Erik about a uClibc 0.9.29 release and he pointed me at the uClibc TODO list, which has a 0.9.29 section. Four specific bugs and two more nebulous todo items (both of which may be expendable, although I need to read the OLS paper on NPTL for uClibc. My printed copies are in the trunk of the car, which is in Michigan with Fade, but I have the PDF as well, so...)
I really miss doing this. TimeSys is fun, but I get so much more _done_ when hanging out at a coffee shop being self-directed without feeling guilty about whether or not I can justify what I'm doing as sufficiently work-related. Firmware Linux has _nothing_ to do with TimeSys, but it's indirectly the main reason I became BusyBox maintainer, and getting it working again is likely to result in lots of important BusyBox upgrades.
And this isn't BusyBox maintainer work, either. That's reading the list, answering questions, reviewing patches, and applying patches. Not doing new development. I have a backlog of maintainer stuff and need to go do it. (I also find that hard to do at the office because there are so many distractions, although when I had that tuesday/thursday telecommuting thing I got a lot of it done. I should look into doing that again...)
So Fade's off at sword camp in Michigan. According to Eric Raymond she made it there in time for dinner.
I spent yesterday at the Te Cafe, digging up old things. I dug up the "make standalone" patch I posted to the BusyBox list a year ago and applied it and fixed a few things up so it now builds about 2/3 of the applets in "make defconfig" as individual executables. Renamed it "individual" because we already use standalone for the standalone shell, which is sort of the opposite idea from this one and doesn't mix well. Several other random cleanups to BusyBox while I was there, although I'm eyeing the patch submission backlog and whimpering a bit. I need to tackle that this weekend.
Started the uClibc 0.9.29 changelog. I pestered Erik Andersen about when we might see a release, but he was taking the day off because it's his birthday. (He's 35.) Ok, good excuse, and gives me more time to plow through the old SVN commits.
And I dug up Firmware Linux and brushed some of the dust off. I want to make it work while cross-compiling for non-x86 platforms, which means converting its build process from User Mode Linux to QEMU. (While it's possible I could run User Mode Linux under QEMU's application emulation, this would beat the heck out of QEMU's application emulation to the point I doubt it works yet, and in any case I'd still need to reorganize stuff.)
So all that was yesterday. Today, I'm writing some FWL documentation and getting scripts/individual to build more applets. And maybe tackle the patch backlog a bit (whimper)...
Updating the FAQ again. Trying to document libbb a bit.
Darn it, we're going to need another case of Pepsi One. We need 31 of them, and Garrett only picked up a 24 pack. (We've already got more Coke Zero on order.)
The bit pattern looks like this:
01010100 01101001 01101101 01100101 01110011 01111001 01110011 00000000
And there's no way they're stacking that high on my desk, I'll need a wall of some kind to stack them up against.
I continue to get paid for this.
In other news, Samba bolts rocket engines to a turtle, but they're very nice rocket engines and the bolt placement was well thought-out.
Tags. Forgot about tags in the svn->hg converter script. Not that this info is there in svn right now (it's in the patchlist/README file instead), but I want it in the mercurial repository.
I also printed out the RFCs for WebDAV (from webdav.org), and need to read through them. Apparently, Mercurial doesn't have WebDAV support yet. I need to add this. :)
Still grinding away at the uClibc changelog, and the patch backlog from last week.
Fun thought: What we really want for https support in httpd and wget and such is the ability to shell out to stunnel. Except that the stunnel implementations I've found use OpenSSL or SSLeay which are huge. So I'm poking at libtomcrypt and printing out RFCs to see what would be required to implement this, and thinking about donating support to DropBear which already has libtomcrypt. (I'm quite happy to say the system needs stunnel on it to support this, we don't provide it, but here's a sane implementation. I don't want an external library dependency, nor would I want to import more than a file or two of libtomcrypt into BusyBox.)
On a side note, here at TimeSys we've noted that we have already achieved nine fives of uptime for the year. Woot! Still haven't found any Pepsi One. We want to make an eight by eight stack of Coke Zero and Pepsi One cans spelling out the ascii for "TimeSys" in binary (with the bottom row being the null terminator), but although we've got tons of Coke Zero, we haven't found any Pepsi One cans yet. Nor have we found anyone who would be willing to drink it.
I spent a lot of yesterday going through the 563 uClibc subversion commits (everything uClibc related between 9690 and 11187) that brought 0.9.27 to 0.9.28. I hope to finish that today and get a proper uClibc changelog entry for 0.9.28 written up before 0.9.29 ships. (Yes, I need to do the same for BusyBox for the last five versions, but it's always easier to clean up somebody else's mess than to tackle your own.)
I spent the rest of yesterday working on a svn->mercurial translation script. It mostly works now, and I left it running overnight. It made it to svn 15031, where it died trying to comprehend Mike's udhcp symlink stuff. (Actually the undoing thereof, but the earlier confusion had left garbage in the tree. I think it was svn 15027 my conversion script didn't follow.)
I was already a bit iffy about the conversion because I knew it was going to miss the renames in svn 13452, because svn itself missed the renames. (I moved Config.in and defconfig to the top level directory with "svn move", but svn still treated the new Config.in as a new add with no history. Possibly because I also deleted the directory they came out of in the same commit, but however it happened, it broke.)
Mercurial gets renames right but I'd have to stop the convert at that revision and fix it up by hand. But at this point, I might have to do that anyway...
But first, to finish the uClibc 0.9.28 changelog...
[later]... Ok, the damage to the busybox svn started at 14634, included 14635, 14640, 14787, 14792, 15010, 15027, 15031, and 15032 before it worked its way out again. And 13452 is where the missed copy was. Just a note to self for tonight's second attempt to get a mercurial version of the entire repository converted...
Back to slogging on the uClibc 0.9.28 changelog. Boy there were a lot of commits...
OLS was _fun_. Dave's car died on the way back (don't get gas on the New York tollway, it's full of rust flakes or some such).
I'm converting the BusyBox repository to Mercurial. (It was that or git, and it doesn't matter strongly which of the two because they work similarly enough that you can convert back and forth between them without losing the funky lateral branch history that makes 3-way merges work.) I picked Mercurial because it's twice as fast, 1/10th as much code, and I understand the repository layout.
OLS day 1. I am sooooo sleep deprived. (I was yesterday, too.)
At breakfast I found out that Canadian Kentucky Fried Chicken is amazingly bland. Whatever happened to the 11 herbs and spices? This had _maybe_ 3, one of which was black pepper and there was hardly any of that. Jonathan Corbet's keynote this morning was great; he's a lot younger, thinner, and blonder than I expected, but still obvious-from-a-distance geeky. Went to a filesystem tutorial that had no oxygen in the room (over 100 people breathing in a space designed for maybe 40 on a good day).
Meant to do a uClibc 0.9.27->0.9.28 changelog on the drive here yesterday, but I didn't download the right stuff. Banged on the mke2fs rewrite a bit, and started rewriting... Have to do it on the way back...
Ok, it was my fault I put my cup of tea on the floor. I put milk in my tea, and Dragon loves tea with milk in it. She's going to be _wired_ in a few minutes...
Spent lots of the day banging on BusyBox. I've started auditing libbb, which is deeply icky. For example, we have bb_full_read(), safe_read(), bb_xread(), bb_xread_all(), and bb_xread_char(). That's too many. I removed bb_xread (because only three things were using it and none of them needed to), renamed bb_xread_all to xread(), stripped the pointless bb_ prefix off the other two, removed archive_xread_all()...
All of this spun out of the bug report I got about mkswap segfaulting, which made me read mkswap.c. It wasn't close to the ugliest code in the tree, but it was still roundabout enough I threw it away and wrote a new one. The new one is 46 lines, many of them comments. But logically, it needed xwrite() and xlseek() and it turns out we haven't got those, so...
I've also got a pending change to use the xsetuid() and xsetgid() that I added to the tree yesterday. That's three changes tangled up in my tree, time to check the mess in before it gets worse...
But first, I've got to make it all compile again. (Did I mention I added -Werror to the build flags? I hate warnings. It's either a problem or it isn't. Pick one.)
Oh yeah, and we need an xclose() too. Yes, it's only a problem for nfs which is broken by design, but look at dd.c...
Should it bother me that my biggest accomplishment of the day is rotating the logs on Morris (the busybox.net/uclibc.org server)? Admittedly they hadn't been rotated since the server went up, because the logrotate .config file only matched ones ending in ".log", not ".net" or ".org", so it was only rotating error.log. But still...
For the record, since November 2004 the busybox.net log was 2.3 gigabytes and the uClibc.org log was 1.5 gigabytes. That's just successful page fetches, not 404s (which are the error.log files)...
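The fix was just widening the glob in the logrotate config to cover the vhost logs too, something like this (paths and options from memory, purely illustrative):

```
/var/log/apache/*.log /var/log/apache/*.net /var/log/apache/*.org {
	weekly
	rotate 8
	compress
	missingok
	notifempty
}
```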
I got up early so I could get in to TimeSys to prepare for the Big Meeting this afternoon. I've now been sitting here waiting for the bus for half an hour, with no bus yet. I hate Pittsburgh. Yup, writing that caused a bus to manifest, and to be fair it was only 28 minutes by my watch. I still hate Pittsburgh. I got up at 7:30 but couldn't make it to the office before 10.
So the bus I'd been waiting for just went by without stopping. Didn't even slow down. I don't know if it was full or if it couldn't see me sitting in the bus shelter, or if it just didn't care.
It rained about 6 hours ago. There are still damp patches on the sidewalk, because the humidity's so high that 6 hours isn't enough for it to evaporate. It's too humid for me to be able to clean the dust off my laptop screen without leaving streaks.
I'm looking at the dried spit on the inside of the glass in the bus shelter. (I'm now sitting on the sidewalk in front of it so the next bus can see me; the sidewalk is probably cleaner than the inside of the bus shelter due to this morning's rain. And this is the _good_ part of town, the one so expensive we can't afford to buy here and renting's a bit painful.)
I hate Pittsburgh.
Addendum: another bus finally came, and this one stopped. I found the only open seat, and five minutes later figured out why when my left arm stuck to the fresh gum on the back of the seat. Not sure how much of it's on the back of my shirt. (I checked the bottom of the seat, not up where the gum was cleverly wedged between the plastic and the metal bar at the top of the back of the seat.)
Did I mention the hating of Pittsburgh?
Addendum from Jimmy John's:
The bus let me off into a sudden downpour, but I suppose that's not Pittsburgh's fault, and no city that has a Jimmy John's in it can be _all_ bad. Best of all, I don't have to deal with the bus again (or, once I get in the office, deal with the rest of Pittsburgh) for several hours.
Work's getting busy again. I have to get a system booting from a USB stick and doing various things, and make an ssh test suite that runs on openssh and dropbear, plus finish the darn gdbtest thing that's been hanging over my head forever.
Ah, to be able to focus on BusyBox as much as I want. Yeah, it's the downtime between releases but this is when I can actually get stuff _done_ rather than focusing on an endless yet perpetually urgent stream of other people's patches.
On the bright side, I got netcat -f in to BusyBox, and it was entirely work related. Woot!
I need to remember to bring $60 cash to OLS to pick up my copy of the proceedings...
Spent most of the 4th working fiendishly on the paper with Eric.
Today decided I'd been off caffeine long enough and had a diet coke. Then I decided to rewrite the busybox version of mke2fs from scratch.
The reason I'm working on a from-scratch rewrite of mke2fs is because the code Mike Frysinger dumped into the tree is _crap_, and proof positive that he has no flipping clue what belongs in BusyBox and what doesn't. And since I can't yank his svn commit access (because it's shared with uClibc), I have to clean up after him. In the long term, I'm setting up a git tree and only merging the patches I actually want. In the short term, I'd be happier if he wasn't so darn smug about knowing I have no control over his access to the BusyBox tree. (Yes, he's already told me to "stop standing in the way of progress", apparently defined as merging steaming piles of crap code into the tree.)
Today's bus fun is that I missed my stop on the way home, adding an hour to the trip. Number of people spitting on the sidewalk right in front of me on the way to said bus: three. One of them had worked up quite a puddle. This isn't "patches of wet spit I notice on the sidewalk", or dried spit visible on the glass of the bus shelter. That's all over the place. This is people spitting as I actually walk by. As far as I can tell nobody's actually spit _on_ me yet, but it's only a matter of time...
Oh, and somebody leaning against the front of Timesys' office building as I headed out to the bus (10 pm-ish, I mentioned the caffeine) offered to buy my building entry keycard from me.
I'm officially past "starting" to hate Pittsburgh...
Since Molly moved back to Philadelphia suddenly, there's nobody to take care of the cats over sword camp. So I'm staying home, which frees up five days of vacation, one of which I'm taking today.
Of course catching up on the busybox mailing list on my vacation day is a bit strange, but it honestly is something I do for fun, so... :)
The new ubuntu has some kind of problem with the mouse. At first I thought it was the new "mouse gesture" support stuck on, but it turns out that when I click on things it thinks I've got the mouse button held down long after I release it. So all sorts of things that shouldn't be drags are. (This doesn't seem to be a mouse problem, it seems to be an X11 problem.)
So a BusyBox release went out on Friday, and I've been ignoring it ever since. Eric and Cathy are visiting, and I've been hanging out with them, and working on the 64-bit paper with Eric.
I installed Wine a few minutes ago, grabbing the prebuilt Ubuntu "Daffy Duck" binary. That seemed to go ok, so I threw in Fade's Sims II CD and clicked on AutoRun.exe. It says it can't find Z:\home\landley\AutoRunGUI.dll, a file that's in the same directory as the executable I just clicked on (which is the CD, not my home directory).
So I rummaged around on winehq.org and found the Wine User Guide (not to be confused with the Wine Users Guide, which is about porting Windows applications to Linux using the Wine library. I'm interested in running existing Windows binaries on Linux. Namely World of Warcraft, but the CD of that Fade got me for my birthday is still in storage.) The docs mentioned a file called "winecfg", which I ran; it has a drives tab with an autodetect button. This found several more drives than the installer seems to have (so why didn't the installer find them?), and I saved that and tried again.
Now it says it can't find H:\AutoRunGUI.dll. Oh yeah, big improvement.
Off to do something else, I think...
That pointed me at
Wow, I get cranky when it's time for a release, don't I?
I haven't made it through the full bugs.busybox.net backlog yet. Several of them are valid but complex enough to fix that I don't want to do it this close to a release (could create more bugs). So there should be a 1.2.1 soonish, but oh well.
(Today's going-to-work bus fun involved somebody smoking on the bus. They're not supposed to since it's such a confined space, but they don't always care and nobody can really do anything to them. And I noticed three separate people spitting on the sidewalk in a fifteen minute period. I'm really starting to despise Pittsburgh...)
So the first tuesday I haven't got permission to telecommute. I'm a bit late heading in anyway (looking at bugs.busybox.net before I go), but I should be in the office before noon (modulo getting a prescription for antibiotic eyedrops filled first).
It's hard to make puppy eyes at one's boss about extending a telecommuting arrangement when one of those eyes is all bloody...
I managed to poke my left eyeball badly enough to make it bleed on thursday, without actually being aware of it until I noticed it bleeding. (Don't ask me how.) This resulted in me getting no sleep Thursday night and my coworkers browbeating me into going to the doctor's on friday, who said it's a minor laceration (cue the Governator, "Is naht ah tumah"), which was reassuring. The anesthetic drops and dye they put in my eyes while diagnosing this made me really sleepy and basically that was the rest of Friday.
Caught up somewhat over the weekend. Reviewed about half of Erik Hovland's patches (not the world's highest signal to noise ratio, but we all have to start somewhere). Got the mdev thing rewritten (which took a while because it had some conceptual problems I had to back out and redo. The submitted patches were nice but the fourth field must be _optional_, you need to be able to select whether to run it at device creation/deletion/both, needed lots more documentation, a way to identify which device when you match a regex...)
Still not quite happy with it...
Still, tonight's snapshot has to be feature complete for 1.2.0. Time to go poke at bugs.busybox.net...
I'm on a bus with a broken air conditioner (it opens its doors every time it stops, even when nobody's getting on or off, so the bus driver can get some air). This is what I've come to expect in Pittsburgh, really. I was on a bus that broke down (after about the fifth stall), busses with very strong odors... Busses so tightly packed you can't sit down are fairly common (although this isn't one or I couldn't use my laptop).
I don't like Pittsburgh...
Got email from somebody about a BusyBox license violation. Forwarded it to the SFLC. And I got an email yesterday about a broken link on the busybox.net web page. (Mandrivia changed its name, Mandrake + trivia or some such. A company that fires its founder isn't really that interesting anymore, but oh well.) The products.html page hasn't had a decent update in a couple years now, I followed a few random links and they were dead or hadn't been updated since 2003. Sigh... Yet another todo item.
I probably couldn't keep up if I did BusyBox full time, but I'd feel better about it. And I could carve a couple days out of the week to work on new code without getting hopelessly behind on the patch fielding...
I miss the couch at metro.
And I have tuesday to work on BusyBox. I know that it's really telecommuting (I have to stay logged into the corporate IRC for example) but to me it _feels_ like having a day off. Working on BusyBox is something I do for _fun_.
Being the maintainer is less fun. I have a huge pile of projects I want to do, like bbsh, but what I have to spend my time on instead is dealing with a constant stream of other people's issues. Somebody found a bug where closing the xterm that busybox vi is running in doesn't kill it, instead it eats 100% of the CPU (going into a loop catching and ignoring SIGHUP). I got into an extended argument about how to handle faster serial speeds (Mike checked in a patch unannounced and during a feature freeze that's very ugly and feels wrong, and I need to think about how to solve it properly. The last time I ignored stuff Mike checked in, we wound up with the steaming pile of e2fs code in the tree that still hasn't been remotely cleaned up...)
Paul and Andre both found problems in mount, which is at least code I wrote and comes as almost a relief to fix. Aurelien Jacobs cares very deeply about "inline" and that resulted in me diverging into unlzma, rx, and vi again. Other people care about the %m extension to printf, and when I pointed out the perror() stuff in libbb that resulted in patches for me to field. Robert P. Day is on a cleanup binge, which is great, but why now? (I had no idea what that alias code he wanted to remove from interface.c was for, so I had to spend a couple hours researching it to make sure it didn't break the "eth0:1" syntax, which is apparently semi-broken on the Ubuntu I'm using anyway (but works fine on Red Hat 9). In the course of this I discovered that recent patches broke building on Red Hat 9, which I'll have to clean up before the release.) His changes uncovered another strangeness in interface.c, a field (inet_aftype.alen) that has an incorrect value but is apparently never used anyway...
The irony is that one of the big time consumers a few days ago was reviewing Bernhard's lash.c patch. The point of bbsh is to make lash (and the other busybox shells) go away, but I can't get time to work on it because I'm reviewing bernhard's patches to the specific existing code I'd like to replace. Now THAT is frustrating...
Didn't get nearly as much done yesterday as I wanted to. Got up, went to Ikea, stayed there for hours, came home tired, wound up going to bed around 8 pm. (The downside of being off caffeine. It's been a week, but it still hits me out of the blue sometimes...)
Slowly shoveling through the BusyBox patch heap, but new stuff keeps landing on top of the stack. I'm trying to shovel my way back to the mdev shellout patches (which I now have two implementations of to compare), but Bernhard dumped a lash patch in my lap and poked me repeatedly for not reviewing it immediately, and although I've now made it about halfway through that, now there's a thread about whether const is worth the bother (it isn't, but proving that is time consuming). These are just the big patches; the little ones pile up too. And _I_ wanna do some programming on this thing rather than just shovel other people's patches. Someday, anyway...
I installed the new kubuntu (Daffy Duck) under qemu. This took several attempts, mostly because the default 128 megs isn't enough memory and it doesn't actually tell you this during boot. I told the boot CD that I wanted a 640x480 screen instead of 1024x768 and was ignored, then went in and changed it to 800x600 after the install (to get something that doesn't have the bottom of the qemu window hidden by my task bar), and that didn't work either (although it thought it had. I had to reboot for the size change to take effect, and _now_ it's in 640x480 when I'd prefer 800x600. Even windows gets that right these days.) Oh, and going "init 1" to switch to single user text mode doesn't work with the splash screen, it hangs with the splash screen still up and your console (if any) hidden behind it. And when you modify the bootloader menu to replace "splash" with "1" it prompts you for the root password, which it never set.
On the whole, I must say I'm deeply impressed with their attention to detail, except for the part about that being the truth.
And after all that, it only has gcc 4.0, not 4.1. Of course. (Started installing it to see if it had gcc 4.1, so I could avoid building that from source to see if its optimizer did a better job without "const", but since gcc 4.0 seems to, I'm guessing that makes the point well enough... But I already _had_ 4.0 installed...)
There's a Penguicon concom meeting this weekend, but I'm skipping it. Too much traveling recently. After the May 29 post, the next two weekends were spent with Eric in Malvern working on the 64-bit paper (still not released yet, although it's mostly done), and then in Baltimore with Kelly and Steve. I write this from Ikea's food court. I love my laptop. Needs a much bigger battery, and possibly internet access through my cell phone. Perhaps when I buy my x86-64 one from linuxcertified.com...
Catching up on my reading before I tackle more BusyBox stuff. I'm reading Fabrice Bellard's Usenix paper on QEMU and Erik Andersen's old OLS paper on uClibc and BusyBox which has some nice history bits in it. Then I need to read the Serial programming HOWTO in preparation for adding setserial and pipecat to busybox (so nobody needs to use minicom to access a serial console anymore). But that's 1.3 material, and I still need to get caught up on the backlog for 1.2...
So nice to have time for this again. Hope it lasts...
One unexpected plus of having tuesday and today off to focus on BusyBox is I've been able to focus more on non-busybox stuff at work. I'm not constantly distracted by how far behind I am on BusyBox, I have time for that now!
Woot. Woot, I say. (Sip from beverage with pinky extended. I need a monocle to pull that off properly. And the "masterpiece theatre" type of room full of books (instead of boxes of books). And a bathrobe.)
I'm not catching up nearly as fast as I expected, but I'm not falling _behind_ anymore, which is a plus. And I am catching up: in addition to the pile of new stuff I got three or four old patches reviewed, and one of the new ones was darn hard (wrapping my brain back around sort corner cases), taking a good 45 minutes just to work through all the ramifications, detect collateral damage from the first patch, and set up the right test cases...
So nice to be able to do things right again...
I pestered my boss at TimeSys, David Mandala, into letting me telecommute Tuesday and Thursday so I can focus on BusyBox. Minus commute, minus distractions.
It's working _really_really_ well so far. Well, I think so. We'll see how long TimeSys lets me do it. Right now I've got two weeks to try to get caught up for the 1.2.0 release...
I need to keep a journal more regularly. I don't remember what I did three weekends ago.
Having a three day weekend off was marvelous, and I really don't want to go back to work tomorrow. I got a lot done on BusyBox, and even though in theory a chunk of my work week is devoted to busybox, that's just keeping up with maintaining the thing (reviewing and merging other people's patches and answering questions on the mailing list). I hardly get any time to do any actual development. That I have to do on weekends, and those have been overscheduled of late.
Last weekend (~may 20) I was in michigan, for organizational meetings for next year's Penguicon. The weekend before that (~may 13) I was at Eric's in Malvern working on the 64-bit paper. (We got about halfway done. I'll probably go there again _next_ weekend so we can finish it and publish the thing. If I hadn't been exhausted _this_ weekend I'd have gone to Balticon, which is where Eric and Cathy were this weekend.)
I don't remember what I did the weekend before that (May 6-7). Penguicon itself was April 21-23, so I might have had up to two weekends off in the previous five. But I'm pretty sure I was doing something for at least one of those, I just don't remember what...
Let's see, according to my sent-mail folder, May 7 is when I submitted the bloat-o-meter improvements to linux-kernel, meaning that's when I got make bloatcheck into busybox. So I got some work done that weekend. That's also when I was working on converting some applets to use global structures in preparation for bbsh NOFORK stuff (which has since gone back on the to-do heap of unresolved projects starved for time to work on them). I seem to have largely avoided my laptop the weekend before that (I sent exactly one email on the 30th, for example)...
How can having a day job working with busybox so profoundly interfere with my ability to spend enough time working on busybox? Ok, the 2.5 hour daily commute doesn't help. (I miss Metro. Comfortable couch, plenty of electrical outlets, open well after midnight, fairly reliable internet access, and Big Train Spiced Chai. I have yet to find anything in Pittsburgh that's open after 10 pm, and 80% of the places that _claim_ to have free wireless have at best a router with no ISP connection.)
I want time to get up to speed on Gentoo Embedded, and glue the good bits of Firmware Linux onto it. I want to try out the patches that fix the uClibc build under gcc 4.0, and get x86-64 ubuntu installed on my laptop so I can try building and testing 64 bit busybox under it. I want to play with cross-compiling. I want to plow through the zillions of things on my busybox todo list. I want to finish the paper with Eric, and write up a bunch of other ideas I have floating around...
Instead, tomorrow I go back to automating a functional verification testing script for cross-compiled gdbserver. Wheee...
Working from home today, catching up on the BusyBox backlog. After today, I have a three day weekend to devote to BusyBox. I can actually get stuff _done_. What a concept.
The Linux Weekly News guys have asked me to write an article on BusyBox, comparable to the recent article on xfce. Cool. Hope TimeSys doesn't object.
It seems bbsh is becoming a priority. And I need to do the menuconfig update to 2.6.16 so I can put miniconfig into busybox. And I need to finish the 64-bit transition paper I'm writing with Eric, on top of the lwn.net thing. On top of catching up with my 500 back emails in busybox.
Five large projects, three days... Okay...
I hate gdbserver. Its user interface is _stupid_. It speaks a simple serial protocol, but is completely incapable of speaking it through stdin/stdout.
Broke down and did some busybox work today. I feel _so_ much better...
Forcing myself to sit down and work on automating the networked gdb verification test suite for TimeSys. This means I haven't done any busybox work today, and only did a tiny bit yesterday (mostly on the bus or after hours).
I'm getting withdrawal symptoms.
Went to Michigan and back for Penguicon this weekend with Garrett and Molly. Got a bit of work done in the car (5 hour drive each way, Garrett was driving).
If I hadn't already been committed to the Penguicon concom and board meetings, I'd have spent the weekend with Eric working on our 64 bit transition paper. And if I hadn't had to do that, I'd have spent the weekend catching up on BusyBox. (And if I couldn't do that, maybe I'd finally have had a chance to unpack some of the stuff from storage. Or just _sleep_...)
I miss Austin. I'm a bit vulnerable to "Seasonal Affective Disorder", which is medical speak for "it gets dark long enough, I start hibernating". But there's also the fact that the climate up here reminds me of the 13 years I spent in New Jersey and the architecture reminds me of the 4 years I spent going to college in Camden. Plus everybody here constantly spits on the sidewalk, which nobody does in Austin. (I don't even know if it's illegal in Austin or what, it just doesn't come up. Maybe it's related to the fact we have a lot of space that _isn't_ paved.)
In an effort to make the busybox patch backlog more manageable, I'm trying to give svn access to two more busybox contributors. We'll see if they take the bait. I'm not asking for more people to review and commit other people's patches, these are just people whose contributions are both high enough quality and high enough volume that it's better for them to commit their stuff directly and if I feel the need to do any cleanups to their work it can always be as a separate commit after the fact, rather than me being a bottleneck. (Or just point out the issue to them and have _them_ do a second commit if they feel like it.)
I'm also poking at process improvements. I moved my various busybox-related scripts into a "toys" directory under my busybox directory, and I'm now saving patches into a "pending" directory so I can have a script that will commit a patch and _delete_ it all in one go. I have 38 "*.patch" files and another 5 "*.diff" files in my busybox directory, the majority of which I've applied but forgot to delete. Stuff gets lost in the noise, and that's bad. On the bright side, I recently worked out that I've already personally merged twice as many patches since the start of this month as all busybox committers merged between the start of 2005 and Erik's call for lieutenants 3 and a half months later. So the overall rate of development's doing ok, at least.
I should have gotten the new menuconfig infrastructure (resync with 2.6.16 plus my miniconfig stuff) in last week. I'm not holding up Jeremiah yet, but I feel bad. Got buried in todo items.
Timesys' toolchains are actually relocatable now, modulo one known bug. Yay rah cool. I can _document_ that for the new crossdev site, assuming that ever actually goes live. When I got here the timesys prebuilt cross-compiler toolchains were brilliant but crotchety, now they're actually installable and usable by mere mortals. Now I need to document which toolchain targets what platforms. Chris Faylor says he got a patch for gcc's multilib support that actually lets it use uClibc. That would be so cool if it went in and worked.
Software suspend on my laptop gets slower and slower the more I do it, until the machine finally panics on resume. The reason is the swap file is getting fragmented (the panics are incidental, usually caused by the suspend happening while X11 is trying to access the hardware I think).
The problem is that if the hard drive has to seek between each 4k read, that adds the seek time (voice coil activation, traversal, and head stabilization over the new target), plus up to a full rotation of the platter for the new sector to move under the head. Sequential access is way way way faster. When the machine's recently rebooted, a save/restore of 12,000 pages takes less than 5 seconds. When it's badly fragmented it can take a full minute, and then there's all the swapping it does afterwards to bring the desktop up and fault in Konqueror and Kmail and such, which is just as fragmented and just as proportionally slow. Ten minutes after rebooting I still hit patches of thrashing, even though the system still claims to have 100 megs of unused memory.
The fix for this is to sequentially fault in as many of the swapped-out pages as free memory will hold. A lot of that may get discarded again, but it's roughly 30x faster to read sequentially (28 megs a second from hdparm -t vs 14k pages taking a minute at 4k/page, which is just under 1 meg per second), and it also gives that portion of the swap file an opportunity to defragment itself. And it doesn't have to write the pages back if it knows that the swapped-in page exists on disk and hasn't been dirtied yet; there's existing code for that, I just have to avoid screwing it up.
So how to modify the kernel to do this? Dunno, but I know where to find out. The swapoff code can yank a swap partition, and the system has to fault in all the pages in order for the swapoff to work. I don't want to yank the swap partition, but I can look at that code to see what it's doing. I want to fault in all the pages before resuming the frozen processes, to avoid at least some of the thrashing storm. The best part of the swap partition to fault in is probably the end of it, but starting at the beginning and stopping when there isn't enough free memory is easy.
Alas, I have to go do day job now. And day job wants me in a status meeting about producing a spec for a testing project. We had a prototype for the test infrastructure working going into the first meeting, but we're now scheduling our fourth meeting for status reporting and course correction on requirements gathering for the specification of the code we may someday get to write. I understand why we're doing this, but it's amazing how time consuming _not_ doing something can be. (Did I mention they had a previous test product that went through all this, which they're unhappy enough with to want us to come up with a replacement?)
Sigh...
Many things have happened in the past month (like CELF), some of which I mentioned on my livejournal but most of which have gone unremarked.
I've decided to move the firmware linux project to http://busybox.net/~landley/firmware since that can take a pounding if anybody actually starts downloading it, and what I really use it for is testing busybox, uClibc, cross-compile stuff... Seems relevant.
If Erik can flood the uClibc mailing list with buildroot stuff, I can host Firmware Linux in my home directory on busybox.net. :)
It's been hard to get stuff done this week. The coffee shop I've been going to has four (count 'em, four) electrical outlets. The number is intentionally restricted so people don't hang around all day with laptops (even though I buy stuff every hour or so). Bringing in an outlet splitter or power strip gets you yelled at by the manager.
There's another coffee shop/kosher deli down the street, with plenty of outlets and "free wireless internet" in the window. (Marvelous little baked things too.) Alas, their internet hasn't worked in the two weeks I've been coming here. That pretty much sums up every place in Pittsburgh I've tried that claimed to have free wireless, except 61c.
I miss Austin.
I wonder if I should move Firmware Linux to my user directory on busybox.net? Really what I use it for is a test environment to prove that busybox works as part of a development environment. (I did more on it before I had a job. And before I became busybox maintainer, but mostly before I had a scheduled daily commitment to sit at a desk for 8 hours every day where I just can't concentrate and where I feel guilty about working on whatever interests me at the moment...)
Guess what? printf() changes errno.
That's just _evil_. Stick in a few debugging printf() calls, and your code's behavior changes...
Ooh, and here's a fun one. When it first runs, the new mount reads /proc/filesystems to get the list of filesystem types to try when autodetecting. If nothing is mounted when it starts and it does mount -a, the list will be empty and even though it mounts /proc as the first thing, the rest of the entries won't be able to autodetect filesystem types unless it rereads filesystems after mounting /proc.
Now given that this assumption is hardwired in (the proc filesystem is mounted at /proc), should I have the mount command mount it if it can't find it? This would screw up people who have /proc in their fstab, and having it as the first entry in fstab is probably the right thing to do anyway. So I guess "reread if the list is empty" is correct. Except that if there are no block device backed filesystems we'll reread for every mount in -a, but we reread for every non -a mount anyway, so oh well...
Bugs that go away before you can fix them are not a good thing. It's the equivalent of roaches scurrying under the fridge before you can squash them. If it stops reproducing, and I didn't _fix_ it, I'm annoyed, and I back up to where I could REPRODUCE THE BUG.
This is one reason I'm not always the world's fastest coder.
I am currently eating a sandwich consisting of fried eggs, cow meat, cheese, lettuce, tomato, onion, and mayonnaise. On a bun. With fries. Welcome to Pittsburgh...
Two days ago I got the mount rewrite to compile. Yesterday I started testing and cleaned it up enough that "mount -t proc /proc /proc" didn't immediately segfault. Today I find out why it didn't do anything either: it's trying to --bind mount it. Ok, new rule: don't autodetect bind mounts unless you have no specified filesystem type.
I _really_ need to write up a spec for this thing. I've got bits of one in a half-dozen places now...
Fighting with the busybox mount rewrite.
I am sick of working on the mount rewrite. Almost done.
Who knew day jobs were so time consuming?
I finally got my laptop upgraded from Ubuntu "Horny Hedgehog" to Ubuntu "Flatulent Badger". (Actually it's been upgraded for a week, but I just got my email moved over. Archives, send, and receive. Woot.)
It's now 1 am, on what is technically a tuesday. I should be in bed.
I'm trying to get some busybox patches reviewed, bugs fixed, etc. TNT is running a Law and Order marathon. And you wonder why I generally don't have a TV at home?
So I'm in Pittsburgh now. Didn't get a chance to even look at this stuff the first week. Let's see...
User Mode Linux just introduced soft interrupts for a 25% speedup on the kernel compile, which probably helps all the building I'm doing. Yee-ha. I'm still interested in getting cross-compiling to work, but when I finally got here to Timesys and asked the resident cross-compiling experts (some of the most knowledgeable people in the world on this issue), the general consensus was that gcc can't be configured to do what I want it to do without running sed against the source code. Sigh...
Banging on busybox. Now that 1.1.0 is out the bug reports are flooding in, and 1.1.1 probably shouldn't wait until march...
Hmmm... The problem of embedded NULs in busybox sed is tricky. Gotta modify libbb/get_line_from_file.c to return the correct information, and modify sed to remember what a line ended with. (Three states: ended with \n, ended with NUL, or ended with EOF.)
Still learning how cross-compilers work. Right now, just trying to do an x86/glibc -> x86/uclibc cross compiler. The people who created the gcc cross compiler configuration scripts apparently never envisioned anybody trying to use them like this. (The words "like this" may be optional in that sentence. To make this work I may need a chain saw, a bazooka, and some duct tape.)
Also breaking up the scripts. The numbering got unworkable, so I'm now going with individual "build-$TARGET.sh" scripts (where $TARGET is whatever the end result of the script should be), and a set of sequencing scripts (such as group-xtoys.sh, group-rootfs.sh, group-installer.sh, and group-package.sh). What I had before just doesn't scale, and adding the installer (and thus building and packaging two different root filesystems from the same set of sources, with the same set of tools) brought this to a head.
The cross-compiler toolset (group-xtoys.sh) I have to think about. In theory it's pretty useful standalone, so it would be great if I could get architecture-specific versions of firmware-packaged build environments up on the website. But does that mean you extract "xtoys-x86_64.tgz" into an empty directory, chroot into it with $PATH=/xtoys/bin, and cross-compile your new system that way? Or does it mean I create a firmware-xtoys-x86_64 (or i386/ppc/arm) that you run via "qemu-system-x86_64 -kernel xtoys-x86_64 -hda workspace.img"?
I suppose I could always do both. The first is faster and more convenient for people who don't mind running their build as root, and don't care if their system can rebuild itself under itself. The second lets you run the build as a normal user and gets you halfway to self-hosting (since you're building with native tools instead of cross-compiling).
I'm learning about cross-compiling. I have many different sources of information. I have buildroot, and crosstool, and the Linux From Scratch cross-compile book. Both gcc and binutils have mailing lists. And there are various ancient FAQs and HOWTOs available through google.
And all of these sources of information have one thing in common. They SUCK. The most amusing is probably this one which is a link to a link to a website that doesn't seem to have anything to do with the topic at hand. Impressive.
Cross-compiling should not be brain surgery, but something here is _profoundly_ misdesigned. I want to build a compiler that runs on the system I'm building it on, but I want the _output_ of that compiler to be binaries that run on a different system. Hence cross-compiler. I know lots of things about the other system: I know what processor it has. When building, I know where to look for the other system's libraries and the other system's header files. When _running_ one of these binaries, the library search path should be this and the library loader will live at this path. I can write it all down.
What I can't do is specify this information directly when configuring gcc and binutils. I can only say --target=some-random-name to specify preprepared bundles of features. And if none of the preprepared bundles of features match my needs, can I just make a new bundle? Oh no, parsing the bundle is done by config.in which has to be washed through autotools in order to turn it into something runnable...
(The fact that "make distclean" doesn't reliably undo a ./configure in binutils is just another fun little detail of dealing with software the FSF has anything to do with...)
So now I'm staring at crosstool. I've heard good things about it.
It's 3.8 megabytes. It shouldn't take 3.8 megabytes to make up for the fact that downloading gcc source and going "./configure --prefix=~landley/tmpdir --target-cpu=x86_64" doesn't actually work.
Spent most of the day packing for the trip to Pittsburgh, but I'm finally taking a little time to work on FWL.
Right now initramfs is giving me a bit of trouble because I dropped the symlink forest in favor of trying to get the standalone shell working, and doing this has brought home to me the fact that I _really_ need to make peace with "make". My big shell script is the ideal way to rebuild everything, but when you want to rebuild just portions of the thing it's a serious pain. The point of make is that I can say things like "firmware-uml depends on linux-initramfs.txt and firmware-busybox, and firmware-busybox depends on busybox-miniconfig-initramfs"... Doing this kind of thing in a shell script is a deeply masochistic exercise, because you have to put if statements around the stuff you _don't_ want to run right now, and get the correct behavior by process of elimination.
The obvious way to do this with a shell script is to break up the script into tiny pieces and re-run each piece to rebuild just what needs to be rebuilt, and I'm sort of doing that. But it's a pain trying to keep track of "this bit is rebuilt by this script" and there's constant refactoring to get the granularity I require (which shifts around). Not to mention the fact that different scripts need to run in different contexts (some want the UML wrapper and some don't).
Of course the downside of going the makefile route (other than the fact makefiles are ugly and evil) is you wind up with something like buildroot, which I can't follow the logic of at _all_ because figuring out what happens in which order has to be extrapolated from the dependencies.
There should be a happy medium, somehow. The numbering approach for my scripts is starting to break down as I get more granularity; it was easy when the sequencing was obvious but after inserting and deleting enough scripts the numbering gets a little funky, and in a few cases I now want to run the same script from two different places (such as the bootloader build needing to be part of both the installer build and the actual firmware image build)...
What I should probably do is abandon the numbering of the scripts and instead break the build into individual scripts with names based on what they build. Then put the sequencing in control scripts that call the other scripts in the correct order. (That's sort of what's happening now, only that's not how I've been _trying_ to handle it so it's really ad-hoc and the naming and sequencing are virtually incomprehensible, and the granularity is very ad-hoc.)
Unfortunately, there are three different major contexts scripts can run in right now: the host environment, a UML instance borrowing the host environment, and a UML instance in a chroot (with /tools). The first context is preferable wherever possible (it's fastest): it's needed to build UML in the first place and also used for building the install images at the end. The "UML with hostfs" is used to build /tools, and the "UML chroot with /tools" is used to build final rootfs images (the installer image and the firmware image).
Perhaps I could indicate what context each script needs in the script's filename, either as a prefix or an extension. With a function that does the correct setup for each one automatically: "runit build-bootloader". Except that the context setup and teardown can be pretty expensive (right now I'm exiting and re-running UML to reset that), which is why they're batched together like they are. But perhaps the setup function can do the batching for me...
On a related note, I should look into how gentoo packages its build scripts, to see if there's something out there I can reuse easily. Of course there's the old problem that some of it is just plain funky and custom. The /tools builds and the same packages built to be part of the firmware image are different scripts.
Sigh. The redesign is intriguing, and I need it to scale the complexity of what I'm doing (it all hit the fan when I added the installer build, which creates a separate image from the firmware build containing a different root filesystem). But I also _really_ want to get a release out ASAP.
Hmmm...
So I painted myself into a corner with the mount fixup, threw my hands up and reverted it, patched the bugs that needed to be fixed for 1.1.0, closed a lot of bugs, and did a release. 1.1.0 is out! (I'll fix mount properly later. There should be a 1.1.1 in 2-3 months.)
Something I plan to do with the Firmware build (after the 1.0 release) is get cross-compiling working properly, which means working QEMU into the build. (It'll probably be dog slow, but we'll see...) This should work fairly naturally with the way /tools gets built now, and might even let me get away with not using UML for the tools build at all and just having phase 2 work with QEMU instead. (In theory, QEMU's stdin and stdout can be attached to a serial device in the emulated environment...)
Right now the compiler is built twice, the first time as a sort of cross compiler, but between different C libraries rather than different processors. The first stage compiler runs on "x86 glibc" and builds "x86 uClibc" binaries. I use this compiler to cross-compile a bunch of x86 uClibc binaries (busybox, make, etc), and then finally rebuild the compiler so it's not a cross compiler anymore but a native "x86 uClibc" compiler.
Swapping out the "x86" attached to uClibc with "x86-64", "PPC", or "arm" shouldn't be _that_ big a deal. Of course there are three variables: which processor, which libc, and which path. Right now, I want the first pass compiler to run on "x86, glibc, and /", and produce output that runs on "x86, uClibc, and /tools". The gcc "--host" and "--target" config options only address _one_ of those three. In fact altering the target path to "/tools" is the reason the /tools symlink is needed, which is really where the need for UML in the /tools build comes in. In theory, cross-compiling already deals with all these issues, but in practice every time I've tried to learn this I hit all _sorts_ of bugs in gcc.
I'm going to have to join the gcc mailing list, aren't I? Sigh. They're way too close to the FSF. They frighten me. But if I want to learn this stuff...
Argh.
Beating my head on mount. There is no spec, and if there was it would be composed entirely of corner cases.
Busybox release tomorrow. Is the codebase currently ready? Nope.
Working on it.
On the firmware front, I upgraded the busybox snapshot, and of course picked one that doesn't build because of a glitch in the headers that I checked in. Need to get the mount rewrite finished and checked in so tonight's snapshot can be tested in the morning. I also upgraded dropbear, due to a security vulnerability in 0.46.
Other than that, it's a busybox day...
Back in Austin. Actually getting some work done again.
The 2.6.15 kernel finally shipped. I've upgraded FWL and it's building to make sure that works ok. (Since most of the build happens under UML, a build after a kernel upgrade is a good test of a new kernel.) Seems to be ok so far. Fingers crossed.
This clears the way for dropping in the new 1.1.0 busybox, which ships Monday. My priority right now is to fix as much as I can get fixed between now and then, and running the result through the FWL build is the best stress and regression test I have at the moment. (I'll be upgrading the busybox test suite during 1.2.)
The only thing standing between me and shipping FWL 1.0 is finishing up the bootable version. I haven't had a chance to even look at that in most of a month. My new job in Pittsburgh starts the 16th, and I'd really like to have 1.0 out before then.
Yeah, I have to pack too. But first, BusyBox 1.1.0...
My flight back to Austin is thursday afternoon. The next busybox release is the following Monday, so I have about three days to get things together. Then I have to pack to go to Detroit.