Random Hacks

Dec 04, 2022 Pair programming with ChatGPT: A simple dice roller
Like many folks, I spent too much of the last couple of days playing with the new release of ChatGPT. I’ve been trying to discover what it’s good at, and how it breaks. At its best, it’s remarkable: I think it would actually pass many common “hiring screens” for programmers. And it has taken first place on an Advent of Code problem.

But there are various tricks which will break it. Programs with randomized output occasionally fool it, as do programs with multiple if branches. So I sat down this morning for a short pair programming session, and wrote a classic dice roller with ChatGPT. The experience was fascinating. Things started out very frustrating, but they wound up pretty mind-blowing by the end.

Attempt 1: GPT fails hard at rand and sscanf in C

Let’s start with a simple statement of the problem:
Me: Write me a command-line program that takes D&D 5e dice expressions like “3d6+1”, rolls the dice specified, sums up the dice and any bonus modifier, and prints the result.

For example:
```
$ roll 3d6
12
$ roll 3d6
7
$ roll 1d10+5
11
```
ChatGPT:
```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int roll_dice(int num_dice, int num_sides) {
  int result = 0;
  for (int i = 0; i < num_dice; i++) {
    result += rand() % num_sides + 1;
  }
  return result;
}

int main(int argc, char* argv[]) {
  if (argc != 2) {
    printf("Usage: roll <dice expression>\n");
    return 1;
  }

  // Seed the random number generator
  srand(time(NULL));

  // Parse the dice expression
  int num_dice = 0;
  int num_sides = 0;
  int bonus = 0;
  sscanf(argv[1], "%dd%d+%d", &num_dice, &num_sides, &bonus);

  // Roll the dice and sum them up
  int result = roll_dice(num_dice, num_sides) + bonus;
  printf("%d\n", result);

  return 0;
}
```
To compile and run this program, you can use the following commands:
```
gcc -o roll roll.c
./roll 3d6
```
This will compile the program and create an executable called roll, which you can then run with the 3d6 dice expression as an argument. This will roll three 6-sided dice and print the result. You can try different dice expressions to see the program in action.

Note: This is just an example program to illustrate how to roll dice in a D&D 5e dice expression. It is not intended to be a complete or fully-featured program.
There are some nice bits here—it actually seems to have some vague idea that 3d6 means “roll 3 dice with 6 sides each”. And it prints out a Usage notice and remembers to call srand. But the program is wrong, so let’s tell GPT about two “bugs”—one a real bug, and one that I was mistaken about.
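For comparison, here is a minimal sketch of the parsing and summing steps in Rust rather than C. It is not ChatGPT's output and not code from this session: `DiceExpr`, `parse_dice`, and `roll_total` are hypothetical names, the accepted forms are assumed to be `NdS`, `NdS+B`, and `NdS-B`, and the die roller is passed in as a closure so the arithmetic can be checked without a random number generator.

```rust
/// A parsed dice expression such as "3d6+1" (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct DiceExpr {
    count: u32,
    sides: u32,
    bonus: i32,
}

/// Parse "NdS", "NdS+B", or "NdS-B". Returns None for anything malformed,
/// instead of silently leaving fields at zero.
fn parse_dice(expr: &str) -> Option<DiceExpr> {
    let (count, rest) = expr.split_once('d')?;
    // Look for the start of an optional signed bonus.
    let (sides, bonus) = match rest.find(|c: char| c == '+' || c == '-') {
        Some(i) => (&rest[..i], rest[i..].parse::<i32>().ok()?),
        None => (rest, 0),
    };
    let count = count.parse::<u32>().ok()?;
    let sides = sides.parse::<u32>().ok()?;
    if count == 0 || sides == 0 {
        return None;
    }
    Some(DiceExpr { count, sides, bonus })
}

/// Sum `count` rolls plus the bonus. `roll_die(sides)` is supplied by the
/// caller and is expected to return a value in 1..=sides.
fn roll_total(expr: &DiceExpr, mut roll_die: impl FnMut(u32) -> u32) -> i64 {
    let dice: i64 = (0..expr.count)
        .map(|_| i64::from(roll_die(expr.sides)))
        .sum();
    dice + i64::from(expr.bonus)
}
```

Keeping the parser separate from the roller means malformed input such as `3d` or `abc` can be rejected before any dice are rolled.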

Read more…
Mar 09, 2019 In nightly Rust, await! may never return (future cancelation)
I’ve been using the proposed await! and Future features in nightly Rust, and overall, I really like the design. But I did run into one surprise: await! may never return, and this has consequences I didn’t fully understand. Let’s take a look.

We’re going to use Rust nightly-2019-02-08, and tokio-async-await. This is highly experimental code, and it will require us to convert back and forth between tokio::Future and the proposed std::future::Future.

You can find the full code on GitHub. We’ll start by enabling the experimental features we’ll need:
```
#![feature(await_macro, async_await, futures_api)]

#[macro_use]
extern crate tokio_async_await;
```
Then we’ll import some libraries, and declare two helper functions tokio_fut and boxed_fut, that make it easy to convert from std::future::Future into tokio::Future and into Box<tokio::Future<..>>, respectively. You can look that code up on GitHub.

Next, we define a function delay, which returns a Future that waits for the specified number of milliseconds:
```
fn delay(millis: u64) -> Delay {
    Delay::new(
        Instant::now() + Duration::from_millis(millis),
    )
}
```
Canceling a Future

Now, we can define two tasks:
```
/// An asynchronous function that completes quickly.
async fn quick_task() -> Result<&'static str> {
    println!("START quick_task");
    await!(delay(10)).context("delay failed")?;
    println!("END quick_task");
    Ok("quick_task result")
}

/// An asynchronous function that completes very slowly.
async fn slow_task() -> Result<&'static str> {
    println!("START slow_task");
    await!(delay(10_000)).context("delay failed")?;
    println!("END slow_task");
    Ok("slow_task result")
}
```
Here, quick_task waits for 10 milliseconds, and slow_task waits for 10,000 milliseconds. We can combine them using select_all:
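As a rough illustration of what cancellation means for a task like slow_task, here is a minimal sketch that uses only the standard library and today's stable `async`/`.await` syntax, not the nightly `await!` macro or tokio. The names `Never`, `NoopWake`, and `poll_once_then_cancel` are hypothetical, and `Never` stands in for a very long delay. The task is polled once by hand and then dropped, so the code after the await point never runs.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that is never ready; a stand-in for a very long delay.
struct Never;

impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// A waker that does nothing, which is enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a task once, drop it, and return the events it managed to log.
fn poll_once_then_cancel() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let task_log = Rc::clone(&log);
    let mut task = Box::pin(async move {
        task_log.borrow_mut().push("START");
        Never.await; // The task is suspended here and never resumed.
        task_log.borrow_mut().push("END");
    });

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(task.as_mut().poll(&mut cx).is_pending());

    // Dropping the future cancels it: nothing after the await point runs.
    drop(task);

    let events = log.borrow().clone();
    events
}
```

Calling `poll_once_then_cancel()` yields a log containing only "START". From the task's point of view, the await simply never returned.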

Read more…
Mar 08, 2019 Should Rust channels panic on send if nobody's listening?

Lately, I’ve been working on several real-world systems using Rust’s async and tokio. As you can see on the areweasyncyet.rs site, this requires using nightly Rust and the experimental tokio-async-await library. I hope to talk more about these experiences soon!

But today, I want to talk about channel APIs in Rust. A question was raised by @matklad on GitHub:

I’ve migrated rust-analyzer to crossbeam-channel 0.3, and the thing I’ve noticed is that every .send is followed by .unwrap. Perhaps we should make this unwrapping behavior the default, and introduce a separate checked_send which returns a Result?

BurntSushi followed up on Reddit:

Because the vast majority of uses of send are like this: ch.send(foo).unwrap(). That is, you panic because you generally regard it as a bug if you’re still sending values when all receivers have been dropped. Why? Because this is generally a property of the program’s organization.

I hesitate to disagree with two such excellent developers, but my experiences with this issue are almost the exact opposite of matklad’s and BurntSushi’s.
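To make the behavior under discussion concrete, here is a small sketch using the standard library's `std::sync::mpsc` channel rather than crossbeam-channel (an assumption made so the example needs no external crates). The helper `send_after_drop` is a hypothetical name. It shows that `send` returns an `Err` once the receiver has been dropped, which is the case that `.unwrap()` would turn into a panic.

```rust
use std::sync::mpsc;

/// Send one value, optionally after dropping the receiver, and report
/// whether the send succeeded.
fn send_after_drop(drop_receiver: bool) -> bool {
    let (tx, rx) = mpsc::channel::<u32>();
    if drop_receiver {
        // Nobody is listening any more, so `send` returns Err(SendError(42)).
        drop(rx);
        tx.send(42).is_ok()
    } else {
        // The receiver is still alive, so the send succeeds and the value
        // can be read back.
        let sent = tx.send(42).is_ok();
        sent && rx.recv() == Ok(42)
    }
}
```

Whether that `Err` represents a bug or an ordinary shutdown signal depends on how the program is organized, which is exactly the point of disagreement.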

Read more…
Nov 16, 2015 Bare Metal Rust 3: Configure your PIC to handle interrupts correctly
Want to build your own kernel in Rust? See Bare Metal Rust to get started.

We’re almost ready to write a keyboard driver in Rust! But first, we need to deal with two obstacles: setting up the PIC, and handling interrupts without crashing. This is one of the most frustrating steps, as Julia Evans explains in her hilarious and very helpful post After 5 days, my OS doesn’t crash when I press a key:
1. Turn interrupts on (sti).
2. The OS AGAIN crashes every time i press a key. Read “I Can’t Get Interrupts Working” again. This is called “I’m receiving EXC9 instead of IRQ1 when striking a key?!” Feel on top of this.
3. Remap the PIC so that interrupt i gets mapped to i + 32, because of an Intel design bug. This basically looks like just typing in a bunch of random numbers, but it works.
4. THE OS IS STILL CRASHING WHEN I PRESS A KEY. This continues for 2 days.
We’re going to follow Julia Evans’ roadmap. (She saved me several days of suffering.) And once we’re past these next few obstacles, things will get easier. Let’s talk to the PIC first.

The 8259/8259A Programmable Interrupt Controller

We’re going to go with the retro approach here, and handle interrupts using the 8259 PIC. You can read all about it on the OSDev wiki, as usual. The PIC works fine in 64-bit mode, but someday, if we want to support multiple processors and I/O, we’ll need to support the newer APIC and IOAPIC. But for now, let’s keep it simple.

Technically, the x86 architecture has two PIC chips, usually known as PIC1 and PIC2. PIC1 handles external interrupts 0–7, and PIC2 handles 8–15. PIC2 is actually chained into interrupt 2 on PIC1, which means that we’ll frequently need to talk to them as a pair.

Unfortunately, the modern x86 architecture reserves CPU interrupts 0-31 for processor exceptions. This means that when we press a key, the CPU will think it just received the “EXC9” mentioned by Julia Evans, which the Intel manual tells me is “Coprocessor-Segment-Overrun Exception.” So we need to tell our PIC that, no, MacGyver and Miami Vice are no longer cutting-edge television, that there’s this new-fangled thing called 386 Protected Mode, and that it needs to start mapping interrupts at offset 32.
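The arithmetic behind that remapping can be sketched in a few lines of Rust. This is only an illustration, not the post's driver code: `PicOffsets`, `LEGACY`, `REMAPPED`, `irq_to_vector`, and `collides_with_exceptions` are hypothetical names. The legacy offsets are the conventional PC defaults, and the PIC2 offset of 40 in the remapped layout is an assumption, since the text only says that mapping should start at 32.

```rust
/// Interrupt-vector base for each of the two PIC chips.
#[derive(Clone, Copy)]
struct PicOffsets {
    pic1: u8,
    pic2: u8,
}

/// Conventional legacy layout: IRQ 0-7 at vectors 0x08-0x0F,
/// IRQ 8-15 at vectors 0x70-0x77.
const LEGACY: PicOffsets = PicOffsets { pic1: 0x08, pic2: 0x70 };

/// Assumed remapped layout: start right after the 32 reserved vectors.
const REMAPPED: PicOffsets = PicOffsets { pic1: 32, pic2: 40 };

/// Translate an external IRQ number (0-15) into a CPU interrupt vector.
fn irq_to_vector(offsets: PicOffsets, irq: u8) -> Option<u8> {
    match irq {
        0..=7 => Some(offsets.pic1 + irq),        // handled by PIC1
        8..=15 => Some(offsets.pic2 + (irq - 8)), // handled by PIC2
        _ => None,
    }
}

/// Vectors 0-31 are reserved for processor exceptions.
fn collides_with_exceptions(vector: u8) -> bool {
    vector < 32
}
```

With the legacy layout, the keyboard's IRQ 1 lands on vector 9, which is the "EXC9" from the quote. With the remapped layout it lands on vector 33, safely outside the reserved range.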

Read more…
Nov 11, 2015 Bare Metal Rust 2: Retarget your compiler so interrupts are not evil

Want to build your own kernel in Rust? See the Bare Metal Rust page for more resources and more posts in this series. There are just a few more posts to go until we have keyboard I/O!

Hacking on kernels in Rust is a lot of fun, but it can also result in massive frustration when QEMU starts rebooting continuously because of a triple fault. One good way to minimize frustration is to wander on over to the ever-helpful OSDev wiki. It’s sort of like having an experienced kernel developer on hand to give grouchy but sanity-saving advice.

The OSDev Beginner Mistakes page, in particular, has saved me a couple times already. But there’s one bit of advice that I want to focus on today, which I’ve marked in boldface below:

Beginners often ask “What is the easiest way to do X?” rather than “What is the best, proper, correct way to do X?”. This is dangerous as the newcomer doesn’t invest time into understanding the superior way to implement something, but instead picks a conceptually simpler method copied from a tutorial. Indeed, the simpler route is often too simple and ends up causing more problems in the long run, because the beginner is ignorant of the superior alternative and doesn’t know when it is better to switch. What’s so bad about taking the hard route instead?

Common examples include being too lazy to use a Cross-Compiler, developing in Real Mode instead of Protected Mode or Long Mode, relying on BIOS calls rather than writing real hardware drivers, using flat binaries instead of ELF, and so on. Experienced developers use the superior alternatives for a reason…

So what does that mean, “being too lazy to use a cross-compiler”? It means cheating, and using our regular rustc setup to build ordinary user-space code, and then trying to run it in kernel space. This will actually work, at least for a while. But eventually, we may find ourselves engaged in multiweek debugging nightmares.

So today, I’m going to talk about the sanity-saving difference between --target x86_64-unknown-linux-gnu and --target x86_64-unknown-none-gnu, and how to get your Rust compiler ready for the kernel world.

Read more…
Nov 09, 2015 Bare Metal Rust: Low-level CPU I/O ports
Want to build your own kernel in Rust? See the Bare Metal Rust page for more resources and more posts in this series.

Rust is a really fun language: It allows me to work on low-level kernel code, but it also allows me to wrap my code up in clean, high-level APIs. If this sounds interesting, you should really check out Philipp Oppermann’s blog posts about writing a basic x86_64 operating system kernel in Rust. He walks you through booting the kernel, entering long mode, getting Rust running, and printing text to the screen.

Once you get a basic kernel running, you’ll probably want to start working on basic I/O, which requires interrupts. At this point, you’ll find that pretty much every tutorial dives right into the in and out instructions. For example, if you look at the OSDev.org introduction to interrupts, the very first code you’ll see is (comments added):
```
mov al,20h  ; Move interrupt acknowledgment code into al.
out 20h,al  ; Write al to PIC on port 0x20.
```
Here, we’re talking to the PIC (“Programmable Interrupt Controller”), and we’re telling it that we’ve finished handling a processor interrupt. To do this, we need to write an 8-bit status code to the I/O port at address 0x20.

Traditionally, we would wrap this up in an outb (“out byte”) function, which might look something like this in Rust:
```
// The asm! macro requires a nightly build of Rust, and
// we need to opt-in explicitly.
#![feature(asm)]

unsafe fn outb(value: u8, port: u16) {
    asm!("outb %al, %dx" ::
         "{dx}"(port), "{al}"(value) ::
         "volatile");
}
```
This writes an 8-bit value to the specified port. It uses the unstable Rust extension asm!, which allows us to use GCC/LLVM-style inline assembly. We’d invoke it like this:
outb(0x20, 0x20);
But let’s see if we can wrap a higher-level API around an I/O port.
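One possible shape for such a wrapper is sketched below. This is a guess at the general idea, not the API the post goes on to build: `PortBus`, `RecordingBus`, `Port`, and `acknowledge_interrupt` are hypothetical names. The actual `out` instruction is replaced by a trait so that the sketch can run in user space against a fake bus that records writes.

```rust
/// Anything that can perform byte-wide port writes. On real hardware an
/// implementation would wrap the `out` instruction; this trait is an
/// assumption made so the example can run without kernel privileges.
trait PortBus {
    fn write_u8(&mut self, port: u16, value: u8);
}

/// A fake bus that records every write instead of touching hardware.
#[derive(Default)]
struct RecordingBus {
    writes: Vec<(u16, u8)>,
}

impl PortBus for RecordingBus {
    fn write_u8(&mut self, port: u16, value: u8) {
        self.writes.push((port, value));
    }
}

/// A port address wrapped in its own type, so that it cannot be confused
/// with the data being written (both are 0x20 in the example above).
#[derive(Clone, Copy)]
struct Port(u16);

impl Port {
    fn write(self, bus: &mut impl PortBus, value: u8) {
        bus.write_u8(self.0, value);
    }
}

/// Command port of the first PIC, and the acknowledgment code from the text.
const PIC1_COMMAND: Port = Port(0x20);
const END_OF_INTERRUPT: u8 = 0x20;

/// Tell the PIC that we have finished handling an interrupt.
fn acknowledge_interrupt(bus: &mut impl PortBus) {
    PIC1_COMMAND.write(bus, END_OF_INTERRUPT);
}
```

Giving the port its own type makes a call like `outb(0x20, 0x20)`, where the argument order is easy to get wrong, read as "write this value to that port" instead.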

Read more…
Jul 19, 2015 Proving sorted lists correct using the Coq proof assistant
About 15 years ago, I was hanging out at the MIT AI Lab, and there was an ongoing seminar on the Coq proof assistant. The idea was that you wouldn’t have to guess whether your programs were correct; you could prove that they worked correctly.

There were just two little problems:
1. It looked ridiculously intimidating.
2. Rumor said that it took a grad student all summer to implement and prove the greatest common divisor algorithm, which sounded rather impractical.
So I decided to stick to Lispy languages, which is what I was officially supposed to be hacking on, anyway, and I never did try to sit in on the seminar.

Taking another look

I should have taken a look much sooner. This stuff provides even more twisted fun than Haskell! Also, projects like the CompCert C compiler are impressive: Imagine a C compiler where every optimization has been proven correct.

Even better, we can write code in Coq, prove it correct, then export it to Haskell or several other functional languages.

Here’s an example Coq proof. Let’s start with a basic theorem that says “If we know A is true, and we know B is true, then we know A /\ B (both A and B) is true.”
```
Theorem basic_conj : forall (A B : Prop),
  A -> B -> A /\ B.
Proof.
  (* Give names to our inputs. *)
  intros A B H_A_True H_B_True.
  (* Specify that we want to prove each half of /\ separately. *)
  split.
  - apply H_A_True. (* Prove the left half. *)
  - apply H_B_True. (* Prove the right half. *)
Qed.
```
But Coq proofs are intended to be read interactively, using a tool like CoqIDE or Emacs Proof General. Let me walk you through how this proof would really look.
Proof.
At this point, the right-hand pane will show the theorem that we’re trying to prove:
```
1 subgoals, subgoal 1 (ID 1)

  ============================
   forall A B : Prop, A -> B -> A /\ B
```
Read more…
May 17, 2015 Unscientific column store benchmarking in Rust
I’ve been fooling around with some natural language data from OPUS, the “open parallel corpus.” This contains many gigabytes of movie subtitles, UN documents and other text, much of it tagged by part-of-speech and aligned across multiple languages. In total, there’s over 50 GB of data, compressed.

“50 GB, compressed” is an awkward quantity of data:
- It’s large enough so that Pandas can’t suck it all into memory.
- It’s large enough that PostgreSQL stops being fun, and starts feeling like work. (Although cstore_fdw might help.)
- It’s too small to justify cloud-based tools like Hadoop. As the saying goes, “If it fits on your laptop’s SSD, it’s not big data.” I have USB sticks large enough to hold 50 GB!
Let’s look at various ways to tackle this.

Read more…
Apr 05, 2015 Migrating from Heroku (and Linode) to Docker on AWS

I’ve long been a huge fan of Heroku. They’ve made it super easy to deploy and scale web applications without getting bogged down in server administration. Also, their free tier has been very generous, which made Heroku a perfect place to run weekend projects. (And my clients have happily paid plenty of money to Heroku over the years, so nobody’s been losing out.)

Heroku’s costs and limitations

Lately, the costs of using Heroku for weekend projects have been creeping upwards:

Read more…
Sep 19, 2014 Rust lifetimes: Getting away with things that would be reckless in C++
Over the years, I’ve learned to be cautious with C++ pointers. In particular, I’m always very careful about who owns a given pointer, and who’s in charge of calling delete on it. But my caution often forces me to write deliberately inefficient functions. For example:
vector<string> tokenize_string(const string &text);
Here, we have a large string text, and we want to split it into a vector of tokens. This function is nice and safe, but it allocates one string for every token in the input. Now, if we were feeling reckless, we could avoid these allocations by returning a vector of pointers into text:
vector<pair<const char *,const char *>> tokenize_string2(const string &text);
In this version, each token is represented by two pointers into text: One pointing to the first character, and one pointing just beyond the last character.¹ But this can go horribly wrong:
```
// Disaster strikes!
auto v = tokenize_string2(get_input_string());
munge(v);
```
Why does this fail? The function get_input_string returns a temporary string, and tokenize_string2 builds an array of pointers into that string. Unfortunately, the temporary string only lives until the end of the current expression, and then the underlying memory is released. And so all our pointers in v now point into oblivion—and our program just wound up getting featured in a CERT advisory. So personally, I’m going to prefer the inefficient tokenize_string function almost every time.

Rust lifetimes to the rescue!

Going back to our original design, let’s declare a type Token. Each token is either a Word or an Other, and each token contains pointers into a pre-existing string. In Rust, we can declare this as follows:
```
#[deriving(Show, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Other(&'a str)
}
```
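To show how such a type might be used, here is a minimal sketch in current stable Rust, which spells the attribute `#[derive(Debug, PartialEq)]` instead of the 2014-era `#[deriving(Show, PartialEq)]`. The functions `tokenize` and `make_token` are hypothetical, and the rule that a word is a run of alphabetic characters is an assumption made for illustration. Every token borrows from the input string, so no per-token allocation is needed.

```rust
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Other(&'a str),
}

/// Wrap a slice of the input in the matching token variant.
fn make_token(is_word: bool, slice: &str) -> Token<'_> {
    if is_word {
        Token::Word(slice)
    } else {
        Token::Other(slice)
    }
}

/// Split `text` into alternating runs of alphabetic and non-alphabetic
/// characters. The returned tokens point into `text` and cannot outlive it.
fn tokenize(text: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut start = 0; // Byte offset where the current run began.
    let mut current: Option<bool> = None; // Class of the current run.
    for (i, c) in text.char_indices() {
        let is_word = c.is_alphabetic();
        if let Some(prev) = current {
            if prev != is_word {
                // The character class changed, so close the previous run.
                tokens.push(make_token(prev, &text[start..i]));
                start = i;
            }
        }
        current = Some(is_word);
    }
    if let Some(prev) = current {
        tokens.push(make_token(prev, &text[start..]));
    }
    tokens
}
```

Because the lifetime `'a` ties each token to the input, the Rust equivalent of the C++ disaster, tokenizing a temporary string and using the tokens in a later statement, is rejected at compile time instead of leaving dangling pointers.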
Read more…
Sep 17, 2014 Deploying Rust applications to Heroku, with example code for Iron
May 30, 2014 Deploying Rust applications to Heroku, with example code for Rustful
May 30, 2014 Installing Rust nightly builds on Ubuntu 10.04 Lucid
May 28, 2014 My personal tool choices for rich JavaScript applications (May 2014)
May 22, 2014 Learning Middle Egyptian with Anki, slowly
May 21, 2014 Scraping your Fitocracy score with capybara-webkit
May 21, 2014 Site update in progress
May 21, 2014 "Build Your Own Probability Monads" paper back online
Jan 18, 2012 Best article I've seen on SOPA
Jun 05, 2011 Screencast: Use Rails and RDF.rb to parse Best Buy product reviews
Jun 03, 2011 Heroku "Celadon Cedar" review
May 20, 2011 Derivatives of algebraic data structures: An excellent tutorial
May 12, 2011 What do these fixed points have in common?
Apr 25, 2011 AWS outage timeline & downtimes by recovery strategy
Dec 20, 2010 The state of Ruby, RDF and Rails 3
Oct 13, 2010 Feedhose demo: Real-time RSS using Node.js and Socket.io
Dec 29, 2009 Visualizing WordNet relationships as graphs
Dec 28, 2009 Experimenting with NLTK
Dec 28, 2009 Interesting Python libraries for natural language processing
Nov 21, 2009 Wave Hackathon
Sep 12, 2009 Upgrading randomhacks.net
Sep 01, 2009 Real-time text annotation with Google Wave
May 08, 2009 Write a 32-line chat client using Ruby, AMQP & EventMachine (and a GUI using Shoes)
May 05, 2009 Financial crisis background and Munger on the banks
Apr 30, 2009 Designing programs with RSpec and Cucumber (plus a book recomendation)
Apr 30, 2009 Remote root holes reported as "denial of service"
Jan 11, 2009 Installing TortoiseGit
Sep 01, 2008 Ubiquitous Hoogle
Oct 02, 2007 Probability monads at Hac 07 II
Sep 18, 2007 Freiburg in October: Scheme, Dylan, and probability monads
Sep 18, 2007 September 8th, 2007
Jul 01, 2007 Ruby-style metaprogramming in JavaScript (plus a port of RSpec)
Apr 28, 2007 Bowling in Haskell: A response to Ron Jeffries
Apr 19, 2007 Robot localization using a particle system monad
Mar 15, 2007 How to make Data.Set a monad
Mar 12, 2007 Monads in 15 minutes: Backtracking and Maybe
Mar 10, 2007 8 ways to report errors in Haskell
Mar 07, 2007 Jim Hefferon's Linear Algebra: A free textbook with fascinating applications
Mar 05, 2007 Three things I don't understand about monads
Mar 03, 2007 Smart classification using Bayesian monads in Haskell
Feb 22, 2007 Bayes' rule in Haskell, or why drug tests don't work
Feb 21, 2007 Refactoring probability distributions, part 2: Random sampling
Feb 21, 2007 Refactoring probability distributions, part 1: PerhapsT
Feb 10, 2007 Probabilistic Functional Programming is cool
Feb 10, 2007 Map fusion: Making Haskell 225% faster
Feb 09, 2007 The first Carnival of Mathematics
Feb 08, 2007 Haskell: Queues without pointers
Feb 07, 2007 Do early adopters use IE?
Feb 02, 2007 Haskell: What happens when you divide infinity by 2?
Feb 01, 2007 Some useful closures, in Ruby
Jan 22, 2007 High-Performance Haskell
Jan 20, 2007 13 Ways of Looking at a Ruby Symbol
Feb 15, 2006 Selenium on Rails, Reloaded: Client-Side Tests in Ruby
Dec 03, 2005 Why Ruby is an acceptable LISP (2005)
Nov 15, 2005 Moving a blog to Typo
Nov 13, 2005 Typo sidebars: Recent Comments and Tagged Articles
Oct 11, 2005 Random Hacks is back online
Oct 11, 2005 McCarthy's Ambiguous Operator
Jul 07, 2003 Preparing for the Winter Garden
Jun 30, 2003 Tomato Update: Weeding and Irrigation
Jun 30, 2003 Mason Update: The Weaver Has Woven
Jun 27, 2003 Comments on "Putting open source on trial"
Jun 27, 2003 Responses to "The Missing Future"
Jun 22, 2003 The Missing Future
Jun 22, 2003 PLUG Protests at SCO
Jun 22, 2003 About the Author
Jun 20, 2003 15 Minutes and 150MB of RAM to Compare Unix and Linux
May 30, 2003 Tomato Progress
May 30, 2003 Checking Code Ownership
Mar 14, 2003 Update on the Strange SCO Case
Mar 10, 2003 SCO Goes Nuclear
Jan 21, 2003 Tomato Advice
Jan 20, 2003 wxWindows Multimedia Work
Jan 20, 2003 Last of the Tomatoes
Jan 19, 2003 Winter Weather
Jan 19, 2003 Back on the Slopes
Dec 31, 2002 Wireless Weblogging
Dec 31, 2002 Hardware Fun With Linux
Dec 15, 2002 Open Source Consultants
Dec 11, 2002 Contractor Hiring Tips
Dec 10, 2002 wxWindows Experiences
Nov 10, 2002 Lightweight Languages 2 Conference (MIT, 2002)
Sep 30, 2002 Fromberger spam filtering paper
Sep 29, 2002 Bayesian Whitelisting: Finding the Good Mail Among the Spam
Sep 24, 2002 Macintosh Developer Pain
Sep 23, 2002 Using Bogofilter with Spam Assassin
Sep 23, 2002 Machine Learning Links
Sep 23, 2002 FTC Spam Archive
Sep 22, 2002 How To Test a Trainable Spam Filter
Sep 20, 2002 Things I Hate About CodeWarrior, Part I
Sep 19, 2002 EU free software study
Sep 16, 2002 Weekend Spam Update
Sep 13, 2002 Why Hygienic Macros Rock
Sep 13, 2002 Bogofilter: A New Spam Filter
Sep 13, 2002 Back from France
Aug 27, 2002 Busy for a While
Aug 13, 2002 RedHat Bill Update
Aug 12, 2002 California Open Source Bill: A Really Bad Idea
Aug 06, 2002 SpamAssassin: An Decent Spam Filter
Jul 23, 2002 Internet Explorer still broken
Jul 22, 2002 Yet Another PHP Security Hole
Jul 22, 2002 Panopticon
Jul 22, 2002 On the Air
Jul 22, 2002 Common RSS Bugs
Jul 21, 2002 Ogg Theora link
Apr 12, 2002 Unix is a zombie
Apr 12, 2002 Random Hacks Site Design
Dec 07, 1998 IMAP multiplexing
Dec 04, 1998 IMAP engine progress
Dec 03, 1998 IMAP command loop
Dec 02, 1998 IMAP and gperf
Dec 01, 1998 IMAP progress report
Nov 30, 1998 Linuxconf and IMAP
Nov 25, 1998 More Gwydion Dylan and BlitzMail
Nov 23, 1998 Gwydion Dylan and BlitzMail
Jul 15, 1998 Parsing C headers with Gwydion Dylan's Melange
Jul 06, 1998 BlitzMail architecture
Jul 04, 1998 Trying out Glade
Jul 03, 1998 Enlightenment and Guile
Jul 02, 1998 Guile's looking much better
Jul 01, 1998 MathMap custom spread filter
Jun 30, 1998 Random Hacks goes online! (and a randomized emboss in MathMap)
