Code Wide Open: coding

Wednesday, June 6, 2012

Refactoring C to C++ Part 2 - Strings, Strings, and More Strings

The previous entry in this series was a general info dump on a single converted class. This time a more general rule will be examined: string usage in C++.

One large improvement in C++ coding over C is in the area of strings. With C, a string is just a bare memory pointer to what should be a NUL-terminated sequence of valid characters. In practice there end up being many ways that problems with C strings can creep in:

the final zero-byte null terminator might be missed during creation.
some common library functions will ensure null termination, while others do not.
to determine the length of a string, the whole string needs to be walked until the terminator is found.
resizing and appending to strings can be complex multistage operations with many potential failure points.
resizing a string most often invalidates the existing pointer.
tracking different character encodings can be difficult.
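
To make a couple of those pitfalls concrete, here is a minimal sketch using only the standard library. The helper names copyTruncated and appendSuffix are made up for this illustration. The first shows the manual termination step that strncpy() leaves to the caller, the second shows the same kind of work done with std::string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy src into a fixed buffer, truncating if needed.
// strncpy() does NOT write a terminator when src fills the buffer, so the
// terminator has to be added by hand.
void copyTruncated(char *dest, std::size_t destSize, char const *src)
{
    if (destSize == 0) {
        return;
    }
    std::strncpy(dest, src, destSize - 1);
    dest[destSize - 1] = '\0'; // easy to forget, and nothing warns about it
}

// Hypothetical helper: std::string tracks its own length and storage, so
// appending is a single step with no buffer bookkeeping.
std::string appendSuffix(std::string const &base, std::string const &suffix)
{
    std::string result = base;
    result += suffix;
    return result;
}
```

With a 4-byte buffer, copyTruncated() keeps only the first three characters of a longer source, while appendSuffix() simply grows as needed.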

In C++, strings are generally represented by the standard class std::string. However, that still does not address the issue of encodings. What an individual byte or set of bytes means can depend on many factors. Modern programs have to deal with multiple encodings... even if their developers do not always realize it.

With GTK+ programs there are three main encodings to be aware of: locale encoding, filesystem encoding and internal encoding. The internal encoding is used for UI widgets and most internal GTK+ calls, and is always UTF-8. The locale encoding can vary at runtime, and although it is commonly also UTF-8, it can be something else entirely. The filesystem encoding is separate again, and is used for paths. It can vary greatly on systems that have been upgraded over time.

I'll cover encodings a bit more at a different time, but in the context of GTK+ and C++ the potential encoding allows us to select between the two main classes for strings:

std::string: The standard class for strings in C++. Should be used when the data might be in an encoding other than UTF-8. This is the case for GTK+ and GLib APIs that operate with either locale or filesystem encodings.
Glib::ustring: A class from glibmm (the library Gtkmm builds on) that represents strings of UTF-8 data. Among other things, it handles the details of single characters that span multiple bytes in UTF-8.
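
The difference between bytes and characters is the heart of the matter. As a rough illustration that builds without glibmm, the sketch below counts UTF-8 characters in a plain std::string. The helper countUtf8CodePoints is hypothetical and assumes valid UTF-8 input; Glib::ustring does this kind of bookkeeping for you.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count UTF-8 code points by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t countUtf8CodePoints(std::string const &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count; // a lead byte or a plain ASCII byte starts a character
        }
    }
    return count;
}
```

For a word with one accented letter encoded as two bytes, std::string::size() reports one more than the character count returned here. That mismatch is exactly what makes a byte-oriented class awkward for UI text.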

Thankfully we end up with some fairly simple rules for C++ programs:

Use a single common encoding for as much of a program as possible. For GTK+ this is UTF-8.
Avoid using legacy C strings such as "char *" or "gchar *".
Use Glib::ustring for all UTF-8 encoded strings.
Use std::string for strings that might be in different encodings.
Be very careful about string conversions, and use explicit encodings.
Do not mix strings and byte data.
Use std::vector<uint8_t> for arbitrary byte buffers.
For parameters passed into functions, use "Glib::ustring const &" or "std::string const &".
For return values, prefer functions that return "Glib::ustring" or "std::string" (note that these do not use 'const' nor references).
For functions that return multiple strings, take in parameters of either "Glib::ustring &" or "std::string &".
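
The parameter and return rules can be sketched roughly as follows. This uses std::string only, so that it builds without glibmm; the same shapes apply to Glib::ustring. The function names makeLabel, splitKeyValue and toBytes are invented for this example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Inputs are passed by reference-to-const, so no copies are made.
// The result is returned by value (no const, no reference).
std::string makeLabel(std::string const &name, std::string const &unit)
{
    return name + " (" + unit + ")";
}

// Multiple results: the caller supplies the strings to fill in.
// Returns false and leaves the outputs untouched when there is no '='.
bool splitKeyValue(std::string const &entry, std::string &key, std::string &value)
{
    std::string::size_type const pos = entry.find('=');
    if (pos == std::string::npos) {
        return false;
    }
    key = entry.substr(0, pos);
    value = entry.substr(pos + 1);
    return true;
}

// Raw bytes are not text: keep them in a byte vector, not in a string.
std::vector<std::uint8_t> toBytes(std::string const &text)
{
    return std::vector<std::uint8_t>(text.begin(), text.end());
}
```

Returning by value keeps ownership simple for the caller, and the non-const reference parameters make it visible at the declaration which arguments are outputs.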

Finally we end up with a very important question: does any of this make sense? Hopefully some guidance can be quickly drawn from this information. However, if any point needs more clarification, or was missed, please speak up and let me know what to address.

Friday, May 18, 2012

Refactoring C to C++ Part 1

It turns out that a recent Inkscape source change is a good example for showing some of the process of converting a GTK+ type from C to C++. While doing some recent usability changes, I did a bit of cleanup 'C++ifying' the Inkscape SPCtrlLine type. To keep our source revision history clear and useful, this cleanup pass went in as a separate change (revision 11321). This also makes it easy to look at for guidance.

A good starting point is to look at the changes to the main header file itself: sp-ctrlline.h.

First is a simple change to a standard GTK+ macro definition. Yes, in general macros are evil, but the few macros listed at the start of the header are following GTK+ conventions.

21    #define SP_TYPE_CTRLLINE (sp_ctrlline_get_type ())
   23 #define SP_TYPE_CTRLLINE (SPCtrlLine::getType())

The "SP" prefixing is legacy naming that we will ignore for now.
In general this seems like a minor change, with only subtle formatting differences, but there is more to it than that.
Instead of invoking a single function with a long name, it now invokes a static method on a class.
The method being called is now merely "getType()" (and thus is template-friendly).

One important point to keep in mind is that in C++, a struct is just a class that defaults to public:. So once we're in C++-land, just think of "struct" as a rough synonym for "class".
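
A tiny sketch of that point, using made-up types ColorStruct and ColorClass: the only difference between the two keywords is the default access level.

```cpp
#include <type_traits>

struct ColorStruct {
    unsigned rgba = 0; // public: the default for "struct"
};

class ColorClass {
    unsigned rgba = 0; // private: the default for "class"
public:
    unsigned getRgba() const { return rgba; }
    void setRgba(unsigned value) { rgba = value; }
};

// As far as the language is concerned, both declare a class type.
static_assert(std::is_class<ColorStruct>::value, "struct declares a class type");
static_assert(std::is_class<ColorClass>::value, "class declares a class type");
```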

Then the main change in the header involves moving a set of simple C functions to instead be class methods:

33  GType sp_ctrlline_get_type (void);
34 
35  void sp_ctrlline_set_rgba32 (SPCtrlLine *cl, guint32 rgba);
36  void sp_ctrlline_set_coords (SPCtrlLine *cl, gdouble x0, gdouble y0, gdouble x1, gdouble y1);
37  void sp_ctrlline_set_coords (SPCtrlLine *cl, const Geom::Point start, const Geom::Point end);

Since sp_ctrlline_get_type() does not have a pointer to an instance, this will be a static method.
Since the others start with SPCtrlLine *cl instance pointers, these will become normal methods.
The prefix "sp_ctrlline_" disappears as a natural part of moving into a class.
The explicit instance pointers (SPCtrlLine *cl) disappear and are replaced by the implicit "this" pointer of C++ member functions (aka "methods").
To avoid making unnecessary copies of the start and end parameters on sp_ctrlline_set_coords, we change it to pass constant references instead.
Since C++ declarations are easiest to understand when read right-to-left, we move the 'const' to be just before the & of the reference: "Geom::Point const &start" then reads as "start is a reference to a constant Geom::Point".

28    static GType getType();
30    void setRgba32(guint32 rgba);
32    void setCoords(gdouble x0, gdouble y0, gdouble x1, gdouble y1);
34    void setCoords(Geom::Point const &start, Geom::Point const &end);
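
For readers without the Inkscape source at hand, the overall shape of the result can be sketched in a self-contained form. Point and CtrlLine below are simplified, hypothetical stand-ins for Geom::Point and SPCtrlLine, not the real declarations, and typeName() merely stands in for the static getType().

```cpp
#include <cstdint>

// Hypothetical stand-in for Geom::Point.
struct Point {
    double x;
    double y;
};

// Hypothetical, stripped-down stand-in for SPCtrlLine.
class CtrlLine {
public:
    // No instance is needed, so this is a static method.
    static char const *typeName() { return "CtrlLine"; }

    // The C version took an explicit "CtrlLine *cl" as its first argument.
    void setRgba32(std::uint32_t newRgba) { rgba = newRgba; }

    void setCoords(double x0, double y0, double x1, double y1)
    {
        start = Point{x0, y0};
        end = Point{x1, y1};
    }

    // Overload taking references-to-const, so no Point copies are made.
    void setCoords(Point const &newStart, Point const &newEnd)
    {
        setCoords(newStart.x, newStart.y, newEnd.x, newEnd.y);
    }

    std::uint32_t rgba = 0;
    Point start = {0.0, 0.0};
    Point end = {0.0, 0.0};
};
```

Both setCoords() overloads share one name, and the compiler picks between them from the argument types, which is something the C naming scheme could not offer.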

Moving on now to the sp-ctrlline.cpp file, there are a few things to note. One is switching from static functions to using an unnamed (or anonymous) namespace. That could have allowed us to drop the "sp_ctrlline_" prefix, but that step was skipped for the moment. We do, however, want to fix casts as we go, such as:

49        (GClassInitFunc) sp_ctrlline_class_init, 
   51     reinterpret_cast<GClassInitFunc>(sp_ctrlline_class_init),

Inside the class_init function, around lines 63-72/66-72, there is a simplification due to inheritance. There is no need to create object_class and item_class pointers from the passed-in SPCtrlLineClass *klass pointer. The members of the parent types are visible, so we can just use klass directly, such as for

klass->destroy = sp_ctrlline_destroy;
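
Both ideas from this part (file-local functions in an unnamed namespace, and parent members being visible through inheritance) can be sketched with a few hypothetical stand-in types. ObjectClass, CanvasItemClass and CtrlLineClass below only mimic the general shape of the GTK+ class structs and are not the real definitions.

```cpp
// Hypothetical stand-ins for the chain of class structs.
struct ObjectClass {
    void (*destroy)() = nullptr;
};

struct CanvasItemClass : ObjectClass {
    void (*update)() = nullptr;
};

struct CtrlLineClass : CanvasItemClass {};

// Unnamed namespace: these names are local to this file, which is the
// C++ replacement for marking each function "static" in C.
namespace {
int destroyCalls = 0;
void ctrlline_destroy() { ++destroyCalls; }
void ctrlline_update() {}
} // namespace

void ctrlline_class_init(CtrlLineClass *klass)
{
    // No intermediate object_class or item_class pointers are needed:
    // inherited members are reachable directly through klass.
    klass->destroy = ctrlline_destroy; // declared in ObjectClass
    klass->update = ctrlline_update;   // declared in CanvasItemClass
}
```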

Another handy aspect of turning stand-alone C functions into C++ methods is that we get compile-time checks and safety and can drop run-time checks, such as at the beginning of the new SPCtrlLine::setRgba32() method:

154    g_return_if_fail (cl != NULL);
155    g_return_if_fail (SP_IS_CTRLLINE (cl));

The checks at lines 171-172 are similarly dropped.

Once we get to the body of the method, there are a few interesting points to be seen:

157        if (rgba != cl->rgba) {
158            SPCanvasItem *item;
159            cl->rgba = rgba;
160            item = SP_CANVAS_ITEM (cl);
161            item->canvas->requestRedraw((int)item->x1, (int)item->y1, (int)item->x2, (int)item->y2);
    155    if (rgba != this->rgba) {
    156        this->rgba = rgba;
    157        canvas->requestRedraw(x1, y1, x2, y2);

At new line 155 since a parameter has the same name as a member, we use "this->" to be able to access the member.
There is no need for the casting macro SP_CANVAS_ITEM from line 160, since a subclass has all the superclass accessible.
Since canvas, x1, y1, x2 and y2 are all members and we are now a member function, use of cl-> and item-> can be dropped.
Since canvas is a member and we are in a member function, we can use it directly in new line 157.
C-style casts, and casting in general, are enemies. By dropping the casts to (int), the code gets simpler, overload resolution can pick the function that matches the real argument types, and type errors become more visible.
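
The points above can be tried out in isolation with a small sketch. Canvas, CanvasItem and CtrlLine here are hypothetical stand-ins. In particular, the pair of requestRedraw() overloads and the double coordinates are assumptions made to show the overloading point, not a description of the real Inkscape canvas.

```cpp
#include <cstdint>
#include <string>

// Hypothetical canvas with two overloads, recording which one was called.
struct Canvas {
    std::string lastCall;
    void requestRedraw(int, int, int, int) { lastCall = "int"; }
    void requestRedraw(double, double, double, double) { lastCall = "double"; }
};

// Hypothetical parent type holding the canvas pointer and bounds.
struct CanvasItem {
    Canvas *canvas = nullptr;
    double x1 = 0.0, y1 = 0.0, x2 = 0.0, y2 = 0.0;
};

struct CtrlLine : CanvasItem {
    std::uint32_t rgba = 0;

    void setRgba32(std::uint32_t rgba)
    {
        // The parameter hides the member, so "this->" names the member.
        if (rgba != this->rgba) {
            this->rgba = rgba;
            // Inherited members are used directly. With no (int) casts,
            // the overload matching the real types is selected.
            canvas->requestRedraw(x1, y1, x2, y2);
        }
    }
};
```

Had the (int) casts stayed, the int overload would always have been chosen, regardless of what the members actually are.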

Moving on down into gradient-drag.cpp, there is a very important shift in thought/approach for pointers. Looking at line 1579/1578 we see a difference in type:

1579         SPCanvasItem *line = sp_canvas_item_new(sp_desktop_controls(this->desktop),
1580                                                                  SP_TYPE_CTRLLINE, NULL);
     1578    SPCtrlLine *line = SP_CTRLLINE(sp_canvas_item_new(sp_desktop_controls(this->desktop), SP_TYPE_CTRLLINE, NULL));

Instead of holding a pointer to the more generic parent class SPCanvasItem, we hold and use a more specific pointer to the subclass SPCtrlLine.

With GTK+ in C, holding the more generic type is common, and results in, among other things, excessive use of the type check and type casting macros (such as SP_CTRLLINE()). Aside from any performance slowdown they introduce, they hide things, block overriding, and sacrifice compile-time safety for run-time checks. It is far better for incorrect code to be rejected by the compiler up front than for it to fail at runtime (and then only when a user trips over the specific code path in question).
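
The trade-off can be shown with a small sketch. The types are hypothetical stand-ins for SPCanvasItem and its subclasses, and dynamic_cast is used here only as a plain C++ substitute for the GTK+ type check macros.

```cpp
// Hypothetical stand-ins for SPCanvasItem and two of its subclasses.
struct CanvasItem {
    virtual ~CanvasItem() = default;
};

struct CtrlLine : CanvasItem {
    double length = 0.0;
    void setLength(double value) { length = value; }
};

struct CtrlRect : CanvasItem {};

// Generic-pointer style: any CanvasItem compiles, so the type has to be
// checked at run time.
bool setLengthChecked(CanvasItem *item, double value)
{
    CtrlLine *line = dynamic_cast<CtrlLine *>(item);
    if (!line) {
        return false; // wrong type, discovered only when this path runs
    }
    line->setLength(value);
    return true;
}

// Specific-pointer style: passing a CtrlRect* here does not compile.
void setLengthTyped(CtrlLine *line, double value)
{
    line->setLength(value);
}
```

Handing a CtrlRect to setLengthChecked() builds fine and fails quietly at run time, whereas the same mistake with setLengthTyped() is stopped by the compiler.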

Similar fixes can be seen in the changes to line-geometry.cpp and elsewhere. In pen-context.h, seltrans.h, text-context.h, and node.h the types of the pertinent members have also been changed from the parent class SPCanvasItem to the more specific subclass SPCtrlLine.

In closing, reviewing the entire change with thoughts as to why different things were done can be quite useful. At some point soon I'll be following up with some more examples, along with some summaries of key points to follow and keep in mind. Additionally, this change did not really touch on any conversion from plain GTK+ over to Gtkmm (the C++ wrapper library for GTK+). Subsequent entries will also touch on those.

Friday, April 10, 2009

What's on My Bookshelf

What's on my bookshelf? That is usually a good question, especially when it comes to software developers and people starting out... but my current answer is "nothing". Of course, before people get the impression that I hate books, or can't read, or such, I should probably clarify things. A little while back we had a bit of a problem with our house. Luckily no people were harmed... but this is what our shelves ended up looking like.

That's right, we had a wee bit of a problem with a house fire. As I mentioned, all people were OK, but the contents were totaled... including the few software books I owned. Oh, and our inkjet printer that was halfway across the house didn't fare too well either. (I never knew they made printers out of spaghetti).

However, in talking with some potential Google Summer of Code students over at Cal State LA the subject came up. It is a good one, and very helpful even to programmers who have been out of school for a while. First off, I should point out that I am not one who buys nor reads many technical books. It started off originally from not having much money, so books were a luxury (well, technical ones... books just for reading are a necessity). By the time I could buy more, I didn't really want to since books on programming tend to be out of date so quickly nowadays. And for the most part, I could get detailed and more specific information from the Internet.

The following books, though, should be read by anyone working on software:

"The Mythical Man-Month" - Frederick P. Brooks Jr. (anniversary edition)
ISBN 0-201-83595-9
The granddaddy of them all. Although first published in 1975 it is still very applicable today. The abstract concepts are well founded and well presented, and hold up more than a quarter century later. Be sure to track down the anniversary edition with "No Silver Bullet" in it. It also helps that this is a fairly thin book, so one can actually read it. (Admit it, not many of you pore through gigantic tomes, or at least not in the detail they deserve.)

"Code Complete" - Steve McConnell
ISBN 978-1556154843 First edition (1993)
ISBN 978-0735619678 Second edition (June 2004)
Big fat textbook on software coding. This one is huge, but well worth the read. Also it has already been distilled down to the essentials, so don't skim through it. Definitely one to be taken a chapter at a time. My personal feeling is that this very savvy person set out to write the definitive textbook on software construction and nailed it. This one also is chock full of studies, statistics and citations.

"Agile Software Development with Scrum" - Ken Schwaber, Mike Beedle
ISBN 978-0-735-61993-7
This, I think, is a number one "must read", second only to The Mythical Man-Month (and in slight contrast to Code Complete, which is a "must slog through"). The key here is that it explains why the software industry has gotten its process almost completely wrong. Waterfall is broken and can't work, etc. It also points out how to make a software development process that actually works. I've been on teams that have brought this into companies both large and small, and it works. Very well. Also it is a very thin book, so all should be able to read it. However, if some of you don't want to actually track down and read a book made of actual paper, the first chapter is available online as "Get Ready for Scrum". (Just be sure to read all 8 pages, since the online article lists it as 7. Don't miss that last page)

"Extreme Programming Explained: Embrace Change" - Kent Beck
For those who know of extreme programming, there is a tendency to either hate it or love it (if you hate it, then you don't "get" it). However, I am surprised at how many people I meet nowadays who aren't familiar with it. (I guess they don't get the joke of Microsoft renaming Windows 2000++ to be "XP" either). Anyway, this is a very good book and even manages to weave in citations of both anthropological studies of the 60s and Spinal Tap. When understood and put into practice correctly, I believe that XP practices (Beck's, not Microsoft's) lead to very high productivity. However it is easy to get them wrong, and I believe that even the book "Extreme Programming Implemented" in the same series misses the mark on some things, including pair programming. Again, this is a relatively thin book, so no excuses for not reading it.

"Rapid Development" - Steve McConnell
Another good tome by Steve McConnell. Whereas Code Complete focuses on actually writing code, this one focuses on managing a software project. Quite handy, but again one to take a single chapter at a time.

"Peopleware: Productive Projects and Teams" - DeMarco & Lister
ISBN 0-932633-43-9
This is a bit of an older book, but very handy for team leads, managers, etc. It is also a thin book, so no excuse...
