Showing posts with label software development. Show all posts

Wednesday, June 6, 2012

Refactoring C to C++ Part 2 - Strings, Strings, and More Strings

In the previous entry in this series, a general info dump on a converted class was taken. This time a more general rule will be examined: string usage in C++.

One large improvement in C++ coding over C is in the area of strings. In C, a string is just a bare memory pointer to what should be a NUL-terminated sequence of valid characters. In practice there end up being many ways that problems with C strings can creep in:

  • the final zero-byte null terminator might be missed during creation.
  • some common library functions will ensure null termination, while others do not.
  • to determine the length of a string, the entire buffer needs to be walked
  • resizing and appending to strings can be complex multistage operations with many potential failure points.
  • resizing a string most often invalidates the existing pointer.
  • tracking different character encodings can be difficult.

With C++ in general strings are represented by the standard class std::string. However that still does not address the issue of encodings. What the meaning of an individual byte or set of bytes is can depend on many factors. Modern programs have to deal with multiple encodings... even if their developers do not always realize it.
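As a minimal sketch of how std::string sidesteps several of the pitfalls listed above, consider joining two path pieces. The helper name joinPath is made up for illustration and does not come from GTK+ or Inkscape.

```cpp
#include <string>

// Hypothetical helper: join a directory and a file name.
// std::string owns and grows its own buffer, so there is no manual size
// calculation, no realloc(), and no terminator to forget.
std::string joinPath(std::string const &dir, std::string const &name)
{
    std::string result = dir;
    if (!result.empty() && result.back() != '/') {
        result += '/';   // appending a single character
    }
    result += name;      // appending a whole string
    return result;       // size() is stored, not found by walking the buffer
}
```

The C equivalent would need a strlen() on each piece, an allocation sized by hand, and a check on every step. Note that nothing here says what encoding the bytes are in, which is the remaining problem.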

With GTK+ programs there are three main encoding values to keep aware of: locale encoding, filesystem encoding and internal encoding. The internal encoding is used for UI widgets and most internal GTK+ calls. The encoding itself is UTF-8. The locale encoding can vary at runtime, and although it is commonly also UTF-8, it can be any other. The filesystem encoding is different, and used for paths. This can vary greatly for systems that have been upgraded over time.

I'll cover encodings a bit more at a different time, but in the context of GTK+ and C++ the potential encoding allows us to select between the two main classes for strings:

std::string
The standard class for strings in C++. Should be used when the data might be in an encoding other than UTF-8. This is the case for GTK+ and GLib APIs that operate with either locale or filesystem encodings.
Glib::ustring
A class from glibmm (the GLib wrapper that Gtkmm builds on) that represents strings of UTF-8 data. Among other things, it manages the details of single characters that take up multiple bytes in UTF-8.
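To make the byte-versus-character distinction concrete, here is a rough sketch using only the standard library. The function countUtf8CodePoints is a made-up stand-in for the kind of bookkeeping Glib::ustring does for you; it is not how Glib::ustring is implemented, and it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8. No validation is attempted.
std::size_t countUtf8CodePoints(std::string const &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count; // a lead byte or a plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For the five-byte string "caf\xC3\xA9" (the word "café"), std::string::size() reports 5 while the function above reports 4. With Glib::ustring the same two numbers are available as bytes() and length().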

Thankfully we end up with some fairly simple rules for C++ programs:

  • Use a single common encoding for as much of a program as possible. For GTK+ this is UTF-8.
  • Avoid using legacy C strings such as "char *" or "gchar *"
  • Use Glib::ustring for all UTF-8 encoded strings.
  • Use std::string for strings that might be in different encodings.
  • Be very careful about string conversions, and use explicit encodings.
  • Do not mix strings and byte data.
  • Use std::vector<uint8_t> for random byte buffers.
  • For parameters passed into functions, use "Glib::ustring const &" or "std::string const &".
  • For return values, prefer functions that return "Glib::ustring" or "std::string" (note that these do not use 'const' nor references).
  • For functions that return multiple strings, take in parameters of either "Glib::ustring &" or "std::string &".
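The parameter and return rules above might look something like the following. This is only a sketch: the function names are invented, and std::string is used so that it builds without glibmm. The same shapes would apply with Glib::ustring for UTF-8 text.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Input parameter: const reference, so no copy is made.
// Return value: a plain std::string by value (no const, no reference).
std::string makeGreeting(std::string const &name)
{
    return "Hello, " + name;
}

// Multiple results: the caller supplies non-const references to fill in.
// Returns false and leaves the outputs untouched if there is no '='.
bool splitKeyValue(std::string const &line, std::string &key, std::string &value)
{
    std::string::size_type pos = line.find('=');
    if (pos == std::string::npos) {
        return false;
    }
    key = line.substr(0, pos);
    value = line.substr(pos + 1);
    return true;
}

// Raw bytes live in a byte vector, not in a string.
// (These eight bytes are the PNG file signature.)
std::vector<std::uint8_t> pngSignature()
{
    return {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
}
```

Keeping byte buffers in their own type means a function expecting text cannot be handed binary data by accident.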

Finally we end up with a very important question: does any of this make sense? Hopefully some guidance can be quickly drawn from this information. However, if any point needs more clarification, or was missed, please speak up and let me know what to address.

Read more!

Friday, May 18, 2012

Refactoring C to C++ Part 1

It turns out that a recent Inkscape source change is a good example for showing some of the process of conversion from C to C++ of a GTK+ type. In doing some recent usability changes, I'd done a bit of a cleanup on 'C++ifying' the Inkscape SPCtrlLine type. Trying to keep our source revision history clear and useful, this one cleanup pass went in as a separate change (revision 11321). This also makes it easy to look at for guidance.

A good starting point is to look at the changes to the main header file itself: sp-ctrlline.h.

First is a simple change to a standard GTK+ macro definition. Yes, in general macros are evil, but the few macros listed at the start of the header are following GTK+ conventions.

21    #define SP_TYPE_CTRLLINE (sp_ctrlline_get_type ())
   23 #define SP_TYPE_CTRLLINE (SPCtrlLine::getType())
  • The "SP" prefixing is legacy naming that we will ignore for now.
  • In general this seems like a minor change, with only subtle formatting differences, but there is more to it than that.
  • Instead of invoking a single function with a long name, it now invokes a static method on a class.
  • The method being called is now merely "getType()" (and thus is template-friendly).

One important point to keep in mind is that in C++, a struct is just a class that defaults to public:. So once we're in C++-land, just think of "struct" as a rough synonym for "class".
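A tiny illustration of that default, with made-up type names:

```cpp
struct PointStruct {
    int x; // members of a struct are public unless stated otherwise
};

class PointClass {
    int x; // members of a class are private unless stated otherwise
public:
    explicit PointClass(int value) : x(value) {}
    int getX() const { return x; }
};
```

Code outside the types can read PointStruct::x directly, but has to go through getX() for PointClass. Other than that default, the two keywords declare the same kind of thing.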

Then the main change in the header involves moving a set of simple C functions to instead be class methods:

33  GType sp_ctrlline_get_type (void);
34 
35  void sp_ctrlline_set_rgba32 (SPCtrlLine *cl, guint32 rgba);
36  void sp_ctrlline_set_coords (SPCtrlLine *cl, gdouble x0, gdouble y0, gdouble x1, gdouble y1);
37  void sp_ctrlline_set_coords (SPCtrlLine *cl, const Geom::Point start, const Geom::Point end);
  • Since sp_ctrlline_get_type() does not have a pointer to an instance, this will be a static method
  • Since the others start with SPCtrlLine *cl instance pointers, these will become normal methods.
  • The prefix "sp_ctrlline_" disappears as a natural part of moving into a class.
  • The explicit instance pointers (SPCtrlLine *cl) disappear and are replaced by the implicit "this" pointer of C++ member functions (aka "methods").
  • To avoid making unnecessary copies of the start and end parameters on sp_ctrlline_set_coords, we change it to pass constant references instead.
  • Since C++ references are easiest to understand when read left-to-right, we move the 'const' to be just before the & of the reference.
28    static GType getType();
30    void setRgba32(guint32 rgba);
32    void setCoords(gdouble x0, gdouble y0, gdouble x1, gdouble y1);
34    void setCoords(Geom::Point const &start, Geom::Point const &end);
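For readers without the Inkscape source at hand, the overall shape of the converted class could be sketched as follows. Everything here is a simplified stand-in: Point takes the place of Geom::Point, CtrlLine takes the place of SPCtrlLine, and getType() returns a placeholder number rather than a real GType.

```cpp
#include <cstdint>

// Stand-in for Geom::Point.
struct Point {
    double x;
    double y;
};

// Simplified stand-in for SPCtrlLine, with no GTK+ dependencies.
struct CtrlLine {
    std::uint32_t rgba = 0;
    Point start{0.0, 0.0};
    Point end{0.0, 0.0};

    // Needs no instance, so it is static. The value is a placeholder.
    static unsigned long getType() { return 42; }

    void setRgba32(std::uint32_t value) { rgba = value; }

    // Overloads share one name; the compiler picks by argument types.
    void setCoords(double x0, double y0, double x1, double y1)
    {
        setCoords(Point{x0, y0}, Point{x1, y1});
    }

    // Const references avoid copying the two points on the way in.
    void setCoords(Point const &newStart, Point const &newEnd)
    {
        start = newStart;
        end = newEnd;
    }
};
```

A caller then writes line.setCoords(1, 2, 3, 4) or line.setCoords(a, b) instead of passing the instance pointer as the first argument.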

Moving on now to the sp-ctrlline.cpp file, there are a few things to note. One is switching from static functions to using an unnamed (or anonymous) namespace. That could have allowed us to drop the "sp_ctrlline_" prefix, but that step was skipped for the moment. We do, however, want to fix casts as we go, such as:

49        (GClassInitFunc) sp_ctrlline_class_init, 
   51     reinterpret_cast<GClassInitFunc>(sp_ctrlline_class_init),
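The following sketch shows the same kind of named cast outside of GTK+. The types and names are invented: GenericInitFunc only plays the role of GClassInitFunc, and unlike the GObject machinery this example casts the pointer back to its real type before calling it.

```cpp
// Hypothetical stand-ins for the GObject pieces involved.
struct CtrlLineClass {
    int initialized = 0;
};

// A generic "init" signature, playing the role of GClassInitFunc.
typedef void (*GenericInitFunc)(void *klass);

// The function with its real, specific signature.
void ctrlLineClassInit(CtrlLineClass *klass)
{
    klass->initialized = 1;
}

// The named cast is easy to search for and states plainly that the
// pointer type is being reinterpreted.
static GenericInitFunc const registeredInit =
    reinterpret_cast<GenericInitFunc>(ctrlLineClassInit);

void runRegisteredInit(CtrlLineClass *klass)
{
    // Convert back to the original function type before the call.
    void (*init)(CtrlLineClass *) =
        reinterpret_cast<void (*)(CtrlLineClass *)>(registeredInit);
    init(klass);
}
```

A C-style cast would compile just the same, but it would also silently perform whichever other conversion happened to fit, which is exactly what makes it hard to audit.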

Inside of the class_init function around lines 63-72/66-72 there is a simplification due to inheritance. There is no need to create object_class and item_class pointers from the passed-in SPCtrlLineClass *klass pointer. The members of the parent types are visible, so we can just use klass directly, such as for

klass->destroy = sp_ctrlline_destroy;

Another handy aspect to turning stand-alone C functions into C++ methods is that we get compile-time checks and safety and can drop run-time checks, such as these, which are no longer needed at the beginning of the new SPCtrlLine::setRgba32() method:

154    g_return_if_fail (cl != NULL);
155    g_return_if_fail (SP_IS_CTRLLINE (cl));

The checks at lines 171-172 are similarly dropped.
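A stripped-down comparison of the two styles, using an invented Line type rather than the real SPCtrlLine:

```cpp
#include <cstdint>

struct Line {
    std::uint32_t rgba = 0;

    // Member function: the compiler only allows this call on a Line,
    // so no run-time type check is written here.
    void setRgba32(std::uint32_t value) { rgba = value; }
};

// C style: any pointer value can arrive here, so it is checked at run time.
// The early return plays the role of g_return_if_fail().
bool line_set_rgba32(Line *line, std::uint32_t value)
{
    if (line == nullptr) {
        return false;
    }
    line->rgba = value;
    return true;
}
```

In the GTK+ version the second check (SP_IS_CTRLLINE) exists because the pointer may have been cast from some unrelated type. Once callers hold a properly typed object, that question is settled before the program ever runs.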

Once we get to the body of the method, there are a few interesting points to be seen:

157        if (rgba != cl->rgba) {
158            SPCanvasItem *item;
159            cl->rgba = rgba;
160            item = SP_CANVAS_ITEM (cl);
161            item->canvas->requestRedraw((int)item->x1, (int)item->y1, (int)item->x2, (int)item->y2);
    155    if (rgba != this->rgba) {
    156        this->rgba = rgba;
    157        canvas->requestRedraw(x1, y1, x2, y2);
  • At new line 155 since a parameter has the same name as a member, we use "this->" to be able to access the member.
  • There is no need for the casting macro SP_CANVAS_ITEM from line 160, since a subclass has all the superclass accessible.
  • Since canvas, x1, y1, x2 and y2 are all members and we are now a member function, use of cl-> and item-> can be dropped.
  • Since canvas is a member and we are in a member function, we can use it directly in new line 157.
  • C-style casts, and casting in general, are enemies. By dropping the casts to (int), we let the code get simpler, can take advantage of overloading, and make errors more visible.
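The points above can be tried out in isolation with a small sketch. Canvas and CanvasItem here are invented, heavily simplified stand-ins for SPCanvas and SPCanvasItem, and the redraw call merely counts how often it was invoked.

```cpp
#include <cstdint>

// Simplified stand-in for SPCanvas.
struct Canvas {
    int redrawCount = 0;
    void requestRedraw(int, int, int, int) { ++redrawCount; }
};

// Simplified stand-in for SPCanvasItem.
struct CanvasItem {
    Canvas *canvas = nullptr;
    int x1 = 0, y1 = 0, x2 = 0, y2 = 0;
};

struct CtrlLine : public CanvasItem {
    std::uint32_t rgba = 0;

    void setRgba32(std::uint32_t rgba)
    {
        // The parameter hides the member of the same name, hence "this->".
        if (rgba != this->rgba) {
            this->rgba = rgba;
            // canvas, x1, y1, x2 and y2 are inherited members: no cast,
            // no "cl->" and no "item->" required.
            canvas->requestRedraw(x1, y1, x2, y2);
        }
    }
};
```

Setting the same color twice triggers only one redraw, which matches the intent of the original comparison.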

Moving on down into gradient-drag.cpp, there is a very important shift in thought/approach for pointers. Looking at line 1579/1578 we see a difference in type:

1579         SPCanvasItem *line = sp_canvas_item_new(sp_desktop_controls(this->desktop),
1580                                                                  SP_TYPE_CTRLLINE, NULL);
     1578    SPCtrlLine *line = SP_CTRLLINE(sp_canvas_item_new(sp_desktop_controls(this->desktop), SP_TYPE_CTRLLINE, NULL));

Instead of holding a pointer to the more generic parent class SPCanvasItem, we hold and use a more specific pointer to the subclass SPCtrlLine.

With GTK+ in C, holding the more generic type is common, and results in, among other things, excessive use of the type check and type casting macros (such as SP_CTRLLINE()). Aside from any performance slowdown they introduce, they hide things, block overriding, and sacrifice compile-time safety for run-time checks. It is far better for incorrect code to be rejected by the compiler up front than to fail at runtime (and then only when a user trips over the specific code path in question).
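The trade-off can be seen in a small sketch with invented types, where dynamic_cast plays the role of the SP_CTRLLINE() style macros:

```cpp
// Simplified stand-ins for SPCanvasItem and SPCtrlLine.
struct CanvasItem {
    virtual ~CanvasItem() {}
};

struct CtrlLine : public CanvasItem {
    double x0 = 0, y0 = 0, x1 = 0, y1 = 0;

    void setCoords(double ax0, double ay0, double ax1, double ay1)
    {
        x0 = ax0; y0 = ay0; x1 = ax1; y1 = ay1;
    }
};

// Holding the generic type: every use needs a run-time check and a cast,
// and a wrong item is only noticed when this code path actually runs.
bool setCoordsGeneric(CanvasItem *item, double x0, double y0, double x1, double y1)
{
    CtrlLine *line = dynamic_cast<CtrlLine *>(item);
    if (line == nullptr) {
        return false;
    }
    line->setCoords(x0, y0, x1, y1);
    return true;
}

// Holding the specific type: passing anything that is not a CtrlLine
// is a compile error, so no check is needed here.
void setCoordsSpecific(CtrlLine *line, double x0, double y0, double x1, double y1)
{
    line->setCoords(x0, y0, x1, y1);
}
```

Handing a plain CanvasItem to setCoordsGeneric() compiles and then fails quietly at run time, while the same mistake with setCoordsSpecific() never makes it past the compiler.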

Similar fixes can be seen in the changes to line-geometry.cpp and elsewhere. In pen-context.h, seltrans.h, text-context.h, and node.h the types of the pertinent members have also been changed from the parent class SPCanvasItem to the more specific subclass SPCtrlLine.

In closing, reviewing the entire change with thoughts as to why different things were done can be quite useful. At some point soon I'll be following up with some more examples, along with some summaries of key points to follow and keep in mind. Additionally, this change did not really touch on any conversion from plain GTK+ over to Gtkmm (the C++ wrapper library for GTK+). Subsequent entries will also touch on those.

Read more!

Friday, April 10, 2009

What's on My Bookshelf

What's on my bookshelf? That is usually a good question, especially when it comes to software developers and people starting out... but my current answer is "nothing". Of course, before people get the impression that I hate books, or can't read, or such, I should probably clarify things. A little while back we had a bit of a problem with our house. Luckily no people were harmed... but this is what our shelves ended up looking like.

That's right, we had a wee bit of a problem with a house fire. As I mentioned, all people were OK, but the contents were totaled... including the few software books I owned. Oh, and our inkjet printer that was halfway across the house didn't fare too well either. (I never knew they made printers out of spaghetti).

However, in talking with some potential Google Summer of Code students over at Cal State LA the subject came up. It is a good one, and very helpful even to programmers who have been out of school for a while. First off, I should point out that I am not one who buys nor reads many technical books. It started off originally from not having much money, so books were a luxury (well, technical ones... books just for reading are a necessity). By the time I could buy more, I didn't really want to since books on programming tend to be out of date so quickly nowadays. And for the most part, I could get detailed and more specific information from the Internet.

The following books, though, should be read by anyone working on software:

"The Mythical Man-Month" - Frederick P. Brooks Jr. (anniversary edition)
ISBN 0-201-83595-9
The granddaddy of them all. Although first published in 1975 it is still very applicable today. The abstract concepts were found and presented well, and hold up more than a quarter century later. Be sure to track down the anniversary edition with "No Silver Bullet" in it. It also helps that this is a fairly thin book, so one can actually read it. (Admit it, not many of you pore through gigantic tomes, or at least not in the detail they deserve.)

"Code Complete" - Steve McConnell
ISBN 978-1556154843 First edition (1993)
ISBN 978-0735619678 Second edition (June 2004)
Big fat textbook on software coding. This one is huge, but well worth the read. Also it has already been distilled down to the essentials, so don't skim through it. Definitely one to be taken a chapter at a time. My personal feeling is that this very savvy person set out to write the definitive textbook on software construction and nailed it. This one also is chock full of studies, statistics and citations.

"Agile Software Development with Scrum" - Ken Schwaber, Mike Beedle
ISBN 978-0-735-61993-7
This, I think, is a number one "must read", second only to The Mythical Man-Month (and in slight contrast to Code Complete, which is a "must slog through"). The key here is that it explains why the software industry has gotten its process almost completely wrong. Waterfall is broken and can't work, etc. It also points out how to make a software development process that actually works. I've been on teams that have brought this into companies both large and small, and it works. Very well. Also it is a very thin book, so all should be able to read it. However, if some of you don't want to actually track down and read a book made of actual paper, the first chapter is available online as "Get Ready for Scrum". (Just be sure to read all 8 pages, since the online article lists it as 7. Don't miss that last page)

"Extreme Programming Explained: Embrace Change" - Kent Beck
For those who know of extreme programming, there is a tendency to either hate it or love it (if you hate it, then you don't "get" it). However, I am surprised at how many people I meet nowadays who aren't familiar with it. (I guess they don't get the joke of Microsoft renaming Windows 2000++ to be "XP" either). Anyway, this is a very good book and even manages to weave in citations of both anthropological studies of the 60s and Spinal Tap. When understood and put into practice correctly, I believe that XP practices (Beck's, not Microsoft's) lead to very high productivity. However it is easy to get them wrong, and I believe that even the book "Extreme Programming Implemented" in the same series misses the mark on some things, including pair programming. Again, this is a relatively thin book, so no excuses for not reading it.

"Rapid Development" - Steve McConnell
Another good tome by Steve McConnell. Whereas Code Complete focuses on actually writing code, this one focuses on managing a software project. Quite handy, but again one to take a single chapter at a time.

"Peopleware: Productive Projects and Teams" - Demarco & Lister
ISBN 0-932633-43-9
This is a bit of an older book, but very handy for team leads, managers, etc. It is also a thin book, so no excuse...

Read more!

Monday, July 21, 2008

Standardization is a Bad Goal...

Recently I've hit upon how to express something that I've learned and worked with over many years: standardization is a bad goal. I know that standardization is something that management, both in software development and out, loves to focus on and push. However, I've often seen it cause more harm than good. There is a much better way to phrase the goal, with "standardization" taking its proper, subservient role.

The key here is the one word - "goal". Standards themselves are not inherently negative. It is when perspective is lost and they become the goal itself instead of simply a means to a goal that the damage is done. A perfect example of this is military underwear.

For example, in the Eighties, the US Army came to the realization that much of its purchasing standardization was getting in the way of achieving the mission, and made efforts to reform it. Standard-issue underwear was just one example. In order to try to gain consistent quality, specifications for the requirements of underwear had been written. Pages upon pages of military specs covered them (tens or hundreds of pages, or perhaps more). However, instead of the desired effect of quality and efficiency, over the years these military underwear specs had ended up locking things in to the state of the art from decades long past, pushing up prices and limiting supply. The process was redone with a simple focus on the actual goals, and not only did the quality of the 'equipment' supplied to the troops go up, but the price came down.

When it comes to software development, most often there is an unstated goal hiding behind the calls for standardization. The true goal is not really standardization at all, but instead I believe it is most often interoperability. When a piece of software has a "standard" interface to meet (e.g. RFC 821) the goal is that different pieces of software running on different types of computers altogether can talk to each other, aka "interoperate".

Another reason to standardize things in software development is to allow for people new to a project or team to get up to speed quicker and be able to contribute sooner. I would argue that this, too, falls under the more general goal of achieving interoperability, but on a personnel level.

Of course one of the more recognized problems "standardizing" things causes is in limiting the effectiveness of developers. But there is another often overlooked issue: monoculture (and specifically software monoculture). If all developers are on a single version of a single operating system, then the chance of code problems going undetected increases. Even moving to a new version of the same operating system can expose latent bugs. Time and again I've seen the quality of a project go up with an increase in variety of developer platforms and tools employed.

Monoculture often infects software development in the name of standardization. When a build system needs to pull code from several sub-projects and put them together, all of the sub-projects need to be able to play nice with the build system. Often management might try to guarantee this by declaring a requirement that all developers "standardize" on a single programming language and a single IDE tool on a single OS platform. Yes, this will reduce problems with setting up the builds, but at what cost? This approach can definitely be called the tail wagging the dog.

The focus needs to shift off of standardization and move to interoperability instead. One should leave the choice of language up to what is most appropriate for the project, based on common factors including target platform, delivery, maintenance, etc. The tool should be left to what allows individual developers to be most productive on whichever projects they get assigned to and may be different from developer to developer (and of course may change even during the course of a single project). Finally, a build system should be chosen to get projects built and delivered as needed. In all of this the requirement for interoperability needs to be explicitly stated and stressed.

A good example is with Java development. There are several choices used for development tools, with IntelliJ IDEA, Eclipse, Emacs and NetBeans among the more common. There are also many OS's that are used as developer platforms, with Linux, Windows and OS X among the more common of these. Despite being very different tools, all of these allow a developer to program in Java. Also most support the common build systems of Ant and of GNU Make in addition to their proprietary project format (and where the tools don't, Ant or Make can be made to support them).

So instead of battling over the various trade-offs of *.ipr files versus .project files, the "build people" and the "engineering people" of a group can standardize on either GNU Make or Ant. Then the requirement for developers would merely be that their workflow was compatible with the project build system and leave the choice of specific tool and/or tools up to the needs of individual developers. Even source control can be mix-n-match. Inkscape's official source repository uses Subversion, but some developers use other interoperable (there's that word again) tools such as SVK, git and others.

Similar comparisons can be made for C and C++, and with Inkscape we see quite a bit of it. Emacs, vi, Visual Studio, Anjuta, Eclipse, KDevelop, vim, gvim and more have all been seen in use by different contributors. Developers don't have to use a standard IDE, but they do use interoperable IDEs and workflows.

To sum up, don't be foolish; clarify your actual goals. As Emerson said, "A foolish consistency is the hobgoblin of little minds."

Read more!