
Ghosts of Unix past, part 4: High-maintenance designs

November 23, 2010

This article was contributed by Neil Brown

The bible portrays the road to destruction as wide, while the road to life is narrow and hard to find. This illustration has many applications in the more temporal sphere in which we make many of our decisions. It is often the case that there are many ways to approach a problem that are unproductive and comparatively few which lead to success. So it should be no surprise that, as we have been looking for patterns in the design of Unix and their development in both Unix and Linux, we find fewer patterns of success than we do of failure.

Our final pattern in this series continues the theme of different ways to go wrong, and turns out to have a lot in common with the previous pattern of trying to "fix the unfixable". However it has a crucial difference which very much changes the way the pattern might be recognized and, so, the ways we must be on the look-out for it. This pattern we will refer to as a "high maintenance" design. Alternatively: "It seemed like a good idea at the time, but was it worth the cost?".

While "unfixable" designs were soon discovered to be insufficient and attempts were made (arguably wrongly) to fix them, "high maintenance" designs work perfectly well and do exactly what is required. However they do not fit seamlessly into their surroundings and, while they may not actually leave disaster in their wake, they do impose a high cost on other parts of the system as a whole. The effort of fixing things is expended not on the center-piece of the problem, but on all that surrounds it.

Setuid

The first of two examples we will use to illuminate this pattern is the "setuid" and "setgid" permission bits and the related functionality. In itself, the setuid bit works quite well, allowing non-privileged users to perform privileged operations in a very controlled way. In fact this is such a clever and original idea that the inventor, Dennis Ritchie, was granted a patent for it; the patent has since been placed in the public domain. Though the question is now moot, it is amusing to speculate about what might have happened had the patent rights been asserted, forcing that aspect of Unix to be invented around. Could a whole host of setuid vulnerabilities have been avoided?

The problem with this design is that programs which run setuid exist in two realms at once and must attempt to be both a privileged service provider and a tool available to users - much like the confused deputy recently pointed out by LWN reader "cmccabe." This creates a number of conflicts which require special handling in various different places.

The most obvious problem comes from the inherited environment. Like any tool, the programs inherit an environment of name=value assignments which are often used by library routines to allow fine control of certain behaviors. This is great for tools but potentially quite dangerous for privileged service providers as there is a risk that the environment will change the behavior of the library and so give away some sort of access that was not intended. All libraries and all setuid programs need to be particularly suspicious of anything in the environment, and often need to explicitly ignore the environment when running setuid. The recent glibc vulnerabilities are a perfect example of the difficulty of guarding against this sort of problem.
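
To make the hazard concrete, here is a minimal sketch (not from any particular library) of the kind of defensive check a library routine might perform before honoring an environment variable. The variable name LIBDEMO_DEBUG is invented for illustration; the test itself is the same idea that lies behind glibc's internal __secure_getenv():

    #include <stdlib.h>
    #include <unistd.h>
    #include <stdio.h>

    /* Return the variable's value only when the process is not running
     * with elevated privileges; otherwise pretend it is unset. */
    static const char *safe_getenv(const char *name)
    {
        if (geteuid() != getuid() || getegid() != getgid())
            return NULL;    /* setuid or setgid: ignore the environment */
        return getenv(name);
    }

    int main(void)
    {
        /* LIBDEMO_DEBUG is a hypothetical variable for this example. */
        const char *dbg = safe_getenv("LIBDEMO_DEBUG");
        printf("debugging %s\n", dbg ? "enabled" : "disabled");
        return 0;
    }

Note that this check covers only the classic setuid case; as discussed below, a process that gained privilege through filesystem capabilities would pass it undetected.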

An example of a more general conflict comes from the combination of setuid with executable shell scripts. This did not apply at the time that setuid was first invented, but once Unix gained the #!/bin/interpreter (or "shebang") method of running scripts it became possible for scripts to run setuid. This is almost always insecure, though various different interpreters have made various attempts to make it secure, such as the "-b" option to csh and the "taint mode" in perl. Whether they succeed or not, it is clear that the setuid mechanism has imposed a real burden on these interpreters.

Permission checking for signal delivery is normally a fairly straightforward matching of the UID of the sending process with the UID of the receiving process, with special exceptions for UID==0 (root) as the sender. However, the existence of setuid adds a further complication. As a setuid program runs just like a regular tool, it must respond to job-control signals and, in particular, must stop when the controlling terminal sends it a SIGTSTP. This requires that the owner of the controlling terminal must be able to request that the process continues by sending SIGCONT. So the signal delivery mechanism needs special handling for SIGCONT, simply because of the existence of setuid.
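
The shape of the resulting check is easiest to see as code. Below is a much-simplified sketch of the kernel's test (the real logic lives in check_kill_permission() in kernel/signal.c and handles more cases), with the SIGCONT accommodation made explicit:

    #include <stdio.h>
    #include <signal.h>

    struct task {            /* minimal stand-in for the kernel's task data */
        unsigned uid, euid;  /* real and effective user IDs */
        int session;         /* session ID, tying a job to its terminal */
    };

    /* Sketch: may "sender" deliver signal "sig" to "target"? */
    static int may_signal(const struct task *sender,
                          const struct task *target, int sig)
    {
        if (sender->euid == 0)
            return 1;        /* root may signal anyone */
        if (sender->uid == target->uid || sender->euid == target->uid)
            return 1;        /* ordinary UID match */
        /* The special case that setuid forces on us: within a session,
         * a stopped job may always be continued, so that the owner of
         * the controlling terminal can resume a stopped setuid program. */
        if (sig == SIGCONT && sender->session == target->session)
            return 1;
        return 0;
    }

    int main(void)
    {
        struct task shell = { .uid = 1000, .euid = 1000, .session = 7 };
        struct task priv  = { .uid = 0,    .euid = 0,    .session = 7 };

        printf("SIGTERM allowed: %d\n", may_signal(&shell, &priv, SIGTERM)); /* 0 */
        printf("SIGCONT allowed: %d\n", may_signal(&shell, &priv, SIGCONT)); /* 1 */
        return 0;
    }

Without setuid, the first two rules would suffice; the third exists only because a program may be running with a UID other than that of the user driving the terminal.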

When writing to a file, Linux (like various flavors of Unix) checks if the file is setuid and, if so, clears the setuid flag. This is not absolutely essential for security, but has been found to be a valuable extra barrier to prevent exploits and is a good example of the wide ranging intrusion of setuid.
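
The behavior is easy to observe. This small demonstration (run it as an ordinary user; a process with CAP_FSETID may keep the bit) sets the setuid bit on a scratch file it owns, writes one byte, and shows that the bit has been cleared:

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;
        int fd = open("scratch", O_CREAT | O_WRONLY | O_TRUNC, 0755);

        if (fd < 0)
            return 1;
        fchmod(fd, 04755);          /* turn on the setuid bit */
        write(fd, "x", 1);          /* any write by an unprivileged caller */
        fstat(fd, &st);
        printf("setuid bit after write: %s\n",
               (st.st_mode & S_ISUID) ? "still set" : "cleared");
        close(fd);
        unlink("scratch");
        return 0;
    }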

Each of these issues can be addressed and largely have been. However they are issues that must be fixed not in the setuid mechanism itself, but in surrounding code. Because of that it is quite possible for new problems to arise as new code is developed, and only eternal vigilance can protect us from these new problems. Either that, or removing setuid functionality and replacing it with something different and less intrusive.

It was recently announced that Fedora 15 would be released with a substantially reduced set of setuid programs. Superficially this seems like it might be "removing setuid functionality" as suggested, but a closer look shows that this isn't the case. The plan for Fedora is to use filesystem capabilities instead of full setuid. This isn't really a different mechanism, just a slightly reworked form of the original. Setuid stores just one bit per file which (together with the UID) determines the capabilities that the program will have. In the case of setuid to root, this is an all or nothing approach. Filesystem capabilities store more bits per file and allow different capabilities to be individually selected, so a program that does not need all of the capabilities of root will not be given them.
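
As an illustration of the difference, a binary like ping can carry just the one capability it needs (cap_net_raw, for example) rather than being setuid root. The sketch below, which assumes libcap and a capability-aware filesystem are available (compile with -lcap), reads such file capabilities back:

    #include <stdio.h>
    #include <sys/capability.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        cap_t caps = cap_get_file(argv[1]);  /* read the file's capabilities */
        if (caps == NULL) {
            printf("%s: no file capabilities\n", argv[1]);
            return 0;
        }
        char *text = cap_to_text(caps, NULL);  /* e.g. "= cap_net_raw+ep" */
        printf("%s: %s\n", argv[1], text);
        cap_free(text);
        cap_free(caps);
        return 0;
    }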

This certainly goes some way to increasing security by decreasing the attack surface. However it doesn't address the main problem: that the setuid programs exist in an uncertain world between being tools and being service providers. It is unclear whether libraries which make use of environment variables after checking that setuid is not in force will also correctly check that capabilities are not in force. Only a comprehensive audit would be able to tell for sure.

Meanwhile, by placing extra capabilities in the filesystem we impose extra requirements on filesystem implementations, on copy and backup tools, and on tools for examining and manipulating filesystems. Thus we achieve an uncertain increase in security at the price of imposing a further maintenance burden on surrounding subsystems. It is not clear to this author that forward progress is being achieved.

Filesystem links

Our second example, completing the story of high maintenance designs, is the idea of "hard links", known simply as links before symbolic links were invented. In the design of the Unix filesystem, the name of a file is an entity separate from the file itself. Each name is treated as a link to the file, and a file can have multiple links, or even none - though of course when the last link is removed the file will soon be deleted.
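
The separation between names and files is directly visible in the link count that stat() reports; a minimal demonstration:

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;

        close(open("original", O_CREAT | O_WRONLY, 0644));
        link("original", "alias");   /* a second name for the same file */

        stat("original", &st);
        printf("links: %ld\n", (long)st.st_nlink);   /* prints 2 */

        unlink("original");          /* the file lives on under "alias" */
        stat("alias", &st);
        printf("links: %ld\n", (long)st.st_nlink);   /* prints 1 */

        unlink("alias");             /* the last link, and the file, go */
        return 0;
    }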

This separation does have a certain elegance and there are certainly uses to which it can be put with real value. However the vast majority of files still have only one link, and there are plenty of cases where the use of links is a tempting but ultimately sub-optimal option, and where symbolic links or other mechanisms turn out to be much more effective. In some ways this is reminiscent of the Unix permission model, where most of the time the subtlety it provides isn't needed, and much of the rest of the time it isn't sufficient.

Against this uncertain value, we find that:

  • Archiving programs such as tar need extra complexity to look out for hard links, and to archive the file the first time it is seen, but not any subsequent time.

  • Similar care is needed in du, which calculates disk usage, and in other programs which walk the filesystem hierarchy; the necessary bookkeeping is sketched in the code after this list.

  • Anyone who can read a file can create a link to that file which the owner of the file may not be able to remove. This can lead to users having charges against their storage quota that they cannot do anything about.

  • Editors need to take special care of linked files. It is generally safer to create a new file and rename it over the original rather than to update the file in place. When a file has multiple hard links it is not possible to do this without breaking that linkage, which may not always be desired.

  • The Linux kernel's internals have an awkward distinction between the "dentry" which refers to the name of a file, and the "inode", which refers to the file itself. In many cases we find that a dentry is needed even when you would think that only the file is being accessed. This distinction would be irrelevant if hard links were not possible, and may well relate to the choice made by the developers of Plan 9 to not support hard links at all.

  • Hard links would also make it awkward to reason about any name-based access control approach (as discussed in part 3) as a given file can have many names and so multiple access permissions.
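
The first two items come down to the same bookkeeping: a filesystem walker must remember the (st_dev, st_ino) pair of every multiply-linked file it visits, and skip that pair on any later encounter. Here is a minimal sketch of the technique in the spirit of du (the real tools are considerably more careful about memory and error handling):

    #include <stdio.h>
    #include <sys/stat.h>

    #define MAX_SEEN 1024

    static struct { dev_t dev; ino_t ino; } seen[MAX_SEEN];
    static int nseen;

    /* Return 1 the first time a (device, inode) pair is presented,
     * 0 on any later visit via another hard link. */
    static int first_visit(const struct stat *st)
    {
        for (int i = 0; i < nseen; i++)
            if (seen[i].dev == st->st_dev && seen[i].ino == st->st_ino)
                return 0;
        if (nseen < MAX_SEEN) {
            seen[nseen].dev = st->st_dev;
            seen[nseen].ino = st->st_ino;
            nseen++;
        }
        return 1;
    }

    int main(int argc, char **argv)
    {
        long long total = 0;
        struct stat st;

        for (int i = 1; i < argc; i++)
            if (stat(argv[i], &st) == 0 &&
                (st.st_nlink < 2 || first_visit(&st)))
                total += (long long)st.st_blocks * 512;

        printf("%lld bytes\n", total);   /* each file counted once */
        return 0;
    }
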
While hard links are certainly a lesser evil than setuid, and there is little motivation to rid ourselves of them, they do serve to illustrate how a seemingly clever and useful design can have a range of side effects which can weigh heavily against the value that the design tries to bring.

Avoiding high maintenance designs

The concept described here as "high maintenance" is certainly not unique to software engineering. It is simply a specific manifestation of the so-called law of unintended consequences which can appear in many disciplines.

As with any consequences, determining the root cause can be a real challenge, and finding an alternate approach which does not result in worse consequences is even harder. There are no magical solutions on offer by which we can avoid high maintenance designs and their associated unintended consequences. Rather, here are three thoughts that might go some small way to reining in the worst such designs.

  1. Studying history is the best way to avoid repeating it, and so taking a broad and critical look at our past has some hope of directing us well for the future. It is partly for this reason that "patterns" were devised, to help encapsulate history.

  2. Building on known successes is likely to have fewer unintended consequences than devising new ideas. So following the pattern that started this series of "full exploitation" is, where possible, most likely to yield valuable results.

  3. An effective way to understand the consequences of a design is to document it thoroughly, particularly explaining how it should be used to someone with little background knowledge. Often writing such documentation will highlight irregularities which make it easier to fix the design than to document all of its corner cases. This is certainly the experience of Michael Kerrisk, who maintains the man pages for Linux, and, apparently, of our Grumpy Editor, who found that fixing the cdev interface made him less grumpy than trying to document it, unchanged, for LDD3.

    When documenting the behavior of the Unix filesystem, it is desirable to describe it as a hierarchical structure, as that was the overall intent. However, honesty requires us to call it a directed acyclic graph (DAG), because that is what the presence of hard links turns it into. It is possible that having to write DAG instead of hierarchy several times might have been enough to raise the question of whether hard links are such a good idea after all.

Harken to the ghosts

In his classic novella "A Christmas Carol", Charles Dickens uses three "ghosts" to challenge Ebenezer Scrooge about his ideology and ethics. They reminded him of his past, presented him with a clear picture of the present, warned him about future consequences, but ultimately left the decision of how to respond to him. We, as designers and engineers, can similarly be challenged as we reflect on these "Ghosts of Unix Past" that we have been exploring. And again, the response is up to us.

It can be tempting to throw our hands up in disgust and build something new and better. Unfortunately, mere technical excellence is no guarantee of success. As Paul McKenney astutely observed at the 2010 Kernel Summit, economic opportunity is at least an equal reason for success, and is much harder to come by. Plan 9 from Bell Labs attempted to learn from the mistakes of Unix and build something better; many of the mistakes explored in this series are addressed quite effectively in Plan 9. However, while Plan 9 is an important research operating system, it does not come close to the user or developer base that Linux has, despite all the faults of the latter. So, while starting from scratch can be tempting, it rarely has a long-term successful outcome.

The alternative is to live with our mistakes and attempt to minimize their ongoing impact, deprecating that which cannot be discarded. The x86 CPU architecture seems to be a good example of this. Modern 64-bit processors still support the original 8086 16-bit instruction set and addressing modes. They do this with minimal optimization and using only a small fraction of the total transistor count. But they continue to support it as there has been no economic opportunity to break with the past. Similarly Linux must live with its past mistakes.

Our hope for the future is to avoid making the same sort of mistakes again, and to create such compelling new designs that the mistakes, while still being supported, can go largely unnoticed. It is to this end that it is important to study our past mistakes, collect them into patterns, and be always alert against the repetition of these patterns, or at least to learn how best to respond when the patterns inevitably recur.

So, to conclude, we have a succinct restatement of the patterns discovered on this journey, certainly not a complete set of patterns to be alert for, but a useful collection nonetheless.

Firstly there was "Full exploitation": a pattern hinted at in that early paper on Unix and which continues to provide strength today. It involves taking one idea and applying it again and again to diverse aspects of a system to bring unity and cohesiveness. As we saw with signal handlers, not all designs benefit from full exploitation, but those that do can bring significant value. It is usually best to try to further exploit an existing design before creating something new and untried.

"Conflated" designs happen when two related but distinct ideas are combined in a way that they cannot easily be separated. It can often be appropriate to combine related functionality, whether for convenience or efficiency, but it is rarely appropriate to tie aspects of functionality together in such a way that they cannot be separated. This is an error which can be recognized as the design is being created, though a bit of perspective often makes it a lot clearer.

"Unfixable" designs are particularly hard to recognize until the investment of time in them makes replacing them unpalatable. They are not clearly seen until repeated attempts to fix the original have resulted in repeated failures to produce something good. Their inertia can further be exacerbated by a stubbornness to "fix it if it kills me", or an aversion to replacement because "it is better the devil you know". It can take substantial maturity to know when it is time to learn from past mistakes, give up on failure, and build something new and better. The earlier we can make that determination, the easier it will be in the long run.

Finally "high maintenance" designs can be the hardest for early detection as the costs are usually someone else's problem. To some extent these are the antithesis of "fully exploitable" designs as, rather than serving as a unifying force to bring multiple aspects of a system together, they serve as an irritant which keeps other parts unsettled yet doesn't even produce a pearl. Possibly the best way to avoid high maintenance designs is to place more emphasis on full exploitation and to be very wary of including anything new and different.

If identifying, describing, and naming these patterns makes it easier to detect defective designs early, and serves to guide and encourage effective design, then they will certainly have fulfilled their purpose.

Exercises for the interested reader

  1. Identify a design element in the IP protocol suite which could be described as "high maintenance" or as having "unintended consequences".

  2. Choose a recent extension to Linux and write some comprehensive documentation, complete with justification and examples. See if that suggests any possible improvements in the design which would simplify the documentation.

  3. Research and enumerate uses of "hard links" which are not adequately served by using symbolic links instead. Suggest technologies that might effectively replace these other uses.

  4. Describe your "favorite" failings in Unix or Linux and describe a pattern which would help with early detection and correction of similar failings.


Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 18:08 UTC (Tue) by ironiridis (guest, #60586) [Link] (38 responses)

I don't particularly love the reference to the bible.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 18:20 UTC (Tue) by corbet (editor, #1) [Link] (14 responses)

It's a book. One can find useful advice there. I didn't see any justification for censoring the reference.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 18:23 UTC (Tue) by ironiridis (guest, #60586) [Link] (1 responses)

Not asking for censorship. Just pointing out that I'm not fond of it. One could equally find valuable advice in the Qur'an; I wouldn't be wild about a technical article referencing that either.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 19:33 UTC (Wed) by brother_rat (subscriber, #1895) [Link]

Would you have the same problem if the quote appeared without a reference? The linked article on the Confused Deputy also happens to reference a biblical idea but is much less obvious about it.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 0:06 UTC (Wed) by ikm (guest, #493) [Link] (11 responses)

The reason is quite simple: it's a religious book. Religion, politics and sex are all sensitive, flammable topics which are better left aside. I wouldn't suggest actual censorship, of course; it's just something to be aware of.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 10:00 UTC (Wed) by marcH (subscriber, #57642) [Link] (10 responses)

Until very recently, I thought that the high level of education required to enjoy LWN would restrict the readership to reasonable people only. I mean people capable of reason; people with the ability to abstract a couple of harmless sentences away from the totally irrelevant religious book they are from.

I am afraid I have just been proved wrong. Political correctness seems to have infiltrated everything.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 11:04 UTC (Wed) by vonbrand (guest, #4458) [Link]

You'd be surprised by the range of irrational beliefs held by otherwise well-educated people (mostly outside their real area of expertise, that is).

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 11:41 UTC (Wed) by ikm (guest, #493) [Link] (8 responses)

I think the word to use here is "selfish", not "unreasonable". Basically, a selfish reader would argue that the author had better omit stuff that might rub that reader the wrong way, and would voice that concern. While there is some truth to it, there's also the desire of the author to write what he actually wants, the way he wants to. Most will agree that the latter is to take precedence, but the point is, maybe it is just easier sometimes to avoid any confrontation in the first place. Again, this is for the author alone to decide.

I would also add that "with the high level of education required to enjoy LWN" one could expect a somewhat elevated level of people who are opposed to religion. Of course most of them wouldn't care, but still, references to religious content would be frowned upon to some extent.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 14:55 UTC (Wed) by ironiridis (guest, #60586) [Link] (2 responses)

You're right; sorry about that. I shouldn't be paying for a service that "might rub me the wrong way".

It's been great guys. See ya.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 23:52 UTC (Wed) by ikm (guest, #493) [Link]

I didn't mean to judge anyone, and especially not you. For the record, I didn't particularly enjoy that biblical reference either, so I do support you on this one. But lwn is a nice place, so why not leave people to believe what they want and quote what they want? We all have our differences. It would mean a lot to me if you would try to rethink your decision.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 30, 2010 16:58 UTC (Tue) by jone (guest, #62596) [Link]

'I shouldn't be paying for a service that "might rub me the wrong way".'

yeah .. that's why i don't pay my taxes either :)

but seriously .. i'm guessing it's understandable that some might be overly sensitive particularly if you've been thwacked too often or abused with out of context biblical references .. in a similar vein - i'm guessing that any sort of "open kimono" or "money shot" references might be equally offensive to people who may have been sexually abused

perhaps an entropy analogy would be more appropriate here since it's generally benign and science is generally the more widely accepted school of religious thought that nobody will complain too much about

(let's see if i've covered all the bases .. government/politics - check .. religion -check .. sex - check .. ok - my work here is done)

elevated ?

Posted Nov 24, 2010 15:02 UTC (Wed) by copsewood (subscriber, #199) [Link] (2 responses)

"one could expect a somewhat elevated level of people who are opposed to religion"

Oh dear. Please reread that sentence slowly and try to consider how it might seem, to someone who doesn't agree with you, for you to claim superiority on that account. It might surprise you that Muslims, Jews, Buddhists, Christians, Atheists, Sikhs, Agnostics and others can be found at any point on the scale of learning from illiterate to professorial.

As to the quote of timeless ancient wisdom within the article, I found it amusing, agreeable, appropriate and illuminating. Appropriate because those who won't learn from the mistakes of the past are doomed to repeat these. It could of course be equally appropriate for Bertrand Russell or Karl Marx to be quoted in a well-thought out article in LWN regardless of the fact that these significant thinkers were atheists.

elevated ?

Posted Nov 24, 2010 17:10 UTC (Wed) by ikm (guest, #493) [Link]

Sorry, I didn't mean to insult anyone. I won't go on to describe just why I think it is elevated; let's just leave it as an opinion of mine. As for how it might seem to others -- you are right, I have mentioned there would be no good coming from discussing that.

I also won't participate in this anymore; clearly, this discussion IS the road to destruction no matter how you go about it.

elevated ?

Posted Nov 26, 2010 23:16 UTC (Fri) by giraffedata (guest, #1954) [Link]

It might surprise you that Muslims, Jews, Buddhists, Christians, Atheists, Sikhs, Agnostics and others can be found at any point on the scale of learning from illiterate to professorial.

For my part, I don't claim educated people are superior to uneducated or that religious people are inferior to nonreligious, and I don't even know what "elevated" means as a quality of a person, but let me say that in spite of the diversity you point out, I'm willing to bet there is a strong negative correlation between education and religiousness.

I haven't seen any study of this, and I think one challenge in reporting such would be measuring "religious." I do believe a lot of people who describe themselves as religious aren't really. E.g. in choosing between medical treatments, one based on scientific conclusions and the other based on teaching of clergy, many such people would easily choose the former.

Tying back to the issue with the article, I doubt the author expected us to believe religiously that the road to destruction is wide, but rather to consider from our own educations whether it's true.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 20:07 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

Your religion is that all 'Religions' are invalid and only held by uneducated people, thus any reference to anything Religious should not exist.

you seek to impose your Religion on everyone else by preventing anyone else from even mentioning their Religion, or anything related to it.

you can't do the English language without the KJV

Posted Nov 24, 2010 20:18 UTC (Wed) by dmarti (subscriber, #11625) [Link]

"I have stolen more quotes and thoughts and purely elegant little starbursts of writing from the Book of Revelation than anything else in the English language--and it is not because I am a biblical scholar, or because of any religious faith, but because I love the wild power of the language and the purity of the madness that governs it and makes it music." -- Dr. Hunter S. Thompson (also, Rev. 22:18-19: the first "noderivs" license?)

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 19:55 UTC (Tue) by JamesErik (subscriber, #17417) [Link] (13 responses)

I do. 'Tis a good article all around.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 19:58 UTC (Tue) by ironiridis (guest, #60586) [Link] (12 responses)

Make sure to bring up Autotools in the middle of your pastor's next sermon, then.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 21:09 UTC (Tue) by stijn (guest, #570) [Link] (8 responses)

The bible has had, among other things, huge impact on our use of language, on literature, on phrases, sayings, ideas, plots, and shift @list. Contrast this with the automake manual at your leisure. The reference in the fine article seems apt enough, and there is absolutely nothing that suggests disrespect, I would say quite the contrary. Sermons can in fact be wide ranging and include comments on or quote from aspects of modern day life. Your discontent is puzzling.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 21:18 UTC (Tue) by ironiridis (guest, #60586) [Link] (7 responses)

My discontent is merely that the bible is irrelevant to the discussion. There are plenty of other literary examples of the concept portrayed. The bible (at least the version commonly cited today) depicts murder, genocide, torture, and plenty of other vile topics that don't pertain to technical topics.

I suggested the commenter bring up Automake during the next sermon that his or her pastor delivers because it bears the same relevance. One could discuss the fact that Automake is archaic, filled with ancient lore, long-dead language, and even relates to Creation itself. That doesn't make it appropriate to bring up in church (where the audience for such a discussion wouldn't care for it), just as bringing up the bible in a technical article about technical topics is jarring and bewildering.

I have no ire for Christians, or their beliefs. I simply don't care to be reminded of that particular spectrum of humanity when I am reading about Linux.

Still puzzled?

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 21:33 UTC (Tue) by stijn (guest, #570) [Link]

I had assumed you came from the other side of the fence. I see a little more logic to your position now, but I am still mildly puzzled (in the other direction now, it is a pleasingly swaying sensation). The bible is an ancient book, a cauldron of many things, among which it having shaped some of our language and sayings. It is in fact much more than the holy book of christianity. To conclude with something completely (un)related, I've always thought that Richard Feynman was very lucid when commenting on both science and religion.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 22:09 UTC (Tue) by Simetrical (guest, #53439) [Link] (4 responses)

Shakespeare's works include murder, racism, sexism, and plenty of vile topics (many of which, indeed, can be linked to Christianity) that don't pertain to technical topics. Would you have objected to a Shakespeare quote?

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 22:31 UTC (Tue) by ironiridis (guest, #60586) [Link] (3 responses)

To be clear, I'm not "objecting". To object would imply that I'd like the article to be censored, which isn't the case.

What I reacted to was referencing material that was simultaneously irrelevant and offensive to some (I'd wager many, in fact, but "some" is irrefutable)... while adding essentially no value to the article itself.

As an example, I don't understand why a piece of work that glorifies the genital mutilation of infants would be chosen to "clarify" why the setuid bit is a high-maintenance design. In fact, I can see clearly and totally without the biblical reference why it is a design that is difficult to maintain and develop around. I can understand the motivation behind its original design, and the frustration and confusion it causes today. All without being reminded that approximately 6,000 boys born each day in the US have their penis mutilated due mainly to a precedent set 4,000 years ago and perpetuated today mostly out of family indoctrination and brainwashing.

It's largely irrelevant, I suppose; my comment won't make any difference in the mind of the editor or author. Clearly my bias clashes with theirs. I simply wanted to make it known that, contrary to popular opinion, a biblical reference is not universally accepted as relevant, friendly, or innocent.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 6:37 UTC (Wed) by nicooo (guest, #69134) [Link]

Why are you reading an article about an operating system named after genital mutilation?

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 13:11 UTC (Wed) by jackb (guest, #41909) [Link]

John Harvey Kellogg was also a proponent of circumcision so when you go to the grocery store do you protest the fact that they stock Corn Flakes?

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 21:18 UTC (Wed) by jordanb (guest, #45668) [Link]

Remind me to never invite you to any parties.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 22:58 UTC (Tue) by neilbrown (subscriber, #359) [Link]

> There are plenty of other literary examples of the concept portrayed.

I would certainly be interested in any you could suggest. I tried to think of others and the closest I came was Douglas Adams' quip about underestimating the ingenuity of fools - it is in the right sort of direction but has entirely the wrong emphasis.

(Not that I think the bible is either more or less appropriate in a technical article than Dickens or Adams, but I'm keen to broaden my horizons and would love to hear any references you have in mind).

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 21:26 UTC (Tue) by JamesErik (subscriber, #17417) [Link] (2 responses)

I'm not sure why a non-programmer of any stripe would care about Autotools, but I expect that many programmers care about fundamental truths of human nature. After all, it's easy and commonplace to: be up to one's eyeballs in debt, surrender liberty for security, use Windows, etc. It's often a long time before one figures out that the easy road was the road to destruction.

Neil makes a good case that this fundamental truth is applicable to the domain of Operating System design. A pat on the back to Neil. And yes I did want to make clear to Our Editor, in light of your post, that indeed someone from the readership liked and appreciated the reference.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 21:31 UTC (Tue) by ironiridis (guest, #60586) [Link]

I'm certainly not interested in an arms race here, but if it mattered to me as much as it seems to matter to you, I'd just end my paid subscription.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 0:15 UTC (Wed) by ikm (guest, #493) [Link]

> it's easy and commonplace to: be up to one's eyeballs in debt, surrender liberty for security, use Windows,

participate on lwn.net flames...

> It's often a long time before one figures out that the easy road was the road to destruction.

Indeed it is!

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 20:29 UTC (Tue) by martinfick (subscriber, #4455) [Link]

What's wrong with quoting some of the world's earliest science fiction? :)

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 4:25 UTC (Wed) by drag (guest, #31333) [Link] (2 responses)

Now you know what it feels like to be a bigot.

It's easy and convenient, isn't it?

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 13:58 UTC (Wed) by corbet (editor, #1) [Link] (1 responses)

This kind of comment doesn't help either, though. I'd really like to see a bit less name-calling on LWN...can we try for that, please?

Thanks.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 25, 2010 19:31 UTC (Thu) by drag (guest, #31333) [Link]

My apologies. I'll resist in the future.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 14:34 UTC (Wed) by __alex (guest, #38036) [Link] (1 responses)

Given the author's publicly known religious beliefs it does seem a bit more like preaching than just a casually borrowed allegory. I hope that overtly religious meandering in editorial doesn't become a regular feature on LWN.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 15:22 UTC (Wed) by JamesErik (subscriber, #17417) [Link]

I for one had never heard of the author, much less known about his beliefs. The man is nowhere even close to preaching here: the allegory is apt and doesn't even *mention* any deity for cryin' out loud! There's no "overtly religious meandering" anywhere within 100 miles of this article.

Author: Again, an excellent technical article, with *several* good, insightful citations. Thanks!

Editor: I never fail to get my money's worth from my subscription. You do a great job. Thanks!

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 29, 2010 9:50 UTC (Mon) by quotemstr (subscriber, #45331) [Link] (1 responses)

I don't particularly love your objection. The truth of the Bible is a subject of your personal belief system, but irrespective of that, the book remains an important literary work and part of the Western canon. Referencing it is certainly fair game. But, what has been will be again, what has been done will be done again; there is nothing new under the sun.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 3, 2010 2:31 UTC (Fri) by rlhamil (guest, #6472) [Link]

> But, what has been will be again, what has been done will be done again; there
> is nothing new under the sun.

Amazing how nobody complained yet that Ec1:9 (NIV, I think) was rotting their brain!

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 2, 2010 21:49 UTC (Thu) by tjc (guest, #137) [Link]

I don't particularly love the reference to the bible.

That was my favorite part. :)

I really like computer science, photography and music, but I just looove the Bible!

Patent?

Posted Nov 23, 2010 18:40 UTC (Tue) by rfunk (subscriber, #4054) [Link] (2 responses)

At least as jarring to me as the Biblical reference was the patent reference: "In fact this is such a clever and original idea that the inventor, Dennis Ritchie, was granted a patent for the invention."
I think it's been quite well documented on this site that the granting of a patent does not necessarily mean that an idea is very clever or original.

Patent?

Posted Nov 23, 2010 21:03 UTC (Tue) by neilbrown (subscriber, #359) [Link] (1 responses)

Sorry, but it seems I'm not very good at irony. I try to avoid the excesses of sarcasm and just end up sounding naive. But I'll never improve without practice, so thanks for the feedback.

Patent?

Posted Nov 23, 2010 21:17 UTC (Tue) by rfunk (subscriber, #4054) [Link]

Ah, sorry, I could just be dense.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 18:50 UTC (Tue) by cmccabe (guest, #60281) [Link] (1 responses)

Hey, thanks for mentioning me, Neil. I should probably mention that Norm Hardy wrote the original paper.

P.S. I really enjoy this series. Ghosts are a lot less scary in the light!

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 21:24 UTC (Tue) by neilbrown (subscriber, #359) [Link]

And thank you - the context and conclusions are a little different to mine, but the imagery is an exact match, I just couldn't pass it up.
I always like including literary references from varied sources, but it seems particularly appropriate in an article on 'patterns' as each literary reference is itself a kind of pattern.

Control characters in file names

Posted Nov 23, 2010 18:57 UTC (Tue) by Yorick (guest, #19241) [Link] (69 responses)

Allowing control characters (0x01-0x1f) in file names is clearly a high-maintenance design that we have come to regret ever since. I don't remember ever having seen a legitimate use of this liberty. On the other hand, that means it could be removed, given some courage.

Spaces in file names also cause trouble but can be justified as long as file names are used by people to name documents.

Control characters in file names

Posted Nov 23, 2010 19:48 UTC (Tue) by zlynx (guest, #2285) [Link] (33 responses)

I think that Unix filesystems treating names as pure binary (excepting / and \0) is actually an advantage.

On other operating systems I have ended up with filenames that cannot be deleted. Windows NT with its POSIX layer can create names that Win32 can't handle. OSX HFS can also create names it can't handle.

That happens because the filesystem has to have huge complicated rulesets that provide binary-to-character mapping, character equivalency mapping and allowable characters. These rules have to duplicate the identical rules in user space. The rules often DON'T MATCH, leading to all of the above problems.

Linux with ext2, on the other hand, can handle UTF-8 filenames even though UTF-8 didn't exist in wide use when ext2 was invented. And the shell can rename or delete a filename encoded in KOI8 even though the shell doesn't understand the encoding.

Control characters in file names

Posted Nov 23, 2010 20:55 UTC (Tue) by Yorick (guest, #19241) [Link] (13 responses)

I'm in no way suggesting "huge complicated rulesets", only to expand the set of disallowed bytes from {0, '/'} to {0..31, '/'}. I believe the benefits of doing so would outweigh the disadvantages, of which precious few have been shown.

I'm also curious what file names can be created on OS X that "it can't handle", and how.
(Editing raw bytes on disk doesn't count - that way invalid file names could be created in ext2 as well.)

Control characters in file names

Posted Nov 23, 2010 21:26 UTC (Tue) by jzbiciak (guest, #5246) [Link]

I wonder if this could just be controlled with a feature flag and tune2fs and/or a mount option? When set, just disallow creating files which have the troublesome characters. Still allow access to existing files with the troublesome characters, though, so you never have "files you can't get to."

Invariably, whenever I've created filenames with control characters in them, it's been through some strange fat-fingering. I would rather have the OS throw those files away. :-)

Control characters in file names

Posted Nov 24, 2010 0:40 UTC (Wed) by zlynx (guest, #2285) [Link] (2 responses)

An example from my Mac laptop. It was created by a recursive wget from the terminal. This file has been in my .Trash for a year now...

$ ls ShowXml.asp?user_group=5&user_path=user1%2F30462&userid=30462&blogname=ѩӣ֮%C0%E1

$ ls | xxd
0000000: 5368 6f77 586d 6c2e 6173 703f 7573 6572  ShowXml.asp?user
0000010: 5f67 726f 7570 3d35 2675 7365 725f 7061  _group=5&user_pa
0000020: 7468 3d75 7365 7231 2532 4633 3034 3632  th=user1%2F30462
0000030: 2675 7365 7269 643d 3330 3436 3226 626c  &userid=30462&bl
0000040: 6f67 6e61 6d65 3dd1 a9d0 b8cc 84d6 ae25  ogname=........%
0000050: 4330 2545 310a                           C0%E1.

I think that is just the representation the terminal sees and not what is actually in the filesystem because any attempt to delete with that name results in a file not found.

Control characters in file names

Posted Nov 24, 2010 11:08 UTC (Wed) by vonbrand (guest, #4458) [Link]

Have you tried quoting that (with ', not ")? There are many characters special to the shell in there. Sometimes a "rm -i *" helps by giving the "correct" filename to the selection. In very recalcitrant cases, you could write a proggie that unlinks the file by hardcoded name...

Control characters in file names

Posted Nov 24, 2010 12:50 UTC (Wed) by Yorick (guest, #19241) [Link]

Interesting - I tried creating a file by that name in OS 10.5, but when reading out the resulting name, the last two combining characters (U+0304 and U+05ae, corresponding to cc 84 and d6 ae respectively) had been transposed, presumably for reasons of canonical order. I had no problems removing it afterwards.

This would explain why you had trouble removing the file but not how it came to be created in the first place. I have heard claims that the normalisation algorithm in OS X has changed between versions; perhaps you upgraded your system between the creation and attempted removal of the file? It could also just be a plain bug, of course, or a RAM single-bit error, etc. If you can reproduce it, I'm sure Apple would like to know about the bug.

Control characters in file names

Posted Nov 24, 2010 17:01 UTC (Wed) by mjthayer (guest, #39183) [Link] (7 responses)

> I'm in no way suggesting "huge complicated rulesets", only to expand the set of disallowed bytes from {0, '/'} to {0..31, '/'}. I believe the benefits of doing so would outweigh the disadvantages, of which precious few have been shown.

It seems to me that what is broken here is the shell language, not the filesystem, and encouraging people to use better languages is the right fix. For what it's worth, I have had occasion to miss the '/' character that you can't have in Unix filenames. I find it rather silly that that character is encoded into the low-level APIs (the kernel in this case, although having it in the libc API would amount to the same) instead of letting higher levels handle it.

Control characters in file names

Posted Nov 29, 2010 9:57 UTC (Mon) by quotemstr (subscriber, #45331) [Link] (4 responses)

Good luck with that. Unix shells won't change any time soon. It's hard enough to get filenames with whitespace working properly.

Forbidding control characters has an immediate upside and comes at almost zero cost. We could do it tomorrow, and nobody would notice except for increased robustness. Not doing that and instead pining for a perfect solution is just unrealistic.

Control characters in file names

Posted Nov 29, 2010 10:12 UTC (Mon) by mjthayer (guest, #39183) [Link] (3 responses)

> Good luck with that. Unix shells won't change any time soon.

I wouldn't be quite that pessimistic. Unix shells won't change, but now there is python which is being used for a lot of things that shell used to be, and things like upstart and systemd are also reducing the need for new shell code. Old shell code won't all go away, but problems in it can be fixed, and if less new shell code is written the problem is greatly reduced.

Control characters in file names

Posted Nov 29, 2010 18:24 UTC (Mon) by dlang (guest, #313) [Link]

Python has not replaced shell in many areas, and probably never will (just like Perl is used in a lot of places where shell used to be used, but will never replace shell)

Control characters in file names

Posted Dec 2, 2010 17:25 UTC (Thu) by Ross (guest, #4065) [Link] (1 responses)

I don't know anyone using Python as their login shell. That would be a pretty terrible idea IMHO. On the other hand shell does make it easy to handle filenames wrong (just leave out some quotes and it will still seem to work) so python scripts are more likely to do a good job. That's not really going to fix the problem though, no matter how popular Python becomes for scripting compared to shell.

Control characters in file names

Posted Dec 2, 2010 17:40 UTC (Thu) by mjthayer (guest, #39183) [Link]

> I don't know anyone using Python as their login shell.

I was assuming that the biggest problem with filenames and shell script came from actual well-known script files that got exploited, not stuff typed in at the command line. I'm sure that can be a problem too of course.

Control characters in file names

Posted Nov 29, 2010 14:40 UTC (Mon) by nix (subscriber, #2304) [Link] (1 responses)

Handling directory separation at a higher level would be a classic That Hideous Name nightmare, converting paths from a simple string to a complex structure involving N components with associated lengths, which would *still* have to be somehow converted to a string a lot of the time: and if you can do that, you need a quoting mechanism to make it unambiguous, and if you have *that*, you could use the same quoting mechanism at input time, and still retain the /.

The current situation with /-and-no-quoting-characters is simplest of all, and eliminates the numerous attacks we have seen on SQL and other languages involving incorrectly processed quoting characters.

Control characters in file names

Posted Dec 2, 2010 17:27 UTC (Thu) by Ross (guest, #4065) [Link]

Thank you for the sanity :)

Yes, having to concoct paths with some helper function in some nasty encoding would not be an improvement. If people think it's too hard for scripts to handle spaces -- wait until filenames can have slashes in them too!

Control characters in file names

Posted Nov 24, 2010 18:41 UTC (Wed) by brother_rat (subscriber, #1895) [Link]

I don't know about "can't handle", but there are definitely quirks.

One quirk with OSX is that the GUI is consistent with earlier Macs that permitted / in filenames (when : was used as the folder separator), but as / is now restricted to be the folder separator the two characters are swapped over behind the scenes.

This causes very odd bugs with GUI tools that launch CLI utilities. For example, Hugin uses make to process photos, and make doesn't support : in filepaths. However many users put photos in folders with a date in the name, and the dates often contain / characters, which are stored as : behind the scenes and promptly trip up make.

Control characters in file names

Posted Nov 23, 2010 21:47 UTC (Tue) by iabervon (subscriber, #722) [Link] (7 responses)

All of the encodings I could think of consider byte values less than 0x20 to be either invalid or control characters in any context. In fact, I couldn't find any that disagree with ASCII on the interpretation of any valid byte less than 0x40, and only Shift-JIS seems to disagree with ASCII at all below 0x80 (and there only as the second byte of two-byte characters, aside from a few direct character replacements). So it should be viable to consider filenames to be a sequence of bytes with only 0x2F and 0x00 having special meanings, but 0x01-0x1F prohibited entirely. (I think 0x7F could be prohibited as well.). Unfortunately, there are also other control characters, in the 0x80-0x9F range, which cannot be recognized directly from bytes, where 0x9B is the interesting one, because it can start ANSI escape sequences.

Control characters in file names

Posted Nov 23, 2010 22:06 UTC (Tue) by Simetrical (guest, #53439) [Link] (6 responses)

UTF-7, UTF-16, UTF-32, and EBCDIC all treat some byte values below 0x20 differently from ASCII.

Control characters in file names

Posted Nov 23, 2010 22:29 UTC (Tue) by foom (subscriber, #14868) [Link] (5 responses)

...and you can't use any of those as a locale encoding on an ASCII-centric UNIX system. It is expressly prohibited by POSIX.

(If you didn't have any ASCII locales, you could use an EBCDIC locale -- your system just needs to be self-consistent for all the characters in the Portable Character Set, across locales. UTF-7/16/32 are right out, though, since all characters in the Portable Character Set need to be encoded by a single byte.)

Control characters in file names

Posted Nov 25, 2010 16:19 UTC (Thu) by Spudd86 (guest, #51683) [Link] (4 responses)

UTF-16 and UTF-32 are out entirely since they would end up with nul bytes; you could conceivably use UTF-7 to name a file and it would work, it just wouldn't show the correct name anywhere...

Control characters in file names

Posted Nov 25, 2010 21:03 UTC (Thu) by iabervon (subscriber, #722) [Link] (3 responses)

UTF-7 would be terrible, because the encoded form isn't even unique for a sequence of codepoints. (That is, even if you knew the character sequence for a filename and how it was decomposed and knew it was encoded as UTF-7 in the filesystem, you wouldn't know what sequence of bytes to ask the kernel for.) Also, encoders may not represent a '/' literally in between two blocks of characters outside the Latin-1 range, because it can be more efficient to use all 16 bits instead of the necessary padding to finish the encoded chunk.

In any case, it still wouldn't use bytes in the 0x00-0x1f range.

Control characters in file names

Posted Nov 29, 2010 10:09 UTC (Mon) by jamesh (guest, #1159) [Link] (2 responses)

Those arguments could equally be made against UTF-8, where there are different byte sequences that some UTF-8 parsers will consider equal while others will consider to be invalid (e.g. encoding a '\u0000' as '\xC0\x80'). The solution to this problem is to require that inputs be in a canonical form.

Of course, once you start working with Unicode it isn't really enough to just require unique representations for each code point. You can have multiple sequences of unicode code points that have the same meaning. So you really want a normalised code point sequence encoded in a canonical form.

Control characters in file names

Posted Nov 29, 2010 18:18 UTC (Mon) by iabervon (subscriber, #722) [Link] (1 responses)

UTF-8 actually specifies only one valid byte sequence for a given sequence of code points; while some parsers will accept other sequences, only one is valid and therefore canonical. UTF-7, on the other hand, doesn't have a single valid byte sequence, and doesn't seem to have any obvious canonical form.

The code point sequence issue is real (which is why I was careful not to say "character" anywhere), and unfortunately, there are multiple possible normalizations. So not only do you need a normalized code point sequence, you need one with a particular normalization that everything will agree on. (Also, since the availability of characters may affect the normalization, you might in principle have to specify the version of Unicode, although I think they're careful not to introduce new ways of getting the same character.) And, of course, you have to avoid using Apple products, because they silently rename your files to have a different normalization from what everybody else uses.

Control characters in file names

Posted Dec 1, 2010 2:32 UTC (Wed) by jamesh (guest, #1159) [Link]

I understand that the non-canonical sequences are invalid. However, when UTF-8 was new it was common for decoders to accept the alternative byte sequences (and this often led to security bugs).

My point was that if you picked a canonical representation for UTF-7, and required that file names used it, then it would work okay as a file name encoding. That said, it still isn't a very good idea ...

Control characters in file names

Posted Nov 24, 2010 6:31 UTC (Wed) by error27 (subscriber, #8346) [Link] (4 responses)

If you restricted the filenames, you would do it per mount point and not in the VFS layer. You'd still be able to delete all the files on your network mounted NTFS directory. You just wouldn't be able to copy them to your home directory without a rename.

So you wouldn't have filenames that couldn't be deleted, you'd only have filenames that couldn't be created.

Control characters in file names

Posted Nov 24, 2010 8:04 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

That's horribly nonorthogonal, but might be worthwhile nonetheless (as a mount option, probably on by default).

Control characters in file names

Posted Nov 25, 2010 16:23 UTC (Thu) by Spudd86 (guest, #51683) [Link] (2 responses)

Well part of the point is that such file names are hard to use, so hopefully you don't have any.

(They are the sort of names that make correctly handling file names in a shell script end up taking hundreds of lines, which means nobody EVER does it, which means pretty much nobody has files with that kind of name, except that the breakage from those names can sometimes be a security hole)

Control characters in file names

Posted Dec 2, 2010 19:07 UTC (Thu) by Ross (guest, #4065) [Link] (1 responses)

Why would it take hundreds of lines to handle them? Actually the shell doesn't so much care about control characters as characters found in $IFS which is space, tab, and newline by default. The proposal isn't to remove space, so it won't solve any problems for people writing shell scripts will it?

In any case, someone gave some examples of how to handle whitespace (and anything else) properly in shell scripts below. Use of arrays and proper quoting or find0/xargs0 combinations aren't too complicated and work correctly. The problem is that if there is a mistake, it won't be obvious since it will work with most input.

Control characters in file names

Posted Dec 2, 2010 19:44 UTC (Thu) by cesarb (subscriber, #6266) [Link]

If you remove control characters, you remove tab and newline; just set IFS to tab and newline (removing space) and you can easily and safely deal with filenames with spaces.

Control characters in file names

Posted Nov 24, 2010 17:03 UTC (Wed) by mjthayer (guest, #39183) [Link] (5 responses)

> I think that Unix filesystems treating names as pure binary (excepting / and \0) is actually an advantage.

I think that many people would appreciate having at least a hint as to the character encoding in use. Although in these days of UTF-8 it is less and less relevant, of course.

Control characters in file names

Posted Nov 24, 2010 17:57 UTC (Wed) by ikm (guest, #493) [Link] (4 responses)

> I think that many people would appreciate having at least a hint as to the character encoding in use.

My guess is that unix systems just take the easiest approach here - treat the filename as a binary blob, and let userspace do the rest :) I have got to admit that in practice there's less hassle with FSes which are Unicode-aware (think Microsoft), unless you actually start trying to figure out just what it is that you are allowed to use there for filenames. Then you'd basically just stick to base64 or percent-encoding, which would be the right thing to do in any case.

Control characters in file names

Posted Nov 24, 2010 19:15 UTC (Wed) by mjthayer (guest, #39183) [Link] (3 responses)

> I have got to admit that in practice there's less hassle with FSes which are Unicode-aware (think Microsoft), unless you actually start trying to figure just what is that you are allowed to use there for filenames.

There have been a number of complaints on this thread about filesystems that are encoding-aware and the problems that causes. But actually the filesystem could carry encoding hints without being encoding-aware itself. For example, it could tell user space that a file name is Utf-8 but still just treat the name as a binary blob. The hint would just tell applications how best to display the name.

Control characters in file names

Posted Nov 27, 2010 8:11 UTC (Sat) by cmccabe (guest, #60281) [Link] (2 responses)

> There have been a number of complaints on this thread about filesystems
> that are encoding-aware and the problems that this causes. But actually the
> filesystem could carry encoding hints without being encoding-aware itself.
> For example, it could tell user space that a file name is UTF-8 but still
> just treat the name as a binary blob. The hint would just tell
> applications how best to display the name.

Well, you could use an extended attribute to represent the encoding of the filename. However, it would be a huge amount of work to change all the applications to check this attribute and act appropriately.
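
For concreteness, a sketch of what storing such a hint with Linux's extended-attribute API could look like; the attribute name "user.filename-encoding" is invented for illustration:

   #include <stdio.h>
   #include <sys/xattr.h>

   int main(void)
   {
       const char *path = "example.txt";        /* hypothetical file */
       char enc[32];
       ssize_t n;

       /* Tag the file with an (invented) encoding hint. */
       if (setxattr(path, "user.filename-encoding", "UTF-8", 5, 0) != 0)
           perror("setxattr");

       /* A file manager would read the hint back before displaying. */
       n = getxattr(path, "user.filename-encoding", enc, sizeof(enc) - 1);
       if (n >= 0) {
           enc[n] = '\0';
           printf("name claims to be %s\n", enc);
       }
       return 0;
   }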

I'm pretty far from being an expert in internationalization, but my understanding is that non-Unicode character encodings are considered deprecated. Based on comments made elsewhere in this thread, MacOS and Windows have already decreed that all filenames should be Unicode. So is it really worth rewriting all software that displays filenames in order to better support this legacy stuff? Especially when no other platforms support it at all? As Linus constantly points out, Linux-specific filesystem interfaces don't get used that much, even when they offer great benefits.

I think I agree with Spudd86's solution: there should be some kind of mount option that puts a ruleset in place for filenames. Probably nearly every Linux distribution would disallow filenames that were not UTF-8. A few people running special-purpose systems might mount their rootfs with more restrictive rulesets. Most system administrators already have an unwritten policy about filenames-- they don't create filenames with embedded control characters, crazy stuff like leading dashes, or embedded newlines. Letting system administrators turn their implicit policy into an explicit one would close a lot of security holes.

I wonder if it would be feasible to use the "escaping" option talked about on Wheeler's page. Basically, under this option, the kernel continues to treat filenames as binary blobs on the disk. But when presenting them to userspace, it escapes certain characters in a predictable way. I'm not sure whether this is really feasible, but it seems like the best choice if it is.
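
To make the ruleset idea concrete, here is a userspace sketch of the kind of per-component check such a mount option might apply. The rules (no bytes 0x01-0x1F, no leading dash) are just the ones discussed in this thread, not any real kernel interface:

   #include <stdbool.h>
   #include <stdio.h>
   #include <string.h>

   /* Hypothetical per-mount filename policy.  '/' and NUL are already
      impossible in a path component, so only the extras are checked. */
   static bool filename_allowed(const char *name, size_t len)
   {
       if (len == 0 || name[0] == '-')
           return false;                 /* no empty names, no leading dash */
       for (size_t i = 0; i < len; i++)
           if ((unsigned char)name[i] <= 0x1f)
               return false;             /* no control characters */
       return true;
   }

   int main(void)
   {
       const char *name = "evil\nname";
       printf("%s\n", filename_allowed(name, strlen(name)) ? "ok" : "rejected");
       return 0;
   }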

Control characters in file names

Posted Nov 30, 2010 1:39 UTC (Tue) by jamesh (guest, #1159) [Link]

As well as being a lot of work, using extended attributes introduces ambiguity. Some extra problems with that suggestion are:

  • You could have two files in a directory with the same sequence of unicode code points but different byte representations, due to being encoded differently.
  • Applications might encounter paths like /latin1-part/utf8-part/sjis-part and need to check the encoding of each path component in order to display it to the user. Perhaps more difficult would be resolving a unicode path to something like this.
  • Extended attributes are associated with the file rather than the file name. What do you do if a file has two hard links with differently encoded file names?

Picking one encoding/normalisation is the only sane option, and it would be nice if the kernel would help enforce such a choice.

Control characters in file names

Posted Dec 2, 2010 18:22 UTC (Thu) by Wol (subscriber, #4433) [Link]

One problem with that ... (administrators enforcing policy, that is)

I've worked on a system where a file was composed of sub-files (Pr1mos). This was emulated on *nix by using a directory with "special" names inside: all the subfiles were named "<space><backspace><number>", because nobody is supposed to touch these subfiles directly.

So if you enforce a policy like that, you could bust a bunch of apps ...

Cheers,
Wol

Control characters in file names

Posted Nov 23, 2010 19:50 UTC (Tue) by vonbrand (guest, #4458) [Link] (5 responses)

Please don't. The "control characters" in the filenames could well be regular characters in other encodings, or be part of e.g. an UTF-8 character. "Not all the world's a VAXASCII"

Control characters in file names

Posted Nov 23, 2010 20:44 UTC (Tue) by Yorick (guest, #19241) [Link] (3 responses)

Please don't. The "control characters" in the filenames could well be regular characters in other encodings, or be part of e.g. an UTF-8 character.

Since you ask me not to, please tell me exactly what encoding you are concerned about. Multi-byte UTF-8 characters do not contain bytes in the range 0-127.

Control characters in file names

Posted Nov 23, 2010 21:37 UTC (Tue) by ballombe (subscriber, #9523) [Link] (2 responses)

Probably ISO 2022, still widely used in Japan (fortunately less than it used to be).

Control characters in file names

Posted Nov 27, 2010 13:10 UTC (Sat) by Cato (guest, #7643) [Link] (1 responses)

ISO 2022 is a truly horrible encoding that should never be used, and should certainly not be supported - it can embed normal ASCII characters within a "wide" character, making it very difficult to process.

Having looked into many different encodings, I'd agree with the suggestion to use UTF-*, but in reality systems still need to support legacy 8-bit and 16-bit encodings - there are many filesystems out there with filenames in legacy encodings, and often a mix of encodings.

The ability to mix legacy encodings in a single filesystem is sometimes useful for applications but it creates major data conversion issues when users do this.

Generally I'd agree with banning control characters by default from pathnames in a new OS, but it's too late to do that now with Linux/Unix.

Putting the encoding into the filesystem is suspect, particularly considering the deep unpleasantness of Apple's use of their own two variants of Unicode normalisation form D (NFD) in HFS+ and other filesystems, whereas the rest of the world including Linux and the Web uses normalisation form C (NFC).

Control characters in file names

Posted Nov 29, 2010 10:03 UTC (Mon) by quotemstr (subscriber, #45331) [Link]

Generally I'd agree with banning control characters by default from pathnames in a new OS, but it's too late to do that now with Linux/Unix.
I don't think it's too late at all. The overwhelming majority of legitimate filenames do not contain characters in the range proposed for blacklisting. As an option that's turned on by default, forbidding control characters would present no practical problems whatsoever. Nobody relies on filenames containing ^V or newline.

Control characters in file names

Posted Nov 25, 2010 16:29 UTC (Thu) by Spudd86 (guest, #51683) [Link]

They are also nearly impossible to handle correctly in shell scripts, and you should be using UTF-8 for file names.

No one is suggesting this be done in a non-optional way, but the encodings it would actually break that are also in use on Linux systems are very few and far between (probably largely because EVERYTHING expects those to be control characters, and they break shell scripts, etc.; plus we have UTF-8).

Control characters in file names

Posted Nov 23, 2010 20:11 UTC (Tue) by jengelh (subscriber, #33263) [Link] (10 responses)

Wheeler: "this lack of limitations [...] makes it impossible to consistently and accurately display filenames". So yeah.

Let's ban 0x80-0xFF too, next to 0x01-0x1F, because they too cannot accurately be displayed (think of the byte sequence "0x20 0xC2 0x20" when used in contemporary Linux systems)!!11

Control characters in file names

Posted Nov 25, 2010 16:37 UTC (Thu) by Spudd86 (guest, #51683) [Link] (9 responses)

There are strong reasons to disallow 0x01-0x1F in file names (do you know how many lines of shell it takes to write something that can iterate over files and run one command on each when it must handle files that have those in the name? Hundreds; nobody does it, so pretty much every shell script ever written will explode if it runs across such a file).

It has nothing to do with accurately displaying them, and everything to do with the fact that they cause actual problems, and in a UTF-8 locale you gain NOTHING from being able to use 0x01-0x1F in file names (if you're going to bring up storing 'arbitrary' binary keys in file names, don't; you already CAN'T, because you can't use '\0' or '/').

There's an article about exactly this somewhere, IIRC linked from LWN at some point in the past, when the patch that allowed you to disable those chars came up, I think.

Control characters in file names

Posted Nov 25, 2010 19:15 UTC (Thu) by jengelh (subscriber, #33263) [Link] (7 responses)

>do you know how many lines of shell it takes

Aha. So... everybody knows it is possible to have files with odd filenames, and everybody keeps on using shells or shell constructs that cannot deal with this properly? I can see the flaw in that.

>something that can iterate over files and run one command on each when it must handle files that have those in the name?

for i in *; do cmd "$i"; done;
find . -whatever -exec cmd {} \;
find . -whateverelse -print0 | xargs -0 cmd;

There are so many safe ways available. I am really not responsible for people doing UUOC or the like.

Control characters in file names

Posted Nov 25, 2010 20:31 UTC (Thu) by Spudd86 (guest, #51683) [Link] (4 responses)

The for loop won't work... the find examples only work if you want to run a single, non-shell command.

Control characters in file names

Posted Nov 25, 2010 21:48 UTC (Thu) by jengelh (subscriber, #33263) [Link] (3 responses)

In which case will the for loop not work? (Other than * not globbing files starting with a dot.)

Control characters in file names

Posted Nov 25, 2010 22:00 UTC (Thu) by Spudd86 (guest, #51683) [Link] (1 responses)

if there's a file that starts with - or has any sort of control character it will break.

see here: http://www.dwheeler.com/essays/filenames-in-shell.html and here: http://www.dwheeler.com/essays/fixing-unix-linux-filename... although for some reason I remember it being much worse than that, though being correct everywhere in your script could eventually be a pain.

Control characters in file names

Posted Dec 2, 2010 19:19 UTC (Thu) by Ross (guest, #4065) [Link]

Are you proposing to remove hyphens from filenames too, or is this getting off-topic? :)

Control characters in file names

Posted Nov 25, 2010 23:30 UTC (Thu) by cmccabe (guest, #60281) [Link]

> In which case will the for loop not work? (Other than * not globbing files
> starting with a dot.)

The for loop should be

for i in *; do cmd "./$i"; done;

In case one of the filenames begins with a dash.

Control characters in file names

Posted Nov 26, 2010 10:28 UTC (Fri) by Yorick (guest, #19241) [Link]

Of course file names can be handled safely in most languages, but that's not the point. Wheeler describes it better and in more detail, but briefly, the aim is:
  • Make it harder to make mistakes and to write brittle and/or exploitable code. Even flawless programmers are affected by other people's errors.
  • Eliminate a dangerous class of control character exploits, mainly when displaying file names on terminals.
  • Allow for more design options. Remember, restricting data formats can be a way to give the programmer more freedom, not less.

To illustrate the last point: The only possible delimiter for file names is currently the null byte, which is not very practical in many languages, and in shell scripting in particular. Linefeeds would be much more natural and are supported by many more tools.

The benefits are clear, and the costs appear to be very low. The only serious objection I have seen so far concerns existing file names using an ISO 2022-based encoding. There are several possible solutions: allowing the control character restriction to be lifted as a per-mount option (possibly only allowing ESC, SI and SO), or a mount option that recodes into UTF-8.

Control characters in file names

Posted Nov 29, 2010 16:30 UTC (Mon) by nix (subscriber, #2304) [Link]

The xargs only works if you have at least one matching file. You want -0r. (Of course this is totally GNU-only.)

Control characters in file names

Posted Dec 2, 2010 19:17 UTC (Thu) by Ross (guest, #4065) [Link]

You repeat at least three times in the thread that it takes hundreds of lines to handle control characters in shell scripts. That's just not true. But worse, it's a terrible argument even if it were true.

You aren't proposing to remove all the characters that make it difficult to write correct shell scripts. In fact tab and newline are the worst "offenders" in your list of control characters. Most shells don't care about control characters at all. This can't be an argument for implementing the character set limitation, because implementing it won't fix the problem -- the same script would still be broken by files with spaces in them (and any number of shell metacharacters).

And even if it did, I'm not sure the features of Bourne shell should dictate how the filesystem interface should work. The existing kernel and shell were designed together -- if you want to redo the filename encoding in the kernel, you should consider how the shell could be changed and also how other tools besides the shell are affected. Only looking at the shell is just too much focus.

Control characters in file names

Posted Nov 23, 2010 20:24 UTC (Tue) by jreiser (subscriber, #11027) [Link] (2 responses)

I don't remember ever having seen a legitimate use of this liberty.

My customers enjoy better performance at lower cost because of the difference (log2(254) - log2(223)), and I make money from that. [Hint: a database index encoded in filenames, accessed only by the database and the backup system.] If you wish to exclude [\x01-\x1f] from filenames that customarily are manipulated by your users and programs, then please write plugins/extensions/whatever to implement this constraint in the command-line shell programs of your choice.

Control characters in file names

Posted Nov 23, 2010 21:15 UTC (Tue) by Yorick (guest, #19241) [Link]

The very point of such a restriction would be that programs would not need to implement it themselves.

I'm somewhat surprised that you believe that 3% longer file names would make a noticeable difference in performance for your application; have you measured this? Most cases of data encoded in file names that I have come across would happily use something like base64, with the added benefit of portability and easier manipulation and inspection of the directories with standard tools.

Control characters in file names

Posted Nov 25, 2010 16:47 UTC (Thu) by Spudd86 (guest, #51683) [Link]

Well since you don't want it you CAN just turn it off. Fact is that it's almost NEVER sane to have such a file name, and when it is you can disable the restriction.

No one who has proposed the restriction has ever proposed it as a non-optional, always-on thing: on by default, yes, but never as something you couldn't turn off. If you can't ask that people running your app disable a feature of that type, then your app is already broken and you're just waiting for someone to hit the brokenness.

Control characters in file names

Posted Nov 23, 2010 21:26 UTC (Tue) by ballombe (subscriber, #9523) [Link] (8 responses)

You've never seen a legitimate use? I thought this was the oldest trick in the book.

Do:
   touch `printf "\01"`
   chmod 000 `printf "\01"`
and now, if you ever do 'rm -f *' by mistake, you will get a chance to abort before any files are deleted.

This is also useful to name image files: use a two-dimensional ASCII-art rendition of the image; this is much more intuitive than _dsc2919.jpg. After all, a picture is worth a thousand words... and in ASCII art it _is_ a thousand words!

Control characters in file names

Posted Nov 23, 2010 21:31 UTC (Tue) by jzbiciak (guest, #5246) [Link] (1 responses)

I'd hate to type filenames on your computer. Even tab expansion would be murder with your digital pictures.

Control characters in file names

Posted Nov 24, 2010 18:29 UTC (Wed) by ballombe (subscriber, #9523) [Link]

This is a minor quibble. Just add the picture number in the top-left corner, and add a subdirectory with symlinks from the numbers to the images.

Control characters in file names

Posted Nov 24, 2010 6:20 UTC (Wed) by mfedyk (guest, #55303) [Link] (3 responses)

oooh, I like.

now if the camera could write vfat long file names without patents...

Control characters in file names

Posted Nov 24, 2010 8:05 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

I don't understand why people would want cameras to write long filenames anyway. They all name their files in robotic and boring fashion and the filenames are essentially meaningless nonces: all the actual lookup is always done via EXIF tags from some reader application.

Control characters in file names

Posted Nov 24, 2010 17:10 UTC (Wed) by sorpigal (guest, #36106) [Link] (1 responses)

It seems like the point would be to make cameras save pictures with ascii-art names depicting their contents, which would be arguably more useful than the robotic, boring names they use now.

ASCII Art file names

Posted Nov 25, 2010 7:49 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link]

It would be great until you took two pictures that gave the same ASCII-art representation but were different in some other important way. For example, astrophotographers who want to do image stacking would find it very inconvenient. So would people who use some bracketing options, like color balance bracketing or even exposure bracketing with 1/3 stop increments. Sequence numbers may be boring and uninformative, but they stop you from accidentally overwriting an important picture.

Control characters in file names

Posted Nov 25, 2010 3:15 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

You know, KDE and other modern shells can show you a tiny representation of the image on your disk without resorting to ascii-art filenames... Just sayin'.

Control characters in file names

Posted Dec 1, 2010 21:56 UTC (Wed) by cgwaldman (subscriber, #9061) [Link]

No, if you do 'rm -f *' it will delete every file. That's what the '-f' (force) flag does. This trick will only protect you from 'rm *', not the stronger 'rm -f *'. (Cute idea though...)

Control characters in file names

Posted Nov 25, 2010 3:44 UTC (Thu) by jthill (subscriber, #56558) [Link]

I once considered using .^A(name)^B(value)^C to implement bundles/streams/forks in a portable way, so .^Asourceurl^Bhttp://lwn.net/Articles/416824/^C might be a good thing to include for a saved copy of this web page. It's hard to misinterpret, unlikely to conflict or be accidentally damaged, and dead easy to implement.

To keep from eating inodes you could just hardlink them all to a conventional spot, maybe .^ABUNDLE^BTAGINODE^C at the volume root. That would also make it possible to transport the trick in tar archives. Heh. Two of the subject design choices at once. I'm proud of myself.

Control characters in file names

Posted Nov 25, 2010 6:14 UTC (Thu) by cmccabe (guest, #60281) [Link] (3 responses)

Wow. The idea that displaying filenames on your terminal emulator could be a security hole is mindblowing-- but, apparently, true...

http://seclists.org/fulldisclosure/2003/Feb/att-341/Termu...
(from the Wheeler link)

Also, I suddenly don't feel so happy about using GNU screen all the time...

Control characters in file names

Posted Nov 25, 2010 16:52 UTC (Thu) by Spudd86 (guest, #51683) [Link] (1 responses)

Wait till you start running shell scripts on directories! (Handling file names with control characters in the name correctly can take HUNDREDS of lines of code in shell; people frequently write scripts that break when asked to handle names with spaces, and that's EASY.)

Control characters in file names

Posted Nov 25, 2010 23:20 UTC (Thu) by cmccabe (guest, #60281) [Link]

After reading that essay, I am convinced that we should ban control characters in filenames through one of the mechanisms described. UTF-8 doesn't use them, and all human languages should be representable with UTF-8. So allowing control characters is just a pointless duplication of functionality, like supporting Pascal-style strings alongside C-style strings in the syscall API.

Control characters in file names

Posted Dec 2, 2010 19:46 UTC (Thu) by Ross (guest, #4065) [Link]

Yeah, great link. People don't have enough fear of their terminals. Some of the more horrific terminal codes that do things like open files in your home directory have been removed from xterm and rxvt (no idea about others), but it's by no means safe to just allow random characters to be written to your screen, and it never has been, even back in the days of physical terminals.

Allowing write(1) to write to your terminal is a security problem, though I think talk filters out control characters. Running any program, even as an unprivileged user with no filesystem access, is a problem if the output is going to your terminal. Just cat'ing a file from an unknown source is an issue. Running a program on unsanitized input might cause it to print error messages or other strings without stripping out special characters.

Basically the terminal is full of security issues because it obeys control characters no matter how they get there and traditionally lots of stuff gets written to your screen from unsanitized sources.

Instead of changing the filesystem to fix a very small part of that (and let's face it, if you have something writing out malicious filenames, it's probably writing out malicious file contents), there should be a more comprehensive approach. For example, there could be a mechanism to add a tty filter process which could sanitize the output for your specific terminal. Ideally the terminal program would set it up before starting the shell (console and remote logins would need to be handled too, and remote logins are harder because the terminal type isn't known until login, if ever). The hard part is that you want some control characters to get through -- and probably different ones from different sources (setting the xterm title in your shell prompt code for example). There would need to be a way to get different interfaces for the shell, trusted programs, and untrusted programs. How to do this without redesigning the shell and all the utilities? :(
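
As a toy illustration of the filter idea, here is a sanitizer with the simplest possible policy: pass tab, newline and everything from space up (high bytes included, so UTF-8 survives), and defang the rest. A real filter would have to be terminal-aware and let selected sequences through, as described above:

   #include <stdio.h>

   /* Toy tty output sanitizer:  some-untrusted-cmd | sanitize */
   int main(void)
   {
       int c;
       while ((c = getchar()) != EOF) {
           if (c == '\n' || c == '\t' || (c >= 0x20 && c != 0x7f))
               putchar(c);               /* pass harmless bytes through */
           else
               putchar('?');             /* defang control bytes */
       }
       return 0;
   }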

Control characters in file names

Posted Dec 2, 2010 17:20 UTC (Thu) by Ross (guest, #4065) [Link]

Lots of characters cause trouble in filenames. Shells hate whitespace for example. Terminals hate control characters. Other things get confused by commas, quotes, etc. Lots of shell utilities hate files starting with a hyphen, some with a plus. If you want to copy files to a Windows system or an MP3 player you need to avoid lots of things like question mark, star, less than, greater than, colon, dollar sign.

I guess my point is you can't design around everything which wouldn't like filenames to have specific characters. In fact it's kind of nice that the kernel doesn't know about the encoding system except that it won't produce single bytes that have the same ASCII value as / or NUL. That's kind of the bare minimum it needs to know to be able to handle entire paths and not just path components.

Sure, I hate seeing filenames with crazy characters too. They usually appear to be accidentally created, or are the output of some terrible script which is trying to store too much information in filenames.

I don't think this qualifies as high-maintenance, because most software pretty much ignores it and treats filenames as sequences of bytes terminated by a NUL, without being very careful about whitespace or other unusual characters. I'd suggest that any kernel-level fix won't address enough of the issues to be a complete solution, and would require additional work in most applications and scripts to handle all filenames perfectly, just like now.

Removing setuid

Posted Nov 23, 2010 18:59 UTC (Tue) by talex (guest, #19139) [Link] (13 responses)

I wonder how hard it would be to eliminate setuid (including POSIX capabilities) entirely?

[ find -perm -4000 ... ]

Looks like most setuid binaries could be replaced by services (e.g. over D-BUS), running in an environment that is known and trusted. e.g. chsh, ping, mount (for cases where setuid is used), passwd, at.

su and sudo could be replaced by ssh (or telnet) localhost.

I'm not quite sure why chromium-browser-sandbox needs to be setuid, but presumably a slightly improved seccomp mode would fix that.

Is there anything that really needs to be setuid?

Removing setuid

Posted Nov 23, 2010 19:09 UTC (Tue) by Yorick (guest, #19241) [Link] (3 responses)

The most important user of setuid that I can think of is Nethack. It would have to run as a daemon (&).

Removing setuid

Posted Nov 23, 2010 22:09 UTC (Tue) by nix (subscriber, #2304) [Link] (2 responses)

& itself being one of the sickest in-jokes in nethack (which has almost too many, if that were possible).

Removing setuid

Posted Nov 24, 2010 13:18 UTC (Wed) by nye (guest, #51576) [Link] (1 responses)

I...never got that joke until just now.

Removing setuid

Posted Dec 1, 2010 3:30 UTC (Wed) by baldridgeec (guest, #55283) [Link]

Don't feel bad, I didn't either (even though MAIL is defined on my build!)

Removing setuid

Posted Nov 23, 2010 19:50 UTC (Tue) by zlynx (guest, #2285) [Link] (1 responses)

The sandbox is actually setuid to a lower permission level. Not what you usually see, but useful.

Removing setuid

Posted Nov 29, 2010 14:06 UTC (Mon) by talex (guest, #19139) [Link]

Needing privileges to drop privileges seems to be quite a common problem with Linux. Unfortunately, patches to solve this problem may themselves cause more problems... by interfering with SetUID binaries:

http://lkml.org/lkml/2010/1/10/142

Removing setuid

Posted Nov 23, 2010 19:56 UTC (Tue) by vonbrand (guest, #4458) [Link] (1 responses)

Whatever you do, the result will still be some kind of membrane that separates (but connects) two domains with different privileges. Everything that goes through it will have to be checked. Sure, there are other ways to handle this; the real question is which is the hardest to foobar...

Removing setuid

Posted Nov 29, 2010 14:36 UTC (Mon) by talex (guest, #19139) [Link]

OK. In the case of services, the membrane is needed only for the socket over which the user sends their messages, which hopefully the programmer is already thinking about from a security PoV.

In the case of SetUID, the membrane includes quite a lot of things the programmer probably didn't think about, besides the program's arguments, including the inherited:

* environment variables
* file descriptors (e.g. close(1); exec(setuid))
* the current directory (which may be writeable/moveable by the user)
* ulimits
* umask
* POSIX capabilities?

(those are the ones I can think of; I'm sure there are more)
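
A sketch of the defensive prologue a careful setuid program therefore needs before doing anything else; this is illustrative, not a complete hardening checklist:

   #define _GNU_SOURCE
   #include <fcntl.h>
   #include <stdlib.h>
   #include <sys/resource.h>
   #include <sys/stat.h>
   #include <unistd.h>

   /* Scrub the state the (possibly hostile) caller chose for us. */
   static void scrub_inherited_state(void)
   {
       int fd;

       /* Ensure fds 0-2 exist, so we never open a real file onto
          what we later use as stdout or stderr. */
       while ((fd = open("/dev/null", O_RDWR)) >= 0 && fd <= 2)
           ;                              /* fill any holes in 0-2 */
       if (fd > 2)
           close(fd);

       clearenv();                        /* environment variables  */
       if (chdir("/") != 0)               /* caller-controlled cwd  */
           _exit(1);
       umask(022);                        /* caller-controlled umask */

       struct rlimit rl = { 1024, 1024 }; /* example ulimit reset */
       setrlimit(RLIMIT_NOFILE, &rl);
   }

   int main(void)
   {
       scrub_inherited_state();
       /* ... only now do the privileged work ... */
       return 0;
   }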

Removing setuid

Posted Nov 23, 2010 23:57 UTC (Tue) by cjwatson (subscriber, #7322) [Link] (2 responses)

It would be nice to have a replacement for the somewhat niche use of set-id to prevent a process being ptraced. ssh-agent (at least in Debian - I don't recall the upstream setup right now) is setgid to a single-purpose group, and drops that privilege on startup, purely to prevent an attacker ptracing the agent and extracting cleartext keys. It helps to have this in the filesystem so that there's no vulnerable window at startup.

(Yes, if the compromise is long-term then the attacker can just install a keylogger and wait, but sometimes attackers only have a short window of opportunity and it doesn't hurt to make them work harder.)
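
On Linux, a running process can get much of the same protection without any set-id bit by clearing its "dumpable" flag, which makes it unptraceable by its own (non-root) user; the automatic clearing of this flag on exec of a set-id binary is, in fact, what makes the ssh-agent trick work. A sketch:

   #include <stdio.h>
   #include <sys/prctl.h>

   int main(void)
   {
       /* A non-dumpable process cannot be ptraced by its (non-root)
          owner and will not write core files full of key material. */
       if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0) {
           perror("prctl");
           return 1;
       }
       /* ... load cleartext keys and serve requests ... */
       return 0;
   }

The startup-window point from the comment still stands: the set-id bit closes the gap before main() runs, which a prctl() call cannot.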

Removing setuid

Posted Nov 25, 2010 18:13 UTC (Thu) by talex (guest, #19139) [Link] (1 responses)

That's an interesting example.

As you say, the current situation isn't great anyway. I wonder how Capsicum deals with tracing? I assume that you'd need to have a process descriptor to ptrace a process, so by default you'd only be able to trace your children.

If a process wanted to trace something else, it would have to ask a service (e.g your session manager) for a handle to the target. The session manager could refuse to hand over the handle to the ssh-agent process (or some stricter policy, like always confirming with the user).

Removing setuid

Posted Nov 26, 2010 14:35 UTC (Fri) by Yorick (guest, #19241) [Link]

For a capability-based system, I would imagine tracing the user's own processes to be a question for his powerbox. I don't remember if the Capsicum papers discuss the design of a powerbox to go with the rest of the system.

Removing setuid

Posted Nov 27, 2010 2:28 UTC (Sat) by skissane (subscriber, #38675) [Link] (1 responses)

I think we can distinguish two uses of setuid:
- start a non-privileged process, at login, or when init starts a service
- as a deputy to enable a lesser privileged user to do some task

To start the non-privileged process, I would suggest the ideal would be something like spawnas(), which only root can call, and which specifies the non-root user to start the process as. (I have never liked fork/exec, because it is too easy to accidentally pass stuff to the child process from the parent when one didn't mean to. I would prefer spawn, with explicit specification of the desired child process state via some extensible data structure. Fork is dumb, because copy-on-write is the rare case; the common case is fork-exec, which is equivalent to spawn. And with multiple threads, who ever really needs fork anyway?)
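
The closest existing interface is posix_spawn(), which takes exactly this kind of explicit, extensible description of the child's state, though it cannot change user, so it covers only part of the hypothetical spawnas(). A minimal sketch:

   #include <fcntl.h>
   #include <spawn.h>
   #include <stdio.h>
   #include <sys/wait.h>

   extern char **environ;

   int main(void)
   {
       pid_t pid;
       char *argv[] = { "echo", "spawned", NULL };
       posix_spawn_file_actions_t fa;

       /* All child state is declared up front -- nothing leaks in
          implicitly the way it can between fork() and exec(). */
       posix_spawn_file_actions_init(&fa);
       posix_spawn_file_actions_addopen(&fa, 0, "/dev/null", O_RDONLY, 0);

       if (posix_spawnp(&pid, "echo", &fa, NULL, argv, environ) != 0) {
           perror("posix_spawnp");
           return 1;
       }
       waitpid(pid, NULL, 0);
       posix_spawn_file_actions_destroy(&fa);
       return 0;
   }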

Then, all the deputy uses can just become daemons accessible over some local domain socket or some RPC/LPC protocol (maybe implemented on top of that). Maybe with the ability to pass file descriptors/handles from the daemon back to its unprivileged caller... There should be some security checking over what handles it can pass. (Permissions on objects could by default be non-delegable, unable to be passed to another process -- an extra permission, e.g. "read" vs "delegate read", could be needed to pass the handle to another process...)

Removing setuid

Posted Nov 30, 2010 10:27 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

Personally, I'd rather write if (!(child = fork())) { do_stuff(); execve(foo, bar, baz); } than ram_stuff_into_structure(&foo); child = spawn(&foo); because the former allows me to do arbitrary stuff between fork and exec, while the latter only allows me to do things the library / kernel designers have foreseen.

As for "And with multiple threads, who ever really needs fork anyway?", how about "anyone who wants to get a useful program-readable error indication when the activity that's just been kicked off corrupts its heap, rather than having the whole kit and kaboodle come crashing down with a SEGV"?

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 19:35 UTC (Tue) by BenHutchings (subscriber, #37955) [Link]

truncate() is apparently very high-maintenance for the mm system.

Ensuring that every process has a parent is fairly high-maintenance and still far less useful than process handles (which would be file descriptors yielding the exit status).

more than high-maintenance, I think

Posted Nov 23, 2010 20:22 UTC (Tue) by sbishop (guest, #33061) [Link] (4 responses)

I think that there is another issue with setuid that isn't mentioned in the article. I have understood it to be the reason filesystem binds (as referenced in part 2) are a privileged operation. Otherwise, you could have setuid binaries dynamically linking against a library of your own choosing, for instance.

more than high-maintenance, I think

Posted Nov 23, 2010 20:28 UTC (Tue) by josh (subscriber, #17465) [Link]

Or have sudo read an /etc/sudoers of your own choosing. :)

more than high-maintenance, I think

Posted Nov 23, 2010 22:00 UTC (Tue) by vonbrand (guest, #4458) [Link] (2 responses)

That is completely orthogonal to SUID/SGID, as it means getting doctored input (filesystem) to a program running with privileges.

more than high-maintenance, I think

Posted Nov 24, 2010 0:56 UTC (Wed) by neilbrown (subscriber, #359) [Link] (1 responses)

If you are thinking of filesystem binding which is system-wide - imposed on all processes, then I agree this is orthogonal to SUID/SGID.
However if you are talking about per-session filesystem binding (with process-local name spaces that are inherited across fork/exec, but not imposed on other processes), then I think this is exactly related to SUID/SGID.
A problem with SUID/SGID is that so much is inherited from the parent. The process-local name space would be inherited too, so it could affect the behaviour of the program in unexpected ways.

This is "simply" avoided by allowing process-local filesystem binding, but only on to objects to which the user already has write access. However I don't know if it would be as useful with that restriction ... maybe it would.

more than high-maintenance, I think

Posted Nov 24, 2010 8:14 UTC (Wed) by nix (subscriber, #2304) [Link]

An alternative 'simple' avoidance (more of a hack, perhaps) would be to switch to a specific binding (perhaps some daemon could have the job of handing out fd's to legitimate mounts of such directories; perhaps a specific binding of the whole hierarchy, or a specific binding of particular often-security-important directories) whenever the current binary detects that it is setuid. (Obviously, its children would inherit this as usual.)

If you consider /etc, /lib, /bin, /sbin, /usr/lib, /usr/bin, /usr/sbin and /usr/libexec 'security-important', then bingo, all the 'read an /etc/passwd of your own design' or 'force a setuid binary to run a new binary of your own design' attacks become impossible.

Downsides: an arms race for 'security-important directories' much as there currently is for security-important environment variables. If you rebind the whole tree then people might find that setuid apps could no longer operate on stuff in the user's own hierarchy (as soon as non-superuser mount() becomes possible), which would be seriously counterintuitive. Depends on a new daemon, but perhaps that could be the same one that does service invocation, so the same thing could provide a means of secure setuid app invocation *and* make old setuid apps more secure in the presence of user-defined mounts as well.

Perhaps what you want the 'privileged mount daemon' to do is to hand out new mounts to all filesystems not marked as 'user' in *its* /etc/fstab: that way there would be no arms race for 'security-important directories' and it would be impossible for a user to fake it out by rebinding the /etc/fstab.

This could easily be done by ld.so (except that getting this change past Ulrich is probably impossible: it feels overdesigned and fragile even to me).

Async I/O

Posted Nov 23, 2010 21:06 UTC (Tue) by kleptog (subscriber, #1183) [Link] (9 responses)

Something that UNIX really needs a good handle on is asynchronous I/O. There have been all sorts of designs over the years, but none of them quite seems to be there somehow. I think signalfd() is a big step in the right direction, if you could get the interaction with fork()/exec() sorted out.

Basically you get a sort of I/O completion port, an fd where all the interesting stuff (I/O completions, signals, child exits, etc.) gets sent to you via messages. That would be wicked cool.
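
signalfd() already shows the shape of this for one event source: with SIGCHLD blocked, child exits arrive as structured messages on an fd, which could be multiplexed with everything else via poll() or epoll. A minimal sketch:

   #include <signal.h>
   #include <stdio.h>
   #include <sys/signalfd.h>
   #include <sys/wait.h>
   #include <unistd.h>

   int main(void)
   {
       sigset_t mask;
       struct signalfd_siginfo si;
       int sfd;

       sigemptyset(&mask);
       sigaddset(&mask, SIGCHLD);
       /* The signal must be blocked so it arrives via the fd
          rather than via a handler. */
       sigprocmask(SIG_BLOCK, &mask, NULL);
       sfd = signalfd(-1, &mask, 0);

       if (fork() == 0)
           _exit(42);                       /* child exits immediately */

       read(sfd, &si, sizeof(si));          /* child exit as a message */
       printf("pid %d exited with status %d\n",
              (int)si.ssi_pid, (int)si.ssi_status);
       waitpid((pid_t)si.ssi_pid, NULL, 0); /* the child must still be reaped */
       return 0;
   }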

Async I/O

Posted Nov 24, 2010 5:22 UTC (Wed) by wahern (subscriber, #37304) [Link] (7 responses)

The fundamental problem with AIO is that the block drivers aren't interruptible at their core. They need to run on a thread and block waiting on certain hardware operations.

So whether you're using kernel- or user-thread AIO, a thread will be sitting there waiting to unblock. Kernel AIO can be faster only because of efficiency with copying data. So perhaps we should look for more designs like vmsplice() and worry less about disk AIO per se. But other than that, emulating AIO with user-threads isn't much different than in-kernel AIO. And you can build a user-land, pollable interface using eventfd().

BTW, kqueue(2) provides all the things you talk about, in particular on FreeBSD which has defined AIO kevents.

Async I/O

Posted Nov 24, 2010 22:02 UTC (Wed) by kleptog (subscriber, #1183) [Link] (6 responses)

The fundamental problem with AIO is that the block drivers aren't interruptible at their core. They need to run on a thread and block waiting on certain hardware operations.

Is that really true? I was under the impression that with modern hardware, interfaces were becoming more network-like, with messages being sent and received. You can have dozens of outstanding requests queued on a modern hard disk, so I'm not sure why AIO is a big deal. Why is requesting data off a file on disk different from requesting a block from a network block device?

The messy part about POSIX AIO is more the interface, using signals again while they are completely inappropriate for the task (see the earlier article). Yes, you can emulate it with threads, but they have their own baggage. Glibc did aio that way for a while, and injecting threads into an otherwise unthreaded program tends to have side-effects (on fork/exec, amongst others). The emulation sucked in other ways, like no more than one request outstanding per FD, WTF!

What I'd like to see is a message passing interface connecting you to the block device queue, where you submit your requests as messages and receive the data back as messages. Perhaps even a read giving you the data preceded by a header saying which FD it came from. That saves the kernel keeping pointers to user-space. send/recvmsg() seem especially suited.

Thanks for the kqueue tip, that looks like a solid solution.

Async I/O

Posted Nov 25, 2010 0:01 UTC (Thu) by neilbrown (subscriber, #359) [Link] (3 responses)

> Why is requesting data off a file on disk different from requesting a block from a network block device?

Requesting a block from a disk maybe isn't. Inside Linux you call 'submit_bio', and inside the bio is a callback pointer which will eventually get called (theoretically it could be called even before the submit_bio call completes). It is completely async, and it would be quite easy to plug it into an async user-space interface if you had a good one.

But requesting data off a file is different. The fs might have to load one or more indirect blocks before it even knows where to look for the block. And then (for a more advanced fs), it might want to respond to a read failure in some arbitrarily complex fashion. Obviously this is all possible. The fs could encode that state concerning 'what to do next' in some structure and attach it to the bio that is sent off.

In general you might have arbitrarily many levels, each of which attaches state and a call-back pointer to a request and ships it down to the lower level; that state can be arbitrarily complex and will often have 'stage' information for walking through a state machine. What you end up building looks a lot like a traditional call stack with local variables and return addresses. But we already have a well-tested and well-understood mechanism for holding local variables and return addresses. We call it a 'thread'.

So the question is: when is it better to have lots of threads each running traditional single-threaded code, and when is it better to encode state in a separate data structure and use a state machine to process it all 'asynchronously'?

Naturally we have both approaches in the Linux kernel so we can compare.

The examples of state machines that I am aware of are the block layer (submit_bio, mentioned above) and the sunrpc layer used by NFS. Obviously the tcp/ip stack would work like a state machine, but I have no direct familiarity with its complexity. It would have the benefit of being designed and specified as a state machine, so that might help it a little.

The block layer has a single level - the queue. When lower-levels want to do something extra, they make another request and put it back on the queue. This leads to a modest amount of complexity in different flags on requests saying different things about how much the queue handlers are allowed to re-order them. It is manageable, but it isn't simple.

The sunrpc layer has wheels-within-wheels. Much like the 'indirect block' lookup mentioned earlier, sunrpc might need to perform a portmap lookup to find out which port to send a request to, and might need to try again after a timeout. I think the sunrpc code is reasonably well structured, in that it isn't too hard to get an overview of how things flow around. But I find that going much deeper gets really hard. Following error statuses around is particularly a challenge.

If these two are good examples, I would suggest that these state-machine-with-explicit-state approaches should be avoided where possible. I think it cannot be avoided in the block layer. I am less convinced about sunrpc, though at the time it was written it may well have been the best option.

I would be very hesitant about suggesting that filesystems be re-written to be completely asynchronous. It would be much more robust to find a way to fork a blocked task off into a separate thread.

Then an AIO request would either return a status or a pid, and as you suggested, reading from some fd would return the pids of the tasks as they complete. 'kill' could be used as a cancellation interface. This has all largely been suggested already on lkml. I seem to recall there were some uncertainties about making sure the user-space side always had the same pid. So maybe some issues never got ironed out and nobody put the time into implementing it (that I am aware of).

http://lwn.net/Articles/316806/ seems relevant here.

Async I/O

Posted Nov 25, 2010 17:39 UTC (Thu) by kleptog (subscriber, #1183) [Link]

Thanks a lot for that explanation, I see the difficulties now.

State machines are nice and efficient, but hard to understand and program. Probably the most common large state machines, parsers, are often generated by other programs (flex & bison) from a higher-level description, and no human can directly understand the generated output.

I don't think anyone has yet defined a language for filesystems that could be used to generate an appropriate state machine; perhaps the state of the art just isn't there yet.

Async I/O

Posted Nov 27, 2010 2:34 UTC (Sat) by skissane (subscriber, #38675) [Link] (1 responses)

Couldn't this be a good case for writing a kernel in a language that had coroutines or continuations? One can write the code in the easier to follow procedural style, and then make it asynchronous just by adding "yield" or "call/cc" at the appropriate points. I wonder how hard it would be to come up with a version of C with coroutines, and whether that could be used with the Linux kernel...

Is there some way you could take the synchronous implementation of a driver or filesystem or other kernel component, determine the blocking points (or just require them to be annotated), and then automatically generate a state-machine-based asynchronous implementation from the synchronous one? A single source code for both synch and asynch implementations would make one more confident they were both correct.

Async I/O

Posted Nov 27, 2010 3:06 UTC (Sat) by foom (subscriber, #14868) [Link]

But why bother? Kernel threads *are* such a state machine, just in a very convenient form. What good would reinventing threads do?

Async I/O

Posted Nov 29, 2010 11:05 UTC (Mon) by marcH (subscriber, #57642) [Link]

> Yes, you can emulate it with threads, but they have their own baggage.

That's an understatement

http://stackoverflow.com/questions/220752/what-is-the-c-m...
http://blogs.sun.com/dave/entry/parallelism_manifesto

etc.

Async I/O

Posted Nov 29, 2010 19:42 UTC (Mon) by jra (subscriber, #55261) [Link]

> Glibc did aio that way for a while and injecting threads into an otherwise
> unthreaded program tends to have side-effects (on fork/exec, amongst
> others). The emulation sucked in other ways, like no more than one request
> outstanding per FD, WTF!

Interestingly enough, I also thought "WTF" when I came across this in glibc. So I wrote a test patch for glibc which removed this restriction and allowed multiple outstanding requests per fd.

When I ran my aio test program using this change, I got a factor of 5 speedup with glibc aio. When I ran Samba with this change using SMB1/smbclient, or SMB2 with the re-written Windows redirector (both of which issue multiple outstanding async IO requests), it went *slower*, at about 0.7 times the single-request-per-fd speed.

Clearly there is something interesting going on here. Ping me if you want to try my glibc patch for yourself (which I haven't promoted as clearly I don't understand what is going on here :-).

Jeremy.

Async I/O

Posted Nov 25, 2010 17:09 UTC (Thu) by Spudd86 (guest, #51683) [Link]

You might also want to look at libaio, which actually uses the kernel's async IO facilities... but it seems there are some issues with it...

http://ozlabs.org/~rusty/index.cgi/tech/2008-01-08.html

I'd say poke around a bit, I think you probably can get what you want on Linux already... it just may not be easy to find the documentation...

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 22:13 UTC (Tue) by nix (subscriber, #2304) [Link] (10 responses)

Yet *another* horrible problem with setuid is what it did to the simple Unix userid/groupid system. Rather than processes simply having a user id and a group id, as in the early days, or a user id, a group id, and a list of supplementary group ids, a process has a real user id, an effective user id, a saved user id, and a morass of inconsistent functions to switch between them, some of which sometimes let you switch back and sometimes don't, some of which have magic special rules for switching to/from root and some of which don't... the transition diagram is numbing in its complexity. I'm amazed more security holes don't arise from this nightmare.
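
For anyone who has been spared the details, a sketch of the canonical dance a setuid-root program performs with setresuid(), the least confusing corner of the morass; note that every single call still needs its return value checked:

   #define _GNU_SOURCE
   #include <stdio.h>
   #include <stdlib.h>
   #include <unistd.h>

   int main(void)
   {
       uid_t user = getuid();              /* the invoking user */

       /* Drop privileges temporarily: root stays in the saved uid. */
       if (setresuid(user, user, 0) != 0) { perror("drop"); exit(1); }

       /* ... do unprivileged work as the real user ... */

       /* Regain root from the saved uid for one privileged action. */
       if (setresuid(user, 0, 0) != 0) { perror("regain"); exit(1); }

       /* ... the privileged action ... */

       /* Drop privileges permanently: all three uids become the
          user, so there is no way back. */
       if (setresuid(user, user, user) != 0) { perror("final"); exit(1); }
       return 0;
   }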

d-bus services (or other started-as-root daemon-invoked processes) have *none* of this security-critical quasi-portable crap visible (obviously it is visible briefly inside d-bus as it forks and switches uid, but that need not be visible to the service author.)

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 23, 2010 22:31 UTC (Tue) by foom (subscriber, #14868) [Link] (1 responses)

Well, even without a suid bit, you still want to be able to switch effective UIDs while preserving the ability to switch back, so that you can do work on behalf of a user using that user's credentials. E.g. accessing files on the filesystem.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 8:16 UTC (Wed) by nix (subscriber, #2304) [Link]

Yes, but if you were designing a new system from scratch, would you choose to do that the way Unix has? I know I wouldn't.

(And you mean 'while *optionally* preserving the ability to switch back': a lot of programs really don't want to keep that ability.)

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 26, 2010 16:45 UTC (Fri) by nevyn (guest, #33129) [Link] (1 responses)

> d-bus services (or other started-as-root daemon-invoked processes) have *none* of
> this security-critical quasi-portable crap visible

They don't have _those_ issues, no. But there are a number of open issues wrt. how d-bus services break the link between the user and the service. The most obvious is that loginuid is lost. I've also yet to see any kind of analysis on DOSing D-Bus. These are all security related problems, they are just less well understood.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 2, 2010 22:27 UTC (Thu) by oak (guest, #2786) [Link]

> I've also yet to see any kind of analysis on DOSing D-Bus.

DOSing D-BUS is trivial, just register services and send messages.

The D-BUS daemon needs an FD per client connection, so you can DOS it by creating new connections to it until it no longer accepts connections (it runs out of FDs before you do, as it already has several clients).

D-BUS doesn't seem to have limits on its memory usage. For example, if you send messages to a (e.g. your own) service and it doesn't read them, D-BUS doesn't block either sender or receiver; it just buffers all the messages until the system runs into swap and D-BUS goes OOM.

The D-BUS memory management code also seems a bit horrible, both inefficient (it doesn't free memory back to the system, just fragments its heap) and complicated. It's also a bit strange that a thing that is mostly supposed to push bits from one socket to another is CPU-bound, not IO-bound (at least when it has many clients, as you have on Maemo).

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 30, 2010 16:53 UTC (Tue) by pbonzini (subscriber, #60935) [Link] (5 responses)

Since you mention fork... have you never thought about what kind of awful code this particular high-maintenance design causes?

As a first example, think about the ugly hack that vfork is. It still survives because it does perform better than fork. Luckily posix_spawn hides this ugliness and chooses fork over vfork. On the other hand, it means that popen can be _much_ more expensive than system just because one will use fork and the other will use vfork.

And one thing I realized today: you can hardly retrieve the errno of a failed exec system call in a forked process. Everything you will do is going to be racy, except possibly using ptrace on the child.

(FWIW, the most clever way I thought of is to use an FD_CLOEXEC pipe and write the return code of exec there; if the parent cannot read anything, exec succeeded... or the forked child died of a signal after exec returned but before it wrote to the pipe... and if you want to use waitpid, you get it wrong in case the execed process was signaled before the parent started waitpid...).
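
For reference, a sketch of that close-on-exec pipe trick: reading zero bytes (EOF) from the pipe means the exec succeeded, modulo the signal race described above:

   #include <errno.h>
   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/wait.h>
   #include <unistd.h>

   int main(void)
   {
       int pfd[2], err, status;
       ssize_t n;
       pid_t pid;

       pipe(pfd);
       fcntl(pfd[1], F_SETFD, FD_CLOEXEC); /* write end vanishes on exec */

       pid = fork();
       if (pid == 0) {
           close(pfd[0]);
           execlp("no-such-command", "no-such-command", (char *)NULL);
           err = errno;                    /* exec failed: say why */
           write(pfd[1], &err, sizeof(err));
           _exit(127);
       }

       close(pfd[1]);
       n = read(pfd[0], &err, sizeof(err));
       waitpid(pid, &status, 0);
       if (n == (ssize_t)sizeof(err))
           fprintf(stderr, "exec failed: %s\n", strerror(err));
       else
           fprintf(stderr, "exec succeeded (or the child died first)\n");
       return 0;
   }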

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 4, 2010 18:00 UTC (Sat) by nix (subscriber, #2304) [Link] (4 responses)

Yeah, that's quite nasty. But spawn*() has all the same problems and a bunch of extra ones, prominent among them the fact that it's exactly the same as fork()/exec() with the code between the fork() and exec() hardwired. So you end up with dozens of spawn*() calls and no benefit over fork()/exec() at all (except on tiny non-MMU systems, which can theoretically implement spawn*() but not fork()/exec() --- but usually implement both, because most code uses fork()/exec() and not spawn*().)

I suspect that a combination of waitpid() (to catch signals) and read-from-pipe (to catch errno) might work: if you played games with self-signalling you could possibly encode errnos as rare signals and drop the pipe, at the cost of losing the ability to detect those rare signals.

A bit of extra effort (sending something down the pipe right before exec() as well as right after a failed one, and opening the pipe end O_CLOEXEC) enables you to distinguish between a signal hitting before exec() and a signal hitting after a successful exec().

But, yes, this is all pointlessly complex. If C had proper Lisp-style macros we could wrap this up in a library without the result becoming as inflexible as spawn(). (Something involving function pointers could get halfway there, perhaps. But it wouldn't be as neat.)

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 5, 2010 20:40 UTC (Sun) by quotemstr (subscriber, #45331) [Link]

except on tiny non-MMU systems, which can theoretically implement spawn*() but not fork()/exec()
Cygwin also falls into this category. fork() works there, but it's painfully slow because it copies the entire address space. spawn() is far more efficient.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 6, 2010 10:58 UTC (Mon) by pbonzini (subscriber, #60935) [Link] (2 responses)

no benefit over fork()/exec() at all (except on tiny non-MMU systems, which can theoretically implement spawn*() but not fork()/exec()
Actually, if the parent has a large RSS it is quite common to see major performance improvements with vfork() over fork(). And given how hacky vfork() is, I'd really be happy to pay the price of spawn()'s inflexibility. fork() should be treated as a relic of the days when parallelism was achieved using processes rather than threads, IMO.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 6, 2010 11:31 UTC (Mon) by dlang (guest, #313) [Link] (1 responses)

does this vfork advantage still exist (i.e., is it measurable) when the host OS does Copy On Write for the fork instead of actually copying all ram?

yes, the page tables still get modified twice, but is this measurable on modern hardware?

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 6, 2010 11:58 UTC (Mon) by pbonzini (subscriber, #60935) [Link]

Yes, I've seen forking take 60% of CPU (that was forking 4 child processes per second from a 2+ GB process). Switching from fork to vfork, or equivalently switching to posix_spawn, brought it down to 3-4%.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 1:07 UTC (Wed) by joey (guest, #328) [Link] (2 responses)

My favorite consequence of suid + hard links is that the combination opens a new security hole, where a user can stash away hard links to suid binaries for later exploitation. Of course this has been worked around ad-hoc by at least some package managers clearing the suid bit of the old file when upgrading it.

I am confused by the characterisation of the linux filesystem as a DAG. I suppose that was true on unixes that could hard link directories.

The bit about suid shell scripts is a trifle misleading for Linux too, as the kernel does not honor suid bits on scripts. Perl worked around this with suidperl, which proved to be problematic; perl's taint mode has more general applications (think CGIs).

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 1:35 UTC (Wed) by njs (guest, #40338) [Link] (1 responses)

> I am confused by the characterisation of the linux filesystem as a DAG

The directory tree is a, well, tree, but if you include files then hard links make it a DAG.

If you could hard-link directories then it wouldn't be a DAG, it'd just be a DG.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 24, 2010 2:20 UTC (Wed) by foom (subscriber, #14868) [Link]

OSX allows you to hardlink directories, but still requires that the filesystem remain a DAG -- it prevents you from creating cycles.

#!

Posted Nov 24, 2010 8:58 UTC (Wed) by eru (subscriber, #2753) [Link] (1 responses)

This did not apply at the time that setuid was first invented, but once Unix gained the #!/bin/interpreter (or "shebang") method of running scripts it became possible for scripts to run setuid.

As I remember it, running scripts as if they were executables predates #!. The new notation merely allowed the script to specify the interpreter. In older Unix versions, the system simply tried to feed into /bin/sh any file with x permission that it did not recognize as a binary executable.

#!

Posted Nov 24, 2010 10:14 UTC (Wed) by neilbrown (subscriber, #359) [Link]

Before #!,
   execve("/path/of/script",argv,envp)
would fail. The shell would catch this failure, possibly read the start of the script to see if it looked OK, and try
   argv[-1] = "/path/of/script";
   execve("/path/to/shell", argv-1, envp);
(though it wouldn't have used a -1 index, it would have re-allocated; but you get the idea). The shell still does this, so you can run a script without "#!" at the front.

So you could run scripts as executables, but only using a shell (e.g. system()), not using execve, and there was no opportunity for the kernel to impose setuid.

With the invention of '#!', this functionality was moved into the kernel. The kernel would look at the first few bytes of the program and decide how to execute it, either by reading it into an address space, or running an interpreter to read the script. As it was done in the kernel, setuid could be effective and was - in most versions of Unix.

As has been noted in an earlier comment, Linux doesn't impose setuid on scripts. This is because setuid is effected by a call to setup_new_exec() which the individual format handlers call, and binfmt_script doesn't call it.

Exercises for the interested reader

Posted Nov 25, 2010 16:07 UTC (Thu) by lacos (guest, #70616) [Link] (7 responses)

1. Identify a design element in the IP protocol suite which could be described as "high maintenance" or as having "unintended consequences".

Source routing and TCP urgent data come to my mind.

3. Research and enumerate uses of "hard links" which are not adequately served by using symbolic links instead. Suggest technologies that might effectively replace these other uses.

Some programs do this:

  1. Create a file.
  2. Work with the file.
  3. If the work completes, close the file, leave the reference (the name) in the fs.
  4. If the work is interrupted or fails, close the file and remove the reference (in some order).
It is sometimes useful to add an independent hard link to the file during step 2, so it can be salvaged when the program removes it in step 4. Example: downloading a big file with your browser, then continuing it with wget.
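
In C the salvage step is just a link() call; a minimal sketch, with both file names invented for illustration:

    /* Add an independent hard link to a file another program is still
     * writing, so the data survives that program's cleanup unlink(). */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        if (link("big.iso.part", "big.iso.saved") < 0) {
            perror("link");
            return 1;
        }
        return 0;
    }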

The ne plus ultra of hard-linking is flink(). Among other things, it would allow for a very useful operation: making a file reappear instantaneously in the filesystem. Consider the practice of some utilities:

  1. Create a temporary file.
  2. Remove its name immediately, but keep it open by file descriptor, file description, and inode.
  3. If the work completes, close the file. With the last reference going away, the space occupied by the file is released.
  4. If the program crashes, or e.g. a SIGKILL is delivered to it, the previous step happens automatically: nothing is left around to clean up later.
flink() extends this workflow for files that you want to keep in the end! Just replace the third step in the previous list: if the work completes, flink() the file back into the filesystem, then close it. It's like a lightweight "commit transaction" statement, with automatic rollback on failure.
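
For later readers: flink() itself was never merged (the reply below explains why), but Linux 3.11 gained O_TMPFILE, which together with linkat() provides essentially this workflow. A minimal sketch, with error handling abbreviated and the target name "data.out" purely illustrative:

    /* Create a completely nameless file, write to it, and only give it
     * a name once the work has succeeded.  The nameless file is created
     * in the target directory so the final link stays on one
     * filesystem.  Unprivileged processes cannot link the bare fd via
     * AT_EMPTY_PATH, so the documented idiom is to link the
     * /proc/self/fd entry instead. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open(".", O_TMPFILE | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, "all done\n", 9) != 9) { perror("write"); return 1; }

        /* Work completed: "flink" the file into the namespace. */
        char path[64];
        snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, path, AT_FDCWD, "data.out",
                   AT_SYMLINK_FOLLOW) < 0) {
            perror("linkat");
            return 1;
        }
        close(fd);
        return 0;
    }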

flink syscall

Posted Nov 26, 2010 23:44 UTC (Fri) by speedster1 (guest, #8143) [Link]

It sounded very promising, but unfortunately nobody was able to come up with an implementation avoiding security holes. Linus put it this way, later in the same thread:

"As others have pointed out, there is no way in HELL we can do this
securely without major other incursions.

In particular, both flink() and funlink() require that you do all the
same permission checks that a real link() or unlink() would do. And as
some of them are done on the _source_ of the file, that implies that
they have to be done at open() time."

http://lkml.indiana.edu/hypermail/linux/kernel/0304.0/160...

Exercises for the interested reader

Posted Dec 4, 2010 1:20 UTC (Sat) by mxkb (guest, #71646) [Link] (5 responses)

Thinking beyond IP itself, IPv6's design is high maintenance. The design was based on 1990s technology and is much worse than IPv4's. There's no Identification field, which makes it very hard to debug. It uses linked TLV headers instead of stating the total length at the beginning, and that makes implementation very hard.

IPv6's design is just so bad, and I don't understand why people won't start to work on a replacement.

Exercises for the interested reader

Posted Dec 4, 2010 1:34 UTC (Sat) by foom (subscriber, #14868) [Link] (1 responses)

Probably because those two things you said cannot possibly rise to the level of "needs a brand new incompatible-with-everything replacement"...

Also: the identification field is used for something other than fragment reassembly? huh..

Exercises for the interested reader

Posted Dec 7, 2010 0:55 UTC (Tue) by mxkb (guest, #71646) [Link]

Yes, protocol-wise, the Identification field is used only for fragmentation. But in debugging, you can use it to correlate packets seen at two different places. Just think about it: in a datagram paradigm, isn't it useful if I can tell a retransmitted packet from the original packet?

Exercises for the interested reader

Posted Dec 4, 2010 7:22 UTC (Sat) by paulj (subscriber, #341) [Link] (2 responses)

I've heard people with experience of building hardware-forwarding routers complain about the same thing - that the IPv6 "Next Header" daisy chain is a pain to implement (in hardware specifically, was their complaint). I don't yet understand why, though - IPv4 options *also* use a TLV format, and the v6 payload length includes the options just as v4's length does. What am I missing?

Exercises for the interested reader

Posted Dec 7, 2010 0:50 UTC (Tue) by mxkb (guest, #71646) [Link] (1 responses)

Extension headers are considered part of the payload. So even though you can get the payload length from the IPv6 header, you still have to parse every extension header to find the TCP/UDP header. The difference is that in IPv4 you can find the L4 header directly.
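
A small sketch may make the difference concrete; this is simplified (no bounds checking, and only the common extension-header layouts handled), not production parsing code:

    #include <stdint.h>

    /* IPv4: the transport header starts at a directly computable
     * offset, because the IHL field covers any options. */
    unsigned ipv4_l4_offset(const uint8_t *pkt)
    {
        return (pkt[0] & 0x0f) * 4;    /* IHL is in 32-bit words */
    }

    /* IPv6: chase the Next Header chain until a non-extension header.
     * Hop-by-Hop (0), Routing (43) and Destination Options (60) share
     * the "next header; length in 8-octet units, minus one" layout. */
    unsigned ipv6_l4_offset(const uint8_t *pkt, uint8_t *proto)
    {
        unsigned off = 40;             /* fixed IPv6 header length */
        uint8_t next = pkt[6];         /* Next Header field */

        while (next == 0 || next == 43 || next == 60) {
            uint8_t hdrlen = pkt[off + 1];
            next = pkt[off];
            off += (hdrlen + 1) * 8;
        }
        *proto = next;
        return off;
    }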

Exercises for the interested reader

Posted Dec 7, 2010 10:41 UTC (Tue) by paulj (subscriber, #341) [Link]

Yes, I can see how some might dislike that, but it only applies to end-hosts. So doesn't affect routers, unless I'm missing something? (Indeed, making it more expensive for middle-boxes to unpack and get at the ULPs might even be a good thing in some respects).

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 27, 2010 2:54 UTC (Sat) by skissane (subscriber, #38675) [Link] (1 responses)

One thing I dislike about symbolic links, is if you move or rename the target, the link is broken. I have had the idea before of "smart links". Assign every file a unique ID, e.g. a UUID. Not inode number, since the inode number might be reused in the future for another file. Have an index on the filesystem of file ID -> inode. Provide an API to access this index, so we can open a file by unique ID rather than name.

A "smart link" then would store, not the name of its target, but the file ID. Even if the target is renamed, the link would still work. However, if the target is removed, the link no longer works, it is now broken.

Suppose, as well as assigning files unique IDs, we have a unique ID for each filesystem (again, a UUID would be good). Then I can smart link to a file on another filesystem -- the link contains fs ID and file ID. If the filesystem is mounted, and the file exists, the link will point to the file. Whereas, if the filesystem is not mounted, or it is but the file does not exist, the smart link is broken.

Within a single filesystem, one could implement referential integrity for smart links -- if the file is deleted, the smart links to it are deleted as well. This would require some form of reverse index, from the file ID of the target to the file IDs of the smart links to it. I guess we could support both types of smart links -- a flag on the link could indicate whether deletion of the target should delete the link or not. One could even implement referential integrity for cross-filesystem links, but I think that would be too complex to be worth it (e.g., how to deal with the case where I know filesystem X contains a link, but I can't delete it because that FS is currently not mounted?)

If I was designing an OS from scratch, I would be inclined to give every file a single primary name, and then have smart links (rather than hard or symbolic links) to implement secondary names. But I also think, if we have filesystem APIs to access files by UUID, maybe we don't even need names to be compulsory -- what is wrong with nameless files? Maybe filenames don't belong in the core FS, but in a higher layer (like some 'name DB' or 'catalogue'?) Maybe we should just have a three-layer structure, Filesystem -> Directory -> File, with UUIDs to identify each. The "directory" is just to group related files for management purposes, e.g. quotas and access control. If one wants more layers of directories, that can be implemented in the "naming" upper layer, rather than in the "nameless" core filesystem...

Ghosts of Unix past, part 4: High-maintenance designs

Posted Nov 27, 2010 3:45 UTC (Sat) by foom (subscriber, #14868) [Link]

> If I was designing an OS from scratch, I would be inclined to give every file a single primary name, and then have smart links (rather than hard or symbolic links) to implement secondary names.

That's exactly what Apple did 18 years ago. The Apple filesystems originally didn't support hard links or soft links. In 1992 they introduced "Alias" files, which are a smart softlink. In addition to containing the relative path, absolute path, and volume ID/file inode for locating the original file, they even contain enough information to automatically attempt to re-mount a network filesystem in order to access the original file.

An alias contains enough duplicate information that it can find the original file either if it's been removed/recreated in place, or if it's been moved to another location (but not both at once).

Unfortunately, aliases are not automatically resolved in the OS, but require apps to resolve them manually. In OSX, the GUI frameworks generally do this for you, but the BSDish tools won't work with aliases at all, just seeing them as regular files. And OSX now supports traditional Unix symlinks and hardlinks too.

Hardlinks are hardly a source of complexity

Posted Dec 2, 2010 11:43 UTC (Thu) by ksandstr (guest, #60862) [Link] (7 responses)

Hardlink disambiguation requires only that programs such as tar and du keep a set of the (INO × FSID) pairs seen so far. Given that programmers generally have a set-like data structure around in the libraries they're writing against, it's hardly the case that whatever extra complexity the author sees hardlinks causing in tar, du etc. was ever more than a minor wart. POSIX, at a bare minimum, already implies hsearch(3).

Really,

    while (step_to_next_file(&ctx)) {
        if (!opt_do_hardlinks_once ||
            !set_find_or_add(&fileset, ctx.ino, ctx.fsid))
            process_file(&ctx);
    }

would seem to cover it. Complexity? What? This sort of thing doesn't even require the outer curlies.
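
For concreteness, here is one way that pattern might look in full, using POSIX nftw() and tsearch(); process_file() is a stand-in for whatever the archiver actually does:

    #define _XOPEN_SOURCE 700
    #include <ftw.h>
    #include <search.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    struct key { dev_t dev; ino_t ino; };
    static void *seen;

    static int keycmp(const void *a, const void *b)
    {
        const struct key *x = a, *y = b;
        if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
        if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
        return 0;
    }

    static void process_file(const char *path)
    {
        printf("archiving %s\n", path);
    }

    static int visit(const char *path, const struct stat *sb,
                     int type, struct FTW *ftwbuf)
    {
        if (type == FTW_F) {
            struct key *k = malloc(sizeof(*k));
            k->dev = sb->st_dev;
            k->ino = sb->st_ino;
            /* tsearch() returns the existing node if (dev,ino) was
             * already present: a hard link processed earlier.  One
             * could check sb->st_nlink > 1 first to keep the set
             * small. */
            if (*(struct key **)tsearch(k, &seen, keycmp) != k)
                free(k);
            else
                process_file(path);
        }
        return 0;
    }

    int main(int argc, char **argv)
    {
        return nftw(argc > 1 ? argv[1] : ".", visit, 16, FTW_PHYS);
    }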

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 3, 2010 5:11 UTC (Fri) by neilbrown (subscriber, #359) [Link] (6 responses)

I think you might be confusing 'complexity' with 'difficulty'. While they often go together, they don't have to.

Digging a big hole is difficult (if, like me, you aren't very fit) but it is hardly complex. Solving sudoku is certainly complex but I do not find it particularly difficult (fun though).

You could think of 'complexity' as meaning 'room for errors to creep in'. It is certainly easy to make mistakes in sudoku. Less so in hole digging.

The complexity is not in the code, but in the need for the code. It means that I cannot simply archive each file in isolation, but need to interpret it in a larger context. It means that to extract a file from an archive, I either need two passes, or I need to remember where every linked file was and rewind to read it.
It means it is imperative that the filesystem provide a unique inode number for every file, one that is not re-used during the time that tar runs. This is not always as easy as it sounds.

Suppose that while tar is running it finds a file '/long/path/foo' with a link count of 2. Immediately thereafter I remove both links and create a new file with two links, one at '/other/path/foo', and it happens to get the same inode number. When tar gets to that other foo, what does it do? Is it the other link, which happens to have been changed in the meantime - so probably best to record the link and not the file - or is it a brand new file - so best to archive it and forget about finding the second link to the first foo?

Even if you think the answer to the above is obvious, the fact that I had to ask the question is a complexity.

So no: it isn't difficult to fix the glaring obvious issues. But it still adds complexity which we might be better off without.

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 6, 2010 17:34 UTC (Mon) by stevem (subscriber, #1512) [Link] (5 responses)

While you're right that hard links carry a complexity penalty, I'm not convinced that any of the alternatives would be better. As an example, when building Debian CDs we:

* make a hard-link tree of the files that we want to fit within each image
* add some sym-links within that tree for various reasons
* run mkisofs/genisoimage/whatever on the tree to make the output ISO image

As an alternative, we *could* simply copy all the files that we want into the temp trees, but that costs a vast amount more in terms of disk space and time spent.

Or (as we have done in the past) create a tree of symbolic links instead. But then, when we build the image, we've got to work out where those links point, to know whether they belong inside or outside the tree and hence how we should resolve them - more complexity.

Hard links are *cute* and I like them. :-)

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 10, 2010 4:19 UTC (Fri) by neilbrown (subscriber, #359) [Link] (4 responses)

Hard links certainly are cute. But also painful.

Your comment leads perfectly to exercise 3. You have identified a use-case that isn't really well handled by symlinks. So: what other technologies could serve the purpose -- and would they be better or worse than hard links?

There are at least two that I can think of, but I'd rather not give my answers away - better to encourage others to think - someone will probably have a better idea than mine...

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 10, 2010 4:31 UTC (Fri) by sfeam (subscriber, #2841) [Link]

For that particular use, one might attach an attribute either to the individual files or to their respective directory entries that would control whether or not the file is visible in the current context. The process burning the ISO image would grab all visible files, as it does now, but many other files in the same directories would be effectively invisible to it.

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 11, 2010 16:06 UTC (Sat) by MrWim (subscriber, #47432) [Link]

It seems that what is really wanted in this case is a copy-on-write link, as discussed in the LWN articles COW Links (29/3/2004) and The two sides of reflink().

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 14, 2010 19:06 UTC (Tue) by adriboca (guest, #71850) [Link] (1 responses)

I agree with neilbrown that hard links are not the right solution for that problem. In fact, the "-path-list" and "-graft-points" options of mkisofs should allow you to select any files that must be included in the disc image, renaming them as desired. This method of creating a small text file and passing it to mkisofs is certainly much faster than hard-linking the whole tree.

If, for some reason, those options do not work exactly as you need, then mkisofs or whatever tool you use should be improved, not the file system. I have a lot of experience, and I have never seen an application for which hard links are the best solution, but I have seen a lot of cases where they are an inconvenience.

I must make a correction to the article: the phrase "the idea of 'hard links', known simply as links before symbolic links were invented" is not true. The first type of links to be invented were what are now called symbolic links, and they were introduced in the Multics file system.

UNIX made four simplifications of the Multics FS, and the last two of them were stupid (i.e. they made negligible economies in time & space, but they created problems that are not solved even today in the successors of UNIX):
1 Short 14-character names instead of long names
2 A single set of file mode bits instead of ACLs
3 Hard links instead of symbolic links
4 Merged write & append rights

Later, BSD did the right thing by reintroducing the long names & the symbolic links in their improved file system, and these were copied afterwards by the other UNIX derivatives.

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 15, 2010 22:25 UTC (Wed) by neilbrown (subscriber, #359) [Link]

Hard links do clearly provide a simple solution for this problem, but as I have hinted, I don't think that value is worth the cost. However I don't really like the approach of depending on cleverness in mkisofs either, as it is a solution that would need to be implemented in every tool that has this need.

reflinks (already mentioned) are certainly a possible solution. I'm not entirely sure I'm comfortable with reflinks, though I cannot really explain why, so it might be irrational. I would generally prefer that any deduplication happen transparently rather than via a new syscall, but whatever...

My favourite technology for this need is overlayfs (or possibly union mounts, though I like overlayfs more). Clearly it would require non-privileged users to be able to create mountpoints, but I think the pressure is building for that and it is going to become a reality some day soon. Other than that issue, it is a perfect solution!

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 17, 2010 1:23 UTC (Fri) by rich0 (guest, #55509) [Link] (2 responses)

The article mentions in passing that it is safer to write changes to a new file and rename it than to overwrite a file in place. This in itself is actually an example of a bad design - similar to the recent fiasco over app designers being told to fsync their changes.

The ideal design would be for applications to tell the OS what they want to do, and for the OS to do it. In this case, the application WANTS to atomically modify the contents of a file, but since the OS provides no capability for atomic updates instead the application PRETENDS that it wants to create a new file, rename it, and unlink the old one.

The solution in this case is transaction support within the filesystem. Then, if you have a copy-on-write filesystem or the like, you're not forcing the application to rewrite the entire contents of a file just to change a few bytes in the middle safely.

Ghosts of Unix past, part 4: High-maintenance designs

Posted Dec 17, 2010 3:38 UTC (Fri) by neilbrown (subscriber, #359) [Link] (1 responses)

I agree that it is best to be able to tell the OS exactly what you want to do, though sometimes that can be tricky.

In the case of modifying a file, if it really is just a few bytes in the middle that you want to change, then getting a lock, writing those bytes, and unlocking is probably best.

However, in many cases you want to change N bytes in the middle of the file to M other bytes. This either requires non-linear transformations on the file, or requires the remainder of the file to be re-written. The former would hardly ever be used and so would be buggy. The latter is very little less effort than re-writing the entire file.

So I still think that 'write a new file and replace the old' is a good model to follow.
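
For reference, here is the idiom in question sketched in C; the fsync() before rename() is what the "recent fiasco" mentioned in the parent comment was about, since without it some filesystems could leave an empty file after a crash. Error handling is trimmed for brevity:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Replace path's contents atomically: readers see either the old
     * file or the new one, never a partially written mixture. */
    int replace_file(const char *path, const char *data, size_t len)
    {
        char tmp[4096];
        snprintf(tmp, sizeof(tmp), "%s.tmp", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);
        if (rename(tmp, path) < 0) {   /* the atomic "commit" step */
            unlink(tmp);
            return -1;
        }
        return 0;
    }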

Ghosts of Unix past, part 4: High-maintenance designs

Posted May 20, 2021 0:09 UTC (Thu) by wnoise (guest, #19404) [Link]

Using a directory tree instead of a single file is a sadly underused design. Replacing a given set of bytes with another is easy if that set of bytes is a separate file.

