Welcome to the GNU Core Utilities FAQ. This document answers the most frequently asked questions about the core utilities of the GNU Operating System.
The master location of this document is available online at http://www.gnu.org/software/coreutils/faq/.
If you have a question that is not answered in this FAQ then please check the mailing list archives. If you find a useful question and answer please send a message to the bug list and I will add it to the FAQ so that this document can be improved. If you still don’t find a suitable answer, consider posting the question to the bug list.
An excellent collection of FAQs is available by anonymous FTP at rtfm.mit.edu and in particular the Unix FAQ is pertinent here.
ftp://rtfm.mit.edu/pub/usenet/news.answers/unix-faq/faq/contents
This FAQ was written by Bob Proulx <bob@proulx.com> as an amalgamation of the many questions asked and answered on the bug lists.
Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Free Software Foundation
This document is free documentation; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This document and any included programs in the document are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
If you have not received a copy of the GNU General Public License see http://www.gnu.org/licenses/.
Next: Where can I get more information about GNU coreutils?, Previous: Top, Up: Top
Previously, three separate packages together implemented the core set of GNU utilities. These were the GNU fileutils, shellutils and textutils. (Additionally shellutils with the version number attached as a suffix was one letter too long for 14 character limited filesystems and so was also known as sh-utils.) Each had its own web page and its own mailing list. But the three were generally considered a set.
In 2003 these three packages of fileutils, shellutils, and textutils were combined into the current coreutils package. This greatly simplified the maintenance and management of this project. The older packages are deprecated. All users are requested to update to the latest stable release of coreutils.
Next: Where can I get the latest version of GNU coreutils?, Previous: Fileutils shellutils and textutils, Up: Top
The online info documentation is always the most up to date source of information. It should always be consulted first for the latest information on your particular installation. Here are example commands to invoke info to browse the documentation.
info coreutils
info coreutils ls
info coreutils sort
info coreutils head
Additionally the home page contains the canonical top level information and pointers to all things GNU coreutils.
The online web home page for the GNU coreutils.
Please browse the mailing list archives. It is possible, even likely, that your question has been asked before. You might find just the information you were looking for. Many questions are asked here and at least a few are answered.
The GNU Core Utilities mailing list archives of the bug list.
The GNU Core Utilities bug mailing list subscription interface.
The GNU Core Utilities mailing list archives of the discussion list.
The GNU Core Utilities discussion mailing list subscription interface.
Next: How do I add a question to the FAQ?, Previous: Where can I get more information about GNU coreutils?, Up: Top
The GNU coreutils main stable release distribution site.
The GNU coreutils alpha and beta test release distribution site.
The GNU coreutils developer’s site on Savannah documenting source code access.
Browse the source code using the online web interface to the git version control system source code repository.
Development releases of GNU coreutils source code can be downloaded from ftp://alpha.gnu.org/gnu/coreutils. These are test releases and although they are generally of good quality they have not been tested well enough nor matured enough to be considered stable releases. Also by definition stable means stable and the test releases change frequently. Use with care.
The source code is maintained in the git version control system. The home page for git is http://git.or.cz/. A wiki page on Savannah documenting git use is at http://savannah.gnu.org/maintenance/UsingGit. A local working copy of the latest source code may be obtained using this command.
git clone git://git.savannah.gnu.org/coreutils
When possible it helps to report any bugs as seen against the latest development version. However the latest development version has dependencies upon the latest version of several other projects. See the README-hacking file available in the project source and here on the web at http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=README-hacking for details of how to build the latest development version from checkout sources.
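In outline, and assuming the extra development tools described in README-hacking are already installed, building from a fresh checkout looks roughly like this. This is only a sketch; consult README-hacking for the authoritative steps.

git clone git://git.savannah.gnu.org/coreutils
cd coreutils
./bootstrap
./configure
make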
To avoid needing the latest development tools the maintainers periodically make available snapshot releases. These may be built with only normal system dependencies. These are frequently short lived and available directly from the developer’s site. To find out information about these it is best to read the mailing list and look for the latest snapshot announcement. Here are search links that may be useful.
http://lists.gnu.org/archive/cgi-bin/namazu.cgi?query=%2Bsubject%3Asnapshot&submit=Search%21&idxname=bug-coreutils
http://lists.gnu.org/archive/cgi-bin/namazu.cgi?query=%2Bsubject%3Asnapshot&submit=Search%21&idxname=coreutils
Great effort is spent to ensure that the software builds easily on a large number of different systems. If you have not done it before then compiling the coreutils from a release or snapshot is probably easier than you think. If possible please build and test the latest test release to see if your problem is resolved there.
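For a release or snapshot tarball the usual GNU build sequence applies. Here is a rough sketch; the version number is only an example and the exact file name will differ.

tar -xf coreutils-8.21.tar.xz
cd coreutils-8.21
./configure
make
make check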
A note for Cygwin users. The MS-Windows platform requires special handling. For Cygwin I recommend using the latest precompiled binaries from the Cygwin site. They do an excellent job of handling the peculiarities and deficiencies of that platform.
Patches for bugs and enhancements are always appreciated. Please submit patches as unified diffs against the latest version control sources if possible. If unable to access Savannah then please use the latest test release that is available.
Be aware that submissions that are more than a few lines are legally significant and will require a copyright assignment to the FSF. In order to keep GNU free it is important for the FSF to be able to defend the GPL status in court if necessary. If you are unable to make a copyright assignment, such as for reasons of being unable to obtain a disclaimer from your employer, then your code cannot be incorporated, in which case it would be better to simply discuss your idea on the mailing list. Ideas cannot be copyrighted nor patented and someone else may be able to contribute the code to the project. Almost a worst case is when a person contributes a perfect fix or feature enhancement but it cannot be used because of the inability to obtain legal copyright assignment. In which case the contribution of a patch can actually hurt more than it helps and it would have been better simply to discuss it. The project ChangeLog is filled with the names of people making good contributions who have suggested ideas and reported bugs.
Next: How do I report a bug?, Previous: Where can I get the latest version of GNU coreutils?, Up: Top
Send a plain text e-mail to the <coreutils@gnu.org> mailing list. Also include the Frequently Given Answer if you have one. Please make sure the message is formatted as a plain text message. Do not send HTML encoded messages.
Please allow some time for processing. Many people read the mail so please try to be concise. Even if you do not receive a personal response your message will have been read by the maintainers. But so many messages are posted that it can be overwhelming for us at times.
Next: I reported a bug but did not get a response., Previous: How do I add a question to the FAQ?, Up: Top
Please be sure that the bug you are reporting is actually related to GNU coreutils. Many times people report problems in other unrelated programs. Those may be legitimate bugs but the GNU coreutils maintainers are in no position to do anything about other people’s software. People have reported bugs in their sound cards and crashes of their disk drives and panics of their operating system kernel. We can’t help you with any of those things.
If possible please test for the presence of your problem using the latest version of the software. The GNU utilities are widely used and many times the bug will already have been found and fixed in a later version.
Check the mailing list archives to see if someone else has already reported your problem. Keep in mind that many people will be reading the list and it can be difficult to keep up with the volume of repeats. It can also be useful to search the gnu.org site using your favorite Internet search engines by specifying a site:lists.gnu.org search restriction.
Searchable mailing list archives.
Searchable mailing list archives of the discussion list.
If you think you have a real bug then send mail to the bug discussion list. Use a subject that is descriptive of the issue. Think of how difficult it is to follow a thread of discussion when every subject line is simply bug.
Examples of good subjects:
mv && hardlinks problem
dd and skip on Linux tape devices and pipes
assertion failure in mv
command fails to compile on platform xyz
Examples of bad subjects:
I have a question.
help!
bug?
bug report
URGENT
In your description indicate the version of the program that you are seeing with the problem. Also note the operating system type and version. The GNU utilities are part of the GNU Operating System but have been ported to run standalone on many different types of systems and some problems will be unique to them. We are good at guessing your environment but it is much simpler if we have the information without guessing.
Please make sure the message is formatted as a plain text message. Do not send HTML encoded messages. Do not send overly large messages without first establishing contact. Once someone found a way to trigger a problem with sort by giving it a 200MB sized file. They sent that datafile to the mailing list. Needless to say that was not appreciated. People on limited bandwidth connections reading the list were severely affected.
Include as small a test case as you can manage to create that will allow others to recreate the problem. If the problem cannot be recreated then it is very difficult to diagnose or fix.
Confused by all of this? Don’t worry. We all start somewhere. Eric S. Raymond has written a good article on how to ask good questions, “How To Ask Questions The Smart Way”, at http://www.catb.org/~esr/faqs/smart-questions.html.
Patches to the source code are always appreciated. But reports of bugs are always welcome even if you do not feel comfortable working in the source code.
Next: I use the Cygwin toolset and I have a problem., Previous: How do I report a bug?, Up: Top
It may just be that the volunteers are busy. Please be patient. Every message to the bug lists is read by many people. Sometimes there is just nothing to say at the time. If sufficient time has elapsed, say one or two weeks, and you think your message may have been forgotten it is perfectly acceptable to send a second message asking about your first message.
But if you think your message might not have made it to the list or that you may have missed the response then check the mail archives as described above. If you do not see your message there then we did not see it either.
Or perhaps your formatting was such that it was unreadable. A number of posts are unreadable because the text was encoded and it could not be deciphered. If it does not show up as plain text in the archive then it did not get sent to the list in plain text format.
Next: Why can only root chown files?, Previous: I reported a bug but did not get a response., Up: Top
The hard work the Cygwin team has done to port GNU Project software to MS-Windows is greatly admired. However the GNU team generally uses GNU Operating Systems and does not have access to Cygwin systems or MS-Windows systems, which means that most of us can’t help you. It would be most appreciated if you would make your bug report directly to the Cygwin folks. They are the experts and best suited to handle your problem.
Cygwin Home Page
Cygwin FAQ. Most excellent!
Cygwin Bug Report Guidelines. This is a good reference for general bug reporting.
Next: chown fails when the username contains a '.' in it., Previous: I use the Cygwin toolset and I have a problem., Up: Top
The GNU chown program will change the ownership if the operating system it is running upon allows it. If you can’t change file ownership then it is the operating system which is restricting you and not the chown program.
Actually, the GNU chown command does not know if this is the policy of the system or not. It calls the kernel system call chown() just like any other program (e.g. perl, ruby, etc.) If the OS allows it then it will change the ownership of the file. Different systems handle this differently. Traditional System V Unix systems allow anyone to give a file away to other owners. On those systems GNU chown does change the ownership of files.
But on most modern systems BSD semantics are followed and only the superuser can change the ownership of the file. The problem for documenting this is that GNU chown does not know which kind of system it will be running on. It could be one or it could be the other. Or it might even be running on a system without the concept of file ownership at all! This is really an OS policy decision and it is hard to make the documentation differ on different systems. The documentation must be independent of the operating system.
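For example, on a system following BSD semantics a non-root attempt to give a file away typically fails with an error like the following. The file and user names here are hypothetical.

$ chown otheruser somefile
chown: changing ownership of 'somefile': Operation not permitted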
The reason an operating system needs to restrict changing ownership is mostly threefold.
Next: How do I remove files that start with a dash?, Previous: Why can only root chown files?, Up: Top
That was a problem with now quite old versions of fileutils that are unlikely to be in use anymore. Newer versions fix the problem. This entry is retained only to prevent ’makeinfo’ from renumbering the following entries.
Next: I have a file called --help. How do I remove it?, Previous: chown fails when the username contains a '.' in it., Up: Top
Since the file name begins with a ’-’ it looks like an option to the command. You need to force it to not look like an option. Put a ./ in front of it. Or give it the full file name path. Or tell the command you are through with options by using the double dash to end all option processing. This is common to most traditional Unix commands.
rm ./-stuff
rm /full/path/-stuff
rm -- -stuff
And the same for other utilities too.
mv ./-stuff differentstuff
mv -- -stuff differentstuff
Next: I have a file '-f' and it affects rm., Previous: How do I remove files that start with a dash?, Up: Top
This is just a variant of the previous question. It also applies to files named ’-i’ and the like. But this question is asked often enough that it deserved an entry specifically for it.
rm ./--help
rm -- --help
rm ./-i
rm -- -i
In fact touching a file called ‘-i’ in a directory is an old trick to avoid accidentally saying ‘rm *’ and having it remove all of the files. Since the ‘*’ expands to match all file names then the first such name will be the ‘-i’. That will make the command become ‘rm -i file1 file2’. As you can see that will cause rm to prompt you and if that is not what you wanted then you can interrupt the command. I don’t personally like this and don’t recommend it.
Next: Why doesn't rm -r *.pattern recurse like it should?, Previous: I have a file called --help. How do I remove it?, Up: Top
I have a directory containing some files and also ’-f’ as a filename and rm thinks I gave it the -f option. Why?
$ ls
-f bar baz foo fum
$ rm -i *
$ ls
-f
This is not a bug. This is normal behavior. The shell expands globbing characters such as ’*’ before executing the command. The command that rm is seeing is the following.
rm -i -f bar baz foo fum
You can test the result by using the echo command to echo-print the command line without actually invoking it: echo rm -i *.
The -f option to rm overrides the -i option. Therefore the files are removed without asking. Since the shell expands globs like ’*’ before the program sees the command line it cannot distinguish between something the user typed and something that was expanded by the shell. The shell filters all command lines. This is a good thing and adds a lot of power to the system but it means you have to know that the shell expansion filter is there in order to write robust scripts and command lines.
To robustly write a command that does what you want you need to do one of the following:
rm -i ./*
rm -i -- *
See the previous question with regards to a filename -i.
Next: I used rm to remove a file. How can I get it back now?, Previous: I have a file '-f' and it affects rm., Up: Top
This question is asked a number of ways:
rm -R *.html
chmod -R 744 *.pl
chown -R user:user *.html
This is the same behavior as other typical programs such as ls -R, chmod -R, chown -R, etc. that also include recursive directory options. None of those commands search a directory hierarchy looking for matching files. Instead the shell expands the wildcards in the current directory only.
Here are the pieces of information you need to understand what is happening. The -R (and -r) option says that if any of the files listed on the command line are a directory then recurse down through those directories. Only arguments to the program which are directories are recursively acted upon. So any program argument which is a directory will be removed completely which would mean recursing down that directory and removing anything below it. But if the command line argument (after shell expansion) is not a directory then it won’t go searching trying to find a match. It is very unlikely that “*.html” will match a directory since that is typically a file suffix. It is very unlikely that -R with *.html will do anything useful. More often the recursive options are used with literal directory names or with the “.” current working directory.
The shell interpreter is expanding the command line file name glob characters prior to passing them to your command. This is a simple form of regular expression matching designed to make file name matching easier. This provides a consistent interface to all programs since the expansion code is common to all programs by being in the interpreting shell instead of in the program itself. Commands in Unix do not see the “*.html” or any of the shell meta-characters. Commands see the expanded names which the shell found matching file names in the current directory.
The “*” is the “glob” character because it matches a glob of characters. But it only matches files in the current directory. It does not list files in other directories. (Some modern shells have expanded this capability as a non-standard extension.) The shell matches and expands glob characters and passes the result to the command.
You can confirm this by using the echo command. This is built into most command shells, for example into bash. Try echo *.html. Try echo */*.html. In your example the first would print “*.html” literally if nothing matched, or else all of the file names that did match. The command sees only that result and has no idea that you provided a wildcard to match against file names.
If you want to match files in subdirectories as well then you would need to say so explicitly with “*/*.html”. The first star would match all file names in the current directory. Then the second “*.html” would be matching files in the subdirectories under names already matched by the first “*” glob.
All of that was to explain why things are working as they should even if you are here reading this entry because you think they are not.
Given the question then here is what you really wanted to do. If you want to search all directories below a location, let’s say your present working directory or any arbitrary path on the file system, then you can use other Unix commands such as find to do so. Here is an example using two commands, find and chmod, in combination in a way that keeps with the Unix philosophy.
chmod a+r $(find . -name '*.html' -print)
The $(command) executes the find command, takes the output of the find command and replaces it on the command line in place of the “$(...)” in a process known as command substitution and then passes the result to the chmod command. Note that the ’*.html’ is quoted to keep the shell from expanding it. The find command will do the expansion itself in this case and so the ’*’ needs to be hidden in a string to keep the shell from expanding it first. The chmod command will receive the list of files produced by find as command line arguments.
Unfortunately this transferal of functionality from the command to the shell comes at a cost. In most operating system kernels there is a limited amount of argument space that is available for this argument expansion of file names. Combining commands using shell command substitution may fail if there are too many arguments. If you have a HUGE subdirectory with thousands of files the above command will fail to execute. (See the FAQ entry on “Argument list too long”.) The limit is different on different systems and getting larger as RAM gets cheaper but almost always there is still a limit. 20KB was typical for a time and now 2MB is common. But any limit is still a limit regardless. Additionally the space is usually shared with environment variable space, which also consumes it.
The xargs command was designed specifically to work around this argument space limit. Therefore in traditional systems a better method is to use find coupled with xargs.
find . -name '*.html' -print | xargs chmod a+r
You could substitute a full path in place of the ’find .’, such as ’find /class/home’.
By piping the data from one command to another the argument size limitation was avoided. This technique is very often seen in shell scripts. However this technique does not work as intended when filenames contain whitespace. A robust and safer method was implemented as an extension by GNU find using the “-print0” option and zero terminated strings instead of newline terminated strings:
find . -name '*.html' -print0 | xargs -0 chmod a+r
However that was never universally implemented. This made writing portable scripts difficult. The good news is that recent versions of find implement a new form of the -exec option using “{} +” which combines the xargs functionality into find itself. This has been adopted as a POSIX standard and should now be available on all modern POSIX compliant systems.
find . -name '*.html' -exec chmod a+r {} +
Note the “+”. Traditionally a “;” was used with different behavior.
Next: Why don't the utilities have built in directory recursion?, Previous: Why doesn't rm -r *.pattern recurse like it should?, Up: Top
It is extremely unlikely that you will be able to recover a file from the filesystem after you have removed it. It depends upon your operating system and the filesystem used with it. The disk blocks storing the file would have been freed and very likely reused by other files. This is especially true on an active file system. However, having said that disclaimer, it is frequently possible to recover deleted files.
Matt Schalit <mschalit at pacbell dot net> suggests in a message to the list that the user may want to look into The Coroner’s Toolkit, unrm, and lazarus.
Ciaran O’Riordan <coriordan at compsoc dot com> wrote suggesting that users should look at the programs ’recover’ and ’gtkrecover’, the latter being a GUI interface for the former. The tool was used with great success when O’Riordan’s 700 MB music collection was deleted. It is GPL and is a Debian package. It is ext2 specific but is available for both GNU/Linux and GNU/Hurd. Also, the ’recover’ homepage has a link to a manual way to retrieve deleted files.
On Debian systems:
apt-get install recover
See also “Linux Ext2fs Undeletion mini-HOWTO” by Aaron Crane at http://www.tldp.org/HOWTO/mini/Ext2fs-Undeletion.html. This contains very useful information about this topic. This may already be present on your system at /usr/share/doc/HOWTO/en-txt/mini/Ext2fs-Undeletion.gz or might be found other places using the locate Ext2fs-Undeletion command.
less /usr/share/doc/HOWTO/en-txt/mini/Ext2fs-Undeletion.gz
See also “Linux ext3 FAQ” by Juri Haberland at http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html. Thanks to Paul Dorweiler for submitting that reference for this FAQ.
Q: How can I recover (undelete) deleted files from my ext3 partition?
Actually, you can’t! This is what one of the developers, Andreas Dilger, said about it:
In order to ensure that ext3 can safely resume an unlink after a crash, it actually zeros out the block pointers in the inode, whereas ext2 just marks these blocks as unused in the block bitmaps and marks the inode as "deleted" and leaves the block pointers alone.
Your only hope is to "grep" for parts of your files that have been deleted and hope for the best.
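As a rough sketch of that grep technique, assuming the filesystem lived on the hypothetical device /dev/sda1: unmount the filesystem first, write the output to a different filesystem, and adjust the device, context sizes, and search string to your situation.

# Write the output to a different filesystem than the one being searched.
grep -a -B 5 -A 100 'some unique text from the lost file' /dev/sda1 > /mnt/other/recovered-fragments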
User green noted the ext4magic tool which is specifically written to recover files on ext3 and ext4 filesystems using the file system journal. See the homepage at http://developer.berlios.de/projects/ext4magic. This is also packaged for some software distributions such as Debian, and probably others too.
Apparently there is a process to undelete files that exist on ReiserFS file systems. See the “ReiserFS File Undelete HOWTO” page at http://www.cgsecurity.org/wiki/ReiserFS_File_Undelete_HOWTO for a description.
Sometimes people want to alias rm such that files are not actually removed but are instead saved off to a trashcan. When this is done files can be recovered from the trash if later this is found to be a mistake. Steve Ward suggested the following alias.
alias del='mv --verbose --backup=simple --suffix=$(date +".(%F_%T)") --target-directory=$HOME/.Trash/'
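Note that the trash directory must already exist for the alias to work. A hypothetical session might look like this, with the exact mv output varying by version.

$ mkdir -p "$HOME/.Trash"
$ del oops.txt
'oops.txt' -> '/home/user/.Trash/oops.txt'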
Dennis German also suggested referencing this HOWTO at http://gentoo-wiki.com/HOWTO_Move_Files_to_a_Trash on how to configure your shell environment to use a trashcan. This is certainly not a good default for everyone but for those who desire it this may be a useful configuration.
You don’t want your files to be recovered and you want them to stay deleted? The GNU coreutils package includes a utility called shred which may be of some use to prevent it. Read the documentation on that program for more details, especially the exceptions section where shred does not work. This operation does not work on journaled filesystems.
info shred
Next: ls -d does not list directories!, Previous: I used rm to remove a file. How can I get it back now?, Up: Top
People frequently wonder why cat, chgrp, chmod, chown, ls, mv, rmdir, rm, and touch do not support find type of operations. This is part of the subtle but beautiful design of the Unix system. Programs should be simple and modular. Common behavior should be modularized into a common location where it can be used by other programs. More complicated programs are created by chaining together simpler programs.
Here is an example of the Unix philosophy in action. Let’s say you want to restrict directory listings to only show directories. The normal thing would be to use a combination of commands. This following example will do what you want.
ls -al | grep ^d
This command assumes that you are using bash. Some other shells such as the old sh have ‘^’ as a special character, a synonym for ‘|’, and you would need to quote it to avoid the special meaning. This command will generate the full listing and the grep will show only the lines starting with ’d’. Those are the directories. Doing this type of command chaining is a very typical way of doing things under Unix.
When people find that particular combinations of commands are ones that they use a lot then they will typically create a shell script, a shell function or a shell alias which does this with a shorter name. I like shell scripts because they are easier to pass around. If you found the above command one that you always used then perhaps you would create the following shell script.
File lsdir:
#!/bin/sh
ls -al "$@" | grep ^d
You could call it whatever you desired. Then it would work just like ls but with your special behavior. For all intents and purposes it would be just like a normal Unix command and its output could be piped into other commands.
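After creating the file make it executable and put it somewhere on your PATH. The directory used here is just a common example.

chmod a+x lsdir
mv lsdir ~/bin/
lsdir /etc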
This is embodied in what is known as The Unix Philosophy. There are many good discussions and without favoring one or another too much here are a few of them.
Wikipedia entry.
Excerpts from Eric S. Raymond’s The Art of Unix Programming
Search for references on the web using Google’s search engine.
The best solution to use when needing file recursion is often to use the find program. It is a standard program that includes many powerful file finding features. It will work in combination with other commands to form a powerful combination for directory recursion.
Here is a simple example that changes the file permissions on a directory tree. The “{} +” is replaced by the largest set of filenames allowed by the operating system. This is a very efficient way to invoke commands on a large set of files.
find . -exec chmod a+r {} +
Next: ls -a * does not list dot files, Previous: Why don't the utilities have built in directory recursion?, Up: Top
I typed in ls -dl but it only showed me the current directory.
Well, yes, that is what it is supposed to do. The default directory to list if none is specified is the current directory. The -d option prevents recursively listing the directory. Therefore ls -ld lists only attributes of the current directory.
But I expected to see all of the directories.
That would be the output of ’ls -l’ without the -d option.
The -d option is meant to prevent ls from listing the contents of directories when you only want it to list the names of the directories.
To understand the usefulness of the -d option try this example. Compare the differences in the results of ls when you use the -d option versus when you do not.
ls -ld /etc/*.d
ls -l /etc/*.d
If you are trying to find files in the directory hierarchy then you should look into using the find command. It is very powerful and contains an interface which can be used with many other programs.
info find
Next: Argument list too long, Previous: ls -d does not list directories!, Up: Top
When I list directories using ls *, or even ls -a *, I never see hidden files, for example files with names that start with a dot as in ‘.profile’.
The command shell expands the ’*’ before ever handing it to a command. This is regardless of it being ls as it could be any command on the command line. The ’*’ is called the glob character because it matches a glob of characters. This process is called file name globbing. This is documented in your shell manual. Bash documents this well. In the bash info pages look for the section titled “Filename Expansion”.
A ’*’ is a shell pattern and is replaced by a list of files that match that pattern. Filenames that start with a ’.’ do not match that pattern. Neither does it match a ’?’. The dot character must be explicitly matched when it occurs at the start of the filename.
You should test what input is being given to commands by the shell with the echo command. Try these patterns as starting examples. Try this in your home directory where there are usually rich examples of dot files.
echo *
echo .*
echo .* *
echo .?*
echo .??*
As you will see from those examples the ls command is only listing out the files that were presented to it by the shell. Which answers why the dot files were not listed. In fact, what is ls doing that you can’t do yourself? Very little in this particular case. You might as well use echo for listing and then you can use the fmt command to word wrap to your screen. You can also use tr and grep to finish the job. Suddenly, using ls for this seems more convenient. Especially when coupled with the -l option!
echo * | fmt
echo .* | fmt
echo .* * | fmt
echo .* * | tr " " "\012" | grep profile
Okay, now I understand why ls * did not list out hidden dot files. But then how does one list out dot files?
There are several ways that are typically used. Here are a variety of examples that should spark ideas for you. Try them out and compare and contrast their differences.
ls -a | grep profile
ls -d .*
ls -d .??*
ls -d .[^.]*
ls -d .[!.]*
Some are more convenient than others. But dot files are meant to be hidden files. Therefore it is reasonable that you will need to do a little more work to unhide them.
You should also read over the answers to other questions in this FAQ as this is a very similar theme. Read the documentation on your shell. For GNU systems info bash will launch the info system on the bash documentation. I also recommend reading one of the many fine shell programming books available from the bookstore.
Next: I am trying to compile a C program ..., Previous: ls -a * does not list dot files, Up: Top
I tried to move about 5000 files with mv, but it said:
$ mv * /some/other/directory/
bash: /bin/mv: Argument list too long
A scalable equivalent to the above is:
find . -maxdepth 1 -name '[!.]*' -print0 | xargs -r0 mv --target-directory='/some/other/directory'
The traditional Unix operating system kernel has a fixed limit of memory available for program environment and argument list combined. Into this combined program environment and argument space all of the environment and all of the command line arguments must be fit. If they don’t fit then the external program execution will fail. In the above example the * is expanded by the shell into a literal list of filenames to be passed as command line arguments to the mv program. If the number of files in the current directory to be expanded is larger than this fixed buffer space including the environment space then they won’t fit. In that case it is impossible to execute the program with the specified list of arguments and the current environment size.
Note that the “Argument list too long” message came from the bash shell command line interpreter. Its job is to expand command line wildcard characters that match filenames. It expands them before any program can access them. This is common to all programs on most Unix-like operating systems. It cannot exceed the OS limit of ARG_MAX and if it tries to do so the error “Argument list too long” is returned to the shell and the shell returns it to you.
This is not a bug in mv or other utilities nor is it a bug in bash or any other shell. It is an architecture limitation of Unix-like operating systems. The mv program was prevented by the OS from running and the shell is just the one in the middle reporting the problem. The shell tried to load the program but the OS ran out of space. However, this problem is one that is easily worked around using the supplied utilities. Please review the documentation on find and xargs for one possible combination of programs that work well.
The value can vary per operating system. POSIX only requires 4k as the minimum amount. Newer operating systems releases usually increase this to a somewhat larger value. On my Linux system (2.2.12) that amount is 128k bytes. On my HP-UX system (11.0) that amount is 2M bytes. HP-UX provided 20k for many years before increasing it to 2M. Sven Mascheck wrote to the mailing list that early systems such as Unix V7, System III, 3BSD used 5120 bytes and that early 4BSD and System V used 10240 bytes. More information about ARG_MAX from Sven Mascheck is available on his page http://www.in-ulm.de/~mascheck/various/argmax/.
The getconf command may be used to print the currently implemented limit.
$ getconf ARG_MAX
131072
It might occur to you to increase the value of ARG_MAX such as by recompiling the kernel with a larger value but I advise against it. Any limit, even if large, is still a limit. As long as it exists then it should be worked around for robust script operation. On the command line most of us ignore it unless we exceed it at which time we fall back to more robust methods.
Here is an example using chmod where exceeding ARG_MAX argument length is avoided by using find.
find htdocs -name '*.html' -exec chmod a+r {} +
Read the previous question for another facet of this problem.
Recent News: The Linux kernel has removed the classic ARG_MAX limitation. See the changeset on git.kernel.org for the change. This was released with the linux-2.6.23 kernel. It will eventually propagate into the release software distributions which include it. Note that glibc’s getconf ARG_MAX hasn’t caught up yet and still reports a fixed value.
Next: I am having a problem with kill nice pwd sleep or test., Previous: Argument list too long, Up: Top
People trying to develop compiled software often ask questions that start with, “I have found a bug in the sleep() function.” Or ask a question similar to, “I am trying to use the chown() function in my program.” They confuse the library call with the program utility because they have the same name. They are having a problem using the library routine in their compiled program but are asking the question of the command line program developers.
Frequently the same name is used for both a program and a system call. Most of the programs got their names because that was what the system call was named. One caused the other. Which means that it is very likely that the sleep program was named because it used the sleep library routine.
This confusion between library routines and command line programs usually happens with chgrp, chmod, chown, mkdir, mknod, nice, rmdir, sleep, sync, uname, or one of the other programs that have a name matching a C programming routine. It can be confusing to realize that the program is a wrapper to the underlying library routine or system call. If you are compiling a program then you are NOT using the GNU command line utility in your C/C++ program.
You may be confusing the C library routine used in your program with the shell command line program. They are not related. Except that the program calls the library function just like your program does. If you have a real bug in the library routine then you need to determine who supplied that library and report the bug to them. If it is the GNU Lib C (a.k.a. glibc) provided library then the glibc mailing list at <bug-glibc@gnu.org> might be a useful reference.
The online man pages can also be confusing in regard to this. If you are trying to use chown() in a program and get a man page you will likely get the man page for the command line program and not the library routine you were looking for. You must specify the section of the manual from which you want information.
man chown
man 2 chown
man sleep
man 3 sleep
Traditionally man pages are organized by section number. Section 1 covers command line programs. Section 2 covers operating system calls, a.k.a. kernel calls. Section 3 covers library routines. Library routines may or may not use system calls. Etc. Other sections are used for other purposes that are not germane here.
Many GNU programs have deprecated the man pages and have moved entirely to info documentation. But man pages still have a loyal following and are quite useful for reference. But they make poor user guides.
Next: echo test printf kill, Previous: I am trying to compile a C program ..., Up: Top
This is frequently a variant of the previous question. Go read it first and then come back here. This time the same name is used by a utility and is also built into a command line shell. In which case you might need to report your problem to your shell maintainers.
Many commands are built into your command shell so as to speed up shell scripts. People frequently get confused over whether they are running the external program or the internal shell builtin. This usually happens with kill, nice, pwd, printf, sleep, test, and some others that exist both as internal and external commands. Double check that the problem you are having is with the external program. If it is the internal version then contemplate reporting it to <bug-bash@gnu.org>.
Many times a program is required by circumstances to exist both as a builtin and as an external. Usually this is because of the need to be exec’d directly from another program without a shell to implement it as a builtin. Therefore it must exist as an external, standalone utility. However, those utilities are also built into most shells in order to improve the execution speed.
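One way to see both forms at once is with the bash type -a builtin, which lists every version of the command that would be found:

$ type -a printf
printf is a shell builtin
printf is /usr/bin/printf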
Next: Sort does not sort in normal order!, Previous: I am having a problem with kill nice pwd sleep or test., Up: Top
Q: echo/test/printf/[/kill do not work as documented. Commonly, one of these doesn’t work (yet is expected to work):
echo --help
echo --version
echo -e '\x45'
printf --version
printf '\u263a\n'
kill --help
This is another variation of the previous two items above.
A: These commands are often implemented as shell builtin functions, supporting fewer features than the GNU coreutils versions.
Use ’type’ (bash builtin) to determine if the command is a shell builtin:
$ type printf
printf is a shell builtin
Versus:
$ type printf
printf is /usr/bin/printf
To use the external version of these programs, prefix with ’env’ or specify a full path.
Shell builtin functions do not support --version:
$ echo --version
--version
$ printf --version
bash: printf: --: invalid option
printf: usage: printf [-v var] format [arguments]
While GNU coreutils programs do support --version (and other features mentioned in the documentation):
$ env echo --version | head -n1
echo (GNU coreutils) 8.21
$ env printf --version | head -n1
printf (GNU coreutils) 8.21
$ /bin/echo --version | head -n1
echo (GNU coreutils) 8.21
$ env echo -e '==\x3D=='
=====
$ env printf '\u263a\n'
[prints unicode smiley face, if supported by your terminal]
These features often extend POSIX and are not portable. To restore strict POSIX compliance, use the POSIXLY_CORRECT environment variable as explained by Eric Blake: http://lists.gnu.org/archive/html/bug-coreutils/2013-03/msg00016.html
This entry submitted by Assaf Gordon.
Next: The ls command is not listing files in a normal order!, Previous: echo test printf kill, Up: Top
Why is sort not sorting correctly?
Some years ago at the time of transition to locale aware libc routines this question arose very often. With the passage of time most people have learned about this behavior and workarounds are quite well known. But the question is still raised often. It is most often due to having LANG set to ‘en_US.UTF-8’ in your user environment. At that point sort appears broken because case is folded and punctuation is ignored, because ‘en_US.UTF-8’ specifies this behavior. Once specifically requested by LANG and other LC_* variables, sort and other locale knowledgeable programs must respect that setting and sort according to the operating system locale tables. That is why this is not a bug in sort; sort is doing the right thing, even if the behavior is generally undesirable. Most of the language specific locales have tables that specify the sort behavior to ignore punctuation and to fold case. This is counter intuitive to most long time computer users!
Of course locales are a blessing to non-English speaking computer users. Many languages use non-ASCII fonts and character sets. The POSIX standards and the GNU utilities support those by using the installed system library sorting routines. By using locales languages can specify a specific ordering. This can be done by a translator well after the program has been written and by translators not known to the program author. It is a dynamic change to the program. However, when those tables exhibit an undesired behavior then they can also break an otherwise correct program. If locale tables force a locale specific collation sequence when a standard collation sequence was desired then this is most noticeable with the sort command and so it bears the full force of the problem.
This locale sorting behavior is configured in your environment. This is due to the fact that you or your vendor have set environment variables that direct the program to use locale specific sorting tables which do not sort as you expect. You or your vendor have probably set environment variables such as LANG, LC_COLLATE, or LC_ALL to ‘en_US.UTF-8’. The locale collating table selected by the locale setting is not part of the GNU program but part of your vendor’s system release.
To select a standard sort ordering simply set LC_ALL to ‘C’ or ‘POSIX’.
# If you use bash or some other Bourne-based shell,
export LC_ALL=C
# If you use a C-shell,
setenv LC_ALL C
and it will then work the way you expect because it will use the standard set of tables. The standard set of tables is the traditional one for US-ASCII. However this will probably not sort characters outside of that character set correctly. So far there is still no fully satisfactory solution to this problem. If you find one then please contact me so that this information can be listed.
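Here is a small demonstration of the difference. The exact en_US.UTF-8 ordering depends on your system’s installed locale tables, so this output is only typical.

$ printf '%s\n' b A a B | LC_ALL=en_US.UTF-8 sort
a
A
b
B
$ printf '%s\n' b A a B | LC_ALL=C sort
A
B
a
b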
The locale affects almost every command that operates on characters. It has very deep reaching effects in addition to ordering. For instance it causes regular expressions to operate more slowly. Many commands execute more slowly in multibyte locales. Often the normal grep, sed, awk commands are all very much slower when operating in a UTF locale than when operating in the C locale. For this and related reasons a very commonly seen workaround in scripts is to set LC_ALL=C so as to force standard locale behavior for both sorting and for regular expressions.
See the standards documentation for more information on the locale variables with regards to sort. Also see http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html#tag_08_02 for more information on environment variables that control localization.
Next: The date command is not working right., Previous: Sort does not sort in normal order!, Up: Top
This is just a variant of the previous question. Any program that is compliant with the standards and implements locale based collating sequences to support non-ASCII languages will be affected.
See the standards documentation for more information on the locale variables with regards to ls.
Next: ln -s did not link my files together, Previous: The ls command is not listing files in a normal order!, Up: Top
$ date -d "2006-04-02 02:30:00"
date: invalid date `2006-04-02 02:30:00'
There are several reasons why date might report a date as invalid. The usual reason is that the exact time indicated doesn’t exist because the local daylight saving time changed and skipped over that moment in time. The answer to the above is that the timezone switched to DST at just that moment in time. This can be seen by using zdump to display the time zone data.
$ zdump -v US/Mountain | grep 2006
US/Mountain Sun Apr 2 08:59:59 2006 UTC = Sun Apr 2 01:59:59 2006 MST isdst=0 gmtoff=-25200
US/Mountain Sun Apr 2 09:00:00 2006 UTC = Sun Apr 2 03:00:00 2006 MDT isdst=1 gmtoff=-21600
US/Mountain Sun Oct 29 07:59:59 2006 UTC = Sun Oct 29 01:59:59 2006 MDT isdst=1 gmtoff=-21600
US/Mountain Sun Oct 29 08:00:00 2006 UTC = Sun Oct 29 01:00:00 2006 MST isdst=0 gmtoff=-25200
This shows that on my system daylight time begins on Apr 2 03:00:00 2006 MDT with the last second before being Apr 2 01:59:59 2006 MST and therefore "2006-04-02 02:30:00" cannot be a valid time in the US. It may be a valid time elsewhere in another timezone.
When debugging these problems including the timezone information is very important to understanding the problem. When reporting bugs please include the time zone in question. Also note that localized timezone names such as “EST” are ambiguous. For portability the numeric timezone offsets such as produced by date -R are best.
The parsing of dates with date --date=STRING is a GNU extension and not covered by any standards beyond those to which GNU holds itself. In which case whatever GNU decides to do is what it decides to do. However touch -d STRING is defined by POSIX and is implemented with the same date string parsing code. Therefore you can expect that similar rules apply to both. Currently the date parsing code makes a best guess at understanding the string in a variety of formats but isn’t always able to understand all possible string variations. See the date manual for an extended description of the syntax. But in short summary it is intended to be a human date string.
$ date --date="Wed, 07 Jan 2009 14:03:59 -0700" +"%F %T %z"
2009-01-07 14:03:59 -0700
$ date --date="2009-01-07 14:03:59 -0700" +"%F %T %z"
2009-01-07 14:03:59 -0700
$ date --date="Wed Jan 7 14:03:59 MST 2009" +"%F %T %z"
2009-01-07 14:03:59 -0700
$ date -d today +"%F %T %z"
2009-01-07 14:06:23 -0700
$ date -d tomorrow +"%F %T %z"
2009-01-08 14:06:38 -0700
Caution: Named timezones may not be unambiguous. In particular the locale-dependent string from the traditional legacy default date output may produce a string that can’t be parsed unambiguously. It is better to use the date -R format when producing output from date that is meant to be parsed again by it.
$ date
Mon Jul 2 13:10:11 MDT 2012        # <-- Warning: Text timezone may not be unique.
$ date -R
Mon, 02 Jul 2012 13:10:23 -0600    # <-- Info: Unambiguous time string.
$ date -R -d "$(date -R) + 2 days"
Wed, 04 Jul 2012 13:11:05 -0600
However the date -R format resolves time to the second. That is usually okay. But if resolution smaller than one second is needed, such as for setting a clock, then a high resolution and unambiguous time should be used.
In addition to other formats the date --date=STRING may take the time in Unix time format. This is the number of seconds since midnight UTC of January 1, 1970. See Wikipedia’s Unix time page for a good description of Unix time. Precede the number with an @ character to indicate that the time string is a Unix time in seconds since the epoch.
$ date +%s
1243720606
$ date -R --date=@1243720606
Sat, 30 May 2009 15:56:46 -0600
$ date +%s.%N
1243720606.134242460
$ date --date=@1243720606.134242460 +"%F %T.%N %z"
2009-05-30 15:56:46.134242460 -0600
$ date --date=@1243720606 +"%F %T.%N %z"
2009-05-30 15:56:46.000000000 -0600
$ date --date=@1243720606.134242460 +"@%s.%N"
@1243720606.134242460
As noted in the info documentation fuzzy relative items in date strings may cause the result to end up at an undesired time. Such as at an invalid time due to a time change or in the same month instead of a different month. Therefore it is more robust to anchor the time at a particular safer starting point before applying a relative time modifier. Such as operating at noon to avoid time changes that happen at night. Such as operating at the middle of the month to avoid different lengths of months.
If you are calculating relative days from today then you can simply use today as the day reference point and then specify 12:00 to work at noon.
$ date -d "12:00 today -1 days" +"Yesterday was %A." Yesterday was Saturday. $ date -d "12:00 yesterday" +"Yesterday was %A." Yesterday was Saturday. $ date -d "12:00 tomorrow" +"Tomorrow will be %A." Tomorrow will be Monday.
If you are working with months or years then it is more robust to operate at the middle of the month. Unfortunately that isn’t today and so needs a little bit of calculation. Use date +%Y-%m-15 to generate the date at the middle of the month.
$ date --date="$(date +%Y-%m-15) -1 month" +'Last month was %B.' Last month was June. $ date --date="$(date +%Y-%m-15) +1 month" +'Next month will be %B.' Next month will be May. $ date --date="$(date +%Y-%m-15) 1 year ago" +'Last year was %Y.' Last year was 2008. $ date --date="$(date +%Y-%m-15) +1 year" +'Next year will be %Y.' Next year will be 2010.
If doing several calculations then it is possible to cross midnight between the multiple calls and produce confusing results. To avoid this take the time into a variable and then always use that variable for the calculations.
$ now=$(date +%s)
$ today=$(date --date="@$now" "+%F 12:00")
$ echo "Yesterday=$(date -d "$today 1 day ago" +%A), Tomorrow=$(date -d "$today 1 day" +%A)."
Yesterday=Saturday, Tomorrow=Monday.
$ thismonth=$(date --date="@$now" "+%Y-%m-15 12:00")
$ echo "LastMonth=$(date -d "$thismonth 1 month ago" +%B), NextMonth=$(date -d "$thismonth 1 month" +%B)."
LastMonth=March, NextMonth=May.
Tip: GNU date itself doesn’t include any direct support for finding the number of days between dates. But it can be used as part of a script to find this information. The technique is to convert each date to an integer value such as a Julian Day Number or seconds since the Unix epoch 1970-01-01 00:00:00 UTC, then take the difference, and then convert the seconds to days. Using Unix seconds is very convenient due to the %s format. Here is an example.
date1="2008-01-01" date2="2010-06-21" date1seconds=$(date -d "$date1 12:00" +%s) date2seconds=$(date -d "$date2 12:00" +%s) totalseconds=$(( $date2seconds - $date1seconds )) secondsperday=86400 days=$(( $totalseconds / $secondsperday )) echo "There were $days days between $date1 and $date2"And of course that can be considerably compacted by applying the constructs inline but this is not as illustrative and so was expanded out in the above example. Here is a compact version.
$ echo Days between dates: $(( ( $(date -d "2010-06-21 12:00" +%s) - $(date -d "2008-01-01 12:00" +%s) ) / 86400 ))
The NEWS shows the following snippet which is useful and seemingly not well known.
Tip: Date values can now have leading TZ="..." assignments that override the environment only while that date is being processed. For example, the following shell command converts from Paris to New York time:
TZ="America/New_York" date --date='TZ="Europe/Paris" 2004-10-31 06:30'
At the change of the year there are always many bug reports concerning date behaving incorrectly. This is usually due to people using a mismatch of ISO and legacy date format specifiers.
$ date -d 2008-12-31 +%Y%V
200801
$ date -d 2008-12-31 "+%G-%m-%d"
2009-12-31
The %Y and %U or %W options work in combination. (Use %U for weeks starting with Sunday or %W for weeks starting with Monday.) The ISO %G and %V options work in combination. Mixing them up creates confusion. Instead use %Y and %U/%W together or use %G and %V together.
$ date -d 2008-12-31 +%G%V
200901
$ date -d 2009-01-01 +%G%V
200901
$ date -d 2008-12-31 +%Y%U
200852
$ date -d 2009-01-01 +%Y%U
200900
Use of ISO week numbers tends to create confusion. The ISO week numbering scheme is somewhat different from calendar week numbering. ISO week numbers start on Monday of the week with the year’s first Thursday in it. See Wikipedia’s ISO 8601 page or Wikipedia’s ISO week date page for a good summary.
ISO Week Dates can be created using the following format.
$ date -d "2009-01-07 12:00:00 +0000" "+%G-W%V-%u" 2009-W02-3
See the standards documentation for more information with regards to date.
Caution: Recently the appearance of different glyphs for the dash character in typesetting systems has caused some confusion when cutting and pasting examples. This can cause errors parsing the different and unexpected unicode characters that look like a dash but are not. (e.g. The dashes in 2009-01-07 might be unicode hyphens.) If you think the date looks okay but it still produces a syntax error try inspecting the character set and replacing the dashes with ’-’ characters.
Urs Thuermann wrote a nice mailing list posting describing related concepts in some detail.
Documentation on specifying the timezone with TZ may be found on a GNU system in the Texinfo documentation system in the “TZ Variable” section under the “Calendar Time” chapter. On a GNU system you can get there directly with this command line.
$ info libc 'TZ Variable'
This most recent version is also available online.
See also the online POSIX documentation.
Next: How do I change the ownership or permissions of a symlink?, Previous: The date command is not working right., Up: Top
Symbolic links are created with ln -s. That creates a name redirection. When a symlink is accessed the filesystem will take the contents of the symlink as a redirection to another file, where the process may recursively be continued many times. If you were meaning ln -s a /c/b then that would create ‘/c/b’ which would be a relative symlink to file ‘/c/a’. If ‘/c/a’ did not exist then this would be dangling until the file was created.
The owner, group and mode of a symlink are not significant to file access through it.
Symbolic links may use either absolute or relative paths but there are trade-offs. Generally I advocate making only relative links so that the location is network independent and will work as desired across NFS mounted filesystems.
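As an illustration, here is the same target linked both ways; the paths are hypothetical.

ln -s /export/data/config.txt absolute-link   # breaks if /export is mounted at a different path
ln -s ../data/config.txt relative-link        # resolves relative to the directory containing the link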
The FreeBSD symlink(7) man page on the web includes a nice description of symlinks.
See also these mailing list threads for more discussion of this issue.
http://lists.gnu.org/archive/html/bug-coreutils/2007-12/msg00211.html
http://lists.gnu.org/archive/html/bug-coreutils/2007-11/msg00006.html
http://lists.gnu.org/archive/html/bug-coreutils/2011-12/msg00086.html
Next: Value too large for defined data type, Previous: ln -s did not link my files together, Up: Top
The owner, group, and permissions of a symlink are not in any way significant. Only the value of the symlink is meaningful. Regardless of that some operating systems will allow you to change the owner, group or mode of a symlink and other operating systems will not. Do not worry about it as it does not matter in any case.
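On systems that do permit it, GNU chown provides -h (--no-dereference) to operate on the link itself rather than on what it points to. The names below are hypothetical and the operation may simply fail on other systems.

ln -s target mylink
chown -h otheruser:othergroup mylink   # changes the symlink itself, not its target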
Next: Tar created a Large File but I can't remove it., Previous: How do I change the ownership or permissions of a symlink?, Up: Top
It means that your version of the utilities was not compiled with large file support enabled. The GNU utilities do support large files if they are compiled to do so. You may want to compile them again and make sure that large file support is enabled. This support is automatically configured by autoconf on most systems. But it is possible that on your particular system it could not determine how to do that and therefore autoconf concluded that your system did not support large files.
The message "Value too large for defined data type" is a system error message reported when an operation on a large file is attempted using a non-large file data type. Large files are defined as anything larger than a signed 32-bit integer, or stated differently, larger than 2GB.
Many system calls that deal with files return values in a "long int" data type. On 32-bit hardware a long int is 32 bits and therefore this imposes a 2GB limit on the size of files. When this was invented that was HUGE and it was hard to conceive of needing anything that large. Time has passed and files can be much larger today. On native 64-bit systems the file size limit is 2^63-1 bytes, which we will again think is huge.
On a 32-bit system with a 32-bit "long int" you find that you can’t make it any bigger and also maintain compatibility with previous programs. Changing that would break many things! But many systems make it possible to switch into a new program mode which rewrites all of the file operations into a 64-bit program model. Instead of "long" they use a new data type called "off_t" which is constructed to be 64-bits in size. Program source code must be written to use the off_t data type instead of the long data type. This is typically done by defining -D_FILE_OFFSET_BITS=64 or some such. It is system dependent. Once done and once switched into this new mode most programs will support large files just fine.
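As a sketch of enabling that mode when compiling a program, the POSIX getconf utility reports the compiler flags needed for large file support; the source file name here is hypothetical and the printed flags vary by system.

$ getconf LFS_CFLAGS                   # typical output on a 32-bit GNU/Linux system
-D_FILE_OFFSET_BITS=64
$ gcc $(getconf LFS_CFLAGS) -o myprog myprog.c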
See the next question if you have inadvertently created a large file and now need some way to deal with it.
Next: The 'od -x' command prints bytes in the wrong order., Previous: Value too large for defined data type, Up: Top
I created a file with tar cvf backup.tar. Trying to "rm" this file is not possible. The error message is:
rm: cannot remove `backup.tar': Value too large for defined data
What can I do to remove that file?
Sometimes one utility such as tar will be compiled with large file support while another utility like rm will be compiled without. It happens. Which means you might find yourself with a large file created by one utility but unable to work with it with another.
At this point we need to be clever. Find a utility that can operate on a large file and use it to truncate the file. Here are several examples of how to work around this problem. Of course in a perfect world you would recompile the utilities to support large files and not worry about needing a workaround.
This example requires perl to be configured for large files.
perl -e 'unlink("backup.tar");'
So let’s try to hit it more directly. Truncate the file first. That will make it small and then you can remove it. The shell will do this when redirecting the output of commands.
true > backup.tar
rm backup.tar
However, if your shell was not compiled for large files then the redirection will fail. In that case we have to resort to more subtle methods. Since tar created the file, tar must be configured to support large files. Use that to your advantage to truncate the file.
tar cvf backup.tar /dev/null
Next: expr 2 * 3 does not work, Previous: Tar created a Large File but I can't remove it., Up: Top
That is the required behavior depending upon the endianness of the underlying machine. You are probably mixing up words and bytes.
The -x option outputs short integers (note: integers, which are words and not bytes) in the machine’s short integer format. If you are operating on a little endian machine such as an x86 then the bytes appear in ’backwards’ order. Here is what the od documentation says.
`-x'
     Output as hexadecimal two-byte units.  Equivalent to `-t x2'.
If you require a specific byte ordering (note: bytes, not words) then you need to supply a byte specification such as ’od -t x1’.
echo abcdefgh > /tmp/letters
od -cx /tmp/letters
0000000   a   b   c   d   e   f   g   h  \n  \0
           6261    6463    6665    6867    000a
0000011
od -t cx1 /tmp/letters
0000000   a   b   c   d   e   f   g   h  \n
         61  62  63  64  65  66  67  68  0a
0000011
Alternatively, since version 8.23 (2014), od supports the --endian option, allowing selection of a specific byte ordering.
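With a new enough od, the file from the example above can be printed in a chosen byte order directly; this sketch assumes coreutils 8.23 or later.

od -t x2 --endian=big /tmp/letters
0000000 6162 6364 6566 6768 0a00
0000011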
If you search the web for “little endian big endian” you should turn up many hits for various documentation on this subject. But for sure you should read “On Holy Wars and a Plea for Peace” written by Danny Cohen and published in IEEE Computer years ago as it is a classic treatise on the subject.
Next: df and du report different information, Previous: The 'od -x' command prints bytes in the wrong order., Up: Top
The expr program appears to be broken because expr 2 * 3 produces a syntax error.
expr 2 * 3
expr: syntax error
The case shown is not quoted correctly. As such it provides a good example of incorrectly quoting shell metacharacters. The “*” is being expanded by the shell and expr is seeing filenames, which creates the syntax error.
The “*” character is a special character to the command shell. It is called a glob character because the shell expands it to match filenames. It matches a glob of characters. The shell then passes the result to the command. The expr program does not see the star character in “2 * 3” but instead sees a “2” followed by every filename matched from the current directory and then finally the “3”.
Your command is really something completely different than you thought it was going to be. Use echo to see what your command line really is:
echo expr 2 * 3
You will see it match filenames and so create the syntax error.
There are many entries in this FAQ related to command line “*” glob expansion. If you are having trouble with this concept then read through the other entries for a fresh perspective.
You need to quote shell metacharacters to prevent them from being modified by the shell. Here are some possible ways of quoting this example.
expr 2 \* 3
expr 2 '*' 3
expr 2 "*" 3
The man page for expr carries this text:
Beware that many operators need to be escaped or quoted for shells.
Next: df Size and Used and Available do not add up, Previous: expr 2 * 3 does not work, Up: Top
Why don’t df and du report the same values?
This is a sometimes confusing topic. The df command will report a certain value for the amount of disk space used in a filesystem. The user then tries to add up the amount of disk space consumed in the directory structure using du. The numbers are rarely exactly the same and are sometimes quite a large amount different! Why is this?
Fundamentally it is because df and du report different information. The df command reports overall filesystem information. The du command reports information in directory hierarchies. Those are two different things and they may report different information. In particular data that is not recorded within a directory hierarchy will not be shown by du.
A very common case where this occurs is when a program writes a very large log file. What often happens is that the user notices this large growing file and removes (unlinks) the file from the directory. If the program that was writing the file continues running and continues to maintain a handle to that file then the file will continue to exist in the filesystem. This will continue to be reported by df but will no longer be accessible through the directory hierarchy and du.
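Here is a small demonstration of that situation; it assumes a shell with job control and that the lsof utility is installed.

dd if=/dev/zero of=biglog bs=1M count=100   # create a 100M file
tail -f biglog &                            # a process holds the file open
rm biglog                                   # unlink it from the directory
df -h .      # the 100M is still counted; the open handle keeps the blocks
du -sh .     # du no longer sees the file
lsof +L1     # lists open-but-unlinked files, including biglog
kill %1      # when the process exits the space is finally released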
James Youngman contributed this nice explanation to the mailing list.
Another explanation was written to the mailing list.
A related question is, “Why does simply trying to add up all of the disk space used by files not result in the amount of disk space consumed in the filesystem?” Because some disk space is lost in filesystem overhead such as in block fragments and filesystem inodes. This is especially noticeable in some filesystems with large block sizes when there are many very small files.
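GNU du can show the difference between allocated blocks and summed file sizes directly; the directory name below is hypothetical.

du -sh /some/directory                    # disk blocks actually allocated
du -sh --apparent-size /some/directory    # sum of the file sizes themselves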
Next: Old tail plus N syntax now fails, Previous: df and du report different information, Up: Top
The df report simply does not add up. Why not?
$ df /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       40559188 35847132   2651768  94% /
$ df -h /
Filesystem     Size  Used Avail Use% Mounted on
/dev/sda1       39G   35G  2.6G  94% /
The most natural thing in the world is to add the values of Used plus Available and expect to have a result that equals Size. But as we see here 35847132 plus 2651768 is not equal to 40559188 and is missing approximately 2G of disk. Where did it go?
This data is in the minfree percentage of reserved filesystem disk blocks. A typical filesystem value for minfree is 5% reserved to superuser processes. Root can make use of all of the disk space but non-root processes will be restricted by the minfree value. If a user or user process fills up a partition the root user can still create files within the reserved space.
Additionally modern filesystems attempt to control the amount of disk fragmentation automatically. This requires that there is sufficient free disk space available. When the filesystem is operated very close to 100% full then undesirable fragmentation is increased. This may significantly decrease disk performance. Keeping a minfree reserve is one way to ensure a sufficient amount of disk space for the filesystem to operate at high efficiency.
In this example 5% of 39G is reserved and not included in the Available value. 39G * 5% is about 1.9G minfree. 35G used plus 2.6G available plus 1.9G minfree is approximately 39G and equal to the size of the filesystem.
The tunefs command using the tunefs -m NUM option is the traditional command to adjust the filesystem minfree value. More information may be found in the manual pages and documentation for that command.
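On GNU/Linux systems with ext2/ext3/ext4 filesystems the corresponding tool is tune2fs. This is a sketch, reusing the /dev/sda1 device from the example above, and must be run as root.

tune2fs -l /dev/sda1 | grep -i 'reserved block'   # show the current reservation
tune2fs -m 1 /dev/sda1                            # reduce minfree from 5% to 1%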
Next: join requires sorted input files, Previous: df Size and Used and Available do not add up, Up: Top
Old tail syntax tail +N FILE now fails.
$ tail +5 FILENAME
tail: cannot open `+5' for reading: No such file or directory
==> FILENAME <==
The problem is that “+” is a valid filename character and is not an expected option character. Options are supposed to start with a “-” character. It has always been a bad idea to eat up an additional character as an additional option specifier. This is no longer allowed by the standards. The tail command is now required to respect that “+” is not an option and treat it as a filename.
This is documented in the NEWS file.
A few usages still have behavior that depends on which POSIX standard is being conformed to, and portable applications should beware these problematic usages. These include:

    Problematic          Standard-conforming replacement, depending on
       usage               whether you prefer the behavior of:
                         POSIX 1003.2-1992    POSIX 1003.1-2001
    sort +4              sort -k 5            sort ./+4
    tail +4              tail -n +4           tail ./+4
    tail - f             tail f               [see (*) below]
    tail -c 4            tail -c 10 ./4       tail -c4
    touch 12312359 f     touch -t 12312359 f  touch ./12312359 f
    uniq +4              uniq -s 4            uniq ./+4

    (*) "tail - f" does not conform to POSIX 1003.1-2001; to read
    standard input and then "f", use the command "tail -- - f".

These changes are in response to decisions taken in the January 2005 Austin Group standardization meeting. For more details, please see "Utility Syntax Guidelines" in the Minutes of the January 2005 Meeting http://www.opengroup.org/austin/docs/austin_239.html.
Also the coreutils info manual has the following documentation.
The GNU utilities normally conform to the version of POSIX that is standard for your system. To cause them to conform to a different version of POSIX, define the _POSIX2_VERSION environment variable to a value of the form yyyymm specifying the year and month the standard was adopted. Two values are currently supported for _POSIX2_VERSION: ‘199209’ stands for POSIX 1003.2-1992, and ‘200112’ stands for POSIX 1003.1-2001. For example, if you have a newer system but are running software that assumes an older version of POSIX and uses ‘sort +1’ or ‘tail +10’, you can work around any compatibility problems by setting ‘_POSIX2_VERSION=199209’ in your environment.
What does this mean? It means that you should avoid tail +4, since it might mean either tail ./+4 or tail -n +4. It also means that there are several ways of dealing with this problem.
tail -n +5 SOMEFILE
Use the “-n” option to specify the number of lines.
_POSIX2_VERSION=199209 tail +5
Use the _POSIX2_VERSION environment variable to specify the level of standard conformance that is desired.
_POSIX2_VERSION=200112 tail +5
Use the _POSIX2_VERSION environment variable to specify the level of standard conformance that is desired.
The environment variable may be set in the environment and the effect will be global. Note that setting _POSIX2_VERSION will also affect the behavior of sort, uniq, touch and perhaps other programs.
_POSIX2_VERSION=199209
export _POSIX2_VERSION
tail +5
In many ways for portable scripts it is better to avoid using tail at all and use other standard utilities. For example use sed instead of tail. Use of sed is very portable.
Instead of tail +5 use either:
sed '1,4d'
Or use:
sed -n '5,$p'
Next: cp and mv the reply option is deprecated, Previous: Old tail plus N syntax now fails, Up: Top
A common misconception is that join works on arbitrary input files. But join requires that the input files be sorted. The join --help documentation says:
Important: FILE1 and FILE2 must be sorted on the join fields.
E.g., use `sort -k 1b,1' if `join' has no options,
or use `join -t ''' if `sort' has no options.
Note, comparisons honor the rules specified by `LC_COLLATE'.
Since coreutils release 6.11 (2008-04-19), if the input is not sorted and some lines cannot be joined, a warning message will be given.
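For example, to sort two unsorted files on their first field and then join them (the file names are hypothetical):

sort -k 1b,1 file1 > file1.sorted
sort -k 1b,1 file2 > file2.sorted
join file1.sorted file2.sorted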
Next: uname is system specific, Previous: join requires sorted input files, Up: Top
UPDATE: The --reply option has been removed in release 7.1.
In coreutils release 5.90 (2005-09-29) the cp and mv --reply=X option was deprecated. There was a discussion of the problems and issues on the bug-coreutils mailing list. As a result of the discussion it was then officially deprecated. The discussion may be found here in the mailing list archives:
http://lists.gnu.org/archive/html/bug-coreutils/2005-06/msg00160.html
http://lists.gnu.org/archive/html/bug-coreutils/2005-07/msg00009.html
The ‘--reply’ option only controlled the action when prompts would have been asked. It had no effect when the program would not have prompted at all. This means that many users invoked --reply=no in non-interactive use, such as through a script run by cron; since the program would not prompt in that case it never got the “no” answer and would still overwrite the target file! Clearly that was not what was intended and it was very unfortunate behavior. This was so confusing to people that the ‘--reply’ option was deprecated and scheduled for removal. It did not do what you wanted it to do anyway.
In the 7.1 release the ‘-n’, ‘--no-clobber’ option was introduced. It does what you wanted the other option to do but didn’t.
Here is the NEWS entry.
cp and mv: the --reply={yes,no,query} option has been removed.
Using it has elicited a warning for the last three years.

cp and mv accept a new option, --no-clobber (-n): silently refrain
from overwriting any existing destination file
A good alternative is to use the rsync command with the --ignore-existing option.
$ rsync -a --ignore-existing file1 file2
Previous: cp and mv the reply option is deprecated, Up: Top
Due to the peculiarities of the Linux kernel there are often questions like this one concerning the uname -r -v output.
The man page for uname says:
-r, --kernel-release
       print the kernel release
-v, --kernel-version
       print the kernel version

But the Linux kernel seems to return the opposite value for each.
$ uname -r
2.6.32-431.1.2.0.1.el6.x86_64
$ uname -v
#1 SMP Fri Dec 13 13:06:13 UTC 2013

Isn’t that a bug?
This is not a bug in the uname command. The uname(1) command line utility is simply a thin wrapper around the uname(2) system call. It displays the kernel information stored in the kernel’s struct utsname data structure. That information is determined by the operating system kernel, not by the uname(1) command line display tool.
The short answer is that the kernel has chosen to store that specific data and it isn’t anything that the uname display program can change. If you want it changed then it must be changed in the kernel.
The long answer is that the uname command is a very old command and has been around a very long time. The utility of the command itself dates way back into the history of Unix and the origins of communication between different hosts. (If anyone can point me to a good reference on the original history of this please contact me.) The system name and host names were historically limited to eight characters, which is smaller than the full field width available to those fields today, and longer strings would be truncated. The command has always been very system specific. The output of uname is not portable across different systems.
The only portable way to use uname(1) is to call it first without arguments to see which system name it returns and then after knowing the system type then call it again with whatever options make sense on that system.
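In a script that pattern might look like the following sketch; the per-system branches are illustrative assumptions rather than a definitive mapping.

system=$(uname -s)
case $system in
    Linux)
        uname -r    # on Linux the detailed release string is here
        ;;
    *)
        uname -a    # elsewhere, print everything and pick it apart
        ;;
esac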
A recent log of the discussion in the bug tracker includes more information and links to duplicate bug reports on the same topic.
At the time of this writing someone at Wikipedia has documented a list of uname outputs from various systems. I think that is unmanageable and at some point in the future I expect a deletionist will delete it. But in the meantime a table of various system outputs is there.
Documentation on the uname(2) system call and the struct utsname data structure may be found on a GNU system in the Texinfo documentation system in the “Platform Type” section under the “System Management” chapter. On a GNU system you can get there directly with this command line.
$ info libc 'Platform Type'
The most recent version of that documentation is also available online.