Use strace/truss to debug system problems

David Simmons
August 24, 1997

THE GREATEST SYSTEM ADMINISTRATION UTILITY

The UNIX strace program (or truss under Solaris, and par under IRIX) is essential for debugging many system problems. strace is a system call tracer: it displays each system call a process makes, along with the arguments passed and the value returned. strace can start a new program and trace it, or it can attach to a process that is already running.

Programs often freeze or otherwise fail while offering few clues as to the problem. Sometimes a daemon process silently fails to do its job. Occasionally you may need to spy on the workings of a network program to see in what order it opens sockets or reads its configuration files. strace is ideal for all of these situations.

USING STRACE

strace is invoked either with the program to be traced (and its arguments) as its arguments, or with the "-p" option and a process ID to attach to a process that is already running. By default strace truncates the strings it displays, so the "-s" option, which sets the maximum string size, is highly recommended if you need to examine the data a program reads and writes.
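A busy program can produce thousands of lines of trace output, so it often helps to filter it. The Python script below is a minimal sketch, not part of the strace package, and its name (openlog.py) is hypothetical. It assumes a trace saved with strace's "-o" option (optionally with "-f", which prefixes each line with a process ID) in the usual name(arguments) = result format, and lists the files the program tried to open, in order, with whether each open succeeded.

```python
import re
import sys

# Hypothetical helper (openlog.py), not part of strace. Matches open()/openat()
# lines in a saved trace, for example:
#   open("/etc/passwd", O_RDONLY) = 3
#   openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
# An optional leading PID (as written with -f) is skipped.
OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'     # optional PID prefix
    r'(?:open|openat)\((?:[^",]*,\s*)?'   # syscall name, plus openat's directory fd
    r'"((?:[^"\\]|\\.)*)"'                # quoted path (group 1)
    r'.*\)\s+=\s+(-?\d+)(?:\s+(\w+))?'    # return value (group 2), errno name (group 3)
)


def opened_files(lines):
    """Return (path, fd, errno_name) tuples in trace order.

    fd is None when the open failed; errno_name is None when it succeeded.
    """
    results = []
    for line in lines:
        m = OPEN_RE.match(line)
        if not m:
            continue  # not an open call, or an <unfinished ...> fragment
        path, ret, err = m.group(1), int(m.group(2)), m.group(3)
        if ret >= 0:
            results.append((path, ret, None))
        else:
            results.append((path, None, err))
    return results


def report(lines):
    """Print one line per open attempt: the path, then the fd or the error."""
    for path, fd, err in opened_files(lines):
        status = f"fd {fd}" if fd is not None else f"FAILED ({err})"
        print(f"{path}: {status}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as trace:
            report(trace)
    else:
        print("usage: openlog.py TRACEFILE", file=sys.stderr)
```

For example, running the program under "strace -f -o /tmp/app.trace" and then "python3 openlog.py /tmp/app.trace" would show which configuration files were searched for, in order, and which one was actually opened. This simple pattern deliberately ignores calls that strace splits across lines with "<unfinished ...>" markers, and it does not handle lines that carry timestamps (as produced by "-t").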

One recent "real world" use of strace was fixing a database problem. An mSQL database I maintain was failing occasionally. By tracing the mSQL server process with strace, I watched the server iterate through the database entries and consistently freeze at the same point. This showed that the problem was a corrupted item in the database; once I removed that item, all the problems went away.

REFERENCES

strace-3.1.0.1.tar.gz (172K)
This is the free strace package currently maintained by Rick Sladkey <jrs@world.std.com>. It works on the following systems: sparc-sun-sunos4.1, i486-linux, sparc-sun-solaris2, i486-ncr-sysv4, i486-sun-solaris2, m68k-linux, mips-sgi-irix5, and alpha-linux.
