Chapter 5 Troubleshooting

5.1. Why is FreeBSD finding the wrong amount of memory on i386™ hardware?
5.2. Why do my programs occasionally die with “Signal 11” errors?
5.3. My system crashes with either “Fatal trap 12: page fault in kernel mode”, or “panic:”, and spits out a bunch of information. What should I do?
5.4. Why do I get the error “maxproc limit exceeded by uid %i, please see tuning(7) and login.conf(5)”?
5.5. Why does sendmail give me an error reading “mail loops back to myself”?
5.6. Why do full screen applications on remote machines misbehave?
5.7. Why does it take so long to connect to my computer via ssh or telnet?
5.8. Why does “file: table is full” show up repeatedly in dmesg(8)?
5.9. Why does the clock on my computer keep incorrect time?
5.10. What does the error “swap_pager: indefinite wait buffer:” mean?
5.11. What is a “lock order reversal”?
5.12. What does “Called ... with the following non-sleepable locks held” mean?
5.13. Why does buildworld/installworld die with the message “touch: not found”?

5.1. Why is FreeBSD finding the wrong amount of memory on i386™ hardware?

The most likely reason is the difference between physical memory addresses and virtual addresses.

The convention for most PC hardware is to use the memory area between 3.5 GB and 4 GB for a special purpose (usually for PCI). This address space is used to access PCI hardware. As a result real, physical memory can not be accessed by that address space.

What happens to the memory that should appear in that location is dependent on your hardware. Unfortunately, some hardware does nothing and the ability to use that last 500 MB of RAM is entirely lost.

Luckily, most hardware remaps the memory to a higher location so that it can still be used. However, this can cause some confusion if you watch the boot messages.

On a 32-bit version of FreeBSD, the memory appears lost, since it will be remapped above 4 GB, which a 32-bit kernel is unable to access. In this case, the solution is to build a PAE enabled kernel. See the entry on memory limits and about different memory limits on different platforms for more information.

On a 64-bit version of FreeBSD, or when running a PAE-enabled kernel, FreeBSD will correctly detect and remap the memory so it is usable. During boot, however, it may seem as if FreeBSD is detecting more memory than the system really has, due to the described remapping. This is normal and the available memory will be corrected as the boot process completes.

5.2. Why do my programs occasionally die with “Signal 11” errors?

Signal 11 errors are caused when your process has attempted to access memory which the operating system has not granted it access to. If something like this is happening at seemingly random intervals then you need to start investigating things very carefully.

These problems can usually be attributed to either:

  1. If the problem is occurring only in a specific application that you are developing yourself it is probably a bug in your code.

  2. If it is a problem with part of the base FreeBSD system, it may also be buggy code, but more often than not these problems are found and fixed long before us general FAQ readers get to use these bits of code (that is what -current is for).

In particular, a dead giveaway that this is not a FreeBSD bug is if you see the problem when you are compiling a program, but the activity that the compiler is carrying out changes each time.

For example, suppose you are running make buildworld, and the compile fails while trying to compile ls.c into ls.o. If you then run make buildworld again, and the compile fails in the same place then this is a broken build — try updating your sources and try again. If the compile fails elsewhere then this is almost certainly hardware.

What you should do:

In the first case you can use a debugger e.g., gdb(1) to find the point in the program which is attempting to access a bogus address and then fix it.

In the second case you need to verify that it is not your hardware at fault.

Common causes of this include:

  1. Your hard disks might be overheating: Check the fans in your case are still working, as your disk (and perhaps other hardware might be overheating).

  2. The processor running is overheating: This might be because the processor has been overclocked, or the fan on the processor might have died. In either case you need to ensure that you have hardware running at what it is specified to run at, at least while trying to solve this problem (in other words, clock it back to the default settings.)

    If you are overclocking then note that it is far cheaper to have a slow system than a fried system that needs replacing! Also the wider community is not often sympathetic to problems on overclocked systems, whether you believe it is safe or not.

  3. Dodgy memory: If you have multiple memory SIMMS/DIMMS installed then pull them all out and try running the machine with each SIMM or DIMM individually and narrow the problem down to either the problematic DIMM/SIMM or perhaps even a combination.

  4. Over-optimistic Motherboard settings: In your BIOS settings, and some motherboard jumpers you have options to set various timings, mostly the defaults will be sufficient, but sometimes, setting the wait states on RAM too low, or setting the “RAM Speed: Turbo” option, or similar in the BIOS will cause strange behavior. A possible idea is to set to BIOS defaults, but it might be worth noting down your settings first!

  5. Unclean or insufficient power to the motherboard. If you have any unused I/O boards, hard disks, or CD-ROMs in your system, try temporarily removing them or disconnecting the power cable from them, to see if your power supply can manage a smaller load. Or try another power supply, preferably one with a little more power (for instance, if your current power supply is rated at 250 Watts try one rated at 300 Watts).

You should also read the SIG11 FAQ (listed below) which has excellent explanations of all these problems, albeit from a Linux® viewpoint. It also discusses how memory testing software or hardware can still pass faulty memory.

Finally, if none of this has helped it is possible that you have just found a bug in FreeBSD, and you should follow the instructions to send a problem report.

There is an extensive FAQ on this at the SIG11 problem FAQ.

5.3. My system crashes with either “Fatal trap 12: page fault in kernel mode”, or “panic:”, and spits out a bunch of information. What should I do?

The FreeBSD developers are very interested in these errors, but need some more information than just the error you see. Copy your full crash message. Then consult the FAQ section on kernel panics, build a debugging kernel, and get a backtrace. This might sound difficult, but you do not need any programming skills; you just have to follow the instructions.

5.4. Why do I get the error “maxproc limit exceeded by uid %i, please see tuning(7) and login.conf(5)”?

The FreeBSD kernel will only allow a certain number of processes to exist at one time. The number is based on the kern.maxusers sysctl(8) variable. kern.maxusers also affects various other in-kernel limits, such as network buffers. If your machine is heavily loaded, you probably want to increase kern.maxusers. This will increase these other system limits in addition to the maximum number of processes.

To adjust your kern.maxusers value, see the File/Process Limits section of the Handbook. (While that section refers to open files, the same limits apply to processes.)

If your machine is lightly loaded, and you are simply running a very large number of processes, you can adjust this with the kern.maxproc tunable. If this tunable needs adjustment it needs to be defined in /boot/loader.conf. The tunable will not get adjusted until the system is rebooted. For more information about tuning tunables, see loader.conf(5). If these processes are being run by a single user, you will also need to adjust kern.maxprocperuid to be one less than your new kern.maxproc value. (It must be at least one less because one system program, init(8), must always be running.)

5.5. Why does sendmail give me an error reading “mail loops back to myself”?

You can find a detailed answer for this question in the Handbook.

5.6. Why do full screen applications on remote machines misbehave?

The remote machine may be setting your terminal type to something other than the cons25 terminal type required by the FreeBSD console.

There are a number of possible work-arounds for this problem:

  • After logging on to the remote machine, set your TERM shell variable to ansi or sco if the remote machine knows about these terminal types.

  • Use a VT100 emulator like screen at the FreeBSD console. screen offers you the ability to run multiple concurrent sessions from one terminal, and is a neat program in its own right. Each screen window behaves like a VT100 terminal, so the TERM variable at the remote end should be set to vt100.

  • Install the cons25 terminal database entry on the remote machine. The way to do this depends on the operating system on the remote machine. The system administration manuals for the remote system should be able to help you here.

  • Fire up an X server at the FreeBSD end and login to the remote machine using an X based terminal emulator such as xterm or rxvt. The TERM variable at the remote host should be set to xterm or vt100.

5.7. Why does it take so long to connect to my computer via ssh or telnet?

The symptom: there is a long delay between the time the TCP connection is established and the time when the client software asks for a password (or, in telnet(1)'s case, when a login prompt appears).

The problem: more likely than not, the delay is caused by the server software trying to resolve the client's IP address into a hostname. Many servers, including the Telnet and SSH servers that come with FreeBSD, do this to store the hostname in a log file for future reference by the administrator.

The remedy: if the problem occurs whenever you connect from your computer (the client) to any server, the problem is with the client; likewise, if the problem only occurs when someone connects to your computer (the server) the problem is with the server.

If the problem is with the client, the only remedy is to fix the DNS so the server can resolve it. If this is on a local network, consider it a server problem and keep reading; conversely, if this is on the global Internet, you will most likely need to contact your ISP and ask them to fix it for you.

If the problem is with the server, and this is on a local network, you need to configure the server to be able to resolve address-to-hostname queries for your local address range. See the hosts(5) and named(8) manual pages for more information. If this is on the global Internet, the problem may be that your server's resolver is not functioning correctly. To check, try to look up another host — say, www.yahoo.com. If it does not work, that is your problem.

Following a fresh install of FreeBSD, it is also possible that domain and name server information is missing from /etc/resolv.conf. This will often cause a delay in SSH, as the option UseDNS is set to yes by default in /etc/ssh/sshd_config. If this is causing the problem, you will either need to fill in the missing information in /etc/resolv.conf or set UseDNS to no in sshd_config as a temporary workaround.

5.8. Why does “file: table is full” show up repeatedly in dmesg(8)?

This error message indicates you have exhausted the number of available file descriptors on your system. Please see the kern.maxfiles section of the Tuning Kernel Limits section of the Handbook for a discussion and solution.

5.9. Why does the clock on my computer keep incorrect time?

Your computer has two or more clocks, and FreeBSD has chosen to use the wrong one.

Run dmesg(8), and check for lines that contain Timecounter. The one with the highest quality value that FreeBSD chose.

# dmesg | grep Timecounter
Timecounter "i8254" frequency 1193182 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Timecounter "TSC" frequency 2998570050 Hz quality 800
Timecounters tick every 1.000 msec

You can confirm this by checking the kern.timecounter.hardware sysctl(3).

# sysctl kern.timecounter.hardware
kern.timecounter.hardware: ACPI-fast

It may be a broken ACPI timer. The simplest solution is to disable the ACPI timer in /boot/loader.conf:

debug.acpi.disabled="timer"

Or the BIOS may modify the TSC clock—perhaps to change the speed of the processor when running from batteries, or going into a power saving mode, but FreeBSD is unaware of these adjustments, and appears to gain or lose time.

In this example, the i8254 clock is also available, and can be selected by writing its name to the kern.timecounter.hardware sysctl(3).

# sysctl kern.timecounter.hardware=i8254
kern.timecounter.hardware: TSC -> i8254

Your computer should now start keeping more accurate time.

To have this change automatically run at boot time, add the following line to /etc/sysctl.conf:

kern.timecounter.hardware=i8254

5.10. What does the error “swap_pager: indefinite wait buffer:” mean?

This means that a process is trying to page memory to disk, and the page attempt has hung trying to access the disk for more than 20 seconds. It might be caused by bad blocks on the disk drive, disk wiring, cables, or any other disk I/O-related hardware. If the drive itself is actually bad, you will also see disk errors in /var/log/messages and in the output of dmesg. Otherwise, check your cables and connections.

5.11. What is a “lock order reversal”?

The FreeBSD kernel uses a number of resource locks to arbitrate contention for certain resources. When multiple kernel threads try to obtain multiple resource locks, there's always the potential for a deadlock, where two threads have each obtained one of the locks and blocks forever waiting for the other thread to release one of the other locks. This sort of locking problem can be avoided if all threads obtain the locks in the same order.

A run-time lock diagnostic system called witness(4), enabled in FreeBSD-CURRENT and disabled by default for stable branches and releases, detects the potential for deadlocks due to locking errors, including errors caused by obtaining multiple resource locks with a different order from different parts of the kernel. The witness(4) framework tries to detect this problem as it happens, and reports it by printing a message to the system console about a “lock order reversal” (often referred to also as LOR).

It is possible to get false positives, as witness(4) is conservative. A true positive report does not mean that a system is dead-locked; instead it should be understood as a warning of the form “if you were unlucky, a deadlock would have happened here”.

Note: Problematic LORs tend to get fixed quickly, so check http://lists.FreeBSD.org/mailman/listinfo/freebsd-current before posting to the mailing lists.

5.12. What does “Called ... with the following non-sleepable locks held” mean?

This means that a function that may sleep was called while a mutex (or other unsleepable) lock was held.

The reason this is an error is because mutexes are not intended to be held for long periods of time; they are supposed to only be held to maintain short periods of synchronization. This programming contract allows device drivers to use mutexes to synchronize with the rest of the kernel during interrupts. Interrupts (under FreeBSD) may not sleep. Hence it is imperative that no subsystem in the kernel block for an extended period while holding a mutex.

To catch such errors, assertions may be added to the kernel that interact with the witness(4) subsystem to emit a warning or fatal error (depending on the system configuration) when a potentially blocking call is made while holding a mutex.

In summary, such warnings are non-fatal, however with unfortunate timing they could cause undesirable effects ranging from a minor blip in the system's responsiveness to a complete system lockup.

For additional information about locking in FreeBSD see locking(9).

5.13. Why does buildworld/installworld die with the message “touch: not found”?

This error does not mean that the touch(1) utility is missing. The error is instead probably due to the dates of the files being set sometime in the future. If your CMOS-clock is set to local time you need to run the command adjkerntz -i to adjust the kernel clock when booting into single user mode.