Throughout the kernel there are access restrictions relating to jailed processes. Usually, these restrictions only check whether the process is jailed, and if so, returns an error. For example:
if (jailed(td->td_ucred)) return (EPERM);
System V IPC is based on messages. Processes can send each other these messages which tell them how to act. The functions which deal with messages are: msgctl(3), msgget(3), msgsnd(3) and msgrcv(3). Earlier, I mentioned that there were certain sysctls you could turn on or off in order to affect the behavior of jail. One of these sysctls was security.jail.sysvipc_allowed. By default, this sysctl is set to 0. If it were set to 1, it would defeat the whole purpose of having a jail; privileged users from the jail would be able to affect processes outside the jailed environment. The difference between a message and a signal is that the message only consists of the signal number.
/usr/src/sys/kern/sysv_msg.c:
msgget(key, msgflg): msgget returns (and possibly creates) a message descriptor that designates a message queue for use in other functions.
msgctl(msgid, cmd, buf): Using this function, a process can query the status of a message descriptor.
msgsnd(msgid, msgp, msgsz, msgflg): msgsnd sends a message to a process.
msgrcv(msgid, msgp, msgsz, msgtyp, msgflg): a process receives messages using this function
In each of the system calls corresponding to these functions, there is this conditional:
/usr/src/sys/kern/sysv_msg.c: if (!jail_sysvipc_allowed && jailed(td->td_ucred)) return (ENOSYS);
Semaphore system calls allow processes to synchronize execution by doing a set of operations atomically on a set of semaphores. Basically semaphores provide another way for processes lock resources. However, process waiting on a semaphore, that is being used, will sleep until the resources are relinquished. The following semaphore system calls are blocked inside a jail: semget(2), semctl(2) and semop(2).
/usr/src/sys/kern/sysv_sem.c:
semctl(semid, semnum, cmd, ...): semctl does the specified cmd on the semaphore queue indicated by semid.
semget(key, nsems, flag): semget creates an array of semaphores, corresponding to key.
key and flag take on the same meaning as they do in msgget.
semop(semid, array, nops): semop performs a group of operations indicated by array, to the set of semaphores identified by semid.
System V IPC allows for processes to share memory. Processes can communicate directly with each other by sharing parts of their virtual address space and then reading and writing data stored in the shared memory. These system calls are blocked within a jailed environment: shmdt(2), shmat(2), shmctl(2) and shmget(2).
/usr/src/sys/kern/sysv_shm.c:
shmctl(shmid, cmd, buf): shmctl does various control operations on the shared memory region identified by shmid.
shmget(key, size, flag): shmget accesses or creates a shared memory region of size bytes.
shmat(shmid, addr, flag): shmat attaches a shared memory region identified by shmid to the address space of a process.
shmdt(addr): shmdt detaches the shared memory region previously attached at addr.
Jail treats the socket(2) system call and related lower-level socket functions in a special manner. In order to determine whether a certain socket is allowed to be created, it first checks to see if the sysctl security.jail.socket_unixiproute_only is set. If set, sockets are only allowed to be created if the family specified is either PF_LOCAL, PF_INET or PF_ROUTE. Otherwise, it returns an error.
/usr/src/sys/kern/uipc_socket.c: int socreate(int dom, struct socket **aso, int type, int proto, struct ucred *cred, struct thread *td) { struct protosw *prp; ... if (jailed(cred) && jail_socket_unixiproute_only && prp->pr_domain->dom_family != PF_LOCAL && prp->pr_domain->dom_family != PF_INET && prp->pr_domain->dom_family != PF_ROUTE) { return (EPROTONOSUPPORT); } ... }
The Berkeley Packet Filter provides a raw interface to data link layers in a protocol independent fashion. BPF is now controlled by the devfs(8) whether it can be used in a jailed environment.
There are certain protocols which are very common, such as TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the network layer 2. There are certain precautions which are taken in order to prevent a jailed process from binding a protocol to a certain address only if the nam parameter is set. nam is a pointer to a sockaddr structure, which describes the address on which to bind the service. A more exact definition is that sockaddr "may be used as a template for referring to the identifying tag and length of each address". In the function in_pcbbind_setup(), sin is a pointer to a sockaddr_in structure, which contains the port, address, length and domain family of the socket which is to be bound. Basically, this disallows any processes from jail to be able to specify the address that does not belong to the jail in which the calling process exists.
/usr/src/sys/netinet/in_pcb.c: int in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp, u_short *lportp, struct ucred *cred) { ... struct sockaddr_in *sin; ... if (nam) { sin = (struct sockaddr_in *)nam; ... if (sin->sin_addr.s_addr != INADDR_ANY) if (prison_ip(cred, 0, &sin->sin_addr.s_addr)) return(EINVAL); ... if (lport) { ... if (prison && prison_ip(cred, 0, &sin->sin_addr.s_addr)) return (EADDRNOTAVAIL); ... } } if (lport == 0) { ... if (laddr.s_addr != INADDR_ANY) if (prison_ip(cred, 0, &laddr.s_addr)) return (EINVAL); ... } ... if (prison_ip(cred, 0, &laddr.s_addr)) return (EINVAL); ... }
You might be wondering what function prison_ip() does. prison_ip() is given three arguments, a pointer to the credential(represented by cred), any flags, and an IP address. It returns 1 if the IP address does NOT belong to the jail or 0 otherwise. As you can see from the code, if it is indeed an IP address not belonging to the jail, the protocol is not allowed to bind to that address.
/usr/src/sys/kern/kern_jail.c: int prison_ip(struct ucred *cred, int flag, u_int32_t *ip) { u_int32_t tmp; if (!jailed(cred)) return (0); if (flag) tmp = *ip; else tmp = ntohl(*ip); if (tmp == INADDR_ANY) { if (flag) *ip = cred->cr_prison->pr_ip; else *ip = htonl(cred->cr_prison->pr_ip); return (0); } if (tmp == INADDR_LOOPBACK) { if (flag) *ip = cred->cr_prison->pr_ip; else *ip = htonl(cred->cr_prison->pr_ip); return (0); } if (cred->cr_prison->pr_ip != tmp) return (1); return (0); }
Even root users within the jail are not allowed to unset or modify any file flags, such as immutable, append-only, and undeleteable flags, if the securelevel is greater than 0.
/usr/src/sys/ufs/ufs/ufs_vnops.c: static int ufs_setattr(ap) ... { ... if (!priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0)) { if (ip->i_flags & (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) { error = securelevel_gt(cred, 0); if (error) return (error); } ... } } /usr/src/sys/kern/kern_priv.c int priv_check_cred(struct ucred *cred, int priv, int flags) { ... error = prison_priv_check(cred, priv); if (error) return (error); ... } /usr/src/sys/kern/kern_jail.c int prison_priv_check(struct ucred *cred, int priv) { ... switch (priv) { ... case PRIV_VFS_SYSFLAGS: if (jail_chflags_allowed) return (0); else return (EPERM); ... } ... }