2 The DIGITAL UNIX Scheduler

On a single-processor system, only one process's code is executing at a time. The scheduler decides which process has control of the CPU. It chooses the process to execute based on priority, so the highest-priority runnable process is the one that is executing.

The scheduler has 64 priority levels; every process on the system is at one of these priority levels. The priority level at which a process is allowed to execute, its scheduling interactions with other processes at that level, and if or how it moves between priority levels are determined by its scheduling policy.

DIGITAL UNIX provides two interfaces to the scheduler: the traditional UNIX timesharing interface (nice) and the POSIX 1003.1b realtime execution scheduling interface.

This chapter includes the following sections:

Scheduler Fundamentals, Section 2.1

Scheduling Policies, Section 2.2

Process Priorities, Section 2.3

Scheduling Functions, Section 2.4

Priority and Policy Example, Section 2.5

2.1 Scheduler Fundamentals

The terms and mechanisms needed to understand the DIGITAL UNIX scheduler are explained in the following sections.

2.1.1 Schedulable Entities

The scheduler operates on threads. A thread is a single, sequential flow of control within a process. Within a single thread, there is a single point of execution. Most traditional processes consist of a single thread.

Using DECthreads, DIGITAL's multithreading run-time library, a programmer can create several threads within a process. Threads execute independently, and within a multithreaded process, each thread has its own point of execution.

The scheduler considers all threads on the system and runs the one with the highest priority.

2.1.2 Thread States

Every thread has a state. The thread currently executing in the CPU is in the run state. Threads that are ready to run are in the runnable state. Threads that are waiting for a condition to be satisfied are in the wait state. Examples of conditions a thread may be waiting for are a signal from another process, a timer expiration, or an I/O completion.

The scheduler selects the highest priority thread in the running or runnable state to execute on the CPU. Thus the running thread will always be the one with the highest priority.

2.1.3 Scheduler Database

All runnable threads have entries in the scheduler database. The scheduler database is an array of 64 lists, one list for each priority level.

The scheduler orders the processes on each priority level list by placing the process that should run next at the head of the list, and the process that should wait the longest to run at the tail of the list.

2.1.4 Quantum

Each thread has a value associated with it, known as a quantum, that defines the maximum amount of contiguous CPU time it may use before being forced to yield the CPU to another thread of the same priority.

A thread's quantum is set according to its scheduling policy. The goal of the timesharing policy is to choose a short enough time so that multiple users all think the system is responsive while allowing a long enough time to do useful work. Some realtime policies have an infinite quantum since the work to be done is considered so important that it should not be interrupted by a process of equal priority.

2.1.5 Scheduler Transitions

A new thread is selected to run when one of the following events occurs:

The running process enters a wait state

A higher priority process becomes runnable

A process changes scheduling policy

The quantum of the running process expires

When an event occurs, the scheduler updates the scheduler database. If a thread in the database now has priority higher than that of the currently running thread, the current thread is preempted, placed into the scheduler database, and the highest priority thread is made the running thread. A scheduler that works in this manner is known as a preemptive priority scheduler.

When a thread is placed into a priority list in the scheduler database, it is placed at the tail of the list unless it has just been preempted. If it has just been preempted, the thread's scheduling policy determines whether it is inserted at the head (realtime scheduling policy) or the tail (timeshare scheduling policy).
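The list handling described above can be modelled with a small sketch. This is an illustration only and not the kernel's implementation: the names sched_db, db_init, db_enqueue_tail, db_enqueue_head, and db_pick_next are hypothetical, integer IDs stand in for real thread structures, and the per-list capacity is an arbitrary assumption.

```c
#define NPRI 64   /* one list per priority level, as described above */
#define MAXQ 16   /* hypothetical per-list capacity for this sketch  */

/* Hypothetical model: each priority list is a small array of thread IDs. */
struct sched_db {
    int ids[NPRI][MAXQ];
    int len[NPRI];
};

/* Mark every priority list as empty. */
void db_init(struct sched_db *db)
{
    int pri;

    for (pri = 0; pri < NPRI; pri++)
        db->len[pri] = 0;
}

/* Place a thread at the tail of its priority list (the normal case). */
int db_enqueue_tail(struct sched_db *db, int pri, int id)
{
    if (pri < 0 || pri >= NPRI || db->len[pri] >= MAXQ)
        return -1;
    db->ids[pri][db->len[pri]++] = id;
    return 0;
}

/* Place a thread at the head of its list (a preempted realtime thread). */
int db_enqueue_head(struct sched_db *db, int pri, int id)
{
    int i;

    if (pri < 0 || pri >= NPRI || db->len[pri] >= MAXQ)
        return -1;
    for (i = db->len[pri]; i > 0; i--)
        db->ids[pri][i] = db->ids[pri][i - 1];
    db->ids[pri][0] = id;
    db->len[pri]++;
    return 0;
}

/* Remove and return the head of the highest-priority nonempty list,
   or -1 if every list is empty. A higher index means a higher priority. */
int db_pick_next(struct sched_db *db)
{
    int pri, i, id;

    for (pri = NPRI - 1; pri >= 0; pri--) {
        if (db->len[pri] > 0) {
            id = db->ids[pri][0];
            for (i = 1; i < db->len[pri]; i++)
                db->ids[pri][i - 1] = db->ids[pri][i];
            db->len[pri]--;
            return id;
        }
    }
    return -1;
}
```

In this model, a thread that becomes runnable is added with db_enqueue_tail, a preempted realtime thread is returned with db_enqueue_head, and db_pick_next always yields the thread at the head of the highest-priority nonempty list.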

Figure 2-1 illustrates the general principles of process scheduling.

Figure 2-1: Order of Execution

Processes A, B, and C are in the process list for the highest priority used in this illustration. Process A is at the beginning of the process list for priority 30. That means that process A executes first, then processes B and C, respectively. When no more processes remain in the process list for priority 30, the scheduler looks to the next lowest priority, finds process D at the beginning of the process list, and executes process D.

When a process changes priority, it goes to the end of the process list for its new priority. Figure 2-1 shows process F changing priority from 15 to 30. At priority 15 process F is at the end of the process list. When process F changes to priority 30, the process goes to the end of the process list for priority 30. At priority 30 process F is queued to execute after process C, but before process D.

Figure 2-2 illustrates how processes can change from the running state to the runnable state within the queue for a single priority. In this illustration, processes running under the SCHED_RR scheduling policy move in and out of the running state.

Figure 2-2: Process Events

As processes are selected to run or move from the end to the beginning of the process list, the scheduler continually updates the kernel database and the process list for each priority.

2.2 Scheduling Policies

Whether or not a timesharing process runs is often determined not by the needs of the application, but by the scheduler's algorithm. The scheduler determines the order in which processes execute and sometimes forces resource-intensive processes to yield to other processes.

Other users' activities on the system at that time affect scheduling. Whether or not a realtime process yields to another process can be based on a quantum or the scheduling policy.

2.2.1 The Nature of the Work

Scheduling policies are designed to give you flexibility and control in determining how work is performed so that you can balance the nature of the work with the behavior of the process. Essentially, there are three broad categories of work:

Timesharing Processing
Used for interactive and noninteractive applications with no critical time limits but a need for reasonable response time and high throughput.

System Processing
Used for work on behalf of the system such as paging, networking, and accessing files. The responsiveness of system processing impacts the responsiveness of the whole system.

Realtime Processing
Used for critical work that must be completed within a certain time period, such as data collection or device control. The nature of realtime processing often means that missing a deadline makes the data invalid or causes damage.

To control scheduling policies, you must use P1003.1b realtime scheduling functions and select an appropriate scheduling policy for your process. DIGITAL UNIX P1003.1b scheduling policies are set only through a call to the sched_setscheduler function. The sched_setscheduler function recognizes the scheduling policies by keywords beginning with SCHED_ as follows:

Keyword	Description
SCHED_OTHER	Timesharing scheduling
SCHED_FIFO	First-in first-out scheduling
SCHED_RR	Round-robin scheduling

All three scheduling policies have overlapping priority ranges to allow for maximum flexibility in scheduling. When selecting a priority and scheduling policy for a realtime process, consider the nature of the work performed by the process. Regardless of the scheduling policy, the scheduler selects the process at the beginning of the highest-priority, nonempty process list to become a running process.

2.2.2 Timesharing Scheduling

The P1003.1b timesharing scheduling policy, SCHED_OTHER, allows realtime applications to return to a nonrealtime scheduling policy. In timesharing scheduling, a process starts with an initial priority that either the user or the scheduler can change. Timesharing processes run until the scheduler recalculates process priority, based on the system load, the length of time the process has been running, or the value of nice. Section 2.3.1 describes timesharing priority changes in more detail.

Under the timesharing scheduling policy, the scheduler enforces a quantum. Processes are allowed to run until they are preempted, yield to another process, or finish their quantum. If no equal or higher-priority processes are waiting to run, the executing process is allowed to continue. However, while a process is running, the scheduler changes the process's priority. Over time, it is likely that a higher-priority process will exist because the scheduler adjusts priority. If a process is preempted or yields to another process, it goes to the end of the process list for the new priority.

2.2.3 Fixed-Priority Scheduling

With a fixed-priority scheduling policy, the scheduler does not adjust process priorities. If the application designer sets a process at priority 30, it will always be queued to the priority 30 process list, unless the application or the user explicitly changes the priority.

As with all scheduling policies, fixed-priority scheduling is based on the priorities of all runnable processes. If a process waiting on the process list has a higher priority than the running process, the running process is preempted for the higher-priority process. However, the two fixed-priority scheduling policies (SCHED_FIFO and SCHED_RR) allow greater control over the length of time a process waits to run.

Fixed-priority scheduling relies on the application designer or user to manage the efficiency of process priorities relative to system workloads. For example, you may have a process that must be allowed to finish executing, regardless of other activities. In this case, you may elect to increase the priority of your process and use the first-in first-out scheduling policy, which guarantees that a process will never be placed at the end of the process list if it is preempted. In addition, the process's priority will never be adjusted and it will never be moved to another process list. With fixed-priority scheduling policies, you must explicitly set priorities by calling either the sched_setparam or sched_setscheduler function. Thus, realtime processes using fixed-priority scheduling policies are free to yield execution resources to each other in an application-dependent manner.

If you are using a fixed-priority scheduling policy and you call the nice or renice function to adjust priorities, the function returns without changing the priorities.

2.2.3.1 First-In First-Out Scheduling

The first-in first-out scheduling policy, SCHED_FIFO, gives maximum control to the application. This scheduling policy does not enforce a quantum. Rather, each process runs to completion or until it voluntarily yields or is preempted by a higher-priority process.

Processes scheduled under the first-in first-out scheduling policy are chosen from a process priority list that is ordered according to the amount of time its processes have been on the list without being executed. Under this scheduling policy, the process at the beginning of the highest-priority, nonempty process list is executed first. The next process moves to the beginning of the list and is executed next. Thus execution continues until that priority list is empty. Then the process at the beginning of the next highest-priority, nonempty process list is selected and execution continues. A process runs until execution finishes or the process is preempted by a higher-priority process.

The process at the beginning of a process list has waited at that priority the longest amount of time, while the process at the end of the list has waited the shortest amount of time. Whenever a process becomes runnable, it is placed on the end of a process list and waits until the processes in front of it have executed. When a process is placed in an empty high-priority process list, the process will preempt a lower-priority running process.

If an application changes the priority of a process, the process is removed from its list and placed at the end of the new priority process list.

The following rules determine how runnable processes are queued for execution using the first-in first-out scheduling policy:

When a process is preempted, it goes to the beginning of the process list for its priority.

When a blocked process becomes runnable, it goes to the end of the process list for its priority.

When a running process changes the priority or scheduling policy of another process, the changed process goes to the end of the new priority process list.

When a process voluntarily yields to another process, it goes to the end of the process list for its priority.

The first-in first-out scheduling policy is well suited for the realtime environment because it is deterministic. That is, processes with the highest priority always run, and among processes with equal priorities, the process that has been runnable for the longest period of time is executed first. You can achieve complex scheduling by altering process priorities.

Also, under the first-in first-out scheduling policy, the user can raise the priority of a running process to avoid its being preempted by another process. Therefore, a high-priority, realtime process running under the first-in first-out scheduling policy can use system resources as long as necessary to finish realtime tasks.

2.2.3.2 Round-Robin Scheduling

The round-robin scheduling policy, SCHED_RR, is a logical extension of the first-in first-out scheduling policy. A process running under the round-robin scheduling policy is subject to the same rules as a process running under the first-in first-out scheduling policy, but a quantum is imposed on the running process. When a process finishes its quantum, it goes to the end of the process list for its priority.

Processes under the round-robin scheduling policy may be preempted by a higher-priority process before the quantum has expired. A preempted process goes to the beginning of its priority process list and completes the previously unexpired portion of its quantum when the process resumes execution. This ensures that a preempted process regains control as soon as possible.

Figure 2-3 shows process scheduling using a quantum. One portion of the figure shows the running process; the other portion of the figure shows what happens to running processes over time. Process G is removed from the beginning of the process list, placed in the run queue, and begins execution. Process B, a higher priority process, enters the runnable state while process G is running. The scheduler preempts process G to execute process B. Since process G had more time left in its quantum, the scheduler returns process G to the beginning of the process list, keeps track of the amount of time left in process G's quantum, and executes process B. When process B finishes, process G is again moved into the run queue and finishes its quantum. Process H, next in the process list, executes last.

Figure 2-3: Preemption -- Finishing a Quantum

Round-robin scheduling is designed to provide a facility for implementing time-slice algorithms. You can use the concept of a quantum in combination with process priorities to facilitate time-slicing. You can use the sched_rr_get_interval function to retrieve the quantum used in round-robin scheduling. If a process, running under the round-robin scheduling policy, runs without blocking or yielding for more than this amount of time, it may be preempted by another runnable process at the same priority.
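As a minimal sketch of retrieving the quantum, the following helper calls sched_rr_get_interval for the calling process and converts the result to nanoseconds. The helper name rr_quantum_ns is hypothetical, and the value reported for a process that is not running under SCHED_RR may vary between systems.

```c
#include <sched.h>
#include <time.h>

/* Return the round-robin quantum reported for the calling process,
   in nanoseconds, or -1 if sched_rr_get_interval fails.
   A pid of zero refers to the calling process. */
long long rr_quantum_ns(void)
{
    struct timespec ts;

    if (sched_rr_get_interval(0, &ts) == -1)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

An application might call this helper once during initialization and use the result to size the units of work it performs between voluntary yields.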

2.3 Process Priorities

All applications are given an initial priority, either implicitly by the operating system or explicitly by the user. If you fail to specify a priority for a process, the kernel assigns the process an initial priority.

You can specify and manage a process's priority using either nice or P1003.1b functions. The nice functions are useful for managing priorities for nonrealtime, timesharing applications. However, realtime priorities are higher than the nice priorities and make use of the P1003.1b scheduling policies. Realtime priorities can be managed only by using the associated P1003.1b functions.

In general, process scheduling is based on the concept that tasks can be prioritized, either by the user or by the scheduler. Each process table entry contains a priority field used in process scheduling. Conceptually, each priority level consists of a process list. The process list is ordered with the process that should run first at the beginning of the list and the process that should run last at the end of the list. Since a single processor can execute only one process at a time, the scheduler selects the first process at the beginning of the highest priority, nonempty process list for execution.

Priority levels are organized in ranges. The nonprivileged user application runs in the same range as most applications using the timesharing scheduling policy. Most users need not concern themselves with priority ranges above this range. Privileged applications (system or realtime) use higher priorities than nonprivileged user applications. In some instances, realtime and system processes can share priorities, but most realtime applications will run in a priority range that is higher than the system range.

2.3.1 Priorities for the nice Interface

The nice interface priorities are divided into two ranges: the higher range is reserved for the operating system, and the lower range for nonprivileged user processes. With the nice interface, priorities range from 20 through -20, where 20 is the lowest priority. Nonprivileged user processes typically run in the 20 through 0 range. Many system processes run in the range 0 through -20. Table 2-1 shows the nice interface priority ranges.

Table 2-1: Priority Ranges for the nice Interface

Range	Priority Level
Nonprivileged user	20 through 0
System	0 through -20

A numerically low value implies a high priority level. For example, a process with a priority of 5 has a lower priority than a process with a priority of 0. Similarly, a system process with a priority of -5 has a lower priority than a process with a priority of -15. System processes can run at nonprivileged user priorities, but a user process can only increase its priority into the system range if the owner of the user process has superuser privileges.

Processes start at the default base priority for a nonprivileged user process (0). Since the only scheduling policy supported by the nice interface is timesharing, the priority of a process changes during execution. That is, the nice parameter represents the highest priority possible for a process. As the process runs, the scheduler adds offsets to the initial priority, adjusting the process's priority downward from or upward toward the initial priority. However, the priority will not exceed (be numerically lower than) the nice value.

The nice interface supports relative priority changes by the user through a call to the nice, renice, or setpriority functions. Interactive users can specify a base priority at the start of application execution using the nice command. The renice command allows users to interactively change the priority of a running process. An application can read a process's priority by calling the getpriority function. Then the application can change a process's priority by calling the setpriority function. These functions are useful for nonrealtime applications but do not affect processes running under one of the P1003.1b fixed-priority scheduling policies described in Section 2.2.

Refer to the reference pages for more information on the getpriority, setpriority, nice, and renice functions.
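The following sketch shows one way to read and then lower the priority of the calling process through the nice interface. The helper names read_nice and lower_priority are hypothetical. Because getpriority can legitimately return -1, the sketch clears errno before the call and checks it afterward.

```c
#include <errno.h>
#include <sys/resource.h>

/* Read the nice value of the calling process into *value.
   Returns 0 on success, -1 on error. */
int read_nice(int *value)
{
    int n;

    errno = 0;
    n = getpriority(PRIO_PROCESS, 0);
    if (n == -1 && errno != 0)
        return -1;
    *value = n;
    return 0;
}

/* Lower the priority of the calling process by raising its nice value
   by delta, capped at 20 (the lowest priority in the range given above).
   Lowering priority does not require superuser privileges. */
int lower_priority(int delta)
{
    int n;

    if (delta < 0 || read_nice(&n) == -1)
        return -1;
    n += delta;
    if (n > 20)
        n = 20;
    return setpriority(PRIO_PROCESS, 0, n);
}
```

Raising the priority again (moving the nice value back toward 0 or below) generally requires appropriate privileges, so a nonprivileged application should lower its priority only when it does not expect to need it back.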

2.3.2 Priorities for the Realtime Interface

Realtime interface priorities are divided into three ranges: the highest range is reserved for realtime, the middle range is used by the operating system, and the low range is used for nonprivileged user processes. DIGITAL UNIX realtime priorities loosely map to the nice priority range, but provide a wider range of priorities. Processes using the P1003.1b scheduling policies must also use the DIGITAL UNIX realtime interface priority scheme. Table 2-2 shows the DIGITAL UNIX realtime priority ranges.

Table 2-2: Priority Ranges for the DIGITAL UNIX Realtime Interface

Range	Priority Level
Nonprivileged user	SCHED_PRIO_USER_MIN through SCHED_PRIO_USER_MAX
System	SCHED_PRIO_SYSTEM_MIN through SCHED_PRIO_SYSTEM_MAX
Realtime	SCHED_PRIO_RT_MIN through SCHED_PRIO_RT_MAX

Realtime interface priority levels are the inverse of the nice priority levels; a numerically high value implies a high priority level. A realtime process with a priority of 32 has a higher priority than system processes, but a lower priority than another realtime process with a priority of 45. Realtime and system processes can run at nonprivileged user priorities, but a nonprivileged user process cannot increase its priority into the system or realtime range without superuser privileges.

The default initial priority for processes using realtime priorities is 19. The default scheduling policy is timesharing.

Figure 2-4 illustrates the relationship between these two priority interfaces.

Figure 2-4: Priority Ranges for the nice and Realtime Interfaces

Note that hardware interrupts are unaffected by process priorities, even the highest realtime priority.

DIGITAL UNIX does not support priority inheritance between processes. Keep this in mind when you assign priorities so that you avoid priority inversion. Priority inversion takes place when a higher priority process is blocked by the effects of a lower priority process.

For example, a client program running at a priority of 60 (realtime priority) blocks while waiting for the receipt of data. This allows a loop program to run at the lower priority of 40 (also realtime priority), but the network thread that dequeues the network packets is running at a system priority of 30. The loop program blocks the network thread, which in turn blocks the higher priority client process which is still waiting for the receipt of data.

In this case, the inversion may be resolved by running the network thread at a higher priority than the loop program. When running realtime processes at the exclusive realtime priority level, it is important to ensure that the processes give up the CPU in order for normal system processes to run.

2.3.3 Displaying Realtime Priorities

The ps command displays current process status and can be used to give realtime users snapshots of process priorities. Realtime users can use POSIX realtime functions to change process priority. Therefore, the ps command is a useful tool for determining if realtime processes are running at the expected priority.

The ps command captures the states of processes, but the time required to capture and display the data from the ps command may result in some minor discrepancies.

Priorities used in the realtime scheduling interface are displayed when you use the specifier psxpri in conjunction with the -o or -O switch on the ps command. Fields in the output format include the process ID (PID), POSIX scheduling priority (PPR), the state of the process (S), control terminal of the process (TTY), CPU time used by the process (TIME), and the process command (COMMAND).

The following example shows information regarding processes, with or without terminals, and displays timesharing and POSIX priorities. Note that the display indicates that the ps command is also running.

% ps -aeO psxpri
  PID PPR S    TTY             TIME COMMAND
    0  31 R <  ??          16:52:49 kernel idle
    1  19 I    ??          28:28.03 init
    7  19 I    ??           0:02.72 kloadsrv
   11  19 I    ??           0:00.94 dxterm
      
.
.
.
14737  60 S<   p2           0:00.01 ./tests/work
13848  15 R    ttyv3        0:01.12 ps

In the example above, two processes are using realtime priorities. The first process (PID 0) is running at maximum system priority. The second realtime process (PID 14737) has been sleeping for less than twenty seconds at priority 60. The processes with PIDs 1, 7, and 11 are idle at the maximum user priority.

For more information, see the reference page for the ps command.

2.3.4 Configuring Realtime Priorities

You should assign realtime priorities according to the critical nature of the work the processes perform. Some applications may not need to have all processes running in the realtime priority range. Applications that run in a realtime range for long periods may prevent the system from performing necessary services, which could cause network and device timeouts or data overruns. Some processes perform adequately if they run under a fixed-priority scheduling policy at priority 19. Only critical processes running under a fixed-priority scheduling policy should run with priorities in the realtime range, 32 through 63.

Although P1003.1b functions let you change the scheduling policy while your application is running, it is better to select a scheduling policy during application initialization than to change the scheduling policy while the application executes. However, you may find it necessary to adjust priorities within a scheduling policy as the application executes.

It is recommended that all realtime applications provide a way to configure priorities at runtime. You can configure priorities using the following methods:

Providing a default priority within the realtime priority range by calling the sched_get_priority_max and sched_get_priority_min functions

Using a .rc initialization file, which overrides the default priority, or using environment variables, which override the default priority

Adjusting priority during initialization by calling the sched_setparam function

Each process should have a default base priority appropriate for the kind of work it performs and each process should provide a configuration mechanism for changing that base priority. To simplify system management, make the hardcoded default equal to the highest priority used by the application. At initialization, the application should set its process priorities by subtracting from the base priority. Use the constants given in the sched.h header file as a guide for establishing your default priorities.

The sched.h header file provides the following constants that may be useful in determining the optimum default priority:

        SCHED_PRIO_USER_MIN
        SCHED_PRIO_USER_MAX
        SCHED_PRIO_SYSTEM_MIN
        SCHED_PRIO_SYSTEM_MAX
        SCHED_PRIO_RT_MIN
        SCHED_PRIO_RT_MAX

These values are the current values for default priorities. When coding your application, use the constants rather than numerical values. The resulting application will be easier to maintain should default values change.
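The configuration approach described above might be sketched as follows: start from a compiled-in default, allow an environment variable to override it, and clamp the result to the range the scheduling policy permits. The function name choose_priority and any environment variable name passed to it are hypothetical. The sketch uses sched_get_priority_min and sched_get_priority_max rather than the SCHED_PRIO_ constants so that it does not depend on their values.

```c
#include <sched.h>
#include <stdlib.h>

/* Choose a priority for `policy`. Start from default_pri, let the
   environment variable named env_name (a hypothetical, application-defined
   name) override it, and clamp the result to the policy's valid range.
   Returns -1 if the limits cannot be determined. */
int choose_priority(int policy, int default_pri, const char *env_name)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    int pri = default_pri;
    const char *s;
    char *end;
    long v;

    if (lo == -1 || hi == -1)
        return -1;

    s = getenv(env_name);
    if (s != NULL && *s != '\0') {
        v = strtol(s, &end, 10);
        /* Accept the override only if it is a clean, in-range number. */
        if (*end == '\0' && v >= lo && v <= hi)
            pri = (int)v;
    }

    if (pri < lo)
        pri = lo;
    if (pri > hi)
        pri = hi;
    return pri;
}
```

An application could call choose_priority(SCHED_FIFO, default_value, "MYAPP_PRIORITY") during initialization, where MYAPP_PRIORITY is an illustrative variable name, and pass the result to sched_setscheduler or sched_setparam.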

Debug your application in the nonprivileged user priority range before running the application in the realtime range. If a realtime process is running at a level higher than kernel processes and the realtime process goes into an infinite loop, you must reboot the system to stop process execution.

Although priority levels for DIGITAL UNIX system priorities can be adjusted using the nice or renice functions, these functions have a ceiling that is below the realtime priority range. To adjust realtime priorities, use the sched_getparam and sched_setparam P1003.1b functions, discussed in Section 2.4.3. You should adjust process priorities for your own application only. Adjusting system process priorities could have unexpected consequences.

2.4 Scheduling Functions

Realtime processes must be able to select the most appropriate priority level and scheduling policy dynamically. A realtime application often modifies the scheduling policy and priority of a process, performs some function, and returns the process to its previous priority. Realtime processes must also be able to yield system resources to each other in response to specified conditions. The following P1003.1b functions satisfy these realtime requirements:

Function	Description
`sched_getscheduler`	Returns the scheduling policy of a specified process
`sched_getparam`	Returns the scheduling priority of a specified process
`sched_get_priority_max`	Returns the maximum priority allowed for a scheduling policy
`sched_get_priority_min`	Returns the minimum priority allowed for a scheduling policy
`sched_rr_get_interval`	Returns the current quantum for the round-robin scheduling policy
`sched_setscheduler`	Sets the scheduling policy and priority of a specified process
`sched_setparam`	Sets the scheduling priority of a specified process
`sched_yield`	Yields execution to another process

Refer to the reference pages for a complete description of these functions.

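Most of the preceding functions (sched_getscheduler, sched_getparam, sched_rr_get_interval, sched_setscheduler, and sched_setparam) require a process ID parameter (pid). The sched_get_priority_max and sched_get_priority_min functions take a scheduling policy instead, and the sched_yield function takes no arguments. In all P1003.1b priority and scheduling functions that accept a pid, a value of zero indicates that the function call refers to the calling process. Use zero in these calls to eliminate using the getpid or getppid functions.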

The priority and scheduling policy of a process are inherited across a fork or exec system call.

Changing the priority or scheduling policy of a process causes the process to be queued to the end of the process list for its new priority. You must have superuser privileges to change the realtime priorities or scheduling policies of a process.
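To illustrate inheritance across fork, the following sketch creates a child process and has the child compare its own scheduling policy and priority with the values the parent read before the fork. The function name child_inherits_sched is hypothetical. The sketch only reads scheduling attributes, so it does not require superuser privileges.

```c
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and report whether it sees the same scheduling policy
   and priority as the parent.
   Returns 1 if the settings match, 0 if they differ, -1 on error. */
int child_inherits_sched(void)
{
    struct sched_param parent, child;
    int parent_policy, status;
    pid_t pid;

    parent_policy = sched_getscheduler(0);
    if (parent_policy == -1 || sched_getparam(0, &parent) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: compare its own settings with the parent's copies. */
        int same = sched_getscheduler(0) == parent_policy
                && sched_getparam(0, &child) == 0
                && child.sched_priority == parent.sched_priority;
        _exit(same ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Because the child starts with the parent's policy and priority, a realtime application that forks helper processes should reset their scheduling attributes explicitly if the helpers are not time critical.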

2.4.1 Determining Limits

Three functions allow you to determine scheduling policy parameter limits. The sched_get_priority_max and sched_get_priority_min functions return the appropriate maximum or minimum priority permitted by the scheduling policy. These functions can be used with any of the P1003.1b scheduling policies: first-in first-out, round-robin, or timesharing. You must specify one of the following keywords when using these functions:

SCHED_FIFO

SCHED_RR

SCHED_OTHER

The sched_rr_get_interval function returns the current quantum for process execution under the round-robin scheduling policy.
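A minimal sketch of querying the limits for one policy follows. The structure policy_limits and the function get_policy_limits are hypothetical names introduced for illustration.

```c
#include <sched.h>

/* Hypothetical container for the priority range of one policy. */
struct policy_limits {
    int min;
    int max;
};

/* Fill *out with the minimum and maximum priority permitted for
   `policy` (SCHED_FIFO, SCHED_RR, or SCHED_OTHER).
   Returns 0 on success, -1 if either limit cannot be determined. */
int get_policy_limits(int policy, struct policy_limits *out)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    out->min = lo;
    out->max = hi;
    return 0;
}
```

Checking a requested priority against these limits before calling sched_setparam or sched_setscheduler lets an application report a configuration error instead of failing inside the scheduling call.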

2.4.2 Retrieving the Priority and Scheduling Policy

Two functions, sched_getparam and sched_getscheduler, return the priority and scheduling policy of a process, respectively. You do not need special privileges to use these functions, but you need superuser privileges to set priority or scheduling policy.

If the pid is zero for either function, the value returned is the priority or scheduling policy for the calling process. The values returned by a call to the sched_getscheduler function indicate whether the scheduling policy is SCHED_FIFO, SCHED_RR, or SCHED_OTHER.
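The following sketch retrieves both values for the calling process and maps the returned policy to its keyword. The helper names policy_name and current_sched are hypothetical. The sched_param structure is cleared first because it may contain fields other than sched_priority.

```c
#include <sched.h>
#include <string.h>

/* Map a policy value returned by sched_getscheduler to its keyword. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "unknown";
    }
}

/* Fetch the scheduling policy and priority of the calling process
   (pid zero). Returns 0 on success, -1 on error. */
int current_sched(int *policy, int *priority)
{
    struct sched_param param;
    int pol;

    memset(&param, 0, sizeof param);
    pol = sched_getscheduler(0);
    if (pol == -1)
        return -1;
    if (sched_getparam(0, &param) == -1)
        return -1;
    *policy = pol;
    *priority = param.sched_priority;
    return 0;
}
```

An application can save the values returned by current_sched before it changes its scheduling attributes and use them later to restore the original policy and priority.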

2.4.3 Setting the Priority and Scheduling Policy

Use the sched_getparam function to determine the initial priority of a process; use the sched_setparam function to establish a new priority. Adjusting priority levels in response to predicted system loads and other external factors allows the system administrator or application user greater control over system resources. When used in conjunction with the first-in first-out scheduling policy, the sched_setparam function allows a critical process to run as soon as it is runnable, for as long as it needs to run. This occurs because the process preempts other lower-priority processes. This can be important in situations where scheduling a process must be as precise as possible.

The sched_setparam function takes two parameters: pid and param. The pid parameter specifies the process to change. If the pid parameter is zero, the priority is set for the calling process. The param parameter specifies the new priority level, which must be within the minimum and maximum values for the scheduling policy selected for the process.

The sched_setscheduler function sets both the scheduling policy and priority of a process. Three parameters are required for the sched_setscheduler function: pid, policy, and param. If the pid parameter is zero, the scheduling policy and priority will be set for the calling process. The policy parameter identifies whether the scheduling policy is to be set to SCHED_FIFO, SCHED_RR, or SCHED_OTHER. The param parameter indicates the priority level to be set and must be within the range for the indicated scheduling policy.
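Because the priority must fall within the range for the requested policy, it can be useful to check the range before attempting the change. The following is a minimal sketch under that assumption; the helper name set_policy_checked is hypothetical, and setting errno to EINVAL for an out-of-range value is a choice made for this illustration.

```c
#include <sys/types.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: validate prio against the limits for policy, */
/* then set both the policy and the priority of the process.         */
/* A pid of zero selects the calling process.                        */
/* Returns -1 on failure, with errno describing the reason.          */
int set_policy_checked(pid_t pid, int policy, int prio)
{
    struct sched_param param;
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;              /* unknown policy; errno already set */

    if (prio < lo || prio > hi) {
        errno = EINVAL;         /* reject before calling the scheduler */
        return -1;
    }

    /* Fill in every field of the structure, not just the priority. */
    if (sched_getparam(pid, &param) == -1)
        return -1;

    param.sched_priority = prio;
    return sched_setscheduler(pid, policy, &param);
}
```

A call such as set_policy_checked(0, SCHED_FIFO, prio) still fails if the caller lacks the privileges required to change its scheduling policy; the range check only catches invalid priority values early.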

Notification of a completed priority change may be delayed if the calling process has been preempted. The calling process is notified when it is again scheduled to run.

If you are designing portable applications (strictly conforming POSIX applications), be careful not to assume that the priority field is the only field in the sched_param structure. Initialize all the fields in a sched_param structure before passing the structure as the param argument to the sched_setparam or sched_setscheduler function. Example 2-1 shows how a process can initialize the fields using only constructs provided by the P1003.1b standard.

Example 2-1: Initializing Priority and Scheduling Policy Fields

/* Change to the SCHED_FIFO policy and the highest priority, then  */
/* lowest priority, then back to the original policy and priority. */

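#include <stdio.h>      /* perror */
#include <stdlib.h>     /* exit   */
#include <unistd.h>
#include <sched.h>

/* Report the failing call and exit if a function returned -1. */
#define CHECK(sts,msg)      \
  do {                      \
    if ((sts) == -1) {      \
      perror(msg);          \
      exit(EXIT_FAILURE);   \
    }                       \
  } while (0)

int main(void)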
{
  struct sched_param param;
  int my_pid = 0;
  int old_policy, old_priority;
  int sts;
  int low_priority, high_priority;

       /* Get parameters to use later.  Do this now to   */
       /* avoid overhead during time-critical phases.    */

  high_priority = sched_get_priority_max(SCHED_FIFO);
  CHECK(high_priority,"sched_get_priority_max");
  low_priority = sched_get_priority_min(SCHED_FIFO);
  CHECK(low_priority,"sched_get_priority_min");

       /* Save the old policy for when it is restored. */

  old_policy = sched_getscheduler(my_pid);
  CHECK(old_policy,"sched_getscheduler");

       /* Get all fields of the param structure.  This is where */
       /* fields other than priority get filled in.             */

  sts = sched_getparam(my_pid, &param);
  CHECK(sts,"sched_getparam");

       /* Keep track of the old priority. */

  old_priority = param.sched_priority;

       /* Change to SCHED_FIFO, highest priority.  The param   */
       /* fields other than priority get used here.            */

  param.sched_priority = high_priority;
  sts = sched_setscheduler(my_pid, SCHED_FIFO, &param);
  CHECK(sts,"sched_setscheduler");

       /* Change to SCHED_FIFO, lowest priority.  The param */
       /* fields other than priority get used here, too.    */

  param.sched_priority = low_priority;
  sts = sched_setparam(my_pid, &param);
  CHECK(sts,"sched_setparam");

       /* Restore original policy, parameters.  Again, other  */
       /* param fields are used here.                         */

  param.sched_priority = old_priority;
  sts = sched_setscheduler(my_pid, old_policy, &param);
  CHECK(sts,"sched_setscheduler 2");

  exit(0);
}

A process is allowed to change the priority of another process only if the target process runs on the same node as the calling process and at least one of the following conditions is true:

The calling process is a privileged process with a real or effective UID of zero.

The real or effective user ID (UID) of the calling process is equal to the real or saved-set user ID of the target process.

The real or effective group ID (GID) of the calling process is equal to the real or saved-set group ID of the target process, and the calling process has group privilege.

Before changing the priority of another process, determine which UID is running the application. Use the getuid system call to determine the real UID associated with a process.
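The conditions above can be expressed as a simple predicate. The following sketch is for illustration only: the system performs the real check, the proc_ids structure and the may_change_priority function are hypothetical, and the same-node requirement is not modeled. The caller's IDs can be obtained with getuid, geteuid, getgid, and getegid; how the target's IDs are obtained is system-specific and is left to the caller.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of the IDs relevant to the permission check. */
struct proc_ids {
    uid_t ruid, euid, suid;     /* real, effective, saved-set UID */
    gid_t rgid, egid, sgid;     /* real, effective, saved-set GID */
};

/* Hypothetical predicate: returns 1 if the rules listed above    */
/* would allow the caller to change the target's priority, else 0. */
int may_change_priority(const struct proc_ids *caller,
                        const struct proc_ids *target,
                        int caller_has_group_priv)
{
    /* Privileged process: real or effective UID of zero. */
    if (caller->ruid == 0 || caller->euid == 0)
        return 1;

    /* Caller's real or effective UID matches the target's */
    /* real or saved-set UID.                              */
    if (caller->ruid == target->ruid || caller->ruid == target->suid ||
        caller->euid == target->ruid || caller->euid == target->suid)
        return 1;

    /* Same comparison for GIDs, but only with group privilege. */
    if (caller_has_group_priv &&
        (caller->rgid == target->rgid || caller->rgid == target->sgid ||
         caller->egid == target->rgid || caller->egid == target->sgid))
        return 1;

    return 0;
}
```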

2.4.4 Yielding to Another Process

Sometimes, in the interest of cooperation, a running process needs to give up the CPU to another process at the same priority level. Calling the sched_yield function causes the scheduler to look for another process at the same priority level to run and forces the caller to return to the runnable state. The process that calls the sched_yield function resumes execution after all runnable processes of equal priority have been scheduled to run. If there are no other runnable processes at that priority, the caller continues to run.

The sched_yield function causes the process to yield for one cycle through the process list. That is, after a call to sched_yield, the calling process goes to the end of the process list for its priority. If another process of equal priority is created after the call to sched_yield, the new process is queued after the yielding process.

The sched_yield function is most useful with the first-in first-out scheduling policy. Since the round-robin scheduling policy imposes a quantum on the amount of time a process runs, there is less need to use sched_yield. The round-robin quantum regulates the use of system resources through time-slicing. The sched_yield function is also useful when a process does not have permission to set its priority but still needs to yield execution.
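A process running under the first-in first-out policy can cooperate with its peers by yielding between units of work. The following is a minimal sketch of that pattern; the helper name run_cooperatively and the callback-based interface are hypothetical.

```c
#include <stddef.h>
#include <sched.h>

/* Hypothetical helper: perform `chunks` units of work, yielding   */
/* the processor after each unit so that other runnable processes  */
/* at the same priority get a turn.  The work callback may be NULL, */
/* in which case the function only yields.                         */
/* Returns the number of units completed, or -1 if a yield fails.  */
int run_cooperatively(int chunks, void (*work)(int chunk))
{
    int i;

    for (i = 0; i < chunks; i++) {
        if (work != NULL)
            work(i);            /* one bounded unit of work */

        /* Move to the end of the list for this priority level. */
        if (sched_yield() == -1)
            return -1;
    }
    return chunks;
}
```

If no other process at the same priority is runnable, each call to sched_yield returns immediately and the loop proceeds to the next unit of work.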

2.5 Priority and Policy Example

Example 2-2 shows how to determine the length of a round-robin quantum, save the current scheduling parameters, and set a realtime priority. Running under the round-robin scheduling policy, the example performs a CPU-intensive test ten times and calls the sched_yield function after each pass so that other processes at the same priority can run. It then restores the original policy and priority.

Example 2-2: Using Priority and Scheduling Functions

#include <stdio.h>      /* printf, perror */
#include <stdlib.h>     /* exit           */
#include <unistd.h>
#include <time.h>
#include <sched.h>

#define LOOP_MAX 10000000

/* Report the failing call and exit if a function returned -1. */
#define CHECK_STAT(stat, msg)  \
     do {                      \
       if ((stat) == -1) {     \
         perror(msg);          \
         exit(EXIT_FAILURE);   \
       }                       \
     } while (0)

int main(void)
{
     struct sched_param my_param;
     int      my_pid = 0;
     int      old_priority, old_policy;
     int      stat;

     struct timespec rr_interval;
     int      try_cnt, loop_cnt;
     volatile int tmp_nbr = 0;

      /* Determine the round-robin quantum */

     stat = sched_rr_get_interval(my_pid, &rr_interval);
     CHECK_STAT(stat, "sched_rr_get_interval");
     printf("Round-robin quantum is %ld seconds, %ld nanoseconds\n",
          (long)rr_interval.tv_sec, rr_interval.tv_nsec);

      /* Save the current scheduling parameters */

     old_policy = sched_getscheduler(my_pid);
     CHECK_STAT(old_policy, "sched_getscheduler - save old policy");
     stat = sched_getparam(my_pid, &my_param);
     CHECK_STAT(stat, "sched_getparam - save old priority");
     old_priority = my_param.sched_priority;

      /* Set a realtime priority and round-robin */
      /* scheduling policy */

     my_param.sched_priority = SCHED_PRIO_RT_MIN;
     stat = sched_setscheduler(my_pid, SCHED_RR, &my_param);
     CHECK_STAT(stat, "sched_setscheduler - set rr priority");

      /* Try the test */

     for (try_cnt = 0; try_cnt < 10; try_cnt++) {

           /* Perform some CPU-intensive operations */

          for (loop_cnt = 0; loop_cnt < LOOP_MAX; loop_cnt++) {
               tmp_nbr += loop_cnt;
               tmp_nbr -= loop_cnt;
          }

          printf("Completed test %d\n", try_cnt);

           /* Let other processes at this priority run */

          sched_yield();
     }

      /* Lower priority and restore policy */

     my_param.sched_priority = old_priority;
     stat = sched_setscheduler(my_pid, old_policy, &my_param);
     CHECK_STAT(stat, "sched_setscheduler - to old priority");

     return 0;
}