Processes in Linux
A Linux process is an instance of a
program running on a Linux system. A process that runs in the background for a
long time, usually without a terminal, is called a daemon, and many
distributions also refer to such a process as a service.
When you start a
program or run an application in Linux, the system creates a process to execute it.
Every process, whether it runs in the foreground or in the background, uses
memory and CPU resources. That's why we need to manage Linux processes. Keeping
unused processes running in the system is a waste of resources and can also expose your
system to security threats.
In this tutorial we are going to learn
and practice Linux process management using the ps, pstree and top commands.
In Linux, every running
process or daemon is given an identification number called a PID (Process ID). A PID
is unique among the processes running at any given time. We can terminate an unused
program by sending a signal to its PID.
In order to manage
Linux processes, we need to identify some process information, such as which user owns
the process, which terminal the process is running from and what command
was used to start it.
We can look up a
particular Linux process by one of these characteristics. The command used to
check the ID of a running process on a Linux system is the ps
command.
1. ps (Processes)
Processes
are the programs that are running on your machine. They are managed by the
kernel and each process has an ID associated with it called the process ID
(PID). This PID is assigned in the order that processes are created.
Go
ahead and run the ps command to see a list of running processes:
$ ps
PID TTY STAT TIME CMD
41230 pts/4 Ss 00:00:00 bash
51224 pts/4 R+ 00:00:00 ps
This shows you a quick snapshot of the current processes:
- PID: Process ID
- TTY: Controlling terminal associated with the process (we'll go in detail about this later)
- STAT: Process status code
- TIME: Total CPU usage time
- CMD: Name of executable/command
If you look at the man page for ps, you'll see that there are lots of options you can pass. They vary depending on which option style you use: BSD, GNU or Unix. In my opinion the BSD style is the more popular one, so we're gonna go with that. If you are curious, the difference between the styles is the number of dashes you use and the flags: BSD options take no dash, Unix options take a single dash and GNU long options take two.
$ ps aux
The a displays processes from all users, not just your own. The u shows more details about the processes. And finally the x lists processes that don't have a TTY associated with them. These programs show a ? in the TTY field and are most commonly daemon processes that launch as part of the system startup.
You'll notice you're
seeing a lot more fields now. No need to memorize them all; in a later course
on advanced processes, we'll go over some of these again:
- USER: The effective user (the one whose access we are using)
- PID: Process ID
- %CPU: CPU time used divided by the time the process has been running
- %MEM: Ratio of the process's resident set size to the physical memory on the machine
- VSZ: Virtual memory usage of the entire process
- RSS: Resident set size, the non-swapped physical memory that a task has used
- TTY: Controlling terminal associated with the process
- STAT: Process status code
- START: Start time of the process
- TIME: Total CPU usage time
- COMMAND: Name of executable/command
The ps output can get a little messy to look at. For now, the fields we will look at the most are PID, STAT and COMMAND.
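To focus on just those three fields, the ps aux output can be piped through a small filter. This is only a sketch and not part of ps itself: pick_fields is a hypothetical helper name, and it assumes the usual 11-column ps aux layout listed above, where COMMAND is the last column and may contain spaces.

```shell
# pick_fields is a hypothetical helper: it reads `ps aux`-style lines on stdin
# and prints only the PID, STAT and COMMAND columns, separated by tabs.
# Assumed layout: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
pick_fields() {
  awk '{
    cmd = $11
    # COMMAND may contain spaces, so glue the remaining columns back together
    for (i = 12; i <= NF; i++) cmd = cmd " " $i
    printf "%s\t%s\t%s\n", $2, $8, cmd
  }'
}

# Show the header line plus the first few processes
ps aux | pick_fields | sed -n '1,5p'
```

The first line printed is the header (PID, STAT, COMMAND), followed by one line per process.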
Another very useful
command is top. Instead of a snapshot, top gives you real-time information about the
processes running on your system. By default the display refreshes every few
seconds (3 seconds on many systems), and you can change the interval with the -d
option. Top is an extremely useful tool to see which processes
are taking up a lot of your resources.
$ top
2. Controlling Terminal
We
discussed how there is a TTY field in the ps output. The TTY is the terminal
that executed the command.
There are two types of terminals: regular terminal devices and pseudo terminal devices. A regular terminal device is a native terminal device that you can type into and that sends output to your system. This sounds like the terminal application you've been launching to get to your shell, but it's not.
We're gonna segue so you can see this in action. Go ahead and type Ctrl-Alt-F1 to get into TTY1 (the first virtual console). You'll notice that you don't have anything except the terminal: no graphics, etc. This is considered a regular terminal device; you can exit this with Ctrl-Alt-F7.
A pseudo terminal is what you've been used to working in. Pseudo terminals emulate a terminal inside the shell terminal window and are denoted by PTS. If you look at ps again, you'll see your shell process under pts/*.
Ok, now circling back to the controlling terminal, processes are usually bound to a controlling terminal. For example, if you were running a program on your shell window such as find and you closed the window, your process would also go with it.
There are also daemon processes, which are special processes that essentially keep the system running. They often start at system boot and usually get terminated when the system is shut down. They run in the background, and since we don't want these special processes to get terminated, they are not bound to a controlling terminal. In the ps output, their TTY is listed as a ?, meaning the process does not have a controlling terminal.
3. Process Details
Before we get into more
practical applications of processes, we have to first understand what they are
and how they work. This part can get confusing since we are diving into the
nitty gritty, so feel free to come back to this lesson if you don't want to learn
about it now.
A process, like we said
before, is a running program on the system. More precisely, it's the system
allocating memory, CPU and I/O to make the program run. A process is an instance
of a running program. Go ahead and open 3 terminal windows. In two of the windows, run
the cat command without passing any options (cat will stay open as
a process because it expects input on stdin). Now in the third window run: ps aux | grep
cat. You'll see that there are two processes for cat, even though they are running
the same program (the grep command itself may also show up in the list).
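The same experiment can be scripted without opening extra windows. The sketch below uses sleep instead of cat (an assumption made so that nothing waits on stdin), starts it twice in the background, and reads each PID from the shell's $! variable.

```shell
# Start the same program twice; each run becomes its own process.
sleep 30 &
first_pid=$!      # $! holds the PID of the most recent background command
sleep 30 &
second_pid=$!

echo "first sleep:  PID $first_pid"
echo "second sleep: PID $second_pid"

# Clean up both processes so they do not linger for 30 seconds
kill "$first_pid" "$second_pid"
```

One program, two processes, two different PIDs.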
The kernel is in charge
of processes, when we run a program the kernel loads up the code of the program
in memory, determines and allocates resources and then keeps tabs on each
process, it knows:
- The status of the process
- The resources the process is using and receives
- The process owner
- Signal handling (more on that later)
- And basically everything else
All processes are
trying to get a taste of that sweet resource pie, it's the kernel's job to make
sure that processes get the right amount of resources depending on process
demands. When a process ends, the resources it used are now freed up for other
processes.
4. Process Creation
Again
this lesson and the next are purely information to let you see what's under the
hood, feel free to circle back to this once you've worked with processes a bit
more.
When a new process is created, an existing process basically clones itself using something called the fork system call (system calls will be discussed very far into the future). The fork system call creates a mostly identical child process. This child process takes on a new process ID (PID), and the original process becomes its parent process; the child records the parent's PID as its parent process ID (PPID). Afterwards, the child process can either continue to use the same program its parent was using or, more often, use the execve system call to launch a new program. This system call discards the memory layout that the kernel set up for that process and sets up a new one for the new program.
We can see this in action:
$ ps l
The l option gives us a "long format" or even more detailed view of our running processes. You'll see a column labelled PPID, this is the parent ID. Now look at your terminal, you'll see a process running that is your shell, so on my system I have a process running bash. Now remember when you ran the ps l command, you were running it from the process that was running bash. Now you'll see that the PID of the bash shell is the PPID of the ps l command.
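The same parent/child relationship can be checked from a script. This is a small sketch using the shell's built-in $$ (its own PID) and $PPID (its parent's PID) variables; the names parent_pid and child_ppid are only illustrative.

```shell
# An outer shell prints its own PID, then starts an inner shell
# that prints the PID of its parent.
pids=$(sh -c '
  echo "$$"              # PID of the outer shell
  sh -c "echo \$PPID"    # inner shell prints the PID of its parent
  true                   # extra command so the outer shell forks the inner one
')

parent_pid=$(echo "$pids" | sed -n 1p)
child_ppid=$(echo "$pids" | sed -n 2p)

echo "outer shell PID:  $parent_pid"
echo "inner shell PPID: $child_ppid"
```

Both lines should show the same number: the PPID of the child is the PID of the process that started it.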
So
if every process has to have a parent and they are just forks of each other,
there must be a mother of all processes, right? You are correct. When the system
boots up, the kernel creates a process called init, which has a PID of 1. The
init process can't be terminated unless the system shuts down. It runs with
root privileges and runs many processes that keep the system running. We will
take a closer look at init in the system bootup course; for now just know it is
the process that spawns all other processes.
5. Process Termination
Now
that we know what goes on when a process gets created, what is happening when
we don't need it anymore? Be forewarned, sometimes Linux can get a little
dark...
A process can exit using the _exit system call, this will free up the resources that process was using for reallocation. So when a process is ready to terminate, it lets the kernel know why it's terminating with something called a termination status. Most commonly a status of 0 means that the process succeeded. However, that's not enough to completely terminate a process. The parent process has to acknowledge the termination of the child process by using the wait system call and what this does is it checks the termination status of the child process. I know it's gruesome to think about, but the wait call is a necessity, after all what parent wouldn't want to know how their child died?
There is another way to terminate a process and that involves using signals, which we will discuss soon.
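From the shell, the termination status and the wait step can be observed directly. In this sketch a child shell exits with status 3 (an arbitrary value chosen for illustration) and the parent collects it with the wait builtin.

```shell
# Child: a background shell that terminates with status 3
sh -c 'exit 3' &
child_pid=$!

# Parent: wait for that child and collect its termination status
child_status=0
wait "$child_pid" || child_status=$?

echo "child $child_pid exited with status $child_status"
```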
Orphan Processes
When a parent process dies before a child process, the kernel knows that it's not going to get a wait call, so instead it makes these processes "orphans" and puts them under the care of init (remember mother of all processes). Init will eventually perform the wait system call for these orphans so they can die.
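Re-parenting can be observed on a Linux system with /proc mounted. In the sketch below a short-lived parent shell starts a child and exits at once; the child's new parent is then read from the PPid line of /proc/PID/status. The file and variable names are illustrative, and on some systems the adopter is a per-session "subreaper" process rather than PID 1.

```shell
pid_file=$(mktemp)

# The parent shell records its own PID and the PID of its child,
# then exits immediately without waiting for the child.
sh -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!" > "$1"' sh "$pid_file"

read -r old_parent orphan_pid < "$pid_file"
rm -f "$pid_file"

# The orphan is still running, but it now has a different parent
new_parent=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$orphan_pid/status")

echo "orphan $orphan_pid: original parent $old_parent, adopted by $new_parent"

# Clean up the orphan
kill "$orphan_pid"
```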
Zombie Processes
What
happens when a child terminates and the parent process hasn't called wait yet?
We still want to be able to see how a child process terminated, so even though
the child process finished, the kernel turns the child process into a zombie
process. The resources the child process used are still freed up for other
processes, however there is still an entry in the process table for this
zombie. Zombie processes also cannot be killed, since they are technically
"dead", so you can't use signals to kill them. Once the
parent process calls the wait system call, the zombie disappears; this is
known as "reaping". If the parent exits without performing a wait call, init
adopts the zombie, automatically performs wait and removes it. It
can be a bad thing to have too many zombie processes, since they take up space
in the process table; if it fills up, it will prevent other processes from
being created.
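A zombie can be produced on purpose for a few seconds. The sketch below is one assumed way to do it on Linux with /proc mounted: a shell starts a short-lived child, then replaces itself (exec) with sleep, a program that never calls wait. Once the child has exited, the State line in its /proc/PID/status file should read Z (zombie).

```shell
pid_file=$(mktemp)

# The inner shell starts a child that lives for 1 second, records the child PID,
# then becomes `sleep 4`, a parent that never calls wait.
sh -c 'sleep 1 & echo "$!" > "$1"; exec sleep 4' sh "$pid_file" &
parent_pid=$!

sleep 2   # by now the child has exited, but its parent is still alive
zombie_pid=$(cat "$pid_file")
rm -f "$pid_file"

# The child has terminated, yet its process table entry is still there
zombie_state=$(sed -n 's/^State:[[:space:]]*//p' "/proc/$zombie_pid/status")
echo "child $zombie_pid state: $zombie_state"

# Once the parent exits, the zombie is normally re-parented and reaped by init
wait "$parent_pid"
```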
6. Signals
A signal is a notification to a process that something has happened.
Why we have signals
They are software interrupts and they have lots of uses:
- A user can type one of the special terminal characters (Ctrl-C) or (Ctrl-Z) to kill, interrupt or suspend processes
- Hardware issues can occur and the kernel wants to notify the process
- Software issues can occur and the kernel wants to notify the process
- They are basically ways processes can communicate
Signal process
When a signal is generated by some event, it is then delivered to a process. It is considered to be in a pending state until it's delivered, which normally happens the next time the process runs. However, processes have signal masks, and they can use them to block the delivery of specified signals. When a signal is delivered, a process can do a multitude of things:
- Ignore the signal
- "Catch" the signal and perform a specific handler routine
- Process can be terminated, as opposed to the normal exit system call
- Block the signal, depending on the signal mask
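In shell scripts, catching a signal is done with the trap builtin. The following sketch runs a child shell that installs a handler for SIGTERM, sends that signal to itself, and lets the handler run instead of being terminated by the default action. The message text is arbitrary.

```shell
handler_output=$(sh -c '
  # Install a handler routine for SIGTERM
  trap "echo caught SIGTERM, cleaning up; exit 0" TERM

  # Send SIGTERM to this shell; the handler runs instead of the default action
  kill -TERM "$$"

  echo "this line is never printed"
')

echo "$handler_output"
```

Without the trap line, the child shell would simply be terminated by the signal.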
Common signals
Each signal is defined by integers with symbolic names that are in the form of SIGxxx. Some of the most common signals are:
- SIGHUP or HUP or 1: Hangup
- SIGINT or INT or 2: Interrupt
- SIGKILL or KILL or 9: Kill
- SIGSEGV or SEGV or 11: Segmentation fault
- SIGTERM or TERM or 15: Software termination
- SIGSTOP or STOP: Stop
Numbers can vary between systems, so signals are usually referred to by their names.
Some
signals are unblockable; one example is the SIGKILL signal. The KILL signal
destroys the process.
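Because the numbers are not guaranteed to match across systems, it can help to ask the local system how it maps them. A small sketch using the -l option of kill, which translates a signal number into its name:

```shell
# Translate signal numbers into names on this system
name_9=$(kill -l 9)
name_15=$(kill -l 15)

echo "signal 9  is SIG$name_9"
echo "signal 15 is SIG$name_15"
```

Running kill -l with no number lists every signal the system knows about.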
7. kill (Terminate)
You can send signals that terminate processes, such a command is aptly named the kill command.
$ kill 12445
The 12445 is the PID of the process you want to kill. By default, kill sends a TERM signal. The SIGTERM signal is sent to a process to request its termination, allowing it to cleanly release its resources and save its state.
You can also specify a signal with the kill command:
$ kill -9 12445
This will send the SIGKILL signal and kill the process.
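The effect of the two signals can be seen in the status that wait reports for the killed process. In bash and most other shells, a process terminated by a signal is reported as 128 plus the signal number, so SIGTERM shows up as 143 and SIGKILL as 137. The sketch uses sleep as a stand-in for a real program.

```shell
# Default signal: SIGTERM
sleep 30 &
term_pid=$!
kill "$term_pid"
term_status=0
wait "$term_pid" || term_status=$?

# Explicit signal: SIGKILL
sleep 30 &
kill_pid=$!
kill -9 "$kill_pid"
kill_status=0
wait "$kill_pid" || kill_status=$?

echo "after kill:    status $term_status"
echo "after kill -9: status $kill_status"
```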
Differences between SIGHUP, SIGINT, SIGTERM, SIGKILL, SIGSTOP?
These signals all sound reasonably similar, but they do have their differences.
- SIGHUP - Hangup, sent to a process when the controlling terminal is closed. For example, if you closed a terminal window that had a process running in it, you would get a SIGHUP signal. So basically you've been hung up on
- SIGINT - Is an interrupt signal, so you can use Ctrl-C and the system will try to gracefully kill the process
- SIGTERM - Kill the process, but allow it to do some cleanup first
- SIGKILL - Kill the process, kill it with fire, doesn't do any cleanup
- SIGSTOP - Stop/suspend a process
8. Niceness
When you run multiple things on your computer, like perhaps Chrome, Microsoft Word or Photoshop at the same time, it may seem like these processes are running at the same time, but that isn't quite true.
Processes use the CPU for a small amount of time called a time slice. Then they pause for milliseconds and another process gets a little time slice. By default, process scheduling happens in this roughly round-robin fashion. Every process keeps getting time slices until it's finished processing. The kernel handles all of this switching between processes and it does a pretty good job at it most of the time.
Processes
aren't able to decide when and how long they get CPU time. If all processes
behaved normally they would each (roughly) get an equal amount of CPU time.
However, there is a way to influence the kernel's process scheduling algorithm
with a nice value. Niceness is a pretty weird name, but what it means is that
processes have a number to determine their priority for the CPU. A high number
means the process is nice and has a lower priority for the CPU, and a low or
negative number means the process is not very nice and wants to get as much
of the CPU as possible.
$ top
You can see a column for NI right now, that is the niceness level of a process.
To change the niceness level you can use the nice and renice commands:
$ nice -n 5 apt upgrade
The nice command is used to set priority for a new process. The renice command is used to set priority on an existing process.
$ renice 10 -p 3245
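A quick way to see nice at work is to let a command report its own niceness. The sketch assumes the nice implementation from GNU coreutils, which prints the current niceness when it is run without a command.

```shell
# Niceness of the current shell (usually 0)
base_nice=$(nice)

# Run `nice` itself with 5 added to the niceness and let it report the result
child_nice=$(nice -n 5 nice)

echo "shell niceness: $base_nice"
echo "child niceness: $child_nice"
```

Raising the nice value like this works for any user; lowering it below the current value normally requires root privileges.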
9. Process States
Let's take a look at the ps aux command again:
$ ps aux
In the STAT column, you'll see lots of values. A Linux process can be in a number of different states. The most common state codes you'll see are described below:
- R: running or runnable, it is just waiting for the CPU to process it
- S: Interruptible sleep, waiting for an event to complete, such as input from the terminal
- D: Uninterruptible sleep, processes that cannot be killed or interrupted with a signal, usually to make them go away you have to reboot or fix the issue
- Z: Zombie, we discussed in a previous lesson that zombies are terminated processes that are waiting to have their statuses collected
- T: Stopped, a process that has been suspended/stopped
10. /proc filesystem
Remember everything in Linux is a file, even processes.
Process information is stored in a special filesystem known as the /proc
filesystem.
$ ls /proc
You should see multiple values in here, there are
sub-directories for every PID. If you looked at a PID in the ps output, you
would be able to find it in the /proc directory.
Go ahead and enter one of the processes and look at that
file:
$ cat /proc/12345/status
You should see process state information as well as more detailed information. The /proc directory is how the kernel views the system, so there is a lot more information here than what you would see in ps.
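The status file is plain text, so ordinary tools can pull single fields out of it. The sketch below (Linux only; proc_field is a hypothetical helper name) reads the State field of a background sleep before and after it is suspended with SIGSTOP.

```shell
# proc_field PID FIELD: print one field from /proc/PID/status (hypothetical helper)
proc_field() {
  sed -n "s/^$2:[[:space:]]*//p" "/proc/$1/status"
}

sleep 30 &
pid=$!
sleep 1                       # let the process start and go to sleep

state_before=$(proc_field "$pid" State)

kill -STOP "$pid"             # suspend the process
sleep 1
state_after=$(proc_field "$pid" State)

echo "before SIGSTOP: $state_before"
echo "after SIGSTOP:  $state_after"

kill -CONT "$pid"             # resume it, then clean up
kill "$pid"
```

The two values correspond to the S and T state codes that ps shows in its STAT column.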
11. Job Control
Let's say you're working on a single terminal window and you're running a command that is taking forever. You can't interact with the shell until it is complete, however we want to keep working on our machines, so we need that shell open. Fortunately we can control how our processes run with jobs:
Sending a job to the background
Appending an ampersand (&) to the command will run it in the background so you can still use your shell. Let's see an example:
$ sleep 1000 &
$ sleep 1001 &
$ sleep 1002 &
View all background jobs
Now
you can view the jobs you just sent to the background.
$ jobs
[1] Running sleep 1000 &
[2]- Running sleep 1001 &
[3]+ Running sleep 1002 &
This
will show you the job id in the first column, then the status and the command
that was run. The + next to the job ID means that it is the most recent background
job that started. The job with the - is the second most recent command.
Sending an existing job to the background
If you already ran a job and want to send it to the background, you don't have to terminate it and start over again. First suspend the job with Ctrl-Z, then run the bg command to send it to the background.
pete@icebox ~ $ sleep 1003
^Z
[4]+ Stopped sleep 1003
pete@icebox ~ $ bg
[4]+ sleep 1003 &
pete@icebox ~ $ jobs
[1] Running sleep 1000 &
[2] Running sleep 1001 &
[3]- Running sleep 1002 &
[4]+ Running sleep 1003 &
Moving a job from the background to the foreground
To move a job out of the background, use the fg command and specify the job ID you want. If you run fg without any options, it will bring back the most recent background job (the job with the + sign next to it).
$ fg %1
Kill background jobs
Similar to moving jobs out of the background, you can use the same form to kill the processes by using their Job ID.
$ kill %1
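The fg and bg commands rely on job control, which shells normally enable only for interactive sessions. In a script, a common substitute is to work with PIDs instead of job IDs. A minimal sketch: the command is started with &, the script carries on immediately, and the wait builtin is used once the result is needed. kill -0 sends no signal and only checks that the process still exists.

```shell
# Start a slow command in the background
sleep 2 &
bg_pid=$!

# The script is free right away; the job is still running at this point
if kill -0 "$bg_pid" 2>/dev/null; then
  still_running=yes
else
  still_running=no
fi
echo "job $bg_pid still running: $still_running"

# Block until the background job finishes and collect its status
bg_status=0
wait "$bg_pid" || bg_status=$?
echo "job $bg_pid finished with status $bg_status"
```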