Linux: Text Fu
Standard output (stdout), standard input (stdin), and standard error (stderr) are three data streams in Linux. A stream carries data, which in these examples is text.
Streams are treated similarly to files in Linux. Each open file has a file descriptor, a number that identifies it within a process. The three standard streams use the following numbers:
- 0: stdin
- 1: stdout
- 2: stderr
A process refers to a file by its file descriptor whenever it reads from or writes to it.
Standard output (stdout) is the default stream to which a process writes its output in Linux. Let’s run a command.
$ echo Hello world!
The text is shown in the terminal. Here the echo command takes the words you typed as its arguments and writes them to standard output (stdout), which is connected to the terminal by default. This output can alternatively be directed to a file.
$ echo Hello world! > test.txt
The > character redirects standard output to a file, in this example to test.txt rather than the terminal. As a result, the terminal displays nothing. However, the text will be visible when you read the file.
$ cat test.txt
If the file does not exist, it will be created, and if the file already exists, it will be overwritten. To append to a file instead of overwriting it, use the >> operator. Let’s see what happens.
$ echo Hello world 1 > test2.txt
$ echo Hello world 2 >> test2.txt
$ cat test2.txt
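The difference between > and >> can be checked with a small script. This is only a minimal sketch: it works in a temporary directory created with mktemp so that none of your files are touched, and the file name demo.txt is just an illustration.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first"  > demo.txt    # creates demo.txt
echo "second" > demo.txt    # overwrites it: "first" is gone
echo "third" >> demo.txt    # appends: the file now has two lines

contents=$(cat demo.txt)
echo "$contents"
```

The file should end up holding only the lines "second" and "third".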
Standard In (stdin)
Standard input (stdin) is the stream from which a process reads its input. By default it is connected to the keyboard. Let’s see how it works.
$ echo Hello there! > linux.txt
$ cat < linux.txt > hello.txt
$ cat hello.txt
You first redirected the text ‘Hello there!’ into the linux.txt file. The < operator redirects standard input, so cat reads linux.txt as its input and its output is directed to the hello.txt file. Now, if you read hello.txt, you can see that the text has been transferred.
Standard Error (stderr)
Standard error (stderr) is the stream to which a process writes its error messages. To explore it, you can try to read a file named none.txt which does not exist.
$ cat none.txt
When you run the command, the terminal shows the error ‘No such file or directory’.
Now, if you want to redirect this stream into a file, run the following command.
$ cat none.txt 2> error.txt
It’s worth noting that you have to provide the file descriptor (2) because this is stderr, not stdout. A plain > redirects only stdout.
The pipe is a shell feature that lets you combine two or more commands so that the output of one becomes the input of the next. Let’s see how it works.
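To see the two output streams being separated, you can try a sketch like the one below. The file names exists.txt, out.txt, err.txt, and both.txt are made up for this illustration, and the script runs in a temporary directory. The last command uses 2>&1, which sends stderr to wherever stdout currently points.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "some text" > exists.txt

# cat prints exists.txt on stdout and reports the missing none.txt on stderr.
# > captures stdout (descriptor 1), 2> captures stderr (descriptor 2).
cat exists.txt none.txt > out.txt 2> err.txt

echo "stdout went to out.txt:"; cat out.txt
echo "stderr went to err.txt:"; cat err.txt

# 2>&1 points stderr at the same place as stdout, so both land in one file.
cat exists.txt none.txt > both.txt 2>&1
```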
$ echo Hello there | grep Hello
The | operator is the pipe. In this example, echo writes the text to its output (stdout), which is then used as the input (stdin) of the grep command.
The tee command is used when the output needs to go to two places at once: it copies its input to standard output and also to a file.
$ ls | tee ls.txt
Here, the ls command shows all the directories and files in the terminal, and the same output is also written to the ls.txt file.
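Pipes can be chained more than once. The following sketch uses made-up input lines and simply stores the results in the shell variables matches and count so that you can inspect them.

```shell
#!/bin/sh
# printf writes three lines to stdout; grep reads them on stdin and keeps
# the lines containing "Hello".
matches=$(printf 'Hello there\nGoodbye\nHello again\n' | grep Hello)

# A second pipe hands grep's output to wc -l, which counts the lines.
count=$(printf 'Hello there\nGoodbye\nHello again\n' | grep Hello | wc -l)

echo "$matches"
echo "matching lines: $((count))"
```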
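A quick way to convince yourself that tee really writes to both places is sketched below. The file name listing.txt is an assumption for this example, and -a is tee's option for appending instead of overwriting.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# tee copies its stdin to stdout (captured here in $shown) and to listing.txt.
shown=$(printf 'alpha\nbeta\n' | tee listing.txt)

# tee -a appends to the file instead of overwriting it.
printf 'gamma\n' | tee -a listing.txt > /dev/null

echo "$shown"
cat listing.txt
```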
In Linux, the environment is defined by environment variables, each of which is a name with an assigned value. Some variables are set by the system and some by the user. Now, let’s run the following commands:
$ printenv HOME
$ printenv USER
$ echo $HOME
$ echo $USER
These are some of the variables in the environment. PATH is another crucial variable: it holds a colon-separated list of directories. When you run a command, the shell searches these directories for a program with that name. If the program’s directory is not in PATH and you do not give the full path to the program, you’ll get a ‘command not found’ error. In that situation, add the program’s installation directory to the PATH variable so that the shell knows where to look when the command is executed. To see the PATH variable of your system, run the following command.
$ echo $PATH
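The effect of PATH can be tried out safely with a throwaway directory. In the sketch below, mytool is a hypothetical script created only for the demonstration, and the change to PATH lasts only for the current shell session.

```shell
#!/bin/sh
# Hypothetical tool directory and script name, used only for this demonstration.
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$tooldir/mytool"
chmod +x "$tooldir/mytool"

# Prepend the directory to PATH for this shell session only.
PATH="$tooldir:$PATH"
export PATH

# The shell now finds mytool by searching the directories listed in PATH.
result=$(mytool)
echo "$result"
```

To make such a change permanent you would normally put the PATH assignment in your shell's startup file, which is outside the scope of this sketch.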
To display all of the environment variables at once, run printenv with no arguments.
The cut command extracts portions of text from a file. The portion is chosen by the user. To understand this, first let us create a text file.
$ nano hello.txt
Now let’s say you want to extract the 10th and the 8th character from this file. For this, run the following commands.
$ cut -c 10 hello.txt
$ cut -c 8 hello.txt
Here, -c selects characters by position. An important thing to note is that a space also counts as a character. You can also extract text by field using -f. By default, cut takes TAB as the field delimiter. In the hello.txt file, 2 TABs are used, so there are 3 fields.
$ cut -f 3 hello.txt
It should give ‘fur’ as output. The good news is that you can also set the delimiter you want. Let’s run the following command:
$ cut -f 3 -d "," hello.txt
The -d option sets the delimiter. Here it is set to a comma (,). Hence, for the hello.txt file, the result should be ‘little ball of fur’.
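The options above can be tried on data whose contents are known exactly. The two files in this sketch, tabs.txt and commas.txt, are hypothetical samples and are not the hello.txt used in this article.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: one TAB-separated line and one comma-separated line.
printf 'red\tgreen\tblue\n' > tabs.txt
printf 'red,green,blue\n'   > commas.txt

c3=$(cut -c 3 tabs.txt)            # third character          -> d
c1to3=$(cut -c 1-3 tabs.txt)       # characters 1 through 3   -> red
f3=$(cut -f 3 tabs.txt)            # third TAB-separated field -> blue
d2=$(cut -d ',' -f 2 commas.txt)   # second comma-separated field -> green

echo "$c3"; echo "$c1to3"; echo "$f3"; echo "$d2"
```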
In Linux, the paste command joins files horizontally: it takes the corresponding lines of each file specified and writes them side by side to standard output, using TAB as the delimiter. To explore the paste command, you first need to create two files.
So, you have created the country.txt and capital.txt files. Now let’s explore the paste command.
$ paste country.txt capital.txt
When no option is used, paste combines the files in parallel, which means it writes the matching lines of the files on one line, separated by TAB. Let’s explore the options available for the paste command.
$ paste -s country.txt capital.txt
When you use the -s option, paste merges the files sequentially: it joins all of the lines of one file into a single line before moving on to the next file. As with the cut command, you can also use a custom delimiter here.
$ paste -d "-" country.txt capital.txt
If more than one character is passed as a delimiter, paste uses the characters in turn between successive columns, starting again from the first one when the list runs out. With only two files there is a single column boundary per line, so only the first character is used. Let’s see how it works.
$ paste -d ";-" country.txt capital.txt
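The following sketch puts the paste options side by side. The file contents are assumptions made for this example, and continent.txt is a hypothetical third file added only so that both characters of the ";-" delimiter list get used.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file contents for the demonstration.
printf 'Pakistan\nIndia\n'      > country.txt
printf 'Islamabad\nNew Delhi\n' > capital.txt
printf 'Asia\nAsia\n'           > continent.txt

parallel=$(paste country.txt capital.txt)        # matching lines, TAB between
serial=$(paste -s country.txt capital.txt)       # one output line per file
dashed=$(paste -d '-' country.txt capital.txt)   # custom delimiter
cycled=$(paste -d ';-' country.txt capital.txt continent.txt)  # ; then -

echo "$parallel"; echo "$serial"; echo "$dashed"; echo "$cycled"
```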
Head and Tail
The head command is useful when you have to read a very long file, which happens often. If you try to read such a file with the cat command, the terminal will be flooded with text. Instead, head lets you read just a number of lines from the top of the file. For testing purposes, you can choose any of your log files and run the following command.
$ head -n 5 /var/log/auth.log
The -n option selects the number of lines. If the number of lines is not specified, head shows the first ten lines.
Similar to the head command, tail reads a specified number of lines from the bottom of a long file. It also shows 10 lines by default, but you can choose as many as you like.
$ tail -n 4 /var/log/auth.log
There is an interesting option available for this command, which is -f. It keeps the file open and shows new lines as they are added to it. Let’s run the command.
$ tail -f /var/log/syslog
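Because log files differ from system to system, it can help to try head and tail on a file whose contents you know. This sketch assumes the seq command is available and uses a hypothetical file numbers.txt holding the numbers 1 to 20. The -f option is left out because it keeps running until you press Ctrl+C.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 20 > numbers.txt                      # 20 lines: 1, 2, ..., 20

first3=$(head -n 3 numbers.txt)             # lines 1, 2, 3
last2=$(tail -n 2 numbers.txt)              # lines 19, 20
default_count=$(head numbers.txt | wc -l)   # head without -n prints 10 lines

echo "$first3"; echo "$last2"; echo "default: $((default_count)) lines"
```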
The join command joins multiple files together by a common field. To explore this, first make two files, country.txt and capital.txt, that have a common field.
1 Pakistan
2 India
3 Bangladesh
4 Russia

1 Islamabad
2 New Delhi
3 Dhaka
4 Moscow
As you can see, these two files share a common first field with the values 1, 2, 3, and 4. So, you can join them using the join command. Now, run the following command to do that.
$ join country.txt capital.txt
Now, let’s see another case where the files have a common field but its position is different.
Pakistan 1
India 2
Bangladesh 3
Russia 4

1 Islamabad
2 New Delhi
3 Dhaka
4 Moscow
How can you join these two files? Here, you want to use field 2 of file 1 and field 1 of file 2. The good news is that the join command allows you to do that. Now, run the following command.
$ join -1 2 -2 1 country.txt capital.txt
In the command, -1 refers to country.txt (the 1st file) and -2 refers to capital.txt (the 2nd file). The number after each option is the field to join on, so -1 2 -2 1 means field 2 of the first file and field 1 of the second file.
The split command does the opposite of the join command, which means it is used for splitting large files into smaller files. First make a text file with 10 lines. Now, let’s run the following commands and explore for yourself.
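Both cases can be reproduced with the sample data shown above. In this sketch, country_swapped.txt is a hypothetical name for the variant with the number in the second field. Note that join expects both files to be sorted on the join field, which is already true for this data.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1 Pakistan\n2 India\n3 Bangladesh\n4 Russia\n' > country.txt
printf '1 Islamabad\n2 New Delhi\n3 Dhaka\n4 Moscow\n' > capital.txt
printf 'Pakistan 1\nIndia 2\nBangladesh 3\nRussia 4\n' > country_swapped.txt

# Default: join on the first field of both files.
joined=$(join country.txt capital.txt)

# Join on field 2 of the first file and field 1 of the second file.
joined_swapped=$(join -1 2 -2 1 country_swapped.txt capital.txt)

echo "$joined"
echo "$joined_swapped"
```

In both cases the join field is printed first, followed by the remaining fields of the first file and then those of the second, so the two results are identical.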
$ cat large.txt
$ split -l 4 large.txt
Here, the -l option lets you choose the number of lines per file. By default, split starts a new file every 1000 lines. The created files are named ‘xaa’, ‘xab’, and ‘xac’. You can also choose a prefix to use instead of ‘x’. Run the following command in your terminal.
$ split -l 4 large.txt small_
The prefix ‘small_’ has been given after large.txt here. Hence, the split files will have names like small_aa, small_ab, and so on.
The sort command sorts the lines of a file in a specific order. By default it compares lines as text, using the character order of the current locale.
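If you do not have a 10-line file at hand, the sketch below builds one with seq (assumed to be available) and then checks that the pieces add up to the original again.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 10 > large.txt            # a 10-line file: 1, 2, ..., 10

# 4 lines per piece, output names start with "small_".
split -l 4 large.txt small_

pieces=$(ls small_*)            # small_aa, small_ab, small_ac
last_piece=$(cat small_ac)      # the last piece holds the leftover 2 lines

echo "$pieces"
echo "$last_piece"

# Concatenating the pieces in name order reproduces the original file.
cat small_* | cmp - large.txt && echo "pieces match the original"
```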
$ sort -n sample.txt
$ sort -r sample.txt
When the -n option is specified, the contents of the file are sorted numerically, and the -r option reverses the sort order. Another useful option is -u, which lets you remove duplicate lines. Let’s try this one.
$ sort -u duplicate_content.txt
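The difference between text order and numeric order is easiest to see with numbers of different lengths. The contents of sample.txt below are an assumption made for this sketch.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data with a duplicate line.
printf '10\n9\n100\n9\n' > sample.txt

plain=$(sort sample.txt)        # text order:       10 100 9 9
numeric=$(sort -n sample.txt)   # numeric order:    9 9 10 100
reverse=$(sort -nr sample.txt)  # numeric, reversed: 100 10 9 9
unique=$(sort -nu sample.txt)   # duplicates removed: 9 10 100

echo "$plain"; echo "$numeric"; echo "$reverse"; echo "$unique"
```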
The tr command translates or deletes characters read from stdin (standard input) and writes the result to stdout (standard output). It is useful for removing repeated characters, finding and replacing characters, and converting lowercase to uppercase.
$ tr a-z A-Z
In this case, the terminal accepts input from the user and converts it into uppercase characters. Now, let’s see how you can replace characters using tr.
$ echo 'linux-ubuntu' | tr 'nu' 'ab'
In this case, every ‘n’ will be replaced by ‘a’ and every ‘u’ by ‘b’.
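The uses mentioned above can be collected in one short sketch. The input strings are made up for illustration; -d is tr's option for deleting characters and -s squeezes runs of a repeated character into one.

```shell
#!/bin/sh
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')     # -> HELLO WORLD
swapped=$(echo 'linux-ubuntu' | tr 'nu' 'ab')    # n->a, u->b -> liabx-bbbatb
deleted=$(echo 'linux-ubuntu' | tr -d 'u')       # every u removed -> linx-bnt
squeezed=$(echo 'hellooo' | tr -s 'o')           # repeated o squeezed -> hello

echo "$upper"; echo "$swapped"; echo "$deleted"; echo "$squeezed"
```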
Wc and nl
The wc command shows the number of lines, words, and bytes in a file, in that order.
$ wc large.txt
To see just the number of lines, run the command below.
$ wc -l large.txt
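Each count can also be requested on its own. In this sketch, words.txt is a hypothetical two-line file, and the file is passed through < so that wc prints only the number and not the file name.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one two\nthree\n' > words.txt

lines=$(wc -l < words.txt)   # 2 lines
words=$(wc -w < words.txt)   # 3 words
bytes=$(wc -c < words.txt)   # 14 bytes, counting the two newlines

echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
```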
The grep command searches a file for lines that match a pattern. It is very useful and one of the most popular commands in Linux.
$ cat duplicate_content.txt
$ grep USA duplicate_content.txt
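A few commonly used grep options are sketched below on a hypothetical file, countries.txt, whose contents are made up for this example: -i ignores case, -c prints the number of matching lines, and -n prefixes each match with its line number.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption made for safety).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data.
printf 'USA\nCanada\nusa\nUSA\n' > countries.txt

exact=$(grep USA countries.txt)            # case-sensitive: lines 1 and 4
nocase=$(grep -ic usa countries.txt)       # count, ignoring case: 3
numbered=$(grep -n Canada countries.txt)   # with line number: 2:Canada

echo "$exact"; echo "$nocase"; echo "$numbered"
```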