Learning the Command Line

As Jeroen Janssens writes in his great blog, Data Science at the Command Line:

The command line has many great advantages that can really make you a more efficient and productive data scientist. Roughly grouping the advantages, the command line is: agile, augmenting, scalable, extensible, and ubiquitous.

I couldn’t agree more, although it took me a while to figure that out. So I decided to take you along on my journey of learning the basics of working with the command line. All the knowledge I share here is based on “The Linux Command Line”, a book by William E. Shotts, which I also highly recommend.

Introduction

When we speak of the command line, we are really referring to the shell. The shell is a program that takes keyboard commands and passes them to the operating system to carry out. On macOS, the default shell has long been Bash, an acronym for Bourne Again Shell (since macOS Catalina, new accounts use zsh by default, but Bash is still available). The program that lets you interact with the shell is called a terminal.

Once you have launched the terminal, you should see something like this:

MacBook-Air:~ bbettendorf$

This is called a shell prompt, and it appears whenever the shell is ready to accept input.

1. Some Simple Commands

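date

The date command prints the current date and time. (My system language is German, which is why the day and month names in the output are German abbreviations.)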

MacBook-Air:~ bbettendorf$ date
Mo 25 Mär 2019 11:15:48 CET

cal

The cal command displays a calendar of the current month (again with German day names):

MacBook-Air:~ bbettendorf$ cal
     März 2019        
So Mo Di Mi Do Fr Sa  
                1  2  
 3  4  5  6  7  8  9  
10 11 12 13 14 15 16  
17 18 19 20 21 22 23  
24 25 26 27 28 29 30  
31          

Command History

If we press the up-arrow key, the previous command reappears after the prompt; pressing it again steps further back. This is called command history. The down-arrow key steps forward through the history again, back toward an empty prompt.

2. Navigation

The Current Working Directory

To display the current working directory, we use the pwd command (print working directory):

MacBook-Air:~ bbettendorf$ pwd
/Users/bbettendorf

Listing the Contents of a Directory

To list the files and directories in the current working directory, we use the ls command:

MacBook-Air:~ bbettendorf$ ls
Applications						Music
Desktop							Pictures
Documents						Sites
Downloads						
Dropbox							
Movies					

Changing the Current Working Directory

To change our working directory, we use the cd command (change directory) followed by the pathname of the desired working directory. Pathnames can be specified as either absolute pathnames or relative pathnames.

An absolute pathname begins with the root directory and follows the filesystem tree branch by branch until the path to the desired directory or file is completed:

MacBook-Air:~ bbettendorf$ cd /Users/bbettendorf/Documents/Flatiron_Course_Materials/
MacBook-Air:Flatiron_Course_Materials bbettendorf$ ls

... Listing of many, many files ...

A relative pathname, on the other hand, starts from the working directory. It uses two special symbols to represent relative positions in the filesystem tree: . (dot) and .. (dot dot).

The . symbol refers to the working directory (and can be omitted in almost all cases, because it is implied), and the .. symbol refers to the working directory’s parent directory:

MacBook-Air:Flatiron_Course_Materials bbettendorf$ cd ..
MacBook-Air:Documents bbettendorf$ 

Shortcuts

  • cd changes the working directory to your home directory.
  • cd - changes the working directory to the previous working directory.
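To practice absolute and relative pathnames and both shortcuts without touching your real files, here is a minimal sketch that works inside a throwaway directory created with mktemp -d. The directory names are hypothetical and only mirror the examples above; the comments show what pwd should print at each step.

```shell
#!/usr/bin/env bash
# Build a throwaway directory tree to practice in (names are hypothetical).
# mkdir -p also creates the missing parent directory "Documents".
base=$(mktemp -d)
mkdir -p "$base/Documents/Flatiron_Course_Materials"

cd "$base/Documents/Flatiron_Course_Materials"   # absolute pathname
pwd                                              # .../Documents/Flatiron_Course_Materials

cd ..                                            # relative pathname: the parent directory
parent=$(pwd)                                    # .../Documents

cd ./Flatiron_Course_Materials                   # relative; the leading ./ is optional
cd -                                             # back to the previous directory (cd - prints it)
previous=$(pwd)                                  # .../Documents again

cd                                               # no argument: back to the home directory
pwd
```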

Hidden files

Filenames that begin with a period character are hidden. ls does not list them unless we add the -a option: ls -a.
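A quick way to see hidden files in action is this sketch, which creates two hypothetical files (one of them hidden) in a throwaway directory and compares the output of ls and ls -a:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory with two hypothetical files, one of them hidden.
dir=$(mktemp -d)
cd "$dir"
touch visible.txt .hidden_config

plain=$(ls)           # only visible.txt
everything=$(ls -a)   # also ., .., and .hidden_config
echo "$plain"
echo "$everything"
```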

3. Exploration

Options and Arguments

Commands are often followed by one or more options that modify their behavior and, further, by one or more arguments, the items upon which the command acts.

command -options arguments

List Directory Contents

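ls is probably the most used command; it shows the contents of a directory. In the example below, ls is given the short -l option to produce long format output, the short -t option to sort the result by the files’ modification time (newest first), and the long --reverse option to reverse the order of the sort.

Many commands, especially those from the GNU Project, offer a long version of their options in addition to the short one. Be aware that the BSD ls that ships with macOS only understands the short options, so there you would write ls -ltr instead.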

$ ls -lt --reverse
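To see what -t and the reverse option do, the following sketch gives two hypothetical files fixed modification times with touch -t (time format [[CC]YY]MMDDhhmm) and lists them both ways. It uses the short -r option for reverse, so it also works with the ls on macOS.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
cd "$dir"
# Give two hypothetical files explicit modification times.
touch -t 201801010000 older.txt
touch -t 201903250000 newer.txt

newest_first=$(ls -t)    # newer.txt, then older.txt
oldest_first=$(ls -tr)   # reversed: older.txt, then newer.txt
ls -ltr                  # long format, oldest first
```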

Viewing File Contents

The less command is a program for viewing text files. Once less starts, we can use the up-arrow and down-arrow keys to scroll up and down one line at a time. If the file is longer than one page, press the spacebar to scroll forward one page and the B key to scroll back one page. To exit less, press the Q key.

MacBook-Air:dsc-5-capstone-project-online-ds-ft-100118 bbettendorf$ less README.md

4. Manipulation

Wildcards

Because the shell uses filenames so much, it provides special characters to help you rapidly specify groups of filenames. These special characters are called wildcards. Here are some examples:

*                        All files (except hidden ones)
b*                       Any file beginning with b
b*.txt                   Any file beginning with b, followed by any characters and ending with .txt
Data???                  Any file beginning with Data, followed by exactly three characters
[abc]*                   Any file beginning with either a, b, or c
train.[0-9][0-9][0-9]    Any file named train. followed by exactly three numerals
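Because echo simply prints its arguments, it is a safe way to check what a wildcard matches before handing the pattern to a command that changes files. This sketch creates some hypothetical sample files in a throwaway directory and expands each pattern from the list above; the comments show the expected matches:

```shell
#!/usr/bin/env bash
# Hypothetical sample files in a throwaway directory.
dir=$(mktemp -d)
cd "$dir"
touch apple.txt banana.txt bread.csv cherry.txt Data001 DataABC Data1 train.123 train.12a

all=( * )                          # every (non-hidden) file, collected into an array
echo b*                            # banana.txt bread.csv
echo b*.txt                        # banana.txt
echo Data???                       # Data001 DataABC  (Data1 is too short)
echo [abc]*                        # apple.txt banana.txt bread.csv cherry.txt
echo train.[0-9][0-9][0-9]         # train.123  (train.12a ends in a letter)
echo "${#all[@]} files in total"   # 9 files in total
```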

Create Directories

The first example below creates a single directory named test_dir_1, while the second would create three directories:

$ mkdir test_dir_1

$ mkdir test_dir_1 test_dir_2 test_dir_3

Copy Files and Directories

The cp command copies files or directories.

In the first example we copy file1 to file2. If the latter already exists, it is overwritten with the contents of file1 – if not, it is created.

The second example is the same, except that if file2 exists, the user is prompted before it is overwritten. (The option -i stands for interactive.)

$ cp file1 file2

$ cp -i file1 file2

Next, we copy file1 and file2 into directory dir, which must already exist:

$ cp file1 file2 dir

And lastly, using a wildcard, all the (non-hidden) files in dir1 are copied into dir2, which must already exist:

$ cp dir1/* dir2
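The following sketch runs the cp examples above in a throwaway directory, with made-up file contents. It also shows that the dir1/* wildcard skips hidden files:

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
cd "$sandbox"
echo "first version" > file1      # hypothetical file contents
echo "old text" > file2

cp file1 file2                    # file2 is overwritten with file1's contents
mkdir dir                         # the target directory must exist beforehand
cp file1 file2 dir                # copies both files into dir

mkdir dir1 dir2
touch dir1/a.csv dir1/b.csv dir1/.hidden
cp dir1/* dir2                    # the wildcard does not match .hidden
ls -a dir2                        # ., .., a.csv, b.csv
```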

Move and Rename Files

The mv command performs both file moving and file renaming, depending on how it is used. In either case, the original filename no longer exists after the operation! mv is used in much the same way as cp.

In the example below, we move (that is, rename) file1 to file2. If file2 already exists, the -i (interactive) option makes mv ask before overwriting it with the contents of file1; if it does not exist, it is created. In either case, file1 ceases to exist.

$ mv -i file1 file2

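This command behaves differently depending on whether dir2 exists. If dir2 already exists, dir1 and its contents are moved into it, becoming dir2/dir1. If dir2 does not exist, dir1 is simply renamed to dir2: effectively, dir2 is created, the contents of dir1 are moved into it, and dir1 is deleted.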

$ mv dir1 dir2
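Both behaviors of mv dir1 dir2 can be observed in a throwaway directory. In this sketch (file and directory names are hypothetical) the first mv renames dir1, while the second one, which runs after dir2 exists, moves dir1 inside it:

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
cd "$sandbox"

echo "data" > file1
mv file1 file2            # rename: file1 no longer exists

mkdir dir1 && touch dir1/notes.txt
mv dir1 dir2              # dir2 does not exist yet, so dir1 is renamed to dir2

mkdir dir1 && touch dir1/more.txt
mv dir1 dir2              # dir2 exists now, so dir1 ends up inside it: dir2/dir1
ls -R                     # show the resulting tree
```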

Remove Files and Directories

The rm command is used to remove or delete files and directories. But be careful: Once you delete something with rm, it’s gone!

The first example deletes file1 silently, the second prompts the user for confirmation before deleting:

$ rm file1

$ rm -i file1

To delete a directory and all of its contents, including its subdirectories, we have to use rm with the recursive option (-r):

$ rm -r old_directory

This example deletes every file ending in .html in the working directory:

$ rm *.html
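Here is a sketch of the rm examples above, run in a throwaway directory so nothing important can be lost. The file names are hypothetical:

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
cd "$sandbox"
touch index.html about.html notes.txt
mkdir -p old_directory/subdir && touch old_directory/subdir/file

rm *.html                 # removes index.html and about.html, keeps notes.txt
rm -r old_directory       # removes the directory, its subdirectory, and the file inside
ls                        # notes.txt
```

A cautious habit before any rm with a wildcard: run ls with the same pattern first (for example, ls *.html), so you see exactly which files will be deleted.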

5. Using Commands

Display a Command’s Type

The type command displays the kind of command the shell will execute, given a particular command name. A command can be

  • an executable program
  • a shell builtin
  • a shell function
  • an alias.

For example, help turns out to be a shell builtin:

MacBook-Air:~ bbettendorf$ type help
help is a shell builtin
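Bash’s type also has a -t option that prints just one word (alias, keyword, function, builtin, or file), which makes the different kinds of commands easy to compare. In this sketch the alias showme and the function greet are hypothetical; because aliases are switched off in non-interactive scripts by default, the script turns them on with shopt -s expand_aliases first:

```shell
#!/usr/bin/env bash
# Aliases are off in scripts by default; switch them on so type can report one.
shopt -s expand_aliases
alias showme='ls -l'          # a hypothetical alias
greet() { echo "hello"; }     # a hypothetical shell function

type -t cd        # builtin
type -t greet     # function
type -t showme    # alias
type -t ls        # file (an executable program found on $PATH)
```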

Get Help for Shell Builtins

Bash has a built-in help facility for its shell builtins: type help followed by the name of the builtin. When square brackets appear in the description, they indicate optional items:

$ help cd
cd: cd [-L|-P] [dir]
    Change the current directory to DIR.  The variable $HOME is the
    default DIR.  The variable CDPATH defines the search path for
    the directory containing DIR.  Alternative directory names in CDPATH
    are separated by a colon (:).  A null directory name is the same as
    the current directory, i.e. `.'.  If DIR begins with a slash (/),
    then CDPATH is not used.  If the directory is not found, and the
    shell option `cdable_vars' is set, then try the word as a variable
    name.  If that variable has a value, then cd to the value of that
    variable.  The -P option says to use the physical directory structure
    instead of following symbolic links; the -L option forces symbolic links
    to be followed.

Most executable programs intended for command-line use provide a formal piece of documentation called a manual or man page. A special paging program called man is used to view them, like this:

$ man ls

Display a Very Brief Description of a Command

The whatis program displays the name and a one-line description of a command:

$ whatis cp
cp(1)                    - copy files

Display Appropriate Commands

You may search the list of manual pages for possible matches based on a search term with apropos. Though crude, this approach is sometimes helpful.

$ apropos movie
avconvert(1)             - movie conversion tool
qc2movie(1)              - Quartz Composer export tool
qtmodernizer(1)          - Tool to convert legacy movies to modern format with minimal changes

Creating Your Own Commands

First a small trick: it’s possible to put more than one command on a line by separating the commands with semicolons:

$ cd /Users/bbettendorf/Documents/; ls; cd

If we use such a sequence often, we can turn it into a new command using alias:

$ alias showme='cd /Users/bbettendorf/Documents/; ls; cd'

Notice the structure of this command:

$ alias name='string'

Once defined, the alias can be used like any other command for the rest of the shell session. To remove the alias, we use the unalias command:

$ unalias showme
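To try alias and unalias from a script rather than an interactive session, aliases have to be switched on with shopt -s expand_aliases. In this sketch, a throwaway directory stands in for /Users/bbettendorf/Documents/ (an assumption so that it runs on any machine):

```shell
#!/usr/bin/env bash
# Aliases are only expanded in interactive shells by default; enable them here.
shopt -s expand_aliases

docs=$(mktemp -d)                 # hypothetical stand-in for /Users/bbettendorf/Documents/
touch "$docs/report.txt"

alias showme='cd "$docs"; ls; cd' # same structure as above: alias name='string'
defined=$(alias showme)           # "alias NAME" prints the current definition
showme                            # lists report.txt, then returns to the home directory

unalias showme
alias showme 2>/dev/null || echo "showme is no longer defined"
```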

There is one tiny problem with defining aliases on the command line: they vanish when your shell session ends. There are ways to make aliases long-lasting (for example, by defining them in a shell startup file), but this is outside the scope of this tutorial.

Print Line, Word, and Byte Counts

The wc (word count) command is used to display the number of lines, words, and bytes contained in a file. For example:

$ wc README.md 
      24     451    2650 README.md

It prints three numbers (the lines, words, and bytes contained in README.md), followed by the file name.
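The individual counts are available through the -l (lines), -w (words), and -c (bytes) options. This sketch writes a small hypothetical two-line file and counts it; reading the file via < keeps the file name out of the output:

```shell
#!/usr/bin/env bash
f=$(mktemp)                                   # hypothetical sample file
printf 'one two three\nfour five\n' > "$f"

wc "$f"                                       # lines, words, bytes, then the file name
lines=$(wc -l < "$f" | tr -d ' ')             # 2  (tr strips the padding some wc versions add)
words=$(wc -w < "$f" | tr -d ' ')             # 5
bytes=$(wc -c < "$f" | tr -d ' ')             # 24: 14 bytes in line 1 plus 10 in line 2, newlines included
echo "$lines $words $bytes"
```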

Pipelines

With a shell feature called pipelines, the output of one command can be piped into the input of another. To do so, we use the pipe operator | (vertical bar).

In the example below, we list the contents of Flatiron_Course_Materials, sort them, remove duplicate lines with uniq (which only removes adjacent duplicates, so its input needs to be sorted first), and print the last 5 lines with the tail command. By default, head (which prints the first lines) and tail both print 10 lines of text, but this can be adjusted with the -n option:

MacBook-Air:~ bbettendorf$ ls /Users/bbettendorf/Documents/Flatiron_Course_Materials/ | sort | uniq | tail -n 5
section44
section45
section46
section47
section48
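Why sort comes before uniq is easiest to see with a few hypothetical lines of text generated by printf. This sketch also shows tail with -n (all output appears one item per line):

```shell
#!/usr/bin/env bash
# uniq only drops *adjacent* duplicate lines, which is why sort usually comes first.
printf 'pear\napple\npear\nfig\n' | uniq          # pear, apple, pear, fig: nothing adjacent to drop
printf 'pear\napple\npear\nfig\n' | sort | uniq   # apple, fig, pear

# head and tail show 10 lines by default; -n changes that.
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 | tail -n 3   # line10, line11, line12
```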

Generating a sequence of numbers

The command seq generates a sequence of numbers. Let’s generate a sequence of five numbers:

$ seq 5
1
2
3
4
5
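seq also accepts a start value and, optionally, an increment (seq FIRST INCREMENT LAST). A few variations, each printing one number per line:

```shell
#!/usr/bin/env bash
seq 3 7          # from 3 to 7: 3 4 5 6 7
seq 0 5 20       # from 0 to 20 in steps of 5: 0 5 10 15 20
seq 5 -1 1       # counting down: 5 4 3 2 1
```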

Print lines matching a pattern

grep is a powerful program used to find text patterns within files (or, as here, in the output of another command):

$ seq 30 | grep 3
3
13
23
30

When grep encounters a “pattern” in a file, it prints out the lines containing it. The patterns grep can match can be very complex, but for now let’s say we want to find all the files in one directory that have the word csv in the name:

$ ls /Users/bbettendorf/Documents/Flatiron_Course_Materials/competition/ | sort | uniq | grep csv
df_sample_submission.csv
test.csv
train.csv

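And if we want to know how many numbers between 1 and 100 contain a 3, we can count the matching lines. Since every line holds exactly one number, counting words with wc -w gives the same result as counting lines:

$ seq 100 | grep 3 | wc -w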
      19
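Counting matches is common enough that grep can do it itself with -c, and -v inverts the match so that only non-matching lines are printed. A short sketch:

```shell
#!/usr/bin/env bash
seq 100 | grep 3 | wc -l          # 19 matching lines (one number per line)
seq 100 | grep -c 3               # 19 as well: -c makes grep count matching lines itself
seq 40 | grep -v 3 | tail -n 3    # numbers without a 3; the last three are 28, 29, 40
```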

6. Expansion

Expansions and quotes are perhaps the two most important subjects to learn about the shell. Why? Let’s start with expansions.

Each time you type a command line and press the ENTER key, the shell performs several processes upon the text before it carries out your command. We’ve seen a couple of cases of how a simple character, for example *, can have a lot of meaning to the shell. The process that makes this happen is called expansion. With expansion, you enter something, and it is expanded into something else before the shell acts upon it.

Pathname Expansion

The mechanism by which wildcards work is called pathname expansion. Given a home directory that looks like this:

MacBook-Air:~ bbettendorf$ ls
Applications
Desktop
Documents
Downloads
Dropbox
Library
Movies
Music
Pictures
Sites

We could carry out the following expansions with the echo command, which simply displays a line of text:

MacBook-Air:~ bbettendorf$ echo D*
Desktop Documents Downloads Dropbox

and:

MacBook-Air:~ bbettendorf$ echo *s
Applications Documents Downloads Movies Pictures Sites

or even:

MacBook-Air:~ bbettendorf$ echo [[:upper:]]*
Applications Desktop Documents Downloads Dropbox Library Movies Music Pictures Sites
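Because the shell expands the pattern before echo ever runs, echo only sees the resulting list of names. This sketch recreates a few of the directories above in a throwaway location (hypothetical stand-ins for a real home directory) and also shows what Bash does by default when a pattern matches nothing:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
cd "$dir"
mkdir Desktop Documents Downloads Music   # hypothetical stand-ins for a home directory

echo D*              # Desktop Documents Downloads: echo never sees the *
echo [[:upper:]]*    # Desktop Documents Downloads Music
echo Z*              # Z*: no match, so Bash passes the pattern through unchanged
echo "D*"            # D*: quoting switches pathname expansion off
```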

Arithmetic Expansion

The shell allows arithmetic operations to be performed by expansion and uses the following form:

$((expression))

where expression is an arithmetic operation consisting of values and arithmetic operators.

Arithmetic expansion supports only integers, but it can perform quite a few operations: addition (+), subtraction (-), multiplication (*), division (/; remember that the result is an integer, too), modulo (%, the remainder), and exponentiation (**).

$ echo $((2+2))
4

Single parentheses may be used to group multiple subexpressions:

$ echo $(( (5**2) *3 ))
75
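The following sketch runs through the operators mentioned above. Note in particular the integer division; files and dirs are hypothetical variables, showing that variables can be used by name inside $(( )):

```shell
#!/usr/bin/env bash
echo $(( 7 + 2 ))          # 9
echo $(( 7 - 2 ))          # 5
echo $(( 7 * 2 ))          # 14
echo $(( 7 / 2 ))          # 3: integer division drops the remainder
echo $(( 7 % 2 ))          # 1: the remainder (modulo)
echo $(( 2 ** 10 ))        # 1024: exponentiation
files=12; dirs=3           # hypothetical counts
echo $(( files / dirs ))   # 4: no $ needed in front of variable names here
```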

Brace Expansion

Perhaps the strangest expansion is called brace expansion. With it, you can create multiple text strings from a pattern containing braces. Here are two examples:

$ echo Number_{1,2,3}
Number_1 Number_2 Number_3

$ echo {Z..A}
Z Y X W V U T S R Q P O N M L K J I H G F E D C B A

So what is this good for? The most common application is to create lists of files or directories easily:

$ cd Pics
$ mkdir {2016..2018}-0{1..9} {2016..2018}-{10..12}
$ ls
2016-01	2016-04	2016-07	2016-10	2017-01	2017-04	2017-07	2017-10	2018-01	2018-04	2018-07	2018-10
2016-02	2016-05	2016-08	2016-11	2017-02	2017-05	2017-08	2017-11	2018-02	2018-05	2018-08	2018-11
2016-03	2016-06	2016-09	2016-12	2017-03	2017-06	2017-09	2017-12	2018-03	2018-06	2018-09	2018-12
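This sketch shows how several brace expressions in one word combine and recreates the Pics example in a throwaway directory. The 0{1..9} trick is needed because the old Bash 3.2 that ships with macOS does not zero-pad sequences; Bash 4.0 and later also accept {01..12} directly.

```shell
#!/usr/bin/env bash
echo {a,b}{1,2}                    # a1 a2 b1 b2: every combination, left to right
echo {2016..2017}-0{1..3}          # 2016-01 2016-02 2016-03 2017-01 2017-02 2017-03

pics=$(mktemp -d)                  # hypothetical stand-in for the Pics directory
cd "$pics"
mkdir {2016..2018}-0{1..9} {2016..2018}-{10..12}
ls | wc -l                         # 36: three years times twelve months
```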

7. Quoting

Now that we’ve seen how many ways the shell can perform expansions, it’s time to learn how we can control them. Take this example:

$ echo this    is    a   whitespace    test
this is a whitespace test

Or this:

$ echo The total amount is $100.00 for you
The total amount is 00.00 for you

In the first example, word splitting by the shell removed extra whitespace from the echo command’s list of arguments. And in the second example, parameter expansion (which we haven’t covered so far) substituted an empty string for the value of $1 because it was an undefined variable.

The shell provides a mechanism called quoting to selectively suppress unwanted expansions.

Double Quotes

If you place text inside double quotes, all the special characters used by the shell lose their meaning and are treated as ordinary characters. The exceptions are $ (dollar sign), \ (backslash), and ` (backtick).

$ echo "this    is    a   whitespace    test"
this    is    a   whitespace    test
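Word splitting is easiest to see by counting arguments. The function count_args below is a hypothetical helper that prints how many arguments it received; the last line shows that $ keeps its meaning inside double quotes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

count_args this    is    a   whitespace    test     # 5: word splitting produced five arguments
count_args "this    is    a   whitespace    test"   # 1: one argument, spaces preserved
amount=100
echo "The total is $amount dollars"                 # The total is 100 dollars
```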

Single Quotes

If we need to suppress all expansions, we use single quotes:

$ echo 'The total amount is $100.00 for you'
The total amount is $100.00 for you

Escaping Characters

Sometimes we want to quote only a single character. To do this, we can precede a character with a backslash, which in this context is called an escape character. Often this is done inside double quotes to selectively prevent an expansion:

$ echo "The balance for you is \$5.00."
The balance for you is $5.00.

To allow a backslash character to appear, escape it by typing \\
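Here is a sketch comparing the three quoting tools on a hypothetical price variable. It prints with printf '%s\n' rather than echo, since some echo implementations interpret backslashes themselves:

```shell
#!/usr/bin/env bash
price=5.00                                     # hypothetical variable
double="The balance for you is \$$price."      # \$ is a literal dollar sign, $price still expands
single='The balance for you is $price.'        # single quotes: nothing expands
backslash="One backslash: \\"                  # \\ yields a single backslash
printf '%s\n' "$double" "$single" "$backslash"
```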

That’s it for today! But stay tuned: I plan to continue writing about the command line in the next few weeks!

. . . . . . . . . . . . . .

Thank you for reading! I hope you enjoyed this article, and I am always happy to get critical and friendly feedback, or suggestions for improvement!