awk / cut: Skip First Two Fields and Print the Rest of Line

Originally published at: https://www.cyberciti.biz/faq/unix-linux-bsd-appleosx-skip-fields-command/

I would like to skip the first two or three fields at the beginning of a line and print the rest of the line. Consider the following input:
    This is a test
    Giving back more than we take

I want the following output:
    a test
    more than we take

How do I print lines starting from the nth field using awk under UNIX or Linux operating systems?

Consider input having two consecutive spaces. The old solution accurately preserves “the rest of the line”:

echo 'This is a  test' | awk '{print substr($0, index($0,$3))}'
# a  test

The new solution doesn’t:

echo 'This is a  test' | awk '{ $1=""; $2=""; print}'
#   a test
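
The spacing collapses because assigning to any field makes awk rebuild $0, joining the fields with OFS (a single space by default). A minimal sketch of that behavior; the variable name `rebuilt` is only for illustration:

```shell
# $1 = $1 changes nothing in the field itself, but it still forces awk
# to rebuild $0 from the fields, joined by OFS (one space by default).
rebuilt=$(echo 'This is a  test' | awk '{ $1 = $1; print }')
echo "$rebuilt"
# This is a test
```

So any approach that assigns to fields will lose the original spacing, whichever fields it blanks out.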

Here’s a solution using sed:

echo 'This is a  test' | sed -E 's/^([^ ]* *){2}//'
# a  test

Here’s an advanced use of GNU sed and cut to parse ls and only display certain fields like size and filename, where the filename may contain spaces, tabs, and newlines.

(Yes, I know parsing ls is a bad practice; I’m just using it as an example of something with fields where the last field can contain whitespace.)

cd "$(mktemp -d)"
echo -n A > filename
echo -n AB >  'filename with   spaces'
echo -n ABC >  $'filename with\nnewline'
echo -n ABCD >  $'filename with\ttab'
ls -lb | sed -E '1d;s/ +/\x00/g9; s/ +/\t/g; s/\x00/ /g' | cut -f 5,9 | column -s $'\t' -t
#1  filename
#3  filename\ with\nnewline
#2  filename\ with\ \ \ spaces
#4  filename\ with\ttab

Combining the g flag with a number (g9) is a GNU extension I found via: https://unix.stackexchange.com/a/155810/89782
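
To see what that flag combination does in isolation, here is a small sketch on made-up input, assuming GNU sed (other sed implementations may reject or interpret it differently). The variable name `numbered` is just for illustration:

```shell
# With GNU sed, a number combined with g means: leave the matches before
# the Nth alone, then replace the Nth match and every match after it.
numbered=$(echo 'a b c d e' | sed 's/ /_/g3')
echo "$numbered"
# a b c_d_e
```

In the ls pipeline, g9 plays the same role: the first eight runs of spaces separate the ls columns, and everything from the ninth on belongs to the filename.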

The “old solution” has a flaw.

It works for the given example input:

echo 'This is a    test' | awk '{print substr($0, index($0,$3))}'

Result:

a    test

But if we change ‘This’ to ‘That’, the result is quite unexpected, because index($0,$3) returns the position of the first “a” in the line, which is now inside ‘That’:

echo 'That is a    test' | awk '{print substr($0, index($0,$3))}'

Result:

at is a    test

:astonished:
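
If you want to stay with awk, one way around the index() pitfall is to strip the leading fields with sub() instead of searching for the text of $3. A minimal sketch, assuming fields are separated by spaces and tabs; the function name `skip_fields_awk` is hypothetical:

```shell
# Remove the first n fields (default 2) and keep the rest of the line as is.
skip_fields_awk() {
  awk -v n="${1:-2}" '{
    line = $0
    # Each pass removes optional leading blanks, one field, and the blanks after it.
    for (i = 0; i < n; i++)
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", line)
    print line
  }'
}

echo 'That is a    test' | skip_fields_awk 2
# a    test
```

The loop works on a copy of $0 and never assigns to a field, so the spacing of the remaining text is not touched. It also avoids the {n} interval syntax, which some older awk implementations do not support.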

sed provides the best solution to this problem.

We’ll simulate awk’s default behavior of separating fields by sequences of spaces, TABs, and newlines.

“Fields are normally separated by whitespace sequences (spaces, TABs, and newlines)”
(Default Field Splitting, The GNU Awk User’s Guide)

echo 'This   is   a    test' | sed -E 's/^([^ \t]*[ \t]*){2}//'

Result:

a    test

:heavy_check_mark:

Tip: To start at the 4th field, replace the 2 with a 3. You get the idea.
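
The tip can be wrapped in a small shell function so the field count becomes an argument. This is only a sketch: the name `skip_fields_sed` is hypothetical, it uses the POSIX class [[:blank:]] (space and tab) in place of [ \t], and it additionally skips leading blanks the way awk does when it splits fields:

```shell
# Remove the first n fields (default 2) from each line, keeping the
# original spacing of whatever follows.
skip_fields_sed() {
  n=${1:-2}
  sed -E "s/^[[:blank:]]*([^[:blank:]]+[[:blank:]]*){$n}//"
}

echo 'This   is   a    test' | skip_fields_sed 2
# a    test
echo 'This   is   a    test' | skip_fields_sed 3
# test
```

Lines with fewer than n fields come out empty, which matches what the one-liner above does.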

Wow @robin217, that is some dedication to updating the old solution. Thank you :slight_smile:

