The Unix Text Processing Trinity

grep, sed, awk

Three tools born at Bell Labs that still power every server, every pipeline, and every sysadmin's toolkit — over 50 years later.
Learn their history and master them the right way.

bash — 80×24
# Find all error lines in a log
$ grep -n "ERROR" /var/log/app.log
142:ERROR: Connection timeout to db-primary
287:ERROR: Out of memory in worker-3
# Replace all occurrences in-place
$ sed -i 's/localhost/0.0.0.0/g' config.yaml
# Sum values from the 3rd column
$ awk -F',' '{sum += $3} END {printf "%.2f\n", sum}' sales.csv
1847293.50
$

Three tools, three philosophies

Each tool follows the Unix philosophy of doing one thing well. Together, they form the most powerful text processing toolkit ever created — no installation required on any Unix system.

g/
Since 1973

grep

Global Regular Expression Print

The searcher. grep scans input line-by-line and prints lines that match a pattern. It's the fastest way to find a needle in a haystack of text — from log files to codebases, grep is usually the first command you reach for.
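
A minimal sketch of that line-by-line model — sample text is inlined with printf, so no real log file is assumed:

```shell
# grep reads input line by line and prints only the lines that match
printf 'ok\nERROR: disk full\nok\n' | grep "ERROR"
# prints: ERROR: disk full
```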

s/
Since 1974

sed

Stream Editor

The transformer. sed reads input as a stream, applies editing commands, and outputs the result. Find-and-replace across thousands of files, delete lines matching a pattern, insert text — sed automates what you'd do manually in an editor.
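
A minimal sketch of the stream model — each line flows in, gets edited, and flows out (sample input via printf):

```shell
# sed applies the substitution to every line as it passes through
printf 'host=localhost\nport=8080\n' | sed 's/localhost/0.0.0.0/'
# prints: host=0.0.0.0
#         port=8080
```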

{}
Since 1977

awk

Aho, Weinberger & Kernighan

The programmer. awk is a complete programming language designed for structured text. It splits each line into fields, supports variables, arrays, arithmetic, and functions. For columnar data, nothing comes close.
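
A minimal sketch of the field model — awk splits each line into $1, $2, ... automatically (sample input via printf):

```shell
# sum the second column across all lines
printf 'apples 3\npears 4\n' | awk '{sum += $2} END {print sum}'
# prints: 7
```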

Born at Bell Labs

All three tools emerged from AT&T Bell Labs during the golden age of Unix development in the 1970s — a decade that shaped modern computing.

1973

grep is born from ed

Ken Thompson wrote grep overnight as a standalone tool. The name comes from the ed editor command g/re/p — "globally search for a regular expression and print matching lines." Doug McIlroy had asked Thompson to add regex search to ed for large files, and Thompson's solution was to extract the functionality into its own program. This was one of the first examples of the Unix philosophy: small, composable tools connected by pipes.

👤 Ken Thompson, Bell Labs
1973–1979

grep evolves: egrep & fgrep

Alfred Aho wrote egrep (extended grep), adding support for the +, ?, and | operators — full regular expression syntax. He also created fgrep (fixed grep), which uses the Aho-Corasick algorithm for extremely fast multi-pattern matching without regex. These variants were later unified as flags: grep -E and grep -F.

👤 Alfred Aho, Bell Labs
1974

sed brings editing to streams

Lee E. McMahon developed sed at Bell Labs as a non-interactive version of the ed editor. The key innovation was processing text as a stream — reading from standard input, applying transformations, and writing to standard output. This made it perfect for pipelines and automation. McMahon's sed could handle files too large to fit in memory, a critical capability for 1970s hardware.

👤 Lee E. McMahon, Bell Labs
1977

awk — a language, not just a command

Alfred Aho, Peter Weinberger, and Brian Kernighan created awk as a pattern-matching programming language. Named after their initials (A-W-K), it was designed to process structured data by automatically splitting lines into fields. awk introduced concepts like BEGIN/END blocks, associative arrays, and field-based processing that influenced later languages including Perl and Python.

👤 Aho, Weinberger & Kernighan, Bell Labs
1985

The AWK Programming Language

A major revision known as "new awk" (nawk) added user-defined functions, multiple input streams, and dynamic regular expressions. Aho, Kernighan, and Weinberger then documented it in "The AWK Programming Language" (published 1988) — the definitive reference. The book cemented awk's position as a serious programming tool, not just a command-line utility.

📖 Addison-Wesley, 1988
1999

GNU grep — the modern standard

GNU grep, maintained by Mike Haertel and the GNU project, became the de facto implementation on Linux systems. It unified grep, egrep, and fgrep into a single binary with flags, added --color highlighting, recursive search (-r), Perl-compatible regex (-P), and significant performance optimizations using Boyer-Moore and other algorithms.

👤 Mike Haertel & GNU Project
2024

Still evolving: second edition of TAWKPL

Brian Kernighan co-authored the second edition of "The AWK Programming Language" — nearly four decades after the original. Kernighan also continues to maintain the original "one true awk" from Bell Labs. Meanwhile, gawk (GNU awk), led by Arnold Robbins, keeps adding features like network I/O, loadable extensions, and namespace support — proof that awk remains a living, evolving tool on all fronts.

👤 Brian Kernighan & Arnold Robbins, 2024

From basic to battle-tested

Real-world examples organized by tool and difficulty level.

Search for a string in files (Basic)

# Search recursively in all .py files
grep -rn "import requests" --include="*.py" ./src/
# Case-insensitive search
grep -i "error" /var/log/syslog
# Show lines that do NOT match
grep -v "^#" config.conf
-r recursive, -n line numbers, -i case-insensitive, -v invert match.
Context around matches (Basic)

# Show 3 lines before and after match
grep -C 3 "segfault" /var/log/kern.log
# Show 5 lines after match (great for stack traces)
grep -A 5 "Exception" app.log
# Show 2 lines before match
grep -B 2 "FATAL" app.log
-C context (before+after), -A after, -B before. Perfect for log analysis.
Regex power moves (Intermediate)

# Match email addresses
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
# Match IP addresses
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log
# Find functions in source code
grep -nE '^\s*(def|function|func|fn)\s+\w+' *.py *.js *.go
-o prints only the matching part. -E enables extended regex.
Advanced: multi-pattern & xargs chaining (Advanced)

# Match multiple patterns at once
grep -E "error|warning|critical" /var/log/syslog | sort | uniq -c | sort -rn
# Search with Perl regex (lookbehind)
grep -P '(?<=password=)[^\s]+' config.env
# Find files containing BOTH patterns
grep -lZ "import os" *.py | xargs -0 grep -l "subprocess"
Combining grep with pipes and xargs for complex multi-stage filtering.
Find and replace (Basic)

# Simple replacement (first occurrence per line)
sed 's/foo/bar/' input.txt
# Global replacement (all occurrences)
sed 's/foo/bar/g' input.txt
# In-place edit (modifies the file directly)
sed -i 's/http:/https:/g' urls.txt
# In-place with backup
sed -i.bak 's/old/new/g' config.yml
The s command is sed's bread and butter. g flag = all occurrences. -i = in-place edit.
Line operations (Basic)

# Delete lines matching a pattern
sed '/^#/d' config.conf
# Delete empty lines
sed '/^$/d' file.txt
# Print only lines 10-20
sed -n '10,20p' bigfile.log
# Insert text before line 5
sed '5i\# This is a new comment' script.sh
d deletes, p prints, i inserts. -n suppresses default output.
Capture groups & backreferences (Intermediate)

# Swap first and last names
sed -E 's/^(\w+) (\w+)$/\2, \1/' names.txt
# Add quotes around each word
sed -E 's/(\w+)/"\1"/g' words.txt
# Extract domain from URLs
sed -E 's|https?://([^/]+).*|\1|' urls.txt
\1, \2 reference captured groups. -E enables extended regex (no escaping parens).
Multi-line & advanced sed (Advanced)

# Delete block between two patterns
sed '/BEGIN_BLOCK/,/END_BLOCK/d' config.txt
# Replace only within matching lines
sed '/error/s/level=info/level=error/' app.log
# Multiple operations in one pass
sed -e 's/foo/bar/g' -e 's/baz/qux/g' -e '/^$/d' input.txt
# Append line after match
sed '/\[database\]/a host=db.production.local' config.ini
Range patterns (/start/,/end/) and chained operations make sed a surgical text editor.
Field extraction (Basic)

# Print specific columns (space-delimited)
awk '{print $1, $3}' data.txt
# Custom delimiter (CSV)
awk -F',' '{print $2}' users.csv
# Print last field of each line
awk '{print $NF}' access.log
# Custom output separator
awk -F':' 'BEGIN{OFS="\t"} {print $1, $3, $7}' /etc/passwd
$1, $2... are fields. $NF = last field. $0 = entire line. -F sets delimiter.
Filtering & pattern matching (Basic)

# Print lines where column 3 > 100
awk '$3 > 100' sales.txt
# Pattern match on a field
awk '$2 ~ /error/' log.txt
# Combine conditions (CSV, so set the delimiter)
awk -F',' '$3 > 50 && $4 == "USD"' transactions.csv
# Print line numbers
awk '/TODO/ {print NR": "$0}' source.py
awk processes each line and lets you filter by any condition — numeric, string, or regex.
Aggregation & statistics (Intermediate)

# Sum a column (CSV, so set the delimiter)
awk -F',' '{sum += $3} END {print "Total:", sum}' sales.csv
# Average
awk '{sum += $2; n++} END {print "Avg:", sum/n}' scores.txt
# Min and Max
awk 'NR==1{min=max=$3} $3>max{max=$3} $3<min{min=$3} END{print min, max}' data.txt
# Count occurrences per group
awk '{count[$1]++} END {for (k in count) print k, count[k]}' access.log
awk's variables, arithmetic, and associative arrays make it a command-line spreadsheet.
Advanced: report generation (Advanced)

# Generate a formatted report from CSV
awk -F',' '
BEGIN {
    printf "%-20s %10s %8s\n", "Product", "Revenue", "Units"
    printf "%-20s %10s %8s\n", "-------", "-------", "-----"
}
NR > 1 { rev[$1] += $3; units[$1] += $2 }
END {
    for (p in rev) printf "%-20s %10.2f %8d\n", p, rev[p], units[p]
}
' sales.csv
awk's printf and associative arrays can generate full reports directly on the command line.

Quick reference

The most useful flags and patterns at a glance. Bookmark this page.

grep flags
-i                    Case-insensitive matching
-v                    Invert match (non-matching lines)
-r / -R               Recursive search in directories
-n                    Show line numbers
-l                    List filenames only
-c                    Count matching lines
-o                    Print only matching portion
-E                    Extended regex (egrep)
-P                    Perl-compatible regex
-w                    Match whole words only
-A n / -B n / -C n    Context: after / before / both
--include="*.ext"     Filter files by extension
--color=auto          Highlight matches in color
sed commands
s/old/new/            Substitute first occurrence
s/old/new/g           Substitute all occurrences
s/old/new/gi          Case-insensitive substitution
/pattern/d            Delete matching lines
/pattern/p            Print matching lines
-n 'Np'               Print line N
-i                    Edit file in place
-i.bak                In-place with backup
-e 'cmd1' -e 'cmd2'   Multiple operations
/start/,/end/         Range pattern (from/to)
\1, \2                Backreferences
y/abc/xyz/            Transliterate characters
awk essentials
$0                            Entire line
$1, $2, $NF                   Fields (1st, 2nd, last)
NR                            Current line number
NF                            Number of fields
-F','                         Set field separator
BEGIN { ... }                 Run before processing
END { ... }                   Run after processing
/regex/ { ... }               Pattern-action block
$2 ~ /pat/                    Field matches regex
printf "fmt", args            Formatted output
array[key]++                  Associative arrays
length(), split(), substr()   Built-in functions

Power user secrets

Techniques that separate beginners from professionals.

TRICK 01

grep + xargs for bulk operations

Find files containing a pattern and do something with each one — safely handling spaces in filenames.

# Find all files with TODO and open in editor
grep -rlZ "TODO" src/ | xargs -0 code
# Count TODOs per file (file:count), sorted by count
grep -rc "TODO" src/ | sort -t: -k2 -rn
TRICK 02

sed for renaming files in bulk

Combine sed with shell loops to rename hundreds of files using regex patterns.

# Rename .jpeg to .jpg
for f in *.jpeg; do
    mv "$f" "$(echo "$f" | sed 's/\.jpeg$/.jpg/')"
done
# Lowercase all filenames
for f in *; do
    mv "$f" "$(echo "$f" | sed 's/.*/\L&/')"
done
TRICK 03

awk as a calculator

Use awk for quick math right on the command line — no bc or python needed.

# Quick calculation
echo | awk '{print 2^32}'
# Convert bytes to human-readable
ls -l | awk '{
    if ($5 > 1073741824)
        printf "%s\t%.1fG\n", $NF, $5/1073741824
    else if ($5 > 1048576)
        printf "%s\t%.1fM\n", $NF, $5/1048576
    else
        printf "%s\t%.1fK\n", $NF, $5/1024
}'
TRICK 04

The pipe trinity

Combine all three tools in a single pipeline for maximum power.

# Top 10 IPs hitting 404s
grep " 404 " access.log \
    | awk '{print $1}' \
    | sort | uniq -c | sort -rn \
    | head -10
# Find large files, format nicely
find / -size +100M 2>/dev/null \
    | xargs ls -lh \
    | awk '{print $5, $NF}' \
    | sed 's|/home/user|~|g' \
    | sort -hr
TRICK 05

sed's hold space (multi-line magic)

sed has a hidden "hold space" buffer for complex multi-line transformations.

# Join every 2 lines into one
sed 'N;s/\n/ /' file.txt
# Reverse line order (like tac)
sed -n '1!G;h;$p' file.txt
# Remove duplicate consecutive lines (like uniq)
sed '$!N; /^\(.*\)\n\1$/!P; D' file.txt
TRICK 06

awk's getline for external commands

Execute shell commands from within awk and use their output.

# Add timestamp to each line
awk '{
    "date +%H:%M:%S" | getline ts
    close("date +%H:%M:%S")
    print ts, $0
}' input.txt
# Resolve IPs to hostnames
awk '{
    cmd = "dig +short -x " $1
    cmd | getline hostname
    close(cmd)
    print $1, hostname
}' ips.txt
TRICK 07

grep --color in pipelines

Force color output even when piping to keep matches highlighted.

# Keep color through pipes
grep --color=always "error" log.txt | less -R
# Highlight without filtering
grep --color=always -E "ERROR|$" app.log
# Multiple colors for different patterns
grep --color=always "ERROR" log.txt \
    | GREP_COLORS='mt=01;33' grep --color=always -E "WARN|$"
TRICK 08

awk for JSON-like output

Generate structured output formats directly from awk.

# CSV to JSON array
awk -F',' '
BEGIN { print "[" }
NR > 1 {
    printf "%s{\"name\":\"%s\",\"age\":%s}", (NR > 2 ? "," : ""), $1, $2
}
END { print "]" }
' data.csv
# Generate HTML table
awk '
BEGIN { print "<table>" }
{
    print "<tr>"
    for (i = 1; i <= NF; i++) print "<td>" $i "</td>"
    print "</tr>"
}
END { print "</table>" }
' data.txt

When to use which?

A side-by-side comparison to help you pick the right tool for the job.

Feature                   grep                        sed                           awk
Primary purpose           Search & filter lines       Transform text streams        Process structured data
Best for                  Finding patterns in files   Find-and-replace, deletions   Columnar data, reports
Regex support             BRE, ERE, PCRE              BRE, ERE                      ERE
Variables                 No                          Hold/pattern space only       Full variables & arrays
Arithmetic                No                          No                            Yes (full math)
Field splitting           No                          Manual (regex)                Automatic (-F)
In-place editing          No                          Yes (-i)                      Via gawk -i inplace
Programming constructs    None                        Branches, labels              if/else, for, while, functions
Speed for simple search   Fastest                     Fast                          Good
Learning curve            Easy                        Medium                        Medium–Hard
Typical one-liner         grep -rn "bug"              sed 's/old/new/g'             awk '{print $2}'

Frequently asked questions

What's the difference between grep, sed, and awk?

Each tool has a distinct focus. grep is a search tool — it scans text and prints lines matching a pattern. sed is a stream editor — it reads text, applies transformations (substitutions, deletions, insertions), and outputs the result. awk is a programming language for structured text — it automatically splits lines into fields and supports variables, arrays, and arithmetic. Think of it as: grep finds, sed changes, awk computes.

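
That division of labor — grep finds, sed changes, awk computes — can be made concrete with a tiny made-up metrics file:

```shell
# A throwaway sample file (hypothetical data)
printf 'cpu 90\nmem 40\ncpu 70\n' > metrics.txt

grep "cpu" metrics.txt                          # finds:    cpu 90 / cpu 70
sed 's/cpu/CPU/' metrics.txt                    # changes:  CPU 90 / mem 40 / CPU 70
awk '{sum += $2} END {print sum}' metrics.txt   # computes: 200
```
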
In what order should I learn them?

Start with grep — it's the simplest and you'll use it constantly. Next, learn basic sed substitutions (s/old/new/g) and line deletions. Finally, tackle awk for field-based processing. You can be productive with grep in 10 minutes, sed in an hour, and awk in an afternoon. Mastery of each takes longer, but basic usage covers 90% of daily needs.

Are these tools still worth learning today?

Absolutely. These tools are available on virtually every Unix/Linux/macOS system without installation. They process text faster than most alternatives for common tasks, they compose beautifully with pipes, and they're the foundation of shell scripting. While Python, Perl, and modern alternatives exist, grep/sed/awk remain the fastest path from "I have a text problem" to "it's solved" — especially on servers where you can't install additional software.

Should I use grep or ripgrep (rg)?

ripgrep (rg) is a modern alternative to grep written in Rust. It's faster for recursive searches, respects .gitignore by default, and can optionally use PCRE2 regex (its default is Rust's own regex engine). However, grep is universally available (no installation needed), supports the POSIX standard for portability, and is the tool referenced in virtually all documentation and tutorials. Learn grep first — then use ripgrep if you need speed for large codebase searches.

When should I use awk instead of Python?

For one-liners and quick data extraction, awk is often faster to write and execute than Python. You don't need to import modules, open files, or write boilerplate. However, for complex logic, error handling, API calls, or anything beyond text processing, Python is the better choice. The sweet spot for awk is tasks you can express in 1–5 lines. If your awk script exceeds 20 lines, it's probably time to switch to Python.

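
To illustrate that sweet spot, here is the same column sum both ways — nums.txt is sample data created on the spot, not a real file:

```shell
# Hypothetical sample data
printf 'a 1\nb 2\nc 3\n' > nums.txt

# awk: no imports, no file handling, one line
awk '{s += $2} END {print s}' nums.txt
# the Python equivalent needs explicit parsing
python3 -c 'print(sum(int(line.split()[1]) for line in open("nums.txt")))'
# both print: 6
```
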
How do I use grep, sed, and awk on Windows?

There are several options. WSL (Windows Subsystem for Linux) gives you full native versions. Git Bash includes grep, sed, and awk (via MinGW). Cygwin provides a full POSIX environment. You can also install GnuWin32 for standalone Windows ports. WSL is the recommended approach as it provides the most compatible and performant experience.

What's the difference between BRE and ERE?

BRE (Basic Regular Expressions) is the default for grep and sed. Characters like (, ), {, }, +, and ? are literal — you must escape them to use as metacharacters: \(, \+, etc. ERE (Extended Regular Expressions), enabled with grep -E or sed -E, treats these as metacharacters by default. ERE is what most people expect from regex. When in doubt, use -E.

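
A quick sketch of the same optional-character match in both syntaxes. Note this assumes GNU grep: \? in BRE is a GNU extension (strict POSIX BRE has no ? operator at all):

```shell
# BRE: ? must be escaped to act as "optional" (GNU extension)
printf 'color\ncolour\n' | grep 'colou\?r'
# ERE: ? is a metacharacter out of the box
printf 'color\ncolour\n' | grep -E 'colou?r'
# each prints both lines
```
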
How did grep, sed, and awk influence Perl?

Perl was born directly from these three tools. In 1987, Larry Wall created Perl specifically to replace the awkward combination of grep, sed, and awk in his workflows. Perl inherited regex syntax from grep, the s/// substitution operator from sed, and concepts like $_ (the default variable), split, field processing, and BEGIN/END blocks from awk. In a sense, Perl is what you get when you merge all three into a single general-purpose language. This heritage also influenced later languages — Python's re module and Ruby's built-in regex support trace their lineage through Perl back to grep. Even grep -P (Perl-compatible regex) acknowledges this relationship by bringing Perl's enhanced regex syntax back into grep itself.

What is gawk, and how does it relate to awk?

gawk (GNU awk) is the most widely used implementation of awk on Linux systems — when you type awk on most distributions, you're actually running gawk. It extends the original awk with features like network I/O, loadable extensions, namespace support, and persistent memory. A major milestone came in February 2026 with gawk 5.4, which switched its default regular expression engine to MinRX — a new, fully POSIX-compliant, non-backtracking matcher with polynomial time guarantees, written by Mike Haertel (the original author of GNU grep). The previous GNU regex engine was not fully POSIX-compliant, particularly around longest-leftmost submatch rules. On top of that, gawk 5.4 is also faster at reading disk files — roughly 9% faster on large files thanks to removing unnecessary timeout checks. The old regex engine remains available via the GAWK_GNU_MATCHERS environment variable but is scheduled for eventual removal. Other awk implementations include mawk (default on Debian/Ubuntu, optimized for speed), nawk (the "new awk" from Bell Labs), and the original one true awk maintained by Brian Kernighan himself.