The Unix Text Processing Trinity

grep, sed, awk

Three tools born at Bell Labs that still power every server, every pipeline, and every sysadmin's toolkit — over 50 years later.
Learn their history and master them the right way.

bash — 80×24
# Find all error lines in a log
$ grep -n "ERROR" /var/log/app.log
142:ERROR: Connection timeout to db-primary
287:ERROR: Out of memory in worker-3
# Replace all occurrences in-place
$ sed -i 's/localhost/0.0.0.0/g' config.yaml
# Sum values from the 3rd column
$ awk -F',' '{sum += $3} END {printf "%.2f\n", sum}' sales.csv
1847293.50
$

Three tools, three philosophies

Each tool follows the Unix philosophy of doing one thing well. Together, they form the most powerful text processing toolkit ever created — no installation required on any Unix system.

g/
Since 1973

grep

Global Regular Expression Print

The searcher. grep scans input line-by-line and prints lines that match a pattern. It's the fastest way to find a needle in a haystack of text — from log files to codebases, grep is usually the first command you reach for.
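
A minimal sketch of that line-by-line model — sample text is inlined with printf, so no real log file is assumed:

```shell
# grep reads input line by line and prints only the lines that match
printf 'ok\nERROR: disk full\nok\n' | grep "ERROR"
# prints: ERROR: disk full
```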

s/
Since 1974

sed

Stream Editor

The transformer. sed reads input as a stream, applies editing commands, and outputs the result. Find-and-replace across thousands of files, delete lines matching a pattern, insert text — sed automates what you'd do manually in an editor.
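
A minimal sketch of the stream model — each line flows in, gets edited, and flows out (sample input via printf):

```shell
# sed applies the substitution to every line as it passes through
printf 'host=localhost\nport=8080\n' | sed 's/localhost/0.0.0.0/'
# prints: host=0.0.0.0
#         port=8080
```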

{}
Since 1977

awk

Aho, Weinberger & Kernighan

The programmer. awk is a complete programming language designed for structured text. It splits each line into fields, supports variables, arrays, arithmetic, and functions. For columnar data, nothing comes close.
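
A minimal sketch of the field model — awk splits each line into $1, $2, ... automatically (sample input via printf):

```shell
# sum the second column across all lines
printf 'apples 3\npears 4\n' | awk '{sum += $2} END {print sum}'
# prints: 7
```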

Born at Bell Labs

All three tools emerged from AT&T Bell Labs during the golden age of Unix development in the 1970s — a decade that shaped modern computing.

1973

grep is born from ed

Ken Thompson wrote grep overnight as a standalone tool. The name comes from the ed editor command g/re/p — "globally search for a regular expression and print matching lines." Doug McIlroy had asked Thompson to add regex search to ed for large files, and Thompson's solution was to extract the functionality into its own program. This was one of the first examples of the Unix philosophy: small, composable tools connected by pipes.

👤 Ken Thompson, Bell Labs
1973–1979

grep evolves: egrep & fgrep

Alfred Aho wrote egrep (extended grep), adding support for the +, ?, and | operators — full regular expression syntax. He also created fgrep (fixed grep), which uses the Aho-Corasick algorithm for extremely fast multi-pattern matching without regex. These variants were later unified as flags: grep -E and grep -F.

👤 Alfred Aho, Bell Labs
1974

sed brings editing to streams

Lee E. McMahon developed sed at Bell Labs as a non-interactive version of the ed editor. The key innovation was processing text as a stream — reading from standard input, applying transformations, and writing to standard output. This made it perfect for pipelines and automation. McMahon's sed could handle files too large to fit in memory, a critical capability for 1970s hardware.

👤 Lee E. McMahon, Bell Labs
1977

awk — a language, not just a command

Alfred Aho, Peter Weinberger, and Brian Kernighan created awk as a pattern-matching programming language. Named after their initials (A-W-K), it was designed to process structured data by automatically splitting lines into fields. awk introduced concepts like BEGIN/END blocks, associative arrays, and field-based processing that influenced later languages including Perl and Python.

👤 Aho, Weinberger & Kernighan, Bell Labs
1985

The AWK Programming Language

A major revision known as "new awk" (nawk) added user-defined functions, multiple input streams, and dynamic regular expressions. Aho, Kernighan, and Weinberger then documented it in "The AWK Programming Language" (published 1988) — the definitive reference. The book cemented awk's position as a serious programming tool, not just a command-line utility.

📖 Addison-Wesley, 1988
1999

GNU grep — the modern standard

GNU grep, maintained by Mike Haertel and the GNU project, became the de facto implementation on Linux systems. It unified grep, egrep, and fgrep into a single binary with flags, added --color highlighting, recursive search (-r), Perl-compatible regex (-P), and significant performance optimizations using Boyer-Moore and other algorithms.

👤 Mike Haertel & GNU Project
2024

Still evolving: second edition of TAWKPL

Brian Kernighan co-authored the second edition of "The AWK Programming Language" — nearly four decades after the original. Kernighan also continues to maintain the original "one true awk" from Bell Labs. Meanwhile, gawk (GNU awk), led by Arnold Robbins, keeps adding features like network I/O, loadable extensions, and namespace support — proof that awk remains a living, evolving tool on all fronts.

👤 Brian Kernighan & Arnold Robbins, 2024

From basic to battle-tested

Real-world examples organized by tool and difficulty level.

Search for a string in files (Basic)

# Search recursively in all .py files
grep -rn "import requests" --include="*.py" ./src/
# Case-insensitive search
grep -i "error" /var/log/syslog
# Show lines that do NOT match
grep -v "^#" config.conf
-r recursive, -n line numbers, -i case-insensitive, -v invert match.
Context around matches (Basic)

# Show 3 lines before and after match
grep -C 3 "segfault" /var/log/kern.log
# Show 5 lines after match (great for stack traces)
grep -A 5 "Exception" app.log
# Show 2 lines before match
grep -B 2 "FATAL" app.log
-C context (before+after), -A after, -B before. Perfect for log analysis.
Regex power moves (Intermediate)

# Match email addresses
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
# Match IP addresses
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log
# Find functions in source code
grep -nE '^\s*(def|function|func|fn)\s+\w+' *.py *.js *.go
-o prints only the matching part. -E enables extended regex.
Advanced: multi-pattern & xargs chaining (Advanced)

# Match multiple patterns at once
grep -E "error|warning|critical" /var/log/syslog | sort | uniq -c | sort -rn
# Search with Perl regex (lookbehind)
grep -P '(?<=password=)[^\s]+' config.env
# Find files containing BOTH patterns
grep -lZ "import os" *.py | xargs -0 grep -l "subprocess"
Combining grep with pipes and xargs for complex multi-stage filtering.
Find and replace (Basic)

# Simple replacement (first occurrence per line)
sed 's/foo/bar/' input.txt
# Global replacement (all occurrences)
sed 's/foo/bar/g' input.txt
# In-place edit (modifies the file directly)
sed -i 's/http:/https:/g' urls.txt
# In-place with backup
sed -i.bak 's/old/new/g' config.yml
The s command is sed's bread and butter. g flag = all occurrences. -i = in-place edit.
Line operations (Basic)

# Delete lines matching a pattern
sed '/^#/d' config.conf
# Delete empty lines
sed '/^$/d' file.txt
# Print only lines 10-20
sed -n '10,20p' bigfile.log
# Insert text before line 5
sed '5i\# This is a new comment' script.sh
d deletes, p prints, i inserts. -n suppresses default output.
Capture groups & backreferences (Intermediate)

# Swap first and last names
sed -E 's/^(\w+) (\w+)$/\2, \1/' names.txt
# Add quotes around each word
sed -E 's/(\w+)/"\1"/g' words.txt
# Extract domain from URLs
sed -E 's|https?://([^/]+).*|\1|' urls.txt
\1, \2 reference captured groups. -E enables extended regex (no escaping parens).
Multi-line & advanced sed (Advanced)

# Delete block between two patterns
sed '/BEGIN_BLOCK/,/END_BLOCK/d' config.txt
# Replace only within matching lines
sed '/error/s/level=info/level=error/' app.log
# Multiple operations in one pass
sed -e 's/foo/bar/g' -e 's/baz/qux/g' -e '/^$/d' input.txt
# Append line after match
sed '/\[database\]/a host=db.production.local' config.ini
Range patterns (/start/,/end/) and chained operations make sed a surgical text editor.
Field extraction (Basic)

# Print specific columns (space-delimited)
awk '{print $1, $3}' data.txt
# Custom delimiter (CSV)
awk -F',' '{print $2}' users.csv
# Print last field of each line
awk '{print $NF}' access.log
# Custom output separator
awk -F':' 'BEGIN{OFS="\t"} {print $1, $3, $7}' /etc/passwd
$1, $2... are fields. $NF = last field. $0 = entire line. -F sets delimiter.
Filtering & pattern matching (Basic)

# Print lines where column 3 > 100
awk '$3 > 100' sales.txt
# Pattern match on a field
awk '$2 ~ /error/' log.txt
# Combine conditions (CSV, so set the delimiter)
awk -F',' '$3 > 50 && $4 == "USD"' transactions.csv
# Print line numbers
awk '/TODO/ {print NR": "$0}' source.py
awk processes each line and lets you filter by any condition — numeric, string, or regex.
Aggregation & statistics (Intermediate)

# Sum a column (CSV, so set the delimiter)
awk -F',' '{sum += $3} END {print "Total:", sum}' sales.csv
# Average
awk '{sum += $2; n++} END {print "Avg:", sum/n}' scores.txt
# Min and Max
awk 'NR==1{min=max=$3} $3>max{max=$3} $3<min{min=$3} END{print min, max}' data.txt
# Count occurrences per group
awk '{count[$1]++} END {for (k in count) print k, count[k]}' access.log
awk's variables, arithmetic, and associative arrays make it a command-line spreadsheet.
Advanced: report generation (Advanced)

# Generate a formatted report from CSV
awk -F',' '
BEGIN {
    printf "%-20s %10s %8s\n", "Product", "Revenue", "Units"
    printf "%-20s %10s %8s\n", "-------", "-------", "-----"
}
NR > 1 { rev[$1] += $3; units[$1] += $2 }
END {
    for (p in rev) printf "%-20s %10.2f %8d\n", p, rev[p], units[p]
}
' sales.csv
awk's printf and associative arrays can generate full reports directly on the command line.

Quick reference

The most useful flags and patterns at a glance. Bookmark this page.

grep flags
-i                    Case-insensitive matching
-v                    Invert match (non-matching lines)
-r / -R               Recursive search in directories
-n                    Show line numbers
-l                    List filenames only
-c                    Count matching lines
-o                    Print only matching portion
-E                    Extended regex (egrep)
-P                    Perl-compatible regex
-w                    Match whole words only
-A n / -B n / -C n    Context: after / before / both
--include="*.ext"     Filter files by extension
--color=auto          Highlight matches in color
sed commands
s/old/new/            Substitute first occurrence
s/old/new/g           Substitute all occurrences
s/old/new/gi          Case-insensitive substitution
/pattern/d            Delete matching lines
/pattern/p            Print matching lines
-n 'Np'               Print line N
-i                    Edit file in place
-i.bak                In-place with backup
-e 'cmd1' -e 'cmd2'   Multiple operations
/start/,/end/         Range pattern (from/to)
\1, \2                Backreferences
y/abc/xyz/            Transliterate characters
awk essentials
$0                            Entire line
$1, $2, $NF                   Fields (1st, 2nd, last)
NR                            Current line number
NF                            Number of fields
-F','                         Set field separator
BEGIN { ... }                 Run before processing
END { ... }                   Run after processing
/regex/ { ... }               Pattern-action block
$2 ~ /pat/                    Field matches regex
printf "fmt", args            Formatted output
array[key]++                  Associative arrays
length(), split(), substr()   Built-in functions

Power user secrets

Techniques that separate beginners from professionals.

TRICK 01

grep + xargs for bulk operations

Find files containing a pattern and do something with each one — safely handling spaces in filenames.

# Find all files with TODO and open in editor
grep -rlZ "TODO" src/ | xargs -0 code
# Count TODOs per file (file:count), sorted by count
grep -rc "TODO" src/ | sort -t: -k2 -rn
TRICK 02

sed for renaming files in bulk

Combine sed with shell loops to rename hundreds of files using regex patterns.

# Rename .jpeg to .jpg
for f in *.jpeg; do
    mv "$f" "$(echo "$f" | sed 's/\.jpeg$/.jpg/')"
done
# Lowercase all filenames
for f in *; do
    mv "$f" "$(echo "$f" | sed 's/.*/\L&/')"
done
TRICK 03

awk as a calculator

Use awk for quick math right on the command line — no bc or python needed.

# Quick calculation
echo | awk '{print 2^32}'
# Convert bytes to human-readable
ls -l | awk '{
    if ($5 > 1073741824)
        printf "%s\t%.1fG\n", $NF, $5/1073741824
    else if ($5 > 1048576)
        printf "%s\t%.1fM\n", $NF, $5/1048576
    else
        printf "%s\t%.1fK\n", $NF, $5/1024
}'
TRICK 04

The pipe trinity

Combine all three tools in a single pipeline for maximum power.

# Top 10 IPs hitting 404s
grep " 404 " access.log \
    | awk '{print $1}' \
    | sort | uniq -c | sort -rn \
    | head -10
# Find large files, format nicely
find / -size +100M 2>/dev/null \
    | xargs ls -lh \
    | awk '{print $5, $NF}' \
    | sed 's|/home/user|~|g' \
    | sort -hr
TRICK 05

sed's hold space (multi-line magic)

sed has a hidden "hold space" buffer for complex multi-line transformations.

# Join every 2 lines into one
sed 'N;s/\n/ /' file.txt
# Reverse line order (like tac)
sed -n '1!G;h;$p' file.txt
# Remove duplicate consecutive lines (like uniq)
sed '$!N; /^\(.*\)\n\1$/!P; D' file.txt
TRICK 06

awk's getline for external commands

Execute shell commands from within awk and use their output.

# Add timestamp to each line
awk '{
    "date +%H:%M:%S" | getline ts
    close("date +%H:%M:%S")
    print ts, $0
}' input.txt
# Resolve IPs to hostnames
awk '{
    cmd = "dig +short -x " $1
    cmd | getline hostname
    close(cmd)
    print $1, hostname
}' ips.txt
TRICK 07

grep --color in pipelines

Force color output even when piping to keep matches highlighted.

# Keep color through pipes
grep --color=always "error" log.txt | less -R
# Highlight without filtering
grep --color=always -E "ERROR|$" app.log
# Multiple colors for different patterns
grep --color=always "ERROR" log.txt \
    | GREP_COLORS='mt=01;33' grep --color=always -E "WARN|$"
TRICK 08

awk for JSON-like output

Generate structured output formats directly from awk.

# CSV to JSON array
awk -F',' '
BEGIN { print "[" }
NR > 1 {
    printf "%s{\"name\":\"%s\",\"age\":%s}", (NR > 2 ? "," : ""), $1, $2
}
END { print "]" }
' data.csv
# Generate HTML table
awk '
BEGIN { print "<table>" }
{
    print "<tr>"
    for (i = 1; i <= NF; i++) print "<td>" $i "</td>"
    print "</tr>"
}
END { print "</table>" }
' data.txt

When to use which?

A side-by-side comparison to help you pick the right tool for the job.

Feature                   grep                        sed                           awk
Primary purpose           Search & filter lines       Transform text streams        Process structured data
Best for                  Finding patterns in files   Find-and-replace, deletions   Columnar data, reports
Regex support             BRE, ERE, PCRE              BRE, ERE                      ERE
Variables                 No                          Hold/pattern space only       Full variables & arrays
Arithmetic                No                          No                            Yes (full math)
Field splitting           No                          Manual (regex)                Automatic (-F)
In-place editing          No                          Yes (-i)                      Via gawk -i inplace
Programming constructs    None                        Branches, labels              if/else, for, while, functions
Speed for simple search   Fastest                     Fast                          Good
Learning curve            Easy                        Medium                        Medium–Hard
Typical one-liner         grep -rn "bug"              sed 's/old/new/g'             awk '{print $2}'

Frequently asked questions

What's the difference between grep, sed, and awk?

Each tool has a distinct focus. grep is a search tool — it scans text and prints lines matching a pattern. sed is a stream editor — it reads text, applies transformations (substitutions, deletions, insertions), and outputs the result. awk is a programming language for structured text — it automatically splits lines into fields and supports variables, arrays, and arithmetic. Think of it as: grep finds, sed changes, awk computes.

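
That division of labor — grep finds, sed changes, awk computes — can be made concrete with a tiny made-up metrics file:

```shell
# A throwaway sample file (hypothetical data)
printf 'cpu 90\nmem 40\ncpu 70\n' > metrics.txt

grep "cpu" metrics.txt                          # finds:    cpu 90 / cpu 70
sed 's/cpu/CPU/' metrics.txt                    # changes:  CPU 90 / mem 40 / CPU 70
awk '{sum += $2} END {print sum}' metrics.txt   # computes: 200
```
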
In what order should I learn them?

Start with grep — it's the simplest and you'll use it constantly. Next, learn basic sed substitutions (s/old/new/g) and line deletions. Finally, tackle awk for field-based processing. You can be productive with grep in 10 minutes, sed in an hour, and awk in an afternoon. Mastery of each takes longer, but basic usage covers 90% of daily needs.

Are these tools still worth learning today?

Absolutely. These tools are available on virtually every Unix/Linux/macOS system without installation. They process text faster than most alternatives for common tasks, they compose beautifully with pipes, and they're the foundation of shell scripting. While Python, Perl, and modern alternatives exist, grep/sed/awk remain the fastest path from "I have a text problem" to "it's solved" — especially on servers where you can't install additional software.

Should I use grep or ripgrep (rg)?

ripgrep (rg) is a modern alternative to grep written in Rust. It's faster for recursive searches, respects .gitignore by default, and can optionally use PCRE2 regex (its default is Rust's own regex engine). However, grep is universally available (no installation needed), supports the POSIX standard for portability, and is the tool referenced in virtually all documentation and tutorials. Learn grep first — then use ripgrep if you need speed for large codebase searches.

When should I use awk instead of Python?

For one-liners and quick data extraction, awk is often faster to write and execute than Python. You don't need to import modules, open files, or write boilerplate. However, for complex logic, error handling, API calls, or anything beyond text processing, Python is the better choice. The sweet spot for awk is tasks you can express in 1–5 lines. If your awk script exceeds 20 lines, it's probably time to switch to Python.

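
To illustrate that sweet spot, here is the same column sum both ways — nums.txt is sample data created on the spot, not a real file:

```shell
# Hypothetical sample data
printf 'a 1\nb 2\nc 3\n' > nums.txt

# awk: no imports, no file handling, one line
awk '{s += $2} END {print s}' nums.txt
# the Python equivalent needs explicit parsing
python3 -c 'print(sum(int(line.split()[1]) for line in open("nums.txt")))'
# both print: 6
```
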
How do I use grep, sed, and awk on Windows?

There are several options. WSL (Windows Subsystem for Linux) gives you full native versions. Git Bash includes grep, sed, and awk (via MinGW). Cygwin provides a full POSIX environment. You can also install GnuWin32 for standalone Windows ports. WSL is the recommended approach as it provides the most compatible and performant experience.

What's the difference between BRE and ERE?

BRE (Basic Regular Expressions) is the default for grep and sed. Characters like (, ), {, }, +, and ? are literal — you must escape them to use as metacharacters: \(, \+, etc. ERE (Extended Regular Expressions), enabled with grep -E or sed -E, treats these as metacharacters by default. ERE is what most people expect from regex. When in doubt, use -E.

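
A quick sketch of the same optional-character match in both syntaxes. Note this assumes GNU grep: \? in BRE is a GNU extension (strict POSIX BRE has no ? operator at all):

```shell
# BRE: ? must be escaped to act as "optional" (GNU extension)
printf 'color\ncolour\n' | grep 'colou\?r'
# ERE: ? is a metacharacter out of the box
printf 'color\ncolour\n' | grep -E 'colou?r'
# each prints both lines
```
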
How did grep, sed, and awk influence Perl?

Perl was born directly from these three tools. In 1987, Larry Wall created Perl specifically to replace the awkward combination of grep, sed, and awk in his workflows. Perl inherited regex syntax from grep, the s/// substitution operator from sed, and concepts like $_ (the default variable), split, field processing, and BEGIN/END blocks from awk. In a sense, Perl is what you get when you merge all three into a single general-purpose language. This heritage also influenced later languages — Python's re module and Ruby's built-in regex support trace their lineage through Perl back to grep. Even grep -P (Perl-compatible regex) acknowledges this relationship by bringing Perl's enhanced regex syntax back into grep itself.

What is gawk, and how does it relate to awk?

gawk (GNU awk) is the most widely used implementation of awk on Linux systems — when you type awk on most distributions, you're actually running gawk. It extends the original awk with features like network I/O, loadable extensions, namespace support, and persistent memory. A major milestone came in February 2026 with gawk 5.4, which switched its default regular expression engine to MinRX — a new, fully POSIX-compliant, non-backtracking matcher with polynomial time guarantees, written by Mike Haertel (the original author of GNU grep). The previous GNU regex engine was not fully POSIX-compliant, particularly around longest-leftmost submatch rules. On top of that, gawk 5.4 is also faster at reading disk files — roughly 9% faster on large files thanks to removing unnecessary timeout checks. The old regex engine remains available via the GAWK_GNU_MATCHERS environment variable but is scheduled for eventual removal. Other awk implementations include mawk (default on Debian/Ubuntu, optimized for speed), nawk (the "new awk" from Bell Labs), and the original one true awk maintained by Brian Kernighan himself.