Mastering Text Processing with Grep, Sed, Awk, Cut, and Sort


Text processing is an essential skill for anyone working with data, scripts, or system administration. Linux provides a suite of powerful command-line tools for searching, modifying, and manipulating text. Whether you’re working with log files, configuration files, or data sets, mastering tools like grep, sed, awk, cut, and sort can save you time and improve efficiency.

In this tutorial, I will show how these tools are used in practice to search, filter, extract, cut, and sort text, with a short example for each.




1. Grep: Searching Text Patterns

grep searches files for lines that match a pattern. It is especially helpful for digging through logs or finding which files contain a particular piece of information.



Basic Syntax

grep [options] PATTERN [file...]



Common Use Cases



1. Simple String Search

grep "error" logfile.txt

This command scans logfile.txt and prints every line that contains the string error.



2. Case-Insensitive Search

grep -i "warning" logfile.txt

The -i flag makes the search case-insensitive, so it matches warning, Warning, and WARNING alike.
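To see the difference the flag makes, here is a minimal sketch that counts matches with and without -i. The log file and its contents are made up for illustration, and the -c flag (count matching lines) is used only to keep the output short.

```shell
# Hypothetical sample log, created in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/logfile.txt" <<'EOF'
2024-01-01 Warning: disk usage at 91%
2024-01-01 info: backup finished
2024-01-02 WARNING: disk usage at 95%
2024-01-02 warning: fan speed low
EOF

# -c prints the number of matching lines instead of the lines themselves.
exact_count=$(grep -c "warning" "$workdir/logfile.txt")
nocase_count=$(grep -ci "warning" "$workdir/logfile.txt")

echo "case-sensitive matches:   $exact_count"
echo "case-insensitive matches: $nocase_count"
```

Only one line contains the exact lowercase spelling, while the case-insensitive search finds all three variants.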



3. Search with Regular Expressions

grep -E "ERROR|WARN" logfile.txt

As the title suggests, the -E flag enables extended regular expressions. Here the | operator means “or”, so the command prints every line that contains either ERROR or WARN.
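The alternation above can be tried on a small sample. This is a sketch under assumed data: the log layout (time, level, message) is hypothetical, and the second command is an extra illustration of anchoring and grouping that goes slightly beyond the original example.

```shell
# Hypothetical sample log: time, level, message.
workdir=$(mktemp -d)
cat > "$workdir/logfile.txt" <<'EOF'
10:00 INFO service started
10:05 WARN cache miss rate high
10:07 ERROR connection refused
10:09 INFO retry succeeded
EOF

# With -E, a plain | separates alternatives.
either=$(grep -E "ERROR|WARN" "$workdir/logfile.txt")
printf '%s\n' "$either"

# Groups and anchors also work: count lines whose level field is ERROR or WARN.
anchored_count=$(grep -Ec '^[0-9]{2}:[0-9]{2} (ERROR|WARN) ' "$workdir/logfile.txt")
echo "lines with ERROR or WARN level: $anchored_count"
```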



4. Search Recursively

grep -r "TODO" /path/to/project

This command searches for the keyword TODO in every file under the given directory, descending into subdirectories. A real-life use is finding every occurrence of a word across an entire source tree.
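A recursive search is easy to try on a throwaway directory tree. The sketch below builds a hypothetical project layout (the file names and contents are invented) and adds two commonly used flags: -n to show line numbers and -l to list only the names of matching files.

```shell
# Hypothetical project tree, created in a temporary directory.
project=$(mktemp -d)
mkdir -p "$project/src/util"
printf '%s\n' '# TODO: handle empty input' 'print("hi")' > "$project/src/main.py"
printf '%s\n' 'def helper():' '    pass  # TODO: implement' > "$project/src/util/helper.py"
printf '%s\n' 'Nothing pending here.' > "$project/README.txt"

# -r descends into subdirectories; -n prefixes each match with its line number.
grep -rn "TODO" "$project"

# -l prints only the names of files that contain a match.
todo_file_count=$(grep -rl "TODO" "$project" | grep -c .)
echo "files containing TODO: $todo_file_count"
```

The order in which files are listed depends on the file system, so scripts should not rely on it.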




2. Sed: Stream Editor for Modifying Text

sed (short for stream editor) is a non-interactive tool for editing text in a file or a stream. It lets you make small edits quickly without opening an editor, which makes it handy in scripts.



Basic Syntax

sed [options] 'COMMAND' [file...]



Common Use Cases



1. Replace Text

sed 's/foo/bar/g' file.txt

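The s/foo/bar/g command replaces every occurrence of “foo” with “bar” and prints the result to standard output; file.txt itself is left unchanged. Without the trailing g, only the first occurrence on each line is replaced.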
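The effect of the g flag is easiest to see side by side. This is a minimal sketch on a made-up two-line file; it also checks that the input file is not modified when -i is absent.

```shell
# Hypothetical input file.
workdir=$(mktemp -d)
printf '%s\n' 'foo and foo' 'no match here' > "$workdir/file.txt"

# Without g: only the first match on each line is replaced.
first_only=$(sed 's/foo/bar/' "$workdir/file.txt")

# With g: every match on each line is replaced.
all_matches=$(sed 's/foo/bar/g' "$workdir/file.txt")

printf 'first only:\n%s\n\nall matches:\n%s\n' "$first_only" "$all_matches"

# The file on disk still holds the original text.
unchanged=$(cat "$workdir/file.txt")
```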



2. Delete Specific Lines

sed '3d' file.txt

This command deletes the third line from file.txt.
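Deleting by line number uses sed's d command with an address in front of it. The sketch below assumes a hypothetical four-line file and shows a single line, a range of lines, and a pattern used as the address.

```shell
# Hypothetical input file with four lines.
workdir=$(mktemp -d)
printf '%s\n' 'one' 'two' 'three' 'four' > "$workdir/file.txt"

# Delete only line 3.
without_third=$(sed '3d' "$workdir/file.txt")

# Delete lines 2 through 3.
without_range=$(sed '2,3d' "$workdir/file.txt")

# Delete every line that starts with "t".
without_t=$(sed '/^t/d' "$workdir/file.txt")

printf '%s\n---\n%s\n---\n%s\n' "$without_third" "$without_range" "$without_t"
```

As with substitution, the result goes to standard output and the file itself stays as it was.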



3. Insert Text After a Pattern

sed '/pattern/aThis is new text' file.txt

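This command appends a new line containing “This is new text” after every line that matches “pattern”. The one-line form shown here is a GNU sed extension; other sed implementations expect a\ followed by the text on the next line.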
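For scripts that may run on systems without GNU sed, the append command can be written in the more portable two-line form. This is a sketch on an invented three-line file, not the only way to do it.

```shell
# Hypothetical input file.
workdir=$(mktemp -d)
printf '%s\n' 'first pattern line' 'plain line' 'second pattern line' > "$workdir/file.txt"

# Portable form: a\ ends the first line, the text to append follows on the next.
appended=$(sed '/pattern/a\
This is new text' "$workdir/file.txt")

printf '%s\n' "$appended"
```

The new text appears as its own line after each matching line, not inside the matching line.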



4. In-place File Editing

sed -i 's/old/new/g' file.txt

The -i option edits the file in place, so the changes are written directly to the original file instead of being printed to standard output.
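Because in-place editing overwrites the original, it is often worth keeping a backup. A minimal sketch, assuming a made-up file: attaching a suffix directly to -i (here .bak, an arbitrary choice) asks sed to save the original under that extra extension, and this spelling is accepted by both GNU sed and the BSD sed shipped with macOS.

```shell
# Hypothetical input file.
workdir=$(mktemp -d)
printf '%s\n' 'old value' 'another old entry' > "$workdir/file.txt"

# Edit in place and keep the original as file.txt.bak.
sed -i.bak 's/old/new/g' "$workdir/file.txt"

edited=$(cat "$workdir/file.txt")
backup=$(cat "$workdir/file.txt.bak")

printf 'edited:\n%s\n\nbackup:\n%s\n' "$edited" "$backup"
```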




3. Awk: A Pattern-Scanning and Processing Language

awk is a powerful programming language for pattern scanning and processing. It can filter or transform data from various sources based on conditions, and it is especially good at extracting fields from structured text.



Basic Syntax

awk 'PROGRAM' [file...]



Common Use Cases



1. Print Specific Columns

awk '{print $1, $3}' file.txt

This command prints the first and third columns of every line in file.txt. By default, awk splits each line into fields on whitespace.



2. Filter by Condition

awk '$3 > 100' data.txt

This command prints only the lines whose third column is greater than 100. When no action is given, awk prints the whole matching line.
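A condition and a print action can be combined in one awk program. The sketch below uses invented data (name, department, amount) to show the filter alone and the filter together with column selection.

```shell
# Hypothetical whitespace-separated data: name, department, amount.
workdir=$(mktemp -d)
cat > "$workdir/data.txt" <<'EOF'
alice sales 120
bob ops 80
carol sales 300
EOF

# Condition only: print the whole line when column 3 exceeds 100.
big_rows=$(awk '$3 > 100' "$workdir/data.txt")

# Condition plus action: print only columns 1 and 3 of those lines.
big_names=$(awk '$3 > 100 {print $1, $3}' "$workdir/data.txt")

printf '%s\n---\n%s\n' "$big_rows" "$big_names"
```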



3. Field Separator

awk -F, '{print $2}' data.csv

The -F option sets the field separator, here a comma (,), so this command prints the second column of a CSV file.
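CSV files usually start with a header row, which you often want to skip. Here is a sketch on a made-up data.csv that uses awk's built-in NR (the current line number) for that. It assumes simple CSV with no quoted fields; a comma inside quotes would be split like any other comma.

```shell
# Hypothetical CSV file with a header row.
workdir=$(mktemp -d)
cat > "$workdir/data.csv" <<'EOF'
name,city,score
alice,Paris,90
bob,Berlin,75
EOF

# NR > 1 skips the first line (the header); $2 is the second comma-separated field.
cities=$(awk -F, 'NR > 1 {print $2}' "$workdir/data.csv")

printf '%s\n' "$cities"
```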



4. Mathematical Operations

awk '{sum += $2} END {print sum}' data.txt

This script adds up the values in the second column of data.txt and prints the total once all lines have been read.
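The same accumulate-then-report pattern extends to other aggregates. A minimal sketch on invented data: the END block runs after the last line, and NR then holds the total number of lines read, so it can serve as the divisor for an average.

```shell
# Hypothetical data: label, value.
workdir=$(mktemp -d)
printf '%s\n' 'a 10' 'b 20' 'c 30' > "$workdir/data.txt"

# Sum of column 2.
total=$(awk '{sum += $2} END {print sum}' "$workdir/data.txt")

# Average of column 2, guarded against an empty file.
average=$(awk '{sum += $2} END {if (NR > 0) printf "%.2f\n", sum / NR}' "$workdir/data.txt")

echo "total: $total, average: $average"
```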




4. Cut: Extract Specific Sections of Text

The cut command extracts specific sections of each line, such as columns, fields, or character ranges.



Basic Syntax

cut [options] [file...]



Common Use Cases



1. Extract Specific Columns

cut -f1,3 file.txt

This command extracts the first and third fields of each line. By default, cut treats the tab character as the field delimiter.



2. Specify a Delimiter

cut -d',' -f2 file.csv

The -d option defines the delimiter, which lets you work with delimited text files such as CSVs. This command extracts the second column from a CSV file.
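Here is a small sketch of cut on a made-up CSV file, showing one field and then two fields at once. When several fields are selected, cut joins them with the same delimiter it split on. Like the awk example, this assumes no quoted fields with embedded commas.

```shell
# Hypothetical CSV file.
workdir=$(mktemp -d)
cat > "$workdir/file.csv" <<'EOF'
id,name,email
1,alice,alice@example.com
2,bob,bob@example.com
EOF

# Second field only.
names=$(cut -d',' -f2 "$workdir/file.csv")

# First and third fields; the output keeps the comma between them.
ids_and_emails=$(cut -d',' -f1,3 "$workdir/file.csv")

printf '%s\n---\n%s\n' "$names" "$ids_and_emails"
```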




5. Sort: Sort Lines in a File

sort orders lines alphabetically or numerically. It also supports more advanced options, such as sorting by a specific column, ignoring case, and more.



Basic Syntax

sort [options] [file...]



Common Use Cases



1. Sort Alphabetically

sort file.txt

With no options, sort orders the lines alphabetically.



2. Sort Numerically

sort -n data.txt

The -n option sorts the file based on numeric values rather than alphabetically.
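The difference between the two orderings shows up as soon as the numbers have different lengths. This sketch sorts a made-up list both ways; LC_ALL=C is set here only so that the character-by-character comparison behaves the same on every system.

```shell
# Hypothetical file with one number per line.
workdir=$(mktemp -d)
printf '%s\n' 10 9 100 2 > "$workdir/data.txt"

# Default: lines are compared as text, so "100" sorts before "2".
as_text=$(LC_ALL=C sort "$workdir/data.txt")

# -n: lines are compared by numeric value.
as_numbers=$(LC_ALL=C sort -n "$workdir/data.txt")

printf 'as text:\n%s\n\nas numbers:\n%s\n' "$as_text" "$as_numbers"
```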



3. Reverse Sorting

sort -r file.txt

The -r option reverses the sort order.



4. Sort by a Specific Column

sort -k 2 file.txt

The -k 2 option sorts the lines by the key that starts at the second field and runs to the end of the line. To sort by the second column alone, use -k 2,2.
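Key-based sorting is often combined with -n when the column holds numbers. A minimal sketch on invented name and value pairs: the n appended to the key makes that key numeric, and LC_ALL=C is again set only to keep the text comparison predictable.

```shell
# Hypothetical data: name, value.
workdir=$(mktemp -d)
printf '%s\n' 'carol 30' 'alice 4' 'bob 12' > "$workdir/file.txt"

# Second column compared as text: "12" sorts before "30", which sorts before "4".
by_text=$(LC_ALL=C sort -k2,2 "$workdir/file.txt")

# Second column compared as a number.
by_number=$(LC_ALL=C sort -k2,2n "$workdir/file.txt")

printf 'by text:\n%s\n\nby number:\n%s\n' "$by_text" "$by_number"
```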




Combining Tools for Powerful Text Processing

Combining these tools creates powerful workflows. For example, you can search, filter, and sort data in a single command.



Example: Search, Extract, and Sort

grep "error" logfile.txt | cut -d' ' -f1,4 | sort -u

This pipeline searches for lines containing “error” in logfile.txt, extracts the first and fourth fields, and then sorts the results uniquely.
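To make the pipeline concrete, the sketch below runs it against a made-up log. The field layout (date, time, level, host, message, separated by single spaces) is an assumption chosen so that fields 1 and 4 are the date and the host.

```shell
# Hypothetical log: date, time, level, host, message.
workdir=$(mktemp -d)
cat > "$workdir/logfile.txt" <<'EOF'
2024-01-01 10:00:01 error web01 timeout
2024-01-01 10:05:09 info web02 started
2024-01-01 11:12:44 error web01 timeout
2024-01-02 09:30:00 error db01 disk full
EOF

# Keep error lines, extract date and host, then drop duplicate pairs.
error_hosts=$(grep "error" "$workdir/logfile.txt" | cut -d' ' -f1,4 | sort -u)

printf '%s\n' "$error_hosts"
```

The two web01 errors on the same day collapse into a single line, so the result lists each date and host pair once.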



Example: Modify and Filter with Sed and Awk

sed 's/warning/WARNING/g' logfile.txt | awk '$3 == "ERROR" {print $1, $4}'

This command replaces “warning” with “WARNING” in logfile.txt and prints the first and fourth fields of lines where the third field is “ERROR”.




Conclusion

Learning tools like grep, sed, awk, cut, and sort makes data processing on Linux faster and more efficient. These tools are powerful for searching, filtering, and sorting data. That is it from my side. I hope you liked this short and simple way of learning with me. In the next article, we will discuss file permissions and ownership with specific commands.




By stp2y
