Questions tagged [text-processing]
Manipulating or examining text with programs, scripts, etc.
8,530 questions
3
votes
2
answers
97
views
jq - combine JSON objects with their arrays without duplicating or removing existing
I'm trying to combine the arrays of two JSON files. I need to achieve this by combining the arrays without removing existing values of arrays and without removing any object. Sorting output arrays ...
3
votes
4
answers
339
views
How to count the number of multi-valued attributes in an LDAP entry... or lines in a paragraph, where diff paragraphs have different numbers of lines?
I'm trying to figure out a way of counting how many attribute values (for a multi-valued attribute in LDAP) various different users have. For example, the data looks something like this...
dn: uid=...
13
votes
1
answer
797
views
Why is "cmd | xargs" to join lines of cmd with spaces wrong
We often (1, 2, 3, 4...) see code such as:
cmd | xargs
or even:
cmd2 $(cmd | xargs)
In an attempt to join the lines of cmd with spaces (and in the second case so as to construct a command line for ...
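For the plain joining case, one alternative can be sketched as follows. `paste -s` joins input lines with a chosen delimiter and does not interpret the data, whereas `xargs` applies its own quote and backslash processing and splits on blanks. The helper name `join_lines` is made up for illustration and is not from the question.

```shell
# join_lines: join the lines of standard input with single spaces.
# Hypothetical helper name. paste -s leaves quotes, backslashes and
# runs of blanks inside each line untouched.
join_lines() {
  paste -sd ' ' -
}

# Usage: cmd | join_lines
```

This covers only the joining itself; passing the result to another command as separate arguments is a different problem.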
0
votes
4
answers
132
views
GREP Date Format from String
I'm trying to write up a script to interact with the log files of Postgres. I have a query to get the filename of the log files.
sudo -u postgres psql -tc "SELECT setting FROM pg_settings WHERE ...
1
vote
3
answers
169
views
find awk grep - search and replace & passing modified contents to awk to overwrite the existing file
I have a folder with many subfolders full of various Quarto® files & in those files there are links that are located in varying positions in the file lines.
UPDATE ON 3 November 2025 in ...
7
votes
9
answers
745
views
How to get every lines between nth and (n+1)th match of grep in text file
I've got a text file containing e.g.
Success
Something
Anything
Success
Somebody
Anybody
Someone
Success
(line 8 is deliberately an empty line) and I would like to export every line between the nth ...
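A minimal awk sketch of the idea in the title, assuming the marker lines themselves should not be printed and that the pattern is an awk extended regular expression. The helper name `between_matches` and its argument order are assumptions made for illustration.

```shell
# between_matches N PATTERN: print the lines of standard input that lie
# after the Nth line matching PATTERN and before the next matching line.
# Hypothetical helper; PATTERN is an awk extended regular expression.
between_matches() {
  awk -v n="$1" -v pat="$2" '
    $0 ~ pat { count++; next }   # matching lines act only as separators
    count == n                   # print lines inside the requested block
  '
}

# Usage: between_matches 2 Success < file.txt
```

If there is no (n+1)th match, this sketch prints everything after the nth match up to the end of the input.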
4
votes
4
answers
495
views
How can I find common prefixes in file names to group them?
I would like to be able to find all files in multiple directories whose file names start with the same string, but preferably not if that string is only one word or contains fewer than perhaps 5 ...
3
votes
2
answers
202
views
Embedded special characters skewing sed output
The Issue
I've been parsing a file with sed, trying to tweeze out the desired data. This has worked fine for most lines in the file, but there appear to be some embedded special characters that are ...
4
votes
4
answers
514
views
Remove new lines and everything after comment symbol with awk or sed
How to remove comments and newline symbols without using two pipes.
I have a bookmarks.txt file with comments.
https://cookies.com # recipes cookbook
https://magicwands.com # shopping
I can copy link ...
5
votes
5
answers
411
views
Compare files and combine rows with matching values based on last column
I'm working with several files that come in bundles of four; across groups, the bundles have the same number of columns. See below for an example showing the first four rows with header:
File1 has ...
2
votes
1
answer
148
views
Tmux pane with long-running session using wrong character set?
Today I connected to a long-running process in tmux over ssh for work, to find that the pane the process was running in seems to have started using the wrong character encoding for its output, leading ...
3
votes
1
answer
392
views
How to do non-greedy multiline capture with recent versions of pcre2grep?
I noticed a difference in behavior between an older pcre2grep version (10.22) and a more recent one (10.42), and I am wondering how I can get the old behavior back.
Take the following file:
aaa
bbb
...
2
votes
1
answer
113
views
Redirect `rtf` output to file
System Info
alinuxchap@libertus-desktop:/usr/share/X11/xkb $ uname -a
Linux libertus-desktop 6.12.25+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
alinuxchap@...
6
votes
7
answers
1k
views
How to find numbers in a textfile that are not divisible by 4096, round them up and write new file?
In Linux there is a file numbers.txt.
It contains a few numbers, all separated with space.
There are numbers like this: 5476089856 71788143 9999744134 114731731 3179237376
In this example only the ...
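The rounding step can be sketched in awk. This assumes the output should keep every number and only replace those that are not multiples of 4096 with the next multiple. The question is truncated, so that reading is an assumption, and the helper name `round_up_4096` is hypothetical.

```shell
# round_up_4096: read space-separated numbers on standard input and
# round each one that is not a multiple of 4096 up to the next multiple.
# Hypothetical helper; numbers that are already multiples pass through.
round_up_4096() {
  awk '{
    for (i = 1; i <= NF; i++) {
      r = $i % 4096
      # %.0f avoids exponent notation for large values in some awks
      if (r != 0) $i = sprintf("%.0f", $i + 4096 - r)
    }
    print
  }'
}

# Usage: round_up_4096 < numbers.txt > rounded.txt
```

With the first two sample values, 5476089856 is already a multiple of 4096 and stays as it is, while 71788143 becomes 71790592.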
3
votes
5
answers
719
views
Randomly pick single line from multiple lines while assigning value to environment variable
In a certain script that we run routinely, we configure hostnames in environment variables. Since hostnames can change over time, we try to dynamically pick the current set of hosts using Linux's ...
1
vote
7
answers
369
views
Extracting paragraphs with awk
What is the correct way to extract paragraphs in this log file using awk:
$ cat log.txt
par1, line1
par1, line2
par1, line3
par1, line4
par1, line5
par1, last line
par2, line1
...
2
votes
5
answers
156
views
formatting git log messages for later processing
I am trying to format and connect git log messages for later processing.
I am using git log --pretty=format:'%H %s' to get commit hash and the complete message at the moment.
I need commit messages to ...
2
votes
3
answers
238
views
How to extract specific fields from systemctl output for a custom report
I would like to build a report coming from the output of certain commands.
For instance, I have the output of such command:
systemctl --type=service --state=running |
grep -e cron -e apache2 -e ...
1
vote
3
answers
152
views
edit all the values in a specific column based on row numbers range
I have a PDB file (coordinates of atoms in a protein) on a Linux machine:
ATOM 1 N GLY A 1 0.535 51.766 5.682 1.00 0.00
ATOM 2 CA GLY A 1 -0.712 50....
0
votes
5
answers
148
views
Match multiple vars across two lines and delete entire entry
MATCH1.MATCH2 {
always same MATCH3
}
All three MATCH(es) must match.
input:
foo.bar {
always same bus
}
1.2 {
always same 3
}
a.b {
always same c
}
i.ii {
always same iii
}
b.2 {
...
5
votes
6
answers
1k
views
Remove the first field (and leading spaces) with a single AWK
Consider this input and output:
foo bar baz
bar baz
How do you achieve this with a single AWK command? Please explain your approach too.
Here are a couple of tries:
$ awk '{ $1 = ""; print(substr($0, 2)) ...
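One possible approach, as a sketch rather than the asker's own attempt (which is truncated above): delete the leading blanks, the first field, and the blanks after it with a single `sub()`, so the spacing in the rest of the line is preserved. The helper name `drop_first_field` is hypothetical, and fields are assumed to be separated by spaces or tabs.

```shell
# drop_first_field: remove leading blanks, the first field, and the
# blanks that follow it from every line of standard input.
# Hypothetical helper; fields are assumed to be space/tab separated.
drop_first_field() {
  awk '{ sub(/^[ \t]*[^ \t]+[ \t]*/, ""); print }'
}

# Usage: printf '  foo bar baz\n' | drop_first_field
```

Because `$0` is edited directly and no field is reassigned, awk does not rebuild the record, so runs of blanks between the remaining fields are left alone.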
1
vote
5
answers
512
views
How to remove every first duplicate line in a column from mac terminal?
A huge txt file with 360k lines. The lines that need to be deleted are duplicated in both column 1 (id) and column 2 (nick), but differ in column 3 (category). There are only 2 lines for all duplicates in ...
0
votes
2
answers
148
views
List and count ciphers used by cryptsetup in /dev/mapper devices
On my Linux computer there are many files called file1, file2, file3 ... in /dev/mapper/.
Now I want an overview of how often each cipher is used across these files.
I tried this
for i in /dev/...
9
votes
6
answers
751
views
How to display duplicate lines with different first field
Regarding this information below:
807:Lipstick:Cosmetics:50:250
808:MixerGrinder:Electronics:10:35000
809:MixerGrinder:Electronics:10:35000
I am expecting to display this information below:
808:...
1
vote
3
answers
123
views
Extracting "devname" from log message with re_extract
Can anyone help? I've exhausted my knowledge and troubleshooting skills trying to get this working.
Here is the example data from "msg":
date=2025-03-26 time=12:45:57 devname="this-is-...
1
vote
2
answers
145
views
Filter for arbitrary AND patterns [duplicate]
Consider a command which takes arguments like this: cmd foo bar baz [arbitrary args...]. How do you build a filter of AND patterns based on those arguments?
Something like this pipeline of greps:
grep ...
0
votes
1
answer
203
views
Use sed to replace only part of a string
I'm trying to replace bobearl with jim in the following string
"billy" "bobearl" and "johnny"
I can do something like this:
sed 's/bob/jim/' /tmp/text.txt
"billy...
2
votes
5
answers
729
views
How to display and count vowels in file
I have a file with a name list as shown below:
Ishmael
Mark
Anton
Rajesh
Pete
I am trying to print something like this:
Iae 3
a 1
Ao 2
ae 2
ee 2
I developed this code:
cat names.txt | grep -Eo '...
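A short awk sketch that produces the output format shown above, assuming only the ASCII vowels a, e, i, o, u (in either case) count. The helper name `vowels_per_line` is made up for illustration.

```shell
# vowels_per_line: for each input line, print the vowels it contains
# (in order) followed by how many there are.
# Hypothetical helper; only ASCII vowels are considered.
vowels_per_line() {
  awk '{
    v = $0
    gsub(/[^AEIOUaeiou]/, "", v)   # keep only the vowels
    print v, length(v)
  }'
}

# Usage: vowels_per_line < names.txt
```

For the five sample names this prints the five lines listed in the question, from `Iae 3` to `ee 2`.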
0
votes
0
answers
118
views
Advanced CLI tool/code to determine text encoding (besides enca)
Looking for advanced CLI tool/code to determine text Codepage/Language (besides enca).
Goal: Automate as much as possible conversion of hundreds/thousands of 8-bit text files (including non-ASCII ...
0
votes
2
answers
125
views
On Ubuntu 20 server, I must replace all occurances of the color #640000 with #06172A
On Ubuntu 20 server, I have to replace all occurrences of the color #640000 with #06172A. I have tried the following commands to replace
Go to folder where the relevant files reside:
$ cd /path/to/the/...
9
votes
5
answers
2k
views
Run command on each line of CSV file, using fields in different places of the command
I have a CSV file and want to run a command for each line, using the fields of the file as separate arguments.
For example given the following file:
foo,42,red
bar,13,blue
baz,27,green
I want to run ...
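A minimal shell sketch of the per-line loop, assuming simple CSV with exactly three fields and no quoted or embedded commas. The command the asker wants to run is truncated above, so `printf` stands in as a placeholder; the helper name `per_row` and the variable names `f1`, `f2`, `f3` are hypothetical.

```shell
# per_row: read three comma-separated fields per line from standard
# input and run a command with them placed wherever it needs them.
# Hypothetical helper; printf is only a placeholder command.
per_row() {
  while IFS=, read -r f1 f2 f3; do
    printf 'first=%s second=%s third=%s\n' "$f1" "$f2" "$f3"
  done
}

# Usage: per_row < file.csv
```

If the real command reads standard input itself, it would consume the rest of the CSV; redirecting its input from `/dev/null` inside the loop avoids that.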
4
votes
3
answers
268
views
Add columns from variable number of files to base file
I'm dealing with a series of bed files, which look like this:
chr1 100 110 0.5
chr1 150 175 0.2
chr1 200 300 1.5
With the columns being chromosome, start, end, score. I have multiple different files ...
-4
votes
5
answers
216
views
Command to display all the employees whose first name have more than 6 characters
From the script below I need to know the following:
EmpNo#Email#Name#JobLevel#Experience
641357#Amrit_Mohanty#Amrit Mohanty#3#2
678522#Puneet_Mishra#Puneet Mishra#3#1
670242#Vikas_Bharti#Vikas Bharti#...
1
vote
3
answers
196
views
Loop ip list through geoiplookup and delete lines that do not match criteria
Thanks in advance for any ideas you present.
My current project has me trying to loop a file containing a list of 1000's of IP addresses through geoiplookup and piping it to sed to delete all lines ...
3
votes
5
answers
749
views
removing braces statements containing nested braces inside
A typical latex problem:
\SomeStyle{\otherstyle{this is the \textit{nested part} some more text...}}
Now I want to remove all \SomeStyle{...} but not the content. Content contains nested braces. The ...
6
votes
2
answers
741
views
How can I extract quoted strings within a variable?
I acknowledge there are superficially similar questions asked here before, but all of those I've seen are simpler than what I'm trying to achieve. Bash-only solutions are preferred.
I have a variable ...
2
votes
2
answers
1k
views
Why is the file changing before being written to?
On Kubuntu Linux, the Google Chrome browser adds a checksum to the file, which prevents simply editing the file by hand. So I'm writing a script to add the checksum.
$ cat .config/google-chrome/Default/...
-2
votes
3
answers
235
views
Bash script to uncomment lines with leading spaces on a file with specific pattern
I am trying to uncomment specific lines matching patterns in a file on Oracle Linux 8.6 using bash. There are leading white spaces on certain lines where the comments are not removed. I tried to uncomment the ...
-2
votes
3
answers
226
views
How to replace two lines containing [tab] chars into one line with just [newline] char, using a bash script?
In a directory I have a bunch of text files. Some of the files contain double lines with a [tab] char only. I want to find and change these two "tabbed lines" into one line with a new line ...
1
vote
2
answers
399
views
Extracting table of contents from PDFs
I have a reasonably large personal library with books in various formats. I have tried to organize their metadata, including a text field containing the tables of contents. At the moment I am using ...
6
votes
2
answers
411
views
Update object inside array inside another JSON object
I have a huge JSON object with an array of objects inside it. I have to add a key:value pair to a specific object in the array. For example, let the input object be:
{
"a": {
"b...
-5
votes
2
answers
116
views
How to count the no of occurrences of a particular string in a latest log file to read last 5 min data in linux [closed]
I want to capture an error code (e.g. 502) from a log file.
The log file is rolled over when it reaches 100 MB, e.g. access.log_126427, access.log_197455, etc. There is no specific pattern of the ...
0
votes
1
answer
280
views
Find all files in directory and apply commands to each of them
I want to apply commands below to all files in a directory instead of one file.
cat file.txt | sed -E "s/\@([0-9]+)\W+~(.*?)/\1 \2/g" | tr -d '~'
cat file.txt | sed -E "s/\@([0-9]+).*\~...
1
vote
3
answers
138
views
How do I merge bottom line with previous line? [duplicate]
I have a pretty basic file;
15
Chapter name
some text and some more text
some text and some more text
I was trying to get something like this
Book: 15 Chapter name
some text and some more text
some ...
0
votes
2
answers
128
views
BSD sed/awk moving portion of line to line above (switching attribute in HTML file)
My situation is simple: I have an HTML file with several lines containing only the indented <section> block tag, each line followed by an (also indented) <h3 id="YYYY">...</...
0
votes
2
answers
238
views
How to insert text before the first line of an UTF-8 with BOM file
This question is closely related to: How to insert text before the first line of a file?. I deliberately made the title similar to that question to highlight this.
Except the target file is UTF-8 with ...
1
vote
1
answer
103
views
Delete lines containing partial string match
I have 2 files
file1
00:00:00:00:00:01
file2
00:00:00:00:00:02 foo bar
00:00:00:00:00:01 something else
What I want to do is compare the two files and remove 00:00:00:00:00:01 from file 2 so I end ...
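A common two-file awk idiom fits this, sketched here under the assumption that whole lines of file2 should be dropped when their first field appears in file1. The helper name `drop_listed` is hypothetical.

```shell
# drop_listed LIST DATA: print the lines of DATA whose first field does
# not appear as the first field of any line in LIST.
# Hypothetical helper; assumes LIST is not empty.
drop_listed() {
  awk '
    NR == FNR { drop[$1]; next }   # first file: remember keys to drop
    !($1 in drop)                  # second file: keep unlisted lines
  ' "$1" "$2"
}

# Usage: drop_listed file1 file2 > file2.new
```

The `NR == FNR` test identifies the first file only while it has at least one line, which is why the sketch assumes a non-empty list.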
2
votes
3
answers
129
views
Printing a specific section everytime search results are matched
I have a pretty basic text file on a Linux machine that has stuff like Chapters, Dialogues and References.
This is what it looks like
Chapter: 1 One: Birds and Trees
Birds are beautiful and trees ...
1
vote
8
answers
258
views
linux shell script to remove 1 char in a particular field in file having lines of around 3000
My input file:
1oo+457864227yexaloo+6784536pkp8907654
2oo+499004227yexaloo+69008908pkp8907654
3oo+648968976yexaloo+53589094pkp8907654
4oo+490764578yexaloo+6784536pkp8907654
I want to find out the ...
2
votes
7
answers
867
views
Shell Script to Normalize the data
We have a requirement to normalize the data ... The item field is comma-delimited and irregular, and it may hold anywhere from 0 to some maximum number of items (let's say 100)
Input:
key1|desc field|item1,item2,item3,item4|extra ...