Nixnib0002.0
Replacing Reserved Characters with %s/old/new/g in Vim
nixnib 0002.0
As a learning exercise, I decided to needlessly complicate a simple process and see what it showed me.
I was writing this section from nixnib001 in markdown, using vim:
#### nib (n.)
"1580s, 'beak or bill of a bird,' Scottish variant of *neb*, 'beak or bill of a bird.' Perhaps influenced by *nibble* (v.). Meaning 'point' (of a pen or quill) is recorded by 1610s (*neb* in this sense is from 1590s)." (https://www.etymonline.com/word/nib)[etymonline]
As I was writing this (just converting straight-up text to markdown, closer to transcribing than writing), I forgot how to enter italics in markdown, and wrote forward-slashes (/) around every italicized word, as in /nibble/ and /neb/, as placeholders to be corrected later. When I remembered that asterisks (*) were the proper special character, I wanted to try to use the mysterious, important %s/old/new/g vim command to edit the entire file at once. I’d heard about it studying for the RHCSA, and that is how it is presented in the study guide for EX200 by Sander van Vugt.
I realized that forward-slashes are special characters here (they delimit the parts of the :s command), and this was going to be a regex nightmare. I got as far as :%s/\/*\//\*\\\*/g when my brain broke. I understood that I had to escape the forward slashes, which is what /\/*\/ is trying to do.
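For the curious, here is roughly what that first attempt would have done, sketched in Python rather than vim. The translation is my assumption: Python’s re module writes the same search without backslashes on the slashes, and its replacement-string rules are close to vim’s but not identical. The search reads as “zero or more slashes, then one more slash,” so it only ever matches the slashes themselves and never looks at the word in between.

```python
import re

# Rough Python translation of the vim search \/*\/ :
# "/*" is zero or more slashes, followed by one more "/".
first_attempt = re.compile(r"/*/")

# The vim replacement \*\\\* spells out: asterisk, backslash, asterisk.
replacement = r"*\\*"

result = first_attempt.sub(replacement, "/neb/")
print(result)  # *\*neb*\*
```

Each slash gets swapped for asterisk-backslash-asterisk on its own, which is not italics in anybody’s markdown.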
I did not know how to express that I needed a variable range of characters within the escaped slashes (e.g., a nibble as well as a neb) to properly place the asterisks around the italicized words. Naturally, I pulled up ChatGPT (henceforth: Chat).
It gave me the following vim regex command, and then some explanation:
:%s/\/\([^\/]\+\)\//\*\1\*/g
Explanation:
\/ → Escapes the first / so it is treated as a literal slash.
\([^\/]\+\) → Captures (\(\)) one or more (\+) characters that are not a slash ([^\/]).
\/ → Matches the closing slash.
\*\1\* → Replaces with *, then the captured text (\1), then another *.
g → Ensures all instances in a line are replaced.
I ran that command and it worked.
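If you want to poke at the same substitution outside of vim, here is a minimal sketch in Python. It assumes Python’s re syntax, which differs from vim’s default: groups are written ( ) instead of \( \), “one or more” is + instead of \+, and the slash needs no escaping because it is not a delimiter. The function name is made up for illustration.

```python
import re

# Python spelling of the vim search \/\([^\/]\+\)\/
SLASH_ITALICS = re.compile(r"/([^/]+)/")

def slashes_to_asterisks(line: str) -> str:
    """Swap /word/ placeholders for markdown *word* italics."""
    # \1 is the text captured between the two slashes.
    return SLASH_ITALICS.sub(r"*\1*", line)

sample = "Scottish variant of /neb/. Perhaps influenced by /nibble/ (v.)."
converted = slashes_to_asterisks(sample)
print(converted)
# Scottish variant of *neb*. Perhaps influenced by *nibble* (v.).
```

Like the /g flag in vim, re.sub replaces every match on the line unless told otherwise.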
So, TL;DR: if you’re in this exact situation (lol), run this to replace all the / around italicized words in vim with *:
:%s/\/\([^\/]\+\)\//\*\1\*/g
I hope that saved you and planet earth a Chat processing token.
Trying to understand what it is my computer just did, I pulled up The Linux Bible by Christopher Negus. The exalted text reveals that the :s and :g commands in vim are actually, basically, pre-historical. Ultimately, they actually come from an editor named ex. They appear sed-like (to me) because both ex and sed are children of ed, the OG Unix editor.
After the : that opens a vim command line, s/ is the actual command being run here: a substitute, or find-and-replace. The /g flag at the close of the line tells the command to substitute every instance of “old” on a line with “new,” rather than stop at the first instance on that line. The default behavior for :s is to change only the first match, and only on the current line (the line the cursor is on).
Adding /g covers every instance of “old” on that line, but the command still only targets the current line. To include every line in the file as a potential target, we add the % before s. That is a range meaning every line in the file, so the substitution is attempted on each line, regardless of whether the line prior did or did not include an instance of “old.”
Interestingly, The Linux Bible offers the command as :g/Local/s//Remote/g rather than :%s/old/new/g. It eliminates the % by running s as a global command upon a given pattern, with :g/pattern/. Rather than running the s command against every line, then, the :g command will only run s against the lines that are found to match the given pattern. The empty search in s// reuses the pattern that :g just matched.
When both forms search for the same pattern, the end result is the same, since :%s only changes lines where the pattern matches anyway. The difference shows up when the :g pattern differs from the pattern being substituted: :g picks which lines are eligible, and s decides what changes on them. This method can also protect from unintended replacements, according to Chat, but its examples didn’t prove that very well to me. I like both, and I like more knowing the difference. Please send me any examples or explanations you have that can further explain that difference.
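Here is one way the “unintended replacements” point might play out, sketched in Python. Everything in it is hypothetical: the three sample lines, the choice of “neb” as the line filter, and the function name. The first call models something like :%s/\/\([^\/]\+\)\//\*\1\*/g, and the second models putting :g/neb/ in front of the same substitution.

```python
import re

PATTERN = r"/([^/]+)/"
REPLACEMENT = r"*\1*"

def global_substitute(lines, line_filter, pattern, repl):
    """Loose model of :g/line_filter/s/pattern/repl/g."""
    return [
        re.sub(pattern, repl, line) if re.search(line_filter, line) else line
        for line in lines
    ]

lines = [
    "variant of /neb/, influenced by /nibble/",
    "path: /usr/local/bin",
    "see /neb/ again",
]

# Model of the % range: try the substitution on every line.
every_line = [re.sub(PATTERN, REPLACEMENT, line) for line in lines]

# Model of :g, where only lines mentioning "neb" are eligible.
filtered = global_substitute(lines, r"neb", PATTERN, REPLACEMENT)

print(every_line[1])  # path: *usr*local/bin
print(filtered[1])    # path: /usr/local/bin
```

The file path gets mangled when every line is fair game, and survives when the line filter keeps the substitution away from it.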
Back to the wild regex. My goal is to understand. Next chunk up, with :%s/.../g covered, is \([^\/]\+\)\/.
I guess regex phrases are best understood like meter in poetry, where minute sub-units dictate cardinal meaning-making boundaries, and these boundaries must be mapped to extract the phrases. What I mean is, in formal poetry, there’s a significant difference in interpretation when \ _ \ _ (stress-unstress-stress-unstress) is read as, simply, two trochees, versus as a headless iamb followed by another iamb with an extra unstressed syllable. As in poetry, so in regex. The difference between \/\/\ and \/\ is ginormous.
s/ to begin the substitution command
& \/ to escape the opening forward-slash as a literal character
& \( open the sub-pattern capture group
& [^\/] should be understood as [^...], a negated character class (another sub-unit, though not a capture group). Inside the brackets, a leading ^ signals NOT, the anti-pattern of what is contained in the brackets. In this case, we have an escaped forward-slash. So we are matching any single character that is not a slash.
Important: Within a [^...] construction, most special characters are understood as literals and do not need to be escaped. The only ones that need to be escaped are those that function specially within the [^...], like hyphens. That is why, in this case, the / inside the brackets doesn’t strictly require its \ escape, if you were wondering. Leaving the escape in won’t cause an error.
& \+ a quantifier, to match one or more of the anti-pattern previous.
& \) to close the sub-pattern capture group
& \/ to escape the closing forward-slash as a literal character. This seals the capture group as the content of my errant placeholder forward-slashes.
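The bracket rules in the list above are easier to believe with a few small experiments. This sketch uses Python’s re module, whose bracket rules are similar in spirit to vim’s but not guaranteed identical, so treat it as an illustration of the idea, not as a statement about vim. The sample strings are made up.

```python
import re

# Inside brackets, ".", "*", and "/" are ordinary characters.
literal_members = re.findall(r"[.*/]", "a.b*c/d")

# An unescaped hyphen between two characters makes a range...
as_range = re.findall(r"[a-c]", "abc-")
# ...while an escaped hyphen is just a hyphen.
as_literal = re.findall(r"[a\-c]", "abc-")

# A leading ^ negates the class, and + asks for one or more of it.
non_slash_runs = re.findall(r"[^/]+", "/neb/ and /nibble/")

print(literal_members)  # ['.', '*', '/']
print(as_range)         # ['a', 'b', 'c']
print(as_literal)       # ['a', 'c', '-']
print(non_slash_runs)   # ['neb', ' and ', 'nibble']
```

The last line shows why the surrounding slashes matter in the real command: on its own, “one or more non-slashes” also grabs the text between the placeholders.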
The parenthetical regex structure creates a [capturing group](https://www.regular-expressions.info/brackets.html). They’re kind of like how, with a bash script, every argument that follows the command is given a variable name of $1, $2, … etc., following the cmd itself at $0. A capturing group in regex can be used to grab the part of a string that matches the sub-pattern declared within parentheses, and then refer to that submatch with a number (as in, \1).
So in the above, \( is the beginning of our capture group, and \) is the end. We should be able to re-use content captured by the submatch.
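To see what a capture group actually holds, here is a small Python sketch. The group(0) and group(1) calls are Python’s way of asking for the whole match and the first captured submatch, which loosely mirrors the $0 and $1 analogy above. The sample sentence is just for illustration.

```python
import re

text = "influenced by /nibble/ (v.)"

# Same idea as the vim search, in Python's syntax: slash, group, slash.
match = re.search(r"/([^/]+)/", text)

whole = match.group(0)     # everything the pattern matched
captured = match.group(1)  # only what the parentheses enclosed

print(whole)     # /nibble/
print(captured)  # nibble
```

The slashes are part of the match but sit outside the parentheses, so they are left behind when only the captured text is reused.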
After the \/ that follows completes the search portion of our expression, a standalone / opens the replacement portion of the substitution command.
Next, we have: \*\1\*/. This part is far simpler.
\* is an escaped asterisk character, which signals italics in markdown, opening the replacement string
& \1 refers to our captured sub-match above. On a given line that matches the search for an enclosing \/...\/, the text matched by the capture group \([^\/]\+\) is deployed in the replacement string where that \1 resides. In our case, after the opening asterisk.
& \* is simply an escaped closing asterisk
& / is the close of our replacement string, a structure of the larger :%s/.../.../ pattern.
Finally, we have the trailing flag g, which has our search & substitute operate on all instances within a line, rather than only the first. That’s old hat by this point, if you’re still reading.
So, with all of that broken down, our final expression is:
:%s/\/\([^\/]\+\)\//\*\1\*/g
That makes a hell of a lot more sense to me than it did before. We’ve created nothing; we’ve built nothing, but I’m very glad to have done this. Maybe by nixnib1000.0, I’ll be able to read regex like nothing. That’s a cute goal.