Hung-Yi’s Journal

Front-end Developer, Emacs Adventurer, Home Cook

24 Feb 2021

The Programmer's Way to Write in Title Case Using Emacs Lisp

It turns out that writing in proper title case can be important for a front-end developer, since there are often things on screen that need to appear as a title to convey importance: dialog headers, menu items, buttons labels, to name a few.

The problem is I had always dismissed it as something for school, academics or "real" writers, so I never bothered to learn the rules properly. This meant many distracting trips to https://titlecase.com/. Realising this, I used Emacs to make a quick way to convert something to title case without relying on my brain. This post is my journey on how I wrote that thing.

What Is Title Casing?

If you're like me and you didn't pay attention in high school English (oops) then here's a quick refresher. Generally, a correctly title cased phrase will:

  • Uppercase the first letter of most words

    e.g. "There Is No Spoon"

  • Not capitalize 'small' and 'unimportant' words

    e.g. "Long Live the King"

  • Always capitalize the first and the last words, even if they're small

    e.g. "The Land and Sea We Live On"

  • Capitalize sub-phrases as if they were their own title

    e.g. "The Lonely Reindeer: A Christmas Story"

There are many intricacies and edge cases that can get complex, so I won't go over them in detail. On a website or in an app, it just needs to look close enough, not necessarily fully conform to AP or Chicago style.1

A Quick Win

As an avid Emacs user, my first thought was to find a package on ELPA or MELPA that might do this for me. The closest, most robust solution I found was this page on EmacsWiki, which was an elisp wrapper around a Perl script written by John Gruber and Aristotle Pagaltzis, originating from a blog post way back in 2008. Unlike a lot of hacked-together titlecase solutions, this one actually came with a set of difficult looking unit tests!

To be honest, this worked really well for me. I brought the Perl script and titlecase.el into my config and used it happily for several months. I would totally recommend this route if it works for you.

But having to carry around a Perl script and making Emacs do a process call out just kept nagging at me. What I really wanted was a native Emacs Lisp solution to keep my config pure and less reliant on external dependencies.

Rolling My Own Solution and Adding More Test Cases

Instead of an unholy mess of regular expressions, I decided to use a state machine and reduce each character in the input string to convert it to title case. I wrote it in a completely pure way, as an exercise, even though Emacs Lisp isn't normally conducive to that sort of thing.

I also added a bunch more test cases for the "start of a new sentence" behavior:

  • Semicolons (;)

  • Question marks (?)

  • Periods (.)

  • New lines (\n)

And better handling for:

  • Windows file system paths (e.g. C:\Windows\System32)

  • Relative unix file system paths (e.g. ~/.config/emacs and ./documents/tax)

Here's the title casing code, in fully portable Emacs Lisp with no external dependencies:

(require 'cl-lib)
(require 'subr-x)

(defun titlecase-string (str)
  "Convert string STR to title case and return the resulting string."
  (let* ((case-fold-search nil)
         (str-length (length str))
         ;; A list of markers that indicate start of a new phrase within the title, e.g. "The Lonely Reindeer: A Christmas Story"
         (new-phrase-chars '(?: ?. ?? ?\; ?\n ?\r)) ; must be followed by one of  word-boundary-chars
         (immediate-new-phrase-chars '(?\n ?\r))    ; immediately triggers new phrase behavior without waiting for word boundary
         ;; A list of characters that indicate "word boundaries"; used to split the title into processable segments
         (word-boundary-chars (append '(?  ?– ?— ?- ?‑ ?/) immediate-new-phrase-chars))
         ;; A list of small words that should not be capitalized (in the right conditions)
         (small-words (split-string "a an and as at but by en for if in of on or the to v v. vs vs. via" " "))
         ;; Fix if str is ALL CAPS
         (str (if (string-match-p "[a-z]" str) str (downcase str)))
         ;; Reduce over a state machine to do title casing
         (final-state (cl-reduce
                       (lambda (state char)
                         (let* ((result               (aref state 0))
                                (last-segment         (aref state 1))
                                (first-word-p         (aref state 2))
                                (was-in-path-p        (aref state 3))
                                (last-char            (car last-segment))
                                (in-path-p            (or (and (eq char ?/)
                                                               (or (not last-segment) (member last-char '(?. ?~))))
                                                          (and was-in-path-p
                                                               (not (or (eq char ? )
                                                                        (member char immediate-new-phrase-chars))))))
                                (end-p                (eq (+ (length result) (length last-segment) 1)
                                                          str-length))                                          ; are we at the end of the input string?
                                (pop-p                (or end-p (and (not in-path-p)
                                                                     (member char word-boundary-chars))))       ; do we need to pop a segment onto the output result?
                                (segment              (cons char last-segment))                                 ; add the current char to the current segment
                                (segment-string       (apply #'string (reverse segment)))                       ; the readable version of the segment
                                (small-word-p         (member (downcase (substring segment-string 0 -1))
                                                              small-words))                                     ; was the last segment a small word?
                                (capitalize-p         (or end-p first-word-p (not small-word-p)))               ; do we need to capitalized this segment or lowercase it?
                                (ignore-segment-p     (or (string-match-p "[a-zA-Z].*[A-Z]" segment-string)     ; ignore explicitly capitalized segments
                                                          (string-match-p "^https?:" segment-string)            ; ignore URLs
                                                          (string-match-p "\\w\\.\\w" segment-string)           ; ignore hostnames and namespaces.like.this
                                                          (string-match-p "^[A-Za-z]:\\\\" segment-string)      ; ignore windows filesystem paths
                                                          was-in-path-p                                         ; ignore unix filesystem paths
                                                          (member ?@ segment)))                                 ; ignore email addresses and user handles with @ symbol
                                (next-result          (if pop-p
                                                          (concat
                                                           result
                                                           (if ignore-segment-p
                                                               segment-string                                   ; pop segment onto the result without processing
                                                             (titlecase--segment segment-string capitalize-p))) ; titlecase the segment before popping onto result
                                                        result))
                                (next-segment         (unless pop-p segment))
                                (will-be-first-word-p (if pop-p
                                                          (or (not last-segment)
                                                              (member last-char new-phrase-chars)
                                                              (member char immediate-new-phrase-chars))
                                                        first-word-p)))
                           (vector next-result next-segment will-be-first-word-p in-path-p)))
                       str
                       :initial-value
                       (vector nil      ; result stack
                               nil      ; current working segment
                               t        ; is it the first word of a phrase?
                               nil))))  ; are we inside of a filesystem path?
    (aref final-state 0)))

(defun titlecase--segment (segment capitalize-p)
  "Convert a title's inner SEGMENT to capitalized or lower case depending on CAPITALIZE-P, then return the result."
  (let* ((case-fold-search nil)
         (ignore-chars '(?' ?\" ?\( ?\[ ?‘ ?“ ?’ ?” ?_))
         (final-state (cl-reduce
                       (lambda (state char)
                         (let ((result (aref state 0))
                               (downcase-p (aref state 1)))
                           (cond
                            (downcase-p                 (vector (cons (downcase char) result) t))  ; already upcased start of segment, so lowercase the rest
                            ((member char ignore-chars) (vector (cons char result) downcase-p))    ; check if start char of segment needs to be ignored
                            (t                          (vector (cons (upcase char) result) t))))) ; haven't upcased yet, and we can, so do it
                       segment
                       :initial-value (vector nil (not capitalize-p)))))
    (thread-last (aref final-state 0)
      (reverse)
      (apply #'string))))

(defun titlecase-region (begin end)
  "Convert text in region from BEGIN to END to title case."
  (interactive "*r")
  (let ((pt (point)))
    (insert (titlecase-string (delete-and-extract-region begin end)))
    (goto-char pt)))

(defun titlecase-dwim ()
  "Convert the region or current line to title case.
If Transient Mark Mode is on and there is an active region, convert
the region to title case.  Otherwise, work on the current line."
  (interactive)
  (if (and transient-mark-mode mark-active)
      (titlecase-region (region-beginning) (region-end))
    (titlecase-region (point-at-bol) (point-at-eol))))

Bonus Evil Operator

If you use evil, then this might be useful to you. I bound it to g ` for no other reason than it was available in Doom, and I remember that it's close to the ~ key in vim that inverts a character's case.

(after! evil
  (map! :nv "g`" (evil-define-operator my/evil-titlecase-operator (beg end)
                   (interactive "<r>")
                   (save-excursion
                     (set-mark beg)
                     (goto-char end)
                     (titlecase-dwim)))))

With this operator, you can g ` ` to make the entire line title case, or use any motion, like g ` i t to run title casing inside of a HTML tag. Isn't that cool?

Test Cases

If you want to experiment with writing your own implementation, here are the test cases that I used:

Input Output
the quick brown fox jumps over the lazy dog The Quick Brown Fox Jumps Over the Lazy Dog
'the great gatsby' 'The Great Gatsby'
small word at the end is nothing to be afraid of Small Word at the End Is Nothing to Be Afraid Of
for step-by-step directions email someone@gmail.com For Step-by-Step Directions Email someone@gmail.com
2lmc spool: 'gruber on OmniFocus and vapo(u)rware' 2lmc Spool: 'Gruber on OmniFocus and Vapo(u)rware'
Have you read “The Lottery”? Have You Read “The Lottery”?
Have you read “the lottery”? Have You Read “The Lottery”?
Have you read "the lottery"? Have You Read "The Lottery"?
your hair[cut] looks (nice) Your Hair[cut] Looks (Nice)
People probably won't put http://foo.com/bar/ in titles People Probably Won't Put http://foo.com/bar/ in Titles
Scott Moritz and TheStreet.com’s million iPhone la‑la land Scott Moritz and TheStreet.com’s Million iPhone La‑La Land
Scott Moritz and thestreet.com’s million iPhone la‑la land Scott Moritz and thestreet.com’s Million iPhone La‑La Land
BlackBerry vs. iPhone BlackBerry vs. iPhone
Notes and observations regarding Apple’s announcements from ‘The Beat Goes On’ special event Notes and Observations Regarding Apple’s Announcements From ‘The Beat Goes On’ Special Event
Read markdown_rules.txt to find out how _underscores around words_ will be interpretted Read markdown_rules.txt to Find Out How _Underscores Around Words_ Will Be Interpretted
Q&A with Steve Jobs: 'That's what happens in technology' Q&A With Steve Jobs: 'That's What Happens in Technology'
What is AT&T's problem? What Is AT&T's Problem?
Apple deal with AT&T falls through Apple Deal With AT&T Falls Through
this v that This v That
this vs that This vs That
this v. that This v. That
this vs. that This vs. That
The SEC's Apple probe: what you need to know The SEC's Apple Probe: What You Need to Know
'by the way, small word at the start but within quotes.' 'By the Way, Small Word at the Start but Within Quotes.'
Starting sub-phrase with a small word: a trick, perhaps? Starting Sub-Phrase With a Small Word: A Trick, Perhaps?
Sub-phrase with a small word in quotes: 'a trick, perhaps?' Sub-Phrase With a Small Word in Quotes: 'A Trick, Perhaps?'
Sub-phrase with a small word in quotes: "a trick, perhaps?" Sub-Phrase With a Small Word in Quotes: "A Trick, Perhaps?"
"Nothing to Be Afraid of?" "Nothing to Be Afraid Of?"
a thing A Thing
Dr. Strangelove (or: how I Learned to Stop Worrying and Love the Bomb) Dr. Strangelove (Or: How I Learned to Stop Worrying and Love the Bomb)
this is trimming This Is Trimming
IF IT’S ALL CAPS, FIX IT If It’s All Caps, Fix It
___if emphasized, keep that way___ ___If Emphasized, Keep That Way___
What could/should be done about slashes? What Could/Should Be Done About Slashes?
Never touch paths like /var/run before/after /boot Never Touch Paths Like /var/run Before/After /boot
What about relative paths like ./profile and ~/downloads/music? What About Relative Paths Like ./profile and ~/downloads/music?
And windows paths like c:\temp\scratch too And Windows Paths Like c:\temp\scratch Too
There are 100's of buyer's guides There Are 100's of Buyer's Guides
a trick perhaps? or not really. A Trick Perhaps? Or Not Really.
drop. the. ball. Drop. The. Ball.
some cats are fun; the others aren't Some Cats Are Fun; The Others Aren't
roses are red \n violets are blue Roses Are Red \n Violets Are Blue
roses are red \n and violets are blue Roses Are Red \n And Violets Are Blue
the home directory is /home/username \n but the root's home is /root The Home Directory Is /home/username \n But the Root's Home Is /root

1

Don't kill me. I'm a programmer, not a journalist or writer.