An Emacs language mode creation tutorial
Pre-Intro about the resurrection of this tutorial
This page is the resurrection of the original "Emacs language mode creation tutorial", which is no longer available at its original location.
I resurrected it via archive.org and updated most of the outdated links, especially the ones into the emacs lisp manual.
So everything after this Pre-Intro is originally written by Scott Andrew Borton, modulo bugfixes. Have fun.
PS: In the meantime I realized there is also another resurrected version on emacswiki. You should probably give that a try, as it is maintained by several people.
And a hint from emacswiki.org:
Note that the tutorial builds a new mode from the ground up. If the modes are relatively simple, then starting out with a GenericMode or a DerivedMode makes more sense. Even if you decide to write most of the code yourself, starting out with SampleMode might be a good idea. What the tutorial does, and none of the other suggestions do, is explain things. This is very important; so if you are curious to learn, then perhaps the tutorial is actually better for you. It even delves into such topics as RegexpOpt. I recommend that.
I've written several Emacs modes for various obscure or in-house tool languages. When starting my first mode, I found that there weren't a lot of lucid explanations of how to write a mode intended for language editing. Even Writing GNU Emacs Extensions doesn't cover topics like syntax highlighting and indentation. And generic-mode, distributed with recent versions of Emacs, doesn't handle indentation.
Here, I walk through my wpdl-mode.el, a mode for editing Workflow Process Definition Language files. I simply go through each line of the mode, and explain what it does. It may also be helpful to refer to this WPDL example, to see how the concepts are being applied. Hopefully wpdl-mode is clear and simple enough for you to learn the basics of writing your own mode.
The following information was written with GNU Emacs versions 20 or 21 in mind.
NB: Since I am not a professional Emacs hacker, some of this might be a case of the blind leading the blind. More experienced Emacs guys may take offense at the style or terminology, and will hopefully correct my ignorance.
Thanks to the readers who have already contributed suggestions!
Copyright © 2002, 2003 Scott Andrew Borton
(defvar wpdl-mode-hook nil)
First, we define some variables that all modes should define. wpdl-mode-hook allows the user to run their own code when your mode is run.
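From the user's side, hooking into the mode might look like the sketch below. The function name my-wpdl-setup and the setting it changes are hypothetical, chosen only to illustrate the mechanism; the defvar is repeated so the snippet stands on its own.

```elisp
;; The hook variable, as defined by the mode itself.
(defvar wpdl-mode-hook nil)

;; my-wpdl-setup is a hypothetical user function, not part of wpdl-mode.
(defun my-wpdl-setup ()
  "Hypothetical customization to run whenever wpdl-mode starts."
  (setq indent-tabs-mode nil))

;; Register the function; the mode's entry function runs the hook later.
(add-hook 'wpdl-mode-hook #'my-wpdl-setup)
```

Nothing runs at the moment add-hook is called; the function is only recorded in the hook list for the mode to call when it starts.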
(defvar wpdl-mode-map
  (let ((wpdl-mode-map (make-keymap)))
    (define-key wpdl-mode-map "\C-j" 'newline-and-indent)
    wpdl-mode-map)
  "Keymap for WPDL major mode")
Now we create a keymap. This map, here called wpdl-mode-map, allows both you and users to define their own key bindings. The keymap is immediately set to a default keymap with make-keymap. Then, using define-key, we insert an example keybinding into the keymap, which maps the newline-and-indent function to Control-j (which is actually the default binding for this function, but is included anyway as an example). Of course, you may define as many keybindings as you wish.
If your keymap will have very few entries, then you may want to consider make-sparse-keymap rather than make-keymap.
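For a map with only a handful of bindings, make-sparse-keymap is the usual alternative to make-keymap. The following is a minimal sketch of that variant; the name my-wpdl-sparse-map is hypothetical and is used here so as not to clash with the tutorial's own map.

```elisp
;; Hypothetical sparse variant of the keymap; only the binding we add is stored.
(defvar my-wpdl-sparse-map
  (let ((map (make-sparse-keymap)))
    (define-key map "\C-j" 'newline-and-indent)
    map)
  "Hypothetical sparse keymap holding a single binding.")

;; lookup-key returns whatever command is bound to a key sequence in a map.
(lookup-key my-wpdl-sparse-map "\C-j")
```

The local variable is called map here only for brevity; naming it after the defvar, as the tutorial does, works just as well.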
(add-to-list 'auto-mode-alist '("\\.wpd\\'" . wpdl-mode))
Here, we add an entry to auto-mode-alist. This tells Emacs that when a buffer with a name ending in .wpd is opened, wpdl-mode should be started in that buffer. Some modes leave this step to the user.
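To see what the pattern in that entry accepts, it can be tried against a few file names. This is only a sketch; the variable name my-wpdl-pattern and the file names are made up for the illustration.

```elisp
;; The same regexp the mode registers; \\' anchors the match at the end
;; of the file name.  my-wpdl-pattern is a hypothetical name.
(defvar my-wpdl-pattern "\\.wpd\\'")

(string-match-p my-wpdl-pattern "orders.wpd")      ; non-nil: name ends in .wpd
(string-match-p my-wpdl-pattern "orders.wpd.bak")  ; nil: .wpd is not at the end
```

Using \\' rather than $ matters because $ also matches before a newline, and file names may in principle contain one.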
(defconst wpdl-font-lock-keywords-1
  (list
   '("\\<\\(A\\(CTIVITY\\|PPLICATION\\)\\|DATA\\|END_\\(A\\(CTIVITY\\|PPLICATION\\)\\|DATA\\|MODEL\\|PARTICIPANT\\|T\\(OOL_LIST\\|RANSITION\\)\\|WORKFLOW\\)\\|MODEL\\|PARTICIPANT\\|T\\(OOL_LIST\\|RANSITION\\)\\|WORKFLOW\\)\\>" . font-lock-builtin-face)
   '("\\('\\w*'\\)" . font-lock-variable-name-face))
  "Minimal highlighting expressions for WPDL mode")
Now we have defined our minimal set of keywords for Emacs to highlight. A font-lock-keywords variable is a list of keywords to highlight. There are many ways to specify this list. I have used the form (matcher . facename). With this form, I have specified a pattern to match, and then a face name to use for the actual highlighting.
There are two elements to my list: the first element matches WPDL language keywords, and the second element matches WPDL identifier names (variable names). I have selected the appropriate font-lock face names for each type of keyword (font-lock-builtin-face and font-lock-variable-name-face, respectively).
For my keyword list, I've selected those WPDL keywords which would benefit most from being highlighted: keywords that delimit blocks of information. One may notice that the regexp used to specify these keywords is optimized. I did not have to do this by hand. Emacs provides the regexp-opt function to save you from the tedious work of creating complicated regexps. regexp-opt takes a list of strings and an additional optional argument. This optional argument controls whether or not we want to wrap the entire regexp in parens. In our case, we do. For example, the following expression:
(regexp-opt '("PARTICIPANT" "END_PARTICIPANT" "MODEL" "END_MODEL" "WORKFLOW" "END_WORKFLOW" "ACTIVITY" "END_ACTIVITY" "TRANSITION" "END_TRANSITION" "APPLICATION" "END_APPLICATION" "DATA" "END_DATA" "TOOL_LIST" "END_TOOL_LIST") t)
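Results in this regexp (the one used in the keyword list above; the exact output may differ between Emacs versions):
"\\(A\\(CTIVITY\\|PPLICATION\\)\\|DATA\\|END_\\(A\\(CTIVITY\\|PPLICATION\\)\\|DATA\\|MODEL\\|PARTICIPANT\\|T\\(OOL_LIST\\|RANSITION\\)\\|WORKFLOW\\)\\|MODEL\\|PARTICIPANT\\|T\\(OOL_LIST\\|RANSITION\\)\\|WORKFLOW\\)"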
Because font-lock is so resource-intensive, providing optimized regexps to font-lock should provide a boost in performance.
Finally, the regexp is wrapped in \\< and \\>. These match only at the beginning and end of a word, so a keyword is highlighted only when it stands as a whole word rather than as part of a longer one. This ensures that, for example, the keyword DATA is not highlighted inside a longer word such as METADATA.
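The whole-word behaviour can be checked directly on strings. The sketch below builds a small matcher with regexp-opt; passing the symbol words as the second argument asks regexp-opt to add the \\< and \\> wrapping itself. The variable name and the three-keyword list are hypothetical, chosen only to keep the example short.

```elisp
;; Hypothetical whole-word matcher for three of the WPDL keywords.
(defvar my-wpdl-keyword-regexp
  (regexp-opt '("DATA" "MODEL" "WORKFLOW") 'words)
  "Hypothetical whole-word regexp for a few WPDL keywords.")

;; Bind case-fold-search to nil so that only upper-case keywords match.
(let ((case-fold-search nil))
  (list (string-match-p my-wpdl-keyword-regexp "DATA x")      ; matches at index 0
        (string-match-p my-wpdl-keyword-regexp "METADATA")))  ; no match: nil
```

The second test fails to match because the only word beginning in "METADATA" is at its first character, and no keyword starts there.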
(defconst wpdl-font-lock-keywords-2
  (append wpdl-font-lock-keywords-1
          (list
           '("\\<\\(AUTHOR\\|C\\(ONDITION\\|REATED\\)\\|DE\\(FAULT_VALUE\\|SCRIPTION\\)\\|EXTENDED_ATTRIBUTE\\|FROM\\|I\\(MPLEMENTATION\\|N_PARAMETERS\\)\\|JOIN\\|NAME\\|O\\(THERWISE\\|UT_PARAMETERS\\)\\|PERFORMER\\|ROUTE\\|S\\(PLIT\\|TATUS\\)\\|T\\(O\\(OLNAME\\)?\\|YPE\\)\\|VENDOR\\|WPDL_VERSION\\)\\>" . font-lock-keyword-face)
           '("\\<\\(TRUE\\|FALSE\\)\\>" . font-lock-constant-face)))
  "Additional Keywords to highlight in WPDL mode")
Now I've defined the second level of highlighting. Note that the second level is appended to the first level, resulting in a single keyword variable that matches everything in both levels. Here I've specified even more keywords, along with two common WPDL constant values, TRUE and FALSE.
(defconst wpdl-font-lock-keywords-3
  (append wpdl-font-lock-keywords-2
          (list '("\\<\\(A\\(ND\\|PPLICATIONS\\)\\|BOOLEAN\\|HUMAN\\|INTEGER\\|NO\\|OR\\(GANISATIONAL_UNIT\\)?\\|R\\(EFERENCE\\|OLE\\)\\|S\\(TRING\\|YNCHR\\)\\|UNDER_REVISION\\|WORKFLOW\\|XOR\\)\\>" . font-lock-constant-face)))
  "Balls-out highlighting in WPDL mode")
I've now defined more WPDL constants. This completes the list of WPDL keywords.
(defvar wpdl-font-lock-keywords wpdl-font-lock-keywords-3
"Default highlighting expressions for WPDL mode")
Here I've defined the default level of highlighting to be the maximum. This is just my preference; the user can change this variable (if the user knows how! This might be something to put in the documentation for your own mode).
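One way a user could pick a lower level is sketched below for an init file. It assumes that wpdl-mode.el is on the load-path, that it provides the feature wpdl-mode, and that with-eval-after-load is available (it is in reasonably recent Emacs versions); none of this is prescribed by the mode itself.

```elisp
;; In the user's init file: once wpdl-mode.el has been loaded, switch the
;; default keyword list to the minimal level.  The change takes effect in
;; buffers that enter wpdl-mode afterwards.
(with-eval-after-load 'wpdl-mode
  (setq wpdl-font-lock-keywords wpdl-font-lock-keywords-1))
```

Deferring the setq until after the file is loaded matters here, because wpdl-font-lock-keywords-1 does not exist before that point.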
WPDL features a Pascal-like syntax. This provides a natural basis for indentation. Blocks of information can be indented away from their parent. Fortunately, this doesn't seem to be too difficult to accomplish with Emacs: the indentation code for wpdl-mode is only 28 lines long. Here is some example WPDL code, including indentation:
NAME "In the Mail Room"
NAME "Mail Room"
I have identified five rules for indenting WPDL code. The rules are as follows:
- If we are at the beginning of the buffer, indent to column 0.
- If we are currently at an END_ line, then de-indent relative to the previous line.
- If we first see an END_ line before our current line, then we should indent our current line to the same indentation as the END_ line.
- If we first see a "start line" like PARTICIPANT, then we need to increase our indentation relative to that start line.
- If none of the above apply, then do not indent at all.
The following WPDL code example, with comments, may clarify these ideas:
// My activity               // Rule 1 applies
ACTIVITY 'MailRoom'          // Rule 5 applies
  NAME "Mail Room"           // Rule 4 (based on "ACTIVITY")
  TOOL_LIST                  // Rule 4 (based on "ACTIVITY")
    'scan_document'          // Rule 4 (based on "TOOL_LIST")
    'identify_document'      // Rule 4 (based on "TOOL_LIST")
    'send_document'          // Rule 4 (based on "TOOL_LIST")
  END_TOOL_LIST              // Rule 2
  PERFORMER 'Joe'            // Rule 3
END_ACTIVITY                 // Rule 2
(defun wpdl-indent-line ()
  "Indent current line as WPDL code"
  (interactive)
  (beginning-of-line)
We start by defining a single function for determining how a given line should be indented. It may be helpful to make the function interactive, to aid you in testing your indentation function. Making the function interactive allows you to call the function directly using M-x wpdl-indent-line. Also, we set the point to the beginning of the line.
  (if (bobp) ; Check for rule 1
      (indent-line-to 0)
The first indentation-related thing we do is to check to see if this is the first line in the buffer, using the function bobp. If it is, we set the indentation level to 0, using indent-line-to. indent-line-to indents the current line to the given column. Please note that if this condition is true, then the rest of the indentation code is not considered.
(let ((not-indented t) cur-indent)
Now we declare two variables. We will store the value of our intended indentation level for this line in cur-indent. Then, when all of the indentation options have been considered (rules 2-5), we will finally make the indentation.
The use of not-indented will become clear later.
      (if (looking-at "^[ \t]*END_") ; Check for rule 2
          (progn
            (save-excursion (forward-line -1)
              (setq cur-indent (- (current-indentation) default-tab-width)))
If we are not at the beginning of the buffer, then we start to consider other indentation options. What we do here is to check to see if we are at the end of a block. In WPDL, blocks are ended by keywords that start with END_. So, we check to see if we are on such a line by using the looking-at function, using a regexp that will detect if we are at a line that starts with END_. Remember, we are at the beginning of the line, so we need to include any spaces or tabs in the regexp.
If we see that we are at the end of a block, we then set the indentation level. We do this by going to the previous line (using the forward-line function), and then use the current-indentation function to see how that line is indented. Then we set cur-indent with the value of the previous line's indentation, minus default-tab-width. (default-tab-width belongs to the Emacs 20/21 era this tutorial targets; it was later made obsolete and eventually removed, and tab-width serves the same purpose in newer versions.)
(if (< cur-indent 0)
(setq cur-indent 0)))
We also include a safety check, so that we don't try to indent past the left margin.
        (save-excursion
          (while not-indented ; Iterate backwards until we find an indentation hint
            (forward-line -1)
            (if (looking-at "^[ \t]*END_") ; Check for rule 3
                (progn
                  (setq cur-indent (current-indentation))
                  (setq not-indented nil))
              ; Check for rule 4
              (if (looking-at "^[ \t]*\\(PARTICIPANT\\|MODEL\\|APPLICATION\\|WORKFLOW\\|ACTIVITY\\|DATA\\|TOOL_LIST\\|TRANSITION\\)")
                  (progn
                    (setq cur-indent (+ (current-indentation) default-tab-width))
                    (setq not-indented nil))
                (if (bobp) ; Check for rule 5
                    (setq not-indented nil)))))))
If we are not looking at an END_ line, then we iterate backward through the code to find an "indentation hint". An indentation hint is some token in our file which can tell us how to indent the line we are on now. The rules I have provided earlier tell us what the indentation hints are. At this point, we only need to find the hints for rules 3-5, as the first two rules have already been covered by previous code.
There aren't any new Emacs lisp functions introduced here. The only thing worth noting is the use of not-indented as a sentinel value for our while loop.
      (if cur-indent (indent-line-to cur-indent)
        (indent-line-to 0))))) ; If we didn't see an indentation hint, then allow no indentation
Finally, we execute the actual indentation, if we have actually identified an indentation case. We have (most likely) already stored the value of the indentation in cur-indent. If cur-indent is empty, then we always indent to column 0.
And that concludes the indentation code for WPDL. Exercise: This indentation code is rather simple. In which cases would the code fail to create proper indentation of WPDL code?
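To experiment with the exercise, it can help to run the indentation logic over a string instead of a live buffer. The sketch below reassembles the fragments discussed above into one function under the hypothetical name my-wpdl-indent-line, and adds a hypothetical helper, my-wpdl-indent-string, that re-indents every line of a string. One deliberate deviation: the step size comes from a hypothetical variable my-wpdl-indent-offset rather than default-tab-width, so that the snippet also runs on newer Emacs versions.

```elisp
(defvar my-wpdl-indent-offset 2
  "Hypothetical indentation step, standing in for default-tab-width.")

(defun my-wpdl-indent-line ()
  "Indent current line as WPDL code (reassembled from the tutorial fragments)."
  (interactive)
  (beginning-of-line)
  (if (bobp)                                   ; Rule 1
      (indent-line-to 0)
    (let ((not-indented t) cur-indent)
      (if (looking-at "^[ \t]*END_")           ; Rule 2
          (progn
            (save-excursion
              (forward-line -1)
              (setq cur-indent (- (current-indentation) my-wpdl-indent-offset)))
            (if (< cur-indent 0)
                (setq cur-indent 0)))
        (save-excursion
          (while not-indented                  ; Walk backwards looking for a hint
            (forward-line -1)
            (if (looking-at "^[ \t]*END_")     ; Rule 3
                (progn
                  (setq cur-indent (current-indentation))
                  (setq not-indented nil))
              ;; Rule 4
              (if (looking-at "^[ \t]*\\(PARTICIPANT\\|MODEL\\|APPLICATION\\|WORKFLOW\\|ACTIVITY\\|DATA\\|TOOL_LIST\\|TRANSITION\\)")
                  (progn
                    (setq cur-indent (+ (current-indentation) my-wpdl-indent-offset))
                    (setq not-indented nil))
                (if (bobp)                     ; Rule 5
                    (setq not-indented nil)))))))
      (if cur-indent
          (indent-line-to cur-indent)
        (indent-line-to 0)))))

(defun my-wpdl-indent-string (text)
  "Return TEXT with every line re-indented by `my-wpdl-indent-line'."
  (with-temp-buffer
    (let ((indent-tabs-mode nil))              ; Indent with spaces only
      (insert text)
      (goto-char (point-min))
      (while (not (eobp))
        (my-wpdl-indent-line)
        (forward-line 1))
      (buffer-string))))
```

Feeding the helper a flattened version of the commented example above, or deliberately awkward input such as two nested blocks closed on consecutive lines, is a quick way to look for cases the five rules do not handle.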
Now we will set up a syntax table for WPDL. A syntax table tells Emacs how it should treat various tokens in your text for various functions, including movement within the buffer and syntax highlighting. For example, how does Emacs know to move forward by one word (as used in the forward-word function)? The syntax table gives Emacs this kind of information. The syntax table is also used by the syntax highlighting package. It is for this reason that we want to modify the syntax table for wpdl-mode.
(defvar wpdl-mode-syntax-table
  (let ((wpdl-mode-syntax-table (make-syntax-table)))
We will use the make-syntax-table function to create a new syntax table. This function creates a syntax table that is a good start for most modes, as it either inherits or copies entries from the standard syntax table.
(modify-syntax-entry ?_ "w" wpdl-mode-syntax-table)
The first modification we make to the syntax table is to declare the underscore character '_' as being a valid part of a word. So now, a string like foo_bar will be treated as one word rather than two (the default Emacs behavior). We do this because we want to make it easier to treat WPDL variable names (which use underscores by convention), and, more importantly, keywords, as single words. NB: Treating underscores as word constituents is non-standard Emacs behavior. Here, I feel justified in including this modification, since so many WPDL keywords include underscores.
The modify-syntax-entry function takes a character as its first argument, a syntax class as its second argument, and the syntax table to be modified as the third argument. In Emacs Lisp, characters are represented by using the ? symbol followed by the actual character, so we use ?_ to represent the underscore. The syntax class indicates how a particular character is treated. Example syntax classes include "punctuation character," "open parenthesis indicator", and "word constituent." "w" is the symbol for "word constituent."
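The effect of the underscore entry on word movement can be observed in a temporary buffer. In the sketch below, the helper my-word-end-column is hypothetical; it reports the column at which forward-word stops in a sample line under a given syntax table.

```elisp
;; Hypothetical helper: where does forward-word stop under TABLE?
(defun my-word-end-column (table)
  "Return the column where `forward-word' stops in \"END_ACTIVITY x\" under TABLE."
  (with-temp-buffer
    (set-syntax-table table)
    (insert "END_ACTIVITY x")
    (goto-char (point-min))
    (forward-word 1)
    (current-column)))

(let ((plain (make-syntax-table))   ; inherits the standard entries
      (wpdl (make-syntax-table)))
  (modify-syntax-entry ?_ "w" wpdl) ; underscore becomes a word constituent
  (list (my-word-end-column plain)  ; stops after "END"
        (my-word-end-column wpdl))) ; stops after "END_ACTIVITY"
```

With the standard entries, the underscore ends the word, so movement stops at column 3; with the modified table the whole keyword counts as one word and movement stops at column 12.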
(modify-syntax-entry ?/ ". 124b" wpdl-mode-syntax-table)
(modify-syntax-entry ?* ". 23" wpdl-mode-syntax-table)
(modify-syntax-entry ?\n "> b" wpdl-mode-syntax-table)
WPDL comments are just like C++ comments. So, our goal is to program C++ comments into the WPDL syntax table. To do this, we need to use some extra syntax class parameters called syntax flags. Some syntax classes have these extra parameters to further refine the place of the character within the classes. The syntax class used here is ".", which means "punctuation character."
If we consult the Emacs lisp programming guide and see what the syntax flags mean, we will see that we have made the following adjustments to the syntax table:
1) That the character / is the start of a two-character comment-start sequence (1), that it may also be the second character of a two-character comment-start sequence (2), that it is the end of a two-character comment-end sequence (4), and that comment sequences that have this character as the second character in the sequence are "b-style" comments (b). It's a rule that comments that begin with a "b-style" sequence must end with either the same or some other "b-style" sequence.
2) That the character * is the second character of a two-character comment-start sequence (2) and that it is the start of a two-character comment-end sequence (3).
3) That the character \n (which is the newline character) ends a "b-style" comment.
Now we have programmed our comment style into the syntax table. The syntax highlighting mechanism (font-lock) will now read the syntax table and highlight WPDL comments accordingly.
    wpdl-mode-syntax-table)
  "Syntax table for wpdl-mode")
These lines finish the definition of the syntax table, including the document string.
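One way to convince yourself that the three comment entries behave as described is to ask the Emacs parser whether a given position lies inside a comment. The sketch below rebuilds just those entries in a hypothetical table, my-wpdl-syntax-table, and wraps the check in a hypothetical helper; parse-partial-sexp is the standard function that returns the parser state.

```elisp
;; Hypothetical table containing only the comment-related entries.
(defvar my-wpdl-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/ ". 124b" table)
    (modify-syntax-entry ?* ". 23" table)
    (modify-syntax-entry ?\n "> b" table)
    table)
  "Hypothetical copy of the comment-related syntax entries.")

(defun my-wpdl-in-comment-p (text pos)
  "Return t if buffer position POS in TEXT lies inside a comment."
  (with-temp-buffer
    (set-syntax-table my-wpdl-syntax-table)
    (insert text)
    ;; Element 4 of the parser state is non-nil inside a comment.
    (and (nth 4 (parse-partial-sexp (point-min) pos)) t)))

;; Position 12 falls inside the word "note", after the // sequence.
(my-wpdl-in-comment-p "NAME x // note" 12)
```

The same helper can be pointed at a position after the newline, or inside a /* ... */ pair, to check that both comment styles start and end where expected.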
Finally, we will create the function that will be called by Emacs when the mode is started.
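(defun wpdl-mode ()
  "Major mode for editing Workflow Process Definition Language files"
  (interactive) (kill-all-local-variables)
  (set-syntax-table wpdl-mode-syntax-table) (use-local-map wpdl-mode-map)
Here we define our entry function, give it a documentation string, make it interactive, clear any buffer-local variables left over from the previous major mode, and install our syntax table. We also set our keymap for the mode.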
(set (make-local-variable 'font-lock-defaults) '(wpdl-font-lock-keywords))
Now we are specifying the font-lock (syntax highlighting) default keywords. Note that if the user has specified her own level of keyword highlighting by redefining wpdl-font-lock-keywords, then that will be used instead of the default.
(set (make-local-variable 'indent-line-function) 'wpdl-indent-line)
Here we register our line indentation function with Emacs. Now Emacs will call our function every time line indentation is required (like when the user calls indent-region).
  (setq major-mode 'wpdl-mode)
  (setq mode-name "WPDL")
  (run-hooks 'wpdl-mode-hook))
The last steps in the entry function are to set the major-mode variable with the value of our mode, to set the mode-name variable (which determines what name will appear in the status line and buffers menu, for example), and to finally call run-hooks so that the user's own mode hooks will be called.
The most important line
(provide 'wpdl-mode)
Finally, we use provide to expose our mode to the Emacs environment. Users will most likely want to use the require function in the .emacs file to load the mode into the environment.
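On the user's side, loading the mode might look like the following init-file fragment. The directory name is purely hypothetical; it stands for wherever wpdl-mode.el has been saved.

```elisp
;; In the user's init file.  "~/elisp/" is a hypothetical directory that
;; is assumed to contain wpdl-mode.el.
(add-to-list 'load-path "~/elisp/")
(require 'wpdl-mode)
```

require locates the file through load-path and relies on the provide form at the end of the file to confirm that the feature was actually defined.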
The Emacs lisp manual has a section about style, including this part about major mode style. In addition to following these guidelines, you may want to use the checkdoc tool to help you get your style issues under control, especially concerning the format of the initial comment block.
Stefan Monnier, an Emacs maintainer, has suggested alternative approaches for indentation and the entry function. His sample mode gives an outline of these alternative approaches. For example, our entry function could be greatly simplified by deriving wpdl-mode from fundamental-mode. The code to do this is given here:
; This can replace (defun wpdl-mode ()...
(define-derived-mode wpdl-mode fundamental-mode "WPDL"
  "Major mode for editing Workflow Process Definition Language files."
  (set (make-local-variable 'font-lock-defaults) '(wpdl-font-lock-keywords))
  (set (make-local-variable 'indent-line-function) 'wpdl-indent-line))
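Part of the simplification comes from the bookkeeping that define-derived-mode does on its own. The sketch below defines a throwaway mode, under the hypothetical name my-demo-mode, purely to show which companion variables the macro creates and that the entry function sets major-mode for us.

```elisp
;; Hypothetical mode, defined only to inspect what the macro sets up.
(define-derived-mode my-demo-mode fundamental-mode "Demo"
  "Hypothetical mode showing what `define-derived-mode' creates.")

(list (boundp 'my-demo-mode-hook)                  ; hook variable exists
      (keymapp my-demo-mode-map)                   ; keymap exists
      (syntax-table-p my-demo-mode-syntax-table)   ; syntax table exists
      (with-temp-buffer                            ; entry function sets major-mode
        (my-demo-mode)
        major-mode))
```

If variables with those names are already defined before the macro runs, as wpdl-mode-map and wpdl-mode-syntax-table are in this tutorial, the macro leaves them as they are and simply uses them.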
Thanks to S. Monnier, C. Marquardt, A. Mackenzie, Alex, A. Eliasen, Y. Rabkin, Matt, Cyril, and others for making suggestions and corrections or asking pertinent questions.
Last modified: Wed May 12 18:09:30 FLE Daylight Time 2004 Scott Andrew Borton @two-wugs