What’s the easiest way you know of to tokenize an arithmetic expression in JavaScript? Let’s say you’re building a calculator application and want this to happen:

console.log(
tokenize('100-(5.4 + 2/3)*5')
)
// [ '100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5' ]

Before you reach into your npm module bag-o-tricks, realize that this can be done in a single JavaScript expression using a little-known feature of the string split method. Behold:

'100-(5.4+2/3)*5'
.split(/(-|\+|\/|\*|\(|\))/)
.map(s => s.trim())
.filter(s => s !== '')
// [ '100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5' ]
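The snippet above works on a string literal, while the opening example calls a tokenize function. Here is a minimal sketch of how the chain might be wrapped up; the name tokenize comes from the first example, but the function body is an assumption that simply reuses the same three steps.

```javascript
// Sketch: wrap the split/trim/filter chain in a reusable function.
// The parameter name `expression` is illustrative.
function tokenize(expression) {
  return expression
    .split(/(-|\+|\/|\*|\(|\))/) // capturing group keeps the operators in the output
    .map(s => s.trim())          // strip whitespace around each piece
    .filter(s => s !== '');      // drop empty strings left between adjacent separators
}

console.log(tokenize('100-(5.4 + 2/3)*5'));
// [ '100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5' ]
```

One thing to keep in mind: this sketch treats every - as an operator, so tokenize('-5') gives [ '-', '5' ] rather than a single negative number. Handling unary minus would be up to whatever parses the tokens afterwards.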

Excuse me? What’s that hot mess inside the split function? Let’s break it down step by step using a few examples of increasing complexity:


Example 1: s.split(/-/)

Pretty obvious: this splits the string s anywhere it sees the minus sign symbol -.

'3-2-1'.split(/-/)
// ["3", "2", "1"]

Example 2: s.split(/(-)/)

The only difference from the previous example is the enclosing parentheses in the regex, which create a capturing group. Here’s the key point of the entire article: if the regular expression contains capturing parentheses around the separator, then each time the separator is matched, the text captured by the group is spliced into the output array.

'3-2-1'.split(/(-)/)
// ["3", "-", "2", "-", "1"]
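One side effect is worth seeing before moving on, because it explains the .filter(s => s !== '') step in the one-liner. When two separators sit next to each other, or a separator starts the string, split still emits the (empty) text between them. A small illustration, with variable names chosen just for this example:

```javascript
// Two adjacent separators leave an empty string between them.
const adjacent = '3--2'.split(/(-)/);
console.log(adjacent);
// [ '3', '-', '', '-', '2' ]

// A separator at the very start leaves an empty string before it.
const leading = '-3'.split(/(-)/);
console.log(leading);
// [ '', '-', '3' ]

// Filtering out the empty strings gives clean tokens.
const cleaned = adjacent.filter(s => s !== '');
console.log(cleaned);
// [ '3', '-', '-', '2' ]
```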

Example 3: s.split(/(-|\+)/)

This builds on the previous example by adding support for the addition symbol +. The backslash in \+ is required because + is otherwise a regex quantifier meaning "one or more"; escaping it makes it match a literal plus sign. The vertical pipe | is the alternation operator (match - OR +).

'3-2-1+2+3'.split(/(-|\+)/)
// ["3", "-", "2", "-", "1", "+", "2", "+", "3"]
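As an aside, when every alternative is a single character, the same split can also be written with a character class instead of pipes. This is not the form the article uses, just an equivalent spelling you may find easier to read; the variable names below are illustrative.

```javascript
// Alternation form, as used in this article.
const withPipes = '3-2-1+2+3'.split(/(-|\+)/);

// Character-class form: inside [...], + is literal and a leading - is literal.
const withClass = '3-2-1+2+3'.split(/([-+])/);

console.log(withClass);
// [ '3', '-', '2', '-', '1', '+', '2', '+', '3' ]

// Both forms produce the same array.
console.log(JSON.stringify(withPipes) === JSON.stringify(withClass));
// true
```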

The Final Boss (tying everything together)

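Hopefully, you now have all the tools needed to understand .split(/(-|\+|\/|\*|\(|\))/). It is Example 3 with more alternatives: the capturing group matches any one of -, +, /, *, ( or ). Most of these are escaped with a backslash because they would otherwise have a special meaning in a regex (the / would otherwise end the regex literal). After the split, .map(s => s.trim()) strips whitespace around each piece, and .filter(s => s !== '') drops the empty strings that appear when two separators sit next to each other, as in -( or )*. Hope that made sense! Let me know in the comments if you liked this article, or ping me on Twitter!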