Sample app: Convert spreadsheet tables to XLOOKUP formulas

Our latest sample app converts spreadsheet column data to formulas that use the XLOOKUP function. This post explains in detail how it was built.

August 11, 2023 update: We now recommend that you use the data editor to import and edit spreadsheet data. It is significantly easier to use than the method described here.

Original text: We publish sample apps as a means of demonstrating Calcapp features and techniques that you may want to use in your own apps that you distribute to your own users. This sample app is different — the intended user is you, the app author, and it helps you convert spreadsheet data to formulas that use the new XLOOKUP function.

Read this blog post to get a high-level overview of how to use this app to import table data and to learn how XLOOKUP works. This post explains in detail how to use the app converting spreadsheet data and how it was written.

To summarize, Calcapp now supports the XLOOKUP function, which is equivalent to the classic spreadsheet functions VLOOKUP and HLOOKUP. XLOOKUP accepts the sought value and two arrays, one containing values to find (the lookup array) and one containing the values to return (the result array).

This formula returns 300, because the sought value, 30, is the third element in the first array, which causes XLOOKUP to return the third element in the second array, 300:

XLOOKUP(30, { 10, 20, 30 }, { 100, 200, 300 })XLOOKUP(30; { 10; 20; 30 }; { 100; 200; 300 })

As Calcapp does not yet have native support for data tables, the data must be converted to XLOOKUP formulas. That’s the job of this app.

Usage

Here’s an embedded version of the app, which you can also run as a stand-alone app:

For step-by-step instructions on how to use this app, refer to this blog post.

Copy the lookup values from a spreadsheet table and paste them into the Lookup values field. Then, copy the result values and paste them into the Result values field. The generated XLOOKUP formula then becomes available.

Be sure to select the decimal separator you’re using using the Decimal separator field, to ensure that the generated formula works well with Calcapp Creator. If you use a decimal comma as a decimal separator, the app will use semicolons to separate parameters. If you use a decimal point as a decimal separator, the app will use commas to separate parameters.

Also, the app automatically detects if the values are numbers or text. If they are numbers (potentially starting with a currency symbol, such as “$”), the elements of the generated array are not quoted, ensuring that you get numbers — and not text — back when you call the function.

Under the hood

This app makes heavy use of Calcapp’s new formula engine, which we introduced with our November release. Thanks to the new features, this app is considerably simpler than the sample app that generates CHOOSE formulas we wrote in 2018.

You’re encouraged to view the app and its formulas as you read along. It’s available as a template when you create a new app under the Sample: XLOOKUP generator name.

Note: This post explains in detail how the XLOOKUP generator app is implemented. Feel free to skip it altogether if you’re just interested in using the app and not in how it was written.

Handling separators

This app works with both decimal points and decimal commas. It is important for the app to have access to this information, because the generated XLOOKUP formula uses commas as a parameter separator if a decimal point is selected and semicolons otherwise. Also, the app recognizes when users enter numbers as lookup or result values and leaves out quotation marks if that is the case.

There is a text drop-down field named Decimal separator. In order to insulate the rest of the app from having to know about the actual wording of the options available to the user, there are two hidden text fields that expose the chosen decimal separator and parameter separator as text strings.

The first is named DecimalSeparatorText and this is its value formula:

IF(DecimalSeparator = "Decimal point", ".", ",")IF(DecimalSeparator = "Decimal point"; "."; ",")

The second is named ParameterSeparatorText and this is its value formula:

IF(DecimalSeparator = "Decimal point", ",", ";")IF(DecimalSeparator = "Decimal point"; ","; ";")

Detecting numbers with a regular expression

It is important that the app can recognize numbers. To do so, we would normally use a function like PARSENUMBER, which returns an error if the given text string is not a number and the parameter as a number otherwise. This doesn’t work with this app, though, as it needs to be able to work with both decimal points and decimal commas.

PARSENUMBER does not work for this app, as it takes the language of the app into account and its decimal separator (a decimal point for US English and a decimal comma for German, for instance). PARSENUMBER can’t be coaxed into working with decimal commas if the language of the app uses a decimal point, and vice versa.

As a result, we need to detect if a text string is a number using a regular expression. These text strings can be notoriously tricky to get right, and we’ll be the first to admit that writing this sample app required a trip to the excellent regex101.com to make our regular expression work.

(Have you never heard of regular expressions before? They are a powerful tool to work with text. We wrote a lengthy introduction back in 2018 when Calcapp gained support for them. Be sure to read that post first if you want to understand this section. If not, feel free to skip ahead.)

To determine if a text string matches a regular expression, we need to use the REGEXMATCH function. \d matches a single digit, and \d+ matches one or multiple digits.

As a result, both these formulas return TRUE, indicating that both “123” and “123456789” match the regular expression \d+:

REGEXMATCH("123", "\d+")REGEXMATCH("123"; "\d+")
REGEXMATCH("123456789", "\d+")REGEXMATCH("123456789"; "\d+")

The regular expression \d+ can be used to match multiple integers, but what about decimal numbers? If a decimal point is used, \d+\.\d+ matches a decimal number. (The backslash before the period is there because periods are otherwise special characters in regular expressions that match anything. With a backslash, the period in the formula above matches a literal period.)

To make the regular expression work for arbitrary numbers — integers or decimal numbers — we need to make the decimal part optional. Consider this regular expression:

\d+(\.\d+)?

The part in parentheses matches the decimal part of the number. By putting a question mark after the closing parenthesis, the entire decimal part is made optional.

There are five more changes we need to make to make the regular expression complete, though. First, it must accept negative numbers, so we add a leading minus sign, followed by a question mark to make the minus sign optional:

-?\d+(\.\d+)?

Second, it must accept numbers that are preceded by a currency symbol (such as “$”). We don’t want these lines to count as text lines. \p{Sc} matches a currency symbol in a regular expression. It needs to go after the minus sign:

-?\p{Sc}?\d+(\.\d+)?

Third, we need to accept decimal numbers where the zero isn’t spelled out (like in .1). To do so, we must change the first + character, which accepts one or multiple digits, to a * character, which accepts zero or multiple digits:

-?\p{Sc}?\d*(\.\d+)?

Fourth, we only want to accept text strings that only consist of numbers, meaning that “3a” or “a3” should not qualify. To do so, we need to add so-called anchors. ^ represents the beginning of the text string, and is added first, and $ represents the end of the text string, and is added last:

^-?\p{Sc}?\d*(\.\d+)?$

Above, the anchors prevent arbitrary text from appearing between the beginning of the text string and the number, and between the number and the end of the text string.

Finally, we need to accept a decimal comma and not a decimal point if that is the user’s preference. Remember that the symbol the user prefers is stored as the value of the DecimalSeparatorText hidden field. That means that we need to replace \. in the expression with \, if that is the case. (A leading backslash isn’t strictly speaking needed here, as a comma has no special meaning in this context. It does no harm, though, so it stays.) We use the & operator to join together the various pieces of the expression we need.

Putting it all together, this is the formula for the value of the hidden field NumberRegex, which we’ll refer back to from other formulas later:

"^-?\p{Sc}?\d*(\" & DecimalSeparatorText & "\d+)?$""^-?\p{Sc}?\d*(\" & DecimalSeparatorText & "\d+)?$"

Determining if a text field only consists of numeric lines of text

Users are expected to paste their lookup values into the LookupValues text field and their result values into the ResultValues text field. Below each of them, we add two hidden logical fields which determine if the text fields only hold numbers, named LookupValuesAreNumbers and ResultValuesAreNumbers.

It is possible to use the REGEXMATCH function directly with text strings consisting of multiple lines. However, we won’t do that here, instead we’ll use the new TEXTSPLIT function to create an array of text strings (with one line per array element) from the original text string. If you’re interested in the other approach, refer to our write-up for the app producing CHOOSE formulas.

TEXTSPLIT breaks a text string apart and returns the pieces as an array of text strings. This formula returns the array { "1", "2", "3" }{ "1"; "2"; "3" }:

TEXTSPLIT("1-2-3", "-")TEXTSPLIT("1-2-3"; "-")

In the formula above, “-“ is used as the delimiter between the returned array elements. With a text field accepting multiple lines, the delimiter we are looking for is a line break character, which the new NEWLINE function returns.

As such, this formula breaks the text string LookupValues apart, returning an array of the lines of text it consists of:

TEXTSPLIT(LookupValues, NEWLINE())TEXTSPLIT(LookupValues; NEWLINE())

To determine if the individual text lines can be considered numbers, we can invoke the REGEXMATCH function directly on the array:

REGEXMATCH(TEXTSPLIT(LookupValues, NEWLINE()), NumberRegex)REGEXMATCH(TEXTSPLIT(LookupValues; NEWLINE()); NumberRegex)

The second parameter is set to NumberRegex, which is the hidden text field we constructed earlier holding the regular expression which matches numbers.

REGEXMATCH normally accepts two parameters (and a few optional parameters we won’t discuss here). The first one is a text string to match against a regular expression, given as the second parameter. It returns TRUE if the regular expression matches the given text string and FALSE otherwise.

Our new formula engine ensures that REGEXMATCH can also work with arrays. If the first parameter is an array — which it is, in this case — the result is also an array. Specificially, the result is an array of logical values, with the same size as the original text array. If { TRUE, FALSE, TRUE }{ TRUE; FALSE; TRUE } is returned, that means that the array given to REGEXMATCH contains three elements, and that the first and third elements match the regular expression given as the second parameter, while the second element does not.

So how do we go from a logical array, where each element signifies whether a single line can be interpreted as a number, to a single logical value, where TRUE means that all elements can be interpreted as numbers and FALSE indicates that there is at least one element that does not qualify?

Enter AND. This formula returns FALSE, because there is at least one FALSE element:

{ TRUE, TRUE, FALSE, TRUE, TRUE }{ TRUE; TRUE; FALSE; TRUE; TRUE }

AND only returns TRUE if all elements given to it are TRUE. This formula returns TRUE:

{ TRUE, TRUE, TRUE, TRUE, TRUE }{ TRUE; TRUE; TRUE; TRUE; TRUE }

Putting it all together, this is the formula associated with the value of the LookupValuesAreNumbers hidden field:

AND(REGEXMATCH(TEXTSPLIT(LookupValues, NEWLINE()), NumberRegex))AND(REGEXMATCH(TEXTSPLIT(LookupValues; NEWLINE()); NumberRegex))

The formula for the ResultValuesAreNumbers hidden field is similar.

Generating the XLOOKUP formula

The GeneratedXLOOKUPFormula field contains the generated formula. The formula of the value of this field uses the & operator heavily to join text strings together. It makes use of the hidden field ParameterSeparatorText to ensure that it inserts the proper parameter separator (a comma if a decimal point is used and a semicolon otherwise).

There are two final hidden fields we need in order to make the formula for the value of the GeneratedXLOOKUPFormula field more readable. They are named LookupQuoteIfNeeded and ResultQuoteIfNeeded.

The idea is that these hidden fields should contain a single quotation mark (“), but only if the corresponding text field does not exclusively contain lines that can be interpreted as numbers. These hidden text fields can then be referenced from the final formula as an easy means of inserting quotation marks, but only if they are needed.

Here’s the formula for the value property of the LookupQuoteIfNeeded field:

IF(!LookupValuesAreNumbers, """")IF(!LookupValuesAreNumbers; """")

The second parameter to IF may look a bit odd, but that’s actually how a single quotation mark is written in Calcapp (and in spreadsheets). Of the four quotation marks, the first and the last demarcate the text string. To write a single quotation mark, two quotation marks must be written, because a single one would end the text string prematurely.

Before we look at the final formula, let’s look at the expected results from the app. Let’s say that we have a very simple table, consisting of only two rows and two columns. Here it is:

Name Age
Louise 53
Ahmed 28

The user is expected to paste the contents of the first column into the LookupValues field, and the contents of the second column into the ResultValues field. We then want the app to generate this formula if the preferred decimal separator is a decimal point:

XLOOKUP(SoughtValue, { "Louise", "Ahmed" }, { 53, 28 })XLOOKUP(SoughtValue, { "Louise", "Ahmed" }, { 53, 28 })

We want the app to generate this formula if the preferred decimal separator is a decimal comma:

XLOOKUP(SoughtValue; { "Louise"; "Ahmed" }, { 53; 28 })XLOOKUP(SoughtValue; { "Louise"; "Ahmed" }, { 53; 28 })

Before we move on to the final formula that generates the formulas above, let’s consider what it takes to generate just the arrays to XLOOKUP. We have already concluded that we can convert the value of the LookupValues text field (complete with line breaks) to an array of its constituent lines with this formula:

TEXTSPLIT(LookupValues, NEWLINE())TEXTSPLIT(LookupValues; NEWLINE())

How do we put it back together to form a text string, once more, this time with different delimiters? Remember that the delimiter used above is a line break. This time around, we want the delimiter to be a parameter separator character (a comma or a semicolon), followed by a space for readability.

To achieve this, we use the TEXTJOIN function. This formula does exactly what is described in the preceding paragraph:

TEXTJOIN(ParameterSeparatorText & " ", TRUE, TEXTSPLIT(LookupValues, NEWLINE()))TEXTJOIN(ParameterSeparatorText & " "; TRUE; TEXTSPLIT(LookupValues; NEWLINE()))

(The second parameter, TRUE, instructs TEXTJOIN to ignore blank values.)

Now, what if the lookup values consist of text strings (meaning that the value of the LookupValuesAreNumbers hidden field is FALSE)? Then we need to use not only , or ; as delimiters, but " , " or " ; " (so that the elements of the array end with and start with a quotation mark).

This formula does just that:

TEXTJOIN(LookupQuoteIfNeeded & ParameterSeparatorText & " " & LookupQuoteIfNeeded, TRUE, TEXTSPLIT(LookupValues, NEWLINE()))TEXTJOIN(LookupQuoteIfNeeded & ParameterSeparatorText & " " & LookupQuoteIfNeeded; TRUE; TEXTSPLIT(LookupValues; NEWLINE()))

To truly generate an array, though, we also need the begin and end braces, { and }. Also, we potentially need to add quotation marks to the beginning of the first element and to the end of the last element.

Here’s the formula for a hidden text field named GeneratedLookupArray:

"{ " & LookupQuoteIfNeeded & TEXTJOIN(LookupQuoteIfNeeded & ParameterSeparatorText & " " & LookupQuoteIfNeeded , TRUE, TEXTSPLIT(LookupValues, NEWLINE())) & LookupQuoteIfNeeded & " }""{ " & LookupQuoteIfNeeded & TEXTJOIN(LookupQuoteIfNeeded & ParameterSeparatorText & " " & LookupQuoteIfNeeded ; TRUE; TEXTSPLIT(LookupValues; NEWLINE())) & LookupQuoteIfNeeded & " }"

There is a similar hidden text field named GeneratedResultArray.

Here’s the complete formula, combining the values of GeneratedLookupArray and GeneratedResultArray:

"XLOOKUP(" & SoughtValue & ParameterSeparatorText & " " & GeneratedLookupArray & ", " & GeneratedResultArray & ")""XLOOKUP(" & SoughtValue & ParameterSeparatorText & " " & GeneratedLookupArray & ", " & GeneratedResultArray & ")"

One last thing

At first blush, the complete formula above appears to work very well. It works for plain numbers and it works for text strings.

However, what about numbers that are preceded by currency symbols? The NumberRegex field treats them as numbers, which leads this app to generate an XLOOKUP formula such as this one:

XLOOKUP(SoughtValue, { "Louise", "Ahmed" }, { $53, $28 })XLOOKUP(SoughtValue; { "Louise"; "Ahmed" }; { $53; $28 })

That’s not a valid Calcapp formula — numbers can’t be preceded by currency symbols.

To fix this problem, we need to take steps to remove the currency symbols. To do so, we’ll modify the TEXTSPLIT parts of the formulas for the values of GeneratedLookupArray and GeneratedResultArray. Here is part of the formula for the value of GeneratedResultArray:

TEXTSPLIT(ResultValues, NEWLINE())TEXTSPLIT(ResultValues; NEWLINE())

(We’ll show this exclusively for result values, the formula for the value of GeneratedLookupArray also needs to be modified.)

First off, if the result values are not numbers, the formula above is exactly what we need — no currency symbols should be removed. We’ll achieve this with IF:

IF(ResultValuesAreNumbers, …, TEXTSPLIT(ResultValues, NEWLINE()))IF(ResultValuesAreNumbers; …, TEXTSPLIT(ResultValues; NEWLINE()))

We’ll need to replace in the formula above with a formula fragment that removes currency symbols.

To achieve this, we’ll rely on another regular expression:

([\P{Sc}]*)\p{Sc}?(.*)

\P{Sc} means “not a currency symbol” and \p{Sc} means “a currency symbol.” What this regular expression does is that it divides a text string into two halves (captured by two capturing groups), one consisting of the text that appears before a currency symbol and one consisting of the text that appears after. The currency symbol itself appears between the two halves and is not part of either capturing group.

(Capturing groups appear between parentheses in regular expressions and capture text, in this case the two halves mentioned above.)

The REGEXEXTRACT function can be used to extract the text captured by a capturing group. To use it, we need to provide the number of the capturing group as the third parameter (1 or 2, in this case, as there are two capturing groups).

Using REGEXEXTRACT would work — but would not be ideal, as we’d then need to repeat the REGEXEXTRACT invocation (once with 1 as the third parameter and once with 2 as the second parameter) and merge together the results using the & operator.

The new REGEXEXTRACTALL function is different from REGEXEXTRACT in that it returns the contents of all capturing groups as an array. That’s perfect, as we can then merge together the two halves (without the currency symbol) not with &, but with TEXTJOIN, with a delimiter set to """".

Here’s the formula with the part filled in:

IF(ResultValuesAreNumbers, TEXTJOIN("", FALSE, REGEXEXTRACTALL(TEXTSPLIT(ResultValues, NEWLINE()), "([\P{Sc}])\p{Sc}?(.)")), TEXTSPLIT(ResultValues, NEWLINE()))IF(ResultValuesAreNumbers; TEXTJOIN(""; FALSE; REGEXEXTRACTALL(TEXTSPLIT(ResultValues; NEWLINE()); "([\P{Sc}])\p{Sc}?(.)")); TEXTSPLIT(ResultValues; NEWLINE()))

Here’s the complete formula for GeneratedResultArray:

"{ " & ResultQuoteIfNeeded & TEXTJOIN(ResultQuoteIfNeeded & ParameterSeparatorText & " " & ResultQuoteIfNeeded, TRUE, IF(ResultValuesAreNumbers, TEXTJOIN("", FALSE, REGEXEXTRACTALL(TEXTSPLIT(ResultValues, NEWLINE()), "([\P{Sc}])\p{Sc}?(.)")), TEXTSPLIT(ResultValues, NEWLINE()))) & ResultQuoteIfNeeded & " }""{ " & ResultQuoteIfNeeded & TEXTJOIN(ResultQuoteIfNeeded & ParameterSeparatorText & " " & ResultQuoteIfNeeded; TRUE; IF(ResultValuesAreNumbers; TEXTJOIN(""; FALSE; REGEXEXTRACTALL(TEXTSPLIT(ResultValues; NEWLINE()); "([\P{Sc}])\p{Sc}?(.)")); TEXTSPLIT(ResultValues; NEWLINE()))) & ResultQuoteIfNeeded & " }"
« Backward compatibility and our new formula engine Tip: Avoid gotchas when adding text boxes and buttons in the middle of a range »