Conceptually, a computer program is an algorithm: a set of rules, logic, and/or procedures in order
to accomplish a task. In order for the program to run, it requires a language that translates these
rules, logic, and/or procedures in a way that the program understands. Generally speaking, programs accept inputs
and translate them into outputs.
Considered from end-to-end, data analytics requires a broad range of programs in order to extract and deliver insights from data:
R utilizes procedural languages C and FORTRAN for computation as well as the R language. It is a widely-used and flexible statistical computing software with some generalized programming ability. It was developed in the 90s. As an open-source platform, users can submit packages that include functions, tools, and statistical procedures. There are currently thousands of packages. Most uses of R require installation and loading of several packages.
Unfortunately, I know roughly nothing about the Python language, but it is gaining significantly more popularity. It is similar to R in that it is open source with individual/joint contributors. In contrast to R, it is a more generalized programming platform. As with most programming languages, there are a vast number of resources on the web to get you started if you're interested.
SAS has been the workhorse of statistics for decades. Most dates in programs have numeric representation of dates in order to facilitate adding and subtracting. January 1, 1960 represents the number 0 in SAS, which is around the time the software began to be developed. Excel users will likely recognize that Microsoft uses December 31, 1899 as the zero point such that January 1, 1900 = 1. SAS is purely a statistical computing software that consists of two primary steps: DATA and PROC. DATA steps can create and manipulate data whereas PROC steps perform an action on the data. The SAS language is primarily procedure-based, but there is some flexibility through user-defined functions and macros.
For analytics in the private sector, SQL is almost universally a requirement for data retrieval, manipulation, cleaning, and segmentation. Corporations tend to rely heavily on Relational Databases, for which the SQL language was designed. SQL is generally unlike the three programs listed above. By itself, SQL (Structured Query Language) follows the declarative paradigm: the query tells the program what it requires from a database and how to structure the results. However, the structure of the results are not customizable beyond the SQL language. Importantly, however, there are several procedural extensions to SQL, with PL/SQL and T-SQL being the most popular. These extensions bring procedural elements to SQL, such as logic and loops. SQL is almost exclusively used with a procedural extension, which allows the user to customize the output through data definitions, etc.
Visual Basic (VB), the underlying language of VBA, is becoming increasingly obsolete outside of a handful of cases. In fact, VB is no longer supported by Microsoft. However, its implementation in Visual Basic for Applications (VBA) makes it a nearly ubiquitous feature of every corporation. Because of the heavy reliance on Microsoft applications, particularly Excel, VBA will continue to be relevant for computer programmers and analysts. VBA uses a host application (such as Excel) and allows the user to control the host application and its objects. VBA utilizes concepts from several programming language paradigms, including procedural, object-oriented, and event-driven. Unlike procedural and object-oriented programming paradigms, execution of an event-driven program can be influenced by an end user of the program. For example, the user selecting "yes" to a prompt delivered by the program can alter the course of the program. As anyone who has used a macro knows, the end user frequently does not need to know anything about VBA in order to use the developed program (referred to as a macro).
Plenty of other statistical computing and programming softwares are available to analysts, each with their merits and limitations. Some are better equipped for large-scale implementations of analytical solutions; others are better equipped for day-to-day, smaller-scale analytical tasks. Many are open source because the coding/analytics community has thankfully championed democratization. Some of the softwares I've encountered are:
Despite different types of programming, they involve similar concepts that are shared across languages. For example, object-oriented and non-object-oriented programming use variables and procedures such as functions. The syntax varies from language to language, but the concepts are shared. The concepts are:
Variables allow the program to be dynamic. For example, a program can assign a numeric variable an approximately infinite number of possible values. In many programs, objects can also be treated as variables. For example, in VBA, you can assign a specific worksheet to a variable (i.e., a worksheet object).
For users of Excel, functions are a familiar concept. They take arguments (sometimes called parameters), which are then passed through the function's algorithm(s). For example, in Excel, the SUM function computes the sum of numbers. =SUM(1, 2, 3) provides the sum of the numbers 1, 2, and 3, which is 6. In programming, it is common to create functions in order to automate a task that would otherwise require multiple lines of code.
Logic is necessary for almost every program. It tells the program what to do if a certain condition is met (or is not met). This usually takes the form of an If...Then... statement and is one of the fundamental components of the procedural paradigm.
Loops are utilized across programming languages in order to perform many iterations of the same task. For example, if I wanted to color every cell in an Excel worksheet based on its value, I'd need to loop through each cell within that Worksheet. Even with thousands of cells, a loop condenses the procedure into only a handful of lines of code.
It is very good practice to format code consistently and with commonly used standards. Never underestimate the power of whitespace, indentation, and comments for making your code more readable. Remember, code readability is not only for posterity's sake, but for yours as well.
Even the best programmer will encounter errors during development. Knowing how to work through the error is known as "debugging". The program usually has built-in error handling to notify the programmer of the type of error and the location. Errors can sometimes lead to an hour of head-scratching before the solution is found. Therefore, understanding basic errors and what to look for when they occur are important measures to maintain sanity while coding.
Variables are a requirement of any program. They usually are of a specific type or "class," ranging from integers
to variables of a specific type of object within the language.
Variables in programming offer flexibility. Imagine you are trying to program a calculator-type program that adds two numbers.
The user inputs these two numbers, and then the program displays the answer. Without variables, you cannot program all
possible inputs and outputs:
1+1 = 2
1+1.1 = 2.1
1+1.11 = 2.11
....
However, with variables, the task becomes much simpler:
x+y = answer
In the first example, the program would have to search for the user inputs in a field of already calculated outputs.
In the second example, the program calculates the output (answer) based on the user inputs (x and y).
Numeric variables range from integers (no decimal precision) to float numeric variables, which replicate approximations of the real
number line. Integers are counting numbers (1, 2, 3, ...) and can be negative. "Float" is shorthand for the fact that a
decimal place can occur anywhere in a string of numbers (i.e., it can "float"). In other words, float numeric variables have decimal
places. Floats can technically be integers (1 is an integer and a float) but integers cannot be a float (1.1 is not an integer,
but it is a float). All programming languages will at least have these two numeric data types. For example,
R uses decimal values by default. Even if variable x is assigned an integer, it is stored as a number with decimal precision. However, x can
be coerced to an integer type.
Beyond the basic integer and float, some programming languages offer variations of them
that have different memory requirements and levels of precision. In T-SQL for example, the variable type bigint
(short-hand
for "big integer") stores integers from -2^63 to 2^63-1 and requires 8 bytes of memory. The variable type int
, on
the other hand, requires 4 bytes and ranges from -2^31 to 2^31-1. tinyint
ranges from 0 to 255, which is equivalent to 1 byte. For those familiar
with binary, 8 bits equals 1 byte, and 8-bit encryption allows 256 possible values. For float variables, two types are common: single- and double-precision
floating-point. Single requires 4 bytes of memory while double requires 8 bytes. Some programs allow precision up to 16 bytes (128 bits).
You do not need to know bits and bytes in order to program effectively. However, in general, you need to know which types require more
memory and why you might need them (i.e., for more decimal precision). It is also important to understand the types and the ultimate
precision of your results. Consider the examples below from R and T-SQL. We're simply trying to divide 1/100, which represents 0.01, or 1%.
In R, the results are as we expect:
> 1/100
[1] 0.01
In T-SQL, they are not. The query below returns the result 0.
select 1/100
Why? R's default numeric variable type is a float. Therefore, the result can also be a float. T-SQL, on the other hand,
will "guess" your variable type. In the example provided, It encounters a 1 and a 100, both of which it reads as an
integer. It then returns the result as an integer, which has no decimal precision. So, you get 0. In order to get the
expected result, you need to coerce at least one of the numbers into a float. The code below returns the expected
result by coercing 1 to a float through the built-in cast
function.
select cast(1 as float)/100
Strings are usually used for non-numeric values, but virtually every character and symbol (and number) can be represented in a string. Strings are sometimes referred to as character(s). The following are all examples of strings. I use the double quotation mark (") in order to denote the start and stop of a string. The use of quotation marks to designate strings is common to almost every programming language.
"string"
"this is a string"
"stRing 123"
"string_123"
"123string"
"123"
Note in the examples above that there are generally no rules on strings: Anything can be a string.
For data purposes, strings are common to use for categorical data. For example, if a car dealership's inventory were stored
as data, one could imagine fields for vehicle characteristics such as color, make, and model. In these instances, these fields
would be programmed as string variables, as shown in the data set below:
Model Color
[1,] "Honda" "Blue"
[2,] "Honda" "Red"
[3,] "Toyota" "Black"
In this example, both of these fields are strings, as denoted by the double quotation marks. However, not all
programming languages will display quotation marks in output for string variables. This example was created in R,
which always displays quotation marks in output to designate strings. T-SQL and SAS output, on the other hand,
will not display the quotation marks.
> class("15")
[1] "character"
> class(15)
[1] "numeric"
> "15"+15
Error in "15" + 15 : non-numeric argument to binary operator
The first lines of code tell us that "15" is a character value and that 15 is a numeric value. The difference between
a string and a numeric in this instance is simply the use of the quotation marks. This is true of all programming
languages I've worked with. When these two variables are summed in R, an
error is returned letting us know that we're attempting to perform a mathematical operation on a non-numeric value,
which the program cannot evaluate.
select '1' + 1
This code returns the value 2 without errors or warnings. Also, note that a string in T-SQL is represented through a
single quotation mark (').
> "hello"=="Hello"
[1] FALSE
Therefore, in R, if you search the data
for "hello" and it is coded as "Hello", the search will not return any results. In other cases, like T-SQL and Excel VBA,
you can search the data for "hello" and it will return any results that match it regardless of case. This disregard for
case sensitivity is useful because you can find records that meet a criteria regardless of case.
If working with raw data in R, tidying the cases on string variables may be required where it is not necessarily required in a SQL
query. Of course, almost all raw data needs tidying to some degree, but it is important to know whether the programming
languages accounts for case with string variables. It is also important to know whether in the original data case did,
in fact, matter. Otherwise, your analysis may incorrectly assume equivalence between "b" and "B", for example, which
may not have been the intention of the person or system who created the data.
Boolean variables are used for programming logic. They have two possible values: True and False. Most programs involve checking conditions and performing tasks (or not performing tasks) based on whether one or more conditions are met. For example, imagine a Bank charges two different interest rates depending on a borrower's credit score. For a high credit customer with a credit score of at least 700, the bank will offer a more competitive rate of 3.5%. For those who do not meet that criterion, the rate is 4.0%. Imagine a system is designed to automatically assign the rate when the application is processed. This logic would involve:
Evaluating conditions and utilizing boolean variables are a critical component to programming more complex tasks and to answering more complex analytical questions.
Given the flexibility required to accomplish tasks with a programming language, each language will likely have a
wide variety of additional variable types.
One somewhat common variable type is a "catch all" variable type. For example, in the Excel VBA language,
variant
variable types allow the variables to work with different types of values. During the course of
a program execution, a variant variable type may at one point be a numeric, at another a string, and at yet another
a Worksheet object. In general, it is advisable that when the variable type is known in advance to use the
explicit variable type rather than a catch-all variable type such as variant.
type in the Excel VBA language
The most common other variable type is an object variable. Precisely what kinds of objects are available depends on the
programming language. For example, in R, I could have a variable that represents a vector or one that represents the output
of a linear regression (referred to as a lm
object). In Excel VBA, I could have a variable that represents
a cell, a range, a worksheet, or a workbook. As mentioned above, these object variables require the most specialized
programming-specific knowledge as they are usually only relevant to one language.
Functions are self-contained programs that translate inputs into outputs. Even with no programming experience,
you've probably already encountered functions. If I want to know the square root of 1,500 on a calculator,
I'll use the √x
button to return the answer. Pressing this button tells the calculator to perform
the mathematical operation x^(1/2)
(which itself is then a type of function). This conversion process
is a function.
The most common example of a function and how one might use it in a program language is through the Excel built-in
functions. Within a cell, I type the following: =SUM(2, 3, 4)
. This uses the SUM()
function
in order to sum a series of arguments, which in this example are the numbers 2, 3, 4. Evaluating the function will
return the result 9 in the cell. Excel has many, many more built-in functions, which you can see
here.
As the universe of possible Excel functions makes clear, built-in functions typically fall into different classes. For example,
there are string functions that parse or combine substrings (a string within a string). There are others that perform
statistical operations, such as returning a random sample from a normal distribution. And there are others yet that work with
dates, which is actually a very common programming requirement.
Outside of what a function is used for, there are two different types of functions: built-in and user-defined. The examples above
are all built-in functions. These functions are "built in" to the programming language and generally do not require
the programmer to do anything in order to use them. Therefore, when I pull up an instance of R, without loading any package,
I can type the following and return a result.
> max(1, 2, 3)
[1] 3
In this example, we can see that the max
function returns the maximum of the inputs, which is 3. Maximum functions are
built-in functions for most programming languages.
User-defined functions are functions built by the programmer. Many analytical and programming tasks require manipulation of
data under a precise set of rules of formulas. Consider the following example. Say that a rental application approval requires three
criteria to be met: (1) credit score > 600, (2) no history of bankruptcies, and (3) no felony charges. A simple function
can be created to automate the decisions. You could design a function to which you pass an applicant's name, and then the function evaluates
these three criteria for the applicant. If they are all true, the program returns "Approved", but if one is not true, the program returns "Denied."
Functions are frequently required for more complex analytical/programming tasks that require automation.
In many cases, logic is a necessary component of any program. In fact, in more ways than one, logic is the essence
of programming. Whether we realize it or not, our interactions with technology always involve a behind-the-scenes logic.
Through sequences of actions, we interact with programs in order to fulfill wants and needs. We can ask our Google Assistant
or Alexa what the weather will be like tomorrow, we can change the channel on the television, or we can order a book on
Amazon. All of these actions involve programming logic.
The foundation of logic is the If...Then...Else
. Logic involves checking a condition, such as "Is it raining outside?"
Depending on whether evaluation of the conidition is True or False, then we execute a decision. For example, if it is raining
outside, you should take your umbrella with you. If it is not, you should not take your umbrella. Logic is integrated into
programming in the same way that it is hardwired into our decision-making processes. In many ways, the human brain is a
metaphor for programming: Programs are artificial intelligence.
Extending the If...Then...Else
example above, logic in programming can be summarized with the following
psuedo-code:
If {condition} Then {decision} Else {decision}
Importantly, a {decision}
node may involve the decision to "do nothing," i.e., to take no action.
If {light="red"} Then {press the brake} Else If {light="green"} Then {proceed} Else {be careful}
The code above can be parsed as follows: If the light is red, then you will press the brake. The Else If
then provides directions
if the light is not red, which implies it is either yellow or green. Therefore, if the light is not red and if the light is
green, then you will proceed. The final Else
means that the light is not red or green, which implies it is yellow. In this case,
a driver has to "be careful": the driver has to make a quick judgment of the relative risk of stopping or proceeding.
This is difficult to program in a systematic way. However, given the advent of autonomous (driver-less) vehicles, this difficult
decision-making process is being actively translated into programming statements. And, as you can imagine, this is why
programming and analytics always involve ethical decisions.
If...Then...Else
logic.
Frequently more than one condition needs to be evaluated in order to make a decision. Before deciding whether you need to
take an umbrella with you, you may check whether it is currently raining OR whether rain is forecast in the future.
Only one of many conditions may need to be true in this instance in order to make the decision "take my umbrella".
This is an example of a compound condition.
There are two common keywords used with compound conditions: AND
and OR
. Some languages use
symbols for these keywords. For example, R uses &
in lieu of "AND". Nontheless, the uses of
AND
and OR
are equivalent across programming languages.
For those familiar with Venn Diagrams or probability, AND
represents an intersection
of two conditions. This means that both conditions joined by the AND
must be satisfied. You can also
join an infinite number of conditions with AND
.
{condition 1} AND {condition 2} AND {condition 3} ...
OR
, on the other hand, represents a union
of two conditions. This means that at least one condition joined by the OR
must be satisfied.
As with AND
, you can join an infinite number of conditions with OR
.
{condition 1} OR {condition 2} OR {condition 3} ...
Finally, you can join AND
and OR
in as many ways as required, but in these cases parentheses
(or some form of bracketing) is recommended.
{condition 1} AND ( {condition 2} OR {condition 3} )
In this example, condition 1 must be satisfied and either condition 2 or 3 must be satisfied.
AND
and OR
and the universe of possible results:
Condition X | Condition Y | Compound Condition | Result |
---|---|---|---|
T | T | X AND Y | T |
T | F | X AND Y | F |
F | T | X AND Y | F |
F | F | X AND Y | F |
T | T | X OR Y | T |
T | F | X OR Y | T |
F | T | X OR Y | T |
F | F | X OR Y | F |
Operators are symbols and keywords utilized to perform various operations in programming languages. There are
a wide array of operators for any given program and they are equally as likely to vary across programming languages
as they are to be the same. The tables below list the most common operators as well as their symbols/keywords
across the R, T-SQL, VBA, and SAS programming languages. Much like mathematical expressions, which adhere to the
order of operations, there is operational precedence for all types of operators. For example, in SAS, AND
operators are processed before OR
operators.
Operation | R | T-SQL | VBA | SAS |
---|---|---|---|---|
Addition | + | + | + | + |
Subtraction | - | - | - | - |
Mulitplication | * | * | * | * |
Division | / | / | / | / |
Exponentiation | ^ | ^ | ^ | ** |
Natural Log | log | log | log† | log |
†It is log in VBA but ln in Excel.
Operation | R | T-SQL | VBA | SAS |
---|---|---|---|---|
Less Than | < | < | < | < or LT |
Less Than or Equal To | <= | <= | <= | <= or LE |
Greater Than | > | > | > | > or GT |
Greater Than or Equal To/td> | >= | >= | >= | >= or GE |
Equal To | == | = | = | = or EQ |
Not Equal To | != | <> | <> | NE or ~= or ^= |
Operation | R | T-SQL | VBA | SAS |
---|---|---|---|---|
AND | & | AND | AND | & or AND |
OR | | | OR | OR | ! or | or OR |
NOT | ! | NOT | NOT | ^ or ~ or NOT |
When working with strings in programming, it is common to require concatenation to join them together. Concatenation simply refers to combining strings into a larger string, single string. The examples below show two or three strings and the result of concatenation. Double quotation marks are used by convention to represent strings.
String A | String B | String C | Concatenated String |
---|---|---|---|
"Hello" | "123" | "Hello123" | |
"Hello" | " " | "Matt" | "Hello Matt" |
"2018" | "" | "A" | "2018A" | "2018" | "02" | "01" | "20180201" |
From the examples above, there are a couple key points:
Concatenation is accomplished in very different ways across programming languages. The table below lists the programming language, a description of how to concatenate, and an example of concatenation in the language to join the strings "Hello", " ", and "Matt". As with most things in programming, there are in some cases alternative methods to concatenate strings. The methods below, however, will always accomplish the task.
Language | How to Concatenate | Concatenation Example |
---|---|---|
R | Concatenation is
performed through the concatenation function,
paste() . The sep=""
argument tells the function to not include a delimiter. |
paste("Hello", " ", "Matt", sep="") |
This example uses the default sep=" "
argument. The result is the same.
Note that in R, omission of an argument with a
default returns the default. |
paste("Hello", "Matt") |
|
T-SQL | Concatenation is achieved through the "+" symbol. Note that this operator is the same as the addition operator. | 'Hello' + ' ' + 'Matt' |
VBA | Concatenation is achieved through the "&" symbol. | "Hello" & " " & "Matt" |
SAS | Concatenation is achieved through the "||" symbols (double bars). Finally, SAS also has a number of concatenation functions. | 'Hello' || ' ' || 'Matt' |
Loops are not necessarily a feature of every program, but they are certainly a common feature of programming. As an analyst,
certain tasks will invevitably require a loop. A loop is simply a way to perform an iterative task repeatedly
(i.e., in a loop). In my experience, loops have the most concrete examples in Excel VBA. Imagine you have a
50 row x 50 column block of cells in a spreadsheet (i.e., 2,500 cells). For every cell, you want to check if the
value exceeds a pre-defined threshold and perform conditional formatting on the cell for when it does. You also
want to count how many cells in the 2,500 range satisfy this criteron.
As with nearly every programming task, there is more than one way to accomplish the task, some of which actually
do not require a loop. However, once you become familiar with the logic of loops and how they are programmed in a
particular language, you will default to a loop in order to solve tasks rather than what are usually more
complicated alternatives.
In the example of a 50 x 50 Excel range, one can easily establish the following logic to solve the task:
{Check if cell > threshold}
{Add 1 to a running total if cell > threshold}
{Move to next cell}
In most programming languages, the {Move to next cell}
portion wraps
the code that is meant to be repeated, either through a specific syntax or symbols. In programming,
wrapping usually refers to placing a block of code within a pre-defined structure
(referred to as a wrapper). For example, a wrapper function would wrap another function within it
(in this instance, it makes the original function more customizable.
A loop usually takes some variation on the pseudo-code below.
For [every cell in the 50 x 50 range]
{Check if cell > threshold}
{Add 1 to a running total if cell > threshold}
{continue until all cells are looped}
In the example above, the 2,500-line code that would explicitly perform the task can be condensed into 4 lines with the use of a loop. (In Excel VBA, it would also be written in 4 lines). All programming languages have loops available, but they usually involve different symbols and syntax. Regardless, loops are a simple and efficient way to accomplish iterative tasks.
Formatting code does not impact how the code is processed, but it does have a considerable impact on the
"governance" of the code, including its readability by other programmers/users as well as
validation that the code is performing as intended. Most importantly, formatting your code helps
you work within your own code. I consider myself a notoriously unorganized coder, and I always marvel
at organized code because it is highly readable.
While there are not necessarily any "rules" for formatting code, there tend to be some standards:
mtx
may identify a matrix variable in R, and dblTemp
may refer to a temporary
numeric variable that has double-point precision.
# R
# A comment in R is the number/pound (#) symbol.
x = 1 #comments can be inserted in-line
-----
' VBA
' A comment in Excel VBA is the single apostrophe (') symbol.
x = 1 'comments can be inserted in-line
-----
-- T-SQL
-- A comment in Excel T-SQL is two dashes (--).
select 1+1 --comments can be inserted in-line
-----
* SAS;
* There are two types of comments in SAS
The first is an asterisk (*) followed by a semi-colon (;);
/*The second begins with a foward slash (/) and asterisk (*)
and ends with an asterisk and foward slash /*
data temp /* only the second type can be used in-line */;
data temp; * in-line after the statement is recommended;
run; /* this way, errors are less likely to occur */
Indentation of code allows "blocks" of text that span multiple lines to be visually set apart from the remainder of the code. These indented blocks of code are usually used for code that executes loops or other statements that require several statements. An example of a loop in the R language is shown below with indendation.
for (i in 1:10) {
x[i] = i*2
y[i] = i*3+2
}
An example of an If block in Excel VBA is shown below.
If rng > 0 Then
x = rng*2
y = rng*3+2
Else
x = rng
y=rng*2+2
End If
Coders always encounter programming errors. Without understanding that errors are a part of programming,
a first-time coder can easily find them frustrating if not downright insurmountable. "Debugging" errors
is one of the key components of becoming a coder.
First, it is important to understand why errors occur in the first place. There are three reasons for errors:
Combinations of (1) and (3) or (2) and (3) will almost always result in extended debugging processes. The best
method when dealing with something completely unfamiliar is to begin with error-free code (an example from the
internet or class). Then, modify piece-by-piece while handling each error as it occurs. If trying to run a code chunk
with many errors, you will encounter more than one type of error through the debugging process. For a beginning
coder, this may be the point at which you throw your hands in the air and say there is no use.
For the veteran coder, a change in the error after a code change is new, and usually useful, information. Even then,
the veteran coder may curse under his or her breath.
The point of all this rambling is:
I have seen beginning coders never produce an error-free script. This is partly due to unfamiliarity, but it is also due to their attitude toward errors. Write the entire program and then run the code to see what happens is never a good route. Break the task/program into bite-sized chunks that you tackle individually and sequentially. Code downstream from an error will almost always produce more errors:
> x <- "1"
> z <- x*2
Error in x * 2 : non-numeric argument to binary operator
> print(z)
Error in print(z) : object 'z' not found
In the R example above, x is stored as the value 1, but it is a string. Therefore, when multiplied by 2, you get an error based
on incompatible types. The third error only results from the fact that the second statement failed (i.e., nothing was
assigned to the variable z
. Otherwise, there is no error:
> x <- 1
> z <- x*2
> print(z)
[1] 2
Without accounting for the first error, you cannot account for subsequent errors. Therefore, you must always
debug errors in the order in which they occur.
The advice above is the most important part of understanding the debugging process, together with: (1) have patience
and (2) Google often. Even though I am (mostly) patient and a proficient Googler, I have had to spend hours at times on
the most elusive bugs. This is the nature of programming.
Finally, there are types of errors that can help you in the debugging process. I think of them in 3 groups, shown
below. There are others out there, but these are the main culprits.
Here's a syntax error in R because the as.character
function has an improperly capitalized "c":
> as.Character(1)
Error in as.Character(1) : could not find function "as.Character"
Here's a syntax error in Excel VBA:
Note that the error message received in Excel VBA is somewhat more straightforward than the error message in R. Also, Excel VBA turns the text to the color red when there is a syntax error.
x*1.5
where x
is your gross business. Without any other errors, the code would execute and spit out an attrition assumption.
However, the attrition would be 1.5 times your gross business, which would imply more lost business than you
have. In certain applications, this error may be easy to spot, but in other applications or with more subtle errors,
it may go undetected without thorough code testing. Therefore, the best way to prevent logic errors is to test your
code and test it often. I've also found a second set of eyes (such as my manager) helpful as well. Don't accept
a counter-intuitive answer without making sure there aren't any obvious logic errors.