Working with Scalars
Scalar data are the most basic kind of data in Perl. Each scalar datum is logically a single entity. Scalars can be strings of characters or numbers. In Perl, you write literal scalar strings inside quotes and literal numbers without them.
For example, the strings "foobar" and 'baz' are scalar data. The numbers 3, 3.5 and -1 are also scalar data.
Strings are always enclosed in some sort of quoting, the most common of which are single quotes, '', and double quotes, "". We’ll talk later about how these differ, but for now, you can keep in mind that any string of characters inside either type of quotes is scalar string data.
Numbers are always written without quotes. Any numeric sequence without quotes is scalar number data.
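As a minimal sketch of the two kinds of scalar literals just described, the following program stores a few of them and prints them. It assumes three things that this chapter only introduces later: scalar variables (the names are purely illustrative), the print function, and the "\n" newline sequence of double-quoted strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# String literals are written inside quotes.
my $double = "foobar";
my $single = 'baz';

# Numeric literals are written without quotes.
my $integer  = 3;
my $decimal  = 3.5;
my $negative = -1;

# Print each scalar on its own line.
print $double,   "\n";
print $single,   "\n";
print $integer,  "\n";
print $decimal,  "\n";
print $negative, "\n";
```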
In this chapter, we will take a look at the variety of scalar data available in Perl, the way to store them in variables, how to operate on them, and how to output them.
Strings
Any sequence of ASCII characters put together as one unit is a string. So, the word "the" is a string. This sentence is a string. Even this entire paragraph is a string. In fact, you could consider the text of this entire book as one string.
Strings can be of any length and can contain any characters, numbers, punctuation, special characters (like `!’, `#’, and `%’), and even characters in natural languages besides English. In addition, a string can contain special ASCII formatting characters like newline, tab, and the “bell” character. We will discuss special characters more later on. For now, we will begin our consideration of strings by considering how to insert literal strings into a Perl program.
A “string literal” is a string that you type directly into a Perl program. The word literal here refers to the fact that the string appears in the program text itself. This can be contrasted with storing a string in a variable.
Any string literal can be used as an expression. We will find this useful when we want to store string literals in variables. However, for now, we will simply consider the different types of string literals that one can make in Perl. Later, we will learn how to assign these string literals to variables (see section Scalar Variables).
Single-quoted Strings
String literals can be represented in primarily three ways in Perl. The first way is in single quotes. Single quotes can be used to make sure that nearly all special characters that might be interpreted differently are taken at “face value”. If that concept is confusing to you, just think about single quoted strings as being, for the most part, “what you see is what you get”. Consider the following single-quoted string:
'i\o'; # The string 'i\o'
This represents a string consisting of the character `i’, followed by `\’, followed by `o’. However, it is probably easier just to think of the string as i\o. Some other languages require you to think of strings not as single chunks of data, but as some aggregation of a set of characters. Perl does not work this way. A string is a simple, single unit that can be as long as you would like.(2)
Note in our example above that 'i\o' is an expression. Like all expressions, it evaluates to something. In this case, it evaluates to the string value, i\o. Note that we made the expression 'i\o' into a statement by putting a semicolon at the end ('i\o';). This particular statement does not actually perform any action in Perl, but it is still a valid Perl statement nonetheless.
Special Characters in Single-quoted Strings
There are two characters in single-quoted strings that do not always represent themselves. This is due to necessity, since single-quoted strings start and end with the `'’ character. We need a way to express inside a single-quoted string that we want the string to contain a `'’ character.
The solution to this problem is to precede any `'’ characters we actually want to appear in the string itself with the backslash (the `\’ character). Thus we have strings like this:
'xxx\'xxx'; # xxx, a single-quote character, and then xxx
We have in this example a string with 7 characters exactly. Namely, this is the string: xxx'xxx. It can be difficult at first to become accustomed to the idea that two characters in the input to Perl actually produce only one character in the string itself. (3) However, just keep in mind the rules and you will probably get used to them quickly.
Since we have used the `\’ character to do something special with the `'’ character, we must now worry about the special cases for the backslash character itself. When we see a `\’ character in a single-quoted string, we must carefully consider what will happen.
Under most circumstances, when a `\’ is in a single-quoted string, it is simply a backslash, representing itself, as most other characters do. However, the following exceptions apply:
- The sequence `\'’ yields the character `'’ in the actual string. (This is the exception we already discussed above.)
- The sequence `\\’ yields the character `\’ in the actual string. In other words, two backslashes right next to each other actually yield only one backslash.
- A backslash, by itself, cannot be placed at the end of a single-quoted string. This cannot happen because Perl will think that you are using the `\’ to escape the closing `'’.
The following examples exemplify the various exceptions, and use them properly:
'I don\'t think so.';          # Note the ' inside is escaped with \
'Need a \\ (backslash) or \?'; # The \\ gives us \, as does \
'You can do this: \\';         # A single backslash at the end
'Three \\\'s: "\\\\\"';        # There are three \ chars between ""
In the last example, note that the resulting string is Three \'s: "\\\". If you can follow that example, you have definitely mastered how single-quoted strings work!
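One way to check the escaping rules above is to count characters. The sketch below is only illustrative: it relies on scalar variables and on Perl's built-in length function, neither of which has been introduced in this chapter yet, and the variable names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \' yields a single ' character: 3 + 1 + 3 = 7 characters.
my $quote = 'xxx\'xxx';

# \\ yields a single \ character, so this string holds 1 character.
my $backslash = '\\';

# Any other backslash represents itself: i, \, o is 3 characters.
my $plain = 'i\o';

print length($quote),     "\n";   # prints 7
print length($backslash), "\n";   # prints 1
print length($plain),     "\n";   # prints 3
```

Counting the characters this way makes it easier to see that two characters typed in the program can produce only one character in the resulting string.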
Newlines in Single-quoted Strings
Note that there is no rule against having a single-quoted string span several lines. When you do this, the string has newline characters embedded in it.
A newline character is a special ASCII character that indicates that a new line should be started. In a text editor, or when printing output to the screen, this usually indicates that the cursor should move from the end of the current line to the first position on the line following it.
Since Perl permits the placement of these newline characters directly into single quoted strings, we are permitted to do the following:
'Time to
start anew.';   # Represents the single string composed of:
                # 'Time to' followed by a newline, followed by
                # 'start anew.'
This string has a total of nineteen characters. The first seven are Time to. The next character following that is a newline. Then, the eleven characters, start anew. follow. Note again that this is one string, with a newline as its eighth character.
Further, note that we are not permitted to put a comment in the middle of the string, even though we are usually allowed to place a `#’ anywhere on the line and have the rest of the line be a comment. We cannot do this here, since we have yet to terminate our single-quoted string with a `'’, and thus, any `#’ character and comment following it would actually become part of the single-quoted string! Remember that single-quoted strings are delimited by `'’ at the beginning, and `'’ at the end, and everything in between is considered part of the string, including newlines, `#’ characters and anything else.
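The character count can be verified with a short sketch. It assumes scalar variables and the built-in length and index functions, none of which have been covered yet; index reports positions counting from zero, so the eighth character is at position 7.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A single-quoted string that spans two lines of source code.
my $text = 'Time to
start anew.';

# 7 characters + 1 newline + 11 characters.
print length($text), "\n";        # prints 19

# Position of the first newline, counting from 0.
print index($text, "\n"), "\n";   # prints 7
```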
Examples of Invalid Single-quoted Strings
In finishing our discussion of single-quoted strings, consider these examples of strings that are not legal because they violate the exceptions we talked about above:
'You cannot do this: \'; # INVALID: the ending \ cannot be alone
'It is 5 o'clock!'       # INVALID: the ' in o'clock should be escaped
'Three \'s: \\\\\';      # INVALID: the final \ escapes the ', thus
                         #          the literal is not terminated
'This is my string;      # INVALID: missing close quote
Sometimes, when you have invalid string literals such as in the example above, the error message that Perl gives is not particularly intuitive. However, when you see error messages such as:
(Might be a runaway multi-line '' string starting on line X)
Bareword found where operator expected
Bareword "foo" not allowed while "strict subs" in use
it is often an indication that you have runaway or invalid strings. Keep an eye out for these problems. Chances are, you will forget and violate one of the rules for single-quoted strings eventually, and then need to determine why you are unable to run your Perl program.
A Digression–The print Function
Before we move on to our consideration of double-quoted strings, it is necessary to first consider a small digression. We know how to represent strings in Perl, but, as you may have noticed, the examples we have given thus far do not do anything interesting. If you try placing the statements that we listed as examples in section Single-quoted Strings, into a full Perl program, like this:
#!/usr/bin/perl
use strict;
use warnings;
'Three \\\'s: "\\\\\"'; # There are three \ chars between ""
'xxx\'xxx';             # xxx, a single-quote character, and then xxx
'Time to
start anew.';
you will notice that nothing of interest happens. Perl gladly runs this program, but it prints none of the strings. (With warnings enabled, Perl may only complain that each string is a useless constant in void context.)
Thus, to begin to work with strings in Perl beyond simple hypothetical considerations, we need a way to have Perl display our strings for us. The canonical way of accomplishing this in Perl is to use the print function.
The print function in Perl can be used in a variety of ways. The simplest form is to use the statement print STRING;, where STRING is any valid Perl string.
So, to reconsider our examples, instead of simply listing the strings, we could instead print each one out:
#!/usr/bin/perl
use strict;
use warnings;
print 'Three \\\'s: "\\\\\"'; # Print first string
print 'xxx\'xxx';             # Print the second
print 'Time to
start anew.
';    # Print last string, with a newline at the end
This program will produce output. When run, the output goes to what is called the standard output. This is usually the terminal, console or window in which you run the Perl program. In the case of the program above, the output to the standard output is as follows:
Three \'s: "\\\"xxx'xxxTime to
start anew.
Note that a newline is required to break up the lines. Thus, you need to put a newline at the end of every valid string if you want your string to be the last thing on that line in the output.
Note that it is particularly important to put a newline on the end of the last string of your output. If you do not, the command prompt of your command interpreter will often run together with your last line of output, which can be very disorienting. So, always remember to place a newline at the end of each line, particularly on your last line of output.
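The effect of a trailing newline can be seen in a small sketch. The two strings are stored in illustrative scalar variables (covered later in this chapter) only so that they can be named; the first has no newline and the second ends with one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No newline at the end: the next output continues on the same line.
my $first = 'No newline here.';

# The closing quote is on the next line, so the string ends with a newline.
my $second = 'Newline at the end.
';

print $first;
print $second;
```

Run as written, both strings appear on a single line of output, and the command prompt then starts on a fresh line because the last string ends with a newline.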
Finally, you may have noticed that formatting your code with newlines in the middle of single-quoted strings hurts readability. Since you are inside a single-quoted string, you cannot change the format of the continued lines within the print statement, nor put comments at the ends of those lines because that would insert data into your single-quoted strings. To handle newlines more elegantly, you should use double-quoted strings, which are the topic of the next section.
Double-quoted Strings
Double-quoted strings are another way of representing scalar string literals in Perl. As with single-quoted strings, you place a group of ASCII characters between two delimiters (in this case, our delimiter is `"’). However, something called interpolation happens when you use a double-quoted string.
Interpolation in Double-quoted Strings
Interpolation is a special process whereby certain special strings written in ASCII are replaced by something different. In section Single-quoted Strings, we noted that certain sequences in single-quoted strings (namely, \\ and \') were treated differently. This is very similar to what happens with interpolation. For example, in interpolated double-quoted strings, various sequences preceded by a `\’ character act differently.
Here is a chart of the most common of these:
| String | Interpolated As |
| `\\’ | an actual, single backslash character |
| `\$’ | a single $ character |
| `\@’ | a single @ character |
| `\t’ | tab |
| `\n’ | newline |
| `\r’ | carriage return |
| `\f’ | form feed |
| `\b’ | backspace |
| `\a’ | alarm (bell) |
| `\e’ | escape |
| `\033’ | character represented by octal value, 033 |
| `\x1b’ | character represented by hexadecimal value, 1b |
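To make the chart more concrete, the sketch below prints the numeric character code behind a few of these sequences. It assumes an ASCII-based platform, and it uses scalar variables and the built-in ord function, which returns the code of the first character of a string; neither has been introduced yet, and the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each escape sequence stands for exactly one character.
my $tab     = ord("\t");   # 9
my $newline = ord("\n");   # 10 on ASCII platforms
my $bell    = ord("\a");   # 7
my $escape  = ord("\e");   # 27

print 'tab: ',     $tab,     "\n";
print 'newline: ', $newline, "\n";
print 'bell: ',    $bell,    "\n";
print 'escape: ',  $escape,  "\n";

# The octal and hexadecimal forms name the same character as \e.
my $same = ("\e" eq "\033" && "\033" eq "\x1b") ? 'yes' : 'no';
print 'all three spellings of escape agree: ', $same, "\n";
```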
Examples of Interpolation
Let us consider an example that uses a few of these characters:
#!/usr/bin/perl
use strict;
use warnings;
print "A backslash: \\\n";
print "Tab follows:\tover here\n";
print "Ring! \a\n";
print "Please pay bkuhn\@ebb.org \$20.\n";
This program, when run, produces the following output on the screen:
A backslash: \
Tab follows:    over here
Ring!
Please pay bkuhn@ebb.org $20.
In addition, when the program runs, you should hear the computer beep, although some terminals are configured to stay silent. That is the output of the `\a’ character, which you cannot see on the screen.
Notice that the `\n’ character ends a line. `\n’ should always be used to end a line. Those students familiar with the C language will be used to using this sequence to mean newline. When writing Perl, the word newline and the `\n’ character are roughly synonymous.
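The difference between the two quoting styles shows up clearly with `\n’. In the sketch below, which assumes scalar variables and the built-in length function (both introduced later), the single-quoted string keeps the backslash and the n as two separate characters, while the double-quoted string turns them into one newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single quotes: l, i, n, e, \, n is 6 characters.
my $single = 'line\n';

# Double quotes: l, i, n, e, newline is 5 characters.
my $double = "line\n";

print length($single), "\n";   # prints 6
print length($double), "\n";   # prints 5
```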
Examples of Interpolation (ASCII Octal Values)
With the exception of `\n’, you should note that the interpolated sequences are simply shortcuts for actual ASCII characters that can be expressed in other ways. Specifically, you are permitted to use the actual ASCII codes (in octal or hexadecimal) to represent characters. To exemplify this, consider the following program:
#!/usr/bin/perl
use strict;
use warnings;
print "A backslash: \134\n";
print "Tab follows:\11over here\n";
print "Ring! \7\n";
print "Please pay bkuhn\100ebb.org