=encoding utf8
=head1 NAME
perlunicook - cookbookish examples of handling Unicode in Perl
=head1 DESCRIPTION
This manpage contains short recipes demonstrating how to handle common Unicode
operations in Perl, plus one complete program at the end. Any undeclared
variables in individual recipes are assumed to have a previous appropriate
value in them.
=head1 EXAMPLES
=head2 ℞ 0: Standard preamble
Unless otherwise noted, all examples below require this standard preamble
to work correctly, with the C<#!> adjusted to work on your system:
#!/usr/bin/env perl
use utf8; # so literals and identifiers can be in UTF-8
use v5.12; # or later to get "unicode_strings" feature
use strict; # quote strings, declare variables
use warnings; # on by default
use warnings qw(FATAL utf8); # fatalize encoding glitches
use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
use charnames qw(:full :short); # unneeded in v5.16
This I<does> force even Unix programmers to C<binmode> their binary streams,
or open them with C<:raw>, but that's the only way to get at them
portably anyway.
B<WARNING>: C<use autodie> (pre 2.26) and C<use open> do not get along with each
other.
=head2 ℞ 1: Generic Unicode-savvy filter
Always decompose on the way in, then recompose on the way out.
use Unicode::Normalize;
while (<>) {
$_ = NFD($_); # decompose + reorder canonically
...
} continue {
print NFC($_); # recompose (where possible) + reorder canonically
}
=head2 ℞ 2: Fine-tuning Unicode warnings
As of v5.14, Perl distinguishes three subclasses of UTF‑8 warnings.
use v5.14; # subwarnings unavailable any earlier
no warnings "nonchar"; # the 66 forbidden non-characters
no warnings "surrogate"; # UTF-16/CESU-8 nonsense
no warnings "non_unicode"; # for codepoints over 0x10_FFFF
=head2 ℞ 3: Declare source in utf8 for identifiers and literals
Without the all-critical C<use utf8> declaration, putting UTF‑8 in your
literals and identifiers won’t work right. If you used the standard
preamble just given above, this already happened. If you did, you can
do things like this:
use utf8;
my $measure = "Ångström";
my @μsoft = qw( cp852 cp1251 cp1252 );
my @ὑπέρμεγας = qw( ὑπέρ μεγας );
my @鯉 = qw( koi8-f koi8-u koi8-r );
my $motto = "👪 💗 🐪"; # FAMILY, GROWING HEART, DROMEDARY CAMEL
If you forget C<use utf8>, high bytes will be misunderstood as
separate characters, and nothing will work right.
=head2 ℞ 4: Characters and their numbers
The C<ord> and C<chr> functions work transparently on all codepoints,
not just on ASCII, and in fact not even just on Unicode.
# ASCII characters
ord("A")
chr(65)
# characters from the Basic Multilingual Plane
ord("Σ")
chr(0x3A3)
# beyond the BMP
ord("𝑛") # MATHEMATICAL ITALIC SMALL N
chr(0x1D45B)
# beyond Unicode! (up to MAXINT)
ord("\x{20_0000}")
chr(0x20_0000)
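The expressions above can be tried out with a small round-trip check. This is only a sketch: the hash name C<%codepoint_of> and the choice of sample characters are illustrative, not part of the recipe.

```perl
use utf8;
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));

# Map each sample character to its codepoint (hypothetical sample set).
my %codepoint_of = map { $_ => ord($_) } ("A", "Σ", "𝑛");

# Print in codepoint order and confirm that chr() undoes ord().
for my $char (sort { $codepoint_of{$a} <=> $codepoint_of{$b} } keys %codepoint_of) {
    my $num = $codepoint_of{$char};
    my $ok  = chr($num) eq $char ? "yes" : "no";
    printf "%s is U+%04X (round-trips: %s)\n", $char, $num, $ok;
}
```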
=head2 ℞ 5: Unicode literals by character number
In an interpolated literal, whether a double-quoted string or a
regex, you may specify a character by its number using the
C<\x{I<HHHHHH>}> escape.
String: "\x{3a3}"
Regex: /\x{3a3}/
String: "\x{1d45b}"
Regex: /\x{1d45b}/
# even non-BMP ranges in regex work fine
/[\x{1D434}-\x{1D467}]/
=head2 ℞ 6: Get character name by number
use charnames ();
my $name = charnames::viacode(0x03A3);
=head2 ℞ 7: Get character number by name
use charnames ();
my $number = charnames::vianame("GREEK CAPITAL LETTER SIGMA");
=head2 ℞ 8: Unicode named characters
Use the C<< \N{I<charname>} >> notation to get the character
by that name for use in interpolated literals (double-quoted
strings and regexes). In v5.16, there is an implicit
use charnames qw(:full :short);
But prior to v5.16, you must be explicit about which set of charnames you
want. The C<:full> names are the official Unicode character name, alias, or
sequence, which all share a namespace.
use charnames qw(:full :short latin greek);
"\N{MATHEMATICAL ITALIC SMALL N}" # :full
"\N{GREEK CAPITAL LETTER SIGMA}" # :full
Anything else is a Perl-specific convenience abbreviation. Specify one or
more scripts by names if you want short names that are script-specific.
"\N{Greek:Sigma}" # :short
"\N{ae}" # latin
"\N{epsilon}" # greek
The v5.16 release also supports a C<:loose> import for loose matching of
character names, which works just like loose matching of property names:
that is, it disregards case, whitespace, and underscores:
"\N{euro sign}" # :loose (from v5.16)
=head2 ℞ 9: Unicode named sequences
These look just like character names but return multiple codepoints.
Notice the C<%vx> vector-print functionality in C<printf>.
use charnames qw(:full);
my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";
printf "U+%v04X\n", $seq;
U+0100.0300
=head2 ℞ 10: Custom named characters
Use C<:alias> to give your own lexically scoped nicknames to existing
characters, or even to give unnamed private-use characters useful names.
use charnames ":full", ":alias" => {
ecute => "LATIN SMALL LETTER E WITH ACUTE",
"APPLE LOGO" => 0xF8FF, # private use character
};
"\N{ecute}"
"\N{APPLE LOGO}"
=head2 ℞ 11: Names of CJK codepoints
Sinograms like “東京” come back with character names of
C<CJK UNIFIED IDEOGRAPH-6771> and C<CJK UNIFIED IDEOGRAPH-4EAC>,
because their “names” vary. The CPAN C<Unicode::Unihan> module
has a large database for decoding these (and a whole lot more), provided you
know how to understand its output.
# cpan -i Unicode::Unihan
use Unicode::Unihan;
my $str = "東京";
my $unhan = Unicode::Unihan->new;
for my $lang (qw(Mandarin Cantonese Korean JapaneseOn JapaneseKun)) {
printf "CJK $str in %-12s is ", $lang;
say $unhan->$lang($str);
}
prints:
CJK 東京 in Mandarin is DONG1JING1
CJK 東京 in Cantonese is dung1ging1
CJK 東京 in Korean is TONGKYENG
CJK 東京 in JapaneseOn is TOUKYOU KEI KIN
CJK 東京 in JapaneseKun is HIGASHI AZUMAMIYAKO
If you have a specific romanization scheme in mind,
use the specific module:
# cpan -i Lingua::JA::Romanize::Japanese
use Lingua::JA::Romanize::Japanese;
my $k2r = Lingua::JA::Romanize::Japanese->new;
my $str = "東京";
say "Japanese for $str is ", $k2r->chars($str);
prints
Japanese for 東京 is toukyou
=head2 ℞ 12: Explicit encode/decode
On rare occasion, such as a database read, you may be
given encoded text you need to decode.
use Encode qw(encode decode);
my $chars = decode("shiftjis", $bytes, 1);
# OR
my $bytes = encode("MIME-Header-ISO_2022_JP", $chars, 1);
For streams all in the same encoding, don't use encode/decode; instead
set the file encoding when you open the file or immediately after with
C<binmode> as described later below.
=head2 ℞ 13: Decode program arguments as utf8
$ perl -CA ...
or
$ export PERL_UNICODE=A
or
use Encode qw(decode);
@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
=head2 ℞ 14: Decode program arguments as locale encoding
# cpan -i Encode::Locale
use Encode qw(decode); # "locale" is an encoding name, not an export
use Encode::Locale;
# use "locale" as an arg to encode/decode
@ARGV = map { decode(locale => $_, 1) } @ARGV;
=head2 ℞ 15: Declare STD{IN,OUT,ERR} to be utf8
Use a command-line option, an environment variable, or else
call C<binmode> explicitly:
$ perl -CS ...
or
$ export PERL_UNICODE=S
or
use open qw(:std :encoding(UTF-8));
or
binmode(STDIN, ":encoding(UTF-8)");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
=head2 ℞ 16: Declare STD{IN,OUT,ERR} to be in locale encoding
# cpan -i Encode::Locale
use Encode;
use Encode::Locale;
# or as a stream for binmode or open
binmode STDIN, ":encoding(console_in)" if -t STDIN;
binmode STDOUT, ":encoding(console_out)" if -t STDOUT;
binmode STDERR, ":encoding(console_out)" if -t STDERR;
=head2 ℞ 17: Make file I/O default to utf8
Files opened without an encoding argument will be in UTF-8:
$ perl -CD ...
or
$ export PERL_UNICODE=D
or
use open qw(:encoding(UTF-8));
=head2 ℞ 18: Make all I/O and args default to utf8
$ perl -CSDA ...
or
$ export PERL_UNICODE=SDA
or
use open qw(:std :encoding(UTF-8));
use Encode qw(decode);
@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
=head2 ℞ 19: Open file with specific encoding
Specify stream encoding. This is the normal way
to deal with encoded text, not by calling low-level
functions.
# input file
open(my $in_file, "< :encoding(UTF-16)", "wintext");
OR
open(my $in_file, "<", "wintext");
binmode($in_file, ":encoding(UTF-16)");
THEN
my $line = <$in_file>;
# output file
open(my $out_file, "> :encoding(cp1252)", "wintext");
OR
open(my $out_file, ">", "wintext");
binmode($out_file, ":encoding(cp1252)");
THEN
print $out_file "some text\n";
More layers than just the encoding can be specified here. For example,
the incantation C<":raw :encoding(UTF-16LE) :crlf"> includes implicit
CRLF handling.
=head2 ℞ 20: Unicode casing
Unicode casing is very different from ASCII casing.
uc("henry ⅷ") # "HENRY Ⅷ"
uc("tschüß") # "TSCHÜSS" notice ß => SS
# both are true:
"tschüß" =~ /TSCHÜSS/i # notice ß => SS
"Σίσυφος" =~ /ΣΊΣΥΦΟΣ/i # notice Σ,σ,ς sameness
=head2 ℞ 21: Unicode case-insensitive comparisons
Also available in the CPAN L<Unicode::CaseFold> module,
the new C<fc> “foldcase” function from v5.16 grants
access to the same Unicode casefolding as the C</i>
pattern modifier has always used:
use feature "fc"; # fc() function is from v5.16
# sort case-insensitively
my @sorted = sort { fc($a) cmp fc($b) } @list;
# both are true:
fc("tschüß") eq fc("TSCHÜSS")
fc("Σίσυφος") eq fc("ΣΊΣΥΦΟΣ")
=head2 ℞ 22: Match Unicode linebreak sequence in regex
A Unicode linebreak matches the two-character CRLF
grapheme or any of seven vertical whitespace characters.
Good for dealing with textfiles coming from different
operating systems.
\R
s/\R/\n/g; # normalize all linebreaks to \n
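As a rough illustration of that substitution, the following sketch builds one string with four different line terminators and normalizes them. The sample text and variable names are made up for the demonstration.

```perl
use v5.14;
use warnings;

# One string with LF, CRLF, bare CR, and U+2028 LINE SEPARATOR as terminators.
my $mixed = "unix\nwindows\r\nold mac\rseparator\x{2028}end";

# \R treats CRLF as a single linebreak, so no empty lines are introduced.
(my $normalized = $mixed) =~ s/\R/\n/g;

my @lines = split /\n/, $normalized;
say scalar(@lines), " lines after normalization";
```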
=head2 ℞ 23: Get character category
Find the general category of a numeric codepoint.
use Unicode::UCD qw(charinfo);
my $cat = charinfo(0x3A3)->{category}; # "Lu"
=head2 ℞ 24: Disabling Unicode-awareness in builtin charclasses
Disable C<\w>, C<\b>, C<\s>, C<\d>, and the POSIX
classes from working correctly on Unicode either in this
scope, or in just one regex.
use v5.14;
use re "/a";
# OR
my($num) = $str =~ /(\d+)/a;
Or use specific un-Unicode properties, like C<\p{ahex}>
and C<\p{POSIX_Digit}>. Properties still work normally
no matter what charset modifiers (C</d /u /l /a /aa>)
should be in effect.
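To see the difference the C</a> modifier makes, here is a minimal sketch that matches the same string with and without it. The sample string is an assumption chosen so that non-ASCII digits come first.

```perl
use utf8;
use v5.14;
use warnings;
use open qw(:std :encoding(UTF-8));

my $str = "room ४५६ or 456";   # Devanagari digits, then ASCII digits

# Unicode rules: \d matches any decimal digit, so the Devanagari run wins.
my ($unicode_digits) = $str =~ /(\d+)/;

# With /a, \d means [0-9] only, so the match skips ahead to the ASCII run.
my ($ascii_digits) = $str =~ /(\d+)/a;

say "without /a: $unicode_digits";
say "with /a:    $ascii_digits";
```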
=head2 ℞ 25: Match Unicode properties in regex with \p, \P
These all match a single codepoint with the given
property. Use C<\P> in place of C<\p> to match
one codepoint lacking that property.
\pL, \pN, \pS, \pP, \pM, \pZ, \pC
\p{Sk}, \p{Ps}, \p{Lt}
\p{alpha}, \p{upper}, \p{lower}
\p{Latin}, \p{Greek}
\p{script_extensions=Latin}, \p{scx=Greek}
\p{East_Asian_Width=Wide}, \p{EA=W}
\p{Line_Break=Hyphen}, \p{LB=HY}
\p{Numeric_Value=4}, \p{NV=4}
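A few of these properties in use, as a sketch only; the sample text and array names are invented for illustration.

```perl
use utf8;
use v5.14;
use warnings;
use open qw(:std :encoding(UTF-8));

my $text = "Σ = sum, π = 3.14";

my @greek       = $text =~ /\p{Greek}/g;   # codepoints in the Greek script
my @upper       = $text =~ /\p{Lu}/g;      # uppercase letters in any script
my @digits      = $text =~ /\p{Nd}/g;      # decimal digits
my @non_letters = $text =~ /\PL/g;         # \P negates: anything not a letter

say "Greek: @greek";
say "Uppercase: @upper";
say "Digits: @digits";
say "Non-letters: ", scalar(@non_letters);
```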
=head2 ℞ 26: Custom character properties
Define at compile-time your own custom character
properties for use in regexes.
# using private-use characters
sub In_Tengwar { "E000\tE07F\n" }
if (/\p{In_Tengwar}/) { ... }
# blending existing properties
sub Is_GraecoRoman_Title {<<'END_OF_SET'}
+utf8::IsLatin
+utf8::IsGreek
&utf8::IsTitle
END_OF_SET
if (/\p{Is_GraecoRoman_Title}/) { ... }
=head2 ℞ 27: Unicode normalization
Typically render into NFD on input and NFC on output. Using NFKC or NFKD
functions improves recall on searches, assuming you've already done the
same to the text to be searched. Note that this is about much more than just pre-
combined compatibility glyphs; it also reorders marks according to their
canonical combining classes and weeds out singletons.
use Unicode::Normalize;
my $nfd = NFD($orig);
my $nfc = NFC($orig);
my $nfkd = NFKD($orig);
my $nfkc = NFKC($orig);
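Why this matters for comparisons can be shown with a short sketch. The two spellings of "é" and the ligature below are sample inputs picked for the demonstration; the variable names are hypothetical.

```perl
use v5.14;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC);

my $composed   = "\x{E9}";     # é as one precomposed codepoint
my $decomposed = "e\x{301}";   # e followed by COMBINING ACUTE ACCENT

# Different codepoint sequences, so a plain eq fails...
my $raw_equal = $composed eq $decomposed ? 1 : 0;

# ...but they agree once both are in the same normalization form.
my $nfc_equal = NFC($composed) eq NFC($decomposed) ? 1 : 0;

# NFD splits the precomposed character into base letter plus mark.
my $nfd_length = length NFD($composed);

# Compatibility forms also fold things like the "fi" ligature (U+FB01).
my $folded = NFKC("\x{FB01}");

say "raw eq: $raw_equal, NFC eq: $nfc_equal, NFD length: $nfd_length, NFKC: $folded";
```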
=head2 ℞ 28: Convert non-ASCII Unicode numerics
Unless you’ve used C</a> or C</aa>, C<\d> matches more than
ASCII digits only, but Perl’s implicit string-to-number
conversion does not currently recognize these. Here’s how to
convert such strings manually.
use v5.14; # needed for num() function
use Unicode::UCD qw(num);
my $str = "got Ⅻ and ४५६७ and ⅞ and here";
my @nums = ();
while ($str =~ /(\d+|\pN)/g) { # not just ASCII! \pN also catches Ⅻ and ⅞
push @nums, num($1);
}
say "@nums"; # 12 4567 0.875
use charnames qw(:full);
my $nv = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}");
=head2 ℞ 29: Match Unicode grapheme cluster in regex
Programmer-visible “characters” are codepoints matched by C</./s>,
but user-visible “characters” are graphemes matched by C</\X/>.
# Find vowel *plus* any combining diacritics,underlining,etc.
my $nfd = NFD($orig);
$nfd =~ / (?=[aeiou]) \X /xi
=head2 ℞ 30: Extract by grapheme instead of by codepoint (regex)
# match and grab five first graphemes
my($first_five) = $str =~ /^ ( \X{5} ) /x;
=head2 ℞ 31: Extract by grapheme instead of by codepoint (substr)
# cpan -i Unicode::GCString
use Unicode::GCString;
my $gcs = Unicode::GCString->new($str);
my $first_five = $gcs->substr(0, 5);
=head2 ℞ 32: Reverse string by grapheme
Reversing by codepoint messes up diacritics, mistakenly converting
C<crème brûlée> into C<éel̂urb em̀erc> instead of into C<eélûrb emèrc>;
so reverse by grapheme instead. Both these approaches work
right no matter what normalization the string is in:
$str = join("", reverse $str =~ /\X/g);
# OR: cpan -i Unicode::GCString
use Unicode::GCString;
$str = reverse Unicode::GCString->new($str);
=head2 ℞ 33: String length in graphemes
The string C<brûlée> has six graphemes but up to eight codepoints.
This counts by grapheme, not by codepoint:
my $str = "brûlée";
my $count = 0;
while ($str =~ /\X/g) { $count++ }
# OR: cpan -i Unicode::GCString
use Unicode::GCString;
my $gcs = Unicode::GCString->new($str);
my $count = $gcs->length;
=head2 ℞ 34: Unicode column-width for printing
Perl’s C<printf>, C<sprintf>, and C<format> think all
codepoints take up 1 print column, but many take 0 or 2.
Here to show that normalization makes no difference,
we print out both forms:
use Unicode::GCString;
use Unicode::Normalize;
my @words = qw/crème brûlée/;
@words = map { NFC($_), NFD($_) } @words;
for my $str (@words) {
my $gcs = Unicode::GCString->new($str);
my $cols = $gcs->columns;
my $pad = " " x (10 - $cols);
say $str, $pad, " |";
}
generates this to show that it pads correctly no matter
the normalization:
crème      |
crème      |
brûlée     |
brûlée     |
=head2 ℞ 35: Unicode collation
Text sorted by numeric codepoint follows no reasonable alphabetic order;
use the UCA for sorting text.
use Unicode::Collate;
my $col = Unicode::Collate->new();
my @list = $col->sort(@old_list);
See the I<ucsort> program from the L<Unicode::Tussle> CPAN module
for a convenient command-line interface to this module.
=head2 ℞ 36: Case- I<and> accent-insensitive Unicode sort
Specify a collation strength of level 1 to ignore case and
diacritics, only looking at the basic character.
use Unicode::Collate;
my $col = Unicode::Collate->new(level => 1);
my @list = $col->sort(@old_list);
=head2 ℞ 37: Unicode locale collation
Some locales have special sorting rules.
# either use v5.12, OR: cpan -i Unicode::Collate::Locale
use Unicode::Collate::Locale;
my $col = Unicode::Collate::Locale->new(locale => "de__phonebook");
my @list = $col->sort(@old_list);
The I<ucsort> program mentioned above accepts a C<--locale> parameter.
=head2 ℞ 38: Making C<cmp> work on text instead of codepoints
Instead of this:
@srecs = sort {
$b->{AGE} <=> $a->{AGE}
||
$a->{NAME} cmp $b->{NAME}
} @recs;
Use this:
my $coll = Unicode::Collate->new();
for my $rec (@recs) {
$rec->{NAME_key} = $coll->getSortKey( $rec->{NAME} );
}
@srecs = sort {
$b->{AGE} <=> $a->{AGE}
||
$a->{NAME_key} cmp $b->{NAME_key}
} @recs;
=head2 ℞ 39: Case- I<and> accent-insensitive comparisons
Use a collator object to compare Unicode text by character
instead of by codepoint.
use Unicode::Collate;
my $es = Unicode::Collate->new(
level => 1,
normalization => undef
);
# now both are true:
$es->eq("García", "GARCIA" );
$es->eq("Márquez", "MARQUEZ");
=head2 ℞ 40: Case- I<and> accent-insensitive locale comparisons
Same, but in a specific locale.
my $de = Unicode::Collate::Locale->new(
locale => "de__phonebook",
);
# now this is true:
$de->eq("tschüß", "TSCHUESS"); # notice ü => UE, ß => SS
=head2 ℞ 41: Unicode linebreaking
Break up text into lines according to Unicode rules.
# cpan -i Unicode::LineBreak
use Unicode::LineBreak;
use charnames qw(:full);
my $para = "This is a super\N{HYPHEN}long string. " x 20;
my $fmt = Unicode::LineBreak->new;
print $fmt->break($para), "\n";
=head2 ℞ 42: Unicode text in DBM hashes, the tedious way
Using a regular Perl string as a key or value for a DBM
hash will trigger a wide character exception if any codepoints
won’t fit into a byte. Here’s how to manually manage the translation:
use DB_File;
use Encode qw(encode decode);
tie %dbhash, "DB_File", "pathname";
# STORE
# assume $uni_key and $uni_value are abstract Unicode strings
my $enc_key = encode("UTF-8", $uni_key, 1);
my $enc_value = encode("UTF-8", $uni_value, 1);
$dbhash{$enc_key} = $enc_value;
# FETCH
# assume $uni_key holds a normal Perl string (abstract Unicode)
my $enc_key = encode("UTF-8", $uni_key, 1);
my $enc_value = $dbhash{$enc_key};
my $uni_value = decode("UTF-8", $enc_value, 1);
=head2 ℞ 43: Unicode text in DBM hashes, the easy way
Here’s how to implicitly manage the translation; all encoding
and decoding is done automatically, just as with streams that
have a particular encoding attached to them:
use DB_File;
use DBM_Filter;
my $dbobj = tie %dbhash, "DB_File", "pathname";
$dbobj->Filter_Push("utf8"); # this is the magic bit (keys and values)
# STORE
# assume $uni_key and $uni_value are abstract Unicode strings
$dbhash{$uni_key} = $uni_value;
# FETCH
# $uni_key holds a normal Perl string (abstract Unicode)
my $uni_value = $dbhash{$uni_key};
=head2 ℞ 44: PROGRAM: Demo of Unicode collation and printing
Here’s a full program showing how to make use of locale-sensitive
sorting, Unicode casing, and managing print widths when some of the
characters take up zero or two columns, not just one column each time.
When run, the following program produces this nicely aligned output:
Crème Brûlée....... €2.00
Éclair............. €1.60
Fideuà............. €4.20
Hamburger.......... €6.00
Jamón Serrano...... €4.45
Linguiça........... €7.00
Pâté............... €4.15
Pears.............. €2.00
Pêches............. €2.25
Smørbrød........... €5.75
Spätzle............ €5.50
Xoriço............. €3.00
Γύρος.............. €6.50
막걸리............. €4.00
おもち............. €2.65
お好み焼き......... €8.00
シュークリーム..... €1.85
寿司............... €9.99
包子............... €7.50
Here's that program; tested on v5.14.
#!/usr/bin/env perl
# umenu - demo sorting and printing of Unicode food
#
# (obligatory and increasingly long preamble)
#
use utf8;
use v5.14; # for locale sorting
use strict;
use warnings;
use warnings qw(FATAL utf8); # fatalize encoding faults
use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
use charnames qw(:full :short); # unneeded in v5.16
# std modules
use Unicode::Normalize; # std perl distro as of v5.8
use List::Util qw(max); # std perl distro as of v5.10
use Unicode::Collate::Locale; # std perl distro as of v5.14
# cpan modules
use Unicode::GCString; # from CPAN
# forward defs
sub pad($$$);
sub colwidth(_);
sub entitle(_);
my %price = (
"γύρος" => 6.50, # gyros
"pears" => 2.00, # like um, pears
"linguiça" => 7.00, # spicy sausage, Portuguese
"xoriço" => 3.00, # chorizo sausage, Catalan
"hamburger" => 6.00, # burgermeister meisterburger
"éclair" => 1.60, # dessert, French
"smørbrød" => 5.75, # sandwiches, Norwegian
"spätzle" => 5.50, # Bayerisch noodles, little sparrows
"包子" => 7.50, # bao1 zi5, steamed pork buns, Mandarin
"jamón serrano" => 4.45, # country ham, Spanish
"pêches" => 2.25, # peaches, French
"シュークリーム" => 1.85, # cream-filled pastry like eclair
"막걸리" => 4.00, # makgeolli, Korean rice wine
"寿司" => 9.99, # sushi, Japanese
"おもち" => 2.65, # omochi, rice cakes, Japanese
"crème brûlée" => 2.00, # crema catalana
"fideuà" => 4.20, # more noodles, Valencian
# (Catalan=fideuada)
"pâté" => 4.15, # gooseliver paste, French
"お好み焼き" => 8.00, # okonomiyaki, Japanese
);
my $width = 5 + max map { colwidth } keys %price;
# So the Asian stuff comes out in an order that someone
# who reads those scripts won't freak out over; the
# CJK stuff will be in JIS X 0208 order that way.
my $coll = Unicode::Collate::Locale->new(locale => "ja");
for my $item ($coll->sort(keys %price)) {
print pad(entitle($item), $width, ".");
printf " €%.2f\n", $price{$item};
}
sub pad($$$) {
my($str, $width, $padchar) = @_;
return $str . ($padchar x ($width - colwidth($str)));
}
sub colwidth(_) {
my($str) = @_;
return Unicode::GCString->new($str)->columns;
}
sub entitle(_) {
my($str) = @_;
$str =~ s{ (?=\pL)(\S) (\S*) }
{ ucfirst($1) . lc($2) }xge;
return $str;
}
=head1 SEE ALSO
See these manpages, some of which are CPAN modules:
L<perlunicode>, L<perluniprops>,
L<perlre>, L<perlrecharclass>,
L<perluniintro>, L<perlunitut>, L<perlunifaq>,
L<PerlIO>, L<DB_File>, L<DBM_Filter>, L<DBM_Filter::utf8>,
L<Encode>, L<Encode::Locale>,
L<Unicode::UCD>,
L<Unicode::Normalize>,
L<Unicode::GCString>, L<Unicode::LineBreak>,
L<Unicode::Collate>, L<Unicode::Collate::Locale>,
L<Unicode::Unihan>,
L<Unicode::CaseFold>,
L<Unicode::Tussle>,
L<Lingua::JA::Romanize::Japanese>,
L<Lingua::ZH::Romanize::Pinyin>,
L<Lingua::KO::Romanize::Hangul>.
The L<Unicode::Tussle> CPAN module includes many programs
to help with working with Unicode, including
these programs to fully or partly replace standard utilities:
I<tcgrep> instead of I<egrep>,
I<uniquote> instead of I<cat -v> or I<hexdump>,
I<uniwc> instead of I<wc>,
I<unilook> instead of I<look>,
I<unifmt> instead of I<fmt>,
and
I<ucsort> instead of I<sort>.
For exploring Unicode character names and character properties,
see its I<uniprops>, I<unichars>, and I<uninames> programs.
It also supplies these programs, all of which are general filters that do Unicode-y things:
I<unititle> and I<unicaps>;
I<uniwide> and I<uninarrow>;
I<unisupers> and I<unisubs>;
I<nfd>, I<nfc>, I<nfkd>, and I<nfkc>;
and I<uc>, I<lc>, and I<tc>.
Finally, see the published Unicode Standard (page numbers are from version
6.0.0), including these specific annexes and technical reports:
=over
=item §3.13 Default Case Algorithms, page 113;
§4.2 Case, pages 120–122;
Case Mappings, pages 166–172, especially Caseless Matching starting on page 170.
=item UAX #44: Unicode Character Database
=item UTS #18: Unicode Regular Expressions
=item UAX #15: Unicode Normalization Forms
=item UTS #10: Unicode Collation Algorithm
=item UAX #29: Unicode Text Segmentation
=item UAX #14: Unicode Line Breaking Algorithm
=item UAX #11: East Asian Width
=back
=head1 AUTHOR
Tom Christiansen E<lt>tchrist@perl.comE<gt> wrote this, with occasional
kibitzing from Larry Wall and Jeffrey Friedl in the background.
=head1 COPYRIGHT AND LICENCE
Copyright © 2012 Tom Christiansen.
This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.
Most of these examples are taken from the current edition of the “Camel Book”;
that is, from the 4ᵗʰ Edition of I<Programming Perl>, Copyright © 2012 Tom
Christiansen I<et al.>, 2012-02-13 by O’Reilly Media. The code itself is
freely redistributable, and you are encouraged to transplant, fold,
spindle, and mutilate any of the examples in this manpage however you please
for inclusion into your own programs without any encumbrance whatsoever.
Acknowledgement via code comment is polite but not required.
=head1 REVISION HISTORY
v1.0.0 – first public release, 2012-02-27
=head1 NAME
perlunitut - Perl Unicode Tutorial
=head1 DESCRIPTION
The days of just flinging strings around are over. It's well established that
modern programs need to be capable of communicating funny accented letters, and
things like euro symbols. This means that programmers need new habits. It's
easy to program Unicode capable software, but it does require discipline to do
it right.
There's a lot to know about character sets, and text encodings. It's probably
best to spend a full day learning all this, but the basics can be learned in
minutes.
These are not the very basics, though. It is assumed that you already
know the difference between bytes and characters, and realise (and accept!)
that there are many different character sets and encodings, and that your
program has to be explicit about them. Recommended reading is "The Absolute
Minimum Every Software Developer Absolutely, Positively Must Know About Unicode
and Character Sets (No Excuses!)" by Joel Spolsky, at
L<http://joelonsoftware.com/articles/Unicode.html>.
This tutorial speaks in rather absolute terms, and provides only a limited view
of the wealth of character string related features that Perl has to offer. For
most projects, this information will probably suffice.
=head2 Definitions
It's important to set a few things straight first. This is the most important
part of this tutorial. This view may conflict with other information that you
may have found on the web, but that's mostly because many sources are wrong.
You may have to re-read this entire section a few times...
=head3 Unicode
B<Unicode> is a character set with room for lots of characters. The ordinal
value of a character is called a B<code point>. (But in practice, the
distinction between code point and character is blurred, so the terms often
are used interchangeably.)
There are many, many code points, but computers work with bytes, and a byte has
room for only 256 values. Unicode has many more characters than that,
so you need a method to make these accessible.
Unicode is encoded using several competing encodings, of which UTF-8 is the
most used. In a Unicode encoding, multiple subsequent bytes can be used to
store a single code point, or simply: character.
=head3 UTF-8
B<UTF-8> is a Unicode encoding. Many people think that Unicode and UTF-8 are
the same thing, but they're not. There are more Unicode encodings, but much of
the world has standardized on UTF-8.
UTF-8 treats the first 128 codepoints, 0..127, the same as ASCII. They take
only one byte per character. All other characters are encoded as two to
four bytes using a complex scheme. Fortunately, Perl handles this for
us, so we don't have to worry about this.
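The one-to-four byte range can be observed directly. The following is a small sketch; the four sample code points and the hash name C<%utf8_bytes> are chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Byte length of the UTF-8 encoding for one sample code point per size class.
my %utf8_bytes;
for my $cp (0x41, 0xE9, 0x20AC, 0x1F42A) {   # A, e-acute, euro sign, dromedary camel
    $utf8_bytes{$cp} = length encode("UTF-8", chr $cp);
    printf "U+%04X takes %d byte(s) in UTF-8\n", $cp, $utf8_bytes{$cp};
}
```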
=head3 Text strings (character strings)
B<Text strings>, or B<character strings> are made of characters. Bytes are
irrelevant here, and so are encodings. Each character is just that: the
character.
On a text string, you would do things like:
$text =~ s/foo/bar/;
if ($string =~ /^\d+$/) { ... }
$text = ucfirst $text;
my $character_count = length $text;
The value of a character (C<ord>, C<chr>) is the corresponding Unicode code
point.
=head3 Binary strings (byte strings)
B<Binary strings>, or B<byte strings> are made of bytes. Here, you don't have
characters, just bytes. All communication with the outside world (anything
outside of your current Perl process) is done in binary.
On a binary string, you would do things like:
my (@length_content) = unpack "(V/a)*", $binary;
$binary =~ s/\x00\x0F/\xFF\xF0/; # for the brave :)
print {$fh} $binary;
my $byte_count = length $binary;
=head3 Encoding
B<Encoding> (as a verb) is the conversion from I<text> to I<binary>. To encode,
you have to supply the target encoding, for example C<iso-8859-1> or C<UTF-8>.
Some encodings, like the C<iso-8859> ("latin") range, do not support the full
Unicode standard; characters that can't be represented are lost in the
conversion.
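What "lost" means in practice can be sketched as follows, assuming a string that contains a euro sign, which Latin-1 cannot represent. The variable names are illustrative. By default C<encode> substitutes a replacement character; passing C<Encode::FB_CROAK> turns the loss into an exception instead.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "price: \x{20AC}5";   # contains EURO SIGN, which Latin-1 lacks

# Default behaviour: the unmappable character is replaced by "?".
my $latin1 = encode("iso-8859-1", $text);

# UTF-8 can represent every code point, so nothing is lost.
my $utf8 = encode("UTF-8", $text);

# With FB_CROAK the lossy conversion is refused rather than done silently.
# LEAVE_SRC keeps encode() from modifying $text in place.
my $strict_ok = eval {
    encode("iso-8859-1", $text, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
my $outcome = $strict_ok ? "encoded" : "refused";

printf "Latin-1: %s (%d bytes), UTF-8: %d bytes, strict Latin-1: %s\n",
    $latin1, length $latin1, length $utf8, $outcome;
```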
=head3 Decoding
B<Decoding> is the conversion from I<binary> to I<text>. To decode, you have to
know what encoding was used during the encoding phase. And most of all, it must
be something decodable. It doesn't make much sense to decode a PNG image into a
text string.
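A minimal sketch of why the encoding has to be known: the same word is stored below as UTF-8 bytes and as Latin-1 bytes (both byte strings are made-up sample input), and only the matching decoder accepts each one.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $good_bytes = "caf\xC3\xA9";   # "café" encoded as UTF-8 (5 bytes)
my $bad_bytes  = "caf\xE9";       # "café" encoded as Latin-1: not valid UTF-8

# FB_CROAK makes malformed input an error; LEAVE_SRC keeps the input intact.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $text = decode("UTF-8", $good_bytes, $check);   # 4 characters

# Decoding the Latin-1 bytes as UTF-8 fails...
my $decoded_bad = eval { decode("UTF-8", $bad_bytes, $check) };
my $error = defined $decoded_bad ? "" : $@;

# ...while decoding them with the encoding actually used works.
my $as_latin1 = decode("iso-8859-1", $bad_bytes);

printf "%d bytes became %d characters; wrong decoder failed: %s\n",
    length $good_bytes, length $text, ($error ? "yes" : "no");
```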
=head3 Internal format
Perl has an B<internal format>, an encoding that it uses to encode text strings
so it can store them in memory. All text strings are in this internal format.
In fact, text strings are never in any other format!
You shouldn't worry about what this format is, because conversion is
automatically done when you decode or encode.
=head2 Your new toolkit
Add to your standard heading the following line:
use Encode qw(encode decode);
Or, if you're lazy, just:
use Encode;
=head2 I/O flow (the actual 5 minute tutorial)
The typical input/output flow of a program is:
1. Receive and decode
2. Process
3. Encode and output
If your input is binary, and is supposed to remain binary, you shouldn't decode
it to a text string, of course. But in all other cases, you should decode it.
Decoding can't happen reliably if you don't know how the data was encoded. If
you get to choose, it's a good idea to standardize on UTF-8.
my $foo = decode('UTF-8', get 'http://example.com/');
my $bar = decode('ISO-8859-1', readline STDIN);
my $xyzzy = decode('Windows-1251', $cgi->param('foo'));
Processing happens as you knew before. The only difference is that you're now
using characters instead of bytes. That's very useful if you use things like
C<substr>, or C<length>.
It's important to realize that there are no bytes in a text string. Of course,
Perl has its internal encoding to store the string in memory, but ignore that.
If you have to do anything with the number of bytes, it's probably best to move
that part to step 3, just after you've encoded the string. Then you know
exactly how many bytes it will be in the destination string.
The syntax for encoding text strings to binary strings is as simple as decoding:
$body = encode('UTF-8', $body);
If you needed to know the length of the string in bytes, now's the perfect time
for that. Because C<$body> is now a byte string, C<length> will report the
number of bytes, instead of the number of characters. The number of
characters is no longer known, because characters only exist in text strings.
my $byte_count = length $body;
And if the protocol you're using supports a way of letting the recipient know
which character encoding you used, please help the receiving end by using that
feature! For example, E-mail and HTTP support MIME headers, so you can use the
C<Content-Type> header. They can also have C<Content-Length> to indicate the
number of I<bytes>, which is always a good idea to supply if the number is
known.
"Content-Type: text/plain; charset=UTF-8",
"Content-Length: $byte_count"
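Putting the three steps together, a minimal end-to-end sketch might look like this. The input is a hard-coded UTF-8 byte string standing in for whatever the program actually receives, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# 1. Receive and decode (sample bytes: "naïve café" in UTF-8).
my $input_bytes = "na\xC3\xAFve caf\xC3\xA9";
my $text = decode("UTF-8", $input_bytes);

# 2. Process: everything here operates on characters, not bytes.
my $char_count = length $text;
$text = ucfirst $text;

# 3. Encode and output; only now is a byte count meaningful.
my $body       = encode("UTF-8", $text);
my $byte_count = length $body;

binmode STDOUT;   # the body is already encoded, so write raw bytes
print "Content-Type: text/plain; charset=UTF-8\r\n",
      "Content-Length: $byte_count\r\n",
      "\r\n",
      $body;
```

The character count and the byte count differ here because two of the characters need two bytes each in UTF-8.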
=head1 SUMMARY
Decode everything you receive, encode everything you send out. (If it's text
data.)
=head1 Q and A (or FAQ)
After reading this document, you ought to read L<perlunifaq> too, then
L<perluniintro>.
=head1 ACKNOWLEDGEMENTS
Thanks to Johan Vromans from Squirrel Consultancy. His UTF-8 rants during the
Amsterdam Perl Mongers meetings got me interested and determined to find out
how to use character encodings in Perl in ways that don't break easily.
Thanks to Gerard Goossen from TTY. His presentation "UTF-8 in the wild" (Dutch
Perl Workshop 2006) inspired me to publish my thoughts and write this tutorial.
Thanks to the people who asked about this kind of stuff in several Perl IRC
channels, and have constantly reminded me that a simpler explanation was
needed.
Thanks to the people who reviewed this document for me, before it went public.
They are: Benjamin Smith, Jan-Pieter Cornet, Johan Vromans, Lukas Mai, Nathan
Gray.
=head1 AUTHOR
Juerd Waalboer <#####@juerd.nl>
=head1 SEE ALSO
L<perlunifaq>, L<perlunicode>, L<perluniintro>, L<Encode>
=encoding utf8
=head1 NAME
perl5224delta - what is new for perl v5.22.4
=head1 DESCRIPTION
This document describes differences between the 5.22.3 release and the 5.22.4
release.
If you are upgrading from an earlier release such as 5.22.2, first read
L<perl5223delta>, which describes differences between 5.22.2 and 5.22.3.
=head1 Security
=head2 Improved handling of '.' in @INC in base.pm
The handling of (the removal of) C<'.'> in C<@INC> in L<base> has been
improved. This resolves some problematic behaviour in the approach taken in
Perl 5.22.3, which is probably best described in the following two threads on
the Perl 5 Porters mailing list:
L<http://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg238991.html>,
L<http://www.nntp.perl.org/group/perl.perl5.porters/2016/10/msg240297.html>.
=head2 "Escaped" colons and relative paths in PATH
On Unix systems, Perl treats any relative paths in the PATH environment
variable as tainted when starting a new process. Previously, it was allowing a
backslash to escape a colon (unlike the OS), consequently allowing relative
paths to be considered safe if the PATH was set to something like C</\:.>. The
check has been fixed to treat C<.> as tainted in that example.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<base> has been upgraded from version 2.22 to 2.22_01.
=item *
L<Module::CoreList> has been upgraded from version 5.20170114_22 to 5.20170715_22.
=back
=head1 Selected Bug Fixes
=over 4
=item *
Fixed a crash with C<s///l> where it thought it was dealing with UTF-8 when it
wasn't.
L<[perl #129038]|https://rt.perl.org/Ticket/Display.html?id=129038>
=back
=head1 Acknowledgements
Perl 5.22.4 represents approximately 6 months of development since Perl 5.22.3
and contains approximately 2,200 lines of changes across 52 files from 16
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 970 lines of changes to 18 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.22.4:
Aaron Crane, Abigail, Aristotle Pagaltzis, Chris 'BinGOs' Williams, David
Mitchell, Eric Herman, Father Chrysostomos, James E Keenan, Karl Williamson,
Lukas Mai, Renee Baecker, Ricardo Signes, Sawyer X, Stevan Little, Steve Hay,
Tony Cook.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
https://rt.perl.org/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlnumber - semantics of numbers and numeric operations in Perl
=head1 SYNOPSIS
$n = 1234; # decimal integer
$n = 0b1110011; # binary integer
$n = 01234; # octal integer
$n = 0x1234; # hexadecimal integer
$n = 12.34e-56; # exponential notation
$n = "-12.34e56"; # number specified as a string
$n = "1234"; # number specified as a string
=head1 DESCRIPTION
This document describes how Perl internally handles numeric values.
Perl's operator overloading facility is completely ignored here. Operator
overloading allows user-defined behaviors for numbers, such as operations
over arbitrarily large integers, floating point numbers with arbitrary
precision, operations over "exotic" numbers such as modular arithmetic or
p-adic arithmetic, and so on. See L<overload> for details.
=head1 Storing numbers
Perl can internally represent numbers in 3 different ways: as native
integers, as native floating point numbers, and as decimal strings.
Decimal strings may have an exponential notation part, as in C<"12.34e-56">.
I<Native> here means "a format supported by the C compiler which was used
to build perl".
The term "native" does not mean quite as much when we talk about native
integers, as it does when native floating point numbers are involved.
The only implication of the term "native" on integers is that the limits for
the maximal and the minimal supported true integral quantities are close to
powers of 2. However, "native" floats have a most fundamental
restriction: they may represent only those numbers which have a relatively
"short" representation when converted to a binary fraction. For example,
0.9 cannot be represented by a native float, since the binary fraction
for 0.9 is infinite:
binary0.1110011001100...
with the sequence C<1100> repeating again and again. In addition to this
limitation, the exponent of the binary number is also restricted when it
is represented as a floating point number. On typical hardware, floating
point values can store numbers with up to 53 binary digits, and with binary
exponents between -1024 and 1024. In decimal representation this is close
to 16 decimal digits and decimal exponents in the range of -304..304.
The upshot of all this is that Perl cannot store a number like
12345678901234567 as a floating point number on such architectures without
loss of information.
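A small sketch that makes both limitations visible. It assumes nothing beyond core Perl, but the digits printed depend on the native floating point format of your build, so no particular output is promised here.

```perl
use strict;
use warnings;

# Print 0.9 with far more digits than it was written with.  The stored
# value is the nearest binary fraction, not 0.9 itself.
my $shown = sprintf '%.40f', 0.9;
print "0.9 is stored as $shown\n";

# Force a 17-digit integer through the floating point format.
my $wanted   = '12345678901234567';
my $as_float = sprintf '%.0f', $wanted;
print "$wanted becomes $as_float as a float\n";
print $as_float eq $wanted
    ? "no information lost on this build\n"
    : "low-order digits were lost on this build\n";
```

On builds with a wider native float (for example, long doubles) the second test may report no loss.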
Similarly, decimal strings can represent only those numbers which have a
finite decimal expansion. Being strings, and thus of arbitrary length, there
is no practical limit for the exponent or number of decimal digits for these
numbers. (But realize that we are discussing the rules for just the
I<storage> of these numbers. The fact that you can store such "large" numbers
does not mean that the I<operations> over these numbers will use all
of the significant digits.
See L</"Numeric operators and numeric conversions"> for details.)
In fact numbers stored in the native integer format may be stored either
in the signed native form, or in the unsigned native form. Thus the limits
for Perl numbers stored as native integers would typically be -2**31..2**32-1,
with appropriate modifications in the case of 64-bit integers. Again, this
does not mean that Perl can do operations only over integers in this range:
it is possible to store many more integers in floating point format.
Summing up, Perl numeric values can store only those numbers which have
a finite decimal expansion or a "short" binary expansion.
=head1 Numeric operators and numeric conversions
As mentioned earlier, Perl can store a number in any one of three formats,
but most operators typically understand only one of those formats. When
a numeric value is passed as an argument to such an operator, it will be
converted to the format understood by the operator.
Six such conversions are possible:
native integer --> native floating point (*)
native integer --> decimal string
native floating_point --> native integer (*)
native floating_point --> decimal string (*)
decimal string --> native integer
decimal string --> native floating point (*)
These conversions are governed by the following general rules:
=over 4
=item *
If the source number can be represented in the target form, that
representation is used.
=item *
If the source number is outside of the limits representable in the target form,
a representation of the closest limit is used. (I<Loss of information>)
=item *
If the source number is between two numbers representable in the target form,
a representation of one of these numbers is used. (I<Loss of information>)
=item *
In C<< native floating point --> native integer >> conversions the magnitude
of the result is less than or equal to the magnitude of the source.
(I<"Rounding to zero".>)
=item *
If the C<< decimal string --> native integer >> conversion cannot be done
without loss of information, the result is compatible with the conversion
sequence C<< decimal_string --> native_floating_point --> native_integer >>.
In particular, rounding is strongly biased to 0, though a number like
C<"0.99999999999999999999"> has a chance of being rounded to 1.
=back
B<RESTRICTION>: The conversions marked with C<(*)> above involve steps
performed by the C compiler. In particular, bugs/features of the compiler
used may lead to breakage of some of the above rules.
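A few of these conversions can be observed from Perl code. The sketch below uses C<int> as a visible stand-in for the float-to-integer conversion; that choice, and the variable names, are assumptions made for illustration only.

```perl
use strict;
use warnings;

# decimal string --> native integer, no information lost
my $from_string = "1234" + 0;

# decimal string with an exponent part --> number
my $from_exponent = "0.5e1" + 0;

# floating point --> integer: the magnitude of the result never
# exceeds the magnitude of the source ("rounding to zero").
my $positive = int(3.7);
my $negative = int(-3.7);

# native integer --> decimal string
my $as_string = 42 . "";

print "$from_string $from_exponent $positive $negative $as_string\n";
```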
=head1 Flavors of Perl numeric operations
Perl operations which take a numeric argument treat that argument in one
of four different ways: they may force it to one of the integer/floating/
string formats, or they may behave differently depending on the format of
the operand. Forcing a numeric value to a particular format does not
change the number stored in the value.
All the operators which need an argument in the integer format treat the
argument as in modular arithmetic, e.g., C<mod 2**32> on a 32-bit
architecture. C<sprintf "%u", -1> therefore provides the same result as
C<sprintf "%u", ~0>.
=over 4
=item Arithmetic operators
The binary operators C<+> C<-> C<*> C</> C<%> C<==> C<!=> C<E<gt>> C<E<lt>>
C<E<gt>=> C<E<lt>=> and the unary operators C<-> C<abs> and C<--> will
attempt to convert arguments to integers. If both conversions are possible
without loss of precision, and the operation can be performed without
loss of precision then the integer result is used. Otherwise arguments are
converted to floating point format and the floating point result is used.
The caching of conversions (as described above) means that the integer
conversion does not throw away fractional parts on floating point numbers.
=item ++
C<++> behaves as the other operators above, except that if it is a string
matching the format C</^[a-zA-Z]*[0-9]*\z/> the string increment described
in L<perlop> is used.
=item Arithmetic operators during C<use integer>
In scopes where C<use integer;> is in force, nearly all the operators listed
above will force their argument(s) into integer format, and return an integer
result. The exceptions, C<abs>, C<++> and C<-->, do not change their
behavior under C<use integer;>.
=item Other mathematical operators
Operators such as C<**>, C<sin> and C<exp> force arguments to floating point
format.
=item Bitwise operators
Arguments are forced into the integer format if not strings.
=item Bitwise operators during C<use integer>
force arguments to integer format. Also shift operations internally use
signed integers rather than the default unsigned.
=item Operators which expect an integer
force the argument into the integer format. This is applicable
to the third and fourth arguments of C<sysread>, for example.
=item Operators which expect a string
force the argument into the string format. For example, this is
applicable to C<printf "%s", $value>.
=back
Though forcing an argument into a particular form does not change the
stored number, Perl remembers the result of such conversions. In
particular, though the first such conversion may be time-consuming,
repeated operations will not need to redo the conversion.
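The flavors described in this section can be compared side by side. This is a minimal sketch with hypothetical variable names; the exact digits printed for the square root depend on your platform.

```perl
use strict;
use warnings;

# Integer-format operators treat the argument as in modular arithmetic.
my $wrapped  = sprintf "%u", -1;
my $all_bits = sprintf "%u", ~0;

# Ordinary arithmetic falls back to floating point when it has to.
my $plain = 7 / 2;

# Under "use integer" the same division returns an integer result.
my $truncated = do { use integer; 7 / 2 };

# "**" works in floating point.
my $root = 2 ** 0.5;

# "++" on a string matching /^[a-zA-Z]*[0-9]*\z/ uses string increment.
my $label = "az";
$label++;

# "++" on a numeric value is ordinary arithmetic.
my $count = 9;
$count++;

print "$wrapped $all_bits\n";
print "$plain $truncated $root $label $count\n";
```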
=head1 AUTHOR
Ilya Zakharevich C<ilya@math.ohio-state.edu>
Editorial adjustments by Gurusamy Sarathy <gsar@ActiveState.com>
Updates for 5.8.0 by Nicholas Clark <nick@ccl4.org>
=head1 SEE ALSO
L<overload>, L<perlop>
=head1 NAME
perlvar - Perl predefined variables
=head1 DESCRIPTION
=head2 The Syntax of Variable Names
Variable names in Perl can have several formats. Usually, they
must begin with a letter or underscore, in which case they can be
arbitrarily long (up to an internal limit of 251 characters) and
may contain letters, digits, underscores, or the special sequence
C<::> or C<'>. In this case, the part before the last C<::> or
C<'> is taken to be a I<package qualifier>; see L<perlmod>.
A Unicode letter that is not ASCII is not considered to be a letter
unless S<C<"use utf8">> is in effect, and somewhat more complicated
rules apply; see L<perldata/Identifier parsing> for details.
Perl variable names may also be a sequence of digits, a single
punctuation character, or the two-character sequence: C<^> (caret or
CIRCUMFLEX ACCENT) followed by any one of the characters C<[][A-Z^_?\]>.
These names are all reserved for
special uses by Perl; for example, the all-digits names are used
to hold data captured by backreferences after a regular expression
match.
Since Perl v5.6.0, Perl variable names may also be alphanumeric strings
preceded by a caret. These must all be written in the form C<${^Foo}>;
the braces are not optional. C<${^Foo}> denotes the scalar variable
whose name is considered to be a control-C<F> followed by two C<o>'s.
These variables are
reserved for future special uses by Perl, except for the ones that
begin with C<^_> (caret-underscore). No
name that begins with C<^_> will acquire a special
meaning in any future version of Perl; such names may therefore be
used safely in programs. C<$^_> itself, however, I<is> reserved.
Perl identifiers that begin with digits or
punctuation characters are exempt from the effects of the C<package>
declaration and are always forced to be in package C<main>; they are
also exempt from C<strict 'vars'> errors. A few other names are also
exempt in these ways:
ENV STDIN
INC STDOUT
ARGV STDERR
ARGVOUT
SIG
In particular, the special C<${^_XYZ}> variables are always taken
to be in package C<main>, regardless of any C<package> declarations
presently in scope.
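As a sketch of that last point, a caret-underscore name set in one package can be read from another without any declaration. The package name and the variable name below are hypothetical.

```perl
use strict;
use warnings;

package Hypothetical::Module;

# No "our" declaration is needed, even under "use strict".
${^_EXAMPLE_FLAG} = 'set inside Hypothetical::Module';

package main;

# The same variable is visible here, because it lives in package main.
my $seen = ${^_EXAMPLE_FLAG};
print "$seen\n";
```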
=head1 SPECIAL VARIABLES
The following names have special meaning to Perl. Most punctuation
names have reasonable mnemonics, or analogs in the shells.
Nevertheless, if you wish to use long variable names, you need only say:
use English;
at the top of your program. This aliases all the short names to the long
names in the current package. Some even have medium names, generally
borrowed from B<awk>. For more info, please see L<English>.
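For instance, the following sketch checks that a few long names refer to the same values as their punctuation counterparts. The C<-no_match_vars> import option is used here out of habit; it is not required for the aliases shown.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

my $same_pid  = ($PROCESS_ID   == $$);
my $same_name = ($PROGRAM_NAME eq $0);
my $same_os   = ($OSNAME       eq $^O);

print "pid $PROCESS_ID is running $PROGRAM_NAME on $OSNAME\n";
```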
Before you continue, note the sort order for variables. In general, we
first list the variables in case-insensitive, almost-lexicographical
order (ignoring the C<{> or C<^> preceding words, as in C<${^UNICODE}>
or C<$^T>), although C<$_> and C<@_> move up to the top of the pile.
For variables with the same identifier, we list them in order of scalar,
array, hash, and bareword.
=head2 General Variables
=over 8
=item $ARG
=item $_
X<$_> X<$ARG>
The default input and pattern-searching space. The following pairs are
equivalent:
while (<>) {...} # equivalent only in while!
while (defined($_ = <>)) {...}
/^Subject:/
$_ =~ /^Subject:/
tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/
chomp
chomp($_)
Here are the places where Perl will assume C<$_> even if you don't use it:
=over 3
=item *
The following functions use C<$_> as a default argument:
abs, alarm, chomp, chop, chr, chroot,
cos, defined, eval, evalbytes, exp, fc, glob, hex, int, lc,
lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, printf,
quotemeta, readlink, readpipe, ref, require, reverse (in scalar context only),
rmdir, say, sin, split (for its second
argument), sqrt, stat, study, uc, ucfirst,
unlink, unpack.
=item *
All file tests (C<-f>, C<-d>) except for C<-t>, which defaults to STDIN.
See L<perlfunc/-X>.
=item *
The pattern matching operations C<m//>, C<s///> and C<tr///> (aka C<y///>)
when used without an C<=~> operator.
=item *
The default iterator variable in a C<foreach> loop if no other
variable is supplied.
=item *
The implicit iterator variable in the C<grep()> and C<map()> functions.
=item *
The implicit variable of C<given()>.
=item *
The default place to put the next value or input record
when a C<< <FH> >>, C<readline>, C<readdir> or C<each>
operation's result is tested by itself as the sole criterion of a C<while>
test. Outside a C<while> test, this will not happen.
=back
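Several of the cases listed above can be seen in one short sketch. The word list is made up for the example.

```perl
use strict;
use warnings;

my @words = ('apple', 'Banana', 'cherry');

# map and grep alias each element to $_; uc and m// then default to it.
my @upper  = map  { uc } @words;
my @with_a = grep { /a/ } @words;

# foreach without a loop variable uses $_; length defaults to $_.
my $total = 0;
for (@words) {
    $total += length;
}

print "@upper\n";
print scalar(@with_a), " words contain 'a'; $total characters in all\n";
```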
C<$_> is by default a global variable. However, as
of perl v5.10.0, you can use a lexical version of
C<$_> by declaring it in a file or in a block with C<my>. Moreover,
declaring C<our $_> restores the global C<$_> in the current scope. Though
this seemed like a good idea at the time it was introduced, lexical C<$_>
actually causes more problems than it solves. If you call a function that
expects to be passed information via C<$_>, it may or may not work,
depending on how the function is written, and there is no easy way to
solve this. Just avoid lexical C<$_>, unless you are feeling particularly
masochistic. For this reason lexical C<$_> is still experimental and will
produce a warning unless warnings have been disabled. As with other
experimental features, the behavior of lexical C<$_> is subject to change
without notice, including change into a fatal error.
Mnemonic: underline is understood in certain operations.
=item @ARG
=item @_
X<@_> X<@ARG>
Within a subroutine the array C<@_> contains the parameters passed to
that subroutine. Inside a subroutine, C<@_> is the default array for
the array operators C<pop> and C<shift>.
See L<perlsub>.
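A minimal sketch of C<shift> defaulting to C<@_> inside a subroutine. The subroutine name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: shift with no argument takes from @_.
sub describe_args {
    my $first     = shift;        # same as shift(@_)
    my $remaining = scalar @_;    # what is left after the shift
    return "first=$first, remaining=$remaining";
}

my $summary = describe_args('a', 'b', 'c');
print "$summary\n";
```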
=item $LIST_SEPARATOR
=item $"
X<$"> X<$LIST_SEPARATOR>
When an array or an array slice is interpolated into a double-quoted
string or a similar context such as C</.../>, its elements are
separated by this value. Default is a space. For example, this:
print "The array is: @array\n";
is equivalent to this:
print "The array is: " . join($", @array) . "\n";
Mnemonic: works in double-quoted context.
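The separator is usually changed with C<local>, so that the default comes back at the end of the enclosing block. A short sketch, with a made-up array:

```perl
use strict;
use warnings;

my @array = ('red', 'green', 'blue');

# Default separator: a single space.
my $default = "The array is: @array";

# Temporarily use a comma and a space instead.
my $custom = do {
    local $" = ', ';
    "The array is: @array";
};

print "$default\n$custom\n";
```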
=item $PROCESS_ID
=item $PID
=item $$
X<$$> X<$PID> X<$PROCESS_ID>
The process number of the Perl running this script. Though you I<can> set
this variable, doing so is generally discouraged, although it can be
invaluable for some testing purposes. It will be reset automatically
across C<fork()> calls.
Note for Linux and Debian GNU/kFreeBSD users: Before Perl v5.16.0 perl
would emulate POSIX semantics on Linux systems using LinuxThreads, a
partial implementation of POSIX Threads that has since been superseded
by the Native POSIX Thread Library (NPTL).
LinuxThreads is now obsolete on Linux, and caching C<getpid()>
like this made embedding perl unnecessarily complex (since you'd have
to manually update the value of $$), so now C<$$> and C<getppid()>
will always return the same values as the underlying C library.
Debian GNU/kFreeBSD systems also used LinuxThreads up until and
including the 6.0 release, but after that moved to FreeBSD thread
semantics, which are POSIX-like.
To see if your system is affected by this discrepancy check if
C<getconf GNU_LIBPTHREAD_VERSION | grep -q NPTL> returns a false
value. NPTL threads preserve the POSIX semantics.
Mnemonic: same as shells.
=item $PROGRAM_NAME
=item $0
X<$0> X<$PROGRAM_NAME>
Contains the name of the program being executed.
On some (but not all) operating systems assigning to C<$0> modifies
the argument area that the C<ps> program sees. On some platforms you
may have to use special C<ps> options or a different C<ps> to see the
changes. Modifying C<$0> is more useful as a way of indicating the
current program state than it is for hiding the program you're
running.
Note that there are platform-specific limitations on the maximum
length of C<$0>. In the most extreme case it may be limited to the
space occupied by the original C<$0>.
On some platforms there may be an arbitrary amount of padding, for
example space characters, after the modified name as shown by C<ps>.
On some platforms this padding may extend all the way to the original
length of the argument area, no matter what you do (this is the case
for example with Linux 2.2).
Note for BSD users: setting C<$0> does not completely remove "perl"
from the ps(1) output. For example, setting C<$0> to C<"foobar"> may
result in C<"perl: foobar (perl)"> (whether both the C<"perl: "> prefix
and the " (perl)" suffix are shown depends on your exact BSD variant
and version). This is an operating system feature; Perl cannot help it.
In multithreaded scripts Perl coordinates the threads so that any
thread may modify its copy of the C<$0> and the change becomes visible
to ps(1) (assuming the operating system plays along). Note that
the view of C<$0> the other threads have will not change since they
have their own copies of it.
If the program has been given to perl via the switches C<-e> or C<-E>,
C<$0> will contain the string C<"-e">.
On Linux as of perl v5.14.0 the legacy process name will be set with
C<prctl(2)>, in addition to altering the POSIX name via C<argv[0]> as
perl has done since version 4.000. Now system utilities that read the
legacy process name such as ps, top and killall will recognize the
name you set when assigning to C<$0>. The string you supply will be
cut off at 16 bytes; this is a limitation imposed by Linux.
Mnemonic: same as B<sh> and B<ksh>.
=item $REAL_GROUP_ID
=item $GID
=item $(
X<$(> X<$GID> X<$REAL_GROUP_ID>
The real gid of this process. If you are on a machine that supports
membership in multiple groups simultaneously, gives a space-separated
list of groups you are in. The first number is the one returned by
C<getgid()>, and the subsequent ones by C<getgroups()>, one of which may be
the same as the first number.
However, a value assigned to C<$(> must be a single number used to
set the real gid. So the value given by C<$(> should I<not> be assigned
back to C<$(> without being forced numeric, such as by adding zero. Note
that this is different to the effective gid (C<$)>) which does take a
list.
You can change both the real gid and the effective gid at the same
time by using C<POSIX::setgid()>. Changes
to C<$(> require a check to C<$!>
to detect any possible errors after an attempted change.
Mnemonic: parentheses are used to I<group> things. The real gid is the
group you I<left>, if you're running setgid.
=item $EFFECTIVE_GROUP_ID
=item $EGID
=item $)
X<$)> X<$EGID> X<$EFFECTIVE_GROUP_ID>
The effective gid of this process. If you are on a machine that
supports membership in multiple groups simultaneously, gives a
space-separated list of groups you are in. The first number is the one
returned by C<getegid()>, and the subsequent ones by C<getgroups()>,
one of which may be the same as the first number.
Similarly, a value assigned to C<$)> must also be a space-separated
list of numbers. The first number sets the effective gid, and
the rest (if any) are passed to C<setgroups()>. To get the effect of an
empty list for C<setgroups()>, just repeat the new effective gid; that is,
to force an effective gid of 5 and an effectively empty C<setgroups()>
list, say C< $) = "5 5" >.
You can change both the effective gid and the real gid at the same
time by using C<POSIX::setgid()> (use only a single numeric argument).
Changes to C<$)> require a check to C<$!> to detect any possible errors
after an attempted change.
C<< $< >>, C<< $> >>, C<$(> and C<$)> can be set only on
machines that support the corresponding I<set[re][ug]id()> routine. C<$(>
and C<$)> can be swapped only on machines supporting C<setregid()>.
Mnemonic: parentheses are used to I<group> things. The effective gid
is the group that's I<right> for you, if you're running setgid.
=item $REAL_USER_ID
=item $UID
=item $<
X<< $< >> X<$UID> X<$REAL_USER_ID>
The real uid of this process. You can change both the real uid and the
effective uid at the same time by using C<POSIX::setuid()>. Since
changes to C<< $< >> require a system call, check C<$!> after a change
attempt to detect any possible errors.
Mnemonic: it's the uid you came I<from>, if you're running setuid.
=item $EFFECTIVE_USER_ID
=item $EUID
=item $>
X<< $> >> X<$EUID> X<$EFFECTIVE_USER_ID>
The effective uid of this process. For example:
$< = $>; # set real to effective uid
($<,$>) = ($>,$<); # swap real and effective uids
You can change both the effective uid and the real uid at the same
time by using C<POSIX::setuid()>. Changes to C<< $> >> require a check
to C<$!> to detect any possible errors after an attempted change.
C<< $< >> and C<< $> >> can be swapped only on machines
supporting C<setreuid()>.
Mnemonic: it's the uid you went I<to>, if you're running setuid.
=item $SUBSCRIPT_SEPARATOR
=item $SUBSEP
=item $;
X<$;> X<$SUBSEP> X<SUBSCRIPT_SEPARATOR>
The subscript separator for multidimensional array emulation. If you
refer to a hash element as
$foo{$x,$y,$z}
it really means
$foo{join($;, $x, $y, $z)}
But don't put
@foo{$x,$y,$z} # a slice--note the @
which means
($foo{$x},$foo{$y},$foo{$z})
Default is "\034", the same as SUBSEP in B<awk>. If your keys contain
binary data there might not be any safe value for C<$;>.
Consider using "real" multidimensional arrays as described
in L<perllol>.
Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
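The difference between the emulated multidimensional key and a slice can be checked directly. This is a sketch only; the hash names are hypothetical.

```perl
use strict;
use warnings;

my %cell;
$cell{2,3} = 'treasure';          # one key, built as join($;, 2, 3)

my ($key)  = keys %cell;
my $sep    = $;;                  # copy the separator for use in a pattern
my @coords = split /\Q$sep\E/, $key;

# A slice, by contrast, addresses two separate keys.
my %flat = (2 => 'two', 3 => 'three');
my @pair = @flat{2,3};

print "coordinates: @coords\n";
print "slice: @pair\n";
```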
=item $a
=item $b
X<$a> X<$b>
Special package variables when using C<sort()>, see L<perlfunc/sort>.
Because of this specialness C<$a> and C<$b> don't need to be declared
(using C<use vars>, or C<our()>) even when using the C<strict 'vars'>
pragma. Don't lexicalize them with C<my $a> or C<my $b> if you want to
be able to use them in the C<sort()> comparison block or function.
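A short sketch of C<$a> and C<$b> in comparison blocks under C<use strict>, using made-up numbers:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

# $a and $b need no declaration, even under "use strict".
my @ascending  = sort { $a <=> $b } @numbers;
my @descending = sort { $b <=> $a } @numbers;

print "@ascending\n@descending\n";
```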
=item %ENV
X<%ENV>
The hash C<%ENV> contains your current environment. Setting a
value in C<%ENV> changes the environment for any child processes
you subsequently C<fork()> off.
As of v5.18.0, both keys and values stored in C<%ENV> are stringified.
my $foo = 1;
$ENV{'bar'} = \$foo;
if( ref $ENV{'bar'} ) {
say "Pre 5.18.0 Behaviour";
} else {
say "Post 5.18.0 Behaviour";
}
Previously, only child processes received stringified values:
my $foo = 1;
$ENV{'bar'} = \$foo;
# Always printed 'non ref'
system($^X, '-e',
q/print ( ref $ENV{'bar'} ? 'ref' : 'non ref' ) /);
This happens because you can't really share arbitrary data structures with
foreign processes.
=item $OLD_PERL_VERSION
=item $]
X<$]> X<$OLD_PERL_VERSION>
The revision, version, and subversion of the Perl interpreter, represented
as a decimal of the form 5.XXXYYY, where XXX is the version / 1e3 and YYY
is the subversion / 1e6. For example, Perl v5.10.1 would be "5.010001".
This variable can be used to determine whether the Perl interpreter
executing a script is in the right range of versions:
warn "No PerlIO!\n" if $] lt '5.008';
When comparing C<$]>, string comparison operators are B<highly
recommended>. The inherent limitations of binary floating point
representation can sometimes lead to incorrect comparisons for some
numbers on some architectures.
See also the documentation of C<use VERSION> and C<require VERSION>
for a convenient way to fail if the running Perl interpreter is too old.
See L</$^V> for a representation of the Perl version as a L<version>
object, which allows more flexible string comparisons.
The main advantage of C<$]> over C<$^V> is that it works the same on any
version of Perl. The disadvantages are that it can't easily be compared
to versions in other formats (e.g. literal v-strings, "v1.2.3" or
version objects) and numeric comparisons can occasionally fail; it's good
for string literal version checks and bad for comparing to a variable
that hasn't been sanity-checked.
The C<$OLD_PERL_VERSION> form was added in Perl v5.20.0 for historical
reasons but its use is discouraged. (If your reason to use C<$]> is to
run code on old perls then referring to it as C<$OLD_PERL_VERSION> would
be self-defeating.)
Mnemonic: Is this version of perl in the right bracket?
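The 5.XXXYYY form can be reproduced with C<sprintf>, which may help when building a string to compare against C<$]>. The helper name below is hypothetical, and the sketch assumes each component is less than 1000.

```perl
use strict;
use warnings;

# Hypothetical helper: build the 5.XXXYYY form from three components.
sub old_style_version {
    my ($revision, $version, $subversion) = @_;
    return sprintf '%d.%03d%03d', $revision, $version, $subversion;
}

my $example = old_style_version(5, 10, 1);

# Compare as strings, as this entry recommends.
my $has_perlio = $] ge '5.008';

print "v5.10.1 is written $example; this interpreter reports $]\n";
```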
=item $SYSTEM_FD_MAX
=item $^F
X<$^F> X<$SYSTEM_FD_MAX>
The maximum system file descriptor, ordinarily 2. System file
descriptors are passed to C<exec()>ed processes, while higher file
descriptors are not. Also, during an
C<open()>, system file descriptors are
preserved even if the C<open()> fails (ordinary file descriptors are
closed before the C<open()> is attempted). The close-on-exec
status of a file descriptor will be decided according to the value of
C<$^F> when the corresponding file, pipe, or socket was opened, not the
time of the C<exec()>.
=item @F
X<@F>
The array C<@F> contains the fields of each line read in when autosplit
mode is turned on. See L<perlrun> for the B<-a> switch. This array
is package-specific, and must be declared or given a full package name
if not in package main when running under C<strict 'vars'>.
=item @INC
X<@INC>
The array C<@INC> contains the list of places that the C<do EXPR>,
C<require>, or C<use> constructs look for their library files. It
initially consists of the arguments to any B<-I> command-line
switches, followed by the default Perl library, probably
F</usr/local/lib/perl>, followed by ".", to represent the current
directory. ("." will not be appended if taint checks are enabled,
either by C<-T> or by C<-t>, or if configured not to do so by the
C<-Ddefault_inc_excludes_dot> compile time option.) If you need to
modify this at runtime, you should use the C<use lib> pragma to get
the machine-dependent library properly loaded also:
use lib '/mypath/libdir/';
use SomeMod;
You can also insert hooks into the file inclusion system by putting Perl
code directly into C<@INC>. Those hooks may be subroutine references,
array references or blessed objects. See L<perlfunc/require> for details.
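As a rough sketch of a subroutine-reference hook, the code below serves one module from memory rather than from disk. The module name, its contents, and the C<%virtual_file> hash are all hypothetical; see L<perlfunc/require> for the full set of values a hook may return.

```perl
use strict;
use warnings;

# Hypothetical module source kept in memory instead of on disk.
my %virtual_file = (
    'Hypothetical/Greeter.pm' => <<'END_SOURCE',
package Hypothetical::Greeter;
sub greet { return 'hello from a hook' }
1;
END_SOURCE
);

# The hook receives itself and the requested file name.
unshift @INC, sub {
    my ($self, $file) = @_;
    return unless exists $virtual_file{$file};   # decline; keep searching
    my $source = $virtual_file{$file};
    open my $fh, '<', \$source or die "cannot open in-memory file: $!";
    return $fh;                                  # require reads from this
};

require Hypothetical::Greeter;
my $greeting = Hypothetical::Greeter::greet();
print "$greeting\n";
```

Returning nothing for unknown files lets C<require> continue with the remaining entries of C<@INC>.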
=item %INC
X<%INC>
The hash C<%INC> contains entries for each filename included via the
C<do>, C<require>, or C<use> operators. The key is the filename
you specified (with module names converted to pathnames), and the
value is the location of the file found. The C<require>
operator uses this hash to determine whether a particular file has
already been included.
If the file was loaded via a hook (e.g. a subroutine reference, see
L<perlfunc/require> for a description of these hooks), this hook is
by default inserted into C<%INC> in place of a filename. Note, however,
that the hook may have set the C<%INC> entry by itself to provide some more
specific info.
=item $INPLACE_EDIT
=item $^I
X<$^I> X<$INPLACE_EDIT>
The current value of the inplace-edit extension. Use C<undef> to disable
inplace editing.
Mnemonic: value of B<-i> switch.
=item @ISA
X<@ISA>
Each package contains a special array called C<@ISA> which contains a list
of that class's parent classes, if any. This array is simply a list of
scalars, each of which is a string that corresponds to a package name. The
array is examined when Perl does method resolution, which is covered in
L<perlobj>.
To load packages while adding them to C<@ISA>, see the L<parent> pragma. The
discouraged L<base> pragma does this as well, but should not be used except
when compatibility with the discouraged L<fields> pragma is required.
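When both classes live in the same file, C<@ISA> can simply be assigned. A minimal sketch with hypothetical class names:

```perl
use strict;
use warnings;

package Hypothetical::Animal;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub name  { return $_[0]{name} }
sub speak { return 'some noise' }

package Hypothetical::Dog;
our @ISA = ('Hypothetical::Animal');   # parent class, as a package name
sub speak { return 'Woof' }

package main;

# new() and name() are found through @ISA; speak() is overridden.
my $dog = Hypothetical::Dog->new(name => 'Rex');
print $dog->name, " says ", $dog->speak, "\n";
```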
=item $^M
X<$^M>
By default, running out of memory is an untrappable, fatal error.
However, if suitably built, Perl can use the contents of C<$^M>
as an emergency memory pool after C<die()>ing. Suppose that your Perl
were compiled with C<-DPERL_EMERGENCY_SBRK> and used Perl's malloc.
Then
$^M = 'a' x (1 << 16);
would allocate a 64K buffer for use in an emergency. See the
F<INSTALL> file in the Perl distribution for information on how to
add custom C compilation flags when compiling perl. To discourage casual
use of this advanced feature, there is no L<English|English> long name for
this variable.
This variable was added in Perl 5.004.
=item $OSNAME
=item $^O
X<$^O> X<$OSNAME>
The name of the operating system under which this copy of Perl was
built, as determined during the configuration process. For examples
see L<perlport/PLATFORMS>.
The value is identical to C<$Config{'osname'}>. See also L<Config>
and the B<-V> command-line switch documented in L<perlrun>.
On Windows platforms, C<$^O> is not very helpful: since it is always
C<MSWin32>, it doesn't tell the difference between
95/98/ME/NT/2000/XP/CE/.NET. Use C<Win32::GetOSName()> or
C<Win32::GetOSVersion()> (see L<Win32> and L<perlport>) to distinguish
between the variants.
This variable was added in Perl 5.003.
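A small sketch of branching on C<$^O>. The C<%family> table and its groupings
are illustrative assumptions, not an exhaustive list of platforms:

```perl
use strict;
use warnings;
use Config;

# Hypothetical mapping from a few $^O values to a coarse family name.
my %family = (
    MSWin32 => 'windows',
    cygwin  => 'windows',
    darwin  => 'unix',
    linux   => 'unix',
);

my $os_family      = $family{$^O} // 'other';
my $same_as_config = $^O eq $Config{osname} ? 1 : 0;

print "Running on $^O ($os_family)\n";
```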
=item %SIG
X<%SIG>
The hash C<%SIG> contains signal handlers for signals. For example:
sub handler { # 1st argument is signal name
    my($sig) = @_;
    print "Caught a SIG$sig--shutting down\n";
    close(LOG);
    exit(0);
}
$SIG{'INT'} = \&handler;
$SIG{'QUIT'} = \&handler;
...
$SIG{'INT'} = 'DEFAULT'; # restore default action
$SIG{'QUIT'} = 'IGNORE'; # ignore SIGQUIT
Using a value of C<'IGNORE'> usually has the effect of ignoring the
signal, except for the C<CHLD> signal. See L<perlipc> for more about
this special case.
Here are some other examples:
$SIG{"PIPE"} = "Plumber"; # assumes main::Plumber (not
# recommended)
$SIG{"PIPE"} = \&Plumber; # just fine; assume current
# Plumber
$SIG{"PIPE"} = *Plumber; # somewhat esoteric
$SIG{"PIPE"} = Plumber(); # oops, what did Plumber()
# return??
Be sure not to use a bareword as the name of a signal handler,
lest you inadvertently call it.
If your system has the C<sigaction()> function then signal handlers
are installed using it. This means you get reliable signal handling.
The default delivery policy of signals changed in Perl v5.8.0 from
immediate (also known as "unsafe") to deferred, also known as "safe
signals". See L<perlipc> for more information.
Certain internal hooks can also be set using the C<%SIG> hash. The
routine indicated by C<$SIG{__WARN__}> is called when a warning
message is about to be printed. The warning message is passed as the
first argument. The presence of a C<__WARN__> hook causes the
ordinary printing of warnings to C<STDERR> to be suppressed. You can
use this to save warnings in a variable, or turn warnings into fatal
errors, like this:
local $SIG{__WARN__} = sub { die $_[0] };
eval $proggie;
As the C<'IGNORE'> hook is not supported by C<__WARN__>, you can
disable warnings using the empty subroutine:
local $SIG{__WARN__} = sub {};
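As a further sketch, a localized C<__WARN__> hook can save warnings in a
variable instead of printing them. The array name C<@captured> is an
assumption for illustration:

```perl
use strict;
use warnings;

my @captured;
{
    # Collect warnings in @captured instead of printing them to STDERR.
    local $SIG{__WARN__} = sub { push @captured, $_[0] };
    warn "first\n";
    warn "second\n";
}
# Outside the block the previous handler (usually none) is back in place.
my $count = @captured;
```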
The routine indicated by C<$SIG{__DIE__}> is called when a fatal
exception is about to be thrown. The error message is passed as the
first argument. When a C<__DIE__> hook routine returns, the exception
processing continues as it would have in the absence of the hook,
unless the hook routine itself exits via a C<goto &sub>, a loop exit,
or a C<die()>. The C<__DIE__> handler is explicitly disabled during
the call, so that you can die from a C<__DIE__> handler. Similarly
for C<__WARN__>.
The C<$SIG{__DIE__}> hook is called even inside an C<eval()>. It was
never intended to happen this way, but an implementation glitch made
this possible. This used to be deprecated, as it allowed strange action
at a distance like rewriting a pending exception in C<$@>. Plans to
rectify this have been scrapped, as users found that rewriting a
pending exception is actually a useful feature, and not a bug.
C<__DIE__>/C<__WARN__> handlers are very special in one respect: they
may be called to report (probable) errors found by the parser. In such
a case the parser may be in an inconsistent state, so any attempt to
evaluate Perl code from such a handler will probably result in a
segfault. This means that handlers for warnings or errors that result
from parsing Perl should be written with extreme caution, like this:
require Carp if defined $^S;
Carp::confess("Something wrong") if defined &Carp::confess;
die "Something wrong, but could not load Carp to give "
. "backtrace...\n\t"
. "To see backtrace try starting Perl with -MCarp switch";
Here the first line will load C<Carp> I<unless> it was the parser that
called the handler. The second line will print a backtrace and die if
C<Carp> was available. The third line will be executed only if C<Carp> was
not available.
Having to even think about the C<$^S> variable in your exception
handlers is simply wrong. C<$SIG{__DIE__}> as currently implemented
invites grievous and difficult-to-track-down errors. Avoid it
and use an C<END{}> or C<CORE::GLOBAL::die> override instead.
See L<perlfunc/die>, L<perlfunc/warn>, L<perlfunc/eval>, and
L<warnings> for additional information.
=item $BASETIME
=item $^T
X<$^T> X<$BASETIME>
The time at which the program began running, in seconds since the
epoch (beginning of 1970). The values returned by the B<-M>, B<-A>,
and B<-C> filetests are based on this value.
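A brief sketch of using C<$^T>. The use of C<$0> as a file to test is only an
assumption that the script's own file exists on disk:

```perl
use strict;
use warnings;

my $started = scalar localtime $^T;   # human-readable start time
my $elapsed = time() - $^T;           # whole seconds since the program began

# -M reports a file's age in days relative to $^T, not to the current time,
# so a file modified after the program started has a negative age.
my $age_days = -e $0 ? -M $0 : undef;

print "Started at $started, running for $elapsed second(s)\n";
```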
=item $PERL_VERSION
=item $^V
X<$^V> X<$PERL_VERSION>
The revision, version, and subversion of the Perl interpreter,
represented as a L<version> object.
This variable first appeared in perl v5.6.0; earlier versions of perl
will see an undefined value. Before perl v5.10.0 C<$^V> was represented
as a v-string rather than a L<version> object.
C<$^V> can be used to determine whether the Perl interpreter executing
a script is in the right range of versions. For example:
warn "Hashes not randomized!\n" if !$^V or $^V lt v5.8.1;
While version objects overload stringification, to portably convert
C<$^V> into its string representation, use C<sprintf()>'s C<"%vd">
conversion, which works for both v-strings and version objects:
printf "version is v%vd\n", $^V; # Perl's version
See the documentation of C<use VERSION> and C<require VERSION>
for a convenient way to fail if the running Perl interpreter is too old.
See also C<L</$]>> for a decimal representation of the Perl version.
The main advantage of C<$^V> over C<$]> is that, for Perl v5.10.0 or
later, it overloads operators, allowing easy comparison against other
version representations (e.g. decimal, literal v-string, "v1.2.3", or
objects). The disadvantage is that prior to v5.10.0, it was only a
literal v-string, which can't be easily printed or compared, whereas
the behavior of C<$]> is unchanged on all versions of Perl.
Mnemonic: use ^V for a version object.
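A short sketch that puts the three representations side by side. The variable
names are assumptions, and the comparison relies on C<$^V> being a version
object, which holds for Perl v5.10.0 or later:

```perl
use strict;
use warnings;

# $^V overloads comparison operators, so it compares directly with a v-string.
my $modern  = $^V ge v5.10.0 ? 1 : 0;

# "%vd" is the portable way to turn $^V into text.
my $as_text = sprintf 'v%vd', $^V;

# $] holds the same version as a decimal number.
my $decimal = $];

print "$as_text is also $decimal\n";
```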
=item ${^WIN32_SLOPPY_STAT}
X<${^WIN32_SLOPPY_STAT}> X<sitecustomize> X<sitecustomize.pl>
If this variable is set to a true value, then C<stat()> on Windows will
not try to open the file. This means that the link count cannot be
determined and file attributes may be out of date if additional
hardlinks to the file exist. On the other hand, not opening the file
is considerably faster, especially for files on network drives.
This variable could be set in the F<sitecustomize.pl> file to
configure the local Perl installation to use "sloppy" C<stat()> by
default. See the documentation for B<-f> in
L<perlrun|perlrun/"Command Switches"> for more information about site
customization.
This variable was added in Perl v5.10.0.
=item $EXECUTABLE_NAME
=item $^X
X<$^X> X<$EXECUTABLE_NAME>
The name used to execute the current copy of Perl, from C's
C<argv[0]> or (where supported) F</proc/self/exe>.
Depending on the host operating system, the value of C<$^X> may be
a relative or absolute pathname of the perl program file, or may
be the string used to invoke perl but not the pathname of the
perl program file. Also, most operating systems permit invoking
programs that are not in the PATH environment variable, so there
is no guarantee that the value of C<$^X> is in PATH. For VMS, the
value may or may not include a version number.
You can usually use the value of C<$^X> to re-invoke an independent
copy of the same perl that is currently running, e.g.,
@first_run = `$^X -le "print int rand 100 for 1..100"`;
But recall that not all operating systems support forking or
capturing of the output of commands, so this complex statement
may not be portable.
It is not safe to use the value of C<$^X> as a path name of a file,
as some operating systems that have a mandatory suffix on
executable files do not require use of the suffix when invoking
a command. To convert the value of C<$^X> to a path name, use the
following statements:
# Build up a set of file names (not command names).
use Config;
my $this_perl = $^X;
if ($^O ne 'VMS') {
    $this_perl .= $Config{_exe}
        unless $this_perl =~ m/$Config{_exe}$/i;
}
Because many operating systems permit anyone with read access to
the Perl program file to make a copy of it, patch the copy, and
then execute the copy, the security-conscious Perl programmer
should take care to invoke the installed copy of perl, not the
copy referenced by C<$^X>. The following statements accomplish
this goal, and produce a pathname that can be invoked as a
command or referenced as a file.
use Config;
my $secure_perl_path = $Config{perlpath};
if ($^O ne 'VMS') {
    $secure_perl_path .= $Config{_exe}
        unless $secure_perl_path =~ m/$Config{_exe}$/i;
}
=back
=head2 Variables related to regular expressions
Most of the special variables related to regular expressions are side
effects. Perl sets these variables when it has a successful match, so
you should check the match result before using them. For instance:
if( /P(A)TT(ER)N/ ) {
    print "I found $1 and $2\n";
}
These variables are read-only and dynamically-scoped, unless we note
otherwise.
The dynamic nature of the regular expression variables means that
their value is limited to the block that they are in, as demonstrated
by this bit of code:
my $outer = 'Wallace and Grommit';
my $inner = 'Mutt and Jeff';
my $pattern = qr/(\S+) and (\S+)/;
sub show_n { print "\$1 is $1; \$2 is $2\n" }
{
    OUTER:
    show_n() if $outer =~ m/$pattern/;
    INNER: {
        show_n() if $inner =~ m/$pattern/;
    }
    show_n();
}
The output shows that while in the C<OUTER> block, the values of C<$1>
and C<$2> are from the match against C<$outer>. Inside the C<INNER>
block, the values of C<$1> and C<$2> are from the match against
C<$inner>, but only until the end of the block (i.e. the dynamic
scope). After the C<INNER> block completes, the values of C<$1> and
C<$2> return to the values for the match against C<$outer> even though
we have not made another match:
$1 is Wallace; $2 is Grommit
$1 is Mutt; $2 is Jeff
$1 is Wallace; $2 is Grommit
=head3 Performance issues
Traditionally in Perl, any use of any of the three variables C<$`>, C<$&>
or C<$'> (or their C<use English> equivalents) anywhere in the code caused
all subsequent successful pattern matches to make a copy of the matched
string, in case the code might subsequently access one of those variables.
This imposed a considerable performance penalty across the whole program,
so generally the use of these variables has been discouraged.
In Perl 5.6.0 the C<@-> and C<@+> dynamic arrays were introduced that
supply the indices of successful matches. So you could for example do
this:
$str =~ /pattern/;
print $`, $&, $'; # bad: performance hit
print # good: no performance hit
substr($str, 0, $-[0]),
substr($str, $-[0], $+[0]-$-[0]),
substr($str, $+[0]);
In Perl 5.10.0 the C</p> match operator flag and the C<${^PREMATCH}>,
C<${^MATCH}>, and C<${^POSTMATCH}> variables were introduced, which allowed
you to incur the penalties only on patterns marked with C</p>.
In Perl 5.18.0 onwards, perl started noting the presence of each of the
three variables separately, and only copied that part of the string
required; so in
$`; $&; "abcdefgh" =~ /d/
perl would only copy the "abcd" part of the string. That could make a big
difference in something like
$str = 'x' x 1_000_000;
$&; # whoops
$str =~ /x/g # one char copied a million times, not a million chars
In Perl 5.20.0 a new copy-on-write system was enabled by default, which
finally fixes all performance issues with these three variables, and makes
them safe to use anywhere.
The C<Devel::NYTProf> and C<Devel::FindAmpersand> modules can help you
find uses of these problematic match variables in your code.
=over 8
=item $<I<digits>> ($1, $2, ...)
X<$1> X<$2> X<$3> X<$I<digits>>
Contains the subpattern from the corresponding set of capturing
parentheses from the last successful pattern match, not counting patterns
matched in nested blocks that have been exited already.
Note there is a distinction between a capture buffer which matches
the empty string and a capture buffer which is optional, e.g. C<(x?)> and
C<(x)?>. The latter may be undef, the former not.
These variables are read-only and dynamically-scoped.
Mnemonic: like \digits.
=item @{^CAPTURE}
X<@{^CAPTURE}> X<@^CAPTURE>
An array which exposes the contents of the capture buffers, if any, of
the last successful pattern match, not counting patterns matched
in nested blocks that have been exited already.
Note that the 0 index of C<@{^CAPTURE}> is equivalent to C<$1>, the 1 index
is equivalent to C<$2>, etc.
if ("foal"=~/(.)(.)(.)(.)/) {
    print join "-", @{^CAPTURE};
}
should output "f-o-a-l".
See also L</$I<digits>>, L</%{^CAPTURE}> and L</%{^CAPTURE_ALL}>.
Note that unlike most other regex magic variables there is no single
letter equivalent to C<@{^CAPTURE}>.
This variable was added in 5.25.7.
=item $MATCH
=item $&
X<$&> X<$MATCH>
The string matched by the last successful pattern match (not counting
any matches hidden within a BLOCK or C<eval()> enclosed by the current
BLOCK).
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
This variable is read-only and dynamically-scoped.
Mnemonic: like C<&> in some editors.
=item ${^MATCH}
X<${^MATCH}>
This is similar to C<$&> (C<$MATCH>) except that it does not incur the
performance penalty associated with that variable.
See L</Performance issues> above.
In Perl v5.18 and earlier, it is only guaranteed
to return a defined value when the pattern was compiled or executed with
the C</p> modifier. In Perl v5.20, the C</p> modifier does nothing, so
C<${^MATCH}> does the same thing as C<$MATCH>.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
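A minimal sketch of the C</p> variables, using the same sample string as the
C<$'> example. The variable names are assumptions for illustration:

```perl
use strict;
use warnings;

my $text = 'abcdefghi';
my ($pre, $match, $post);

# /p is a no-op from v5.20 on, but keeps the code correct on older perls.
if ($text =~ /def/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "$pre:$match:$post\n";
```

The values are copied inside the C<if> block because, like the other match
variables, they reflect only the last successful match in the current
dynamic scope.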
=item $PREMATCH
=item $`
X<$`> X<$PREMATCH> X<${^PREMATCH}>
The string preceding whatever was matched by the last successful
pattern match, not counting any matches hidden within a BLOCK or C<eval>
enclosed by the current BLOCK.
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
This variable is read-only and dynamically-scoped.
Mnemonic: C<`> often precedes a quoted string.
=item ${^PREMATCH}
X<$`> X<${^PREMATCH}>
This is similar to C<$`> (C<$PREMATCH>) except that it does not incur the
performance penalty associated with that variable.
See L</Performance issues> above.
In Perl v5.18 and earlier, it is only guaranteed
to return a defined value when the pattern was compiled or executed with
the C</p> modifier. In Perl v5.20, the C</p> modifier does nothing, so
C<${^PREMATCH}> does the same thing as C<$PREMATCH>.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
=item $POSTMATCH
=item $'
X<$'> X<$POSTMATCH> X<${^POSTMATCH}> X<@->
The string following whatever was matched by the last successful
pattern match (not counting any matches hidden within a BLOCK or C<eval()>
enclosed by the current BLOCK). Example:
local $_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
This variable is read-only and dynamically-scoped.
Mnemonic: C<'> often follows a quoted string.
=item ${^POSTMATCH}
X<${^POSTMATCH}> X<$'> X<$POSTMATCH>
This is similar to C<$'> (C<$POSTMATCH>) except that it does not incur the
performance penalty associated with that variable.
See L</Performance issues> above.
In Perl v5.18 and earlier, it is only guaranteed
to return a defined value when the pattern was compiled or executed with
the C</p> modifier. In Perl v5.20, the C</p> modifier does nothing, so
C<${^POSTMATCH}> does the same thing as C<$POSTMATCH>.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
=item $LAST_PAREN_MATCH
=item $+
X<$+> X<$LAST_PAREN_MATCH>
The text matched by the last bracket of the last successful search pattern.
This is useful if you don't know which one of a set of alternative patterns
matched. For example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
This variable is read-only and dynamically-scoped.
Mnemonic: be positive and forward looking.
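To make the alternation case concrete, here is a small sketch that runs the
pattern above against two sample lines. The input strings and C<@found> are
assumptions for illustration:

```perl
use strict;
use warnings;

my @found;
for my $line ('Version: 1.2', 'Revision: 42') {
    # Only one of the two groups takes part in each match;
    # $+ returns whichever one it was.
    push @found, $+ if $line =~ /Version: (.*)|Revision: (.*)/;
}

print join(', ', @found), "\n";
```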
=item $LAST_SUBMATCH_RESULT
=item $^N
X<$^N> X<$LAST_SUBMATCH_RESULT>
The text matched by the used group most-recently closed (i.e. the group
with the rightmost closing parenthesis) of the last successful search
pattern.
This is primarily used inside C<(?{...})> blocks for examining text
recently matched. For example, to effectively capture text to a variable
(in addition to C<$1>, C<$2>, etc.), replace C<(...)> with
(?:(...)(?{ $var = $^N }))
Setting and then using C<$var> in this way relieves you from having to
worry about exactly which numbered set of parentheses it is.
This variable was added in Perl v5.8.0.
Mnemonic: the (possibly) Nested parenthesis that most recently closed.
=item @LAST_MATCH_END
=item @+
X<@+> X<@LAST_MATCH_END>
This array holds the offsets of the ends of the last successful
submatches in the currently active dynamic scope. C<$+[0]> is
the offset into the string of the end of the entire match. This
is the same value as what the C<pos> function returns when called
on the variable that was matched against. The I<n>th element
of this array holds the offset of the I<n>th submatch, so
C<$+[1]> is the offset past where C<$1> ends, C<$+[2]> the offset
past where C<$2> ends, and so on. You can use C<$#+> to determine
how many subgroups were in the last successful match. See the
examples given for the C<@-> variable.
This variable was added in Perl v5.6.0.
=item %{^CAPTURE}
=item %LAST_PAREN_MATCH
=item %+
X<%+> X<%LAST_PAREN_MATCH> X<%{^CAPTURE}>
Similar to C<@+>, the C<%+> hash allows access to the named capture
buffers, should they exist, in the last successful match in the
currently active dynamic scope.
For example, C<$+{foo}> is equivalent to C<$1> after the following match:
'foo' =~ /(?<foo>foo)/;
The keys of the C<%+> hash list only the names of buffers that have
captured (and that are thus associated to defined values).
The underlying behaviour of C<%+> is provided by the
L<Tie::Hash::NamedCapture> module.
B<Note:> C<%-> and C<%+> are tied views into a common internal hash
associated with the last successful regular expression. Therefore mixing
iterative access to them via C<each> may have unpredictable results.
Likewise, if the last successful match changes, then the results may be
surprising.
This variable was added in Perl v5.10.0. The C<%{^CAPTURE}> alias was
added in 5.25.7.
This variable is read-only and dynamically-scoped.
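A short sketch of reading named captures through C<%+>. The date string, the
group names and C<%parts> are assumptions for illustration:

```perl
use strict;
use warnings;

my $date = '2024-03-15';
my %parts;

if ($date =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    # Copy the captures now: %+ changes with the next successful match.
    %parts = %+;
}

print "$parts{day}/$parts{month}/$parts{year}\n";
```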
=item @LAST_MATCH_START
=item @-
X<@-> X<@LAST_MATCH_START>
C<$-[0]> is the offset of the start of the last successful match.
C<$-[>I<n>C<]> is the offset of the start of the substring matched by
the I<n>-th subpattern, or undef if the subpattern did not match.
Thus, after a match against C<$_>, C<$&> coincides with C<substr $_, $-[0],
$+[0] - $-[0]>. Similarly, $I<n> coincides with C<substr $_, $-[n],
$+[n] - $-[n]> if C<$-[n]> is defined, and $+ coincides with
C<substr $_, $-[$#-], $+[$#-] - $-[$#-]>. One can use C<$#-> to find the
last matched subgroup in the last successful match. Contrast with
C<$#+>, the number of subgroups in the regular expression. Compare
with C<@+>.
This array holds the offsets of the beginnings of the last
successful submatches in the currently active dynamic scope.
C<$-[0]> is the offset into the string of the beginning of the
entire match. The I<n>th element of this array holds the offset
of the I<n>th submatch, so C<$-[1]> is the offset where C<$1>
begins, C<$-[2]> the offset where C<$2> begins, and so on.
After a match against some variable C<$var>:
=over 5
=item C<$`> is the same as C<substr($var, 0, $-[0])>
=item C<$&> is the same as C<substr($var, $-[0], $+[0] - $-[0])>
=item C<$'> is the same as C<substr($var, $+[0])>
=item C<$1> is the same as C<substr($var, $-[1], $+[1] - $-[1])>
=item C<$2> is the same as C<substr($var, $-[2], $+[2] - $-[2])>
=item C<$3> is the same as C<substr($var, $-[3], $+[3] - $-[3])>
=back
This variable was added in Perl v5.6.0.
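The equivalences listed above can be sketched as runnable code. The sample
string and variable names are assumptions for illustration:

```perl
use strict;
use warnings;

my $var = 'The quick brown fox';
my ($pre, $whole, $post, $first, $second, $groups);

if ($var =~ /(quick) (brown)/) {
    $pre    = substr($var, 0, $-[0]);                # like $`
    $whole  = substr($var, $-[0], $+[0] - $-[0]);    # like $&
    $post   = substr($var, $+[0]);                   # like $'
    $first  = substr($var, $-[1], $+[1] - $-[1]);    # like $1
    $second = substr($var, $-[2], $+[2] - $-[2]);    # like $2
    $groups = $#+;                                   # capture groups in the pattern
}

print "[$pre][$whole][$post]\n";
```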
=item %{^CAPTURE_ALL}
X<%{^CAPTURE_ALL}>
=item %-
X<%->
Similar to C<%+>, this variable allows access to the named capture groups
in the last successful match in the currently active dynamic scope. To
each capture group name found in the regular expression, it associates a
reference to an array containing the list of values captured by all
buffers with that name (should there be several of them), in the order
in which they appear.
Here's an example:
if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
    foreach my $bufname (sort keys %-) {
        my $ary = $-{$bufname};
        foreach my $idx (0..$#$ary) {
            print "\$-{$bufname}[$idx] : ",
                (defined($ary->[$idx])
                    ? "'$ary->[$idx]'"
                    : "undef"),
                "\n";
        }
    }
}
would print out:
$-{A}[0] : '1'
$-{A}[1] : '3'
$-{B}[0] : '2'
$-{B}[1] : '4'
The keys of the C<%-> hash correspond to all buffer names found in
the regular expression.
The behaviour of C<%-> is implemented via the
L<Tie::Hash::NamedCapture> module.
B<Note:> C<%-> and C<%+> are tied views into a common internal hash
associated with the last successful regular expression. Therefore mixing
iterative access to them via C<each> may have unpredictable results.
Likewise, if the last successful match changes, then the results may be
surprising.
This variable was added in Perl v5.10.0. The C<%{^CAPTURE_ALL}> alias was
added in 5.25.7.
This variable is read-only and dynamically-scoped.
=item $LAST_REGEXP_CODE_RESULT
=item $^R
X<$^R> X<$LAST_REGEXP_CODE_RESULT>
The result of evaluation of the last successful C<(?{ code })>
regular expression assertion (see L<perlre>). May be written to.
This variable was added in Perl 5.005.
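One possible use, sketched under assumptions: each alternative ends in a code
block whose value identifies the branch, and C<$^R> is read after the match.
The tokens and the C<%kind_of> hash are hypothetical:

```perl
use strict;
use warnings;

my %kind_of;
for my $token ('42', 'abc') {
    # The value of the last code block on the successful path lands in $^R.
    if ($token =~ /^(?:\d+(?{ 'number' })|[a-z]+(?{ 'word' }))$/) {
        $kind_of{$token} = $^R;
    }
}

print "$_ is a $kind_of{$_}\n" for sort keys %kind_of;
```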
=item ${^RE_DEBUG_FLAGS}
X<${^RE_DEBUG_FLAGS}>
The current value of the regex debugging flags. Set to 0 for no debug output
even when the C<re 'debug'> module is loaded. See L<re> for details.
This variable was added in Perl v5.10.0.
=item ${^RE_TRIE_MAXBUF}
X<${^RE_TRIE_MAXBUF}>
Controls how certain regex optimisations are applied and how much memory they
utilize. This value by default is 65536 which corresponds to a 512kB
temporary cache. Set this to a higher value to trade
memory for speed when matching large alternations. Set
it to a lower value if you want the optimisations to
be as conservative of memory as possible but still occur, and set it to a
negative value to prevent the optimisation and conserve the most memory.
Under normal situations this variable should be of no interest to you.
This variable was added in Perl v5.10.0.
=back
=head2 Variables related to filehandles
Variables that depend on the currently selected filehandle may be set
by calling an appropriate object method on the C<IO::Handle> object,
although this is less efficient than using the regular built-in
variables. (Summary lines below for this contain the word HANDLE.)
First you must say
use IO::Handle;
after which you may use either
method HANDLE EXPR
or more safely,
HANDLE->method(EXPR)
Each method returns the old value of the C<IO::Handle> attribute. The
methods each take an optional EXPR, which, if supplied, specifies the
new value for the C<IO::Handle> attribute in question. If not
supplied, most methods do nothing to the current value--except for
C<autoflush()>, which will assume a 1 for you, just to be different.
Because loading in the C<IO::Handle> class is an expensive operation,
you should learn how to use the regular built-in variables.
A few of these variables are considered "read-only". This means that
if you try to assign to this variable, either directly or indirectly
through a reference, you'll raise a run-time exception.
You should be very careful when modifying the default values of most
special variables described in this document. In most cases you want
to localize these variables before changing them, since if you don't,
the change may affect other modules which rely on the default values
of the special variables that you have changed. This is one of the
correct ways to read the whole file at once:
open my $fh, "<", "foo" or die $!;
local $/; # enable localized slurp mode
my $content = <$fh>;
close $fh;
But the following code is quite bad:
open my $fh, "<", "foo" or die $!;
undef $/; # enable slurp mode
my $content = <$fh>;
close $fh;
since some other module may want to read data from some file in the
default "line mode", so if the code we have just presented has been
executed, the global value of C<$/> is now changed for any other code
running inside the same Perl interpreter.
Usually when a variable is localized you want to make sure that this
change affects the shortest scope possible. So unless you are already
inside some short C<{}> block, you should create one yourself. For
example:
my $content = '';
open my $fh, "<", "foo" or die $!;
{
    local $/;
    $content = <$fh>;
}
close $fh;
Here is an example of how your own code can break:
for ( 1..3 ){
    $\ = "\r\n";
    nasty_break();
    print "$_";
}
sub nasty_break {
    $\ = "\f";
    # do something with $_
}
You probably expect this code to print the equivalent of
"1\r\n2\r\n3\r\n"
but instead you get:
"1\f2\f3\f"
Why? Because C<nasty_break()> modifies C<$\> without localizing it
first. The value you set in C<nasty_break()> is still there when you
return. The fix is to add C<local()> so the value doesn't leak out of
C<nasty_break()>:
local $\ = "\f";
It's easy to notice the problem in such a short example, but in more
complicated code you are looking for trouble if you don't localize
changes to the special variables.
=over 8
=item $ARGV
X<$ARGV>
Contains the name of the current file when reading from C<< <> >>.
=item @ARGV
X<@ARGV>
The array C<@ARGV> contains the command-line arguments intended for
the script. C<$#ARGV> is generally the number of arguments minus
one, because C<$ARGV[0]> is the first argument, I<not> the program's
command name itself. See L</$0> for the command name.
=item ARGV
X<ARGV>
The special filehandle that iterates over command-line filenames in
C<@ARGV>. Usually written as the null filehandle in the angle operator
C<< <> >>. Note that currently C<ARGV> only has its magical effect
within the C<< <> >> operator; elsewhere it is just a plain filehandle
corresponding to the last file opened by C<< <> >>. In particular,
passing C<\*ARGV> as a parameter to a function that expects a filehandle
may not cause your function to automatically read the contents of all the
files in C<@ARGV>.
=item ARGVOUT
X<ARGVOUT>
The special filehandle that points to the currently open output file
when doing edit-in-place processing with B<-i>. Useful when you have
to do a lot of inserting and don't want to keep modifying C<$_>. See
L<perlrun> for the B<-i> switch.
=item IO::Handle->output_field_separator( EXPR )
=item $OUTPUT_FIELD_SEPARATOR
=item $OFS
=item $,
X<$,> X<$OFS> X<$OUTPUT_FIELD_SEPARATOR>
The output field separator for the print operator. If defined, this
value is printed between each of print's arguments. Default is C<undef>.
You cannot call C<output_field_separator()> on a handle, only as a
static method. See L<IO::Handle|IO::Handle>.
Mnemonic: what is printed when there is a "," in your print statement.
=item HANDLE->input_line_number( EXPR )
=item $INPUT_LINE_NUMBER
=item $NR
=item $.
X<$.> X<$NR> X<$INPUT_LINE_NUMBER> X<line number>
Current line number for the last filehandle accessed.
Each filehandle in Perl counts the number of lines that have been read
from it. (Depending on the value of C<$/>, Perl's idea of what
constitutes a line may not match yours.) When a line is read from a
filehandle (via C<readline()> or C<< <> >>), or when C<tell()> or
C<seek()> is called on it, C<$.> becomes an alias to the line counter
for that filehandle.
You can adjust the counter by assigning to C<$.>, but this will not
actually move the seek pointer. I<Localizing C<$.> will not localize
the filehandle's line count>. Instead, it will localize perl's notion
of which filehandle C<$.> is currently aliased to.
C<$.> is reset when the filehandle is closed, but B<not> when an open
filehandle is reopened without an intervening C<close()>. For more
details, see L<perlop/"IE<sol>O Operators">. Because C<< <> >> never does
an explicit close, line numbers increase across C<ARGV> files (but see
examples in L<perlfunc/eof>).
You can also use C<< HANDLE->input_line_number(EXPR) >> to access the
line counter for a given filehandle without having to worry about
which handle you last accessed.
Mnemonic: many programs use "." to mean the current line number.
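A small sketch of both ways to read the line counter, using an in-memory
filehandle so that it is self-contained. The sample data and variable names
are assumptions for illustration:

```perl
use strict;
use warnings;
use IO::Handle;

my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "open: $!";

my @numbered;
while (my $line = <$fh>) {
    chomp $line;
    push @numbered, "$.: $line";    # $. refers to $fh, the last handle read
}

# The method form names the handle explicitly.
my $count = $fh->input_line_number;
close $fh;

print "$_\n" for @numbered;
```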
=item IO::Handle->input_record_separator( EXPR )
=item $INPUT_RECORD_SEPARATOR
=item $RS
=item $/
X<$/> X<$RS> X<$INPUT_RECORD_SEPARATOR>
The input record separator, newline by default. This influences Perl's
idea of what a "line" is. Works like B<awk>'s RS variable, including
treating empty lines as a terminator if set to the null string (an
empty line cannot contain any spaces or tabs). You may set it to a
multi-character string to match a multi-character terminator, or to
C<undef> to read through the end of file. Setting it to C<"\n\n">
means something slightly different than setting to C<"">, if the file
contains consecutive empty lines. Setting to C<""> will treat two or
more consecutive empty lines as a single empty line. Setting to
C<"\n\n"> will blindly assume that the next input character belongs to
the next paragraph, even if it's a newline.
local $/; # enable "slurp" mode
local $_ = <FH>; # whole file now here
s/\n[ \t]+/ /g;
Remember: the value of C<$/> is a string, not a regex. B<awk> has to
be better for something. :-)
Setting C<$/> to a reference to an integer, scalar containing an
integer, or scalar that's convertible to an integer will attempt to
read records instead of lines, with the maximum record size being the
referenced integer number of characters. So this:
local $/ = \32768; # or \"32768", or \$var_containing_32768
open my $fh, "<", $myfile or die $!;
local $_ = <$fh>;
will read a record of no more than 32768 characters from C<$fh>. If you're
not reading from a record-oriented file (or your OS doesn't have
record-oriented files), then you'll likely get a full chunk of data
with every read. If a record is larger than the record size you've
set, you'll get the record back in pieces. Trying to set the record
size to zero or less is deprecated and will cause C<$/> to have the value
of C<undef>, which will cause reading in the (rest of the) whole file.
As of 5.19.9 setting C<$/> to any other form of reference will throw a
fatal exception. This is in preparation for supporting new ways to set
C<$/> in the future.
On VMS only, record reads bypass PerlIO layers and any associated
buffering, so you must not mix record and non-record reads on the
same filehandle. Record mode mixes with line mode only when the
same buffering layer is in use for both modes.
You cannot call C<input_record_separator()> on a handle, only as a
static method. See L<IO::Handle|IO::Handle>.
See also L<perlport/"Newlines">. Also see L</$.>.
Mnemonic: / delimits line boundaries when quoting poetry.
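The following sketch contrasts paragraph mode with fixed-size record reads,
using in-memory filehandles. The sample data and variable names are
assumptions for illustration:

```perl
use strict;
use warnings;

# Paragraph mode: runs of empty lines separate the records.
my $text = "a1\na2\n\n\n\nb1\nb2\n";
open my $in, '<', \$text or die "open: $!";
my @paragraphs;
{
    local $/ = '';
    @paragraphs = <$in>;
}
close $in;

# Record mode: each read returns at most 4 characters.
my $raw = 'abcdefghij';
open my $rec, '<', \$raw or die "open: $!";
my @chunks;
{
    local $/ = \4;
    @chunks = <$rec>;
}
close $rec;

print scalar(@paragraphs), " paragraphs, ", scalar(@chunks), " chunks\n";
```

In both cases C<$/> is localized inside a bare block, so line-oriented reads
elsewhere in the program are unaffected.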
=item IO::Handle->output_record_separator( EXPR )
=item $OUTPUT_RECORD_SEPARATOR
=item $ORS
=item $\
X<$\> X<$ORS> X<$OUTPUT_RECORD_SEPARATOR>
The output record separator for the print operator. If defined, this
value is printed after the last of print's arguments. Default is C<undef>.
You cannot call C<output_record_separator()> on a handle, only as a
static method. See L<IO::Handle|IO::Handle>.
Mnemonic: you set C<$\> instead of adding "\n" at the end of the print.
Also, it's just like C<$/>, but it's what you get "back" from Perl.
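A minimal sketch showing C<$,> and C<$\> working together. It prints to an
in-memory filehandle so that the result can be inspected, and the variable
names are assumptions for illustration:

```perl
use strict;
use warnings;

my $out = '';
open my $fh, '>', \$out or die "open: $!";
{
    local $, = '-';     # printed between print's arguments
    local $\ = "\n";    # printed after the last argument
    print {$fh} 'a', 'b', 'c';
}
close $fh;

print $out;
```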
=item HANDLE->autoflush( EXPR )
=item $OUTPUT_AUTOFLUSH
=item $|
X<$|> X<autoflush> X<flush> X<$OUTPUT_AUTOFLUSH>
If set to nonzero, forces a flush right away and after every write or
print on the currently selected output channel. Default is 0
(regardless of whether the channel is really buffered by the system or
not; C<$|> tells you only whether you've asked Perl explicitly to
flush after each write). STDOUT will typically be line buffered if
output is to the terminal and block buffered otherwise. Setting this
variable is useful primarily when you are outputting to a pipe or
socket, such as when you are running a Perl program under B<rsh> and
want to see the output as it's happening. This has no effect on input
buffering. See L<perlfunc/getc> for that. See L<perlfunc/select> on
how to select the output channel. See also L<IO::Handle>.
Mnemonic: when you want your pipes to be piping hot.
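The two usual ways of turning autoflush on can be sketched as follows. Which handles to flush is an assumption of the example, not a recommendation.

```perl
use strict;
use warnings;
use IO::Handle;    # makes the autoflush() method available on older perls

# Method form: acts on the named handle, whatever is currently selected.
STDERR->autoflush(1);

# Variable form: $| applies to the currently selected handle, so select
# the target first and restore the previous selection afterwards.
my $previous = select(STDOUT);
$| = 1;
select($previous);
```

The method form is usually easier to read because it does not depend on which handle happens to be selected.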
=item ${^LAST_FH}
X<${^LAST_FH}>
This read-only variable contains a reference to the last-read filehandle.
This is set by C<< <HANDLE> >>, C<readline>, C<tell>, C<eof> and C<seek>.
This is the same handle that C<$.> and C<tell> and C<eof> without arguments
use. It is also the handle used when Perl appends ", <STDIN> line 1" to
an error or warning message.
This variable was added in Perl v5.18.0.
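A minimal sketch, assuming an in-memory filehandle: after a read, C<${^LAST_FH}> can be used wherever a filehandle expression is accepted.

```perl
use strict;
use warnings;
use v5.18;    # ${^LAST_FH} is not available in earlier versions

my $text = "first\nsecond\n";
open my $in, '<', \$text or die "cannot open in-memory handle: $!";

my $first = <$in>;                    # $in becomes the last-read handle
my $line_number = $.;                 # line counter of that same handle
my $second = readline(${^LAST_FH});   # reads the next line from $in
```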
=back
=head3 Variables related to formats
The special variables for formats are a subset of those for
filehandles. See L<perlform> for more information about Perl's
formats.
=over 8
=item $ACCUMULATOR
=item $^A
X<$^A> X<$ACCUMULATOR>
The current value of the C<write()> accumulator for C<format()> lines.
A format contains C<formline()> calls that put their result into
C<$^A>. After calling its format, C<write()> prints out the contents
of C<$^A> and empties it. So you never really see the contents of C<$^A>
unless you call C<formline()> yourself and then look at it. See
L<perlform> and L<perlfunc/"formline PICTURE,LIST">.
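To look at the accumulator directly, you can call C<formline()> yourself. The picture and the values below are assumptions chosen for illustration.

```perl
use strict;
use warnings;

$^A = '';                                 # start from an empty accumulator
formline('@<<<<< @>>>>', 'apple', 42);    # left-justified, then right-justified
my $row = $^A;                            # what write() would have printed
$^A = '';                                 # empty it again, as write() does
```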
=item IO::Handle->format_formfeed(EXPR)
=item $FORMAT_FORMFEED
=item $^L
X<$^L> X<$FORMAT_FORMFEED>
What formats output as a form feed. The default is C<\f>.
You cannot call C<format_formfeed()> on a handle, only as a static
method. See L<IO::Handle|IO::Handle>.
=item HANDLE->format_page_number(EXPR)
=item $FORMAT_PAGE_NUMBER
=item $%
X<$%> X<$FORMAT_PAGE_NUMBER>
The current page number of the currently selected output channel.
Mnemonic: C<%> is page number in B<nroff>.
=item HANDLE->format_lines_left(EXPR)
=item $FORMAT_LINES_LEFT
=item $-
X<$-> X<$FORMAT_LINES_LEFT>
The number of lines left on the page of the currently selected output
channel.
Mnemonic: lines_on_page - lines_printed.
=item IO::Handle->format_line_break_characters(EXPR)
=item $FORMAT_LINE_BREAK_CHARACTERS
=item $:
X<$:> X<$FORMAT_LINE_BREAK_CHARACTERS>
The current set of characters after which a string may be broken to
fill continuation fields (starting with C<^>) in a format. The default is
S<" \n-">, to break on a space, newline, or a hyphen.
You cannot call C<format_line_break_characters()> on a handle, only as
a static method. See L<IO::Handle|IO::Handle>.
Mnemonic: a "colon" in poetry is a part of a line.
=item HANDLE->format_lines_per_page(EXPR)
=item $FORMAT_LINES_PER_PAGE
=item $=
X<$=> X<$FORMAT_LINES_PER_PAGE>
The current page length (printable lines) of the currently selected
output channel. The default is 60.
Mnemonic: = has horizontal lines.
=item HANDLE->format_top_name(EXPR)
=item $FORMAT_TOP_NAME
=item $^
X<$^> X<$FORMAT_TOP_NAME>
The name of the current top-of-page format for the currently selected
output channel. The default is the name of the filehandle with C<_TOP>
appended. For example, the default format top name for the C<STDOUT>
filehandle is C<STDOUT_TOP>.
Mnemonic: points to top of page.
=item HANDLE->format_name(EXPR)
=item $FORMAT_NAME
=item $~
X<$~> X<$FORMAT_NAME>
The name of the current report format for the currently selected
output channel. The default format name is the same as the filehandle
name. For example, the default format name for the C<STDOUT>
filehandle is just C<STDOUT>.
Mnemonic: brother to C<$^>.
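The following sketch selects a report format by name for a filehandle. The format name C<ROW>, the variables, and the in-memory handle are all assumptions made for the example; no top-of-page format is defined, so no header is printed.

```perl
use strict;
use warnings;

our ($name, $count);    # package variables referenced by the format

# ROW is a hypothetical format name chosen for this sketch.
format ROW =
@<<<<<<<< @>>>
$name,    $count
.

my $report = '';
open my $out, '>', \$report or die "cannot open in-memory handle: $!";

my $previous = select($out);   # format variables act on the selected handle
$~ = 'ROW';                    # use ROW instead of the handle's default format
select($previous);

for my $row (['apple', 3], ['pear', 12]) {
    ($name, $count) = @$row;
    write($out);
}
close $out or die "close failed: $!";
```

The same C<select>, assign, C<select> pattern applies to C<$^>, C<$=>, C<$-> and C<$%>, since each of them refers to the currently selected output channel.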
=back
=head2 Error Variables
X<error> X<exception>
The variables C<$@>, C<$!>, C<$^E>, and C<$?> contain information
about different types of error conditions that may appear during
execution of a Perl program. The variables are shown ordered by
the "distance" between the subsystem which reported the error and
the Perl process. They correspond to errors detected by the Perl
interpreter, C library, operating system, or an external program,
respectively.
To illustrate the differences between these variables, consider the
following Perl expression, which uses a single-quoted string. After
execution of this statement, perl may have set all four special error
variables:
    eval q{
        open my $pipe, "/cdrom/install |" or die $!;
        my @res = <$pipe>;
        close $pipe or die "bad pipe: $?, $!";
    };
When perl executes the C<eval()> expression, it translates the
C<open()>, C<< <$pipe> >>, and C<close> calls into calls in the C run-time
library and thence to the operating system kernel. perl sets C<$!> to
the C library's C<errno> if one of these calls fails.
C<$@> is set if the string to be C<eval>-ed did not compile (this may
happen if C<open> or C<close> were imported with bad prototypes), or
if Perl code executed during evaluation C<die()>d. In these cases the
value of C<$@> is the compile error, or the argument to C<die> (which
will interpolate C<$!> and C<$?>). (See also L<Fatal>, though.)
Under a few operating systems, C<$^E> may contain a more verbose error
indicator, such as in this case, "CDROM tray not closed." Systems that
do not support extended error messages leave C<$^E> the same as C<$!>.
Finally, C<$?> may be set to a non-0 value if the external program
F</cdrom/install> fails. The upper eight bits reflect specific error
conditions encountered by the program (the program's C<exit()> value).
The lower eight bits reflect mode of failure, like signal death and
core dump information. See L<wait(2)> for details. In contrast to
C<$!> and C<$^E>, which are set only if an error condition is detected,
the variable C<$?> is set on each C<wait> or pipe C<close>,
overwriting the old value. This is more like C<$@>, which on every
C<eval()> is always set on failure and cleared on success.
For more details, see the individual descriptions at C<$@>, C<$!>,
C<$^E>, and C<$?>.
=over 8
=item ${^CHILD_ERROR_NATIVE}
X<${^CHILD_ERROR_NATIVE}>
The native status returned by the last pipe close, backtick (C<``>)
command, successful call to C<wait()> or C<waitpid()>, or from the
C<system()> operator. On POSIX-like systems this value can be decoded
with the WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG, WIFSTOPPED,
WSTOPSIG and WIFCONTINUED functions provided by the L<POSIX> module.
Under VMS this reflects the actual VMS exit status; i.e. it is the
same as C<$?> when the pragma C<use vmsish 'status'> is in effect.
This variable was added in Perl v5.10.0.
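On a POSIX-like system the native status might be decoded as in this sketch. The child command, which re-runs the current perl via C<$^X> and exits with status 3, is an assumption made so that the example is self-contained.

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

# $^X is the path of the running perl; the child simply exits with 3.
system($^X, '-e', 'exit 3');

my $native  = ${^CHILD_ERROR_NATIVE};
my $summary =
      WIFEXITED($native)   ? 'exited with status ' . WEXITSTATUS($native)
    : WIFSIGNALED($native) ? 'killed by signal '   . WTERMSIG($native)
    :                        'stopped or continued';
```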
=item $EXTENDED_OS_ERROR
=item $^E
X<$^E> X<$EXTENDED_OS_ERROR>
Error information specific to the current operating system. At the
moment, this differs from C<L</$!>> under only VMS, OS/2, and Win32 (and
for MacPerl). On all other platforms, C<$^E> is always just the same
as C<$!>.
Under VMS, C<$^E> provides the VMS status value from the last system
error. This is more specific information about the last system error
than that provided by C<$!>. This is particularly important when C<$!>
is set to B<EVMSERR>.
Under OS/2, C<$^E> is set to the error code of the last call to OS/2
API either via CRT, or directly from perl.
Under Win32, C<$^E> always returns the last error information reported
by the Win32 call C<GetLastError()> which describes the last error
from within the Win32 API. Most Win32-specific code will report errors
via C<$^E>. ANSI C and Unix-like calls set C<errno> and so most
portable Perl code will report errors via C<$!>.
Caveats mentioned in the description of C<L</$!>> generally apply to
C<$^E>, also.
This variable was added in Perl 5.003.
Mnemonic: Extra error explanation.
=item $EXCEPTIONS_BEING_CAUGHT
=item $^S
X<$^S> X<$EXCEPTIONS_BEING_CAUGHT>
Current state of the interpreter.
    $^S         State
    ---------   -------------------------------------
    undef       Parsing module, eval, or main program
    true (1)    Executing an eval
    false (0)   Otherwise
The first state may happen in C<$SIG{__DIE__}> and C<$SIG{__WARN__}>
handlers.
The English name $EXCEPTIONS_BEING_CAUGHT is slightly misleading, because
the C<undef> value does not indicate whether exceptions are being caught,
since compilation of the main program does not catch exceptions.
This variable was added in Perl 5.004.
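A C<$SIG{__DIE__}> handler can consult C<$^S> to tell the three states apart, as in this sketch. The labels and the C<@seen> array are assumptions of the example.

```perl
use strict;
use warnings;

my @seen;
{
    local $SIG{__DIE__} = sub {
        my $state = !defined $^S ? 'parsing'
                  : $^S          ? 'inside eval'
                  :                'outside eval';
        push @seen, $state;
    };
    eval { die "caught\n" };    # the handler runs while the eval is active
}
```

A handler that should act only on uncaught exceptions can return early when C<$^S> is true.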
=item $WARNING
=item $^W
X<$^W> X<$WARNING>
The current value of the warning switch, initially true if B<-w> was
used, false otherwise, but directly modifiable.
See also L<warnings>.
Mnemonic: related to the B<-w> switch.
=item ${^WARNING_BITS}
X<${^WARNING_BITS}>
The current set of warning checks enabled by the C<use warnings> pragma.
It has the same scoping as the C<$^H> and C<%^H> variables. The exact
values are considered internal to the L<warnings> pragma and may change
between versions of Perl.
This variable was added in Perl v5.6.0.
=item $OS_ERROR
=item $ERRNO
=item $!
X<$!> X<$ERRNO> X<$OS_ERROR>
When referenced, C<$!> retrieves the current value
of the C C<errno> integer variable.
If C<$!> is assigned a numerical value, that value is stored in C<errno>.
When referenced as a string, C<$!> yields the system error string
corresponding to C<errno>.
Many system or library calls set C<errno> if they fail,
to indicate the cause of failure. They usually do B<not>
set C<errno> to zero if they succeed. This means C<errno>,
hence C<$!>, is meaningful only I<immediately> after a B<failure>:
    if (open my $fh, "<", $filename) {
        # Here $! is meaningless.
        ...
    }
    else {
        # ONLY here is $! meaningful.
        ...
        # Already here $! might be meaningless.
    }
    # Since here we might have either success or failure,
    # $! is meaningless.
Here, I<meaningless> means that C<$!> may be unrelated to the outcome
of the C<open()> operator. Assignment to C<$!> is similarly ephemeral.
It can be used immediately before invoking the C<die()> operator,
to set the exit value, or to inspect the system error string
corresponding to error I<n>, or to restore C<$!> to a meaningful state.
Mnemonic: What just went bang?
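The numeric and string views of C<$!>, and assignment to it, can be sketched as follows. The path is hypothetical and is assumed not to exist on the machine running the example.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = '/nonexistent/perlvar-example';

my ($errno, $message);
if (open my $fh, '<', $missing) {
    close $fh;
}
else {
    # Copy both views of $! at once, before another call can change errno.
    ($errno, $message) = ($! + 0, "$!");
}

# Assigning a number to $! lets you look up the text for that errno.
my $enoent_text = do { local $! = ENOENT; "$!" };
```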
=item %OS_ERROR
=item %ERRNO
=item %!
X<%!> X<%OS_ERROR> X<%ERRNO>
Each element of C<%!> has a true value only if C<$!> is set to that
value. For example, C<$!{ENOENT}> is true if and only if the current
value of C<$!> is C<ENOENT>; that is, if the most recent error was "No
such file or directory" (or its moral equivalent: not all operating
systems give that exact error, and certainly not all languages). The
specific true value is not guaranteed, but in the past has generally
been the numeric value of C<$!>. To check if a particular key is
meaningful on your system, use C<exists $!{the_key}>; for a list of legal
keys, use C<keys %!>. See L<Errno> for more information, and also see
L</$!>.
This variable was added in Perl 5.005.
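Testing C<%!> avoids comparing against platform-specific numbers or localized messages. A minimal sketch, in which the helper name C<classify_open_failure> and the path are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: report why a file could not be opened for reading.
sub classify_open_failure {
    my ($path) = @_;
    if (open my $fh, '<', $path) {
        close $fh;
        return 'opened';
    }
    return 'missing'   if $!{ENOENT};
    return 'forbidden' if $!{EACCES};
    return "other: $!";
}

my $reason       = classify_open_failure('/nonexistent/perlvar-example');
my $enoent_known = exists $!{ENOENT};   # is the key meaningful on this system?
```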
=item $CHILD_ERROR
=item $?
X<$?> X<$CHILD_ERROR>
The status returned by the last pipe close, backtick (C<``>) command,
successful call to C<wait()> or C<waitpid()>, or from the C<system()>
operator. This is just the 16-bit status word returned by the
traditional Unix C<wait()> system call (or else is made up to look
like it). Thus, the exit value of the subprocess is really (C<<< $? >>
8 >>>), and C<$? & 127> gives which signal, if any, the process died
from, and C<$? & 128> reports whether there was a core dump.
Additionally, if the C<h_errno> variable is supported in C, its value
is returned via C<$?> if any C<gethost*()> function fails.
If you have installed a signal handler for C<SIGCHLD>, the
value of C<$?> will usually be wrong outside that handler.
Inside an C<END> subroutine C<$?> contains the value that is going to be
given to C<exit()>. You can modify C<$?> in an C<END> subroutine to
change the exit status of your program. For example:
    END {
        $? = 1 if $? == 255;  # die would make it 255
    }
Under VMS, the pragma C<use vmsish 'status'> makes C<$?> reflect the
actual VMS exit status, instead of the default emulation of POSIX
status; see L<perlvms/$?> for details.
Mnemonic: similar to B<sh> and B<ksh>.
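The bit layout described above can be unpacked as in the following sketch. The child command, which re-runs the current perl via C<$^X> and exits with value 2, is an assumption of the example.

```perl
use strict;
use warnings;

# $^X is the running perl; the child exits with value 2.
system($^X, '-e', 'exit 2');

my $outcome;
if ($? == -1) {
    $outcome = "failed to execute: $!";
}
elsif ($? & 127) {
    $outcome = sprintf 'died with signal %d, %s core dump',
        $? & 127, ($? & 128) ? 'with' : 'without';
}
else {
    $outcome = sprintf 'exited with value %d', $? >> 8;
}
```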
=item $EVAL_ERROR
=item $@
X<$@> X<$EVAL_ERROR>
The Perl error from the last C<eval> operator, i.e. the last exception that
was caught. For C<eval BLOCK>, this is either a runtime error message or the
string or reference C<die> was called with. The C<eval STRING> form also
catches syntax errors and other compile time exceptions.
If no error occurs, C<eval> sets C<$@> to the empty string.
Warning messages are not collected in this variable. You can, however,
set up a routine to process warnings by setting C<$SIG{__WARN__}> as
described in L</%SIG>.
Mnemonic: Where was the error "at"?
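Because C<$@> is reset by every C<eval>, it is usually safest to copy it immediately. The sketch below, whose payloads are assumptions, handles both a string and a reference passed to C<die>.

```perl
use strict;
use warnings;

my @report;
for my $payload ("plain message\n", { code => 42 }) {
    my $ok    = eval { die $payload; 1 };
    my $error = $@;    # copy at once; a later eval would overwrite it
    next if $ok;
    push @report, ref $error ? "code $error->{code}" : "text: $error";
}

# A successful eval leaves $@ set to the empty string.
my $clean = eval { 1 } && $@ eq '';
```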
=back
=head2 Variables related to the interpreter state
These variables provide information about the current interpreter state.
=over 8
=item $COMPILING
=item $^C
X<$^C> X<$COMPILING>
The current value of the flag associated with the B<-c> switch.
Mainly of use with B<-MO=...> to allow code to alter its behavior
when being compiled, for example to C<AUTOLOAD> at compile
time rather than use normal, deferred loading. Setting
C<$^C = 1> is similar to calling C<B::minus_c>.
This variable was added in Perl v5.6.0.
=item $DEBUGGING
=item $^D
X<$^D> X<$DEBUGGING>
The current value of the debugging flags. May be read or set. Like its
L<command-line equivalent|perlrun/B<-D>I<letters>>, you can use numeric
or symbolic values, e.g. C<$^D = 10> or C<$^D = "st">. See
L<perlrun/B<-D>I<number>>. The contents of this variable also affect the
debugger operation. See L<perldebguts/Debugger Internals>.
Mnemonic: value of B<-D> switch.
=item ${^ENCODING}
X<${^ENCODING}>
This variable is no longer supported.
It used to hold the I<object reference> to the C<Encode> object that was
used to convert the source code to Unicode.
Its purpose was to allow your non-ASCII Perl
scripts not to have to be written in UTF-8; this was
useful before editors that worked on UTF-8 encoded text were common, but
that was long ago. It caused problems, such as affecting the operation
of other modules that weren't expecting it, causing general mayhem.
If you need something like this functionality, it is recommended that you
use a simple source filter, such as L<Filter::Encoding>.
If you are coming here because code of yours is being adversely affected
by someone's use of this variable, you can usually work around it by
doing this:
local ${^ENCODING};
near the beginning of the functions that are getting broken. This
undefines the variable during the scope of execution of the including
function.
This variable was added in Perl 5.8.2 and removed in 5.26.0.
=item ${^GLOBAL_PHASE}
X<${^GLOBAL_PHASE}>
The current phase of the perl interpreter.
Possible values are:
=over 8
=item CONSTRUCT
The C<PerlInterpreter*> is being constructed via C<perl_construct>. This
value is mostly there for completeness and for use via the
underlying C variable C<PL_phase>. It's not really possible for Perl
code to be executed unless construction of the interpreter is
finished.
=item START
This is the global compile-time. That includes, basically, every
C<BEGIN> block executed directly or indirectly during the
compile-time of the top-level program.
This phase is not called "BEGIN" to avoid confusion with
C<BEGIN>-blocks, as those are executed during compile-time of any
compilation unit, not just the top-level program. A new, localised
compile-time entered at run-time, for example by constructs such as
C<eval "use SomeModule">, is not a global interpreter phase, and
therefore isn't reflected by C<${^GLOBAL_PHASE}>.
=item CHECK
Execution of any C<CHECK> blocks.
=item INIT
Similar to "CHECK", but for C<INIT>-blocks, not C<CHECK> blocks.
=item RUN
The main run-time, i.e. the execution of C<PL_main_root>.
=item END
Execution of any C<END> blocks.
=item DESTRUCT
Global destruction.
=back
Also note that there's no value for UNITCHECK-blocks. That's because
those are run for each compilation unit individually, and therefore are
not a global interpreter phase.
Not every program has to go through each of the possible phases, but
transition from one phase to another can only happen in the order
described in the above list.
An example of all of the phases Perl code can see:
    BEGIN { print "compile-time: ${^GLOBAL_PHASE}\n" }
    INIT  { print "init-time: ${^GLOBAL_PHASE}\n" }
    CHECK { print "check-time: ${^GLOBAL_PHASE}\n" }
    {
        package Print::Phase;
        sub new {
            my ($class, $time) = @_;
            return bless \$time, $class;
        }
        sub DESTROY {
            my $self = shift;
            print "$$self: ${^GLOBAL_PHASE}\n";
        }
    }
    print "run-time: ${^GLOBAL_PHASE}\n";
    my $runtime = Print::Phase->new(
        "lexical variables are garbage collected before END"
    );
    END   { print "end-time: ${^GLOBAL_PHASE}\n" }
    our $destruct = Print::Phase->new(
        "package variables are garbage collected after END"
    );
This will print out
compile-time: START
check-time: CHECK
init-time: INIT
run-time: RUN
lexical variables are garbage collected before END: RUN
end-time: END
package variables are garbage collected after END: DESTRUCT
This variable was added in Perl 5.14.0.
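One use of this variable is to skip cleanup work in a destructor during global destruction, when the objects it refers to may already be gone. The following is a sketch under that assumption; the class C<Resource> and its methods are hypothetical.

```perl
use strict;
use warnings;
use v5.14;    # ${^GLOBAL_PHASE} is not available in earlier versions

{
    package Resource;    # hypothetical class used only for this sketch
    my @released;
    sub new {
        my ($class, $name) = @_;
        return bless { name => $name }, $class;
    }
    sub DESTROY {
        my $self = shift;
        # Do nothing once the interpreter is tearing everything down.
        return if ${^GLOBAL_PHASE} eq 'DESTRUCT';
        push @released, $self->{name};
    }
    sub released { return @released }
}

{
    my $scratch = Resource->new('scratch');
}    # $scratch is destroyed here, during the RUN phase

my @done = Resource::released();
```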
=item $^H
X<$^H>
WARNING: This variable is strictly for
internal use only. Its availability,
behavior, and contents are subject to change without notice.
This variable contains compile-time hints for the Perl interpreter. At the
end of compilation of a BLOCK the value of this variable is restored to the
value when the interpreter started to compile the BLOCK.
When perl begins to parse any block construct that provides a lexical scope
(e.g., eval body, required file, subroutine body, loop body, or conditional
block), the existing value of C<$^H> is saved, but its value is left unchanged.
When the compilation of the block is completed, it regains the saved value.
Between the points where its value is saved and restored, code that
executes within BEGIN blocks is free to change the value of C<$^H>.
This behavior provides the semantic of lexical scoping, and is used in,
for instance, the C<use strict> pragma.
The contents should be an integer; different bits of it are used for
different pragmatic flags. Here's an example:
    sub add_100 { $^H |= 0x100 }
    sub foo {
        BEGIN { add_100() }
        bar->baz($boon);
    }
Consider what happens during execution of the BEGIN block. At this point
the BEGIN block has already been compiled, but the body of C<foo()> is still
being compiled. The new value of C<$^H>
will therefore be visible only while
the body of C<foo()> is being compiled.
Substitution of C<BEGIN { add_100() }> block with:
BEGIN { require strict; strict->import('vars') }
demonstrates how C<use strict 'vars'> is implemented. Here's a conditional
version of the same lexical pragma:
    BEGIN {
        require strict; strict->import('vars') if $condition
    }
This variable was added in Perl 5.003.
=item %^H
X<%^H>
The C<%^H> hash provides the same scoping semantic as C<$^H>. This makes
it useful for implementation of lexically scoped pragmas. See
L<perlpragma>. All the entries are stringified when accessed at
runtime, so only simple values can be accommodated. This means no
pointers to objects, for example.
When putting items into C<%^H>, in order to avoid conflicting with other
users of the hash there is a convention regarding which keys to use.
A module should use only keys that begin with the module's name (the
name of its main package) and a "/" character. For example, a module
C<Foo::Bar> should use keys such as C<Foo::Bar/baz>.
This variable was added in Perl v5.6.0.
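A lexically scoped pragma built on C<%^H> might look like the sketch below. The package name C<MyHint>, the key C<MyHint/enabled>, and the C<in_effect()> helper are assumptions; the use of C<caller> to read the hints follows the approach described in L<perlpragma>.

```perl
use strict;
use warnings;

{
    package MyHint;    # hypothetical pragma, kept in one file for brevity
    sub import   { $^H{'MyHint/enabled'} = 1 }
    sub unimport { $^H{'MyHint/enabled'} = 0 }
    sub in_effect {
        # Element 10 of caller() holds %^H as it was when the calling
        # statement was compiled; it may be undef if no hints were set.
        my $hints = (caller(0))[10];
        return $hints && $hints->{'MyHint/enabled'} ? 1 : 0;
    }
}

my @states;
{
    BEGIN { MyHint->import }                 # what "use MyHint" would do
    push @states, MyHint::in_effect();
    {
        BEGIN { MyHint->unimport }           # what "no MyHint" would do
        push @states, MyHint::in_effect();
    }
    push @states, MyHint::in_effect();       # outer setting is back
}
push @states, MyHint::in_effect();           # outside the scope entirely
```

The key starts with the package name and a "/" character, following the convention described above.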
=item ${^OPEN}
X<${^OPEN}>
An internal variable used by PerlIO. A string in two parts, separated
by a C<\0> byte: the first part describes the input layers, the second
part describes the output layers.
This variable was added in Perl v5.8.0.
=item $PERLDB
=item $^P
X<$^P> X<$PERLDB>
The internal variable for debugging support. The meanings of the
various bits are subject to change, but currently indicate:
=over 6
=item 0x01
Debug subroutine enter/exit.
=item 0x02
Line-by-line debugging. Causes C<DB::DB()> subroutine to be called for
each statement executed. Also causes saving source code lines (like
0x400).
=it