2. Lexical analysis
A Python program is read by a parser. Input to the parser is a stream of
tokens, generated by the lexical analyzer. This chapter describes how the
lexical analyzer breaks a file into tokens.
Python reads program text as Unicode code points; the encoding of a source file
can be given by an encoding declaration and defaults to UTF-8 (see PEP 3120 for
details). If the source file cannot be decoded, a SyntaxError is raised.
2.1. Line structure
A Python program is divided into a number of logical lines.
2.1.1. Logical lines
The end of a logical line is represented by the token NEWLINE. Statements
cannot cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements). A logical line is
constructed from one or more physical lines by following the explicit or
implicit line joining rules.
2.1.2. Physical lines
A physical line is a sequence of characters terminated by an end-of-line
sequence. In source files, any of the standard platform line termination
sequences can be used: the Unix form using ASCII LF (linefeed), the Windows
form using the ASCII sequence CR LF (return followed by linefeed), or the old
Macintosh form using the ASCII CR (return) character. All of these forms can be
used equally, regardless of platform.
When embedding Python, source code strings should be passed to Python APIs using
the standard C conventions for newline characters (the \n character,
representing ASCII LF, is the line terminator).
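The following minimal sketch checks the line-ending rule. It assumes CPython 3, where compile() normalizes CR LF and bare CR to LF in source strings. It runs the same two-statement program with each of the three terminators. The helper run_with_line_ending is a hypothetical name used only for this illustration.

```python
def run_with_line_ending(terminator: str) -> dict:
    """Execute a two-line program whose physical lines end with `terminator`."""
    source = "x = 1" + terminator + "y = x + 1" + terminator
    namespace: dict = {}
    exec(compile(source, "<line-endings>", "exec"), namespace)
    return {"x": namespace["x"], "y": namespace["y"]}

for name, terminator in [("Unix LF", "\n"), ("Windows CR LF", "\r\n"), ("old Mac CR", "\r")]:
    # Every form yields the same logical lines, so the result is identical.
    print(name, run_with_line_ending(terminator))
```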
2.1.3. Comments
A comment starts with a hash character (#) that is not part of a string
literal, and ends at the end of the physical line. A comment signifies the end
of the logical line unless the implicit line joining rules are invoked. Comments
are ignored by the syntax; they are not tokens.
2.1.4. Encoding declarations
If a comment in the first or second line of the Python script matches the
regular expression coding[=:]\s*([-\w.]+), this comment is processed as an
encoding declaration; the first group of this expression names the encoding of
the source code file. The recommended forms of this expression are
# -*- coding: <encoding-name> -*-
which is also recognized by GNU Emacs, and
# vim:fileencoding=<encoding-name>
which is recognized by Bram Moolenaar's VIM.
If no encoding declaration is found, the default encoding is UTF-8. In
addition, if the first bytes of the file are the UTF-8 byte-order mark
(b'\xef\xbb\xbf'), the declared file encoding is UTF-8 (this is supported,
among others, by Microsoft's Notepad).
If an encoding is declared, the encoding name must be recognized by Python. The
encoding is used for all lexical analysis, including string literals, comments
and identifiers. The encoding declaration must appear on a line of its own.
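The standard library's tokenize.detect_encoding() applies essentially these rules to raw bytes. It looks at no more than the first two lines, which makes it a convenient way to check a declaration. This is a small sketch assuming CPython 3, and declared_encoding is a hypothetical helper. detect_encoding() reports normalized names: 'iso-8859-1' for latin-1, and 'utf-8-sig' when a byte-order mark is present.

```python
import io
import tokenize

def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode `source`."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

print(declared_encoding(b"x = 1\n"))                                   # no declaration
print(declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))         # Emacs form, line 1
print(declared_encoding(b"#!/usr/bin/env python\n# vim:fileencoding=cp1252\n"))  # VIM form, line 2
print(declared_encoding(b"\xef\xbb\xbfx = 1\n"))                       # UTF-8 byte-order mark
```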
2.1.5. Explicit line joining
Two or more physical lines may be joined into logical lines using backslash
characters (\), as follows: when a physical line ends in a backslash that is
not part of a string literal or comment, it is joined with the following
physical line to form a single logical line, and the backslash and the
following end-of-line character are deleted. For example:
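def looks_like_valid_date(year, month, day, hour, minute, second):
    # Illustrative wrapper function so the snippet runs on its own.
    if 1900 < year < 2100 and 1 <= month <= 12 \
       and 1 <= day <= 31 and 0 <= hour < 24 \
       and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1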
A line ending in a backslash cannot carry a comment. A backslash does not
continue a comment. A backslash does not continue a token except for string
literals (i.e., tokens other than string literals cannot be split across
physical lines using a backslash). A backslash is illegal elsewhere on a line
outside a string literal.
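The built-in compile() can check these rules directly. This is a minimal sketch assuming CPython 3. compiles() is an illustrative helper that only reports whether a SyntaxError was raised.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is accepted by the compiler."""
    try:
        compile(source, "<joining>", "exec")
    except SyntaxError:
        return False
    return True

# A trailing backslash joins two physical lines into one logical line.
print(compiles("total = 1 + \\\n    2\n"))          # True
# A comment may not follow the backslash on the same line.
print(compiles("total = 1 + \\  # note\n    2\n"))  # False
# A backslash cannot split an ordinary token such as the number 123 ...
print(compiles("total = 12\\\n3\n"))                # False
# ... but inside a string literal, backslash-newline is simply dropped.
namespace = {}
exec('s = "ab\\\ncd"\n', namespace)
print(namespace["s"])                               # abcd
```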
2.1.6. Implicit line joining
Expressions in parentheses, square brackets or curly braces can be split over
more than one physical line without using backslashes. For example:
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
Implicitly continued lines can carry comments. The indentation of the
continuation lines is not important. Blank continuation lines are allowed.
There is no NEWLINE token between implicit continuation lines. Implicitly
continued lines can also occur within triple-quoted strings (see below); in that
case they cannot carry comments.
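This sketch, assuming CPython 3, runs a bracketed expression that contains a comment, a blank continuation line and odd indentation. It then checks that the same break fails without brackets. compiles() is an illustrative helper.

```python
SOURCE = """\
month_names = ['Januari', 'Februari',   # a comment inside the brackets

      'Maart']                          # odd indentation is fine
"""

namespace = {}
exec(SOURCE, namespace)
print(namespace["month_names"])  # ['Januari', 'Februari', 'Maart']

def compiles(source: str) -> bool:
    try:
        compile(source, "<implicit>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles("total = (1 +\n    2)\n"))  # True: the parentheses allow the break
print(compiles("total = 1 +\n    2\n"))    # False: no open bracket, so no joining
```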
2.1.7. Blank lines
A logical line that contains only spaces, tabs, formfeeds and possibly a
comment is ignored (i.e., no NEWLINE token is generated). During interactive
input of statements, handling of a blank line may differ depending on the
implementation of the read-eval-print loop. In the standard interactive
interpreter, an entirely blank logical line (i.e. one containing not even
whitespace or a comment) terminates a multi-line statement.
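The standard tokenize module shows this directly. It emits a NEWLINE token only for logical lines that contain code. In this sketch, which assumes CPython 3, the blank, comment-only and whitespace-only lines produce no NEWLINE.

```python
import io
import tokenize

SOURCE = "x = 1\n\n    # only a comment\n\t\ny = 2\n"

newlines = [tok for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
            if tok.type == tokenize.NEWLINE]
print(len(newlines))  # 2: one for "x = 1", one for "y = 2"
```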
2.1.8. Indentation
Leading whitespace (spaces and tabs) at the beginning of a logical line is used
to compute the indentation level of the line, which in turn is used to determine
the grouping of statements.
Tabs are replaced (from left to right) by one to eight spaces such that the
total number of characters up to and including the replacement is a multiple of
eight (this is intended to be the same rule as used by Unix). The total number
of spaces preceding the first non-blank character then determines the line's
indentation. Indentation cannot be split over multiple physical lines using
backslashes; the whitespace up to the first backslash determines the
indentation.
Indentation is rejected as inconsistent if a source file mixes tabs and spaces
in a way that makes the meaning depend on how many spaces a tab is worth; a
TabError is raised in that case.
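The tab-expansion rule and the TabError check can be sketched as follows. indentation_level is a hypothetical helper, not CPython's tokenizer. It counts spaces as one column and moves each tab to the next multiple of eight. For a formfeed it resets the count to zero, which is one of the behaviours the text permits. The final check assumes CPython 3, which raises TabError when two lines agree at tab width 8 but would disagree at a different tab width.

```python
def indentation_level(line: str) -> int:
    """Indentation of `line`: spaces count 1, a tab jumps to the next multiple of 8."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            column = (column // 8 + 1) * 8   # advance to the next tab stop
        elif ch == "\f":
            column = 0                      # sketch choice: formfeed resets the count
        else:
            break                           # first non-blank character
    return column

print(indentation_level("\tpass"))          # 8
print(indentation_level("   \tpass"))       # 8: three spaces plus a tab reach column 8
print(indentation_level("        \tpass"))  # 16

def raises_tab_error(source: str) -> bool:
    try:
        compile(source, "<tabs>", "exec")
    except TabError:
        return True
    return False

# Both body lines sit at column 8 under the rule above, but one uses a tab and
# the other eight spaces, so their meaning depends on the width of a tab.
print(raises_tab_error("if True:\n\tx = 1\n        y = 2\n"))  # True
```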
Cross-platform compatibility note: because of the nature of text editors on
non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the
indentation in a single source file. It should also be noted that different
platforms may explicitly limit the maximum indentation level.
A formfeed character may be present at the start of the line; it will be ignored
for the indentation calculations above. Formfeed characters occurring elsewhere
in the leading whitespace have an undefined effect (for instance, they may reset
the space count to zero).
The indentation levels of consecutive lines are used to generate INDENT and
DEDENT tokens, using a stack, as follows.
Before the first line of the file is read, a single zero is pushed on the stack;
this will never be popped off again. The numbers pushed on the stack will
always be strictly increasing from bottom to top. At the beginning of each
logical line, the line's indentation level is compared to the top of the stack.
If it is equal, nothing happens. If it is larger, it is pushed on the stack, and
one INDENT token is generated. If it is smaller, it must be one of the
numbers occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is generated. At the
end of the file, a DEDENT token is generated for each number remaining on the
stack that is larger than zero.
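The stack algorithm above is short enough to sketch directly. This is an illustration, not CPython's tokenizer. The input is assumed to be one already-computed indentation level per logical line, and "LINE" is a hypothetical placeholder for the tokens of the line itself.

```python
def indent_tokens(levels):
    """Turn per-line indentation levels into INDENT/DEDENT markers."""
    stack = [0]                       # pushed before the first line, never popped
    tokens = []
    for lineno, level in enumerate(levels, start=1):
        if level > stack[-1]:
            stack.append(level)       # deeper: push it and emit one INDENT
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:    # must match a level already on the stack
                raise IndentationError(f"line {lineno}: inconsistent dedent")
            while stack[-1] > level:  # pop every larger level, one DEDENT each
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")
    while stack[-1] > 0:              # end of file: close every open block
        stack.pop()
        tokens.append("DEDENT")
    return tokens

tokens = indent_tokens([0, 4, 8, 8, 4, 0, 4])
print(tokens.count("INDENT"), tokens.count("DEDENT"))  # 3 3

try:
    indent_tokens([0, 4, 8, 6])       # 6 was never pushed
except IndentationError as exc:
    print("IndentationError:", exc)
```

The stack keeps the work per line proportional to the number of blocks closed. Every unmatched dedent is reported as soon as it is seen.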
Here is an example of a correctly (though confusingly) indented piece of Python
code:
def perm(l):
        # Compute the list of all permutations of l
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]
             p = perm(s)
             for x in p:
              r.append(l[i:i+1] + x)
    return r
The following example shows various indentation errors:
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
(Actually, the first three errors are detected by the parser; only the last
error is found by the lexical analyzer: the indentation of return r does
not match a level popped off the stack.)
2.1.9. Whitespace between tokens
Except at the beginning of a logical line or in string literals, the whitespace
characters space, tab and formfeed can be used interchangeably to separate
tokens. Whitespace is needed between two tokens only if their concatenation
could otherwise be interpreted as a different token (e.g., ab is one token, but
a b is two tokens).
2.2. Other tokens
Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist:
identifiers, keywords, literals, operators, and delimiters. Whitespace
characters (other than line terminators, discussed earlier) are not tokens, but
serve to delimit tokens. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.
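The tokenize module shows two rules at work: whitespace separating tokens, and the longest-match rule. This sketch assumes CPython 3, and token_strings is an illustrative helper that drops NEWLINE, NL and ENDMARKER.

```python
import io
import tokenize

def token_strings(source: str) -> list:
    """Return the text of each significant token in `source`."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type not in skip]

print(token_strings("ab\n"))     # ['ab']: a single NAME token
print(token_strings("a b\n"))    # ['a', 'b']: whitespace separates two tokens
print(token_strings("x**=2\n"))  # ['x', '**=', '2']: the longest legal token wins
```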
2.3. Identifiers and keywords
Identifiers (also referred to as names) are described by the following lexical
definitions.
The syntax of identifiers in Python is based on the Unicode standard annex
UAX-31, with elaboration and changes as defined below; see also PEP 3131 for
further details.
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
are the same as in Python 2.x: the uppercase and lowercase letters A through
Z, the underscore _ and, except for the first character, the digits
0 through 9.
Python 3.0 introduces additional characters from outside the ASCII range (see
PEP 3131). For these characters, the classification uses the version of the
Unicode Character Database as included in the unicodedata module.
Identifiers are unlimited in length. Case is significant.
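identifier  ::=  id_start id_continue*
id_start    ::=  <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
id_continue ::=  <all characters in id_start, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>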
The Unicode category codes mentioned above stand for:
Lu - uppercase letters
Ll - lowercase letters
Lt - titlecase letters
Lm - modifier letters
Lo - other letters
Nl - letter numbers
Mn - nonspacing marks
Mc - spacing combining marks
Nd - decimal numbers
Pc - connector punctuations
All identifiers are converted into the normal form NFKC while parsing; comparison
of identifiers is based on NFKC.
A non-normative HTML file listing all valid identifier characters for Unicode
4.1 can be found at
http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
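The following minimal check assumes CPython 3. It writes a name with the U+FB01 "fi" ligature, which is a valid identifier character, and the name ends up bound as plain file. This shows that identifiers are normalized before comparison. The normalization involved is a compatibility (NFKC) one, since NFC alone leaves the ligature unchanged.

```python
import unicodedata

LIGATURE_NAME = "\ufb01le"   # starts with U+FB01 LATIN SMALL LIGATURE FI

print(LIGATURE_NAME.isidentifier())                            # True
print(unicodedata.normalize("NFC", LIGATURE_NAME) == "file")   # False
print(unicodedata.normalize("NFKC", LIGATURE_NAME) == "file")  # True

namespace = {}
exec(LIGATURE_NAME + " = 42", namespace)
print(namespace["file"])  # 42: the binding is stored under the normalized name

print("2nd".isidentifier(), "_private".isidentifier())  # False True
```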
2.3.1. Keywords
The following identifiers are used as reserved words, or keywords of the
language, and cannot be used as ordinary identifiers. They must be spelled
exactly as written here:
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
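The standard keyword module exposes the interpreter's keyword list. This sketch checks the table above against it. It assumes Python 3; later versions add further keywords (for example async and await), so the check tests membership rather than equality.

```python
import keyword

# The keyword table above, copied as a plain list.
DOCUMENTED = """False class finally is return None continue for lambda try
True def from nonlocal while and del global not with as elif if or yield
assert else import pass break except in raise""".split()

print(len(DOCUMENTED))                                         # 33
print(all(keyword.iskeyword(word) for word in DOCUMENTED))     # True
print(keyword.iskeyword("print"), keyword.iskeyword("None"))  # False True
```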
2.3.2. Reserved classes of identifiers
Certain classes of identifiers (besides keywords) have special meanings. These
classes are identified by the patterns of leading and trailing underscore
characters:
_*
    Not imported by from module import *. The special identifier _ is used
    in the interactive interpreter to store the result of the last evaluation; it
    is stored in the builtins module. When not in interactive mode, _ has no
    special meaning and is not defined. See section The import statement.
    Note: The name _ is often used in conjunction with internationalization;
    refer to the documentation for the gettext module for more information on
    this convention.
__*__
    System-defined names. These names are defined by the interpreter and its
    implementation (including the standard library); applications should not
    expect to define additional names using this convention. The set of names of
    this class defined by Python may be extended in future versions. See section
    Special method names.
__*
    Class-private names. Names in this category, when used within the context of
    a class definition, are rewritten to use a mangled form to help avoid name
    clashes between "private" attributes of base and derived classes. See section
    Identifiers (Names).
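A small illustration of class-private name mangling, assuming CPython 3. Account is a hypothetical class. The attribute written as __balance inside the class body is stored under _Account__balance. A single leading underscore is only a convention and is left alone.

```python
class Account:
    def __init__(self):
        self.__balance = 0        # class-private: rewritten to _Account__balance
        self._hint = "internal"   # single underscore: not mangled

acct = Account()
print(sorted(vars(acct)))          # ['_Account__balance', '_hint']
print(hasattr(acct, "__balance"))  # False: the unmangled name does not exist
```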
2.4. Literals
Literals are notations for constant values of some built-in types.
2.4.1. String and Bytes literals
String literals are described by the following lexical definitions:
stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "R"
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::=  shortstringchar | stringescapeseq
longstringitem  ::=  longstringchar | stringescapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
longstringchar  ::=  <any source character except "\">
stringescapeseq ::=  "\" <any source character>
bytesliteral    ::=  bytesprefix(shortbytes | longbytes)
bytesprefix     ::=  "b" | "B"
shortbytes      ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes       ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem  ::=  shortbyteschar | bytesescapeseq
longbytesitem   ::=  longbyteschar | bytesescapeseq
shortbyteschar  ::=  <any ASCII character except "\" or newline or the quote>
longbyteschar   ::=  <any ASCII character except "\">
bytesescapeseq  ::=  "\" <any ASCII character>
One syntactic restriction not indicated by these productions is that whitespace
is not allowed between the stringprefix or bytesprefix and the rest of the
literal. The source character set is defined by the encoding declaration; it is
UTF-8 if no encoding declaration is given in the source file; see section
Encoding declarations.
In plain English: both types of literals can be enclosed in matching single
quotes (') or double quotes ("). They can also be enclosed in matching groups
of three single or double quotes (these are generally referred to as
triple-quoted strings). The backslash (\) character is used to escape
characters that otherwise have a special meaning, such as newline, backslash
itself, or the quote character.
String literals may optionally be prefixed with a letter 'r' or 'R';
such strings are called raw strings and treat backslashes as literal
characters. As a result, '\U' and '\u' escapes in raw strings are not
treated specially.
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of
the bytes type instead of the str type. They may only contain ASCII characters;
bytes with a numeric value of 128 or greater must be expressed with escapes.
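This sketch contrasts raw and ordinary strings and shows the ASCII-only rule for bytes literals. It assumes CPython 3, and the variable names are illustrative.

```python
raw = r"C:\new\table"     # raw string: both backslashes are kept
cooked = "C:\new\table"   # ordinary string: \n and \t become LF and TAB
print(len(raw), len(cooked))          # 12 10

data = b"GIF89a\xff"                  # byte 255 written as an escape
print(type(data).__name__, data[-1])  # bytes 255

try:
    compile("b'caf\u00e9'", "<bytes>", "eval")  # literal non-ASCII character
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)                       # True
```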
In triple-quoted strings, unescaped newlines and quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the string. (A
"quote" is the character used to open the string, i.e. either ' or ".)
Unless an 'r' or 'R' prefix is present, escape sequences in strings are
interpreted according to rules similar to those used by Standard C. The
recognized escape sequences are:
Escape Sequence   Meaning                            Notes
\newline          Backslash and newline ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo     (1,3)
\xhh              Character with hex value hh        (2,3)
Escape sequences only recognized in string literals are:
Escape Sequence   Meaning                                          Notes
\N{name}          Character named name in the Unicode database
\uxxxx            Character with 16-bit hex value xxxx             (4)
\Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx         (5)
Notes:
(1) As in Standard C, up to three octal digits are accepted.
(2) Unlike in Standard C, exactly two hex digits are required.
(3) In a bytes literal, hexadecimal and octal escapes denote the byte with the
    given value. In a string literal, these escapes denote a Unicode character
    with the given value.
(4) Individual code units which form parts of a surrogate pair can be encoded
    using this escape sequence. Exactly four hex digits are required.
(5) Any Unicode character can be encoded this way, but characters outside the
    Basic Multilingual Plane (BMP) will be encoded using a surrogate pair if
    Python is compiled to use 16-bit code units (the default). Individual code
    units which form parts of a surrogate pair can be encoded using this escape
    sequence.
Unlike Standard C, all unrecognized escape sequences are left in the string
unchanged, i.e., the backslash is left in the string. (This behavior is
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.) It is also important to note that the
escape sequences only recognized in string literals fall into the category of
unrecognized escapes for bytes literals.
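The escape rules can be checked by evaluating the source text of small literals. This is a sketch assuming CPython 3. Recent versions emit a warning (DeprecationWarning, later SyntaxWarning) for unrecognized escapes, so the illustrative evaluate() helper silences warnings while it compiles.

```python
import warnings

def evaluate(literal_source: str):
    """Evaluate the source text of a literal, ignoring invalid-escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(literal_source)

# Octal, hex, \u and \N{...} escapes all spell the same character in a str literal.
print(evaluate(r'"\101\x41\u0041\N{LATIN CAPITAL LETTER A}"'))  # AAAA
# An unrecognized escape keeps its backslash.
print(evaluate(r'"\q"') == "\\q")                                # True
# \u is only an escape in str literals, so a bytes literal keeps it verbatim.
print(evaluate(r'b"\u0041"') == b"\\u0041")                      # True
# In bytes, hex and octal escapes give the byte value itself.
print(list(evaluate(r'b"\x41\377"')))                            # [65, 255]
```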
Even in a raw string, string quotes can be escaped with a backslash, but the
backslash remains in the string; for example, r"\"" is a valid string
literal consisting of two characters: a backslash and a double quote; r"\"
is not a valid string literal (even a raw string cannot end in an odd number of
backslashes). Specifically, a raw string cannot end in a single backslash
(since the backslash would escape the following quote character). Note also
that a single backslash followed by a newline is interpreted as those two
characters as part of the string, not as a line continuation.
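These raw-string corner cases can be verified directly. This is a sketch assuming CPython 3, and compiles() is an illustrative helper.

```python
quote_pair = r"\""
print(len(quote_pair), list(quote_pair))  # 2 ['\\', '"']

def compiles(source: str) -> bool:
    try:
        compile(source, "<raw>", "eval")
    except SyntaxError:
        return False
    return True

print(compiles('r"\\"'))    # False: the source text r"\" never closes
print(compiles('r"\\\\"'))  # True: the source text r"\\" holds two backslashes

# Backslash plus newline inside a raw string stays as those two characters.
value = eval('r"""a\\\nb"""')
print(len(value), value == "a\\\nb")  # 4 True
```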
2.4.2. String literal concatenation
Multiple adjacent string literals (delimited by whitespace), possibly using
different quoting conventions, are allowed, and their meaning is the same as
their concatenation. Thus, "hello" 'world' is equivalent to
"helloworld". This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across several physical lines, or
even to add comments to parts of strings, for example:
import re
re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )
Note that this feature is defined at the syntactical level, but implemented at
compile time. The '+' operator must be used to concatenate string expressions
at run time. Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple-quoted strings).
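This sketch assumes CPython 3. It mixes raw, ordinary and triple-quoted pieces in one implicit concatenation, and shows that values held in names still need the + operator.

```python
import re

greeting = "hello" 'world'
print(greeting)  # helloworld

# The pieces are joined at compile time; each keeps its own escape rules.
pattern = (r"\d+"        # raw: the backslash stays
           "\t"          # ordinary: a real TAB character
           """[a-z]+""")
print(pattern == "\\d+\t[a-z]+")                      # True
print(re.fullmatch(pattern, "42\tabc") is not None)  # True

# Only literals are joined this way; a name needs + at run time.
suffix = "world"
print("hello" + suffix)  # helloworld
```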
2.4.3. Numeric literals
There are three types of numeric literals: integers, floating point numbers, and
imaginary numbers. There are no complex literals (complex numbers can be formed
by adding a real number and an imaginary number).
Note that numeric literals do not include a sign; a phrase like -1 is
actually an expression composed of the unary operator '-' and the literal 1.
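The standard ast module can show how -1 is parsed. This sketch assumes CPython 3.8 or later, where ast.parse() returns the unoptimized tree, so the unary minus is still visible as its own node.

```python
import ast

node = ast.parse("-1", mode="eval").body
print(type(node).__name__)             # UnaryOp
print(type(node.op).__name__)          # USub
print(ast.literal_eval(node.operand))  # 1: the literal itself carries no sign
```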
2.4.4. Integer literals
Integer literals are described by the following lexical definitions:
integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"
There is no limit for the length of integer literals apart from what can be
stored in available memory.
Note that leading zeros in a non-zero decimal number are not allowed. This is
for disambiguation with C-style octal literals, which Python used before version
3.0.
Some examples of integer literals:
7     2147483647                        0o177    0b100110111
3     79228162514264337593543950336     0o377    0x100000000
      79228162514264337593543950336              0xdeadbeef
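This sketch assumes CPython 3. It prints the decimal values of some of the examples above and confirms that a decimal literal with leading zeros is rejected while a run of zeros is accepted. compiles() is an illustrative helper.

```python
print(0o177, 0o377, 0b100110111, 0xdeadbeef)     # 127 255 311 3735928559
print(79228162514264337593543950336 == 2 ** 96)  # True: no fixed size limit

def compiles(source: str) -> bool:
    try:
        compile(source, "<int>", "eval")
    except SyntaxError:
        return False
    return True

print(compiles("0"), compiles("00"), compiles("007"))  # True True False
```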
2.4.5. Floating point literals
Floating point literals are described by the following lexical definitions:
floatnumber   ::=  pointfloat | exponentfloat
pointfloat    ::=  [intpart] fraction | intpart "."
exponentfloat ::=  (intpart | pointfloat) exponent
intpart       ::=  digit+
fraction      ::=  "." digit+
exponent      ::=  ("e" | "E") ["+" | "-"] digit+
Note that the integer and exponent parts are always interpreted using radix 10.
For example, 077e010 is legal, and denotes the same number as 77e10. The
allowed range of floating point literals is implementation-dependent. Some
examples of floating point literals:
3.14 10. .001 1e100 3.14e-10 0e0
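This quick check assumes CPython 3 on a platform with IEEE-754 doubles. The decimal reading of 077e010 is confirmed, and the exact range is reported by sys.float_info. On such platforms an out-of-range literal becomes infinity rather than an error.

```python
import math
import sys

print(077e010 == 77e10)                        # True: both parts are decimal
print(type(10.).__name__, type(0e0).__name__)  # float float
print(sys.float_info.max)                      # largest finite float here
print(math.isinf(1e400))                       # True on IEEE-754 platforms
```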
2.4.6. Imaginary literals
Imaginary literals are described by the following lexical definitions:
imagnumber ::=  (floatnumber | intpart) ("j" | "J")
An imaginary literal yields a complex number with a real part of 0.0. Complex
numbers are represented as a pair of floating point numbers and have the same
restrictions on their range. To create a complex number with a nonzero real
part, add a floating point number to it, e.g., (3+4j). Some examples of
imaginary literals:
3.14j 10.j 10j .001j 1e100j 3.14e-10j
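This small illustration assumes CPython 3. An imaginary literal alone has a zero real part, and adding a real number produces a general complex value.

```python
z = 3.14j
print(type(z).__name__, z.real, z.imag)  # complex 0.0 3.14

w = 3 + 4j                               # an int plus an imaginary literal
print(w.real, w.imag, abs(w))            # 3.0 4.0 5.0
print(10j == 10.j)                       # True
```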
2.5. Operators
The following tokens are operators:
+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=
2.6. Delimiters
The following tokens serve as delimiters in the grammar:
(       )       [       ]       {       }
,       :       .       ;       @       =
+=      -=      *=      /=      //=     %=
&=      |=      ^=      >>=     <<=     **=
The period can also occur in floating-point and imaginary literals. A sequence
of three periods has a special meaning as an ellipsis literal. The second half
of the list, the augmented assignment operators, serve lexically as delimiters,
but also perform an operation.
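This sketch uses the tokenize module (assuming CPython 3) to show that multi-character operators, augmented assignments and the three-period ellipsis are each a single token. op_tokens is an illustrative helper.

```python
import io
import tokenize

def op_tokens(source: str) -> list:
    """Return the operator and delimiter tokens that tokenize finds in `source`."""
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type == tokenize.OP]

print(op_tokens("x <<= y ** -z[...]\n"))  # ['<<=', '**', '-', '[', '...', ']']
print(... is Ellipsis)                    # True: three periods form one literal

counter = 10
counter //= 3   # augmented assignment: a delimiter that also divides
print(counter)  # 3
```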
The following printing ASCII characters have special meaning as part of other
tokens or are otherwise significant to the lexical analyzer:
' " # \
The following printing ASCII characters are not used in Python. Their
occurrence outside string literals and comments is an unconditional error:
$ ?
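This sketch assumes CPython 3. The same characters are rejected in code but accepted inside a string literal or a comment. compiles() is an illustrative helper.

```python
def compiles(source: str) -> bool:
    try:
        compile(source, "<chars>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles("price = 5 $ 2\n"))                         # False: '$' in code
print(compiles("x = 1 ? 2\n"))                             # False: '?' in code
print(compiles("label = '$?'  # $ and ? are fine here\n"))  # True
```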
This article is from the ChinaUnix blog; the original is at:
http://blog.chinaunix.net/u1/42957/showart_2106775.html