This article covers converting between full-width and half-width characters in Python 3. We hope it helps anyone learning Python 3.
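1. Background
· Problem solved: converting text between full-width and half-width forms automatically, quickly and conveniently.
· Use case: replacing full-width characters with half-width characters in student answer data.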
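2. How full-width and half-width characters are encoded
· Full-width: Double Byte Character, abbreviated DBC.
· Half-width: Single Byte Character, abbreviated SBC.
· On Chinese-language Windows (the GB2312/GBK code page), Chinese characters and full-width characters each occupy two bytes, and both bytes fall in the extended range 128-255 ("ASCII chart 2").
· The first byte of a full-width letter, digit or punctuation mark is always 163, and the second byte is the code of the corresponding half-width character plus 128. The space does not follow this rule, so full-width and half-width spaces have to be handled separately.
· For a Chinese character the first byte is greater than 163. For example, '阿' is 176 162. When a Chinese character is detected, it is left unconverted.
· Example: half-width A is 65, so full-width A is 163 (first byte) followed by 193 (second byte, 128+65).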
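The byte values quoted above can be checked directly in Python 3. The following is a minimal sketch, assuming the legacy encoding in question is GB2312 (Python's built-in `gb2312` codec); the helper name `gb2312_bytes` is made up for this illustration. Note that 65 is the code of uppercase A.

```python
def gb2312_bytes(ch):
    """Return the GB2312 encoding of a single character as a list of byte values."""
    return list(ch.encode("gb2312"))


print(gb2312_bytes("A"))        # half-width A         -> [65]
print(gb2312_bytes("\uff21"))   # full-width A (U+FF21) -> [163, 193]
print(gb2312_bytes("\u963f"))   # the character 阿      -> [176, 162]
print(gb2312_bytes("\u3000"))   # full-width space      -> [161, 161]
```

The last line shows why the space needs special handling: its first byte is 161, not 163, so the "163 plus offset 128" rule does not apply to it.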
Example of full-width and half-width text (the file test.txt contains both kinds of characters):
F:\test>type test.txt
123456
123456
abcdefg
abcdefg
中国你好
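When full-width and half-width characters look alike on screen, as in the file above, it can help to ask Python which is which. The sketch below uses `unicodedata.east_asian_width` from the standard library; the helper name `width_classes` and the sample string are assumptions made for this illustration. The class `F` means full-width, `Na` means narrow (ordinary ASCII), and `W` means wide (for example Chinese characters).

```python
import unicodedata


def width_classes(text):
    """Pair every character except newlines with its East Asian Width class."""
    return [(ch, unicodedata.east_asian_width(ch)) for ch in text if ch != "\n"]


# Full-width 1, half-width 1, full-width a, half-width a, and one Chinese character
sample = "\uff111\uff41a\u4e2d"
for ch, cls in width_classes(sample):
    print(repr(ch), cls)
```

Running this over each line of a file gives a quick way to see which characters a conversion would actually touch.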
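3. Converting between full-width and half-width in Python 3
Python 3 strings are Unicode, so the code works on code points rather than on the byte pairs described in the previous section. The full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above the half-width characters U+0021 to U+007E, and the full-width space is U+3000.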
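# -*- coding:utf-8 -*-
'''
Full-width: Double Byte Character, abbreviated DBC
Half-width: Single Byte Character, abbreviated SBC
'''


def DBC2SBC(ustring):
    '''
    Convert full-width characters to half-width
    '''
    rstring = ""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 0x3000:
            # The full-width space does not follow the 0xfee0 offset
            rstring += chr(0x0020)
            continue
        inside_code -= 0xfee0
        # Anything outside 0x21-0x7e after the shift (Chinese characters,
        # text that is already half-width) is kept unchanged
        if not (0x0021 <= inside_code <= 0x7e):
            rstring += uchar
            continue
        rstring += chr(inside_code)
    return rstring


def SBC2DBC(ustring):
    '''
    Convert half-width characters to full-width
    '''
    rstring = ""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 0x0020:
            # The half-width space maps to the full-width space U+3000
            inside_code = 0x3000
        else:
            # Only printable ASCII has a full-width counterpart
            if not (0x0021 <= inside_code <= 0x7e):
                rstring += uchar
                continue
            inside_code += 0xfee0
        rstring += chr(inside_code)
    return rstring

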
s = '''
array('0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4',
'5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9',
'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E',
'F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J',
'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O',
'P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',
'U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y',
'Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd',
'e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i',
'j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n',
'o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's',
't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x',
'y' => 'y', 'z' => 'z',
'(' => '(', ')' => ')', '〔' => '[', '〕' => ']', '【' => '[',
'】' => ']', '〖' => '[', '〗' => ']', '“' => '[', '”' => ']',
'‘' => '[', '’' => ']', '{' => '{', '}' => '}', '《' => '<',
'》' => '>',
'%' => '%', '+' => '+', '—' => '-', '-' => '-', '~' => '-',
':' => ':', '。' => '.', '、' => ',', ',' => '.', '、' => '.',
';' => ',', '?' => '?', '!' => '!', '…' => '-', '‖' => '|',
'”' => '"', '’' => '`', '‘' => '`', '|' => '|', '〃' => '"',
' ' => ' ');
'''
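# Full-width to half-width
print(DBC2SBC(s))
# Half-width to full-width
print(SBC2DBC(s))

s = '''
中文测试
'''
# Full-width to half-width: Chinese characters are left unchanged
print(DBC2SBC(s))
# Half-width to full-width: Chinese characters are left unchanged
print(SBC2DBC(s))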
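The character-by-character loops above can also be expressed with `str.translate`, which applies a prebuilt mapping of code points in one call. This is only a sketch of an alternative, not the author's method; the names `OFFSET`, `TO_HALF`, `TO_FULL`, `to_half_width` and `to_full_width` are made up for this illustration.

```python
# Full-width forms U+FF01..U+FF5E sit 0xFEE0 above ASCII 0x21..0x7E.
OFFSET = 0xFEE0

# Map each full-width code point to its half-width counterpart.
TO_HALF = {code + OFFSET: code for code in range(0x21, 0x7F)}
TO_HALF[0x3000] = 0x20  # the full-width space is a special case

# The reverse table is the same mapping with keys and values swapped.
TO_FULL = {half: full for full, half in TO_HALF.items()}


def to_half_width(text):
    """Replace full-width characters with half-width ones; leave the rest alone."""
    return text.translate(TO_HALF)


def to_full_width(text):
    """Replace printable ASCII and the space with their full-width forms."""
    return text.translate(TO_FULL)


# Escapes are used so the full-width input is unambiguous in the source.
print(to_half_width("\uff21\uff22\uff23\u3000\uff11\uff12\uff13 中文"))  # ABC 123 中文
print(repr(to_full_width("ABC 123")))
```

Because the tables are built once, this form tends to be convenient when the same conversion is applied to many strings. `unicodedata.normalize("NFKC", text)` also folds full-width forms to ASCII, but it rewrites other compatibility characters as well, so it is not a drop-in replacement.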
Source: Chen Peng's personal blog (陈鹏个人博客)