
Functional Programming in Python: 2. Callables

0. Preface


“Functional Programming in Python”, Chapter 2: Callables

Calling functions is central to functional programming. Python offers several different ways to create functions, or at least function-like objects that can be called:

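  1. Regular named functions created with the def keyword

  2. Anonymous functions created with the lambda keyword

  3. Instances of classes that define a __call__() method

  4. Closures

  5. Static methods of instances

  6. Generator functions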
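As a quick check of the list above, this minimal sketch builds one example of each kind of callable and passes it to the built-in callable(). All names (square, cube, Doubler, make_multiplier, Tools, countdown) are illustrative only.

```python
# One illustrative example of each kind of callable listed above

def square(x):                      # 1. named function created with def
    return x * x

cube = lambda x: x ** 3             # 2. anonymous function created with lambda

class Doubler:
    def __call__(self, x):          # 3. instance of a class defining __call__()
        return 2 * x

def make_multiplier(k):
    def multiply(x):                # 4. closure: remembers k from the enclosing call
        return k * x
    return multiply

class Tools:
    @staticmethod
    def negate(x):                  # 5. static method, callable via class or instance
        return -x

def countdown(n):                   # 6. generator function
    while n > 0:
        yield n
        n -= 1

examples = [square, cube, Doubler(), make_multiplier(3), Tools().negate, countdown]
print(all(callable(f) for f in examples))   # True
print(square(3), cube(2), Doubler()(4), make_multiplier(3)(4),
      Tools().negate(5), list(countdown(3)))
# 9 8 8 12 -5 [3, 2, 1]
```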

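Pure functions and side-effect-free code have the advantage of being easier to debug and test. Side effects often cannot be eliminated from a function entirely, but thinking in a functional style can at least reduce them.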
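A minimal illustration of why purity helps testing, using two hypothetical functions: the first reads and mutates module-level state, so the same call gives different results depending on call history; the second depends only on its arguments.

```python
# Impure: depends on and modifies state outside the function
total = 0

def add_to_total(x):
    global total
    total += x
    return total

# Pure: the result depends only on the arguments and nothing else changes
def add(a, b):
    return a + b

print(add_to_total(5), add_to_total(5))  # 5 10  same call, different results
print(add(0, 5), add(0, 5))              # 5 5   same call, same result
```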

1. Named Functions and Lambdas


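Apart from a lambda's body being limited to a single expression, the only in-principle difference between named functions and lambdas is their __qualname__ attribute: a function created with def records the name it was defined under, while a lambda's is always '<lambda>'. Either kind can be bound to one or more names: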

>>> def hello1(name):
...     print("Hello", name)
...
>>> hello2 = lambda name: print("Hello", name)
>>> hello1('David')
Hello David
>>> hello2('David')
Hello David
>>> hello1.__qualname__
'hello1'
>>> hello2.__qualname__
'<lambda>'
>>> hello3 = hello2 # can bind func to other names
>>> hello3.__qualname__
'<lambda>'
>>> hello3.__qualname__ = 'hello3'
>>> hello3.__qualname__
'hello3'
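Because hello3 = hello2 binds a second name to the same function object rather than copying it, setting hello3.__qualname__ also changes what hello2 reports. A small self-contained sketch that re-creates the functions from the session above:

```python
def hello1(name):
    print("Hello", name)

hello2 = lambda name: print("Hello", name)
hello3 = hello2                           # a second name for the same object

print(hello1.__name__, hello2.__name__)   # hello1 <lambda>
print(hello3 is hello2)                   # True

hello3.__qualname__ = 'hello3'
print(hello2.__qualname__)                # hello3  (same object, so both names see it)
print(hello2.__name__)                    # <lambda>  (__name__ was not changed)
```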

2. Closures and Callable Instances


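One saying goes that a class is "data with operations attached", while a closure is "operations with data attached". Both put logic and data into a single object, but classes emphasize mutable or rebindable state, whereas closures emphasize immutability and pure functions. A simple example: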

A class that defines a __call__() method, so that its instances can be called:

# A class that creates callable adder instances
class Adder(object):
    def __init__(self, n):
        self.n = n
    def __call__(self, m):
        return self.n + m
add5_i = Adder(5) # "instance" or "imperative"

A closure:

def make_adder(n):
    def adder(m):
        return m + n
    return adder
add5_f = make_adder(5) # "functional"

The two look identical in use, but in the first version add5_i carries a modifiable attribute, self.n:

>>> add5_i(10)
15
>>> add5_f(10) # only argument affects result
15
>>> add5_i.n = 10 # state is readily changeable
>>> add5_i(10) # result is dependent on prior flow
20
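The closure still carries its data, just not as an ordinary attribute that can be reassigned in passing. In CPython the captured value sits in a closure cell reachable through the function's __closure__ attribute, with the variable names listed in __code__.co_freevars. This sketch re-defines both adders so it runs on its own:

```python
class Adder:
    def __init__(self, n):
        self.n = n
    def __call__(self, m):
        return self.n + m

def make_adder(n):
    def adder(m):
        return m + n
    return adder

add5_i, add5_f = Adder(5), make_adder(5)

print(hasattr(add5_i, 'n'), hasattr(add5_f, 'n'))   # True False
# The captured value lives in a closure cell, not in an attribute
print(add5_f.__code__.co_freevars)                  # ('n',)
print(add5_f.__closure__[0].cell_contents)          # 5
```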

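One thing to watch out for: Python closures bind variables by name, not by value. The variable is looked up when the inner function is called, not when it is created: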

# almost surely not the behavior we intended!
>>> adders = []
>>> for n in range(5):
...     adders.append(lambda m: m+n)
...
>>> [adder(10) for adder in adders]
[14, 14, 14, 14, 14]
>>> n = 10
>>> [adder(10) for adder in adders]
[20, 20, 20, 20, 20]

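Obviously this isn't what we want, but a small change fixes it: binding n as a default argument captures its value at the moment each lambda is created: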

>>> adders = []
>>> for n in range(5):
...     adders.append(lambda m, n=n: m+n)
...
>>> [adder(10) for adder in adders]
[10, 11, 12, 13, 14]
>>> n = 10
>>> [adder(10) for adder in adders]
[10, 11, 12, 13, 14]
>>> add4 = adders[4]
>>> add4(10, 100) # Can override the bound value
110
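The default-argument trick works, but it leaves an extra parameter that callers can override, as add4(10, 100) shows. Two common alternatives capture the current value without exposing it: a small factory function, which gives each closure its own scope, and functools.partial from the standard library. A minimal sketch (adders_f and adders_p are illustrative names):

```python
from functools import partial
import operator

def make_adder(n):
    # Each call creates a fresh scope, so each closure sees its own n
    def adder(m):
        return m + n
    return adder

adders_f = [make_adder(n) for n in range(5)]
adders_p = [partial(operator.add, n) for n in range(5)]   # calls operator.add(n, m)

print([adder(10) for adder in adders_f])   # [10, 11, 12, 13, 14]
print([adder(10) for adder in adders_p])   # [10, 11, 12, 13, 14]
```

Neither version accepts a second argument, so a call like adders_f[4](10, 100) raises TypeError instead of silently replacing the captured value.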

3. Methods of Classes


All methods of a class are callable.

3.1 Accessors and Operators

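Accessors are callables too, whether or not they use the @property decorator, but their syntax is restrictive: a property getter is read like an attribute, so it cannot take arguments, and a setter is invoked by assignment, so whatever it returns is discarded: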

class Car(object):
    def __init__(self):
        self._speed = 100

    @property
    def speed(self):
        print("Speed is", self._speed)
        return self._speed

    @speed.setter
    def speed(self, value):
        print("Setting to", value)
        self._speed = value

# >>> car = Car()
# >>> car.speed = 80 # Odd syntax to pass one argument
# Setting to 80
# >>> x = car.speed
# Speed is 80
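Under the hood, @property builds a property object whose getter and setter are ordinary functions stored in its fget and fset attributes, so they can also be called directly. A small sketch with a trimmed-down Car (the print calls removed):

```python
class Car:
    def __init__(self):
        self._speed = 100

    @property
    def speed(self):
        return self._speed

    @speed.setter
    def speed(self, value):
        self._speed = value

car = Car()
print(callable(Car.speed.fget), callable(Car.speed.fset))  # True True
Car.speed.fset(car, 80)        # same effect as car.speed = 80
print(Car.speed.fget(car))     # 80, same as reading car.speed
```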

Python supports operator overloading. The example below overloads the << operator:

>>> class TalkativeInt(int):
...     def __lshift__(self, other):
...         print("Shift", self, "by", other)
...         return int.__lshift__(self, other)
...
>>> t = TalkativeInt(8)
>>> t << 3
Shift 8 by 3
64
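Operators are callables in disguise: Python evaluates t << 3 by calling the __lshift__ method defined on t's type, and the standard operator module exposes the same operation as a plain function. Note also that the inherited int.__lshift__ returns a plain int, not a TalkativeInt:

```python
import operator

class TalkativeInt(int):
    def __lshift__(self, other):
        print("Shift", self, "by", other)
        return int.__lshift__(self, other)

t = TalkativeInt(8)
r1 = t << 3                    # prints: Shift 8 by 3
r2 = t.__lshift__(3)           # prints: Shift 8 by 3
r3 = operator.lshift(t, 3)     # prints: Shift 8 by 3
print(r1, r2, r3)              # 64 64 64
print(type(r1).__name__)       # int
```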

3.2 Static Methods of Instances

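Sometimes we want to keep a group of pure functions inside a class, using the class purely as a namespace so they don't clutter the enclosing module. In Python 3 a plain function defined in a class can already be called through the class itself; @staticmethod is what makes it callable through an instance as well, without the instance being passed in as the first argument: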

import math
class RightTriangle(object):
    "Class used solely as namespace for related functions"
    @staticmethod
    def hypotenuse(a, b):
        return math.sqrt(a**2 + b**2)

    @staticmethod
    def sin(a, b):
        return a / RightTriangle.hypotenuse(a, b)

    @staticmethod
    def cos(a, b):
        return b / RightTriangle.hypotenuse(a, b)

>>> RightTriangle.hypotenuse(3,4)
5.0
>>> rt = RightTriangle()
>>> rt.sin(3,4)
0.6
>>> rt.cos(3,4)
0.8

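If a namespace contains nothing but pure functions, call them through the class rather than through an instance. If you mix pure functions with methods that depend on instance state, decorate the pure ones with @staticmethod.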
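To see why the decorator matters once a class mixes pure functions with instance methods, compare a hypothetical Triangle class holding one decorated and one undecorated helper. Both work through the class, but through an instance the undecorated one becomes a bound method and receives the instance as an extra first argument:

```python
import math

class Triangle:
    """Hypothetical class mixing instance state with pure helpers."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    @staticmethod
    def hypotenuse(a, b):          # pure; callable via class or instance
        return math.sqrt(a**2 + b**2)

    def plain_hypotenuse(a, b):    # pure, but NOT marked @staticmethod
        return math.sqrt(a**2 + b**2)

    def area(self):                # depends on instance state
        return self.a * self.b / 2

tri = Triangle(3, 4)
print(Triangle.hypotenuse(3, 4), tri.hypotenuse(3, 4))   # 5.0 5.0
print(Triangle.plain_hypotenuse(3, 4))                   # 5.0
print(tri.area())                                        # 6.0
try:
    tri.plain_hypotenuse(3, 4)     # actually called as plain_hypotenuse(tri, 3, 4)
except TypeError as exc:
    print("TypeError:", exc)
```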

3.3 Generator Functions

A generator function does not return an ordinary value; it returns a generator, an iterator that produces its values lazily.

def get_primes():
    "Simple lazy Sieve of Eratosthenes"
    candidate = 2
    found = []
    while True:
        if all(candidate % prime != 0 for prime in found):
            yield candidate
            found.append(candidate)
        candidate += 1

>>> primes = get_primes()
>>> next(primes), next(primes), next(primes)
(2, 3, 5)
>>> for _, prime in zip(range(10), primes):
...     print(prime, end=" ")
...
7 11 13 17 19 23 29 31 37 41

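You may have noticed that the generator above represents an infinite sequence. That sequence is not built and stored in memory when get_primes() is called; its values are produced one at a time as the generator is consumed. The trick lies in yield: when execution reaches a yield, the function hands back a value and is then "frozen", with all of its local variables preserved, until the next call to next() "thaws" it. A generator expression is lazy in exactly the same way. The memory figures in the session below come from Python 2, where range() returns a complete list, and they reflect when range() is evaluated rather than any difference in laziness: the outermost iterable of a generator expression is evaluated as soon as the expression is created, whereas the body of a generator function, including its call to range(), does not run until the first next(). On Python 3, where range() is lazy, both forms use very little memory (on Python 2, use xrange() for large ranges). A list comprehension, by contrast, always builds the whole list.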

In [1]: def gt():
   ...:     for i in range(100000000):
   ...:         yield i
   ...:         

In [2]: g = gt()

In [3]: type(g)
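Out[3]: generator   # IPython uses only about 47508K here; the body, including range(), has not run yet
In [1]: g = (i for i in range(100000000))
In [2]: type(g)
Out[2]: generator   # memory usage above 3 GB (Python 2: range() builds the full list immediately)
In [5]: g = [i for i in range(100000000)]   # memory usage likewise above 3 GB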
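On Python 3 the laziness of both forms can be checked directly by logging when work happens. The helpers source and square below are hypothetical, added only for the logging. The sketch also shows the one eager step of a generator expression: its outermost iterable is evaluated immediately, which is what triggered the memory spike in the Python 2 session above.

```python
# Hypothetical helpers that log when work actually happens
calls = []

def source(n):
    calls.append("source")      # records when the iterable is created
    return range(n)

def square(i):
    calls.append(i)             # records each element actually computed
    return i * i

def squares_func(n):
    # Generator function: none of this body runs until the first next()
    for i in source(n):
        yield square(i)

gf = squares_func(3)
print(calls)                    # []

# Generator expression: the outermost iterable, source(3), is evaluated now,
# but no element is computed yet
ge = (square(i) for i in source(3))
print(calls)                    # ['source']

print(next(ge), calls)          # 0 ['source', 0]
print(next(gf), calls)          # 0 ['source', 0, 'source', 0]
```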

Since generator functions are functions, they can of course take arguments as well:

In [3]: def gt(number=10):
   ...:     for i in range(number, 100000000):
   ...:         yield i
   ...:         

In [4]: g = gt(20)

In [5]: next(g)
Out[5]: 20

To sum up:
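  1. A generator produces a sequence of values.

  2. yield is much like return, but it does one more thing: it saves the "state" of the generator function.

  3. A generator is simply a special kind of iterator.

  4. next() gets the next value from a generator (a for loop gets its values by calling next() implicitly).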
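The four points above fit in a few lines: a generator is an iterator, next() resumes it right after the last yield, and StopIteration signals the end, which a for loop or list() handles implicitly. The countdown function is illustrative:

```python
from collections.abc import Iterator

def countdown(n):
    print("starting")
    while n > 0:
        yield n                 # pause here; n is kept for the next resume
        n -= 1
    print("done")

g = countdown(2)                # nothing printed yet: the body has not started
print(isinstance(g, Iterator))  # True: a generator is an iterator
print(next(g))                  # prints "starting", then 2
print(next(g))                  # 1 (resumed right after the yield)
try:
    next(g)                     # prints "done", then raises StopIteration
except StopIteration:
    print("exhausted")
print(list(countdown(3)))       # list() calls next() implicitly: [3, 2, 1]
```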

For more discussion of generators and yield, see here.

4. Multiple Dispatch


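In programming languages, the same conceptual operation often needs different handling depending on the number and types of the data it is applied to. Since it is "the same concept", giving every version the same name helps the code express its meaning clearly. But once the names are the same, the program has to decide which version of the function to use: either the compiler chooses at compile time, or the method is dispatched at runtime.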

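The number and types of a function's parameters make up its signature. Depending on the language, a signature may include not only the number and types of parameters but also their structure or pattern, and even the number and types of return values. In C++, giving functions with different signatures the same name is called function overloading. Python, however, has no function overloading, yet we still need multiple dispatch.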
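For comparison, Python's standard library does offer single dispatch, which picks an implementation from the type of the first argument only, via functools.singledispatch (Python 3.4+). A minimal sketch; describe is a hypothetical function name. Rock-paper-scissors, whose outcome depends on both arguments, is exactly the case single dispatch cannot handle by itself:

```python
from functools import singledispatch

@singledispatch
def describe(x):
    # Fallback used when no more specific type is registered
    return "something else"

@describe.register(int)
def _(x):
    return "an int"

@describe.register(str)
def _(x):
    return "a string"

print(describe(3), describe("hi"), describe(3.5), sep=", ")
# an int, a string, something else
```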

Take rock-paper-scissors as an example:

class Thing(object): pass
class Rock(Thing): pass
class Paper(Thing): pass
class Scissors(Thing): pass

4.1 Many Branches

A purely imperative version, full of repetition, conditionals, and nesting:

def beats(x, y):
    if isinstance(x, Rock):
        if isinstance(y, Rock):
            return None # No winner
        elif isinstance(y, Paper):
            return y
        elif isinstance(y, Scissors):
            return x
        else:
            raise TypeError("Unknown second thing")
    elif isinstance(x, Paper):
        if isinstance(y, Rock):
            return x
        elif isinstance(y, Paper):
            return None # No winner
        elif isinstance(y, Scissors):
            return y
        else:
            raise TypeError("Unknown second thing")
    elif isinstance(x, Scissors):
        if isinstance(y, Rock):
            return y
        elif isinstance(y, Paper):
            return x
        elif isinstance(y, Scissors):
            return None # No winner
        else:
            raise TypeError("Unknown second thing")
    else:
        raise TypeError("Unknown first thing")
rock, paper, scissors = Rock(), Paper(), Scissors()
# >>> beats(paper, rock)
# <__main__.Paper at 0x103b96b00>
# >>> beats(paper, 3)
# TypeError: Unknown second thing

4.2 Delegating to the Object

To improve on this, we can split the logic across the three classes and delegate beats() to each of them:

class DuckRock(Rock):
    def beats(self, other):
        if isinstance(other, Rock):
            return None # No winner
        elif isinstance(other, Paper):
            return other
        elif isinstance(other, Scissors):
            return self
        else:
            raise TypeError("Unknown second thing")

class DuckPaper(Paper):
    def beats(self, other):
        if isinstance(other, Rock):
            return self
        elif isinstance(other, Paper):
            return None # No winner
        elif isinstance(other, Scissors):
            return other
        else:
            raise TypeError("Unknown second thing") 

class DuckScissors(Scissors):
    def beats(self, other):
        if isinstance(other, Rock):
            return other
        elif isinstance(other, Paper):
            return self
        elif isinstance(other, Scissors):
            return None # No winner
        else:
            raise TypeError("Unknown second thing") 

def beats2(x, y):
    if hasattr(x, 'beats'):
        return x.beats(y)
    else:
        raise TypeError("Unknown first thing")

rock, paper, scissors = DuckRock(), DuckPaper(), DuckScissors()
# >>> beats2(rock, paper)
# <__main__.DuckPaper at 0x103b894a8>
# >>> beats2(3, rock)
# TypeError: Unknown first thing

4.3 Pattern Matching

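Finally, we can express all of the logic directly with multiple dispatch, here using the third-party multipledispatch package. There are more functions, but each one is trivial, so the logic is clearer and less error-prone: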

from multipledispatch import dispatch
@dispatch(Rock, Rock)
def beats3(x, y): return None

@dispatch(Rock, Paper)
def beats3(x, y): return y

@dispatch(Rock, Scissors)
def beats3(x, y): return x

@dispatch(Paper, Rock)
def beats3(x, y): return x

@dispatch(Paper, Paper)
def beats3(x, y): return None

@dispatch(Paper, Scissors)
def beats3(x, y): return y

@dispatch(Scissors, Rock)
def beats3(x, y): return y

@dispatch(Scissors, Paper)
def beats3(x, y): return x

@dispatch(Scissors, Scissors)
def beats3(x, y): return None

@dispatch(object, object)
def beats3(x, y):
    if not isinstance(x, (Rock, Paper, Scissors)):
        raise TypeError("Unknown first thing")
    else:
        raise TypeError("Unknown second thing")

# >>> beats3(rock, paper)
# <__main__.DuckPaper at 0x103b894a8>
# >>> beats3(rock, 3)
# TypeError: Unknown second thing
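multipledispatch is installed separately (pip install multipledispatch). If you would rather stay within the standard library, a rough stand-in, not the library's own mechanism, is a table keyed by pairs of classes. This sketch uses hypothetical names (RULES, beats4), re-declares the classes so it runs standalone, and walks each argument's MRO so that subclasses such as DuckRock still find a rule:

```python
from itertools import product

class Thing: pass
class Rock(Thing): pass
class Paper(Thing): pass
class Scissors(Thing): pass

# Which side wins for each (class of x, class of y) pair: 'x', 'y' or None
RULES = {
    (Rock, Rock): None,     (Rock, Paper): 'y',     (Rock, Scissors): 'x',
    (Paper, Rock): 'x',     (Paper, Paper): None,   (Paper, Scissors): 'y',
    (Scissors, Rock): 'y',  (Scissors, Paper): 'x', (Scissors, Scissors): None,
}

def beats4(x, y):
    # Try class pairs in MRO order so subclasses fall back to their base rules
    for tx, ty in product(type(x).__mro__, type(y).__mro__):
        if (tx, ty) in RULES:
            winner = RULES[(tx, ty)]
            return {None: None, 'x': x, 'y': y}[winner]
    if not isinstance(x, (Rock, Paper, Scissors)):
        raise TypeError("Unknown first thing")
    raise TypeError("Unknown second thing")

rock, paper, scissors = Rock(), Paper(), Scissors()
print(beats4(paper, rock) is paper)       # True
print(beats4(rock, rock))                 # None
```

The table keeps all nine outcomes in one place, which makes a mistake in a single rule easy to spot.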

Please credit the source when reposting: BackNode
