A “memoized” function is a function that only calculates the return value for each combination of arguments once and returns the previously calculated value if the function is called a second time with the same arguments.
In PHP, I often see this implemented with code like this:
    class ProductRepository implements ProductRepositoryInterface
    {
        private $products = [];

        public function product($id)
        {
            if (! isset($this->products[$id])) {
                $this->products[$id] = $this->load($id);
            }
            return $this->products[$id];
        }

        private function load($id)
        {
            ...
        }
    }
The method results are cached in an array held in a property. I often used this approach myself without thinking much about it, but I realized that it is flawed and that there is a much better alternative.
An indication was that code like this always caused trouble in tests if an instance was shared between tests (in an ideal world, this should not happen of course).
But we can also look at it from a theoretical point of view: the class violates the Single Responsibility Principle because it is responsible for two things:
- Loading products from some source
- Caching results
How can we split this up? Simple: Use a decorator. The decorator CachedProductRepository wraps the original implementation and delegates the method call if there is no cached result yet.
    class ProductRepository implements ProductRepositoryInterface
    {
        public function product($id)
        {
            return $this->load($id);
        }

        private function load($id)
        {
            ...
        }
    }

    class CachedProductRepository implements ProductRepositoryInterface
    {
        private $products = [];
        private $repository;

        public function __construct(ProductRepositoryInterface $repository)
        {
            $this->repository = $repository;
        }

        public function product($id)
        {
            if (! isset($this->products[$id])) {
                $this->products[$id] = $this->repository->product($id);
            }
            return $this->products[$id];
        }
    }
Now we can use this cache decorator wherever we want to cache results and leave it out where we do not. We can also have different implementations of ProductRepositoryInterface without duplicating the caching code.
Given any product repository implementation $productRepository, we can add memoization with:
$productRepository = new CachedProductRepository($productRepository);
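To see the effect of the decorator, here is a minimal runnable sketch. The interface declaration and the CountingProductRepository class are assumptions made up for this illustration (the interface is never shown above); the decorator itself is the one from the listing.

```php
<?php
// Assumed shape of the interface; its declaration is not shown in the article.
interface ProductRepositoryInterface
{
    public function product($id);
}

// Hypothetical stand-in for a real repository that counts its lookups.
class CountingProductRepository implements ProductRepositoryInterface
{
    public $loads = 0;

    public function product($id)
    {
        $this->loads++;
        return 'product-' . $id;
    }
}

// The cache decorator from above, unchanged.
class CachedProductRepository implements ProductRepositoryInterface
{
    private $products = [];
    private $repository;

    public function __construct(ProductRepositoryInterface $repository)
    {
        $this->repository = $repository;
    }

    public function product($id)
    {
        if (! isset($this->products[$id])) {
            $this->products[$id] = $this->repository->product($id);
        }
        return $this->products[$id];
    }
}

$inner = new CountingProductRepository();
$productRepository = new CachedProductRepository($inner);

$first = $productRepository->product(42);  // delegated to $inner
$second = $productRepository->product(42); // served from the cache
$productRepository->product(7);            // new id, delegated again
```

After these three calls the inner repository has been asked twice, once per distinct id. In a test, you can simply pass the undecorated repository (or a fresh decorator) to avoid shared cached state.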
Generalized solutions
I looked for a general reusable solution and found some that were implemented as higher-order functions (i.e. functions that receive other functions as arguments and/or return other functions) that map a function to a memoized function.
Example usage for such a memoize() function:
    function loadProduct($id)
    {
        ...
    }

    $memoizedLoadProduct = memoize('loadProduct');
    $memoizedLoadProduct(42);
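Such a memoize() function could be sketched roughly as follows. This is not the code of any particular library, just one possible implementation under the assumption that all arguments are serializable; the $square closure and the $calls counter are made up for the demonstration.

```php
<?php
/**
 * Minimal sketch of a memoize() higher-order function.
 * Assumes that the arguments can be serialized and that the
 * wrapped callable is deterministic.
 */
function memoize(callable $function)
{
    $results = [];

    // $results is captured by reference so the cache survives between calls.
    return function (...$args) use ($function, &$results) {
        $key = \serialize($args);
        if (! \array_key_exists($key, $results)) {
            $results[$key] = $function(...$args);
        }
        return $results[$key];
    };
}

// Hypothetical function to memoize; $calls counts the real invocations.
$calls = 0;
$square = function ($n) use (&$calls) {
    $calls++;
    return $n * $n;
};

$memoizedSquare = memoize($square);
$first = $memoizedSquare(4);  // computed
$second = $memoizedSquare(4); // taken from the cache
```

Using array_key_exists() instead of isset() means that a null result is cached as well.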
This is nice for actual functions, but using it with a class is not so practical anymore. We could use the array callback syntax to pass an instance method to the memoize function:
    $memoizedLoadProduct = memoize([$productRepository, 'product']);
But then we would have to change the client code to use this new function $memoizedLoadProduct, with no chance to make use of polymorphism. Of course we could still write the cache decorator and use a memoize function inside, but that would be more complicated than necessary.
To take “more complicated than necessary” to an extreme level, take a look at dominionenterprises/memoize-php. If you really want persistence and cache lifetime, this seems to be a good solution. If not, this is not for you.
Another interesting solution is amitsnyderman/php-memoize-trait, but I would not use it because it’s too much magic: you add a trait to any class, which adds a __call() method that lets you call existing methods with an additional underscore to memoize the result:
    $instance = new Object();
    $instance->method();  // normal
    $instance->_method(); // memoized
While this is easy to use at first, there are several issues:
- The code does not reveal its purpose; it is not self-explanatory anymore. What is “_method”? Where is it defined?
- No IDE assistance, like autocompletion.
- No transparent usage possible. The calling code has to decide whether to use memoization, so it has to be aware of this magic trait.
- Still difficult to test. You cannot decide not to use memoization in tests. You cannot even easily replace the class with a test dummy anymore.
So here is my own solution:
“Memoize” trait for cache decorator
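    trait Memoize
    {
        /**
         * @var array [method][parameters]
         */
        private $memoizedResults = [];

        /**
         * Calls $methodName on $this->subject once per distinct argument list.
         * The class using this trait has to provide the $subject property.
         */
        protected function memoizedCall($methodName, $args)
        {
            // The serialized argument list serves as array key.
            $serializedArgs = \serialize($args);
            // Note: isset() is false for a stored null, so null results are not cached.
            if (! isset($this->memoizedResults[$methodName][$serializedArgs])) {
                $this->memoizedResults[$methodName][$serializedArgs] = $this->subject->$methodName(...$args);
            }
            return $this->memoizedResults[$methodName][$serializedArgs];
        }
    }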
Link to Gist: Memoize.php
Usage:
- Include the trait
- Add attribute $subject with original instance
- Implement all methods of the interface as return $this->memoizedCall(__FUNCTION__, func_get_args());
Example:
    class CachedProductRepository implements ProductRepositoryInterface
    {
        use Memoize;

        private $subject;

        public function __construct(ProductRepositoryInterface $repository)
        {
            $this->subject = $repository;
        }

        public function product($id)
        {
            return $this->memoizedCall(__FUNCTION__, \func_get_args());
        }
    }
For this simple example, the overhead of the trait is not really worth it; I would prefer our original cache decorator implementation. But as soon as you have multiple methods to memoize and they have more than one parameter or non-scalar parameters, this should come in handy.
Explanation
All results passed through memoizedCall are stored in a single array, grouped by method name and arguments.

The “magic constant” __FUNCTION__ always contains the current function/method name.

func_get_args() returns an array with all passed arguments, which we serialize to retrieve a distinct string that can be used as array key. Note: this won’t work with values that are not serializable, like resources. You will have to treat those separately.

$this->subject->$methodName(...$args) uses argument unpacking with “...” as a convenient way to call a method with an array of arguments. Before PHP 5.6 one would have needed call_user_func_array() for this.
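The two building blocks can be tried out in isolation. The following is only an illustrative sketch; describeProduct() is a hypothetical helper that exists solely to have something to call.

```php
<?php
// Equal argument lists serialize to equal strings, so they share a cache key.
$keyA = \serialize([42, 'en']);
$keyB = \serialize([42, 'en']);

// serialize() is type-sensitive: the string '42' gives a different key than the int 42.
$keyC = \serialize(['42', 'en']);

// Hypothetical helper, only here to demonstrate the two call styles.
function describeProduct($id, $locale)
{
    return "product {$id} ({$locale})";
}

$args = [42, 'en'];

// Argument unpacking, available since PHP 5.6.
$viaUnpacking = describeProduct(...$args);

// The equivalent call for older PHP versions.
$viaCallUserFuncArray = \call_user_func_array('describeProduct', $args);
```

Because the keys are type-sensitive, product(42) and product('42') end up as two separate cache entries when the trait is used.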
Is it worth it?
What do we gain with this generalized solution? Of course, we have to write less code and it’s still somewhat clear what the methods are doing. But there are drawbacks:
- The method using my trait

      $this->memoizedCall(__FUNCTION__, func_get_args())

  is shorter than

      if (! isset($this->products[$id])) {
          $this->products[$id] = $this->repository->product($id);
      }
      return $this->products[$id];

  but less obvious. How does it work? What are the arguments used for? Code is read more often than written, so a few more lines to be more explicit and self-explanatory are a good investment. That’s still true if there are more arguments. Let’s add a $version parameter to our example:

      if (! isset($this->products[$id][$version])) {
          $this->products[$id][$version] = $this->repository->product($id, $version);
      }
      return $this->products[$id][$version];

  Still clear and not too complex. Being explicit makes the code more readable.
- The existence of this memoizedCall() method might tempt developers into using it thoughtlessly for anything. Resources that are not serializable are the smallest problem because this will be noticed immediately. But big, complex object structures can also cause trouble because serializing them becomes slow. It’s better to force developers (even if that’s only yourself) to think of the smallest possible identifier and to use memoization only where it makes sense.
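What choosing the smallest possible identifier can look like for an object argument is sketched below. All class and interface names here (Category, CategoryProductsInterface, CountingCategoryProducts, CachedCategoryProducts) are hypothetical and only serve the illustration; the sketch assumes that the id alone identifies a category.

```php
<?php
// Hypothetical value object; only its id is needed to identify a cache entry.
class Category
{
    private $id;
    private $description;

    public function __construct($id, $description)
    {
        $this->id = $id;
        $this->description = $description;
    }

    public function id()
    {
        return $this->id;
    }
}

interface CategoryProductsInterface
{
    public function productIds(Category $category);
}

// Hypothetical stand-in for an expensive lookup that counts its calls.
class CountingCategoryProducts implements CategoryProductsInterface
{
    public $lookups = 0;

    public function productIds(Category $category)
    {
        $this->lookups++;
        return [$category->id() * 10, $category->id() * 10 + 1];
    }
}

class CachedCategoryProducts implements CategoryProductsInterface
{
    private $productIds = [];
    private $subject;

    public function __construct(CategoryProductsInterface $subject)
    {
        $this->subject = $subject;
    }

    public function productIds(Category $category)
    {
        // Explicit, minimal key: the id instead of the serialized object.
        $key = $category->id();
        if (! isset($this->productIds[$key])) {
            $this->productIds[$key] = $this->subject->productIds($category);
        }
        return $this->productIds[$key];
    }
}

$inner = new CountingCategoryProducts();
$cached = new CachedCategoryProducts($inner);

$a = $cached->productIds(new Category(3, 'short text'));
$b = $cached->productIds(new Category(3, 'a much longer description'));
```

Both calls hit the same cache entry although the two Category objects differ in their description. With serialized arguments they would have produced two different keys and two lookups.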
Conclusion
The Memoize trait is a quick and easy way to create cache decorators, but quick and easy (not to be confused with “simple”) should not be our main goal in most cases. If you decide to use it, restrict it to methods with a small signature: just a few scalar values or objects without a big hierarchy.
Cache decorators are great for separating the caching of results from the calculation of these results. Use them where caching makes sense and put some thought into finding a minimal identifier for the given arguments. No generalized solution can relieve you of this responsibility.
Thanks for sharing! Just to let you know that it’s already implemented in Hack language: https://docs.hhvm.com/hack/attributes/special#__memoize
Nice! I didn’t know that, but have to admit that I never actually tried out Hacklang
This is CACHE, not memoize. Memoize is for constants, general cache is for variables.
You cannot memoize anything like $id -> $object _from_external_source (that can be modified or not available).
The requirement to memoize is that it only works with pure functions (immutable input, immutable output), not with procedures/methods or anything that touches the network or external memory (globals, $this, closures).
It must be deterministic, you cannot memoize random_bytes or random_int for example. But you can memoize cos, sin, …, you can use memoize for fourier transform (worthy, lengthy but pure).
In other words, if you cannot map a set A to a set B, it is not memoizable.
Thanks for the clarification!