I agree with most of what you write, and I use test fixtures that spin up real services with Docker for running integration tests. However, I'd argue that mocking will always be useful, as it goes hand in hand with unit tests. Integration tests confirm the behaviour of the code branches you are able to reach that way, and more closely reflect reality and normal operations. Mocking lets you more easily test the edge cases that a software unit might encounter and that are otherwise hard to create conditions for.
In fact, I've also been tempted by what you describe (i.e. why mock when you can automatically spin up a Docker service?). I've done this too, and my conclusion is that it is more trouble than it's worth, although that can differ for other systems, I suppose. The only realistic approach is for a test fixture to spin services up once, not per test/unit, so it just introduced concerns about correct cleanup and concurrency limitations that wouldn't otherwise exist. It also has unexpected issues with startup times that vary wildly depending on whether or not the container image is cached.
All in all, it works well enough now, but when I do it all over somewhere, I'm tempted to keep a clean distinction of deploying temporary services for running integration tests, and regular unit tests, without the in between.
TL;DR: mocking isn't just for "I need service A in my test", but also "I need A to behave like X", and you don't get that with external services.
Just want to make sure, but it seems like I can sum up your reasons for mocking as:
- spinning up the service is hard
- startup times can be inconsistent
I may not have read deeply enough, but I don't see much explanation of the "I need A to behave like X" point -- could you give a more concrete example? Let's say you had two queue implementations, one using Redis and one using RabbitMQ (i.e. `GenericQueue<Redis|Rabbit>`); it seems ridiculous to need one of them to act like the other -- I'd love an example if you have one in mind.
Optimizing service spin-up and making startup times consistent are basically solved in half a day of work (that pays off forever), in my experience. Here's an excerpt from a GitLab CI config I carry around from project to project:
e2e:
  stage: test
  only:
    - merge_requests
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2376 # TLS enabled (2375 for disabled)
    DOCKER_TLS_CERTDIR: '/certs'
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: '/certs/client'
  script:
    - make ensure-images setup build
    - make test-e2e
The "ensure-images" make target actually just runs a `docker pull` before the tests:
Which actually reminds me, another cool thing you can mock if you do this -- it's not 100% parity but there's a cool Stripe replacement called localstripe[0]. There are bugs/unimplemented features (expecting a random F/OSS project without a ton of members to replicate Stripe's API is a lot) but it exists and isn't too hard to use, so I use it in test/local/dev environments, use Stripe test API keys in staging, and the real keys in prod.
Another thing worth thinking about is that how you optimize each piece is different -- it's often pretty fast to make new exchanges/databases and you can even just TRUNCATE tables if you're really worried about it. Here's an example from some code I wrote:
public resetForTest(): Promise<this> {
  if (!this.conn) { return Promise.reject(new DatabaseDisconnectedError()); }
  // Do not truncate if database is <production db name>
  // for example when process.env.USE_LOCAL_DB_FOR_TEST is set
  if (this.getDatabaseName() === "<production db name>") {
    return Promise.resolve(this);
  }
  return this.getConnection()
    .then(c => c.query(`
      TRUNCATE TABLE <some table> CASCADE;
      TRUNCATE TABLE <another table> CASCADE;
      ... more ...
    `))
    .then(() => this);
}
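To isolate the "never truncate production" guard from the snippet above, here's a rough standalone sketch. It is not the original code: `buildResetSql`, its parameters, and the identifier check are all made up for illustration, and it only builds the SQL string rather than talking to a database.

```typescript
// Hypothetical helper, for illustration only: returns the reset SQL,
// or null when the target database is on the protected list.
function buildResetSql(
  databaseName: string,
  protectedDatabases: readonly string[],
  tables: readonly string[],
): string | null {
  // Refuse to produce any SQL for a protected (e.g. production) database.
  if (protectedDatabases.indexOf(databaseName) !== -1) {
    return null;
  }
  // Only accept plain identifiers so a table name can't smuggle in extra SQL.
  for (const table of tables) {
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(table)) {
      throw new Error(`Refusing to truncate suspicious table name: ${table}`);
    }
  }
  return tables.map((t) => `TRUNCATE TABLE ${t} CASCADE;`).join("\n");
}

// Usage: a test harness would run the returned SQL only when it is non-null.
const sql = buildResetSql("app_test", ["app_production"], ["users", "orders"]);
```

Keeping the guard as a pure function like this means the "is it safe to wipe?" decision can itself be checked without a live connection.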
And this pattern extends itself to anything that is testable, so I've enshrined it:
Completely separate from all this -- some of the time you can literally just.. run these dependencies in memory[1]. Doesn't get much faster than that. I haven't run into a mock that didn't take time and effort to maintain/alter whenever code or the underlying systems changed. Time spent maintaining mocks is time wasted.
Anyway, I won't go as far as saying that mocks are never useful, sometimes a mock is the easiest and best way -- but I can say that I spend most of my time writing E2E tests, and I trust them a lot and they give me trust in my codebase much more than mocks or unit tests do. I barely write unit tests anymore to check edge cases, because if I really cared I'd just generate them -- and I don't really have to care because I use strong type systems where possible (TypeScript for JS, plus Rust and Haskell; I avoid PHP/Python/Ruby).
Here comes the hot take -- unit tests (and their rise in popularity/necessity) are actually a direct reflection of the adoption of dynamically interpreted languages without compile-time type checking, and of burdensome class-based "type systems". People got excited about how fast you could churn out code (compared to Java most things feel pretty productive) and they started having numbers where they thought they had strings, and negative numbers where they thought they had natural numbers. Correct usage of good, expressive and concise (where possible) type systems encourages you to make invalid/nonsensical states impossible, and good type systems make it easy to rule these cases out. As for the business stuff (never store decimals for currency, make sure your amounts are natural numbers, etc), you have to learn that through experience and intuition built over time -- you don't know to write that unit test unless it's bitten you before (though sitting down to write unit tests might tease it out of you).
I basically write unit tests as regression tests now, but I find I rarely have to do that, since most of the time when a weird case has gotten through it's an indicator that I was too loose with the types.
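To make the "invalid states impossible" point concrete, here's a minimal sketch using a branded type in TypeScript. The names (`NaturalNumber`, `Money`, `addMoney`) are hypothetical, and this is just one common way to do it, not a claim about any particular codebase.

```typescript
// Branded type: a number that has been checked to be a non-negative integer.
type NaturalNumber = number & { readonly __brand: "NaturalNumber" };

// The only way to obtain a NaturalNumber is through this check.
function toNaturalNumber(value: number): NaturalNumber | null {
  const ok = isFinite(value) && Math.floor(value) === value && value >= 0;
  return ok ? (value as NaturalNumber) : null;
}

// Amounts are stored in minor units (e.g. cents), never as decimals.
interface Money {
  readonly currency: "USD" | "EUR";
  readonly minorUnits: NaturalNumber;
}

// Adding two amounts only succeeds when the currencies match.
function addMoney(a: Money, b: Money): Money | null {
  if (a.currency !== b.currency) { return null; }
  const sum = toNaturalNumber(a.minorUnits + b.minorUnits);
  return sum === null ? null : { currency: a.currency, minorUnits: sum };
}

// A plain number is rejected by the compiler:
// const bad: Money = { currency: "USD", minorUnits: -5 }; // type error
```

With this in place, a negative or fractional amount can't reach `addMoney` without going through `toNaturalNumber` first, which is the class of unit test the types replace.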
There's a place for e2e tests that run the actual services, and a place for tests that use mock servers. As a full stack developer, I know for a fact that spinning up a service, even if it's dockerized, is more often than not a hassle. Maybe because of configuration hell, maybe because setting up the database (with the migrations, fixture data, etc.) is never a straightforward job even if it's "automated". And if the architecture requires service A to talk to service B, which in turn needs to talk to service C, you end up having to spin up way too much infrastructure (each with their own configuration and database setup hells) for just a simple test. It's not worth it, and often it's not feasible since you may not even have enough memory in your computer.
For frontend developers or data scientists who couldn't care less about the backend, going through this hassle is frustrating and there's no reason why they should go through it.
Mocks are also a fundamental part of building integrations, since they allow both backend and frontend teams to work in parallel against a specification/contract.
That said, for the real integration test, you do need to run the real services and have them talking to each other. If you can do that in your own machine, great. But often the only feasible way to do this is in the cloud.
> There's a place for e2e tests that run the actual services, and a place for tests that use mock servers. As a full stack developer, I know for a fact that spinning up a service, even if it's dockerized, is more often than not a hassle. Maybe because of configuration hell, maybe because setting up the database (with the migrations, fixture data, etc.) is never a straightforward job even if it's "automated".
Configuration hell happens in the cloud as well, and again I noted earlier -- it should be easy for your application to automatically apply and make sure a given database is migrated. Applications shouldn't start if they're not running on the version of database they expect. Maybe producing fixture data is hard but this is stuff you would have had to do with a mock, anyway. Good mocks also need fixture data because they need to act on that data just like the thing they're mocking would.
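A rough sketch of the "don't start on the wrong database version" idea, assuming the schema version is a single increasing integer. `decideStartup` and the `StartupDecision` shape are hypothetical names for illustration; how the actual version gets read (a migrations table, etc.) is left out.

```typescript
// What the application should do given the schema version it finds.
type StartupDecision =
  | { kind: "start" }
  | { kind: "migrate"; from: number; to: number }
  | { kind: "refuse"; reason: string };

// Hypothetical startup check: compare the version found in the database
// with the version the application code was built against.
function decideStartup(actual: number, expected: number, autoMigrate: boolean): StartupDecision {
  if (actual === expected) {
    return { kind: "start" };
  }
  if (actual < expected) {
    return autoMigrate
      ? { kind: "migrate", from: actual, to: expected }
      : { kind: "refuse", reason: `database schema ${actual} is behind expected ${expected}` };
  }
  // A database newer than the code is never migrated backwards automatically.
  return { kind: "refuse", reason: `database schema ${actual} is newer than expected ${expected}` };
}
```

The same check works for a throwaway test database and a long-lived one, which is what makes the setup feel automatic to whoever runs the tests.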
As a side note this is literally what tooling/infrastructure engineers get paid to do. They build good systems for application developers so that it will feel like magic. It will be easier for everyone, forever, and you're going to reap the rewards in meaningful, impactful tests that are closer to your production infrastructure. People are doing things these days like "testing in production" because mocks just don't match production enough.
If this is like.. what you should do with 2 engineers at the very beginning of a startup, then sure, write mocks to your heart's content. Or even better, don't mock anything, just write the basic E2E tests (against test cloud resources or your production cloud resources), and get to shipping even faster. If resources are that tight, no one has time to sit around writing mocks in a code-base with a high churn rate, right?
> And if the architecture requires service A to talk to service B, which in turn needs to talk to service C, you end up having to spin up way too much infrastructure (each with their own configuration and database setup hells) for just a simple test. It's not worth it, and often it's not feasible since you may not even have enough memory in your computer.
People are running around with $3000 MacBooks with very fast cores and lots of memory, but running their cloud instances with ASGs of t2.micros/smalls/larges/xlarges initially. Half the time all you need for most apps is a database (postgres), cache (redis), message queue (redis), and S3 (minio), and something to send emails. You can run local versions of these requirements on <4 hyper threads (let's say 2/4 cores). You're not going to be able to test BigQuery or DynamoDB (actually, you can run ScyllaDB but that's beside the point) or some cloud-specific stuff, but most apps just don't need that stuff. It's not that hard to run all 5 of these services on a modern dev machine, never mind if companies spent half the price of the laptop and got you a desktop machine (like the "thin client" business machines out there these days[0]).
But again, all I can hear is that the experience is lacking -- you've just never seen it done simply, doesn't mean it can't be done or it isn't a good idea -- you've just never run into a place with the right resources/time to build proper developer tooling.
I'd argue it is worth it because you're going to build a mock that approximates those services to test them, are you not? Or are you just going to mock that the request came back correctly? I already mentioned that if you want to just not spin the thing up at all, then yeah write a mock for the function call or whatever, but if you're going to write a mock for that thing, you're basically writing technical debt that will be wrong/subtly broken as soon as the underlying thing changes. Why do that when you can just... use the actual thing?
> For frontend developers or data scientists who couldn't care less about the backend, going through this hassle is frustrating and there's no reason why they should go through it.
Sure but this is the same point as before -- it's hard only when you haven't dedicated any engineering time or the right engineers to it. If it's crucial to you all moving faster, then take the time, build the shared machinery, and it will pay dividends. Maintaining mocks is not free, and the costs are externalized/hidden -- mocks do not maintain parity with the actual implementation, they are a reflection of how you think the actual implementation works.
As a side note, if you want to accelerate frontend developers or data scientists there are many other good options -- they can develop against an API instance that's in the cloud completely (if you have preview apps for Heroku, or a similar setup elsewhere for example) -- data scientists could use hosted notebooks with access to the appropriate data lakes. There are lots of other things you can do there for a fully streamlined experience. I'm going to assume that a fully streamlined experience wasn't the goal here, but rather easy deployment of this software on their machines -- again, this is a matter of building a good experience. If it's hard, make it easy -- write good software and it will look like magic.
> Mocks are also a fundamental part of building integrations, since they allow both backend and frontend teams to work in parallel against a specification/contract.
Mocks are not a fundamental part of building integrations -- the actual backing service is the only fundamental part. This is why people usually just... don't worry about writing tests, or don't mock when they first start out. Writing and maintaining good mocks that mock what you want and reflect the functionality well is nontrivial.
Specifying a contract for the software you're producing is a good practice, but it is not tied to mocking; it exists to afford you flexibility in writing integrations (instances of the interface). You have chosen to write an integration that doesn't run anywhere else except in tests. Writing a Queue<LocalMemory> is drastically less useful than writing a Queue<Elasticache> and Queue<Redis> and Queue<Kinesis>, though they are all implementations, because Queue<LocalMemory> is never used anywhere else. Bugs and under-engineering/discrepancies in Queue<LocalMemory> will never be seen in production and are a waste to work on, when you can spin up Redis and use Queue<Redis> locally. Queue<Redis> might actually see use in real life, and we know that Queue<Elasticache> and Queue<Redis> are actually kind of the same thing (definitely more-so than Queue<LocalMemory>).
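One way to read "the contract is not tied to mocking" is that the checks get written once against the interface and then pointed at whichever real implementation you spin up. A minimal sketch, with assumptions stated loudly: `MessageQueue`, `ArrayQueue` and `checkQueueContract` are hypothetical names, the interface is synchronous for brevity (a real Redis-backed queue would be async), and `ArrayQueue` exists only so the sketch runs on its own.

```typescript
// The contract: anything that can enqueue and dequeue strings in FIFO order.
interface MessageQueue {
  enqueue(item: string): void;
  dequeue(): string | undefined;
}

// Stand-in so the sketch is runnable; in a real suite the factory would
// hand back the Redis-backed (or other production) implementation instead.
class ArrayQueue implements MessageQueue {
  private items: string[] = [];
  enqueue(item: string): void { this.items.push(item); }
  dequeue(): string | undefined { return this.items.shift(); }
}

// Contract checks written once against the interface.
// Returns a list of human-readable failures (empty means the contract holds).
function checkQueueContract(makeQueue: () => MessageQueue): string[] {
  const failures: string[] = [];
  if (makeQueue().dequeue() !== undefined) {
    failures.push("empty queue should dequeue undefined");
  }
  const q = makeQueue();
  q.enqueue("a");
  q.enqueue("b");
  if (q.dequeue() !== "a" || q.dequeue() !== "b") {
    failures.push("items should come out in FIFO order");
  }
  if (q.dequeue() !== undefined) {
    failures.push("drained queue should dequeue undefined");
  }
  return failures;
}

const failures = checkQueueContract(() => new ArrayQueue());
```

The point of the factory parameter is that swapping in another implementation changes one line, not the checks.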
> That said, for the real integration test, you do need to run the real services and have them talking to each other. If you can do that in your own machine, great. But often the only feasible way to do this is in the cloud.
??? This isn't true really. You can absolutely run software on your own machine and fully integration test a system. In fact, for as long as your app's database (which is normally the limiting factor size-wise) is less than 100GB, you can run most complete apps (imagine a simple one like Laravel/Rails/Django) on your own machine.
If your app can run on <10 t2 instances (let's assume 3 of those are RDS), and doesn't use more than 500GB of data in total (again, hard drives are bigger than this these days), you can run it on your own machine.
Look, we don't have to agree, diversity of opinion is great -- but none of these arguments are doing it for me. They all just amount to "it's hard" with the assumption being that mocks are free. Mocks are not free, and while they may have their place, I would rather just do the work of making it easy for me to spin up the infrastructure where possible. You don't have to be an expert -- hire someone else who is to make sure your developers are productive. If it's <5 devs then OK, maybe no one has that time or expertise, doesn't mean it's not a good idea.
Apologies for not getting to read and reply to your full post, but I wanted to quickly respond to the request for an example regarding "I need A to behave like X". I don't think the example is too contrived:
Let's say A is a REST API service. And you wish to check how your code behaves when a client receives a 5XX response. There might be reliable ways to cause A to reply with such a failure, however, if what you intend to verify is this particular behavior, then simply forcing A to reply with a 5XX through mocking will probably get you there quicker, and have it be decoupled from A's implementation (e.g. the conditions that triggered 5XX no longer apply after a version bump, etc)
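A minimal sketch of that 5XX case, under some assumptions: the client talks to A through an injected `Transport` function (a hypothetical shape, synchronous here only to keep the example short), and the code under test retries 5XX responses a few times before giving up. None of these names come from a real library.

```typescript
interface HttpResponse { status: number; body: string; }
// The only thing the code under test knows about service A.
type Transport = (path: string) => HttpResponse;

type ProfileResult =
  | { kind: "ok"; name: string }
  | { kind: "unavailable"; status: number; attempts: number };

// Code under test: retries on 5XX, gives up immediately on other errors.
function fetchProfileName(transport: Transport, userId: string, maxAttempts = 3): ProfileResult {
  let lastStatus = 0;
  let attempts = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = transport(`/users/${userId}`);
    if (res.status >= 200 && res.status < 300) {
      return { kind: "ok", name: res.body };
    }
    lastStatus = res.status;
    attempts = attempt;
    if (res.status < 500) { break; } // only 5XX is worth retrying
  }
  return { kind: "unavailable", status: lastStatus, attempts };
}

// The mock: force A to "behave like X" without touching A itself.
function alwaysFailing(status: number): { transport: Transport; calls: string[] } {
  const calls: string[] = [];
  const transport: Transport = (path) => {
    calls.push(path);
    return { status, body: "" };
  };
  return { transport, calls };
}

const mock = alwaysFailing(503);
const result = fetchProfileName(mock.transport, "42");
// result is the "unavailable" variant after three attempts against /users/42
```

Nothing here depends on how the real A decides to return a 503, which is the decoupling being argued for.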
Edit: I took the time to read the full post :). In broad terms, I agree with you. I think however that the approach to software testing allows a bit more nuance, and the usefulness of any approach must be determined more on a case-by-case basis. In very broad strokes, the approaches follow the pros and cons associated with unit and integration tests. If you are interested in testing the behavior end-to-end, then integration tests will get you there more reliably than any complicated setup of unit tests ever will. Containerized services for temporary deployments are also a great way to spin up such dependencies. However, unit tests serve a different purpose: they deliberately try to be as decoupled from any dependency as possible, and as cohesive as possible. It then follows that if you zoom in on this software unit, and value it for its decoupled nature, and you wish to test how it behaves given certain conditions, mocking is a better and easier way to go about it.
> Let's say A is a REST API service. And you wish to check how your code behaves when a client receives a 5XX response. There might be reliable ways to cause A to reply with such a failure, however, if what you intend to verify is this particular behavior, then simply forcing A to reply with a 5XX through mocking will probably get you there quicker, and have it be decoupled from A's implementation (e.g. the conditions that triggered 5XX no longer apply after a version bump, etc)
Ahh thank you for clarifying what you meant, I see what you mean now, I was imagining something else in my head that was incorrect -- that case makes absolute sense to mock.
> Edit: I took the time to read the full post :). In broad terms, I agree with you. I think however that the approach to software testing allows a bit more nuance, and the usefulness of any approach must be determined more on a case-by-case basis. In very broad strokes, the approaches follow the pros and cons associated with unit and integration tests. If you are interested in testing the behavior end-to-end, then integration tests will get you there more reliably than any complicated setup of unit tests ever will. Containerized services for temporary deployments are also a great way to spin up such dependencies. However, unit tests serve a different purpose: they deliberately try to be as decoupled from any dependency as possible, and as cohesive as possible. It then follows that if you zoom in on this software unit, and value it for its decoupled nature, and you wish to test how it behaves given certain conditions, mocking is a better and easier way to go about it.
Fully agreed here! Case by case is the way to decide when to use either tool, and I think that kind of timing (what level of "zoom" you're at in the codebase) is pretty reasonable as a metric.
The reason I use mocks is not primarily to avoid latency or setting things up, it's because it's the best way to test a permutation space in an isolated fashion.
Type systems are great and indeed many unit tests are a poor replacement for it, but for any moderately complex piece of logic, types can only make trivial guarantees.
Also, clever use of types might enforce something nicely, but they can still be really hard to work with. Error messages are often cryptic and don't describe anything semantic about the problem. A properly constructed unit test can easily tell you "You ended up returning a date that was set in the past compared to the expected outcome" which gives you much more information.
> The reason I use mocks is not primarily to avoid latency or setting things up, it's because it's the best way to test a permutation space in an isolated fashion.
So I think I've been talking past people on this point and not getting what people mean -- when I think of a "mock" I think of like a completely re-implemented piece of infrastructure. I am totally on board with the idea of mocking out individual methods and/or functions that indicate a certain point in the state permutation space.
What I was trying to disagree with was the idea that people should be maintaining mocks/approximations of pieces of simple infrastructure (ex. Redis) they could just spin up. So in my opinion, never write an in-memory cache component (with redis operations like get/set/...) for a local integration test when you can just run redis. If you want to just test how some piece functions when Cache<Redis>.set(...) fails, then that totally makes sense to just mock out (as in make .set() throw/return an exception/error or whatever).
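For the "just make .set() fail" case, a minimal sketch of what I mean, with hypothetical names throughout (`KeyValueCache`, `saveGreeting`, `withFailingSet`). The tiny stand-in cache is there only so the example runs by itself; in an actual test the wrapped object would be the real Redis-backed cache.

```typescript
interface KeyValueCache {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Code under test: a cache write failure must not fail the whole operation.
function saveGreeting(cache: KeyValueCache, user: string): { greeting: string; cached: boolean } {
  const greeting = `hello, ${user}`;
  try {
    cache.set(`greeting:${user}`, greeting);
    return { greeting, cached: true };
  } catch (err) {
    return { greeting, cached: false };
  }
}

// Wrap an existing cache and break only set(); get() still hits the real thing.
function withFailingSet(real: KeyValueCache): KeyValueCache {
  return {
    get: (key) => real.get(key),
    set: () => { throw new Error("simulated cache write failure"); },
  };
}

// Stand-in for the real cache, only so this sketch is self-contained.
function makeStandInCache(): KeyValueCache {
  const store: { [key: string]: string } = {};
  return {
    get: (key) => store[key],
    set: (key, value) => { store[key] = value; },
  };
}

const outcome = saveGreeting(withFailingSet(makeStandInCache()), "ada");
// outcome.cached is false, and the greeting is still produced
```

That's one method stubbed to hit one point in the state space, not a re-implementation of the cache.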
> Type systems are great and indeed many unit tests are a poor replacement for it, but for any moderately complex piece of logic, types can only make trivial guarantees.
> Also, clever use of types might enforce something nicely, but they can still be really hard to work with. Error messages are often cryptic and don't describe anything semantic about the problem. A properly constructed unit test can easily tell you "You ended up returning a date that was set in the past compared to the expected outcome" which gives you much more information.
Can I introduce you to my friends Haskell and Rust? It really depends on the type system, but to be fair there's also quite a productivity trough early on working with these languages and trying to really take advantage of the expressiveness of their type systems.
In the end you can't model out really complex logic -- but often you can definitely avoid writing the kind of things you would have checked for with unit tests, and some integration tests at least. Agreed on the second point, but the languages mentioned above have made great strides in that respect: Rust was well known from the start for its error message readability and Haskell is pretty decent at it, though the sticky situations you can get into in Haskell are an order of magnitude stickier.
IMO that date question is a really good/easy problem for generated tests -- never producing a date that is in the past is a property that you're trying to maintain, and randomizing input is likely to be more effective in doing the fuzzing for the inputs. To be fair, that's more complexity than a simple unit test would add (especially in terms of simply preventing regression).
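A hand-rolled sketch of what a generated test for that date property could look like. Everything here is an assumption for illustration: `nextReminder` is a made-up function under test, and the generator is a small seeded LCG so runs are reproducible (a property-testing library would normally handle generation and shrinking for you).

```typescript
// Small deterministic generator so a failing run can be reproduced from its seed.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

type ReminderFn = (now: Date, delayMinutes: number) => Date;

// Hypothetical function under test: negative delays are clamped to zero.
const nextReminder: ReminderFn = (now, delayMinutes) =>
  new Date(now.getTime() + Math.max(0, delayMinutes) * 60000);

// Property: the returned date is never earlier than `now`.
// Returns the first counterexample found, or null if none turned up.
function findCounterexample(
  fn: ReminderFn,
  runs: number,
  seed: number,
): { now: Date; delayMinutes: number } | null {
  const rand = lcg(seed);
  for (let i = 0; i < runs; i++) {
    const now = new Date(Math.floor(rand() * 4102444800000)); // roughly 1970 to 2100
    const delayMinutes = Math.floor(rand() * 20001) - 10000;  // -10000 to 10000
    if (fn(now, delayMinutes).getTime() < now.getTime()) {
      return { now, delayMinutes };
    }
  }
  return null;
}

const counterexample = findCounterexample(nextReminder, 1000, 0);
```

A version without the clamp gets caught as soon as a negative delay is generated, which is the kind of input a hand-written unit test tends to forget.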