Solar wrote:
We are not that far apart. I like me some well-performing code as well. That is why I'd rather use std::sort than trying to come up with something myself -- I know that sorting algorithm will have been massaged to within an inch of its life. I would probably not have come up with the byte tweaking that is short string optimization. And Boost.Asio's networking code will beat any of my attempts clean out of the water.
Perhaps you just have more confidence in your ability to create near-optimal code than I have.
I'm for whatever is the best solution for the problem at hand, given a reasonable amount of effort. When I have a well-known solution for the very same problem, I'm all for re-using it. Re-inventing the wheel is very good for educational purposes; for other purposes, it's better to use widespread and better-tested solutions. If I had to sort an array, I'd definitely use std::sort(), because I know its properties and I trust it. I also know that it's *not* the fastest algorithm in the average case, but that's not a defect: it's a kind of checked quicksort (introsort) which is guaranteed to never degenerate to quadratic complexity, because past a certain recursion depth it switches to a safer O(N*logN) sorting algorithm like heapsort. Plus it falls back to insertion sort for small ranges, etc. In other words, I know it's good and I know why it's good.
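To make the "checked quicksort" idea concrete, here's a rough sketch of introsort -- not any standard library's actual implementation, just the shape of the technique: quicksort as the workhorse, a heapsort bail-out once recursion gets suspiciously deep, and insertion sort for tiny ranges. The cutoffs (16 elements, 2*log2(N) depth) are typical illustrative choices, not mandated values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of introsort on a vector<int>. Sorts v[lo, hi).
void introsort(std::vector<int>& v, std::size_t lo, std::size_t hi, int depth) {
    const std::size_t n = hi - lo;
    if (n <= 16) {
        // Tiny range: insertion sort beats quicksort's overhead here.
        for (std::size_t i = lo + 1; i < hi; ++i)
            for (std::size_t j = i; j > lo && v[j] < v[j - 1]; --j)
                std::swap(v[j], v[j - 1]);
        return;
    }
    if (depth == 0) {
        // Recursion got too deep (adversarial input?): fall back to
        // heapsort, which is O(N*logN) no matter what.
        std::make_heap(v.begin() + static_cast<std::ptrdiff_t>(lo),
                       v.begin() + static_cast<std::ptrdiff_t>(hi));
        std::sort_heap(v.begin() + static_cast<std::ptrdiff_t>(lo),
                       v.begin() + static_cast<std::ptrdiff_t>(hi));
        return;
    }
    // Three-way partition around a middle pivot (plain choice, for brevity).
    const int pivot = v[lo + n / 2];
    auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
    auto last  = v.begin() + static_cast<std::ptrdiff_t>(hi);
    auto mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
    auto mid2 = std::partition(mid1,  last, [pivot](int x) { return x == pivot; });
    introsort(v, lo, static_cast<std::size_t>(mid1 - v.begin()), depth - 1);
    introsort(v, static_cast<std::size_t>(mid2 - v.begin()), hi, depth - 1);
}

void introsort(std::vector<int>& v) {
    if (v.size() > 1)
        introsort(v, 0, v.size(),
                  2 * static_cast<int>(std::log2(static_cast<double>(v.size()))));
}
```

The point of the depth counter is exactly the guarantee mentioned above: whatever pathological input you feed it, the worst case stays O(N*logN) instead of quicksort's quadratic blow-up.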
Most of the stuff in the STL is good and I'd rather not re-implement it in production code unless I have a really good reason to. But that doesn't mean I'll use it blindly. For example, if I had to sort a big array of numbers in the range 0-100, I'd measure std::sort() against a custom counting sort and ask myself: is a custom implementation worth it?
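For that specific 0-100 case, the custom candidate would look something like this (a minimal sketch; the function name and the choice of returning a new vector are mine): tally each value, then emit the values in order. O(N + K) with K = 101, no comparisons at all, which is why it can beat a general comparison sort on such data.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Counting sort for values known to lie in [0, 100].
std::vector<int> counting_sort_0_100(const std::vector<int>& input) {
    std::array<std::size_t, 101> counts{};  // zero-initialized tallies
    for (int v : input)
        ++counts[static_cast<std::size_t>(v)];

    // Rewrite the data in ascending order from the tallies.
    std::vector<int> sorted;
    sorted.reserve(input.size());
    for (int value = 0; value <= 100; ++value)
        sorted.insert(sorted.end(), counts[static_cast<std::size_t>(value)], value);
    return sorted;
}
```

Whether it actually wins over std::sort() on your data and your hardware is exactly the kind of thing you measure rather than assume.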
The same goes for linked lists: for most user-space code, I'm fine with just sticking to std::list<T>. But if I had to write a piece of core code with a ton of objects and plenty of linked lists per object, I'd seriously consider using some sort of intrusive linked list in order to skip the heap allocation for each list node. If an object belongs to 10 lists, std::list would need 1 + 10 allocations per object, and that's way too much.

Anyway, for user-space code I'm tolerant, to some degree, of inefficiencies if the benefits in terms of readability and maintenance costs are high enough. For kernel development, I'm not at all: there, I believe it's fine for many things to be custom. Kernels have to be super fast, because their overhead multiplies in the upper levels. My kernel is written in C, but even if I wanted to use C++, I'd never consider a model like std::list<T>, not only because of the overhead of the additional heap allocations, but because in many cases certain operations are simply expected to not fail. Therefore, there cannot be any risky actions, like trying to allocate memory, in those code paths.
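To illustrate the intrusive idea (a minimal sketch in C++; the names ListNode, IntrusiveList, and Widget are made up for the example, not from any particular kernel): the link pointers live inside the object itself, so putting an object on a list allocates nothing and therefore cannot fail.

```cpp
#include <cassert>

// The links are embedded in the object, not heap-allocated per node.
struct ListNode {
    ListNode* prev = nullptr;
    ListNode* next = nullptr;
};

struct IntrusiveList {
    ListNode head;  // sentinel: empty list points at itself
    IntrusiveList() { head.prev = head.next = &head; }

    void push_back(ListNode* n) {  // cannot fail: no allocation happens
        n->prev = head.prev;
        n->next = &head;
        head.prev->next = n;
        head.prev = n;
    }

    static void unlink(ListNode* n) {  // O(1), needs only the node itself
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->prev = n->next = nullptr;
    }

    bool empty() const { return head.next == &head; }
};

// An object on two lists pays for two embedded nodes -- zero extra
// allocations, versus 1 + 2 with a std::list-style design.
struct Widget {
    int id = 0;
    ListNode by_owner;  // linked into its owner's list
    ListNode by_state;  // linked into a per-state list
};
```

Getting from a ListNode back to its enclosing Widget takes an offsetof-style container_of computation, which is the usual price of the intrusive approach, along with the object now knowing about every list it can be on.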
Anyway, generally speaking, code is not slow because of std::list<T>, or because of std::sort() instead of some fancy custom solution. Code is slow because people are fine with copying tons of XML around. Code is slow because a ton of useless data flows between machines, and clients do 10 requests instead of one. Interfaces are designed poorly. Code is slow (and doesn't scale) because people grab locks and then do RPC calls while holding the lock, because "that's an easy solution". Code is slow because of perverse 20-level architectures designed by people who never wrote efficient code in the first place. Code is slow because relational databases are abused by people who don't even know SQL and consider it "low level" for their needs.
That's why I'm obsessed with performance.
If one feels guilty about copying a single string unnecessarily, can you imagine how she/he will feel about unnecessarily copying GBs of files, or about designing a whole system poorly? That's what I'm talking about: I care about performance all the time, but I decide how "cheap" to be depending on the context.
For example, in the past I wrote a whole 24/7 service in Python as a side project while working for one of my former employers. The service does mostly I/O, so being written in Python was fine; I didn't consider C++ for it, because that was not the cost-efficient solution for the problem. Still, I used decent algorithms there, though nothing rocket science. The best decision I made was to not listen to an engineer who advised me to just make the client send 5-10 MB files to the service and have all of their data processed there, because that was the simpler solution. What I did instead was extract only the necessary data (a few KBs, on average), send that to the service, get it processed, and re-assemble it back on the client. For the same service, another engineer suggested that the server "crunch" GBs of zip and tgz files, decompressing them and then looking for the right files, instead of the super-complicated set of rules I had to implement and maintain in order to choose exactly the right files in every scenario while dealing with a ton of legacy issues.
In the end, I won the battle in both cases, and thanks to those decisions the service, with just two VMs, scaled to meet the demands of hundreds of people plus several CI/CD services that came to rely on it. The simpler solutions would have cut the scalability by an order of magnitude or more.