OSDev.org

The Place to Start for Operating System Developers
It is currently Tue Mar 19, 2024 5:01 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 189 posts ]  Go to page Previous  1 ... 9, 10, 11, 12, 13  Next
Author Message
 Post subject: Re: Advice for novice programmers thread
PostPosted: Thu Jul 15, 2021 8:57 am 
Offline
Member
Member

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1590
vvaltchev wrote:
But let me remark that using placeholder nodes has a price as well:
We are talking about Java, right? If performance was the top priority, we would not be talking about Java.

Mind you, the same design is possible (using vtables/function pointers) in C, so you can have your NULL free linked list without a JVM.
Quote:
Don't get the wrong idea that I'm a sort of low-level "hacker" who doesn't care about type systems and semantics. I do care, but I don't wanna make compromises with performance as well. So, I tend to use more what is cheap, most of the time. Obviously, it depends on the project I'm working on.

In my career so far, I have learned to ignore performance until it can no longer be ignored. I have learned (the hard way) that programmers are very, very bad at predicting bottlenecks, that what you'd think would take forever can be over in a flash, and vice versa. So I will usually err on the side of readability, understandability, and stability. And if a few cycles need to be sacrificed for that, then who cares? So we may disagree there. In case of a container, I would first care about the interface of it before I ever cared about its implementation.

For example, containers are basically the use case for C++ templates. So if I was building my OS in C++, I'd write the containers as templates. And yes, it would mean that list<pci_device> and list<usb_device> would be different types, with different member functions, and I would have the same code multiple times in the .text segment. However, templates are objectively superior in the areas of readability and stability, over intrusive data structures. Therefore that would be my choice, if I was writing in C++. Since I am not, but rather am writing in C, the runner-up for this problem are intrusive structures, which are themselves easier to understand and maintain than data structures with external storage, or those godawful hacks based on the C preprocessor I have seen.

Also, small point of order I have to thank Solar for reminding me of: We are in the "advice for novice programmers thread". Keyword: novice. A novice car mechanic wouldn't rebuild an entire engine on the first day, a novice baker would not be expected to create a multi-layer wedding cake in the first week, and a novice programmer does not have to know how to build a linked list when there are libraries available.
vvaltchev wrote:
Yep, I know that position. You don't want to teach people a deep understanding of software. You want them to just use pre-built blocks without thinking. Alright.
Is knowing about "next" pointers really furthering your understanding of software? Aren't you zooming in too far? Software is mostly about the big picture, and the little details can be hand-waved for a very long time. In fact I have met several novice programmers who made precisely the mistake of zooming in too far, trying to understand a problem by tracing it with a debugger into the deepest layers of the system. They would have traced it into kernel mode if they could. But completely overlooked that the function being called is a well-known function, with documentation and everything, and loosing themselves in the minute details merely overloaded the mind and made them miss the actual problem (which was that the function was used wrong).

_________________
Carpe diem!


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Thu Jul 15, 2021 12:02 pm 
Offline
Member
Member
User avatar

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7612
Location: Germany
nullplan wrote:
...a novice programmer does not have to know how to build a linked list when there are libraries available.


I would go even further. From the very beginning, programming languages and libraries have been all about making things easier. C++ isn't even where it could be, yet. It pains me to see that a simple HTTP request in C++ is still orders of magnitude more complicated than the same operation in Java, that no proper database API has emerged for C++ yet, that it took until C++17 to get a proper file API into the standard, or that Unicode support is still provided by a somewhat poorly-ported Java API (although there is hope that std::text will make an appearance with C++23).

Yes, C++ is and always was good for the low-level stuff, and that won't go away. But unless we strive to make the high-level stuff even easier than the huge improvements since C++11 did, we're rowing a slowly sinking ship. Trying to teach kids 1970's C first "as a foundation" doesn't help.

_________________
Every good solution is obvious once you've found it.


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Thu Jul 15, 2021 1:32 pm 
Offline
Member
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 274
nullplan wrote:
Mind you, the same design is possible (using vtables/function pointers) in C, so you can have your NULL free linked list without a JVM.
Yes, I totally agree. At the price of making indirect calls all the time and creating additional objects, sure. I'm still wandering how many people would prefer that to a simple NULL-pointer check.

nullplan wrote:
I have learned to ignore performance until it can no longer be ignored
It all boils down to performance, I know. I'm sad to hear you too taking this position. Everybody is free to think whatever they want, just it's sad for me that nowadays most people have this attitude towards performance. Our software gets slower and slower every single day, despite computers getting faster. Software was faster when computers were slower. I was hoping that among kernel developers, I would find many more people considering performance as part of their core values as a developer. Maybe I was wrong, I don't know.

Here's what I think about the whole performance problem: https://vladonsoftware.wordpress.com/2019/09/24/the-always-measure-mindset/
Note: if it's against forum's rules to post links to blog posts written by ourselves (because of self-promotion etc.), let me know and I'll remove the link. Just, wanted to share my opinion in a structured way, instead spamming here with a long off-topic post.


[off-topic]
-----------------------
@Solar I totally appreciated that you didn't "heat up" at all this time. Actually, I was the one who started heating up a little bit. Thanks for not throwing more gasoline on the fire :-) No matter how irreconcilable our views on software might be, I care about not fighting with people at personal level. We still can learn a lot from each other despite coming from two distant galaxies :-)

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Thu Jul 15, 2021 1:50 pm 
Offline
Member
Member

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
vvaltchev wrote:
I'm sad to hear you too taking this position. Everybody is free to think whatever they want, just it's sad for me that nowadays most people have this attitude towards performance. Our software gets slower and slower every single day, despite computers getting faster. Software was faster when computers were slower. I was hoping that among kernel developers, I would find many more people considering performance as part of their core values as a developer. Maybe I was wrong, I don't know.

I agree. I am developing a microkernel, which is slower, but that is because No. 1 security and No. 2 because designing a microkernel would be more fun.

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Thu Jul 15, 2021 3:03 pm 
Offline
Member
Member
User avatar

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7612
Location: Germany
vvaltchev wrote:
We still can learn a lot from each other despite coming from two distant galaxies :-)


We are not that far apart. I like me some well-performing code as well. That is why I'd rather use std::sort than trying to come up with something myself -- I know that sorting algorithm will have been massaged to within an inch of it's life. I would probably not have come up with the byte tweaking that is short string optimization. And Boost.Asio's networking code will beat any of my attempts clean out of the water.

Perhaps you just have more confidence in your ability to create near-optimal code than I have.

_________________
Every good solution is obvious once you've found it.


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Thu Jul 15, 2021 4:42 pm 
Offline
Member
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 274
Solar wrote:
We are not that far apart. I like me some well-performing code as well. That is why I'd rather use std::sort than trying to come up with something myself -- I know that sorting algorithm will have been massaged to within an inch of it's life. I would probably not have come up with the byte tweaking that is short string optimization. And Boost.Asio's networking code will beat any of my attempts clean out of the water.

Perhaps you just have more confidence in your ability to create near-optimal code than I have.

I'm for what's the best solution for the current problem, given a reasonable amount of effort. When I have a well-known solution for the very same problem, I'm all for re-using it. Re-inventing the wheel is very good for educational purposes. For other purposes, it better to use widespread and better-tested solutions. If I had to sort an array, I'd definitively use std::sort(), because I know its properties and I trust it. I also know that's *not* the fastest algorithm in the average case, but that's not a defect: it's a kind of checked quicksort which is guaranteed to never degenerate to quadratic complexity because it would go for a safer O(N*logN) sorting algorithm like heapsort. Plus it falls back to insertion sort for small arrays etc. In other words, I know it's good and I know why it's good.

Most of the stuff in STL is good and I'd rather not re-implement it in production code, unless I really have a good reason to but.. that doesn't mean I'll use it blindly. For example, if I had to sort a big array of numbers in the range 0-100, I'd measure std::sort() vs. a custom counting sort. I'll ask myself the question: is it worth a custom implementation?

Also for linked lists: for most of user space code, I'm fine with just sticking with std::list<T>. But, if I have to write a piece of core code where I have a ton of objects and a plenty of linked lists per object, I'd seriously consider using some sort of intrustive linked lists in order to skip the heap allocation for each linked list node. If an object belongs to 10 lists, we'd need 1 + 10 allocations per object and that's way too much. Anyway, for user space code I'm kind of tolerant to some degree about inefficiencies if the benefits in terms of readability and maintenance costs are high enough, but for kernel development, I'm not at all. There, I believe it's fine many things to be custom. Kernels have to be super fast because their overhead multiply in the upper levels. My kernel is written in C, but even if I wanted to use C++, I'd never consider a model like std::list<T>, not only because of the overhead in additional heap allocations, but because it many cases, certain operations are simply expected to not fail. Therefore, there cannot be any risky actions like trying to allocate memory in those code paths.

Anyway, generally speaking code is not slow because of std::list<T> or because of std::sort() instead of some custom fancy solution. Code is slow because people are fine copying tons of XML around. Code is slow because a ton of useless data flows between machines. Clients do 10 requests, instead of one. Interfaces are designed poorly. Code is slow (and don't scale) because people grab locks and than do RPC calls while holding the lock, because "that's an easy solution". Code is slow because of perverse 20-level architectures designed by people who never wrote efficient code in the first place. Code is slow because relational databases are abused by people who don't even know SQL and consider it "low level" for their needs. That's why I'm obsessed with performance. If one feels guilty about coping a single string unnecessarily, can you imagine how she/he will feel about copying unnecessarily GBs of files or designing a whole system poorly? That's what I'm talking about. I care about performance all the time, but I just decide to be more or less "cheap", depending on the context.

For example, in the past I wrote a whole 24/7 service in Python as a side-project while working for one of my former employers. The service does mostly I/O, so being written in Python was fine. I didn't consider C++ for that, because it's was not the cost-efficient solution for that problem. Still there, I used decent algorithms, but nothing rocket science. The best decision I took was to not listen to an engineer who advised me to just make the client send 5-10 MB files to the service and get all of its data processed, because that was the simpler solution. What I did instead was to extract from there only the necessary data (a few KBs, on average), send it to the service, get it processed and re-assemble it back in the client. For the same service, another engineer suggested the server to "crunch" GBs of zip and tgz files by decompressing them and then looking for the right files instead of the super complicated set of rules I had to implement and maintain in order to choose exactly the right files in every scenario, dealing with a ton of legacy issues.

At the end, in both cases I won the battle and, thanks to those decisions, the service with just two VMs, scaled up meeting the demands of 100s of people plus several CI/CD services that started to rely on that. The simpler solutions would have decreased the scalability by an order of magnitude or more.

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Sep 14, 2021 6:36 am 
Offline

Joined: Tue Sep 14, 2021 6:24 am
Posts: 1
Struggle, but not too much.
Struggling a little bit when you are learning is really important so that you can solve the problem yourself and find an answer using resources that exist.

_________________
minotaurfightstore | minotaurfightstore


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Mon Nov 08, 2021 11:50 pm 
Offline
Member
Member

Joined: Sun Jun 23, 2019 5:36 pm
Posts: 618
Location: North Dakota, United States
I have to agree with Solar on this one. In my kernel, I have not once written a data structure. Not a one. Every data structure I've used is either something provided by the Rust standard library or one that someone else has already written because I know that any data structure I write is going to be far less efficient than one somebody else has written and that is most likely very heavily optimized. Knowing how something works is a great thing, but it also can be taken to the extreme, which is the fatal mistake colleges and universities teach. By the logic of "use C++ like C" you might as well say "lets write our own floating-point number parser and floating-point math library because reasons". Its an utterly ridiculous notion because, at the end of the day, when are you ever going to practically hand-write your own data structures? When are you ever going to hand-write your own sorting algorithm implementation? How does knowing how quicksort or bubble sort benefit you in the real world? How is hand-writing a lexer (not with anything like LLVM but manually using a 2-dimentional array, with each character being its own state) going to be useful when that isn't how real lexers are written (and I don't even think that's how they work given unicode and all)? (The method is solely restricted to the first 128 characters of the ASCII table -- it couldn't possibly be unicode aware... The number of states and transitions that would need is so high I wouldn't even know where to begin to calculate it.) How is taking that (non-unicode-aware) lexer and hand-writing a code generator (specific to x86 no less) going to benefit you in the practical world? (This last question is speculation, because I haven't yet taken the compiler writing course, but if the lexer/parser one is any indicator, well...)
I might be wrong on some of these things. I'm no professional programmer. But last time I checked, writing data structures and sorting algorithms was not something you needed to know unless you were doing something really, really unique, or really special that no already provided data structure or algorithm library was capable of doing. Most (all?) people who write parsers/lexers either use LLVM/GCC or parser combinator libraries, or PEGs, and rarely is one hand-written without good reason. Most people used boost/the STL for locks, semaphores, etc., and rarely used the low-level OS-specific APIs unless the APIs provided by the OS could do something the abstracted libraries couldn't (which to my knowledge is pretty rare).
Again: I might be wrong. And if I am, do feel free to correct me. But the above stuff is things that my professors taught me, and the vast majority of it is stuff that I know far better solutions for (e.g. I would never write a lexer using a 2D array and a hand-coded DFA, for example). Its nice theoretical knowledge, I guess, but I can only see it "practically" useful in academia. Sorry for the long rant. If I'm wrong on some of these things, that's okay, I admit to being no expert on these matters.
Edit: fixed a typo.


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Nov 09, 2021 4:43 am 
Offline

Joined: Sat Oct 10, 2020 4:05 pm
Posts: 14
Ethin wrote:
How is hand-writing a lexer (not with anything like LLVM but manually using a 2-dimentional array, with each character being its own state) going to be useful when that isn't how real lexers are written (and I don't even think that's how they work given unicode and all)? (The method is solely restricted to the first 128 characters of the ASCII table -- it couldn't possibly be unicode aware... The number of states and transitions that would need is so high I wouldn't even know where to begin to calculate it.) How is taking that (non-unicode-aware) lexer and hand-writing a code generator (specific to x86 no less) going to benefit you in the practical world? (This last question is speculation, because I haven't yet taken the compiler writing course, but if the lexer/parser one is any indicator, well...)

In regards to lexing and parsing, LLVM is a library for optimisation and code generation, and GCC (Gnu Compiler Collection) does have a backend you could hook into, similar to LLVM, but it is also not for generating scanners or parsers. For that, the tools I think of first are flex and bison, but there are hundreds written in different languages, targeting different platforms, and using different algorithms.
I would think that dealing with unicode should be pretty easy. Python requires that any unicode character used in an identifier must be a character in an actual language, but Swift allows any unicode character in identifiers (Yes, this includes things like '+').
For the simple case, I imagine you could extend the state machine to parse multi-byte UTF-8 characters in identifiers, and string parsing to pass over them so you don't accidentally end a string earlier than you should have - I don't need to care what unicode character it is so long as I can figure out how long it is. As to how python does it, I have no idea, but a quick google search indicated that there are some attributes of unicode characters that they use to determine if it is valid in an identifier, so it is probably not much more than the simple case to check that. In fact, you might not even need to extend the state machine, but instead just jump to identifier handling, and handle unicode there.
As to why you would roll your own lexer, I would mostly do it for speed (even if it is a tiny amount), but I haven't tested a generator against my state machine lexer yet, so take that with a pinch of salt. For some (very old) basic benchmarks and more information about handwritten state machine lexers, I'd recommend Sean Barrett's 'Some Strategies For Fast Lexical Analysis when Parsing Programming Languages'.
For parsing, it is incredibly difficult to write a state machine parser, but a handwritten recursive descent parser is actually not too hard.
Hand writing a code generator will 100% generate a massive speed boost in compiling, though. LLVM is incredibly slow, even with the amount of optimisation it does, and while slightly better, the GCC backend is still pretty slow. Whatever custom code generator you produce initially will not be optimising though, and a large amount of work would need to go into it to make any optimisations any good, so using a previous backend infrastructure is basically your only option if you just want to get something done quickly.

In regards to algorithms and data structures in general, you technically create a data structure every time you use the 'struct' keyword (thats just me being pedantic, don't take it too seriously), but for well-known general-purpose data structures, I agree that the only time you want to write your own is once you get into optimisation. This might be because you need to handle memory better, or perhaps you want to use a particular collision resolution technique in a hash table because of how you are using it and the standard libraries don't provide this sort of modularity. This is where knowing how things works becomes important - you must understand what your tools are doing and why in order to know when to make a new one. Before then, write simple code that does what you need and no more.


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Nov 09, 2021 5:02 am 
Offline
Member
Member
User avatar

Joined: Fri Sep 03, 2021 5:20 pm
Posts: 91
Doctor5555 wrote:
In regards to lexing and parsing, LLVM is a library for optimisation and code generation, and GCC (Gnu Compiler Collection) does have a backend you could hook into, similar to LLVM, but it is also not for generating scanners or parsers. For that, the tools I think of first are flex and bison, but there are hundreds written in different languages, targeting different platforms, and using different algorithms.


Instead of flex, I'd recommend RE2C. It's really powerful and easy to use. As for bison, I've been meaning to try out lemon instead. I had some experience with both flex and bison in the past and wasn't impressed.

_________________
Writing a bootloader in under 15 minutes: https://www.youtube.com/watch?v=0E0FKjvTA0M


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Nov 09, 2021 6:10 am 
Offline
Member
Member

Joined: Wed Mar 30, 2011 12:31 am
Posts: 676
Ethin wrote:
Again: I might be wrong. And if I am, do feel free to correct me.

Most major parsers are hand-written. GCC famously switched away from a Bison grammar to a hand-written parser over a decade ago.

My recommendation on reading material for parsers is... Bob Nystrom's Crafting Interpreters. My language, Kuroko, is based on it.

Ethin wrote:
When are you ever going to hand-write your own sorting algorithm implementation? How does knowing how quicksort or bubble sort benefit you in the real world?

Universities generally teach computer science. Software engineering, as a university curriculum, remains relatively rare. A software engineer may never need to know these things, but understanding algorithms like this is fundamental to the idea of computer science as a discipline. It is a form of mathematics and logic.


Random anecdote: My university compilers course used flex and bison, and I despised it because I spent far more time being annoyed with getting the tools to do what I wanted than I would have spent writing a parser by hand... a fact I wouldn't discover until over a decade later. A real shame...

_________________
toaruos on github | toaruos.org | gitlab | twitter | bim - a text editor


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Nov 09, 2021 12:11 pm 
Offline
Member
Member

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
While I agree with @Ethin that using somebody else's code may be more efficient, there are a lot of cases where hand writing it yourself is better. The reason why is because in some cases, you can optimize the code for your particular use case. For example, with parsers, you typically wouldn't be able to optimize a Bison parser for the use case you are aiming for. Bison isn't bad, I am using it a lot right now, it just isn't as fast as writing a well-tuned recursive descent parser.

Now, one thing I want to mention. It is an utter disaster to teach people how to write code without them understanding how it works internally. Teaching students how to use C's qsort function is needed (and more pratical then writing your own quicksort function obviously), but not teaching them how quicksort works is very bad. You see, nowadays, there are libraries for everything. This is fine, but when a programmer needs to solve their own problem that a library doesn't cover, they can't, and come up with an inefficient system! This is a big problem.

Also, the workforce makes this worse, as all they care about is money! People will use hundreds of libraries just to speed up development so they can make money. Then their users have a slow program. Money isn't the only problem; when developers have to solve a problem not covered by a library, they were undertrained in the first place and so they write poor algorithms!

Coding is hard. It took me 4 years before I could write semi-decent code. We shouldn't rush new developers. Instead, they should understand how their computer works so they can optimize their code accordingly.

Security is another can of worms I won't touch on here...

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Nov 09, 2021 3:08 pm 
Offline
Member
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 274
nexos wrote:
While I agree with @Ethin that using somebody else's code may be more efficient, there are a lot of cases where hand writing it yourself is better. The reason why is because in some cases, you can optimize the code for your particular use case. For example, with parsers, you typically wouldn't be able to optimize a Bison parser for the use case you are aiming for. Bison isn't bad, I am using it a lot right now, it just isn't as fast as writing a well-tuned recursive descent parser.

Now, one thing I want to mention. It is an utter disaster to teach people how to write code without them understanding how it works internally. Teaching students how to use C's qsort function is needed (and more pratical then writing your own quicksort function obviously), but not teaching them how quicksort works is very bad. You see, nowadays, there are libraries for everything. This is fine, but when a programmer needs to solve their own problem that a library doesn't cover, they can't, and come up with an inefficient system! This is a big problem.

Also, the workforce makes this worse, as all they care about is money! People will use hundreds of libraries just to speed up development so they can make money. Then their users have a slow program. Money isn't the only problem; when developers have to solve a problem not covered by a library, they were undertrained in the first place and so they write poor algorithms!

Coding is hard. It took me 4 years before I could write semi-decent code. We shouldn't rush new developers. Instead, they should understand how their computer works so they can optimize their code accordingly.

Security is another can of worms I won't touch on here...

+1


I'd add that having a good algorithms and data structures knowledge does not mean re-writing mainstream things like sorting algorithms, binary search trees, hash tables etc. from scratch: it means knowing very well how to use them and be prepared to customize in case of necessity. Also, while there are libraries and frameworks for a ton of things and 99% of the developers can "survive" by just assembling calls to those libraries with a bit of "glue code" and business logic, that's not how typically good software is written. Good software requires solving those nasty and complicated problems that most of developers wanna avoid. That does not mean re-implementing the well-known algorithms and re-inventing the wheel: it means writing something that previously didn't exist.

It's a common misconception that DSA knowledge is used only to implement well-known standard functions. In reality, every single trivial business-logic function still contains algorithms: just, people often don't see them. Every time we write a foreach loop, we're doing an O(n) operation over some container. It's super-easy to hit bottle-necks even in "simple" business-logic code where a few O(n) functions call each other, ending up in a O(n^3) cost. The software engineer who studied computer science should know that and re-engineer the whole component to increase the performance. Also, the typical real world problem is: why my software does not scale? Are you aware of the complexity of your code paths? Are you using the right locking primitives? Did you measure the lock contention? Are you using fine-grained locking? Lock-less algorithms? RCU? Do you realize that you're dealing with DAG, even if that's not mentioned anywhere? Did you notice that the problem you're trying to solve in O(n^2) can be reduced to a well-known problem for which there is fancy O(n*log(n) + loglog(n)) solution in a paper published 10 years ago? Did you consider using DP to avoid re-computing the overlapping solutions? There are no generic dynamic programming DSA in the standard libraries, because each DP problem requires a custom DP solution. Did you know how many cool things can be achieved by augmenting well-know data structures like binary search trees?

Advanced algorithms and data structures are involved in all of that, all the time. So, yeah, knowing basic DSA is mandatory, because in the real world, if you wanna do the cool stuff and get payed well, you're going to need waaay more than that, and that's true no matter if you wanna be a kernel developer or not.

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Nov 09, 2021 5:47 pm 
Offline
Member
Member

Joined: Sun Jun 23, 2019 5:36 pm
Posts: 618
Location: North Dakota, United States
I see a couple problems with this theorem about "knowing how things work under the hood" (especially for beginner C++ students):
1. Its completely unnecessary information. If I use std::sort, do you honestly think I give a damn about what algorithm it uses? For all I know, it could use CPU dispatching to figure out the fastest algorithm for the system its running on. Do you honestly think a beginner or intermediate programmer of C++ is going to care about that? No, no they won't, assuming they even know what CPU dispatch even means. (As it so happens, glibc functions like strncmp/strcmp uses CPU dispatching to dispatch execution to the fastest implementation of those functions for the running system, but is that something you need to know to use the function correctly? No, no you don't.) This kind of mindset is dangerous and can be easily taken to the extreme, and can result in you getting bogged down in superfluous details that don't even matter in anything like debugging. For newcomers to the language, they get the impression that the innards of a function are a necessity to understand, and so when bug-hunting they will examine those details, and it will take them exponentially longer to locate the actual bug because they'll see if the bug is in the implementation of strncmp or std::sort or insert-your-heavily-optimized-and-analysed-stdlib-function-here (which it most likely isn't).
2. Knowing the internals of a data structure doesn't necessarily help you. To know what the "best" data structure is for a given solution you also have to take into account CPU cache friendliness, memory fetches, etc., and that's definitely not something that a data structures/algorithms class is going to teach you (at least, mine didn't).
3. Security of a system is most commonly made poor by reinventing the wheel. When a newcomer to C++ learns to write C instead of C++ in C++, and are taught C++ the wrong way (don't use the STL, write your own implementations of practically everything), the possibility of a security vulnerability being caused by that individual is dramatically increased. If you use the STL for everything -- as you should be doing in modern C++ unless your doing something special -- the likelihood of you causing the most common security vulnerabilities (out-of-bounds reads, buffer overflows, stack overflows, null pointer dereferences, use-after-free, etc.) is reduced to almost zero. Some memory vulnerabilities (e.g. use-after-free) are entirely eliminated. Its still possible for these bugs to exist, but they certainly won't be in your end-user program or library.
The point that Solar and I are trying to get across is this: it is not necessary to know the internals of every possible thing you use. Using that logic, do you know the internals of your floating-point math library? Do you know the internals of that particular regular expression engine your using? I'm pretty sure that 80-90 percent of the time, you don't. And you don't need to. I don't need to know how std::vector works or how std::list is implemented down to the microscopic level to select the fastest one. That's because the implementation of either of those data structures is implementation-dependent and for all I know the implementation could change tomorrow. If when designing every end-user product or back-end service we worried about the internals, nothing would ever get done. A business and its software developers don't care about how the C++ STL is implemented or how that very fast hashtable they're using works; they care about getting something useful out of it, and so they do. If they cared about the internals of everything they used, they'd never produce anything because they'd be constantly fighting about what's better.
As one of my friends put it, things like std::vector/std::sort/std::move are foundational components. They have been as heavily optimized and tuned as you could possibly get. Anything you write is, 99.99999 percent of the time, going to be far less efficient than if you just used the library that's been optimized to the extreme and beyond by countless software developers. When you become an advanced C++ programmer and are diving into the lower-level areas of software programming like OS development, then these details matter, and knowing them does help a great deal, because in any other language other than C++ and Rust (and a few others) you do have to reinvent the wheel. But even in C++, you can most definitely use the STL in a bare metal target, you just have to implement the memory allocator and find an implementation that doesn't depend on the host OS (and I know there are quite a few of those out there). But as a software developer working on a back-end server to handle shipping and orders, for example, these details are unnecessary and are things that will only bog you down in unnecessary complexities.
Performance is not always about the data structures you use. The data structures have a pretty minor performance hit or gain in the big picture. All modern processors are fast enough that the majority of the time you won't even perceive a performance difference unless you profile the code (unless of course your using a data structure that was very badly written by someone who doesn't actually understand data structures which is, again, why I'm stressing the fact that, if at all possible, use existing implementations and don't write your own). What matters is how much data you put into them. If you put a billion items in an std::vector, then yeah, performance is probably gonna suffer because std::vector wasn't exactly designed to handle that much at once. But if your doing that, then you should be asking yourself why exactly your loading that much data into memory, and you should be worrying about far worse problems than how much data that std::vector has, like if the system your working on does similar (and very bad) practices of loading ridiculously large amounts of data at once instead of trying to locate only the data you actually need. If you focus on only the data you need, those billion items will drop most likely to maybe a few thousand, and std::vector is perfectly capable of handling datasets that large. At that point, the performance hit will, once again, be negligible.


Top
 Profile  
 
 Post subject: Re: Advice for novice programmers thread
PostPosted: Tue Nov 09, 2021 8:25 pm 
Offline
Member
Member

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
I'm not talking about implementation; I wouldn't make anyone look at that. What I mean is knowing the theory behind sorting.

Also, I would use STL any day instead of writing my own implementation of what it covers, but that doesn't mean a CS student shouldn't understand fundamental algorithms behind sorting, linked lists, memory management, etc. I am learning modern C++ through a course, and this course does it write. It looks at the internals of C++, going as far as explaining disassembly. After showing the C way of doing things, it shows the C++ way and says, "do it this way". It hasn't gotten to STL, but I'm sure a similar approach will be taken there.

Finally, the consensus these days is "modern CPUs are fast enough". That is trap, don't go there. Moore's law is ending, look for yourself! When I use an app, I get appalled at the speed most of the time. The reason? Because of layering. Nowadays, everyone uses Electron (which has a full Chromium installation per app!) or React, when there are much faster alternatives such as Xamarin. People make functions 5 or 10 lines each, and then subordinate to other functions. They create too many variables that aren't needed, make too many layers, depend on scripting languages too much, and who knows what else. The result? Windows XP has a more pleasant experience then Fedora with KDE has!

The fix is for people to understand algorithms, and hand write code where it helps, and then use superior libraries where they help. We need to focus on performance with (but not above or equal to) security, and not say "modern CPUs are fast enough".

Because of the above-mentioned excuse, programmers depend on stuff written by others that they don't understand. They go to Stack Overflow and copy-and-paste oodles of JavaScript, C++, and who-knows what else. And they don't even know that they are writing slow software. Why? Because they were told, "modern CPUs are fast enough".

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 189 posts ]  Go to page Previous  1 ... 9, 10, 11, 12, 13  Next

All times are UTC - 6 hours


Who is online

Users browsing this forum: Bing [Bot] and 9 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group