Elements of Computer Programming
I've always thought that the basics of computer programming were rather simple. Let me see if I can convey something of that to the non-programmers who read this. The idea isn't to teach a useful skill in any actual language, but to convey the idea of what's going on, in concrete enough elements that you could think about how to do something with those elements, and thus demystify what the computer is doing.
    code:
    10  print "Grade Averaging Program"
    20  print "Enter the student's grades. Enter -1 when done."
    loop:
    30  grade <- getNumberFromKeyboard
    40  if grade < 0 then goto average:
    50  grades[numgrades] <- grade
    60  numgrades <- numgrades + 1
    70  goto loop:
    average:
    80  print "Printing grades:"
    90  n <- 0
    100 sumgrades <- 0
    110 print grades[n]
    120 sumgrades <- sumgrades + grades[n]
    130 n <- n + 1
    140 if n < numgrades then goto 110
    150 print "Average:" sumgrades/numgrades
    160 end
    data:
    200 n
    210 grade
    220 numgrades
    230 sumgrades
    240 grades

The language is one I made up -- an odd blend of BASIC and assembly -- but I hope that it's close enough to English, simple math notation, and intuitive symbols that it can be read and understood even without any experience. Odd bits: the numbers on the left are line numbers, labelling each line of code; they can also be thought of as memory addresses. The colon-ized words ('code:', 'loop:', 'average:') are aliases for the line numbers following them, so 'loop:' means line 30, and 'goto loop:' means "go to line 30". Use of the aliases is optional, as in line 140, which goes to line 110 without bothering with a label. '<-' is the assignment operation: the data on the right is stored in the variable (the memory address) on the left. In the data section, the words aren't instructions; they label pieces of memory. 'grades' is treated as an array, with grades[0] meaning line 240, grades[1] meaning line 250, grades[3] meaning line 270, and so on. The other labels could also be treated as arrays, e.g. n[2] would mean line 220. But the way this program is written, doing so wouldn't be very meaningful, while regarding 'grades' as an array of a student's grades fits with the rest of the program. Note that numgrades is never set before line 50 uses it; the program quietly assumes that data locations start out holding 0.
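To make the "labels are just addresses" idea concrete, here is a minimal sketch in Python. The table copies the data section of the listing; the names LABELS, LINE_STEP and address_of are made up for this illustration, and the spacing of 10 between locations is an assumption read off the line numbers.

```python
# Labels are only names for addresses; this table mirrors the data section.
LABELS = {"n": 200, "grade": 210, "numgrades": 220, "sumgrades": 230, "grades": 240}

# Assumption: consecutive memory locations are numbered 10 apart, as in the listing.
LINE_STEP = 10


def address_of(label: str, index: int = 0) -> int:
    """Return the line number (address) that label[index] refers to."""
    return LABELS[label] + index * LINE_STEP


print(address_of("grades", 0))  # 240
print(address_of("grades", 3))  # 270
print(address_of("n", 2))       # 220, the same location as numgrades
```

Indexing is nothing more than adding an offset to a starting address, which is why n[2] and numgrades end up naming the same location.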
The idea is that computer memory is a big linear array of numerically addressed locations, like mailboxes lining a street. Instructions, lines in a program, are just contents in mailboxes. So is data. One mailbox -- initially the first, line 10 -- is labelled as containing the next instruction. The computer -- the hardware, the CPU -- fetches an instruction from that mailbox, obeys it, then fetches an instruction from the sequentially next mailbox, unless the instruction was a 'goto', which changes the instruction mailbox; repeat. Referring to a mailbox ('n' in 'n + 1') reads its contents, except when the reference is on the left of '<-', which modifies the contents. The 'end' instruction tells the CPU to stop; otherwise it might keep on going and start trying to execute the data, which probably wouldn't work well. And yes, it would be possible for the program to modify its own instructions, simply by writing to memory locations before the CPU gets to executing them. This isn't a common practice, though.
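The fetch-and-obey cycle described above can be sketched as a toy machine in Python. This is only an illustration under my own assumptions: the instruction names ("set", "add", "print", "if_less_goto", "goto", "end") and the tuple encoding are invented here and are not the made-up language of the listing.

```python
def run(memory, start=10, step=10):
    """Fetch and execute instructions from `memory` until 'end' is reached."""
    pc = start                       # the mailbox labelled "next instruction"
    output = []
    while True:
        op, *args = memory[pc]       # fetch the instruction
        pc += step                   # by default, continue with the next mailbox
        if op == "set":              # mailbox[addr] <- value
            addr, value = args
            memory[addr] = value
        elif op == "add":            # mailbox[addr] <- mailbox[addr] + amount
            addr, amount = args
            memory[addr] = memory[addr] + amount
        elif op == "print":          # record the contents of mailbox[addr]
            (addr,) = args
            output.append(memory[addr])
        elif op == "if_less_goto":   # if mailbox[addr] < limit then goto target
            addr, limit, target = args
            if memory[addr] < limit:
                pc = target
        elif op == "goto":           # change the instruction mailbox
            (target,) = args
            pc = target
        elif op == "end":            # stop before running into the data
            return output
        else:
            raise ValueError(f"unknown instruction {op!r} at {pc - step}")


# Instructions and data live in the same memory, keyed by address.
memory = {
    10: ("set", 100, 0),               # n <- 0
    20: ("print", 100),                # print n
    30: ("add", 100, 1),               # n <- n + 1
    40: ("if_less_goto", 100, 3, 20),  # if n < 3 then goto 20
    50: ("end",),
    100: 0,                            # data: n
}

print(run(memory))  # [0, 1, 2]
```

Note that nothing but the 'end' instruction keeps the machine from fetching address 100 and trying to obey the number stored there.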
This isn't the simplest way to introduce programming, and you'll have to tell me if it's at all effective, but I'm trying this way because THIS IS IT. All of modern programming can be described with what's used above: a numerically addressed array of memory, a fetch-and-execute cycle of instructions stored in that memory, and an instruction set including basic math operations, comparisons with 'if' instructions branching on those comparisons, 'goto' instructions (or jumps), and special instructions for interacting with the outside world, in this case 'print' and 'getNumberFromKeyboard' (as opposed to 'getCurrentLocationOfMouse', say). All of Windows, or Photoshop, or some AI program written in a high-level language full of abstract concepts, gets compiled, translated, into these basic elements. (Actually, into even more basic elements, where words have to be treated on a character-by-character basis.)
It's not the most fun way to program, and any decent language will give you syntax for at least loops ("do 10 times", "do while this condition is true"), for creating functions (blocks of code you can invoke from anywhere), and for bigger data types, but you can build all of that out of the basic elements. You can, because we do -- we just usually write with the more convenient constructs and have a program (the compiler) do the translation automatically.
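For comparison, here is a rough sketch of the same grade-averaging logic in Python, using a function and loops instead of line numbers and gotos. Two things are my own assumptions: the grades arrive as a list rather than from the keyboard (so the example can run unattended), and an empty set of grades yields None instead of dividing by zero. The function name average_grades is made up.

```python
def average_grades(entries):
    """Collect grades up to the first negative entry; return (grades, average)."""
    grades = []
    for grade in entries:        # plays the role of lines 30 to 70, the read loop
        if grade < 0:            # a negative number means "done"
            break
        grades.append(grade)

    total = 0
    for grade in grades:         # plays the role of lines 90 to 140, the summing loop
        total += grade

    # Assumption: report None when no grades were entered.
    average = total / len(grades) if grades else None
    return grades, average


grades, average = average_grades([90, 80, 70, -1])
print(grades, average)  # [90, 80, 70] 80.0
```

The loop counters, the jump targets and the growing array are all still there; the language just keeps track of them so the programmer doesn't have to.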
As an example, let me describe the 'print' function the hard way.
    10    stringToPrint <- 1000
    20    returnAddressStack[returnPointer] <- 50
    30    returnPointer <- returnPointer + 1
    40    goto print:
    50    stringToPrint <- 1030
    ...
    print:
    300   currentChar <- 0
    310   if stringToPrint[currentChar] = NULL then goto return:
    320   printChar stringToPrint[currentChar]
    330   currentChar <- currentChar + 1
    340   goto 310
    return:
    350   returnPointer <- returnPointer - 1
    360   goto returnAddressStack[returnPointer]
    ...
    1000  "Grade Averaging Program", NULL
    1030  "Enter the student's grades. Enter -1 when done.", NULL
    ...
    10010 returnPointer
    10020 returnAddressStack
Here we can see that a procedure (another word for function) is just another block of instructions somewhere, which we can jump to and from. What makes it a procedure is that we store (in returnAddressStack) where we're jumping from, and use that at the end of the 'print' block to know where to jump back to. The reason we need a stack, and thus a returnPointer, is that procedures can call other procedures, or even themselves; if we used just a single returnAddress variable we could only be inside one procedure at a time, or else we'd lose track of where we should return to.
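A small Python sketch shows why a stack of return addresses is enough to handle procedures that call other procedures. As before, the tuple-encoded instructions ("call", "return", "print", "end") are invented for this illustration, and a Python list stands in for returnAddressStack and returnPointer together.

```python
def run(program, start=10, step=10):
    """Run a toy program with 'call' and 'return', tracking return addresses."""
    pc = start
    return_stack = []                # returnAddressStack and returnPointer in one
    output = []
    while True:
        op, *args = program[pc]
        if op == "call":
            return_stack.append(pc + step)  # remember where to come back to
            pc = args[0]                    # jump into the procedure
        elif op == "return":
            pc = return_stack.pop()         # jump back to the most recent caller
        elif op == "print":
            output.append(args[0])
            pc += step
        elif op == "end":
            return output
        else:
            raise ValueError(f"unknown instruction {op!r} at {pc}")


program = {
    10: ("call", 100),               # main calls outer
    20: ("print", "back in main"),
    30: ("end",),
    100: ("print", "in outer"),      # outer:
    110: ("call", 200),              # outer calls inner
    120: ("print", "back in outer"),
    130: ("return",),
    200: ("print", "in inner"),      # inner:
    210: ("return",),
}

print(run(program))
# ['in outer', 'in inner', 'back in outer', 'back in main']
```

While 'inner' is running, the stack holds two addresses (20 and 120). With only a single return-address variable, the call at line 110 would have overwritten 20, and 'outer' could never find its way back to main.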
This is such an important part of programming that most CPUs have special hardware instructions to do the dirty work of returnAddressStack and returnPointer for us. Assembly language programmers still have to load a procedure's arguments, e.g. 'stringToPrint', manually. Higher-level programming languages hand that job off to the compiler as well.
Hmm. In retrospect I'll be surprised if all this text works for what I wanted. Oh well...