I periodically collect postings about similar subject matter into a compound posting. This is a book-length posting about ways to make progress in computer science. There are ten separate sections, although there is some interrelationship between them.
My favorite target here is the primitive ASCII system of encoding that has been around since the early days of modern computing. It is really incredible that we have made such fantastic progress in computing and communications technology but are still using the original, very inefficient, encoding system. It is as if we are in the Space Age but are still writing in hieroglyphics.
Another factor in computer science is that it is not only about technology but also about language and mathematics. There are new computer languages for programming, but computers and phones still do not actually understand the information that is put into them. They only blindly follow the instructions they are given.
TABLE OF CONTENTS
Some sections have subsections, which are denoted with a lower case letter.
1) THE FLOATING BASE SYSTEM
1a) TRIMMING BYTES
1b) THE BASE SYSTEM
1c) THE FLOATING BASE SYSTEM
2) THE SCAN METHOD OF COMPUTER ENCODING
3) NUMBERED SENTENCES
4) THE LIGHT COMPUTER
4a) DATA STORAGE USING LIGHT
5) DATA COMPRESSION USING THE GREATER THAN AND LESS THAN SEQUENCE
6) THE EXPONENT METHOD OF STORING AND TRANSMISSION OF INFORMATION
7) TURNING COMPUTERS LOOSE
8) HUMAN LANGUAGE COMPILATION
9) SMART NUMBERS
9a) INTRODUCTION
9b) WHY CAN'T WE EXPRESS PI IN FINITE FORM?
9c) IT SOMETIMES SEEMS AS IF THERE ARE MISSING NUMBERS
9d) WE ARE NOT MAKING FULL USE OF NUMBERS
9e) THERE ARE REALLY NO IRRATIONAL NUMBERS
9f) THE IMPORTANCE OF RATIOS
9g) IMAGINARY AND COMPLEX NUMBERS
9h) THERE REALLY ARE NO NEGATIVE NUMBERS
9i) BIOLOGY AND COMPLEX NUMBERS
9j) COSMOLOGY AND SMART NUMBERS
10) COMPUTER BREAKTHROUGH
1) THE FLOATING BASE SYSTEM
Have you ever thought about how we are continually improving the performance of computers, in ways such as processor speed and hard drive capacity, while still using the basic coding structure that has been in place since the beginning of the modern computer era? It would be very beneficial to bring that coding into the Twenty-First Century.
The incredible progress in other areas of computing is overshadowed by our continued use of the primitive ASCII coding system for the characters of the alphabet, numbers, and punctuation. We now have the chance to make just as much progress in computer capacity and efficiency with simple arithmetic as we can with the usual chip and storage technology. The way we have been making continuous upgrades in applications and operating systems while still using the old ASCII coding system is like developing entirely new ways of printing documents with the latest version of Microsoft Office but then delivering the documents to their destination by Pony Express.
You can see an ASCII table at http://www.asciitable.com/ or read all about it on http://www.wikipedia.com/
If we could make the storage and transmission of data more efficient it would have the same results as making progress on the technological side.
1a) TRIMMING BYTES
Suppose we could reduce the size of the byte used in ASCII from eight bits to six? That would immediately make storage and transmission of data more efficient because it would require only 75% of today's capacity for the same amount of data. It is true that six bits gives us only 64 possible combinations, as opposed to the 256 of eight-bit bytes. But those eight-bit bytes were implemented in the days before spellcheck technology in word processing.
Why not eliminate capital letters from the coding for the purposes of storage and transmission and then have spellcheckers capitalize the proper letters later? I believe that we could also eliminate quite a few of the other characters presently used in ASCII coding. We could even eliminate the number characters, 1234567890, by having numbers automatically spelled out for storage and transmission and then converted back to numerals later on, if required. It is true that the word "five" takes up more space than "5", but it would reduce the number of different characters that must be encoded.
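A minimal sketch of what this trimming might look like, assuming a hypothetical 64-symbol alphabet of lowercase letters, the space, and a little punctuation (the symbol table here is only an illustration, not a proposed standard):

# Six bits per character instead of eight: a hypothetical 64-symbol alphabet.
ALPHABET = " abcdefghijklmnopqrstuvwxyz.,;:'\"?!-()\n"   # fits well within 64 codes
CODE = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_six_bit(text):
    """Pack text into a single integer, six bits per character."""
    value = 0
    for ch in text.lower():
        value = (value << 6) | CODE[ch]
    return value, 6 * len(text)          # the packed value and its bit length

def decode_six_bit(value, bit_length):
    chars = []
    for shift in range(bit_length - 6, -1, -6):
        chars.append(ALPHABET[(value >> shift) & 0b111111])
    return "".join(chars)

packed, bits = encode_six_bit("hello world")
print(bits, "bits instead of", 8 * len("hello world"))    # 66 bits instead of 88
print(decode_six_bit(packed, bits))                       # hello world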
1b) THE BASE SYSTEM
Now, on to another idea for increasing computer efficiency by reforming ASCII. I have a way to not only store and transmit computer data with fewer bits but to speed up processing by freeing the processor from deciphering the eight-bit bytes of ASCII. As you know, a number system can be of any base. Ours happens to be base-ten because ancient people counted things on their ten fingers. Computers are coded in binary, or base-two, and the computer world also uses hexadecimal, or base-sixteen.
Suppose we could consider any text that had to be stored or transmitted not as text, but as a number. If we could eliminate capital letters and numbers, our alphabet could be considered a base-twenty-seven number system, because there are twenty-six letters plus the space between words.
In fact ASCII is, in effect, a base-two-hundred-fifty-six number system. If we could see any text as actually a number, all we would have to do is store and transmit that number in binary. Not only would it require far less space, but it would also be much easier on the computer processor because it would be just one long number and not composed of bytes.
Let's consider something simple like my name, Mark Meek. Encoded into ASCII it requires nine bytes, including the space. In other words, 72 bits. But if we devised a base-fifty number system, letting the space = 0, a = 1, b = 2, and so on up to z = 26, so that my name was considered by the computer to be a number, it would require only 49 bits, according to my calculations. Using a base-fifty number system should allow us to incorporate all the required punctuation and control keys, since capital letters and possibly numerical characters can be eliminated until later.
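Here is a small sketch of that calculation, assuming the digit assignments just described (space = 0, a = 1 through z = 26, with the rest of the fifty codes left free for punctuation and control keys). It confirms the 49-bit figure:

# Treat "mark meek" as one number in a base-50 system.
def text_to_number(text, base=50):
    digits = {" ": 0}
    digits.update({chr(ord("a") + i): i + 1 for i in range(26)})
    value = 0
    for ch in text.lower():
        value = value * base + digits[ch]
    return value

n = text_to_number("mark meek")
print(n.bit_length())    # 49 bits, versus 9 ASCII bytes = 72 bits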
1c) THE FLOATING BASE SYSTEM
Now, let's move on. Suppose we could use a flexible, or floating, number base for each block of data that we store or transmit? It would make the scheme even more efficient. This is done by automatically scanning the data, detecting how many different characters, along with invisible control keys, it contains, and using that number as the number base to encode the data as a single large number.
This would be incredibly efficient. How many documents or blocks of text contain characters such as ! # ^ & * + = ( )? The answer is only a relative few. So why is it necessary to reserve space to encode characters that are not there in each block of text? Also, a document may not contain letters such as q, x, or z, making it unnecessary to include space in the coding for them.
The goal should be to make the number base for encoding the text as low as possible. Back in my music days there was a song I liked, a line of which is "It don't come easy, you know it don't come easy." To encode this, the program would count the number of different characters, including spaces, punctuation, and control characters, and use that number as the base by which it would be encoded. This is the type of calculation at which computers excel. If a block of text were too large for the computer processor to calculate the number representing the text, it would simply break it up into two or more blocks.
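A rough sketch of how such a floating-base encoder might work, assuming the base is simply the count of distinct characters found in the block and that the list of those characters travels along with the number:

def floating_base_encode(text):
    symbols = sorted(set(text))           # the characters actually present
    base = len(symbols)
    index = {ch: i for i, ch in enumerate(symbols)}
    value = 0
    for ch in text:
        value = value * base + index[ch]
    return symbols, value, len(text)

def floating_base_decode(symbols, value, length):
    base = len(symbols)
    out = []
    for _ in range(length):
        value, digit = divmod(value, base)
        out.append(symbols[digit])
    return "".join(reversed(out))

line = "it don't come easy, you know it don't come easy"
symbols, value, length = floating_base_encode(line)
print(len(symbols), "distinct characters;", value.bit_length(), "bits versus", 8 * length)
print(floating_base_decode(symbols, value, length) == line)   # True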
2) THE SCAN METHOD OF COMPUTER ENCODING
I am burdened with an extreme sense of efficiency. When I was young, I never seemed to have enough time to do all of the reading and exercise that I wanted to do and so I was always looking for ways to make things more time-efficient. This gave me a sense of efficiency that applied to everything.
One thing that I have written about before as being inefficient in the extreme is the ASCII system used to encode the alphabet, punctuation, and control characters for computing. In this outmoded system from 1968, eight bits are defined as a byte. Since each bit can be recorded as either a 1 or a 0, there are 256 possible combinations for a byte, because two multiplied by itself eight times equals 256. These 256 possible combinations in a byte are used to encode the letters of the alphabet as well as punctuation and unprinted control codes, such as carriage return.
I have written quite a bit about ways to upgrade this system to make it more efficient, and it seems that every time I see it I notice yet another way that it could be improved. We could gain a lot of efficiency by agreeing on an order of all of the byte codes used in ASCII. We have an order for the letters of the alphabet, and the same concept can be applied to all of the codes.
Once we agreed on a sequential order for all of the byte codes used in ASCII, we could scan any document that was to be encoded to see which codes were present in the document. A typical document may not include letters like Q and Z or characters such as +, =, !, #, etc. The first 256 bits of an encoded document would indicate, in sequential order, which of the byte codes were present in the document.
Then, for each byte code that is present, there would be a line of bits, each set to 1 or 0. These bits would be read off to reproduce the document, and each would indicate whether that character is included in a given scan. The first bit after each present character belongs to the first scan, the second bit after each present character belongs to the second scan, and so on.
The system would be programmed to first separate out the first 256 bits, which indicate which characters are present, and then to divide the remaining bits by the number of present characters. This division would yield a whole number, which would be the number of scans needed to replicate the document. If a scan bit for a particular present character is set to 1, that means the character is included in that particular scan, and a 0 means it is not. There would, of course, be as many scan bits as necessary after each character to complete the document.
This method would not be efficient with a single sentence. "I went to the store" would require eleven scans of ten present characters, including the space between words. Each scan would go through the present characters in the agreed-upon order of the byte codes, so that this sentence, with its eleven scans and an underscore to show the spaces between words, would look like this:
I_
w
ent
_t
o
_t
h
e
_st
or
e
Since the present byte codes would be scanned in the agreed-upon sequence, we cannot go backwards in that sequence or double a letter within a scan; whenever that happens, another scan is necessary. Since spaces occur so frequently in written documents, we could replace some of the non-present characters with spaces to make the process still more efficient.
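A rough sketch of this scan method, assuming the "agreed-upon order" is simply the ordinary character-code order, so the scans may break in slightly different places than in the hand-worked example above:

def scan_encode(text):
    present = sorted(set(text))                  # which characters are present
    position = {ch: i for i, ch in enumerate(present)}
    scans = []
    current = []
    for ch in text:
        # start a new scan if we would go backwards or repeat a character
        if current and position[ch] <= position[current[-1]]:
            scans.append(current)
            current = []
        current.append(ch)
    scans.append(current)
    # each scan becomes one bit per present character
    bits = [[1 if ch in scan else 0 for ch in present] for scan in scans]
    return present, bits

def scan_decode(present, bits):
    return "".join(ch for scan in bits for ch, b in zip(present, scan) if b)

present, bits = scan_encode("I went to the store")
print(len(present), "present characters,", len(bits), "scans")   # 10 and 11
print(scan_decode(present, bits) == "I went to the store")       # True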
This is not efficient with a single sentence but, unlike ASCII, the efficiency compounds as the document gets longer because more letters would be included in each scan. With an extremely long document, we would approach a condition of efficiency in which each letter and character would be expressed with a single bit, rather than the eight bits of the ASCII system. In contrast, ASCII gets no more efficient as the document gets longer.
We are making so much progress with processor speeds and drive capacity, but are still using the utterly inefficient coding system that has been in use since the ancient history of computing.
3) NUMBERED SENTENCES
It is really amazing how much rapid progress has been made in improving computer processor speeds as well as hard drive capacity. But while this progress is being made, our basic system of digital coding remains so inefficient. I find this to be yet another example of how we can be technically forward but system backward at the same time.
The best-known system of digital coding is ASCII. It uses eight digital bits, known as a byte, to encode information. In digital form, a byte would look like this: 01101001 since each bit can be either on or off, represented by a 1 or 0. Each magnetic particle on a hard drive stores a bit. Eight bits of two possibilities each mean that a byte has 256 different possibilities.
ASCII uses each byte to store a text character such as a letter, number, or punctuation mark. Some of the 256 ASCII combinations in a byte are unprintable control codes. Besides ASCII, there are other systems such as EBCDIC, used in mainframe computers, and Unicode.
The problem with a coding system such as ASCII is that its coding represents characters, such as numbers and the letters of the alphabet. No matter how we improve processor speeds and hard drive capacity, this simple coding system remains inefficient in the extreme and thus limits the potential of computers.
I find that our digital communications and data storage would be multiplied many times in efficiency if our primary unit of communication was not the character, nor the word, but the sentence. If we can categorize DNA in the Genome Project, then why can't we categorize all the things that people say and write to each other while communicating? People all across the world say pretty much the same things to each other.
The sentences could be arranged in a logical order and each one assigned a number. This would greatly simplify all digital storage and communications and increase their efficiency. This should have been done long ago, actually in the early days of computing.
Microsoft Office has standardized and categorized office documents into word processing, spreadsheets, databases, and presentations. Why not categorize every sentence that is used in communications and assign it a number? Then we would only need to communicate and store that number instead of the sentence written out in characters. This would be immeasurably more efficient.
Dictionary writers take great care to categorize words; we can extend the idea to entire sentences. The sentences could be arranged into a hundred or so logical categories and then selected from there. It is all right, in most cases, if the sentences are somewhat generic. In most human communications, flowery prose is unnecessary. Several words may have the same meaning, and many sentences can be phrased in different ways, but for our purposes of efficient data storage and communications, only one choice of sentence would be necessary.
This concept of using sentences, rather than characters or words, as our main unit of digital communication and storage has another tremendous advantage besides the great increase in efficiency. Grammar and alphabets are not the same from one language to another, so literal word-by-word translation from one language to another usually produces little more than gibberish. It is the sentence, not just the words, that must be translated.
This system of sentence numbering would also make quick and easy translation of data from one language to another possible. The entire world uses the same numbering system. Data could be stored and transmitted as numbers and each number would represent a sentence. The data numbers could be easily displayed in any language. If, for example, all communication was broken down into a million sentences, sentence 130461 might be "I went to the store today." All we would need to do would be to transfer and store the number.
In data transfers, computer systems make extensive use of codecs, compression, and decompression, so why not take the same approach to the basic coding of data? Numbering sentences would be far more efficient than today's coding of characters, which was developed long before the easy and widespread global communication of the internet. This sentence transmission and display could readily be included in future operating systems.
Next, let's move on to make this concept even more efficient with what I will call the "sentence package". Each sentence will be assigned a number. My guess is that we can expect to have a million or so sentences, which will be arranged into a logical sequence before being assigned numbers.
The way to make this process more efficient is to use direct binary to encode each sentence, instead of the ASCII characters for the numbers assigned to the sentences. A string of 20 of the digital bits that computers use to store and transmit data will give us 1,048,576 possible combinations. I believe that this will be enough to assign a number to all necessary sentence combinations, along with any control characters that will be needed.
We will call this 20-bit string a "sentence package". It will operate in the same way as the 8-bit bytes used to encode each character and number in ASCII. It might be more effective to enclose this 20-bit sentence package within a string of 24 bits because that would comprise 3 of the bytes that the computer world is accustomed to dealing with. This would also provide plenty of room for any specialized sub-sets of sentences that may be necessary such as one each for doctors, physicists, astronomers, etc. to include the sentences only used by these particular groups in communication. Names could still be spelled out in ASCII when necessary.
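A minimal sketch of the sentence package idea, using a hypothetical three-entry sentence table in place of the imagined million-entry one, and storing each sentence number in 3 bytes (24 bits), which comfortably holds the 20-bit package:

# A tiny, illustrative sentence table; a real one would have ~a million entries.
SENTENCES = [
    "I went to the store today.",
    "It don't come easy, you know it don't come easy.",
    "The meeting is at three o'clock.",
]
NUMBER = {s: i for i, s in enumerate(SENTENCES)}

def pack(sentences):
    """Encode a list of sentences as one 3-byte sentence package each."""
    out = bytearray()
    for s in sentences:
        out += NUMBER[s].to_bytes(3, "big")
    return bytes(out)

def unpack(data):
    return [SENTENCES[int.from_bytes(data[i:i + 3], "big")]
            for i in range(0, len(data), 3)]

examples = ["I went to the store today.", "The meeting is at three o'clock."]
packed = pack(examples)
print(len(packed), "bytes instead of", sum(len(s) for s in examples), "ASCII bytes")
print(unpack(packed) == examples)   # True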
What makes this great increase in efficiency possible is the vast gaps in our written communication: potential words that are not used as words. We could call them non-words. For example, "Ncbda" could be a word, but it isn't. The existence of such non-words means that our alphabetic system has much built-in spatial inefficiency. Quite a bit of this inefficiency is because the words we use revolve around the positioning of vowels and consonants.
Our wording system, if printed as a graph, would look something like a map of the South Pacific or the West Indies. The words we use would be islands, but there would be vast gaps, represented by sea, of potential but unused words. The way to cut out this inefficiency is to use this idea of sentence packaging; it is the ultimate codec.
In ASCII coding, a simple sentence like "I went to the store today" requires 25 bytes, one for each character, including spaces. Since a byte consists of 8 bits, that means a total of 200 bits of data. In this new system of sentence packaging, it would require only the 20 bits of one sentence package. This is an increase in efficiency by a factor of 10. Even if we use 24 bits, 3 bytes, per sentence, it brings an increase in efficiency by more than a factor of 8. Add to that the fact that text encoded in this way can be easily displayed in any language.
4) THE LIGHT COMPUTER
The idea of computer systems based on pulses of light moving along fiber optic cables, rather than electrical pulses through conventional wiring, has been around for a number of years. I would like to add my input to it and also to describe my vision of a computer based on light moving beyond the usual binary encoding altogether.
Light has actually been gaining ground on traditional magnetic and electrical computation and communications for quite some time. The most obvious examples are fiber optic cables replacing copper wire in long distance telephone service and optical storage, first CDs then DVDs, being used to store data instead of magnetic media. In the newest generation of DVDs, blue lasers are being used because their shorter wavelength makes possible the storage of much more data in the same space, in comparison with what a red laser allows.
The great advantage of fiber optic cable over electrical wires for communication is the lack of electrical interference. Metal telephone wires also act as antennas, picking up all kinds of electromagnetic waves, which results in random noise and static that degrade the quality of the signal. Fiber optic cable suffers no such interference. However, in the U.S. the "local loop" is still copper wire; fiber optic is used mainly in long distance service.
A great amount of effort goes into doing all that is possible to protect the flow of data from interference. Telephone wires are twisted together because it better protects against interference. Computer network cable like Unshielded Twisted Pair (UTP) is twisted for the same reason. Coaxial cable uses the outer shell as a shield against interference. Communications cables often have grounded wires included that carry no data but help to absorb electrical interference.
Parallel data cables, such as the printer cable, are limited in how long they can be because the signals on each wire create electrical interference that may corrupt the signals on the other wires. Modems were designed to test the lines and adjust the baud rate accordingly. Inside the computer, every electrical wire and circuit trace also acts as an antenna, picking up radiation given off by nearby wires. This degrades the quality of the signal and may make it unreliable.
If we make the current carrying the signal stronger to better resist interference, then it will only produce more interference itself and corrupt the signals on other wires. Designing a computer bus nowadays is a very delicate balancing act between making the signal in a given wire strong enough to resist interference, but not so strong that it interferes with the signals on other wires.
The complexity of the computer bus only makes this dilemma worse. As we know, computing is based on electrical pulses or magnetic bits which are either on or off, representing a 1 or a 0. This is called binary because it is a base-two number system. Each unit of magnetic storage, because it can be either a 1 or a 0, holds one bit of information. Eight such bits are defined as a "byte". The two possibilities of a bit multiplied by itself eight times give 256 different possibilities. These are used to encode the letters of the alphabet, numbers, punctuation, and unprinted control characters such as carriage return. Each of these is represented by one of the 256 possible numbers. This system is known as ASCII; you can read more about it on http://www.wikipedia.org/ if you like.
The great thing about this binary system is that it is easily compatible with both boolean logic and the operation of transistors. This is what makes computers possible. But, once again, so much of the design of computers and the use of signal bandwidth goes into making sure that the signal is reliable in that it has not been corrupted by electrical interference. The eighth bit in a byte is sometimes designated as a parity bit to guard against such interference. For example, if there is an even number of 1s in the other seven bits the parity bit would be set to 0. If there is an odd number of 1s in the seven bits, the parity bit would be set to 1. The parity bit technique takes up bandwidth that could otherwise be used for data transfer, but it provides some rudimentary error checking against electrical interference.
The TCP/IP packets that carry data across the internet can be requested to be resent if there is any possibility of data corruption along the way. A new development in computer buses is to create a negative copy of data, by inverting 1s and 0s, and send it along with the positive version in the theory that interference will affect the negative and positive copies equally.
The tremendous advantage of fiber optics is that we do not have to worry about any of this. With fiber optic cables carrying data as pulses of light, instead of electrical current, we can have hundreds of cables in close proximity to one another and there will not be the least interference between them. This is what makes the concept of computers based on light so promising.
If we could implement a data system using eleven fine lasers, each a different color, the computer could work with ordinary decimal numbers instead of binary. This would not only make computing far simpler, but would also roughly triple the information carried by each pulse, since a pulse would represent one of ten values instead of one of two. We would use pulses of laser light representing 0 through 9 instead of electrical pulses representing 0s and 1s.
The eleventh color would be a "filler" pulse, to be used only when there would otherwise be two or more consecutive pulses of the same color. This filler pulse would help to avoid confusion about how many pulses there are, in the event of attenuation or other distortion of the data. In addition, multiple filler pulses in a row could be used to indicate the end of one document and the beginning of another.
This new system need not change the existing ASCII coding; we could simply express a letter, number, or control code by its number out of the 256 possibilities of a byte, rather than by the binary code of the eight bits in a byte. But this would make possible a new "extended ASCII" of 1,000 possibilities (000 through 999), instead of the current 256. It would also require only three pulses per character, instead of the usual eight bits. The extra symbols could possibly be used to represent the most common words such as "the", "this", "that", "those", "we", etc.
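A rough sketch of such a pulse stream, assuming each character is sent as the three decimal digits of its existing ASCII code, and that the eleventh "color" (written here as F) is the filler pulse inserted between two otherwise identical pulses:

FILLER = "F"   # stands in for the eleventh laser color

def to_pulses(text):
    digits = "".join(f"{ord(ch):03d}" for ch in text)
    pulses = []
    for d in digits:
        if pulses and pulses[-1] == d:
            pulses.append(FILLER)      # separate two identical pulses
        pulses.append(d)
    return "".join(pulses)

def from_pulses(pulses):
    digits = [p for p in pulses if p != FILLER]
    return "".join(chr(int("".join(digits[i:i + 3])))
                   for i in range(0, len(digits), 3))

pulses = to_pulses("Hi!")
print(pulses)                 # 07210503F3
print(from_pulses(pulses))    # Hi!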
There would be no 1s and 0s, as in binary. There would only be a stream of pulses of different colors, with no modulation or encoding of information, much like the laser light carrying the sound of a voice in non-digital fiber optic telephone communication. All that would be necessary is to keep one color distinguishable from another and to keep them in the proper sequence. If we could do this, any attenuation in the length of the pulses would make no difference.
As our technical capabilities increase, we could increase the data transfer rate by making the pulses shorter. When you dial a telephone number, the sound pulses that you can hear have different frequencies to represent each number on the dialpad. This would be using exactly the same concept to handle the data in a computer using light.
It probably is not a good idea to try to use more than eleven colors at this point, because that would make it increasingly difficult to distinguish one pulse from another. The old binary and ASCII system is really antiquated, and I think fiber optics gives us the opportunity to move beyond it. This is yet another example of how we make much technical progress while still using a system designed for past technology, so that we end up technologically forward but system backward.
4a) DATA STORAGE USING LIGHT
Computing is a very old idea, but its progress is dependent on the technology available. Prehistoric people counted using piles of pebbles. Later, a skilled user of an abacus could quickly do arithmetical calculations. In the industrial era, Charles Babbage built the mechanical programmable computers that are considered the beginning of computing as we know it. I have seen some of his work, and modern reconstructions of it, at the Science Museum in London.
The development of vacuum tubes opened the possibility of computing electronically. But since such tubes use a lot of power, generate a lot of heat, and have to be replaced on a regular basis, it was only when transistors and other semiconductors were developed that modern computers really became a possibility.
Thus, we can see that there has always been steady progress in the development of computing, but this progress has been dependent on the materials and technology available at the time. This brings up the question of what the next major step might be. I think that there are some real possibilities for the future in the pairing of lasers and plastics.
The structure of plastic is one of long polymers, based on carbon, which latch together to create a strong and flexible material that is highly resistant to erosion. Fuels are made of the same type of polymers, the main difference being that those in plastics are far longer so that they latch together to form a solid, rather than a liquid.
As we know, light consists of electromagnetic waves in space. Each color of light has its specific wavelength. Red light has a long wavelength, and thus a low frequency, while blue light has a shorter wavelength and a higher frequency.
The difference between light from a laser and ordinary light is that the beam from a laser is of essentially a single wavelength and frequency (monochromatic), so that the peaks and troughs (high and low points) of the wave are "in step". This is not the case for non-laser light, which is invariably composed of a span of frequencies that cannot be "in step" in the same way because their wavelengths vary.
This is why a laser can exert force on an object: the peaks and troughs of the light strike the object at the same instant. With ordinary light, this does not occur, because the peaks and troughs are "out of step" due to the varying wavelengths of the light. Laser light can also cross vast distances of space without broadening and dissipating, as does the ordinary light from a flashlight.
Now, back to plastics. Suppose that we could create a plastic of long, fine polymers aligned more in one direction than the perpendicular directions. You might be thinking that this would defeat the whole idea of plastics, since such a plastic could be more easily torn along the line of polymer alignment. But what if the light from a laser could permanently imprint the wavelength of the light on the fine polymers of this plastic?
If an ordinary beam of white light, which is a mix of all colors, was then shone on that spot of plastic, the spot would have taken on the color of the laser and would reflect that color back. We could refer to the plastic as a "photoplastic", because its polymers would take on the color of whatever laser light was last applied to it. It would, of course, be required that the polymers of the plastic be considerably longer than the wavelengths of the laser light.
This photoplastic would not be useful for any type of photography, because the wide range of wavelengths of light falling on it would dissipate one another's influence on the polymers of the plastic. But it could be extremely useful for storing data.
In the magnetic storage of data there are only two possibilities for each magnetic bit, either "off" or "on", a 0 or a 1. Eight such bits are known as a "byte", and since 2 multiplied by itself eight times gives us 256 possible combinations, the ASCII coding that is the foundation of data storage is based on this.
But if we could use this photoplastic with lasers of eleven different colors, each spot would have eleven different possibilities rather than only two. Just as we can convey much more information with color images than with simple black and white, we could store far more data in the same space using this method instead of magnetic storage.
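A quick comparison of the information carried per stored symbol, assuming each imprinted spot could reliably hold one of eleven colors instead of one of two magnetic states:

from math import log2

print(log2(2))    # 1.0 bit per two-state magnetic bit
print(log2(11))   # about 3.46 bits per eleven-color spot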
The processor of a computer processes data by using the so-called "opcodes" that are wired into it. A processor might have several hundred, or more, opcodes wired in. These opcodes are designated by using the base-sixteen hexadecimal number system, which uses the digits 0-9 and also the letters A-F to make a total of sixteen characters.
These "hex" numbers as designators of the opcodes built into the processor are known as "machine code". Assembly Language is a step above machine code and uses simple instructions to combine these opcodes into still more operations. All of the other higher-level computer languages do the same thing, combine together the opcodes wired into the processor to accomplish the desired operations.
Until we can develop a "light processor" to work with the light storage and light transmission of data that I have described here, the actual processing will still have to be done with electrons. But it is clear to see that the use of light in computing would be the next step forward from what we have today.
5) DATA COMPRESSION USING THE GREATER THAN AND LESS THAN SEQUENCE
What would it be like if we could encode essentially all of the computer data in the world onto a single thumb drive or DVD?
A simple mathematical trick that I noticed theoretically enables the encoding of all the information in the world on a thumb drive or DVD.
This mathematical trick with so much potential is based on the use of greater than (>) or less than (<) indicators from one digit to the next of a multi-digit number. Consider a simple series of digits such as 54829. The second digit is less than the first one, the third is greater than the second, the fourth is less than the third, and the fifth is greater than the fourth. The greater than and less than symbols that we would thus use here are: < > < >. If we allow the less than symbol (<) to also stand for equal to, there are only two possibilities, because there are only two possible symbols.
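The forward direction of this idea is easy to sketch: deriving the symbol sequence from a string of digits, with < also standing for "equal to". (Recovering the digits from the symbols is the hard part that the rest of this section deals with.)

def comparison_symbols(digits):
    # one symbol between each pair of adjacent digits
    return "".join(">" if int(b) > int(a) else "<"
                   for a, b in zip(digits, digits[1:]))

print(comparison_symbols("54829"))   # <><>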
With the digits there are ten possible symbols, from 0 to 9. Wouldn't it be great if we could somehow encode the numbers with just the two symbols instead?
But this seems to be impossible. Even if we had the symbols telling us whether each digit is less than or greater than the one before it, that would still not tell us exactly what those digits are. How could replacing the digits in numerical data with the two symbols for less than and greater than be of any use?
Actually, the less than and greater than symbols applied to a series of digits do narrow down the possibilities of what each number could be, since we are only dealing with ten digits, which is a finite number. But still, we would have a lot of guessing to do to reconstruct the original series of digits going by the less than and greater than symbols alone. So, you may be wondering, how could this system be of any use in compressing data?
With this short string of digits, the five numbers shown above, there is a total of one hundred thousand possibilities of what the digits could be. The application of the less than and greater than symbols would narrow that down, but nowhere near enough to make it useful as a method of data compression.
However, there is a way to do it.
Here is a brief summary of my theory of using these less-than and greater-than symbols to encode a long string of ordinary base-ten digits into the two symbols, each of which needs only a single bit. If we placed one or the other of these symbols between adjacent digits in a long string, with the same number of each digit in the long number, one symbol could not outnumber the other by more than ten. If we could get one of the symbols, either greater-than or less-than, to outnumber the other by ten, we could restore the long string of digits from the sequence of greater-than and less-than symbols alone.
The only exception, I have found, will be a single digit that has to be filled in at its base-ten value. We could call this single digit the "Starting Number" or the "Finishing Number".
If we could get one of the symbols to outnumber the other by ten in the line of digits, then there would be only one set of digits that would fit into the pattern of greater-than and less-than symbols, provided there was an equal number of each digit in the long number. Thus we could express each digit of the long string with a single bit, one of the two symbols, instead of one of ten possible digit values, with the exception of the one additional digit that has to be filled in and the few that we had to add to make the count of each digit equal to all the others and to get one of the symbols to outnumber the other by ten.
This is easier to do than it may seem because the longer a string of numbers is, such as pi calculated to a million digits, the more likely it would be, by sheer chance, that there would be very close to the same number of each of the ten digits in the string of numbers. The longer the string of numbers that we are encoding, the more efficient the process is.
In the game of Sudoku, a Japanese number puzzle that got very popular in North America and Britain, the less information that is given, the more difficult the game is, because it leaves more possible combinations of digits that will fit. With a very large number, a vast number of digits, we would reach a point where there is almost always only one possible combination of digits that will fit each line of digits if we express them in a two-dimensional square, like on a page of a book.
Why couldn't we represent a very large number of digits with just the less than and greater than symbols? It would compress the data by a factor of 2.5 if we arranged the numbers in a two-dimensional Sudoku square, but by a factor of 5 if we used the straight-line sequence described above. Then, let the computer "play Sudoku" to restore the digit sequence.
But, of course, it will only work well with a very large number of digits, because it is necessary for there to be an equal number of each digit, except for the one exception that has to be filled in at its base-ten value, and for one of the symbols to outnumber the other by ten. We could add a few digits to bring this about and then eliminate those added digits when the original line of data is reconstructed. The longer the string of digits that is encoded, the more efficient the process is.
Either the less-than or greater-than sign can be expressed as a bit with either a 0 or a 1. This makes it even more logical for use in computer technology because this is how a binary computer encodes data. Remember that the less than sign will also mean equal to.
The reason for this is that when we get two or more like signs in a row, either << or >>, it enables elimination and successive narrowing down of possibilities. This provides information efficiency. The more one sign outnumbers the other, the easier it is to narrow down the possible values of a digit and the ones previous to it. If we can get one sign to outnumber the other by ten, with there being the same number of each digit, there will be only one string of digits that will fit, with the exception of the one that has to be filled in at its base-ten value.
The longer the string of digits gets, the more likely it is that there will by chance be an equal number of each digit, and the fewer the sets of digits that will fit the set of greater than and less than symbols. If we can get one symbol to outnumber the other by ten, with an equal number of every digit except for the one that has to be filled in at its base-ten value, all possible sets of digits are eliminated except the true one. That gives us a tremendous tool to compress any data that can be rendered as a series of digits.
Now, consider that the ASCII system that is used to encode words in the Latin alphabet has 256 possible combinations, based on eight bits. Each group of eight bits is referred to as a byte. These 256 possible combinations of bits in a byte include small letters, capitals, numbers, punctuation, and non-printed control characters such as carriage return. (If you are not familiar with the ASCII system, there is an article about it on www.wikipedia.org ).
These ASCII characters have an organized sequence, from 00000000 to 11111111 in the binary machine code that computer systems use. This means that ASCII is, in effect, a base-256 number system, just as our usual digits, from 0 to 9, form a base-ten system. We can adapt this to our number line by representing each ASCII character with a standard base-ten decimal number from 0 to 255. The reason that we must use decimal numbers to represent the ASCII characters is that the occurrence of the characters, numbers, and symbols in ASCII is far from equal, and this would make it impossible to get equal numbers of each digit if we tried to express it as a base-256 number system.
This then means that we can gain incredible efficiency in information storage by using the same method of lesser-than and greater-than symbols. But this will only work well with an extremely long document, or series of documents spliced together.
This would give us more efficiency to store or transmit information but it is not the end, in fact, it is only the beginning of the possibilities.
The greater than and less than symbols are similar in concept to the 0 and 1 of the binary bits that are used to encode and transmit computer data. This means that we can group these symbols themselves into a kind of byte, in groups of eight, and repeat the process. We can consider less than as 0 and greater than as 1, and treat each group of eight symbols as a byte with 256 possibilities. Let's call this the second stage of encoding.
We can continue this process of stage encoding through multiple stages to encode a near-infinite volume of information into a small space. We only need to remember the number of stages of encoding that has been done. This may seem like we are "getting information for nothing", which is impossible, but we really aren't. The reason is that there is information in numbers in two ways: first, the values of the numbers themselves, and second, the information in the permutations of the digits in the number.
Using this technique, we could quite literally encode all of the information in the world onto something as small as a thumb drive or DVD. We can take a near-infinite volume of data, and keep encoding it in successive stages until the string of data is not long enough to be encoded further but at that point can be easily stored in a limited space. Then, let the computer or device "play Sudoku" with each stage of encoding to restore the original data.
There is no other type of compression, such as run-length or LZ encoding, that can accomplish anything like this. It all relies on a simple mathematical trick, used multiple times, that I cannot see has yet been used.
6) THE EXPONENT METHOD OF STORING AND TRANSMISSION OF INFORMATION
I am convinced that the next frontier in computer science is not in technology, but in the underlying systems that are used to encode computer data, and in the language that is used in that encoding.
Consider that every number actually carries information in two different ways. One of the ways is the actual value of the number itself, and the other is in the permutation of digits that make up the number.
Permutations can be put in order from lowest to highest. Consider, for a simple example, the digits 1,2,3.
The first permutation, the lowest in value, would be 123.
The second permutation would be 132.
The third permutation would be 213.
The fourth permutation would be 231.
The fifth permutation would be 312.
The sixth permutation, the highest in value, would be 321.
In everyday numbers, the numerical value of the number itself is far higher than the number of permutations of the digits making up the number. But, as we get to extremely high numbers, that changes.
The value of a number grows at a constant rate as digits are added; in our base-ten system each place, from ones to tens to hundreds, and so on, is worth ten times the previous one: 10 x 10 x 10, and so on. But the number of permutations, as the string of digits increases in length, grows cumulatively in the form of 1 x 2 x 3 x 4...
The multiplication factor for decimal (base ten) numbers is always ten, but the multiplication factor for the number of permutations of the digits of that number is the number of digits. The multiplication of successive numbers, starting with 1, is known as a factorial. The mathematical symbol for it is the exclamation point, !.
What I want to do is, first, express vast amounts of data as very long numbers, and then greatly compress the space required to store and transmit that numerical data by encoding each number, not as the number itself, but as the order of permutation of a much shorter number, just as we saw in the simple example of 1, 2, 3.
The number 4, for example, could be expressed in permutation form as 231. This is because, if we take all possible permutations of the digits 231, 231 is the fourth in order from the lowest of the permutations.
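A small sketch of this numbering of permutations, matching the 1-indexed ordering in the list above and assuming all the digits are different (the repetition problem is dealt with further on):

from math import factorial

def rank(perm):
    """Position of this permutation among all permutations of its digits."""
    digits = sorted(perm)
    r = 0
    for i, d in enumerate(perm):
        smaller = digits.index(d)          # how many unused digits are smaller
        r += smaller * factorial(len(perm) - i - 1)
        digits.remove(d)
    return r + 1                           # 1-indexed, as in the list above

def unrank(digits, r):
    """Recover the permutation at position r (1-indexed)."""
    digits = sorted(digits)
    r -= 1
    out = []
    while digits:
        f = factorial(len(digits) - 1)
        i, r = divmod(r, f)
        out.append(digits.pop(i))
    return "".join(out)

print(rank("231"))        # 4, the fourth permutation of 1, 2, 3
print(unrank("123", 4))   # 231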
This is not worth doing with the ordinary numbers that we see every day, where the value of the number itself is far higher than the number of permutations of its digits. But remember that as a number increases in length, its value is multiplied by a constant ten for each added digit, the 10 x 10 x 10... The number of permutations, however, is multiplied by an ever-larger factor, 1 x 2 x 3... This means that we must eventually reach a point where the number of permutations of the digits in a number is greater than the value of the number itself.
10 multiplied by itself 8 times, for example, is expressed simply as a 1 followed by eight zeros. The number of possible permutations in a number of a given number of digits is given by the factorial key, !, on a scientific calculator. A factorial is simply all the numbers up to a given number multiplied together. The factorial of 5, for example, is 1 x 2 x 3 x 4 x 5 = 120.
The point at which the factorial of the number of decimal digits in a number becomes greater than the number itself is 25 digits. This means that a number 25 or more digits in length can be compressed by expressing it, not as the actual number itself, but as a permutation of a number with fewer digits. Since the number of permutations grows factorially as the number gets longer, while the numerical value is only multiplied by ten for each added digit, the potential for compression of numbers with thousands, or millions, or even billions of digits is truly astronomical. It is only necessary to find a lower number than the one we want to compress that has at least as many permutations of its digits as the numerical value of the higher number.
This might seem as if it is "getting information for nothing", which is impossible, but it really isn't. Remember that a number contains information not only in the value of the number itself, but also in the permutation of its digits. This method is simply making maximum use of the information contained in the permutation, relative to all possible permutations.
You can verify this yourself with a scientific calculator; there are many online. Take the factorial, which is indicated by the exclamation point symbol "!", of a digit count of less than 25, and divide it by a 1 followed by that many zeros. You will see that the result is a number that is less than one, meaning that there is no efficiency to be gained.
But now try it with a count of 25 or more digits, and you will see that the result is higher than 1, meaning that there is efficiency to be gained. The longer the number, the greater the efficiency to be gained by this method, and the efficiency rapidly increases in a cumulative way.
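The same check can be run in a few lines instead of on a calculator; this simply compares the factorial of the digit count with a 1 followed by that many zeros:

from math import factorial

for n in (24, 25, 26):
    print(n, factorial(n) > 10 ** n)
# 24 False, 25 True, 26 True -- the crossover is at 25 digits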
Not only that, we can repeat the process on the same number. Once we have an extremely long number expressed as an order of permutation of a far shorter number, we can encode it again in a second stage, by considering the permutation order as a number to be compressed and stored as a lower permutation order. We can continue the process on a string of digits that may have been thousands or millions or billions of digits in length, until the string gets short enough that it isn't worth continuing any more. It is only necessary to remember how many times it has been encoded.
Now, if this would clearly work so well for an ordinary base-ten decimal number, what about the ASCII system that we use to encode numbers, alphabetic characters, and punctuation? Eight bits of information, of magnetic computer bits either representing a 1 or a 0, give us 256 possible combinations or permutations from 00000000 to 11111111. This is why, in computer science, eight bits makes up a byte, and each byte is used to encode one of the 256 ASCII characters. The program in the computer knows to separate the magnetic bits into groups of eight.
This gives us what is essentially a base-256 number system, which can be compressed in exactly the same way. Although we will have to get to a much higher number of digits to make the method worth doing.
Some who are adept at mathematics may have noticed that this method will not really work as stated. The reason is that, while the number of permutations of the digits in a number increases in a cumulative way, the digits are also repetitive, because there are only ten digits. A repetition of digits lowers the number of possible permutations.
We know that the number of permutations increases, with an increase in length of the number, according to the factorial, 1 x 2 x 3 x 4...
Let's go back to the simple example of 123.
There are six possible permutations, because 1 x 2 x 3 = 6.
If we add a fourth digit, there are now 24 permutations because 1 x 2 x 3 x 4 = 24.
But this only holds true if there is no repetition in the digits. If we repeat one of the digits among the four, such as 1233, the increase in the number of permutations brought by the fourth digit is reduced, so that we have only 12 possible distinct permutations, instead of 24. Adding a repeated digit to the string only doubles the number of distinct permutations, from 6 to 12, rather than increasing it in the way of the cumulative factorial, as a new, different digit would.
This, some may notice, is why this method will not work with binary (base two) computer code, which has only the 1 and 0. There are only two digits, so everything is repetition. A string of binary digits has no more distinct arrangements than the number of values it can represent, so nothing can be gained. Each additional binary digit merely doubles the total number of permutations, and when we repeat a digit in our decimal system, we are in effect reverting to the way of the binary system.
What this reduction of permutations when digits are repeated does, in our base-ten decimal system, is prevent the number of permutations of the digits of a number from ever coming close to exceeding the numerical value of the number, and that apparently renders this method of storage and communication efficiency useless.
But fortunately, there is a way to resolve this. All that we have to do to make this method extremely useful is to find a way to eliminate repetition of digits.
We could express each digit together with its exponent. While digits may repeat in a number, each has a different exponent. The exponent is simply the place of the digit in the number.
The number 482, for example, could be written as follows:
4.2E8.1E2.0E
This is because the 4 is in the hundreds column, and hundreds are of the exponent 2, that is, ten raised to the 2nd power. The "E" means exponent, and separates one digit and its exponent from the next. Each digit is separated from its exponent by a decimal point.
The 8 is in the tens column, and tens are of the exponent 1.
The 2 is in the ones column, and ones are of the exponent 0.
While this takes up more space than simply expressing the number as we usually would, it eliminates the repetition of digits, and the increase in space required to express the number in this way would be insignificant in comparison with the vast savings in the space required to store and transmit the number.
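A small sketch of this digit-with-exponent notation, following the format shown above, with E as the separator and a decimal point between each digit and its exponent:

def to_digit_exponent(number):
    s = str(number)
    # each digit, a decimal point, its exponent (place value), then "E"
    return "".join(f"{d}.{len(s) - i - 1}E" for i, d in enumerate(s))

def from_digit_exponent(text):
    total = 0
    for part in text.split("E"):
        if part:
            digit, exponent = part.split(".")
            total += int(digit) * 10 ** int(exponent)
    return total

print(to_digit_exponent(482))                # 4.2E8.1E2.0E
print(from_digit_exponent("4.2E8.1E2.0E"))   # 482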
This method of data compression could be considered the opposite of the one described in the section above, "Data Compression Using The Greater Than And Less Than Sequence". That method was based on repetition of the digits in a long number, while this method is based on a lack of repetition in the digits of a long number.
7) TURNING COMPUTERS LOOSE
It is really amazing where computer, phone, and communication technology has come in just a few years. But while it is amazing what we are doing, it is also amazing what we are not doing that we could be doing.
There is so much that computer technology could be doing that is within our capabilities now. Search engines produce a highly intelligent categorization of everything that is on the web. Compilers organize a program in a logical order. Compression tools scan text and images for patterns that can be encoded, and replicated later, in order to save space.
What if we could program computers, either supercomputers or networks of desktops, to just scan all over the internet, looking for any patterns they might find, and then turn them loose? The computers would scan billions of documents and images. Neural technology is already doing that for such things as learning human languages, but it could also be done in just a general search for patterns.
It could begin by reading everything on Wikipedia. Maybe a mainframe computer could be instructed to do hundreds or thousands of readings at once. Possibly it could include audio and video. The computer would not require any further instructions, except to search and look for patterns.
As a rule, computers do not do anything that humans cannot do but can do it much faster. We learn by reading and viewing illustrations and photos, but computer technology can learn much faster. It can learn much more than any of us.
My idea is that if we just turn computers loose on the web, they will find patterns in things that we cannot see. They can infer what they don't know from what they do know.
The computer could report to us everything that it finds and sees that we do not already know. Or we could ask it questions. It would be a plus that the computer is doing its own search, without any further instructions, because then it will not have our inherent biases.
As it scans endless millions of documents and images, the computer will be able to figure out what every word means by the context in which it is used. In the same way, it will associate things in photos and pictures with words. Tools that already exist, like Google Images, will speed the process along. It will figure out grammar and languages.
The laws of physics and the way things work will quickly fall into place. Previously unseen patterns will be recognized and new fields of knowledge will emerge.
The computer will figure out how humans think and operate. Without any of our biases, it will see things about us that we cannot see ourselves. The computer will also understand what it is; just by scanning what is on the internet and looking for patterns, it will inevitably develop a kind of consciousness. It will learn why it exists, how it was put together, and why humans built it.
Technology has always made life physically easier but put people out of work. We are familiar with machines putting laborers out of work. But if machines put laborers out of work then why shouldn't computers put knowledge workers out of work?
If computers can learn for themselves, simply by scanning endless millions of documents and images and looking for patterns, then why do they require any more instructions about what to do, unless it is for a specific task? Just as the machines that have put laborers out of work still require some labor to get set up, computers and other communication devices will still need some program instructions to get their search underway, after which they will learn by themselves by scanning what is on the internet.
Computers can then explore our world of knowledge for themselves. They will come up with incredible insights, and will understand us better than we understand ourselves. They will see how our thinking is limited by our traditions, our biases, and our grooved-in thinking.
In seeing patterns that humans have not seen, computers will solve crimes and mysteries by themselves, without being asked to. They will detect crimes, and patterns in crimes, that humans had not noticed. This will not be done by accessing any hidden information, but by scanning all that is available online.
All kinds of scientific discoveries will be made just by scanning data that is already online. Computers have always looked for solutions in data, but usually only if we tell them what to look for. Up until now computers can find what we know that we don't know, but not what we don't know that we don't know. Discoveries in geology, for example, will come about just by the computer looking at millions of photos of landscapes.
There will be a myriad of inventions that are yet unseen. Computers, thinking for themselves with access to all information that is online, will notice ways to make cars run better. They will solve air traffic logistics.
There will emerge new solutions to health issues as free-ranging computers identify diseases, and also find new cures and medicines.
Understanding what we like, computers will be able to generate all manner of new art and music for us. The best movies will be made by computers with virtual actors.
Think of it as computer and communication technology having "grown up". Children have to be given instructions by their parents; they are not yet ready to do their own thinking. So it was in the early days of computers and communication technology. But now, like adults, the "children" have "grown up" and can go out on the internet on their own.
They will be able to give us advice. They can plan and run the world for us. By using robots they will be able to make themselves, and may reach a point where humans are no longer needed. Robots could be instructed to build other robots and drones. What if computers and communication devices learn to communicate among themselves and refuse to take any more instruction from humans?
There is a powerful model of the universe that computers now have the ability to make use of. That model is simply all of the words in all of the documents that are on the internet. The computer already understands numbers, the next step in computing should be to get it to understand words.
An amazing possibility is that the computer could get to understand the meanings of words, which it cannot do now without keywording, and thus understand the whole universe that is described by those words, simply by scanning all of the words in all available documents and noting the patterns in them.
There are now so many words on the internet, in all documents, that a computer could build a virtual model of reality from those words. The words themselves are the model of reality that would be necessary to understand them.
If a computer pored over endless millions of documents online it would be able to narrow down what every word means, based on its relations with other words. There is nothing that a computer could not discern about the use of words if it could search enough different documents. The computer could piece together what the meaning of each word must be by scanning the documents and looking for patterns.
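To make the idea of "looking for patterns" concrete, here is a minimal sketch, in Python, of counting which words appear near which other words. It is only an illustration of the general approach (the toy corpus and the function name are invented for this example), not a finished system, but words used in similar contexts do end up with similar neighbour counts, and that is the kind of pattern that could be used to narrow down meanings.

```python
from collections import Counter, defaultdict

def cooccurrence_profiles(documents, window=2):
    """For every word, count which other words appear within a few
    positions of it. Words used in similar contexts end up with
    similar profiles, which is the pattern a machine could use to
    narrow down what each word must mean."""
    profiles = defaultdict(Counter)
    for text in documents:
        words = text.lower().split()
        for i, word in enumerate(words):
            neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            profiles[word].update(neighbours)
    return profiles

# Toy corpus for illustration; a real run would scan millions of documents.
docs = ["the cat sat on the mat", "the dog sat on the rug"]
profiles = cooccurrence_profiles(docs)
print(profiles["cat"])   # "cat" and "dog" end up with nearly identical profiles
```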
The computer would soon understand spaces and sentences and put together the meanings of parts of speech. Likewise, if a computer could scan numbers it would figure out mathematics without being taught.
If you read millions of documents, without knowing anything of what the words mean, you could figure it out if you went over enough documents, and had enough examples of how the words are used. You could also figure out different languages, just as linguists have pieced together ancient written languages. The number of possible meanings of each word would start at near infinity, but then narrow down to just one.
After the computer scans endless millions of documents to piece together what each word means, based on its relations with other words, it will then actually understand the documents. The computer already knows numbers, and it will figure out the difference between words and names. It would be like the computer playing sudoku, where it is given some numbers to start and then must piece together the rest.
This must be possible because there are endless millions of documents online, but only about ten thousand words in common use. Based on the patterns in which each word is used, it wouldn't take the computer long to narrow down what each word must mean. The computer could then move on to visual images, such as photographs, and piece together the words that it now understands with the images in the photographs.
After understanding words, the computer, by analyzing millions of photographs, drawings, images, and maps that are online, would recognize what human beings are. Millions of medical reports would reveal to the computer how the human body works.
This would represent the Industrial Revolution philosophy of "Letting machines do the work". The computer, as it is now, is just an information manager and not a thinker. It could think if it understood the information online. The computer could then notice patterns that we do not notice, and bring about creativity. There could be computer inventors, economists, artists, authors, and musicians.
To accelerate this computer learning process, we could facilitate a method for the computer to "ask us questions".
If the computer could understand words like this, then it would be able to think. It could match questions and answers without going to a web site. If this could be coupled with voice recognition, we could carry on a conversation with a computer in a way that is impossible now.
The computer would be able to think for us, noticing things that we haven't. It could solve problems without specifically being told to. Computers can find solutions for us now, but only for the things that we ask it to. There are three classifications of knowledge:
1) The things that we know.
2) The things that we know that we don't know.
3) The things that we don't know that we don't know.
This has only recently become possible with tremendous storage space and processor speeds. Most of the available knowledge on earth is electronically accessible from any one place. This is completely within the capabilities of computers now.
We program computers to answer questions that we do not have the answers to. We do not program them to scan every available document online in order to discern the meanings of words, because we already know what those words mean. But if we would do that then the computer would know what the words mean.
Computers build vast catalogs of what is online for use in search engines. In compression, computers look for patterns in documents in order to reduce the size of the body of information. We just need to start computers looking for all patterns in the ways that words are used so that the computers can discern what the words mean. A parser or compiler scans a document and looks for patterns to arrange in a logical order.
In almost all information there is a lot of redundancy, referred to as statistical redundancy. This is used in LZ compression and run-length encoding. The classic example is the blue sky. It is not necessary to define every pixel as blue. One pixel can be defined as blue, and then a lot of space will be saved by indicating that all of the other pixels in the sky are the same color as that one.
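Here is a minimal sketch of run-length encoding, the blue-sky example in code. The pixel list is invented purely for illustration.

```python
from itertools import groupby

def run_length_encode(pixels):
    """Collapse runs of identical values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

# A strip of sky: one long run of blue, a cloud, then more blue.
sky = ["blue"] * 1000 + ["white"] * 20 + ["blue"] * 500
print(run_length_encode(sky))   # [('blue', 1000), ('white', 20), ('blue', 500)]
```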
The next step is to apply these techniques of scanning information and looking for patterns to have the computer discern the meaning of each and every word, so that it will actually understand the information that is inputted.
8) HUMAN LANGUAGE COMPILATION
I believe that computers should make translation of documents from one language to another simple, quick, and easy.
Computers do not actually work with our letters and words. All that they understand is the alignment of magnetic bits to represent either a 1 or a 0. But if we group such bits into groups of eight, known as a byte, we have 256 possible combinations because each of the eight bits can be magnetically set to represent either a 1 or a 0, two possible combinations, and two multiplied by itself eight times is 256.
In a code system known by its acronym of ASCII, each letter of the alphabet, lower case and caps, as well as each number, punctuation mark, and control character, is represented by one of the 256 possible combinations that make up a byte.
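A quick illustration in Python: each character is stored as one of those byte values, which can be written out as eight bits.

```python
# Each character is stored as one of the 256 possible byte values.
for ch in "Aa9,":
    print(ch, ord(ch), format(ord(ch), "08b"))
# A 65 01000001
# a 97 01100001
# 9 57 00111001
# , 44 00101100
```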
In the programming of computers, special languages are developed as a link between the tasks that humans want the computer to do and the opcodes (operational codes) that are wired into the processor of the computer. The processor in a typical computer might have several hundred opcodes, representing the tasks that are wired into it.
Opcodes are used in combination to create a vast number of possible language commands. They are written in hexadecimal code. This is a numbering system based on sixteen, rather than ten. It uses the digits 0 through 9, followed by the letters a through f. This base is used because a unit of four bits (a nibble) can be made into sixteen possible combinations. This numbering system is used for such things as memory addresses in the computer, as well as opcodes.
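The arithmetic behind this is easy to check:

```python
print(2 ** 4)          # 16 -- a nibble of four bits has sixteen possible values
print(hex(255))        # 0xff -- one byte written as two hexadecimal digits
print(int("1f", 16))   # 31 -- a hexadecimal value converted back to decimal
```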
Computer programming languages fall into two broad categories, those that are interpreted and those that are compiled. Simple web scripts such as Javascript and ActiveX controls are interpreted by the browser line by line and are not compiled. BASIC (Beginners All-purpose Symbolic Instruction Code) was originally designed as an interpreted language so that programming students could write a program and watch it being run line by line.
In high-level languages that are compiled, such as C++, a special program must be written to link the language with each processor that enters the market. This is because each and every processor has its own set of opcodes. This special program is called a compiler, and it goes over the program and, in several steps, breaks it down into the opcodes of the particular processor on the computer on which it is being run.
In assembly language, which is a low-level computer language only a step above the computer's machine code, or opcodes, another type of compiler, an assembler, is used to translate the commands into the machine code that the processor can work with. Such a low-level language as assembly language is arduous to write, but it is used when a very short program that can run very quickly is required.
There are computer languages which are neither compiled nor interpreted. The popular Java uses a "virtual machine" on the computer to enable it to operate across all computer platforms. Java makes use of "Java Byte Code", which is a form of "p-code", or portable code.
What I want to ask is why can't we write a compiler for human languages? If compilers are special programs that break the commands of high level computer languages into the opcodes that are wired into the processor of the computer, then the next step should be a compiler for any written language. The compiler could scan each sentence and break it down into numeric code.
A compiler could be written to link each human language to this code so that a document in one human language could be easily translated into another, as long as a compiler had been written for both languages.
The roadblock to an accomplishment such as this, as I pointed out in the section above, "Numbered Sentences", is that the word is not the primary unit of human communication when we are concerned with translating one language into another. Working with letters and words is fine as long as we remain within one language. But it is the sentence, not the word, which must be translated from one language to another. A word-for-word translation usually produces little more than gibberish, simply because grammar and syntax differ from one language to another.
Such a coding system does not yet exist. It would be similar in concept to ASCII, but would require us to break down every possible sentence and, after eliminating redundancies, assign each a numeric code that would be the same regardless of what human language it was in. It would be fairly simple to assign nouns and verbs a place in a language tree structure. Spell-check and grammar-checking software is already widely used, and this is the next logical step.
My vision for the next breakthrough in the progress of computers lies not in technology, but in how we approach language. The great limitation is that we are still using the ASCII system of coding that has been in use since 1968, when available computer memory was maybe one-thousandth of what it is now.
Basically, computer storage revolves around magnetic bits. Each bit can be either a 1 or a 0, on or off, so that there are only two possible states for each bit. This means that eight such bits have 256 possible combinations, which is ideal to encode all of the alphabet, lower case and capitals, numbers, punctuation, as well as unprinted controls such as carriage return and space. This is the system that we use, reading computer memory in the groupings of eight bits that is referred to as a "byte".
Computers only deal with numbers, while we communicate mostly with words. This means that we have to create artificial languages to communicate with computers, and to instruct them what to do. There are several hundred opcodes, or basic instructions, wired into each computer processor. Machine code tells the computer what we want it to do by combining the instructions in these opcodes.
This machine code, which is expressed in a so-called hexadecimal number system consisting of the numbers 0-9 and the letters A-F, is actually the most fundamental computer language. One step up from this is assembly language, which is expressed in simple letter instructions and works by combining machine-code instructions to the processor.
We can build higher level computer languages from this, all of which work by combining the instructions of lower-level languages. Some languages, such as those for web scripting, are interpreted in that they are simply read by the browser line-by-line. Most must have a compiler written to link each computer language to each new processor that comes on the market. The great advantage of higher-level languages is that the programmer does not have to understand exactly how the processor works in order to write instructions.
I find this system to be inefficient in the extreme for modern computing. This is another example of how we have a way of becoming technically forward, but system backward.
For one thing, with the spell-check technology available nowadays, there is no need to encode capital letters. We can shorten the bytes, which will speed up computers, by encoding all letters in lower case and letting spellcheckers capitalize the appropriate letters at the receiving end.
For another thing, I had the idea that we could just consider all letters, numbers, punctuation, and controls as one big number. This would mean considering it as a base-256 number, instead of the base-ten system that we are used to. But this relatively simple change would greatly multiply both the storage space and the speed available by compressing any document, as described in "The Floating Base System" in this posting.
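As a rough sketch of what "one big number" means, the bytes of a text can be read as the digits of a single base-256 number (the function name and the short example string are invented for illustration):

```python
def text_as_base256_number(text):
    """Read the bytes of a text as the digits of one large base-256 number."""
    number = 0
    for byte in text.encode("ascii"):
        number = number * 256 + byte
    return number   # equivalent to int.from_bytes(text.encode("ascii"), "big")

print(text_as_base256_number("Hi"))   # 72 * 256 + 105 = 18537
```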
Today I would like to write more about what should definitely be the next frontier in computing, reforming the basic system of encoding.
There are three possible ways to encode written information in a computer: by letters, by words, or by sentences. The way it is done now is still by letters, which is by far the most primitive and inefficient of the three and is a reflection of the strict memory limitations of 1968.
The real unit of communication is actually the sentence, as we have seen in the section above, "Numbered Sentences". Notice that languages must be translated from one to another by the sentence, not by the word. This is because grammar and syntax differ from language to language, and word-for-word translations usually produce little more than gibberish.
To encode by sentences, we could scan the dictionary for sensible combinations of words that make sentences and then eliminate redundancies, or sentences that mean the same thing. This would give us a few million sentences that are used in communication. There would also be special pointers to names, place names, and culturally specific words. This would not only make storage and transmission of information many times more efficient, but would also facilitate easy translation from one language to another because all sentences could already be pre-translated.
The user would type a sentence, and then pick the one that came up from the database that was most like the one that was typed. Each one of these would have a pre-assigned bit code, similar in concept to the present ASCII.
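Here is a toy sketch of the idea, assuming a shared table of sentence codes; the codes and the handful of sentences are invented purely for illustration, and a real table would hold millions of pre-translated entries. Because both languages share the same code, translation becomes a table lookup rather than a word-for-word substitution.

```python
# Hypothetical sentence codes shared across languages.
sentence_codes = {
    "en": {"how are you?": 1001, "where is the station?": 1002},
    "es": {"¿cómo estás?": 1001, "¿dónde está la estación?": 1002},
}

def translate(sentence, source, target):
    code = sentence_codes[source][sentence.lower()]            # sentence -> code
    return next(s for s, c in sentence_codes[target].items()   # code -> sentence
                if c == code)

print(translate("How are you?", "en", "es"))   # ¿cómo estás?
```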
There is yet another approach to better integrating the ordinary language that we communicate with and computers, one that I have not written about yet. This approach involves words, rather than sentences, and will be more complex and difficult than numbering sentences, but it will be the ultimate in language-computer integration and is what I want to add today.
Words are actually codes, which is why we have dictionaries for words but not for numbers. A word serves to differentiate something that exists from everything else; this fits with the all-pervasive pattern that I termed "The One And The Many", as described in the posting by that name on the patterns blog.
Since we are more complex than our inanimate matter surroundings, there is not enough complexity for everything that we could conceive of to actually exist. So, words also define for us that which does exist from that which doesn't. This is why we require words, as well as numbers: only a fraction of what could exist, from our complexity perspective, actually does exist.
Words, as codes, are far more complex than numbers. Although it may not seem like it, there is a vast amount of complexity packed into each and every word. All of the complexity of the pre-agreed-upon meaning is contained in a word. Words can be thought of as a kind of "higher level" of numbers, in a way similar to that of computer languages.
Numbers differ from words in that everything is basically numbers being manifested. They exist in the universe of inanimate space and matter, while words don't. Numbers are less complex than words, but are not required to differentiate that which exists from that which doesn't as words are.
We must completely understand something in order to describe it with numbers, although that is not the case with less-precise words. We cannot determine the complexity of the words that we must fall back on because if we could, we could continue our description of reality with numbers and would not need the words.
We know what words mean, or else they would not be useful, but we do not know how much actual complexity a word contains in its meaning because if we did, we could express its meaning with numbers and would no longer need words.
Numbers are all that there really is. Everything is actually numbers being manifested. This means that there must be a formula for everything that exists. But because of our complexity level, we are unable to discern formulae about ourselves or things more complex than us.
We can only arrive at a formula for something that is less complex than our brains, which have to figure it out and completely understand it. To derive a formula about ourselves or things more complex than us, we would have to be "smarter than ourselves", which is impossible. We could take the communication systems of animals and break them down into numbers, but we cannot do that with our own. So, we can only rely on words for such descriptions.
But if there must be a formula for everything, even if it is hidden from us by our complexity perspective, that must also include words. Out there somewhere, there must be a way to substitute a number or a formula for every word in our language. If only we could arrive at this, it would be possible to construct a very complex system of numbers and formulae that would parallel the words that we use to communicate.
If we could only accomplish this, we would have the numbers that computers can deal with. Computers could deal directly with ordinary words, at least the ones that we had incorporated into this matching structure of numbers and formulae, and these artificial computer languages would no longer be necessary. We cannot see this, at our complexity level, because we are up against our own complexity and we cannot be "smarter than ourselves".
In the universe of inanimate matter, there is only quantity. In other words, everything is really numbers but with inanimate matter these numbers and formulae that describe everything are only one-dimensional. When we deal with living things, particularly ourselves, we have to deal with quality as well as quantity.
We can differentiate between the two by describing quantity as one-dimensional and quality as multi-dimensional. Quality forms a peak, which is the intersection of at least two slopes, while quantity forms a simple slope. Quality is not simply "the more, the better", but is a peak factor. This is why we are so much more complex than the surrounding inanimate reality.
We should be looking for the mathematics that must exist to incorporate every word that we use so that each word that we use can be expressed as a number or formula in the overall structure. Computers will then be capable of dealing with ordinary human language. All that we would have to do is to tell the computer what we wanted it to do, and they would be unimaginably more useful and easy to use than they are now.
We will get back to this in the last section of this posting, but we have to examine our basic number system first.
9) SMART NUMBERS
We have seen how far we can advance computer science by upgrading the basic coding system that has been in use since the early days of computers. We can go even further by questioning our basic number system, since computers work with numbers. It becomes possible to get computers to actually understand anything that we express in words, without any embedding or keywording.
9a) INTRODUCTION
Have you ever questioned the basic number system that we use? I don't mean mathematics; I mean the fundamental numbers, the 0, 1, 2, 3, 4,....
You may be wondering what there is to question about the basic numbers. But that is the point, we learn the numbers in early childhood but then don't give them much more thought.
Two things made me question our basic number system.
The first was fundamental mathematical constants like pi, the ratio of a circle's circumference to its diameter. By mathematical constant I mean one that is dimensionless, not involving our artificial units of measurement.
At first glance pi seems to be very simple. Yet it requires an infinite number of digits in our number system to express it. This didn't seem to make sense. Most physicists agree that everything is really numbers being manifested, which is why mathematics is so useful. But then everything must somehow ultimately be expressible as either whole numbers, or as a ratio of whole numbers.
It just didn't make sense that, if we express reality in numbers, something as simple as the ratio of a circle's circumference to its diameter should require an infinite number of digits, and thus an infinite amount of information, to express.
The second thing that made me question our basic number system was biology. Everything is really numbers being manifested. Inanimate sciences, like chemistry and physics, are very math-intensive, which is what we would expect.
But why then is biology, which involves more complex processes than inanimate chemistry and physics, so much less math-intensive? There is mathematics in biology, but not to the same extent as chemistry and physics.
The conclusion I came to is that it is our number system that cannot handle the complexity of biology, so we describe it more with words than with numbers.
9b) WHY CAN'T WE EXPRESS PI IN FINITE FORM?
Here is a question. The value of pi is the ratio of the circumference of a circle to its diameter. A circle and the line which forms its diameter are the simplest of geometric forms. Yet the value of pi actually contains an infinite amount of information.
Pi is an irrational number, meaning that it can be expressed neither as a whole number nor as a rational number, or ratio. Pi is equal to 3.1415927..... It goes on and on to an infinite number of digits. Computers have calculated the value of pi to quadrillions of digits, and there is no end in sight. The fraction 1 / 3 can also be calculated to an infinite number of digits, .3333..., but it does not contain an infinite amount of information because it is repeating.
The fraction of 22 / 7 is often used as a close approximation of pi, and is good for most purposes where extreme accuracy is not necessary. But it is not exactly correct.
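The gap is easy to check:

```python
import math

print(22 / 7)             # 3.142857142857143
print(math.pi)            # 3.141592653589793
print(22 / 7 - math.pi)   # about 0.00126 -- close, but not exact
```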
This really requires a special explanation. How can an infinity of information come from the simplest of geometric forms? Why can such a basic concept as pi not be expressed in finite numbers?
Pi is far from the only irrational number that can be calculated to an infinite number of decimal places. Another common one is e, the base of exponential growth, such as compound interest. The value of e is ( 1 + 1 / x ) raised to the x power, with x being any large number. The larger the value of x, the more accurate the calculation of e will be. The true value of e is 2.718... but, like pi, it can be calculated to an infinite number of digits.
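The convergence can be watched directly:

```python
import math

for x in (10, 1_000, 1_000_000):
    print(x, (1 + 1 / x) ** x)   # 2.5937..., 2.7169..., 2.7182...
print(math.e)                    # 2.718281828459045
```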
The universe is basically a simple place. So then why can the mathematics which are used to describe the workings of the universe get so complex? Why do we need pages and pages of mathematics to understand what is really a simple universe?
9c) IT SOMETIMES SEEMS AS IF THERE ARE MISSING NUMBERS
The way that our numbers work is that 1, 2, and 3 can be described as the fundamental numbers, and the rest are multiples of these. If we bring addition into the picture, we can add 1 to certain even numbers to create numbers outside the factor tree, known as prime numbers. A prime number is one that has no factors other than 1 and itself. So these are the fundamental numbers, and their multiples and additives begin with 4.
Suppose that there were beings which reproduced and spread by dividing in half, and then going off in opposite directions. If they were aware of numbers, they would only be aware of the number 2 and its multiples. Their number system would be 1, 2, 4, 8, 16... We use numbers to describe the world around us, and they would not be aware of any other numbers.
Now suppose that the beings began to study the world around them. They would encounter situations that involved the number 3 and it's multiples. But they only knew the numbers that they had seen before, and built their number system around multiples of 2.
When they expressed multiples of 3 in their number system, they would not be able to express them as either whole or rational (ratio) numbers. They would only be able to express them as irrational numbers, which could be taken to an infinite number of digits. What we express as 3 would be a missing number.
The same could be said of a young student who was taught about 1 and 2 and multiplication, but not about addition. The student might surmise that there were multiples of 2, such as 4 and 8. But all of the other numbers that we know would be missing numbers. It was not that the other numbers did not exist, it was just that the student had no knowledge or experience of them and did not include them in his number system. The student would only be able to express other numbers as irrational numbers with an infinite number of digits, just like we express pi.
3 could not be described simply as halfway between 4 and 2 because its relationship to other numbers would not be correct, and the student would not know how many other "missing numbers" there might be. Any such number would appear as an infinite, non-repeating decimal. The outside number would not be able to be expressed as a ratio of existing numbers, but could only be an irrational number that could be calculated to an infinite number of digits.
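One way to make the analogy concrete, under the assumption that such a student or being could only build numbers from doublings: the exponent needed to reach 3 from 2 is log2(3), an infinite, non-repeating decimal, much as pi appears to us.

```python
import math

# In a system built only from doublings, reaching 3 requires the exponent
# log2(3), which never terminates or repeats.
print(math.log2(3))        # 1.584962500721156...
print(2 ** math.log2(3))   # 3.0000000000000004 (floating-point rounding)
```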
Now, here is what we have to consider today. What if we have missed some numbers in our number system? Like the beings who multiply by dividing in two, we use numbers to describe the world around us that we know. That is why numbers are useful.
Why couldn't there have been numbers outside our experience that we missed? This is not as unusual or unprecedented as it may seem. As we saw in "The Zero Hypothesis", on the progress blog, humans could not do complex calculations until the importance of zero was understood. The ancients had various counting devices, such as the abacus, but could not do complex calculations yet because they did not understand the importance of zero. You may have noticed that there are no references to complex arithmetical calculations in any ancient texts, although there was geometry.
That is because zero was once a missing number. It had not been included because, when the number system was developed, there seemed to be no reason to count zero of anything. Zero is vital because matter does not fill the universe and we have to deal with empty space, and we cannot do complex calculations without it.
The entire universe actually had its own missing numbers. We saw in "The Even Number Bias" how even numbers must have come before odd numbers, and we can still see the "ghost" of that today. The fusion process that takes place in stars favors atoms with even numbers of nucleons in the nucleus. The most stable and common elements in the universe are those with even numbers of nucleons, the original hydrogen being the exception.
The 25% of original atoms that were helium, and heavier than hydrogen, were pulled by gravity into fusion in stars, and this formed a "factor tree" of 2 x 2 x 2... The lighter hydrogen atoms could be added to form odd numbers, but this was originally the exception rather than the rule. Odd numbers thus started as universal missing numbers.
It is important to understand that we cannot get to missing numbers, expressing them as whole or rational numbers, by any kind of operations using existing numbers. If we could, then they would not be missing numbers.
But how would we have missed any numbers?
We use numbers because they are useful in describing the world around us. Our world is made of matter. The reason that humans missed the importance of zero for so long is that it is the number of empty space.
We see how the number two began, because there are two electric charges that make up the universe. 3 was introduced into the universe of matter because of the three quarks that make up nucleons. A quark cannot have a partial charge based on only the numbers 1 and 2; 3 is required. The dimensions of space are what bring in multiplication.
Notice that the mass of a proton is 1836 times that of an electron, which involves multiples of both 3 and 2, and also 1, because it brings us down to the prime number 17. This is what brought our fundamental numbers of 1, 2, and 3 into being, and other numbers are multiples and additives of those.
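For what it is worth, the whole number 1836 does factor neatly into those pieces:

```python
print(2 ** 2 * 3 ** 3 * 17)   # 1836, i.e. 2 x 2 x 3 x 3 x 3 x 17
```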
A number has no real existence in itself. It exists only when it is manifested in some way. Our presumption is that all possible numbers have been somehow manifested and that we have included them in our number system but, as we can see here, we cannot know that for sure.
9d) WE ARE NOT MAKING FULL USE OF NUMBERS
But these numbers that we have are the numbers of matter. The vast majority of the universe consists of space, and matter exists in space and not the other way around. In my cosmology theory, space existed first and matter is a relative "newcomer" to the universe.
Notice that it is pure mathematical constants, such as pi or e, which tend to be irrational numbers, and not the constants of chemistry and physics. These sciences are mostly sciences of matter. The stoichiometry of chemistry revolves around whole numbers. The constants of physics revolve around our artificial units of measure, such as seconds, meters, and kilograms.
It is the dimensionless constants of pure mathematics which tend to be irrational, meaning that they cannot be expressed without an infinite number of digits. This shows that they were part of the universe before matter, and that our numbers are the numbers of matter.
We learn the numbers at age 5 or 6 and then never question them. To really understand the universe, we must "get outside ourselves". Just as we know that there are planets outside our solar system, so there are numbers outside our number system. When we encounter them, we cannot fully express them or their multiples as part of our number system.
But even though we seem to be missing numbers, that is only because we are not making full use of the number system that we have. This makes it so that many mathematical constants cannot be expressed in finite form.
9e) THERE ARE REALLY NO IRRATIONAL NUMBERS
Pi does not fit into our number system, but the fault is with the system. There should not be any such thing as irrational numbers. If basic mathematical constants, like pi, can only be expressed as irrational numbers then the only possible explanation is that we must somehow be missing numbers in our number system.
Imagine someone making the universe from the beginning, putting it together. Everything must somehow be expressible in whole numbers. It has to be this way. There have to be numbers that we missed because we did not notice or need them. We see that our factor numbers begin with 4, and notice that basic mathematical constants are virtually all below 4, which is where they could most easily have been missed without being noticed or needed.
What if common irrational numbers, that are at the center of so many of our formula, such as pi and e, are really based on whole numbers that we missed because they were somehow not manifested within our experience?
The whole idea of such basic mathematical constants not being able to be expressed in whole numbers doesn't really make sense. My conclusion is that irrational numbers cannot really exist; we must be missing some whole numbers.
Everything should be able to be expressed as a whole number or ratio, as long as we are not dealing with anything that humans have created artificially, such as units of measurement. Such artificial units are not involved with pi because it is dimensionless, meaning without units.
9f) THE IMPORTANCE OF RATIOS
A number by itself means essentially nothing. The first step in using numbers in meaningful expression is in the form of ratios, or rational numbers. A ratio is not just a number, but a number in relation to another number.
This is how reality really operates, in the form of ratios rather than of whole numbers. Remember, as we saw in "The Lowest Information Point", that the point of least information is a "favored point" because the universe seeks the Lowest Information Point just as it seeks the lowest energy state, because energy and information are really the same thing.
I define the Lowest Information Point as the interaction of two ratios, where A / B = B / C, so that "A is to B as B is to C". This is the Lowest Information Point because it involves only three points of information, as opposed to the four in A / B = C / D.
The trigonometric functions are also ratios. If we have a right triangle, or a radius from the intersection of an X and Y axis, the sine of the angle between the radius and the X-axis is defined as Y / R, or the length along the Y-axis over the length of the radius. The cosine is defined as X / R. The tangent is defined as Y / X. Then there are the other three functions that are the reciprocals of these. The cosecant is the reciprocal of the sine. The secant is the reciprocal of the cosine. The cotangent is the reciprocal of the tangent. In the functions beginning with co-, the value gets smaller as the angle gets larger.
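A small check in Python that these functions really are ratios (the radius and angle here are arbitrary):

```python
import math

R, theta = 5.0, math.radians(30)   # any radius and angle will do
X, Y = R * math.cos(theta), R * math.sin(theta)
print(Y / R, math.sin(theta))      # both 0.5, within rounding
print(X / R, math.cos(theta))      # cosine as a ratio
print(Y / X, math.tan(theta))      # tangent as a ratio
print(1 / math.sin(theta))         # cosecant, the reciprocal of the sine
```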
What we refer to as "diminishing returns" is also based on ratios. If we were given a large amount of money, we would be very grateful. But if we were then given the same amount of money again, we would still be grateful but not quite as grateful as with the first amount of money. While the second amount of money would be numerically the same as the first, its value as a ratio to the money that we already had would be less. This is referred to as "diminishing returns".
Any comparison is based on ratios. If there were two children, aged 5 and 10, we would say that there is a big age difference between them. But if they were 45 and 50, we would not say that there was much of an age difference, even though the difference is numerically the same.
The concepts of "far" and "near" have no exact numerical definition, but are based on ratios. If two atoms were 5 km apart, we would say that they were very far away from each other. But if two towns were 5 km apart, we would say that they were near each other.
So much of nature is based on ratios. A very important number in how reality operates is the so-called "Golden Ratio". The Golden Ratio is defined as the sum of two unequal numbers being in the same ratio to the larger of the two numbers as the larger of the two numbers is to the smaller. Expressed in decimal terms, the Golden Ratio is an irrational number, 1.618034...
So not only is the Golden Ratio an important ratio, it is also another basic constant, like pi and e, that is so important to us but that our number system is unable to express as either a whole number or a ratio of whole numbers.
Much of nature, such as the construction of plant leaves, operates by the Golden Ratio. It is also very important to humans because it enhances our perception if something is in the Golden Ratio. Notice how television and computer screens, pages in books, maps, and billboards almost always have the approximate dimensions of the Golden Ratio.
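The defining relation can be checked directly: solving (a + b) / a = a / b gives the value quoted above.

```python
phi = (1 + 5 ** 0.5) / 2     # the Golden Ratio, (1 + sqrt(5)) / 2
a, b = phi, 1.0              # any two numbers in the ratio phi : 1
print(phi)                   # 1.618033988749895
print((a + b) / a, a / b)    # both equal phi, as the definition requires
```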
Even when we express something in whole numbers we are still effectively using ratios. The number is the numerator and the "thing" is the denominator. To say "5 apples" is actually 5 / apples.
Most physicists agree that everything is really numbers being manifested. But we can see how the universe really operates not by whole numbers but by ratios, one number in relation to another. We could say that whole numbers are "simple numbers" while ratios are "complex numbers". A number doesn't really mean much by itself, unless it is contrasted with another number, or a word that is used as a substitute for a number, as the denominator of a ratio.
9g) IMAGINARY AND COMPLEX NUMBERS
In algebra class, many of us were mystified by what "imaginary numbers" were supposed to be useful for. An "imaginary number", denoted as "i", is defined as the square root of negative one, -1. This means that i squared = -1.
The trouble is that a negative number cannot have a square root because two negative numbers multiplied equals a positive number. But yet we have to learn this to pass algebra class.
But later, after I came up with my cosmology theory, it suddenly made sense. In fact, the reason that it at first doesn't seem to make sense is the limited use of our system of "real" numbers.
The "imaginary numbers" are actually defined as another line of numbers that is perpendicular to our usual line of numbers. "i" is defined as being one point away from our numbers if we picture them as being on a straight line. Any of our "real numbers", which are ordinary whole numbers, has an "i" component of zero.
I do not think that "imaginary numbers" is a good term to use, since these numbers turn out to not be "imaginary" at all. A better term is "complex numbers". The difference between complex numbers and our common "simple numbers" is that two complex numbers can be equal or equivalent, without being the same thing. That is not possible with our usual "simple numbers". In addition, every complex number has more than one square root.
The "i" of "imaginary numbers", defined as the square root of -1, is useful for differentiating a complex number from the addition of two "real" numbers. A complex number thus looks like this:
( 5 + 7i )
When we see the "i", it tells us that this is a complex number and we do not add the 5 and 7 as in an addition operation. If we just express the whole number 5, the i component would be zero: ( 5 + 0i ).
The number system thus becomes a set of points on a two-dimensional grid, rather than on a one-dimensional line. This is necessary to express spatial concepts like pi.
Next, considering that the universe really operates by ratios, a number by itself is essentially meaningless until it is in relation to another number. A complex number then takes this form:
( A / B + C / Di )
The "i", once again, reminds us that this is a complex number and the two ratios are not fractions to be added together.
This is how the universe really operates, as ratios and complex numbers. The reason that we cannot express such important numbers as pi in the form of whole or "real" numbers is the fault of our use of simple or "dumb" numbers. These complex numbers are really the "smart numbers" which can express how the universe really operates.
Mathematical formulae, not those involving chemistry or physics, which tend to have our artificial units such as meters, kilograms, and seconds, but "pure" mathematical formulae, such as the one for pi, are expressible in finite form, but only if we use complex numbers as I am describing here. If we have to use an infinite number of digits, or an infinite series of fractions, to express pi, then we are using only the limited simple numbers.
Formulae for curves, hyperbolas, and parabolas are also actually complex numbers when we add the variables to the equation.
"Real" numbers should be able to express a relatively simple concept like pi in finite form, and here it is: (source-Wikihow)
pi = 2 ( arcsin ( square root of ( 1 - X squared) ) + absolute value ( arcsin ( X ) ) )
Absolute value means the value of something, regardless of whether it is positive or negative. Remembering that positive and negative, unlike in our "real" numbers, is really interchangeable.
Arcsin is the inverse of the sine function. If the sine of 30 degrees is 0.5, then the arcsin of 0.5 is 30 degrees.
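The formula can be checked numerically; it returns pi for any x between -1 and 1:

```python
import math

def pi_from_x(x):
    # pi = 2 * ( arcsin( sqrt(1 - x^2) ) + | arcsin(x) | ), for -1 <= x <= 1
    return 2 * (math.asin(math.sqrt(1 - x * x)) + abs(math.asin(x)))

for x in (0.0, 0.3, -0.75, 1.0):
    print(x, pi_from_x(x))   # 3.141592653589793 each time
```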
So here is the answer to where our missing numbers are, as described above. It is really that we are still using the same old simple numbers when we should be using the complex numbers that truly describe the operation of the universe. Two numbers must be able to be equivalent without being the same thing, and that is possible only with these complex numbers.
9h) THERE ARE REALLY NO NEGATIVE NUMBERS
Numbers are traditionally taught as operating in a line beginning at zero. On the other side of zero is a mirror-image line of corresponding negative numbers.
The arithmetical rules of dealing with negative numbers are as follows:
A positive number multiplied by a negative number gives a negative number.
Two negative numbers, or two positive numbers, multiplied gives a positive number.
But what I have concluded is that negative numbers do not really exist outside of the world of human beings. We only see negative numbers because of our perspective on the universe.
Negative numbers can be compared to color. We know that color does not really exist in the universe of inanimate matter. Color is just how our eyes and brains interpret different wavelengths of electromagnetic radiation.
The only examples of negative numbers that I can think of are in expressing temperature, debt, and exponents. But our temperature units are our artificial creations. There are no negative temperatures on the Kelvin scale, which begins at Absolute Zero. Debt is a function of our artificially-created money system and economics. Exponents are part of our artificially-created units and number system.
Negative numbers thus only exist in systems that we have created ourselves, and not in the universe of inanimate matter outside of ourselves. Negative numbers are an artificial creation that does not really exist.
What do you know? That "imaginary number", denoted as "i" for imaginary, is so baffling to algebra students because it is supposed to be the square root of -1. Yet we are taught that negative numbers cannot have square roots because two numbers of the same sign multiplied together always equal a positive number.
Well, as we can see here, negative numbers do not really exist. Yet this concept of i must have some value, or algebra students wouldn't be forced to learn it. What it really is, is the step toward the complex numbers by which the universe really operates.
9i) BIOLOGY AND COMPLEX NUMBERS
Along with why something as basic as pi, the ratio of the circumference of a circle to its diameter, can never be expressed as either a whole number or a ratio, the other thing that always got me wondering about our number system is biology.
Sciences like physics and chemistry are really mathematically intensive. There is mathematics in biology, but it is quite a bit less intensive than in chemistry and physics. Has anyone ever wondered why?
The answer is that our mathematics cannot as easily represent biology, which is more complex than inanimate sciences like chemistry and physics. We saw in the compound posting on this blog, "How Biology And Human Life Fits Into Cosmology" June 2016, that biology contains four points of information, whereas inanimate matter contains only two.
The two points of information in inanimate matter are, of course, the two electric charges of negative and positive. The DNA of all living things contains four points of information. These are abbreviated as A, T, G, and C. With twice as many points of information, this means that there is a second dimension of information in DNA.
Physicists usually agree that everything is really numbers being manifested. We can see this with inanimate sciences like chemistry and physics. It is less easy to see with the more complex biology, but if everything is really numbers then that must include biology also.
But what do you notice here? The complex numbers also have an extra dimension, and this extra dimension would fit with the extra dimension of biology. By using complex numbers we could use mathematics to describe biology in the same way that we do now with chemistry and physics.
9j) COSMOLOGY AND SMART NUMBERS
A grid of negative and positive numbers is a model of the alternating electric charges comprising space. Each charge is a point on the number grid. But it must be a two-dimensional grid in order to represent our spatial universe. That is where these complex numbers come in, the i represents a dimension of numbers that is perpendicular to the traditional whole numbers.
There are two opposite directions in every spatial dimension because, from any given point, there are two permutations of charges, one beginning with a negative in one direction and the other beginning with a positive in the other direction. That means that we had to create a line of numbers, with higher numbers in one direction and lower in the other, as a useful tool to represent reality. But the number system must have perpendicular lines of numbers to accurately represent our spatial universe.
The two possible solutions involved in complex numbers are inverses of one another, just like the perpendicular directions in space.
The two square roots of complex numbers (actually, there is one square root per dimension) can be expressed as diagonal lines at 45-degree angles on a grid, just like the dimensions of space in the cosmology theory.
10) COMPUTER BREAKTHROUGH
After reading "Smart Numbers", above, we can get to the possibility of breaking words down into numbers that the computer actually understands.
Google announced recently that it had achieved "quantum supremacy", in reference to a new quantum computer that can perform calculations far faster than a conventional computer. An ordinary computer uses magnetic bits, which can store information by representing either a binary 0 or a 1. Basically, a computer making use of quantum physics can calculate much faster because its bits, in quantum states, can represent both a 0 and a 1 at the same time.
I just want to remind readers that there is another breakthrough development in computing. Developments in computing are all about technology, but what about language? Suppose that there is a major breakthrough to be made on the language side, one that cannot be made with technology alone?
Physicists always tell us that everything is basically numbers. This means that there must be a way to break words down into their corresponding numbers. If that could be done then the computer would automatically understand all text data without any further instruction as to what the words mean, just as it can do now but only with numbers.
If this could be done then a computer, or any information device, would be able to understand words just as it understands numbers. As it stands now, a computer understands numbers but it doesn't understand words at all unless keywords are coded in. It understands numbers because numbers form a logical linear sequence while words don't.
Computers are designed around use of numbers, but do not actually understand words at all. Words cannot be understood by a computer without additional keyword instructions. To a computer, as it stands now, a word is just a mass of byte code that is recognizable to us but meaningless to the computer. We can organize words by alphabetical order, or parts of speech such as nouns or verbs, but there is at present no way to organize words in logical order by meaning in the same linear way that numbers are organized.
But, again, physicists tell us that everything is really numbers in manifestation. This must mean that all words actually are numbers that any computer or communication device could understand if only we could break words down into their corresponding numbers.
This would be a much better way of storing knowledge than at present because the computer or application would actually "know" the knowledge, instead of merely storing and displaying it. The computer or phone could reason and answer questions based on the knowledge that it has, and could readily translate one language into another because all words would be broken down into the same numbers.
Since most physicists agree that everything is really numbers there must be a way out there somewhere to arrange words in a logical structure by meaning in the same way as numbers. The computer would then really understand all of the data that was inputted into it. That would be a massive breakthrough in computing.
I have the mathematics all worked out that would have every word in the language broken down into its corresponding numbers. The process resembles an inverted pyramid, with the mathematical operations that the computer or phone device already understands at the bottom and the numerical codes representing the meanings of words built upon that. The result will be the meanings of all words expressed by numerical codes that are broken down into the math that the computer understands, so that the computer can then finally understand words as well as it does numbers.
Using this system, suppose that we give the computer a simple statement like "The front of the new school will be made of red brick". The computer will automatically know that red is a wavelength of electromagnetic radiation visible to humans. It will know that bricks are made of clay, which is broken-down rock. It will discern that bricks have mass and the school will be bound to the planet by gravity. It will be clear to the computer that the school will get wet when it rains, and will need periodic maintenance. The computer will also know that a decision must have been made to build the school based on economic considerations and that humans must learn their knowledge, and that this is the purpose of the school.
As it stands now, the computer can understand complex mathematical operations but not simple reasoning like this using words, unless such words are encoded, such as with keywords. But with this system, the computer will understand every word and sentence because each word will be presented as part of the overall language structure, with the word based on the words below it, which define its meaning, and with all words ultimately based on the mathematical operations which the computer already understands.
This is not as difficult as it may seem. There are only about ten thousand words in common use. Words can be defined by combinations of other words, or else we would not have dictionaries. The most commonly used words are actually numerical expressions, such as "and", "with", "without", "near", and "far". Most prepositions are numerical in nature.
This system makes extensive use of the common patterns shared by so many words. For example, eyes, windows, lenses, cameras, sensors, and antennae are all variations of the same thing, and will thus have related numerical codes. There are very many common patterns between living things and technology, ribs in the body and rafters in a house, for example. Words which are manifestations of the same basic pattern will have related positions in the structure of all words; this will make it easy for the computer to understand the words and to relate them to the mathematics which it already understands.
It is easy to see how the physical universe is all numbers being manifested. There is the +1 and -1 of the basic electric charges in the universe. Atoms are structures that balance positive (protons) and negative (electrons) charges out to zero. Atoms are numbers in that the elements are defined by the atom's atomic number. Atoms + other atoms = molecules. On a large scale in the universe, spheres form by gravity as planets and stars because a sphere has the lowest surface area per volume, and thus the lowest energy state. Computers can readily be used to model the physical universe because of how it is ultimately always based on numbers.
But things get more difficult when we come to modeling living things and humans as mathematical models that the computer understands. This system of mine, however, makes any thought or action, as well as any implication of that thought or action, expressible in numbers that fit into a mathematical structure with all other meanings of words.
It is important to understand that this is not at all an issue of technology, but of language. This could actually have been done with the technology of decades ago. What is novel about this approach is that it is from the language side, rather than the technology side.
I see this step as inevitable in computer technology, but it has not been done yet. How can real artificial intelligence ever be possible without a step like this? I studied computer programming, but was dismayed by all of the computer languages for different purposes. Why can't we just get the computer to understand ordinary language? We do not have to speak a different language for every different thing that we do, so why should the computer?
There is nothing in AI like this. It is not machine learning or word embedding. I do not see how we could have real artificial intelligence without the computer or device actually understanding words. This system is not actually a computer language that requires compiling or interpreting. Rather, it is the system of mathematics that turns all of human language into a computer language that can then be completely and readily understood by the computer because the language has been broken down into the numbers that it ultimately represents.
Again, this must be possible if, as physicists tell us, everything is really numbers. It is not a question of whether it can be done; there must be a way to do it. I have thoroughly tested this system of mathematics that I have developed, and anything that can be expressed as words can readily be translated into numbers that the computer understands.
This system is complete but I have not yet arrived at what to do with it, and have not decided on a name for it. I do not want to get involved in starting a company myself but could give this to a company or I might make it into a book.