Sunday, September 18, 2016

Learning any base numbering system

In previous tutorials I showed how to work with binary (a base 2 system) and hexadecimal (a base 16 system). The two share principles that hold true for every base system. Once you know these principles, you can learn any base system quickly and intuitively.

The first thing to identify is the BASE of the system, because it determines the place value of each digit. For example, the decimal system we count with every day is base 10. That means it uses 10 distinct characters as its digits. In principle these characters could be any symbols, but for this example we will use the familiar ones: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. As you can see, there are 10 distinct characters, or digits.

Let's talk about place values now. In any base system, the value of a place is (BASE ^ PLACE), where PLACE counts up from 0 at the rightmost digit.

So in the base 10 system it's like this:
...... 10000(or 10^4), 1000(or 10^3), 100(or 10^2), 10(or 10^1), 1(or 10^0).

So given the decimal (base 10) number 54,321, we can see that there are:
5x10000 or 5x(10^4) = 50000
4x1000 or 4x(10^3) = 4000
3x100 or 3x(10^2) = 300
2x10 or 2x(10^1) = 20
1x1 or 1x(10^0) = 1

If you add these all up (50000 + 4000 + 300 + 20 + 1) you get 54,321.

Now let's try duodecimal (base 12).
Let's establish the characters [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B]
(where A can be thought of as 10 and B as 11).
Now establish the place values:
.......20736(or 12^4), 1728(or 12^3), 144(or 12^2), 12(or 12^1), 1(or 12^0).
So given A53B7:
Ax20736 or Ax(12^4) = A0000
5x1728 or 5x(12^3) = 5000
3x144 or 3x(12^2) = 300
Bx12 or Bx(12^1) = B0
7x1 or 7x(12^0) = 7

If you add these up you get A53B7. (The results on the right are written in base 12. In decimal they are 207360 + 8640 + 432 + 132 + 7 = 216,571.)
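The same place-value rule can be written as a short Java sketch. The class and method names (PlaceValue, toDecimal, pow) are made up for illustration. The sketch assumes digits above 9 are written as letters (A = 10, B = 11, ...), as in the base 12 example, so it only covers bases up to 36.

```java
// Evaluates a number written in any base from 2 to 36 by adding up
// digit x (BASE ^ PLACE), just like the worked examples above.
class PlaceValue {
    static long toDecimal(String digits, int base) {
        long total = 0;
        int place = 0;
        // Walk from the rightmost digit (place 0) to the leftmost one.
        for (int i = digits.length() - 1; i >= 0; i--) {
            // Character.digit maps '0'-'9' to 0-9, 'A'/'a' to 10, 'B'/'b' to 11, ...
            int digit = Character.digit(digits.charAt(i), base);
            if (digit < 0) {
                throw new IllegalArgumentException(
                        "invalid digit for base " + base + ": " + digits.charAt(i));
            }
            total += digit * pow(base, place);
            place++;
        }
        return total;
    }

    // Whole-number power, which avoids the rounding that Math.pow (a double) could add.
    static long pow(int base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("54321", 10)); // 54321
        System.out.println(toDecimal("A53B7", 12)); // 216571
    }
}
```

Java's built-in Integer.parseInt("A53B7", 12) returns the same 216571, which makes a handy check.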

Because every system is base based (lol, I didn't realize this pun until I wrote it), we can convert from one base to another by repeatedly dividing by the target base. Each remainder becomes the next digit of the result, working from right to left.
One way to do this is to use base 10 as a go-between. First convert the number to decimal using its place values, then do long division by the new base.
I'll get more into converting from any base to any other base in my next post.
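In the meantime, here is a rough Java sketch of the division step, which converts a decimal value into another base. BaseConverter and fromDecimal are hypothetical names. The sketch assumes a non-negative value and a base from 2 to 36, using the same letter digits as above.

```java
// Converts a non-negative decimal value to another base by repeated division.
class BaseConverter {
    private static final String DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static String fromDecimal(long value, int base) {
        if (base < 2 || base > DIGITS.length()) {
            throw new IllegalArgumentException("base must be between 2 and 36");
        }
        if (value < 0) {
            throw new IllegalArgumentException("this sketch only handles non-negative values");
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder result = new StringBuilder();
        while (value > 0) {
            int remainder = (int) (value % base); // the remainder is the next digit
            result.append(DIGITS.charAt(remainder));
            value /= base;                        // carry on with the quotient
        }
        // Remainders come out rightmost digit first, so reverse them.
        return result.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(fromDecimal(216571, 12)); // A53B7
        System.out.println(fromDecimal(10, 2));      // 1010
    }
}
```

To follow the base 10 route described above, first evaluate the original digits with their place values, then pass the result to this method. Java's Long.toString(value, radix) does the same conversion but writes letter digits in lowercase.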

Monday, June 20, 2016

VBES - Saving even more bandwidth with UTF8

A while ago I came up with an encoding scheme that used UTF8's variable byte concept to save bandwidth while chatting in instant messaging scenarios. After that I thought of improving on the idea even further, but I never got around to coding it. I had the idea down, but I didn't have a solid working scheme, much less code. The other day, though, I decided to work on it. I planned it out, finished the encoder, and finished the decoder the next day. The result is an encoding scheme that meets all the requirements of my previous one and performs better at saving bandwidth. The new scheme is built on the idea that some letters are used more often than others, so if I allocate fewer bits to the more common letters, the final message should be fairly small. I split the letters, a few punctuation marks, a space, and an escape for UTF8 fallback into groups of 4 bits, 6 bits, 8 bits, and 9 bits. It is completely backwards compatible with UTF8. This is nice in scenarios where VBES would produce a larger message, because one can just send the UTF8 version instead and it will still be decoded. It's a simple concept, and I will be using it in my future messenger app.

How it works:
* 0000
(space) 0001
e 0010
t 0011
a 0100
o 0101
i 0110
n 0111
s 1000
h 1001
r 1010
d 1011
l 1100
c 1101

u 111000
m 111001
w 111010
f 111011
g 111100
. 111101

y 11111000
p 11111001
b 11111010
v 11111011
k 11111100
j 11111101

, 111111100
x 111111101
q 111111110
z 111111111

The * (code 0000) is the escape. It signals that a UTF8 encoded character fills that spot.
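To make the table concrete, here is a minimal Java sketch of encoding and decoding with it. This is not the author's VB.net code. It writes bits as a string of '0'/'1' characters for readability. It also leaves out the escape (0000), the UTF8 fallback, the 2-bit marker, and byte padding, because the post doesn't spell out how those are laid out. VbesSketch, encode and decode are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

class VbesSketch {
    // Code table from the post (escape omitted: UTF8 fallback is not modeled here).
    static final String[][] TABLE = {
        {" ", "0001"}, {"e", "0010"}, {"t", "0011"}, {"a", "0100"},
        {"o", "0101"}, {"i", "0110"}, {"n", "0111"}, {"s", "1000"},
        {"h", "1001"}, {"r", "1010"}, {"d", "1011"}, {"l", "1100"},
        {"c", "1101"},
        {"u", "111000"}, {"m", "111001"}, {"w", "111010"}, {"f", "111011"},
        {"g", "111100"}, {".", "111101"},
        {"y", "11111000"}, {"p", "11111001"}, {"b", "11111010"},
        {"v", "11111011"}, {"k", "11111100"}, {"j", "11111101"},
        {",", "111111100"}, {"x", "111111101"}, {"q", "111111110"},
        {"z", "111111111"}
    };

    static final Map<Character, String> ENCODE = new HashMap<>();
    static final Map<String, Character> DECODE = new HashMap<>();

    static {
        for (String[] row : TABLE) {
            ENCODE.put(row[0].charAt(0), row[1]);
            DECODE.put(row[1], row[0].charAt(0));
        }
    }

    // Lowercases the message (the scheme's default) and concatenates each character's code.
    static String encode(String message) {
        StringBuilder bits = new StringBuilder();
        for (char ch : message.toLowerCase().toCharArray()) {
            String code = ENCODE.get(ch);
            if (code == null) {
                throw new IllegalArgumentException("not in table (would need UTF8 fallback): " + ch);
            }
            bits.append(code);
        }
        return bits.toString();
    }

    // Reads bits one at a time until the buffer matches a code, then starts over.
    static String decode(String bits) {
        StringBuilder out = new StringBuilder();
        StringBuilder buffer = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            buffer.append(bit);
            Character ch = DECODE.get(buffer.toString());
            if (ch != null) {
                out.append(ch);
                buffer.setLength(0);
            }
        }
        if (buffer.length() > 0) {
            throw new IllegalArgumentException("leftover bits: " + buffer);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String bits = encode("the cat");
        System.out.println(bits); // 0011100100100001110101000011
        System.out.println(bits.length() + " bits vs " + 8 * "the cat".length() + " bits in UTF8");
        System.out.println(decode(bits)); // the cat
    }
}
```

Decoding one bit at a time works because no code in the table is the beginning of a longer code. The 4-bit codes never start with 1110 or 1111, and the same pattern repeats at each length, so the first match is always the right one.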

Demo Encoder/Decoder: HIDDEN FOR CHALLENGE

Q/A:
How much space does this save?
In the best case, the message is 50 percent of the size of the UTF8 message plus 1 byte. So if the UTF8 message is X bytes, the VBES message is (X*.5)+1 bytes.

Where does this 50 percent plus 1 come from?
The half comes from the fact that the most optimized characters are 4 bits long. That is half a byte, which is half the size of the smallest possible character in UTF8 (one byte). The +1 comes from the UTF8 marker. Even if a message consisted entirely of 4-bit letters that fit exactly into a whole number of bytes, it would still need 2 more bits for the marker. Those 2 extra bits force padding up to the next byte boundary, so an entire byte is added to the message.

How does this compare to the previous encoding scheme?
The previous encoding scheme optimized several characters and allocated 6 bits to each of them. Because 6 is 75 percent of 8, and that scheme also needed the 2 extra bits for the UTF8 marker, its best case size was (X*.75)+1 bytes.
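The best case arithmetic for both schemes can be checked with a few lines of Java. This sketch assumes exactly what the answers above describe: every character uses the same number of bits, 2 marker bits are added, and the total is padded up to whole bytes. BestCaseSize and bestCaseBytes are made-up names.

```java
class BestCaseSize {
    // Size in bytes when every character costs bitsPerChar bits,
    // plus 2 marker bits, rounded up to a whole number of bytes.
    static int bestCaseBytes(int characters, int bitsPerChar) {
        int bits = characters * bitsPerChar + 2;
        return (bits + 7) / 8; // integer division that rounds up
    }

    public static void main(String[] args) {
        int x = 100; // a 100-character all-lowercase message is 100 bytes in UTF8
        System.out.println("UTF8:     " + x + " bytes");
        System.out.println("VBES:     " + bestCaseBytes(x, 4) + " bytes"); // 51 = (X*.5)+1
        System.out.println("previous: " + bestCaseBytes(x, 6) + " bytes"); // 76 = (X*.75)+1
    }
}
```

Because of the rounding up, (X*.5)+1 and (X*.75)+1 are upper bounds. For an odd number of 4-bit characters, part of the marker fits into padding that was needed anyway.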

How come this scheme doesn't optimize capital letters?
By default this scheme converts the entire input message to lowercase, because only the lowercase letters are optimized, and that saves the most space. You can disable this behavior if you truly wish to send a message with capital letters. Each capital letter will then be encoded using standard UTF8, and its placement will be signified with 0000, the same escape used for any other UTF8 encoded character.

Source (VB.net):



Tuesday, April 26, 2016

Interesting Videos

There are many sources of information on the internet. This post will revolve around YouTube videos that taught me something of intellectual interest.



More to come....

Saturday, January 23, 2016

Introduction to Java

I have recently started school again, and one of my Computer Science classes is Java 1. I figured it might be a good idea to make a blog post to document the class and to help classmates who need it. I will cover the material from the lessons and explain it in my own way, along with some things I notice, advice, etc. I will try to use illustrations where possible and analogies where concepts are somewhat complex. Keep in mind that I am not an expert by any means. I am providing this in the hope that people will at least get a feel for what programming is like. So while some information (terms, for example) might not be completely accurate, it should be close enough to use as a stepping stone toward an understanding you will develop on your own. Of course, if anyone feels they need to make corrections or contribute, the comment section is open.

[Week 01]
    This week we talked about how a computer system works and how it operates on data. I won't go into much detail about that. This is a Java class, and since Java is a high-level language, I don't think it is all that important to know. There is no need to worry about things such as the ALU or registers, because they are on a much lower level. When programming in Java, you will most likely never encounter a scenario where you need to worry about them.

    Finally, towards the end of the first week, we got to some code. Although it was simple, there were a couple of things we were told not to worry about until later. I feel they should have been covered early on, so that we would already have some familiarity with them when we get to them. We were asked to make a "Hello World" program, which is typically the first program beginning programmers write. Keep in mind that I won't be providing code for any assignments besides "Hello World". Instead, I will try to write similar code that demonstrates the same material.

    Now the code for Hello World will look like this:
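Pieced together from the line-by-line walkthrough below, including the indentation steps it describes, the program is:

```java
public class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello, World");
  }
}
```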


    There are some other lines before the snippet shown, but I don't believe they will be important anywhere in the class unless we get into imports etc.

Let's go through this code line by line.

Line 1 ("public class HelloWorld {"):
    This declares a class called HelloWorld. First we have the word "public", which is the access modifier. It states that the class can be accessed from the rest of the program. There are other access modifiers, such as private, and they should be used according to your needs and where appropriate.
The next part, "class", is the keyword that says we are defining a class, which is a new Type. Types tell you what kind of thing something is. There are many built-in Types, and you can make your own, as we are doing here. Next we have "HelloWorld". This is a name. It can be anything the programmer wants, but it should be a meaningful name that describes its purpose. There are restrictions on naming things. You cannot use a reserved keyword as a name, so words like "public" or "class" are not allowed. Names also cannot contain spaces. To make up for that we use CamelCasing, which has you capitalize the first letter of each word instead of putting a space between them (class names like HelloWorld conventionally start with a capital letter too). Finally on this line we have an open curly brace "{". Here it encloses the code that belongs to the class HelloWorld, but curly braces are used the same way for methods, loops, if statements and so on, and they define the scope. Some people put the brace on the next line instead. It doesn't really matter, because Java is not a line-oriented language. The brace can go anywhere, as long as it comes directly before the code you want associated with the class. Still, pick one convention and stick with it so your code is readable for other people. It makes it easy to see which code is a part of what.

Line 2 ("  public static void main(String[] args) {"):
    The first thing we see here isn't essential: you can leave it out and your program will still run exactly the same. It's the indentation (the blank space at the start of the line). Indentation is an important habit to learn, because it makes it easy for people reading your code to see what level things are at. The further right a line is, the deeper that code is. The further left it is, the closer it is to the outermost level. Again, your programs will work without it, but readability will suffer, and I wouldn't be surprised if the professor took points off for it. "public" was already covered in line 1 and serves the same purpose here. "static" means that main belongs to the class HelloWorld itself rather than to any one instance of it. You can create a million instances of HelloWorld, but they will all share this one main, and Java can call it without creating any instance at all. "void" means that this function doesn't return anything back (to the coder). In Java every function is defined inside a class and is called a method, whether or not it returns a value. "main" is the name of the entry point, where the program begins executing the programmer's code. Next we have an open parenthesis "(". This opens the parameter list, just like in mathematics where you would write a function such as f(x) (read "f of x"). In the same way, the parentheses separate the function part from the parameter part. Next we have "String[]". Several things are happening here. Syntactically, this is another Type specifier, as covered above. It tells what kind of data the parameter is, the way you might say that x in f(x) is a number. Here the Type is "String". A String can be thought of as text, basically a bunch of characters. Then the "[]" specifies that this is actually an array of Strings. You can add "[]" to any Type when you want an array of them. So instead of containing just one String, this parameter can contain multiple Strings.
"args" is the name of the String[] parameter. It is used to refer to the parameter later, just like the x in f(x) is used when defining a function, for example f(x) = x + 1. The function name is f, the parameter is x, and the function returns whatever number you use as x, incremented by 1. So if I put in 1 I get 2, and if I put in 2 I get 3. The closing parenthesis ends the parameter portion of the function. Some functions take no parameters and will only show the opening and closing parentheses, like this "()". Finally for this line we have another open curly brace "{". This again marks the beginning of the code associated with (in this case) this function, that is, the beginning of its scope. Scope is where something can be accessed from, given what level it is on. Remember the indentation: code can access things on its own level and on the levels to its left (the blocks that enclose it). As soon as a block closes, anything declared inside it, further to the right, is no longer accessible.

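The f(x) = x + 1 example and the scope rule translate into Java roughly like this. FunctionExample, f, outer and inner are names made up for illustration. The method f is marked static so that the static main method can call it without creating an object.

```java
class FunctionExample {
    // f(x) = x + 1 in Java: the first "int" is the return Type,
    // and "int x" is the parameter's Type followed by its name.
    static int f(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        int outer = 10; // declared at main's level

        if (outer > 5) {
            // One level further right: this block can still use outer.
            int inner = f(outer);
            System.out.println(inner); // 11
        }

        // System.out.println(inner); // would not compile: inner's scope ended at the "}" above
        System.out.println(f(1)); // 2
        System.out.println(f(2)); // 3
    }
}
```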
Line 3 ("    System.out.println("Hello, World");"):
    First we see a blank space. You will see it is about twice the length of the blank space on the previous line, because this indentation is on another level/scope. This means that anything on this line can access other things in the same scope or anything to the left (in this case, things with one level of indentation). Next we have "System", a class built into Java that has many useful things programmers typically use. "." is the next thing we see, and it specifies that we are going to reference something that is a part of "System". "out" is the built-in output stream object, which is used to output to the console. It is a part of the System class. "println" is a method that is a part of "out". It prints its parameter and then moves to a new line. You can think of "System" as a company, "out" as a worker at that company, and "println" as a job that worker knows how to do. Next we see an open parenthesis, which serves the same function as before: it starts the parameters. In this case (although not shown here) println takes a parameter of Type String. Next we see "Hello, World" (including the double quotes). This is called a String literal: a string that is used directly as is, without being assigned to a variable. String literals are written between double quotes, while single quotes are used for a single character (a char), as in 'H'. We then see a closing parenthesis, which indicates the end of the parameters. Finally we see ";", which ends the statement. Because Java is not a line-oriented language, it needs something to separate statements from one another.
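Here is a small sketch of the details from this line: double quotes versus single quotes, println versus print, and semicolons separating statements. print is another method of out, and it does not move to a new line afterwards. The class name Literals is made up for illustration.

```java
class Literals {
    public static void main(String[] args) {
        String greeting = "Hello, World"; // double quotes: a String (text)
        char letter = 'H';                // single quotes: one char

        System.out.print("Hello, ");      // print: stays on the same line
        System.out.println("World");      // println: moves to a new line afterwards

        System.out.println(greeting.charAt(0) == letter); // true

        // Two statements on one line: the semicolons keep them apart.
        System.out.println("a"); System.out.println("b");
    }
}
```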

Line 4 ("  }"):
    This is to close the encapsulation or scope of the code block. In this case it closes the function main from earlier.


Line 5 ("}"):
    This is to close the encapsulation or scope of the class HelloWorld.

EXTRAS:

Comments: Comments are bits of text that explain a line of code, the functionality of a method, or the entire program. They can also be used to relay other information. The compiler ignores them. Small single-line comments begin with two forward slashes "//". Multi-line or block comments begin with "/*" and end with "*/" (without the double quotes). Here are some examples of comments:
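For example (CommentExamples and addOne are illustrative names, not from the class):

```java
class CommentExamples {
    // A single-line comment: everything after the two forward slashes is ignored.

    /*
     * A multi-line (block) comment starts with a slash and a star
     * and ends with a star and a slash.
     */
    static int addOne(int value) {
        return value + 1; // a comment can also follow code on the same line
    }

    public static void main(String[] args) {
        int result = addOne(/* block comments even fit inside a line */ 41);
        System.out.println(result); // 42
    }
}
```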


Variables: Earlier we talked about String literals and how they aren't assigned to a variable. Variables are a way of storing things for later reference/usage. We use things such as String literals when we don't need to refer back to them multiple times or make any changes to them. To create a variable, you may first specify an access modifier. This is only allowed for fields (variables declared directly inside a class), and it is optional even there. Variables declared inside a method never take one. Next you specify the variable's Type. This can be any of the primitive types such as int, long, float, and double, or a Type you made yourself by writing a class with that name. Next you give the variable a name, which has the same restrictions as mentioned earlier. Finally you can either assign a value with an equal sign "=" and then end the statement with a semicolon ";", or end it with the semicolon right away without assigning a value. A variable declared inside a method must be assigned a value before it is used. Here are some examples of variables:
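Here are a few sketch examples, assuming a class named VariableExamples. TAX_RATE and the other variable names are hypothetical.

```java
class VariableExamples {
    // Fields (declared directly inside a class) may start with an access modifier.
    private int count = 0;
    public static final double TAX_RATE = 0.08; // "final" makes it a constant

    public static void main(String[] args) {
        // Local variables (inside a method) never take an access modifier.
        int age = 21;              // Type, name, "=", value, ";"
        long population;           // declared without a value...
        population = 7000000000L;  // ...and assigned before use (L marks a long literal)
        double price = 19.99;
        boolean done = false;
        String name = "Java";      // String is a class, not a primitive type

        VariableExamples example = new VariableExamples(); // a Type we made ourselves
        example.count = age;

        System.out.println(name + " " + age + " " + population + " "
                + price + " " + done + " " + example.count);
    }
}
```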


[Week 02]
    Week two covered some basic math and data types. We did addition, subtraction, multiplication, division, and modulus. We also went through unary and binary operators, compound operators, and increments/decrements. Constants were touched on, although for me that raised the question "what about readonly?" It turns out there is no "readonly" in Java. There is just "final", which mushes constants and read-only values together. Java seems to be pretty bad if you are trying to do math accurately. For instance, in Java an int can store a value between -2,147,483,648 and 2,147,483,647. At first one might not see anything wrong with that, and I would agree. The problem is that there are no unsigned integer types. Honestly this isn't that big a deal, but unsigned types have their uses. Say we know for a fact that our program will never have to deal with negative numbers, but we need to store values above 2,147,483,647, up to 4,294,967,295 (2^32 - 1). In Java, to represent those numbers we have to resort to an 8 byte variable (long) instead of a 4 byte variable (int). It is pretty wasteful to allocate twice the amount of memory for a number that could be represented with 4 bytes. Of course, computers these days have plenty of memory, which is why I said it's not such a big deal. It is still a fault, though, and I personally don't know why it was a design choice in Java.

    Now onto the real problem. Let's say we have an int. Remember that an int has the specific range -2,147,483,648 to 2,147,483,647, so it simply cannot store anything outside that range. Let's look at this line of code:
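Here is a minimal version of such a line. It assumes the operands were -2,000,000,000 and 2,000,000,000, and the class and method names are made up.

```java
class OverflowExample {
    // Plain int subtraction: Java does no range check here.
    static int difference(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        // The true answer, -4,000,000,000, is outside int's range.
        System.out.println(difference(-2000000000, 2000000000)); // 294967296
    }
}
```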

As you can probably figure, the answer should be -4,000,000,000, but that is outside the range of int. So one would expect an error, but instead Java gives the answer 294,967,296. This is very bad: if you are doing math, you would get an answer and might unknowingly use it, thinking it was correct. It also makes debugging difficult in cases where the input value is unknown and this unwanted behavior only shows up for certain inputs. Your program would compile fine, and maybe run fine most of the time, but every so often it would give you the wrong answer with nothing to indicate that it did so. After a few tests I figured out what Java does in these situations. If the result is outside the range of int, Java wraps around to the other end of the range and continues the arithmetic from there. The wrong answer is therefore off by exactly 2^32 = 4,294,967,296 (-4,000,000,000 + 4,294,967,296 = 294,967,296). I figured this out by taking int's lowest possible value (-2,147,483,648), subtracting 1 from it, and seeing where it lands. It lands on int's maximum value (2,147,483,647). So Java does underflow and overflow, but it does so silently: it wraps around without any error. I don't know why Java is designed like this, but in this situation one would want some indication of when it underflows or overflows, so we don't get wrong answers in our calculations. The same thing happens when we do 2,000,000,000 + 2,000,000,000. We know the answer is 4,000,000,000, but Java can't store that value in an int, so it gives us an answer anyway (-294,967,296)..... How can adding two positive numbers result in a negative number... bad Java, BAD! Go to your room!!!
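Java does offer a way to get that indication. Since Java 8, methods such as Math.addExact throw an ArithmeticException on overflow instead of wrapping around. This sketch (OverflowCheck and sumFits are made-up names) shows the silent wraparound, the checked version, and switching to long when the result needs more room:

```java
class OverflowCheck {
    // True if a + b fits in an int. Math.addExact throws instead of wrapping.
    static boolean sumFits(int a, int b) {
        try {
            Math.addExact(a, b);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int min = Integer.MIN_VALUE;
        int big = 2000000000;
        System.out.println(min - 1);          // 2147483647: wrapped to the maximum
        System.out.println(big + big);        // -294967296: wrapped silently
        System.out.println((long) big + big); // 4000000000: long has room for it
        System.out.println(sumFits(big, big)); // false
        System.out.println(sumFits(big, 100)); // true
    }
}
```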

    This next problem isn't really a problem, and it isn't only Java: several other programming languages, like C#, behave the same way. The issue is with the modulus operator (officially, Java calls % the remainder operator). With all positive numbers it behaves as it should, but when you add a negative number into the mix things get off, because some languages' modulus and clock arithmetic behave differently. For example, in cases like:
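Here is a sketch comparing Java's % with Math.floorMod (available since Java 8), which gives the clock arithmetic answer. It uses -10 and 3, the numbers discussed next. ModulusExample and the clockMod helper are hypothetical names, and clockMod assumes a positive divisor.

```java
class ModulusExample {
    // Shifts a negative remainder into 0..divisor-1 (assumes divisor > 0).
    static int clockMod(int value, int divisor) {
        return ((value % divisor) + divisor) % divisor;
    }

    public static void main(String[] args) {
        System.out.println(-10 % 3);               // -1: takes the sign of the left operand
        System.out.println(Math.floorMod(-10, 3)); // 2: the clock arithmetic answer
        System.out.println(clockMod(-10, 3));      // 2: same result, done by hand
        System.out.println(10 % 3);                // 1: with positive numbers they agree
    }
}
```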

Java and C# give the answer -1, while clock arithmetic and several other programming languages give you 2. If you think about it, 2 makes sense in that you actually have something left over, and it shows that the number is 2 MORE than a multiple of 3 (-10 is 2 more than -12). It doesn't really make sense to have a negative amount left over. Then again, depending on your definition, -1 also makes sense, in that it shows that -10 is 1 LESS than a multiple of 3 (-10 is 1 less than -9).
So if you are using modulus, keep negative numbers in mind, and remember that remainder and modulus are not always the same thing or defined the same way.

    So enough with ints. What other variable types are there?
First we can look at the boolean. A boolean stores one of 2 states. You can think of it as one bit (although it does NOT take up only 1 bit in memory). Booleans in Java do not store numeric values, only the states "true" and "false". Now let's take a look at the byte. A byte is a representation of 8 bits in memory, and it is typically the smallest unit of data people use after the bit. In Java a byte can store values from -128 to 127 (inclusive), for a total of 256 possible states. As you can see, this has the same fault as int mentioned above. Those 256 states could represent the values 0 to 255 (not 256, because 0 takes up one of the states), but half of them are spent on negative numbers even when negative numbers aren't needed. Furthermore, byte data is usually thought of as a value between 0 and 255, so again I don't know why Java is designed this way.
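If you do need a byte as 0 to 255, the usual workaround is to read its 8 bits as an unsigned value in a larger type. Here is a sketch, with ByteExample and unsigned as made-up names. Byte.toUnsignedInt is a real method, available since Java 8.

```java
class ByteExample {
    // Reads a byte's 8 bits as 0..255 instead of -128..127.
    static int unsigned(byte b) {
        return b & 0xFF; // b is widened to int, then only the low 8 bits are kept
    }

    public static void main(String[] args) {
        byte b = (byte) 200;                       // 200 doesn't fit in -128..127, so it wraps to -56
        System.out.println(b);                     // -56
        System.out.println(unsigned(b));           // 200
        System.out.println(Byte.toUnsignedInt(b)); // 200, the standard library's version
    }
}
```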
Next you might ask: what if we double the number of bits/bytes? Wouldn't that let us store a larger number? Yes, it would. A 16 bit/2 byte type is called a short, and it stores the inclusive range -32,768 to 32,767. If you double that to 32 bits/4 bytes, you get int, which we discussed above. Finally, among the primitive whole-number types we have long. A long is a 64 bit/8 byte type. It can store the inclusive range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Besides whole numbers, we can also represent fractional numbers. For this we have the floating point types, the first of which is float. A float is a 32 bit/4 byte type. It stores the value in a specific format made of three parts: the sign (whether the number is positive or negative), the exponent (the power to raise to), and the fraction, also called the mantissa (the digits used to calculate the original number). Because these numbers use exponents to represent their values, their range can go from very small to very large, but at a price. Precision/accuracy is lost to rounding errors when you try to store a number that is too specific (one with a large number of significant digits). A float can store magnitudes from 1.40129846432481707 * 10^-45 to 3.40282346638528860 * 10^38 (positive or negative), with a precision of about 7 significant digits. We can increase the precision and accuracy the same way we did with the whole-number types: by doubling the number of bits/bytes. Doubling the size of a float gives us the 64 bit/8 byte floating point type, double. It can store even larger/smaller magnitudes, from 4.94065645841246544 * 10^-324 to 1.79769313486231570 * 10^308 (positive or negative), with about 15 significant digits. Finally, there is the char type, which is used to represent individual characters. A char has 65536 possible states and can store the numeric values 0 to 65535.
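You don't have to memorize these ranges, because each primitive type's wrapper class has MIN_VALUE and MAX_VALUE constants. Here is a sketch that prints them (TypeRanges is a made-up name), followed by a few rounding examples:

```java
class TypeRanges {
    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        // For float and double, MIN_VALUE is the smallest POSITIVE value, not the most negative one.
        System.out.println("float:  " + Float.MIN_VALUE + " to " + Float.MAX_VALUE);
        System.out.println("double: " + Double.MIN_VALUE + " to " + Double.MAX_VALUE);
        System.out.println("char:  " + (int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE);

        // Rounding: 0.1 and 0.2 cannot be stored exactly in binary.
        System.out.println(0.1 + 0.2);  // 0.30000000000000004
        // Fewer bits means fewer significant digits.
        System.out.println(1.0f / 3);   // 0.33333334
        System.out.println(1.0 / 3);    // 0.3333333333333333
    }
}
```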
Because char is effectively an unsigned 16-bit number, you can do arithmetic with char types to get around the lack of an unsigned short if needed. However, it becomes unclear for others reading your code: even though a char is a number, its type name makes people think of it as a character.
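Here is a sketch of char behaving like an unsigned 16-bit number (CharArithmetic is a made-up name):

```java
class CharArithmetic {
    public static void main(String[] args) {
        char c = 65535;                          // the largest char value
        c++;                                     // wraps to 0 instead of going negative
        System.out.println((int) c);             // 0

        char letter = 'A';
        System.out.println(letter + 1);          // 66: char + int is done as int arithmetic
        System.out.println((char) (letter + 1)); // B: cast back to read it as a character
    }
}
```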

    Let's move on to operators.
Java has a number of operators, although I personally wish there were a couple more to make our lives easier. Essentially, we have the following:
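Here is a sketch covering the operators mentioned at the start of week two: arithmetic, unary, compound assignment, and increment/decrement. OperatorExamples and the variable names are made up for illustration, and the sketch is not necessarily the same list covered in class.

```java
class OperatorExamples {
    public static void main(String[] args) {
        int a = 17, b = 5;

        // Binary arithmetic operators (two operands)
        System.out.println(a + b);    // 22
        System.out.println(a - b);    // 12
        System.out.println(a * b);    // 85
        System.out.println(a / b);    // 3: int / int drops the fractional part
        System.out.println(a % b);    // 2: remainder
        System.out.println(17.0 / 5); // 3.4: one double operand gives a double result

        // Unary operators (one operand)
        int negative = -a;        // -17
        boolean notDone = !true;  // false

        // Compound assignment operators
        a += 3; // same as a = a + 3, so a is 20
        a *= 2; // same as a = a * 2, so a is 40

        // Increment / decrement
        int x = 5;
        int y = x++; // postfix: y gets 5, then x becomes 6
        int z = ++x; // prefix: x becomes 7, then z gets 7
        x--;         // x is back to 6

        System.out.println(negative + " " + notDone + " " + a + " " + x + " " + y + " " + z);
    }
}
```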

Now there are others but these are basically what people will use mostly.