Wednesday, 10 August 2011

The Basic Data Structures of Computer Science.

Friends, I am back with another mind blowing topic of the "The Basic Data Structures of Computer Science".


A data structure is "something" that holds data in the memory of a computer. Sticking data in the memory is quite easy but for example, you store all the names of friends in the computer's memory. What should you do if you wanted to print out an alphabetical listing?
Before we start, here is a brief rundown on the several basic data structures. You should know (to a certain degree) about integers, floats, chars, and bools.

Integers - they store numbers from -32k to 32k
Floats - they can store decimals
Characters - one "symbol" storage, like 'a', '!', and so on.
Bools - they store a "true" or "false" value, represented as 1 and 0's.

What about strings? Well, they're just an array of characters! Hence, they are another data structure!



'h''i''_''1''2''3'
(an example of a string)


a. Stacks - Have you played with Legos? Or as a child, did you try... stacking blocks on top of another? Maybe you have seen people in department stores stack box after box until it reaches up all the way to the ceiling.

Or, the most conventional explanation of stacks: At some restaurants like an all-you-eat buffet, you probably notice that closeby (near all that food) there is a massive stack of plates. You reach out, grab one and instantly the stack pops up!

Now... what happens if you happen to grab the last plate and someone behind you tries to grab one? First of all, there is no plate and he might go hungry which is a "bad" thing. If you are planning to use a stack, make sure you don't grab something that doesn't exist. Your program might crash or funny errors might occur.

What a stack looks like:

apple
oranges
bananas
grapes



Here is some vocabulary:
Pushing - adding something to the stack
Popping - taking something off the stack
Stack overflow - when you add too much stuff onto the stack (like stuffing the stack with so many plates the whole thing... topples over and causes what is known as a "crash")
FIFO - first in, first out. When you add something onto the stack, it is on the top so if you wish to pop something off it, the "something" you just added will be the first one to come off.

Note that in a stack you CANNOT access anything in it except for the top in order to keep with the true definition of a stack.

How do you implement a stack? You could use an array and once you add something onto the stack, you resize it (adding one more cell to the array) and stick it into index 0, or the front of an array. There are other ways to implement stacks and they will be covered!

What are some of the uses for a stack?


Back then, they had calculators based on "RPN" or reverse polish notation. How do you use these calculators? It seems weird compared to the calculators most of us use in school these days. If you wanted to calculate 5 x 7, on a RPN calculator, you would type in 5, hit the enter key, 7, enter, then you would hit the x button. The RPN calculator is based on a stack so the 5 first goes in, then the 7. When the x button is pressed, it pops the 7, then pops the 5 and multiplies them together to show 35! The 35 is then pushed onto the stack for later use. What are the advantages? Let's say you had to solve this: 6 + (5 x (3 + 2 (5 + 2) ) ) = ???? Pretty complicated and tricky to type in on a regular "modern" calculator. How to do this? Type in 6, enter, 5, enter, 3, enter, 2, enter, 5, enter, 2, enter, then a plus. This will give you 5 +2 = 7. Then, 7 is pushed onto the stack while 5 and 2 are popped off. You would then hit x. This pops off 2 and the 7 and pushes back on a 14. And you keep going :) If you notice, this is a lot simpler so here's a programming exercise! Try to program a rpn calculator!

b. Queues - Don't you just HATE standing in line? In fact, some people probably die of boredom just from standing in line...

When people line up in front of something, let's say a ticket booth, only the first person in line gets to buy a ticket. If someone decides to go buy a ticket, he will travel to the end of line and wait until the queue (or line) in front of him is gone.
Here's some vocabulary:
Enqueue - something goes to the end of the line
Dequeue - the first thing in front of the line leaves, and the lines moves up
LILO - last in last out. In a line, the last person to get in is the last one to leave :P That's life!

So how do you implement this? The easiest way (to understand) is to use an array.

A picture of a queue: 



c. Linked Lists
Have you ever built model trains as a child? Connecting each one at the end or at the front... then turning on the switch and watch the train go? A linked list is similar to a train. It can be short (consisting of the locomotive) or it can go all the way hauling tons of materal.

In the world of computers, a linked list consists of "nodes" and each node points to the next one in line. The front of a list is called the head and it starts off the list. The last one in line points to a nonexistent node or a "null" node. This indicates the end of a list.  











How are lists useful? Why not use an array? Linked lists save memory and they are called "dynamic storage" since the size of the list reflects the amount of data. Arrays are "static" since they do not change. There are certain tradeoffs between the two and more about speed and efficiency will be covered in section 5.

There are several different styles of linked lists. With most of these styles, there is another structure that holds 2 pointers. One is to the front of the list and the other is to the end. One is a circular linked list where the last node of the list points to the first node, creating a "chain." You would need that structure to distinguish between the first and last nodes. Another is having each node contain a left or right pointer. There are many possibilities!

d. Hashes
If you ever visit a grocery store (webvan doesn't exist anymore!) and examine some goods, you would probably notice a barcode on every product. What's a barcode? It is an unique id # for that certain item. Or, if you play computer games and you notice some shareware games need a "special key or code" to register and just randomly typing in information doesn't work, it probably works off a hashing scheme.

So what's hashing? Hashing (think of it as taking something and chopping it into pieces) allows the breaking down of data into a much simpler form for storage. Let's say there are at least 10 billion individual products on sale but your store only sells food. Just food. You want to minimize the amount of data storage and you ONLY want barcodes for food products. You don't want to have 10 billion numbers in storage.




This is where hashing comes in. If grapes are item 12421414, you run it through a magic hashing function and it returns "5." Not bad? This can simplify a lot of things since you would only need a storage space large enough to hold up to a 5th item. Now... here's the bad part. What if bananas (20002002) converts into 5? What should you do? Try tweaking your "magic hashing function"? You could and this is called a "collision." This is where linked lists come in. In each data cell of your storage, if there is a collision, the node in that cell will "link" to the node that "collides" with it.

Here is what I would do:
I would create a structure called "food" and have several items in it. The barcode, the name, and price and null pointer.
 
      struct food{

      int barcode_hash;
      string name;
      double price; 

      food *next;
      }; // don't forget this! 
      


This is so if I have a collision, I will still be able to identify the product. I would place all of this in an array from code #0 all the way down to x, where x is the largest value I recieve from the hashing function out of all my products. 

The array would contain structures that could contain null pointers or possibly point to another colliding item. Now, if my scanner scans in 20002002 (bananas) and returns a 5, it goes to location 5 of my array and looks. It sees grapes. The barcode and name for grapes isn't the same as bananas so it moves onto the next node. Tada! It found a match! 
What else could you do with hashes?


On some programs they require you to type in your name, address, name of your pet and you HAVE to type it in correctly. Why? There is a hidden hashing sequence that converts all this data into some "X56zwa31" code you have to type in. How could it work? It could add up your address #, take your name and grab the first letter, then add it onto some weird sequence of letters. It's whatever the programmer decided to have. :) 
Is it possible to crack a hashing sequence? Yes... providing you have enough data but it's pointless. Just register and it will save you the trouble of being cheap. Why? Most hashing sequences are VERY hard to crack. 

Here's an example:
Name: Jonathan
Address: 12345 Anywhere St.
Age: 16
Hashing code: 2TBW1N (yes, it's possible to generate something like this but how about...) 

Name: Vincent 
Address: 34211 Something Lane
Age: 14
Hashing Code: ??? (even if you figure out how to create a code for my data, is it possible to use the same one on Vincent and get the right one?)

e. Trees (a long tall vegetable....) What are trees? In the digital world, they consist of nodes linked to each other... but here's the catch. They are NOT linked together like a linked list. Each node in a tree contains a left and right pointer. These two pointers could point to another node or NULL. 

Binary Trees (what they are commonly called) can be used to store and sort data efficiently. Most of this will be covered in later chapters (speed and the like) but here is why binary trees can be very fast. 

Let's take a binary SEARCH tree which contains sorted values. Everything to the left side of a node is less than the value inside that node and everything to the right of the node is bigger. Now, you might ask this question. What if everything is bigger or lesser than the first node? Wouldn't I get a linked list? You are right and this is a worse case scenario.

In most scenarioes, the binary tree will have some values scattered around, making it more "balanced," or the two sides of the tree (starting from the top node called the head) are roughly the same. This is where BST (binary search trees) come in.

Here is a example:  
























The code: 
 
      struct node{

      int value;   // this could be a character, string, whatever you want to 
				   // compare
      node *next;
      }; // don't forget this! 
      

This is a "roughly balanced" tree as opposed to this one which is a linked list: 3 5 8 9 10 14 15 17
Now, here comes the fun part! Taking the binary tree (not the list), if I wanted to search for the value 5, I start at the head, 10. 5 is smaller than 10 so I go to the left. The next node is 8. 5 is less than 8 so I move on to the left node, which is 3. 5 is bigger than 3 so I move to the right. What do I find? 5! A matching node! 

I'm giving a perfect "rigged" example but this is why you might want to use a binary tree instead of a linked list. What if you wanted to search for 3? Well, it would take you a little longer with the binary tree and the linked list has it as its first node (the head). However, GENERALLY, a binary tree gives you better odds at finding something quicker.


No comments:

Post a Comment