Tuesday, September 30, 2008

Ruby : The Study in Stock Investing

I recently got interested in Ruby. While I was learning it, I got the idea of writing
a Ruby program that would automate stock investing! From retrieving stock suggestions from
a website, to running calculations on them, to feeding the results to an investing site, the program would automate the whole process.

While coding away I read a book about stock investing and found out there's more to investing than I had assumed. I thought trading frequently would generate profit, but it turns out that commission fees and taxes alone are too large to leave a good profit.

To hell with reality! Why not just code first and then think of some ways to make the code useful!?
Yeah, I thought so at first. I was 70% done, and my algorithm was actually working to some degree,
but then it felt so futile. I have only around 6 months left for my own CS-related projects, and I wasn't going to let purposeless code take up that time.

Still, I wrote a lot of code, and it works up to a point. Apart from the gap between what the code does and the goal I intended, the code is, at a higher level, capable of moving data between two sites, and it even uses WATIR to mimic human interaction with a site.

I am therefore putting the code openly on this blog, in case anyone would find it useful.

Here is the code, as it is. It has not been cleaned up; it's in its raw state.

To help you understand the code, I've written a brief explanation of the logic behind it,
included below and also as a .doc file:
==============================================================
Here goes:

Before reading the following, note one terminology conflict that is bound to confuse readers: the ‘modules’ here are not some data structure. They are just conceptual blocks of functionality, and they are actually implemented as classes.

My project, which I couldn't finish, is a test of how a special-interest group in some subject area can produce datasets that are standard enough to unintentionally aid an activity that otherwise could not function well without more-than-average resources.

The setting that inspired me, and the one I tried to apply the idea to, was stock trading.

The technology I used was Ruby, with heavy use of WATIR, the IE handling library for Ruby.

Programming Design Summary:

I set up 3 modules that cover the process of stock trading:

1. Retriever
2. Strategist
3. Investor

I'll briefly describe how they were designed and how they function:



1. Retriever

The Retriever is a basic scraper/spider of sorts that gathers data
by scraping it from a stock suggestion website.

I got the idea from this site:
http://www.ibm.com/developerworks/linux/library/l-spider/index.html

The site I scraped was this site:
http://caps.fool.com/Stats.aspx

The target website was Fool.com, and the page lists recommended stock picks, categorized by type of recommendation: highest rated, newly picked, and “watercooler”, to name a few.

The Ruby-based scraper reads the page, does some simple parsing to gather the stock picks, and puts them into an array for the “Strategist” and the “Investor” to use.

The key data are: stock name and rating, along with reference stock prices at the time of the scraping.
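As a rough illustration of the parsing step, here is a minimal Ruby sketch. The HTML snippet, its class names, the ticker symbols and numbers, and the names StockPick, ROW_PATTERN and parse_picks are all made up for this example; the real layout of the scraped page and the original scraper code are not shown here.

```ruby
# Sample markup standing in for the recommendations page.
# ASSUMPTION: the table structure, class names and values are invented.
SAMPLE_HTML = <<~HTML
  <table>
    <tr><td class="ticker">AAA</td><td class="rating">5</td><td class="price">60.00</td></tr>
    <tr><td class="ticker">BBB</td><td class="rating">4</td><td class="price">25.50</td></tr>
  </table>
HTML

# One scraped pick: stock name, rating, and reference price at scraping time.
StockPick = Struct.new(:name, :rating, :price)

# Matches one table row of the hypothetical markup above.
ROW_PATTERN = %r{<td class="ticker">(\w+)</td><td class="rating">(\d+)</td><td class="price">([\d.]+)</td>}

# Turns the raw HTML into an array of StockPick structs.
def parse_picks(html)
  html.scan(ROW_PATTERN).map do |name, rating, price|
    StockPick.new(name, rating.to_i, price.to_f)
  end
end

parse_picks(SAMPLE_HTML).each do |pick|
  puts "#{pick.name}: rating #{pick.rating}, price #{pick.price}"
end
```

A regular expression is enough for markup this regular; a page with messier HTML would probably call for a real HTML parser instead.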

2. Strategist

The heart of the algorithm lies inside this module. It takes the recommended stock picks and decides which characteristics of the given group of picks are the most important. It then provides a framework with a place where an algorithm can be plugged in to actually sort out the stock picks. The generic Strategy provided in the code base uses the “Knapsack” algorithm: the goal is to maximize value, and the constraints are your capital and the stock prices, which are represented as weights in the algorithm.

Afterwards, the module figures out which stocks should be queued for investing, and puts them into a portfolio file, which is read and updated by the Investor module.
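The knapsack idea described above can be sketched roughly as follows. This is a plain 0/1 knapsack, not the code from the code base. It assumes that each stock is bought at most once, that prices are whole dollars, and that the rating is the "value" being maximized; the names Candidate and choose_picks and the sample numbers are made up for the example.

```ruby
# ASSUMPTION: rating is used as the knapsack value, price as the weight.
Candidate = Struct.new(:name, :rating, :price)

# 0/1 knapsack by dynamic programming over the available capital.
# Returns [best_total_rating, chosen_candidates].
def choose_picks(candidates, capital)
  # best[c] holds the best [total_rating, chosen] using at most c dollars.
  best = Array.new(capital + 1) { [0, []] }
  candidates.each do |candidate|
    # Walk downwards so each candidate is used at most once.
    capital.downto(candidate.price) do |c|
      value, chosen = best[c - candidate.price]
      total = value + candidate.rating
      best[c] = [total, chosen + [candidate]] if total > best[c][0]
    end
  end
  best[capital]
end

candidates = [
  Candidate.new("AAA", 5, 60),
  Candidate.new("BBB", 4, 50),
  Candidate.new("CCC", 3, 40)
]
value, chosen = choose_picks(candidates, 100)
puts "total rating #{value}: #{chosen.map(&:name).join(', ')}"
# prints "total rating 8: AAA, CCC"
```

With 100 in capital, AAA and BBB together cost too much, so the best combination is AAA plus CCC.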

3. Investor

The Investor module talks with the Internet sites that do investing. The generic module in the code base does simulation investing with http://vse.marketwatch.com, a virtual stock exchange site.

The Investor module reads the portfolio file generated by the Strategist module, and then uses it to run through the same steps a human would take to enter the investment data on a stock investing site.

The generic module in the code base employs ‘WATIR’, a really nice Ruby library that simulates human interaction with websites by using Ruby to control Internet Explorer. It is also possible to control Firefox with ‘WATIR’, but that hasn’t been used in the code base.

It’s kinda neat to see it work – the Internet Explorer works by itself, inserting stock quotes and pushing buttons to actually enter data.

After it enters the data, the Investor checks off the stocks it requested in the portfolio file. Whenever it is called, the Investor module also checks whether the requested transactions have actually finished and the stocks have been bought. The portfolio is then updated accordingly.
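The bookkeeping side of this, checking stocks off in the portfolio file, could look something like the sketch below. The browser automation itself is left out. The YAML format, the "status" values and all method names are assumptions for illustration, since the real portfolio file format is not described here.

```ruby
require "yaml"
require "tmpdir"

# ASSUMPTION: the portfolio is a YAML list of entries whose "status"
# moves from "queued" to "requested" to "bought".
def load_portfolio(path)
  YAML.load_file(path)
end

def save_portfolio(path, entries)
  File.write(path, entries.to_yaml)
end

# After the orders were entered on the site: check off the queued stocks.
def check_off_requested(path)
  entries = load_portfolio(path)
  entries.each { |entry| entry["status"] = "requested" if entry["status"] == "queued" }
  save_portfolio(path, entries)
end

# On a later run: mark requested stocks whose transaction has finished.
# filled_tickers stands in for whatever the site reports as completed.
def confirm_bought(path, filled_tickers)
  entries = load_portfolio(path)
  entries.each do |entry|
    if entry["status"] == "requested" && filled_tickers.include?(entry["ticker"])
      entry["status"] = "bought"
    end
  end
  save_portfolio(path, entries)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "portfolio.yml")
  save_portfolio(path, [
    { "ticker" => "AAA", "status" => "queued" },
    { "ticker" => "CCC", "status" => "queued" }
  ])
  check_off_requested(path)
  confirm_bought(path, ["AAA"])
  load_portfolio(path).each { |entry| puts "#{entry['ticker']}: #{entry['status']}" }
end
```

Keeping the state in a file means the Investor can be stopped and woken up again later without losing track of which orders are still pending.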

4. Coordinator

The modules are separate, and they only get woken up when scheduled, to reduce resource use. This means a sort of “coordinator” is needed to control the order in which the modules are run, along with the passing of data and the housekeeping of necessary files, such as a capital file and a portfolio file.

The coordinator has not yet been made into a class. Its basic inner workings have been written in the main file, stocker.rb.
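If the coordinator were turned into a class, it might look roughly like this. It is only a sketch: the stub classes, the single run method on each module, and the sample data are assumptions made to keep the example runnable, and the scheduling and the housekeeping of the capital and portfolio files are left out.

```ruby
# ASSUMPTION: each module exposes one run method; the real classes
# in the code base may look different.
class StubRetriever
  def run
    [{ name: "AAA", rating: 5, price: 60 }, { name: "BBB", rating: 4, price: 150 }]
  end
end

class StubStrategist
  # Keeps only the picks that fit the available capital.
  def run(picks, capital)
    picks.select { |pick| pick[:price] <= capital }
  end
end

class StubInvestor
  attr_reader :orders

  def initialize
    @orders = []
  end

  # Records an order for every stock in the portfolio.
  def run(portfolio)
    portfolio.each { |pick| @orders << pick[:name] }
  end
end

# Runs the three modules in order and passes data from one to the next.
class Coordinator
  def initialize(retriever, strategist, investor, capital)
    @retriever = retriever
    @strategist = strategist
    @investor = investor
    @capital = capital
  end

  def run_once
    picks = @retriever.run
    portfolio = @strategist.run(picks, @capital)
    @investor.run(portfolio)
    portfolio
  end
end

investor = StubInvestor.new
Coordinator.new(StubRetriever.new, StubStrategist.new, investor, 100).run_once
puts investor.orders.inspect
```

Passing the modules in through the constructor would also make it easy to swap in the test versions of the Retriever, Strategist, and Investor.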

5. Test modules

There are some modules used for gathering data, for cases where stock data (i.e. recommendations) needs to be collected to compare investment results with hindsight and without, the latter being the actual core functionality of this project.

They are inherited and modified versions of the Retriever, Strategist, and Investor modules listed above. They haven’t been finished, and their usage was being tested in the stocker.rb file, as seen in the top few lines of the file.


LAST WORDS:

The code will not be touched again. It’s provided AS-IS, and I’m sure there is bound to be someone among the viewers who finds the scripts useful, at least partially, for their own purposes. I hope the code finds a new life that way.

Reasons why I’m not pursuing this further:

I have worked on the code off and on for two years, from concept to near-realization. When I came up with the idea, I didn’t know much about stock investing, or about the extra expenses and rules that would make my project unrealistic. The thing is, I only found this out when I was already deep in, and things were actually being implemented and tested. After reading some books about stock investing, I decided that another big chunk of time would be needed to rework my logic to make this project worthwhile. At the time of this writing, I have six months to pursue a project of my own before I have no spare time at all, and I decided not to spend them on this project.

I learned a lot about Ruby through this project, and I am rather sulking at the fact that I won’t be able to look into this code at least in the near future. That’s why I’m posting this online.

Thank you!

Work Done at Work So Far

Before I start my discussion on Ruby and my project,
I'd like to discuss what I have been doing at work lately.
Finally I have been involved in some pretty heavy coding (of webpages, that is),
and so far I've coded 4 versions of a game site (teaser, closed beta, open beta, and commercial),
complete with CSS/HTML/JS (specifically, jQuery), almost 95% of it written by me.
Recently I've been nearing completion of a facelift version of our main portal site, which is,
give or take, around 75 web pages. That means a compiled CSS file of around 2000 lines and around 300 lines of
Javascript (I should've written more, but thanks, jQuery!).

I even coded the MySpace page for the game in a CSS-hack sort of way, to resemble another
game's site on MySpace that was designed and made by the actual MySpace team, whose
work costs around 10K a site because they remove the ad on top. I take pride in the fact that
our game's commercial MySpace site is nearly identical in design and layout to that site,
and it was done only by hacking the hell out of a personal MySpace profile!

Now when I look at a .PSD file, I KNOW how to build the layout in CSS. I just KNOW.
I've even written a C# app to read in game character stats and generate HTML from them (it took 3 hours to code, and I used it on 4 characters. So I spent 4 hours to code the data, where hand-coding would otherwise have taken a whole friggin day).

Here is a list of techniques I employ for webpage coding:

1. CSS

-I control which CSS styles apply to specific sub pages and to the main page
by listing the parent tags first and making the naming more specific further on (details in a separate blog article)

-Targeting IE6 and IE7 specifically with the concise CSS hacks '* html' (IE6) and '*:first-child+html' (IE7)

-Extensive use of Firebug inside Firefox to interactively do a WYSIWYG editing of CSS

-Striving to separate content from design. All CSS does is provide the look, while HTML alone lays out the information clearly and simplifies viewing it.

-Use of barebone CSS sets (reset stylesheets). These things are AWESOME! They set all the default paddings/borders/margins to zero so you don't have to worry about them.

-Standardizing the look of the website by creating what's called a "design theme" and translating it into CSS. For example, a website will have only two types of rounded-corner boxes, warning messages are always in blocks accompanied by an icon and a yellow text background, etc.

2. HTML

-Care is taken to put data in the tags that best fit it. For example, menus or
similar groups of items under one parent go into list tags, and blocks of writing go inside paragraph tags. That way, CSS can target them better, and the page stays viewable even without CSS.

-NEVER, EVER put design parts into HTML. You don't need to put a <br>
tag to create gaps between a title and a paragraph. TABLEs never need a background set in HTML. Heck, images, unless dynamically needed, don't need to have specific image sources in HTML. Make HTML show
almost only content and layout, nothing else!

3. Javascript

- Use jQuery. Even Visual Studio will ship with jQuery, with Intellisense and everything!
It gets the nitty-gritty parts, especially AJAX and the pains of browser compatibility, out of sight for you and gives you a setting for writing high-level Javascript code. Where writing Javascript used to be 70% environment-setup overhead and 30% actual working algorithm, with jQuery it's almost 90% working algorithm! It's also sooooo concise. 3 lines of code and you have content sliding down and up. Beat that!

- Use more jQuery! Sure, Prototype, Mootools, and Dojo are all significant frameworks to work with, but none can beat jQuery's overall simplicity combined with relatively good response time. I had to research the 4 frameworks at work, and I didn't know any of them beforehand. After the research, I concluded jQuery is THE BEST FRAMEWORK .. for me.

-Bloated Javascript isn't good. It takes a long time to load, and since Javascript execution is largely event-driven, if you are not careful you will create race conditions where one piece of code runs before another when you didn't intend it to.




For the past 2 weeks I've stayed up nearly every night to code the 70-plus page main portal.
Sure, I think coding pages is busywork. With so much content, and designers who are not 100% sure of their "design theme" and constantly change/edit the .PSD files, the work can be highly
stressful, especially staying up all night and coding.. webpages. However, when everything's done, the CSS is all tidied up, and the presentation of the webpages / UI is as I intended, it gives a satisfaction that trumps the work.



However, coding by itself is such a repetitive activity that it resembles other menial tasks like
stapling or pasting envelopes. If you don't strive to shorten your coding and grow an aversion towards repetition, I don't think growth will happen.