Open Source Adventure: 2014

Tuesday, October 28, 2014

2 years of UW use and I finally cracked how to use Anki with UW

I don't have a lot of time, so I'll keep this brief.
This method will make you use the Notes capability in UW,
import the Notes you made into Anki,
and review the heck out of it.

Yes, these are not traditional flashcards, as the cards made by the import will only have a front side. But repeated reading is still studying. You can use your usual Anki settings to study the cards.

Programming tools:
- you need to install Python. Google how to do that.
- install csvkit: http://csvkit.readthedocs.org/en/0.9.0/

Copy the following into a file called 'list2csv.py':

# Python 2 script
import csv, csvkit

# notes.txt: each note takes 4 lines: Question ID, Main Division, Subdivision, Note
# notes.csv: one line per note, with the 4 fields joined by '$'
csvout = open('notes.csv','wb')
writer = csvkit.CSVKitWriter(csvout, encoding='utf-8', quoting=csv.QUOTE_NONE, escapechar="\\")

f = open('notes.txt')
cnt = 0
csvARR = []
for line in f:
    # 'utf-8-sig' also strips the byte-order mark some editors put at the start of the file
    line = line.decode('utf-8-sig')
    i = cnt%4
    print str(i)+" : "+line.strip()
    csvARR.append(line.strip())

    # every 4th line completes a note, so write it out and start a new one
    if i == 3:
        try:
            csvline= '$'.join(csvARR)
            writer.writerow(csvline.split(','))
        except UnicodeDecodeError:
            pass
        csvARR = []
    cnt = cnt + 1

f.close()
csvout.close()

1. Now, highlight/copy all the notes you made. Save them as 'notes.txt'
- be careful: you can see that '$' is what I chose to split the data, because I never used '$' in any of my notes. Just search for '$' in 'notes.txt' and see if you ever typed it. If you did, choose any other character you never used (e.g. '#', '@') and change it in the script too.
2. Open up the command line in the directory that has both 'notes.txt' and 'list2csv.py'
3. Type: 'python list2csv.py', which will make the file 'notes.csv'
4. Open up Anki, create a new Deck.
5. File->Import and select that 'notes.csv'
6. Settings:
- Type: Basic (I actually made another note type so it only has a front side but this also works)
- 'Field is separated by' is $
- You'll have 4 fields: Question ID, Main Division, Subdivision, Note. For Fields 1 and 2, change to 'discard field'.
- Field 3 - you can just choose it as the 'Tag', so it's easy to sort out. (Hey, free tagging information!)
- Field 4 - Change to 'Map to Front'
7. Click import, but double check the Deck is the Deck you want it imported to!!
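
For anyone on Python 3, where the script above will not run as written, here is a minimal sketch of the same four-lines-per-note grouping using only the standard library. The function names are made up for illustration, and it assumes 'notes.txt' has exactly 4 lines per note with no blank lines in between, same as the original script.

```python
def notes_to_rows(lines, sep="$", fields_per_note=4):
    """Group stripped lines into chunks of 4 and join each chunk with `sep`."""
    cleaned = [line.strip() for line in lines]
    rows = []
    # a trailing, incomplete note is dropped, like in the original script
    for start in range(0, len(cleaned) - fields_per_note + 1, fields_per_note):
        chunk = cleaned[start:start + fields_per_note]
        if any(sep in field for field in chunk):
            raise ValueError("separator %r found in note starting at line %d"
                             % (sep, start + 1))
        rows.append(sep.join(chunk))
    return rows


def convert(src="notes.txt", dst="notes.csv", sep="$"):
    # utf-8-sig drops a leading byte-order mark if the editor added one
    with open(src, encoding="utf-8-sig") as f:
        rows = notes_to_rows(f, sep)
    with open(dst, "w", encoding="utf-8", newline="\n") as out:
        for row in rows:
            out.write(row + "\n")
    return len(rows)
```

Calling convert() in the directory that holds 'notes.txt' should produce a 'notes.csv' that imports with the same Anki settings as above. Unlike the original, this version stops with an error if the separator shows up inside a note, instead of silently producing an extra field.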

Voila. I should've done this from the start, instead of making amazing fill-in-the-blanks notes with UW data. Plus, this is a plain, legal way of using UW.

Sigh.

Saturday, June 21, 2014

Another breakthrough in gathering pertinent data

While searching for the next phase of my career, I've been using a database that was given to us: the mother lode of all databases on careers after medical school.

Their data is good, but to get the best benefit-to-cost ratio out of what I invest in my applications, I needed to cross-reference that data with the population and income of the respective cities.

Population tells me whether a city is rural or urban,
Income tells me whether a city is affluent.

Naturally, rural, low-income cities would have a better benefit-to-cost ratio (at this point the reader can guess that my application doesn't have much of an edge)

I already had a CSV file, and the first task was to scrub the address and get a string (city, state initials) ready to feed into some census database. That took a little elbow grease, as my regex knowledge is not so good and I always have to relearn it. I succeeded, and thus 3 lines of address were turned into just a city,ST string.
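
As a rough sketch of that scrubbing step: the exact address layout isn't shown here, so this assumes the last address line looks like "City Name, ST 12345" (optionally with a ZIP+4). The pattern and function name are illustrative, not the regex I actually used.

```python
import re

# Assumed shape of the city line: "City Name, ST 12345" or "City Name, ST 12345-6789"
CITY_STATE = re.compile(
    r"^\s*(?P<city>[A-Za-z .'\-]+?),\s*(?P<state>[A-Z]{2})\s+\d{5}(?:-\d{4})?\s*$"
)


def city_state(address):
    """Return 'City,ST' from a multi-line address, or None if no line matches."""
    # the city line is usually last, so scan from the bottom up
    for line in reversed(address.splitlines()):
        match = CITY_STATE.match(line)
        if match:
            return "%s,%s" % (match.group("city").strip(), match.group("state"))
    return None
```

For example, city_state("123 Main St\nSuite 4\nSt. Louis, MO 63110") gives back "St. Louis,MO", and an address with no recognizable city line gives None so it can be fixed by hand.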

Good. On to next phase.
The census database. My knee-jerk instinct was to go and use the US Census data they freely give away. Alas, it's freaking complex, it only has population data, and the cities are so minutely divided that it ended up giving me headaches.

Scrubbing the CSV file I got from the US Census, I was able to get through some of the data, but due to the minute discrepancies between city names, I ended up manually entering/fixing the data. Mind you, at that point I had already given up on getting income.

However, while manually entering the data I realized that low population doesn't really mean low competition. Low population could also mean a more affluent, gentrified population. Income data was becoming crucial.

I was pulling my hair out, and I knew I needed a new breakthrough. I tried searching for Python wrappers of US Census APIs, but that was shooting in the dark - I knew it was going to be a time sinkhole.

Then I tried looking at websites that have this information. I came across a pretty good one. Simple enough to be easily scraped, with pertinent, up-to-date information. Good. All I had to do was find a good way to auto-submit my city,ST string and scrape the data returned.

I tried the requests module, but setting the headers and cookies so that the server thinks a normal user is browsing the website was, and still is, over my head. Mind you, this was already 12 hours into making this thing work.

Then something came up in the searches. Mechanize. My old Python friend I used when I scraped webpages for stupid things in college. I followed the tutorial, and voila! It submitted my query, and spit out a workable webpage HTML with the data I needed.

From there I was wondering if I should run BeautifulSoup on the data, or just do a raw regex match on the strings that have the population/income data. Luckily, the latter option worked.
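
To give an idea of the raw regex route, here is a minimal sketch. The HTML layout and the labels ("Population", "Median household income") are assumptions made up for the example, since the real site's markup isn't reproduced here.

```python
import re

# Assumed labels; a real page would need patterns matched to its own markup
POPULATION = re.compile(r"Population[^0-9]{0,80}?(\d[\d,]*)", re.IGNORECASE)
INCOME = re.compile(
    r"Median household income[^0-9$]{0,80}?\$\s*(\d[\d,]*)", re.IGNORECASE
)


def to_int(text):
    """Turn '48,914' into 48914."""
    return int(text.replace(",", ""))


def extract(html):
    """Return (population, income); each is None when its label is not found."""
    pop = POPULATION.search(html)
    inc = INCOME.search(html)
    return (
        to_int(pop.group(1)) if pop else None,
        to_int(inc.group(1)) if inc else None,
    )
```

This is brittle by design: it grabs the first number after each label and nothing else, which is fine for one simple site but would need BeautifulSoup-style parsing for anything messier.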

Now the cross-referencing is set. At the moment I'm running the mother lode CSV data against the census site, writing a new CSV with the new info appended as two new columns at the end. Mmmm. I even went so far as to retry each submission at least two more times, since I ran into cases where the connection was reset by peer.
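
The retry-and-append loop can be sketched roughly like this. The lookup function is a stand-in for the real Mechanize submission and is assumed to return a (population, income) pair or raise a connection error; all names here are hypothetical.

```python
import time


def with_retries(func, arg, attempts=3, delay=0.0):
    """Call func(arg); on a connection error, try again up to `attempts` times total."""
    for attempt in range(attempts):
        try:
            return func(arg)
        except (IOError, OSError):  # covers "connection reset by peer"
            if attempt == attempts - 1:
                return None  # give up on this city and move on
            time.sleep(delay)


def append_columns(rows, lookup, key_index=0):
    """Yield each row with population and income appended (blank if lookup failed)."""
    for row in rows:
        result = with_retries(lookup, row[key_index])
        population, income = result if result else ("", "")
        yield list(row) + [population, income]
```

The rows coming out of append_columns can be fed straight into csv.writer(...).writerows(). Leaving blanks instead of crashing means one dead city doesn't kill a long run, and the empty cells are easy to spot afterwards.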

What I'm seeing is that quite a few Californian cities have no population/income data. Weird.. They aren't no-name cities either.

Sunday, March 30, 2014

MCAT studying tool I made in 2009 is still going strong

While prepping for the MCAT in 2009, I came across SN2ed's awesome MCAT study schedule on SDN. He broke it down into daily to-dos and packed all of it into a 96-day system.

Thing is, it was a huge hassle to put all of that by hand into Google Calendar. Since I was toying heavily with jQuery at the time (RIGHT after I quit my job as a web programmer at a game company), I decided to see if I could make a CSV exporter of the calendar. After some time I made it, uploaded it to a free web host, and posted it on SDN.
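
The idea behind the exporter, sketched in Python rather than the original jQuery: take the list of daily to-dos, count back from the test date, and write a CSV that Google Calendar can import. The column names follow Google Calendar's CSV import format as I understand it (Subject, Start Date, All Day Event); the function names and the choice to end the schedule the day before the test are assumptions for the example.

```python
import csv
import io
from datetime import date, timedelta


def schedule_rows(tasks, test_date):
    """Date each task so that the last one falls on the day before test_date."""
    start = test_date - timedelta(days=len(tasks))
    for offset, task in enumerate(tasks):
        day = start + timedelta(days=offset)
        yield {
            "Subject": task,
            "Start Date": day.strftime("%m/%d/%Y"),
            "All Day Event": "True",
        }


def to_calendar_csv(tasks, test_date):
    """Return the whole schedule as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["Subject", "Start Date", "All Day Event"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(schedule_rows(tasks, test_date))
    return buf.getvalue()
```

With a 96-item task list and a test date, to_calendar_csv(tasks, date(2014, 4, 10)) would give text to save as a .csv file and import into a calendar.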

Just because, I also linked in a traffic analyzer.

Well, it's still going strong. If you Google 'SN2ed calendar', my site comes in at the top 10 links.

SDN has since taken in SN2ed's schedule and maintains a Google Calendar, but it still doesn't have the option to create one that is targeted to one's own test date.

On average, it draws in 21 people. So far, 14,400 people have been to the site. Every now and then, I get messages from SDN users asking for advice on how to use it.

Well, I never wrote anything about it in this blog, but figured I gotta start collecting all of my projects in one place.

How a computer engineering major studies in med school

Slight rambling here:

I was too busy with medical school to maintain this fine blog of mine.
Well, it's also that questioning who I really am - the dichotomy that is my identity - makes me hesitant to update a technology blog when I don't even have a good medical-related blog. Well, I do have one, but it's not public. Not even Google-searchable, only accessible if I give out the URL.

Anyways. What I wanted to post here are ways that I use my background knowledge of computers and programming to save some seconds doing the "dailies" of a medical student.

Let me see what I can spill here...

First, my background layer: I use Xubuntu and Linux Mint XFCE as my operating systems. This means I don't need to deal with Windows anymore. I can be free of viruses, malware, and the shame of illegally obtaining said operating system, and join the open-source revolution. Not a dime to megacorp software, and donations to open-source software developers whose work I truly admire and use. For instance, I donated $10 to the makers of Synergy (a tool I will discuss later).

My gear: I have 3 main devices that I use to study: two laptops and a Nexus 7 tablet.

My trusty Acer Aspire 5610 runs XFCE Linux Mint, and it's around 7 years old. This came with that awful bitter taste called Windows Vista, and since taking the plunge into Linux 6 years ago, this laptop stands to go strong for more years to come.

Another laptop was donated to me by a student from my med school. She spilled something on it, and when I took it apart, the motherboard was burnt. The hard drive was also fried. It was a Thinkpad - and I love the Thinkpad chassis and keyboard so much that I decided to buy a new motherboard for it. One guy had an upgraded version of the motherboard with a kick-ass graphics chip on it, so I splurged an extra $100 to buy it.

Onwards to how I use it now, after many, many experiments to find a good study environment.

In a nutshell, the mouse and keyboard that are hooked up to the Acer laptop can slide over to my Thinkpad laptop, with the help of Synergy. This means I have a dual screen setup. Generally, the more screen area you have, the more productive you are. Not only that, I have two laptops behind it, so I have two fully capable machines running programs, which kinda works like having a single very capable (but expensive) laptop.

Not only that, my Thinkpad can access my Acer's hard drives and folders by way of NFS (Network File System), which has open-source support on Linux.

Ah, but I haven't discussed how I run the Thinkpad when its hard drive was fried. I run the entire operating system (Xubuntu) from USB. I didn't do the LiveCD-to-USB method, but rather used the Xubuntu USB LiveCD image to do an actual install to the 8GB USB drive, so it acts much like a hard disk installation. Of course, I had to do some minor adjustments to the fstab to get the EXT4 FS to not write so much onto the RAM.

Where does my Nexus 7 fit in? Well, it fits perfectly in my lab coat pocket, so I pack it with textbooks to read. But the biggest winner software (or 'app' since we're talking Android.. ) is Anki.

Anki is spaced-repetition software, much like what Firecracker is based on. What I don't like about Firecracker, due largely to my insolence, is that I really feel like I need to write the questions myself, after reading from a textbook. Maybe it's because I'm old-fashioned. I don't like being asked to conform to a system, at least when it comes to software.

Anki solved this, but it took me a year to get used to using it. I still haven't cracked how to actually finish the questions I made - I made close to 10,000 this year (M3) and I still have 1000Qs in USMLEWorld and 2 rotations to go. I stopped using it during first year for this very reason, but theoretically you get to read actively, which solves the biggest problem in reading textbooks - just mindlessly moving my eyes. At least I'm forced to give an answer and get it right. It's just that I can't do more than 200 per day, but then that's something I have to overcome.

Anyway, Anki started as open-source software for Windows, Mac, and Linux. It also has iPhone and Android apps. It even has free online syncing, so that whatever questions I make on my laptop, I can sync to my Nexus 7 and just do the questions during the many wait-modes on the ward.

Another awesome piece of open-source Linux software is an automator, which automates mouse clicks and keyboard entries, and much more. It's kind of like AutoHotkey in the Windows world. It's called xdotool.

You can do a lot with it, but what I made last year saves me that annoying redundant step of entering my login info, clicking, and more clicking to get to the actual screen of a certain question bank.

On to another thing that I came across, and due to it being near the edge of a EULA of a certain question bank I will be somewhat cryptic here. Let's say you have a Java-based solution of a study material that locks your laptop keys so it only works in its Java software, perhaps due to potential copyright infringement. Rightfully so, but also is a pain in the ass if you want to write notes.

I mean, to take notes you either have to write them on paper, or hope that the lock doesn't work on a dual-monitor setup. And what if you want to take a screenshot of a diagram so you can keep it in your OneNote or Evernote?

Well, with Synergy, the locking effect doesn't work. Meaning, if I use that software in my Acer laptop, I can take notes/write Anki questions on my Thinkpad laptop. Cool.

But then, why do I have to type the stuff that's already on screen? Why can't I just bypass the copy/paste lock of this Java software? The only way I came up with is OCR text recognition on a screenshot of the text inside the Java software. By the way, taking a screenshot is always possible in Linux while that Java software is running.

So, I searched for ANY free open-source solutions that do OCR. I came across one: tesseract-ocr. After tinkering with it, along with some formatting I did with the help of sed, I got my solution. I would take a screenshot of the text, save it to an image file with a set name, and on my Thinkpad laptop, run my script to run tesseract-ocr, save the result to a text file, and open it in a text editor. I would then copy that text to Anki and trim/edit it to make my questions.
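
A rough Python stand-in for that script is below. It relies on the tesseract command-line form `tesseract IMAGE OUTBASE`, which writes OUTBASE.txt; the tidy() cleanup is only a guess at the kind of formatting the sed step did (rejoining hard-wrapped lines), and the file names are placeholders.

```python
import re
import subprocess


def tidy(text):
    """Rejoin words hyphenated across lines and unwrap hard line breaks."""
    text = re.sub(r"-\n(?=\w)", "", text)          # "steno-\nsis" -> "stenosis"
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)   # lone newline -> space, blank lines kept
    text = re.sub(r"[ \t]+", " ", text)            # collapse runs of spaces
    return text.strip()


def ocr(image_path, out_base):
    """Run tesseract on a screenshot and return the cleaned-up text."""
    # tesseract appends ".txt" to the output base name itself
    subprocess.check_call(["tesseract", image_path, out_base])
    with open(out_base + ".txt", encoding="utf-8") as f:
        return tidy(f.read())
```

Something like print(ocr("shot.png", "shot")) would then give text ready to paste into Anki, still needing a quick read-through for OCR mistakes.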

It wasn't quick enough. So, I bound the script to a key combination on my keyboard, a feature that just comes with XFCE-based distributions (under 'Keyboard' settings).

Thus, now I can save a screenshot, wave my mouse across to my other laptop, use my key combination to run the image-to-text conversion.

Now, I realized I'd freak out if either of my two laptops somehow got busted. So, I finally pushed myself to set up a backup solution, using rsync and cron. My crucial settings/scripts are stored to my external 1TB hard drive every day at dawn. Since my Thinkpad runs entirely from my 8GB USB drive, that one's backup does the same thing but just dd-dumps the entire 8GB image to my external hard drive. It works, and hopefully this will save me if something ever happens that needs saving.
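
The two backup commands, wrapped in a small Python sketch that cron could call. Every path, the device name, and the 5 a.m. schedule below are placeholders, not my actual setup, and the dd step needs root and the right device name, so dry_run defaults to printing the commands instead of running them.

```python
import subprocess


def rsync_command(source_dir, backup_dir):
    """Copy source_dir into backup_dir; -a keeps permissions, times and symlinks."""
    return ["rsync", "-a", source_dir, backup_dir]


def dd_command(device, image_path):
    """Dump a whole block device (the USB stick) into one image file."""
    return ["dd", "if=" + device, "of=" + image_path, "bs=4M"]


def run_backup(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)


# Hypothetical crontab entry to run this daily at 5 a.m.:
#   0 5 * * * python3 /home/me/scripts/backup.py
if __name__ == "__main__":
    run_backup([
        rsync_command("/home/me/scripts/", "/media/external/backup/scripts/"),
        dd_command("/dev/sdb", "/media/external/backup/thinkpad-usb.img"),
    ])
```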

Results of all of these efficiency solutions?

I'd say it's the same as not using any of these solutions, because I've found out technical solutions can be a dangerous gamble and distraction. Time invested in creating solutions often goes overboard, as it is my hobby, enough to bleed into studying time. Making is one thing, using it consistently is another. Anki card-making still takes a long time, and I have yet to get a fruitful result out of this solution.

Anki is still a lifestyle to master, as I found it very boring to have to click through the questions I made. However, that part is just the life of a medical student - always at the crossroads of denial or buckling down to study.

Whew. Back to studying and dealing with M4 prep.
