Over the last month I’ve taken on the position of VP of IT at Cornell BEARS, a mentoring organization. My job is to update and maintain the website at cornellbears.org and to handle any coding needs.
So far I’ve spent about 14 hours on a parsing script for Cornell Qualtrics data. I used BeautifulSoup, an excellent HTML/XML parsing module that made my life so much better.
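For anyone curious what that kind of parsing looks like, here is a minimal sketch. It assumes a hypothetical export in which each `<Response>` element holds one child element per field. The real Qualtrics XML layout may differ, and the element names below are made up for illustration. It uses BeautifulSoup's `"xml"` parser, which requires lxml to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical export layout; real Qualtrics element names may differ.
SAMPLE_XML = """<Responses>
  <Response>
    <ResponseID>R_1</ResponseID>
    <Q1>Alice</Q1>
    <Q2>Sophomore</Q2>
  </Response>
  <Response>
    <ResponseID>R_2</ResponseID>
    <Q1>Bob</Q1>
    <Q2></Q2>
  </Response>
</Responses>"""


def parse_responses(xml_text):
    """Return one dict per <Response>, mapping each child tag name to its stripped text."""
    soup = BeautifulSoup(xml_text, "xml")  # the "xml" parser needs lxml and keeps tag case
    rows = []
    for response in soup.find_all("Response"):
        row = {}
        # recursive=False returns only the direct child tags of this <Response>
        for child in response.find_all(recursive=False):
            row[child.name] = child.get_text(strip=True)
        rows.append(row)
    return rows


for row in parse_responses(SAMPLE_XML):
    print(row)
```

Note that an empty element such as `<Q2></Q2>` comes back as an empty string rather than being skipped. That matters when a survey tool saves blank answers.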
This is the breakdown of my 14 hours:
- 30 minutes: clicking through the survey; skimming the previous IT person's code
- 30 minutes: exploring the different export options (I went with XML, since the HTML export was an ugly table); Googling XML parsers and reading about them
- 30 minutes: deciding on BeautifulSoup based on Stack Overflow and a friend's recommendation, then starting the install; reading about lxml and trying to install binary files or something
- 15 minutes: realizing I didn’t have to compile from source and using an .exe installer for lxml instead
- 30 minutes: reading the BeautifulSoup documentation
- 2 hours: trying to follow the previous IT person’s logic, then scrapping it
- 15 minutes: regex refresher
- 1 hour: debugging code to handle invalid data entries (for some reason Qualtrics still saves invalid data, such as blank entries)
- 15 minutes: reading about alternatives to Python’s lack of a switch statement, then deciding to keep it simple with elifs
- 30 minutes: reorganizing code and pulling common code into functions
- 30 minutes: fixing spelling mistakes
- 2 hours: actual coding
- 30 minutes: the mystery of the missing commas; polishing the code's aesthetics
- 30 minutes: using the DOM inspector and discovering that the questions weren’t always labeled in order behind the scenes (tragic moment)
I realize that only adds up to 9 hours 45 minutes, not 14, but I also took some breaks to update my website and the CCA website, so this is more or less accurate.
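The out-of-order question labels are a good place for a regex. Below is a minimal sketch, assuming labels look like "Q" followed by digits (for example "Q2" or "Q10"). The helper names are hypothetical, not from my actual script. It sorts answers by the number inside the label rather than by the order the elements appear in. It also avoids plain string sorting, which would put "Q10" before "Q2".

```python
import re

# Assumed label format: "Q" followed by one or more digits, e.g. "Q2", "Q10".
QUESTION_LABEL = re.compile(r"^Q(\d+)$")


def question_number(label):
    """Return the integer in a label like 'Q12', or None if the label doesn't match."""
    match = QUESTION_LABEL.match(label)
    return int(match.group(1)) if match else None


def ordered_answers(row):
    """Return (label, answer) pairs for question fields only, sorted by question number."""
    questions = [
        (label, answer)
        for label, answer in row.items()
        if question_number(label) is not None
    ]
    return sorted(questions, key=lambda pair: question_number(pair[0]))


row = {"ResponseID": "R_1", "Q10": "yes", "Q2": "Alice", "Q1": "Sophomore"}
print(ordered_answers(row))
# [('Q1', 'Sophomore'), ('Q2', 'Alice'), ('Q10', 'yes')]
```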
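On the switch statement question: a chain of elifs works fine. One common alternative is a dictionary that maps each question label to a handler function. (Python 3.10 later added a `match` statement as well.) Here is a small sketch of that approach, combined with skipping blank answers. The cleaner functions and the question labels are hypothetical stand-ins, since the real survey questions aren't shown here.

```python
# Hypothetical per-question cleaners; the real survey's questions may differ.
def clean_name(value):
    return value.strip().title()


def clean_year(value):
    return value.strip().capitalize()


# Dictionary dispatch: equivalent to `if label == "Q1": ... elif label == "Q2": ...`
CLEANERS = {
    "Q1": clean_name,
    "Q2": clean_year,
}


def clean_row(row):
    """Apply the matching cleaner to each field, dropping blank or missing answers."""
    cleaned = {}
    for label, value in row.items():
        if value is None or not value.strip():
            continue  # survey exports can contain blank answers; skip them
        cleaner = CLEANERS.get(label)
        # Fields without a registered cleaner are kept unchanged
        cleaned[label] = cleaner(value) if cleaner else value
    return cleaned


print(clean_row({"Q1": "  alice smith ", "Q2": "", "Q3": "maybe"}))
# {'Q1': 'Alice Smith', 'Q3': 'maybe'}
```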
I really like Beautiful Soup. I wonder if I could use it alongside PHP later on to generate my blog. Or I could skip PHP and do it purely in Python, which I would like.
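If I go the pure-Python route, BeautifulSoup can build markup as well as parse it. Here is a rough sketch, assuming a hypothetical list of posts with "title" and "body" fields. It is just one possible approach, not a finished blog generator. It uses the built-in `html.parser`, so lxml isn't needed.

```python
from bs4 import BeautifulSoup

# Hypothetical post data; a real blog would load this from files or a database.
posts = [
    {"title": "Parsing Qualtrics data", "body": "BeautifulSoup made this easy."},
    {"title": "Next steps", "body": "Maybe generate the blog in pure Python."},
]


def render_posts(posts):
    """Build an HTML page with one <h2> and one <p> per post and return it as a string."""
    soup = BeautifulSoup("<html><body></body></html>", "html.parser")
    for post in posts:
        heading = soup.new_tag("h2")
        heading.string = post["title"]  # text is escaped on output (e.g. < becomes &lt;)
        paragraph = soup.new_tag("p")
        paragraph.string = post["body"]
        soup.body.append(heading)
        soup.body.append(paragraph)
    return str(soup)


print(render_posts(posts))
```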