
What I learned writing web scrapers last week


I started writing web scrapers last week. If you’re not familiar with them, web scrapers are programs that read web pages on the Internet and pull information from them.

I have to thank the Ontario Minister of Health for prompting me to do this. The Minister used to share COVID-19 information on Twitter, but recently chose to stop. You can come to your own conclusions as to why she stopped. As for me, I was irritated by the move, enough so that I decided to get the information and publish it myself.

Fortunately I had two things to start with. One, this great book: Automate the Boring Stuff with Python. There is a chapter in there on how to scrape web pages using Python and a library called Beautiful Soup. Two, I had the minister’s own web site: https://covid-19.ontario.ca/. It had the data I wanted right there! I wrote a little program called covid.py to scrape the data from the page and put it all on one line of output, which I share on Twitter every day.

Emboldened by my success, I decided to write more code like this. The challenge is finding a web page where the data is clearly marked by some standard HTML. For example, the COVID data I wanted sits in paragraph (p) tags with the class names covid-data-block__title and covid-data-block__data. Easy.
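As a rough illustration of scraping by tag and class (this is not the actual covid.py), the sketch below pulls the text out of every p tag carrying a given class name. The program described above uses Beautiful Soup. This version swaps in Python's built-in html.parser so it runs without installing anything. The helper names and the sample HTML, including its numbers, are made up for the example. Only the two class names come from the real page.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text of every <tag> whose class list contains class_name."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.results = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._chunks = []  # text pieces seen inside the current match

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self._depth:
            self._depth += 1  # nested element of the same tag
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == self.tag and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.results.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def texts_by_class(html, tag, class_name):
    """Return the text of each matching element, in document order."""
    parser = ClassTextParser(tag, class_name)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample markup; only the class names match the real page.
SAMPLE_HTML = """
<div>
  <p class="covid-data-block__title">New cases</p>
  <p class="covid-data-block__data">100</p>
</div>
<div>
  <p class="covid-data-block__title">In ICU</p>
  <p class="covid-data-block__data">50</p>
</div>
"""

titles = texts_by_class(SAMPLE_HTML, "p", "covid-data-block__title")
values = texts_by_class(SAMPLE_HTML, "p", "covid-data-block__data")

# Put everything on one line, ready to post.
summary = " | ".join(f"{t}: {v}" for t, v in zip(titles, values))
print(summary)
```

With Beautiful Soup installed, the two texts_by_class calls would typically become find_all calls filtered on the same tag and class names.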

My next bit of code was obit.py: this program scrapes the SaltWire web site (Cape Breton Post) for the obituaries listed there and writes them out as HTML. Hey, it’s weird, but again the web pages are easy to scrape. And it’s an easy way to read my hometown’s obits to see if any of my family or friends have died. Like the COVID data, the obits were associated with some HTML: this time it was a div element of class sw-obit-list__item. Bingo, I had my ID to get the data.

My last bit of code was somewhat different. The data I was scraping was on the web, but it was a CSV file rather than an HTML page. In this case I wrote a program called icu.sh to get the latest ICU information for the province of Ontario. (I am concerned COVID is going to come roaring back and the ICUs will fill up again.) icu.sh runs a curl command and, in conjunction with the tail command, gets the latest ICU data from an online CSV file. icu.sh then calls a Python program to parse that CSV data and get the ICU information I want.
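The same "grab the last line of the CSV" idea can also be sketched entirely in Python. This is an illustration under assumptions, not the actual icu.sh: the column names, the sample rows, and the function names are invented, and the download helper is defined but not called so the example runs offline.

```python
import csv
import io
from urllib.request import urlopen


def latest_row(csv_text):
    """Return the last data row of a CSV document as a dict keyed by header."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV has no data rows")
    return rows[-1]


def fetch_latest_row(url):
    """Download a CSV file and return its last row (needs network access)."""
    with urlopen(url) as response:
        # utf-8-sig drops a byte-order mark if the file starts with one.
        return latest_row(response.read().decode("utf-8-sig"))


# Made-up data with hypothetical column names.
SAMPLE_CSV = """date,icu_patients,icu_ventilated
2000-01-01,30,10
2000-01-02,20,8
2000-01-03,10,5
"""

row = latest_row(SAMPLE_CSV)
line = f"{row['date']}: {row['icu_patients']} in ICU, {row['icu_ventilated']} on ventilators"
print(line)
```

Reading the header along with the last row means the code can refer to columns by name instead of by position, which tail on its own does not give you.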

I learned several lessons from writing this code. First, when it comes to scraping HTML, the page needs to be well formed and consistent. In the past I tried scraping complex web pages that were neither, and I failed. The COVID and obituary pages were both, and I succeeded. Second, not all scraping is going to be from HTML pages: sometimes there will be CSV or other files. Be prepared to deal with the format you are given. Third, once you have the data, decide how you want to publish or present it. For the COVID and ICU data, I present them simply on Twitter: just the facts, but facts I want to share. The obit data is just for fun and for myself, so I write it to a temporary HTML file and open it in a browser to review.
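The "temporary HTML file in a browser" approach might look something like the sketch below. It assumes the scraped items are already a list of strings. The function names and the placeholder items are invented for the example, and the browser call is wrapped in a function that is not run automatically.

```python
import html
import tempfile
import webbrowser
from pathlib import Path


def render_page(title, items):
    """Build a minimal HTML page with one list entry per scraped item."""
    entries = "\n".join(f"    <li>{html.escape(item)}</li>" for item in items)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n  <h1>{html.escape(title)}</h1>\n  <ul>\n{entries}\n  </ul>\n</body></html>\n"
    )


def write_temp_page(page):
    """Write the page to a temporary .html file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(page)
    return Path(handle.name)


def open_in_browser(path):
    """Open the file in the default browser."""
    webbrowser.open(path.as_uri())


# Placeholder items; the second one shows why escaping matters.
items = ["First example entry", "Second entry with <angle brackets> & an ampersand"]
path = write_temp_page(render_page("Scraped items", items))
print(path)
# Call open_in_browser(path) to review the page.
```

Escaping each item with html.escape keeps stray angle brackets or ampersands in the scraped text from breaking the generated page.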

If you want to see the code I wrote, you can go to my repo on GitHub. Feel free to fork the code and make something of your own. If you want to see some data you might want to play with, Toronto has an open data site, here. Good luck!

 

On the things parents tell their kids and the things kids remember

Vihos Sweets

This is a picture of a street in downtown Glace Bay. Next to the Dominion is a small place called Vihos Sweets. It didn’t exist when I was growing up, but it did when my mom was a teen. She worked there for a time, and she occasionally talked about it.

Though she didn’t talk about it a lot, it stuck in my mind and I often thought about it. I don’t know why. Maybe I liked the sound of it. Maybe the way she described it made it seem special. Perhaps I was trying to imagine having my own job someday. I am not sure.

I wonder which of the many things I’ve told my kids they will remember. You hope that the big lessons you try to impart to your kids are the things that stick. But often it is the little things. Things like the name of a place you worked at for a short time when you were younger.

Try to be comfortable with the notion that you have less control than you think. You can only live and speak as best you can, and hope that is enough to send them in the right direction. They may recall the important things you passed on. They may recall something you said in passing. They are their own person, and they will absorb and recall what they need.

(Image via http://capermemories.blogspot.com/)

 

Four good pieces on my hometown, Glace Bay

Anyone with an interest in Glace Bay will find these worth reading:

  1. A COAL TOWN FIGHTS FOR ITS LIFE | Maclean’s | MARCH 15 1954: this was fascinating. A story from Maclean’s Magazine in the 1950s that documented Glace Bay at the crossroads. So much in this piece explains my hometown and the people who lived there.
  2. Glace Bay hockey rink’s new name closer to its roots | CBC.ca: a mainstay of Glace Bay is the hockey rink. When I was a kid I lived about 100 meters from it. I spent most of my early days (until grade 10) going to it. So many memories back then revolved around that building.
  3. KEN MACDONALD: Remembering the miners | Local-Lifestyles | Lifestyles | Cape Breton Post: a good piece from the local paper on the mines of Glace Bay and the miners who lived and sometimes died in them.
  4. Miners’ houses: Lawren Harris in Glace Bay – Nova Scotia Advocate: finally this piece on Glace Bay with a focus on a famous painting of Glace Bay by Lawren Harris (shown above). It used to be in the AGO and I often paused to reflect on it, and my hometown. Just like I am doing now.