Pulling data from the LEGO product website

July 26th, 2009

As I mentioned in a tweet, I worked with my son in realtime to pull data from the LEGO website. He wanted to see the piece counts for all the products. So we built this Ruby script together to pull the data and parse it with Hpricot. Took about 10 minutes.

<code>
require 'rubygems'
require 'open-uri'
require 'hpricot'
 
# URL below is for 7-9 year-old full listing of products
LEGOURL = "http://shop.lego.com/ByAge/Leaf.aspx?cn=100005&d=100001&va=1"
# Might need these to change country and language
#/xt/ChangeCountry.aspx?ShipTo=US&return=' + returnPage;
#/xt/ChangeLanguage.aspx?LangId=2057&return=' + returnPage;
 
# Basic structure of lego product listing
# div.class = ThumbText -> ul -> li -> p.class=underline
# div.class = ThumbText -> ul -> li -> p.class=itemsText2
# 
 
doc = Hpricot(open(LEGOURL))
i = 1
doc.search("div.ThumbText").each do |description_box|
  description_box.search("p.underline").each do |title|
    item_name = title.to_plain_text.delete("\n")
    pieces_section = description_box.search("p.itemsText2")
    if pieces_section && !pieces_section.empty?
      pieces_section.each do |pieces|
        if pieces.to_plain_text =~ /Pieces/
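          # String#delete("Pieces:") strips every P, i, e, c, s and : character rather than the word, so remove the label itself
          puts "#{i}\t#{item_name}\t" + pieces.to_plain_text.delete("\n").sub("Pieces:", "").strip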
        else
          puts "#{i}\t#{item_name}\t0"
        end
      end
    else
      puts "#{i}\t#{item_name}\t0"
    end
    i += 1
  end
end
</code>
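
The piece-count extraction can be tried out without hitting the website. The sketch below is not part of the original script: `piece_count` is a hypothetical helper, and it assumes the label on the page looks like "Pieces: 123", which is what the script's regex suggests. It uses a regular expression because String#delete treats its argument as a set of characters, not as a word.

```ruby
# Hypothetical helper (not in the original script): pull the piece count
# out of a product blurb, assuming the label has the form "Pieces: 123".
def piece_count(text)
  match = text.match(/Pieces:\s*([\d,]+)/)
  # Strip thousands separators before converting; 0 means "no count listed"
  match ? match[1].delete(",").to_i : 0
end

# Made-up sample strings, for illustration only
["Pieces: 248\n", "Pieces: 1,033", "Ages 7-9"].each do |blurb|
  puts "#{blurb.strip}\t#{piece_count(blurb)}"
end
```

Returning an integer also makes it easy to sort the products by size later.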

Test Post From TextMate

October 19th, 2006

Let’s see how this shows up. Nothing exciting as there hasn’t been any real content around here in almost a year. I’m thinking about returning to the blog – but not sure…

Great Lines From Children’s Books

January 22nd, 2006

We now own over 300 children’s books. I’m sure that sounds like a lot (and it is), but we read between 10 and 15 books a day to our kids. 300 books is only about a month’s rotation worth of books. Lately I’ve realized there are some great lines (both funny and poignant) in these books. Here’s a small collection of what I like.

"The sky was warmin’ up for the comin’ of the day."
Barn Dance by Bill Martin Jr. and John Archambault, illustrated by Ted Rand

"But red paint puts singing in my head"
Red is Best by Kathy Stinson, illustrated by Robin Baird Lewis

"There was kissing on TV and I hate kissing."
Alexander and the Terrible, Horrible, No Good, Very Bad Day by Judith Viorst, illustrated by Ray Cruz

"Then comes a Mixed-Up Day. And WHAM! I don’t know who or what I am!"
My Many Colored Days by Dr. Seuss, paintings by Steve Johnson and Lou Fancher

"Pretentious P, Quaking Q, Rhyming R, and Sibilant S piled into the room below Towering T, stepping over little Unimportant U."
The Alphazeds by Shirley Glaser and Milton Glaser

"Then Leonardo Made A Very Big Decision"
Leonardo the Terrible Monster by Mo Willems

"It’s music that we all adore. It’s what we go to concerts for."
Zin! Zin! Zin! a Violin by Lloyd Moss, illustrated by Marjorie Priceman

"And the moon sailed along with him."
Harold and the Purple Crayon by Crockett Johnson

"’That’s right,’ she said. ‘Who needs donuts when you’ve got love?’"
Who Needs Donuts? by Mark Alan Stamaty

"’And now,’ cried Max, ‘let the wild rumpus start!’"
Where the Wild Things Are by Maurice Sendak

"The wide world comforts her."
When Sophie Gets Angry – Really, Really Angry… by Molly Bang

"Fairy dust is very useful. I use it to turn oatmeal into cake."
Alice the Fairy by David Shannon

"I love you anyway too."
Olivia by Ian Falconer

"Wait…that smell…could it be…? Pancakes!"
Hey, Pancakes! by Tamson Weston, illustrated by Stephen Gammell

"’I wouldn’t mind if we were in a story,’ said Ann. ‘Because people in stories don’t go around all day looking for an old shell. Interesting things happen.’"
Magic Beach by Crockett Johnson

Gorilla myths about humans

December 14th, 2005

Along the same lines as this story.

These myths below have been circulating in the gorilla community for years. It’s time to debunk them.

Myth: Humans are bloodthirsty and violent.

Fact: Humans are more interested in money and commercial success. While not inherently bloodthirsty or violent, humans will use those tactics to achieve money and success.

Myth: Male humans are aggressive and dangerous.

Fact: Male humans alone are not aggressive or dangerous. But groups of male humans are very dangerous. It’s advised that you stay far away from any group of 3 or more male humans.

Myth: Male humans yell at sports on TV in anger.

Fact: This type of interaction with the TV is used to intimidate rivals, to play around, or to impress potential mates.

Myth: Humans in captivity are brought from Africa.

Fact: It has been illegal to capture humans and bring them into captivity for more than 140 years.

Journalism != Literature

December 7th, 2005

(A friend suggested that I should go back to writing things here after I suggested shutting down the blog. Luckily the following came to my doorstep on Sunday.)

I suppose this is all due to a New Yorker whose first name was Truman. But you decide if this is journalism or great literature:

Outside Jaipur, young men virtually bonded into labor hack with primitive tools at old tires. They work in an archaic assembly line beside the highway, chopping the tires into pieces and loading them onto trucks so they can be burned as toxic fuel at a brick kiln. The tent camp they call home splays out in dirty disarray behind them. A brutish overseer verbally whips them to work faster.

“Please take me out of here,” Rafiq Ahmed, 21, whispered as he bent in the darkness to lift another load. “My back hurts.”

On the revamped road next to him, the darkness has been banished by electric lights overhead. Auto-borne commuters race along six silky lanes toward the Golden Heritage Apartments, the Vishal Mini-Mart, the Bajaj Showroom featuring the New Pulsar 2005 with Alloy Wheels, all the while burning rubber that will eventually fall to the young men, hidden by night, obscured by speed, forgotten by progress, to dispose.

Or how about this paragraph.

To drive it is to gain momentum, to not want to stop, and not have to. Drivers no longer pass through towns, but by them, or where the highway soars into the air, over them. The rural landscape, formerly painted in pointillist detail, becomes a blur, an abstraction – a vanishing trick that may portend things to come.

All this prose would be beautiful if we were reading Hemingway or Zadie Smith, but this is the New York Times – read more if you want.

I’m generally a fan of the NYT, but this writing is overboard. The editors need to remind the journalists they are writing for newspapers, not novels. The sad part is that the story is amazing by itself. India has built only 334 miles of new 4-lane roads in the last 50 years (roughly the distance from Des Moines to Chicago), and come on, India is more than 1/3 the size of the U.S. Now they are building and upgrading 40,000 miles of roads. This story doesn’t need metaphoric prose to interest the reader – the facts alone are worth reading.

Are churches really more dangerous?

October 30th, 2005

This article is a horrible example of journalism. It’s loaded with conclusions and has no facts to back them up.

It starts off with:

Churches have long been considered safe havens from the evil of the outside world. No more.

But nowhere in the article does it ever state whether there has been a marked increase in criminal activity around churches. Sure, it states that security has increased at churches, but that’s different from actual crimes. The article states how churches reflect changes in the larger society, yet crime in the larger society has been trending downward for the last 10 years (see U.S. DOJ for data).

The article quotes from both James Cobble of Christian Ministry Resources (also at ChurchLawToday) and two people from GuideOne insurance. Both sources have a vested interest in hyping the threat to churches. Cobble’s interest is from selling consulting services to churches and GuideOne’s from selling insurance to churches.

At a minimum, churches should be publicly sharing reports of crimes that happen on their property. That would at least allow journalists to have some facts to back up their conclusions.

Newspaper Marketing

October 10th, 2005

Two things converged for me this week about newspaper leadership. First, I ran across a very old article about concerns of newspaper executives circa 1996. Next I read Byron Calame’s Public Editor column in the Sunday New York Times about what the Times’ editors know about their readers. The article about newspaper executive concerns has this:

The second most commonly voiced need is for understanding broad marketing principles and market forces affecting their newspaper or corporation. Half of all publishers and CEOs select this from a list of nine management areas. The top concern, with 55% naming it, involves efficient use of current technological resources.

Publishers don’t know who their audience is and they think marketing can help them figure that out (note, this article is written by a market researcher). But traditional product marketing doesn’t work at newspapers – at least not with respect to subscribers. Subscribers are looking to newspapers to be better and more compelling – and those are things that come from taking risks, not market research. Now contrast this with what the editor of the New York Times Magazine thinks about his readers:

"I imagine my reader is a late-thirties-something woman, a lawyer or educator or businesswoman. She’s busy with work, and also with family matters, but Sunday morning is a time she’ll allow herself to read something that is not work related, or kids’ homework related. She’s got 45 minutes, an hour. She wants to lose herself in a story, one big story – 8,000, 9,000 words. My hunch is she wants to read not something escapist but something substantive – something that holds a mirror up to her own life or opens a window onto a pretty troubled world."

The point is, you’re not going to get happy readers unless you can do what the editor of the NYT Magazine did – paint a picture of who the reader is. Newspapers have gotten so far removed from their readers that they need market research firms to tell them who buys their paper. Some of the success of blogs has been due to their more personal connection with readers – something newspapers just don’t get anymore.

From the “Where are they now” files – The Smurfs

October 10th, 2005

This excerpt from today’s WSJ Evening Wrap is just too funny/sad/odd:

They became a smash hit TV series in the U.S. in the 1980s, but their latest TV incarnation is getting a mixed response. With the blessing of Peyo’s [the Smurfs' creator] family, UNICEF has produced a 25-second advertisement in which the Smurf village is bombed and strafed by warplanes. The video begins with the Smurfs happily doing their thing, but ends with most of the village apparently slaughtered, with Baby Smurf left wounded and screaming amid the carnage. The cartoon is scheduled for wide airing in Belgium this week, but a sneak preview last week reportedly left several real-life babies crying, as well.

Self Conscious Iowans

October 8th, 2005

From reading Juice this week, I saw an ad for this Sunday’s Des Moines Register story on the first ever poll of what Americans think of Iowa. This is not a poll of Iowans – it’s a poll of non-Iowans to ask for their opinion of our state.

Are we the only state that does this? Do we think so lowly of ourselves that we need to ask the rest of the country “Do you like me?” I’ve written about this before.

Nonetheless, it’s a good reason to pick up the paper tomorrow to see what it says.

DMR – Another Redesign

October 2nd, 2005

Looks like DesMoinesRegister.com has gone through another iteration of redesign. I primarily get my news via RSS these days, so I don’t know when they changed the site. I commented on DesMoinesRegister.com in the past so I’ll offer my thoughts again.

For once, I’m impressed by some of what they’ve done. First, the speed with which they made some tweaks to the site (it took them years to get to the grey-brown background they introduced last spring). And second, they finally lightened up the web pages a little (they were too dark before).

Nonetheless, there are still problem spots.

dmreg site

  1. The whole top navigation is floating in the white-air too much. They need to anchor it to the browser or add some lines so I don’t feel like all these buttons would float into each other with a strong wind.
  2. Using the white space of the background of the page to divide these menus doesn’t work. And the cool effect of sliding menus is overkill. People want to read news, not be impressed by animation tricks.
  3. I’m glad they are putting cookie crumbs on pages; this helps the reader know where they are. But this too needs some lines around it to anchor it on the page.
  4. This list of options needs to be trimmed down and moved elsewhere on the page. Remove “send letter to editor” – I’m not going to write your stupid editor, I’m going to blog about it. Remove “email newsletters” – nobody reads junk email anymore. Remove “subscribe” – I’m reading it free on the web, I’m not going to subscribe.

The Smart (Ass) Kids in the Class

October 2nd, 2005

Of course, the Wall Street Journal is considered the newspaper of record for U.S. business. But about once a week they go about picking on the wallets that pay their salaries. I’d guess that most of the WSJ staff is filled with Ivy League graduates who feel they’re just a bit smarter than their friends who opted for a career in business. Here is the latest proof:

There are thousands of corporate aircraft flying the skies over the U.S. Most companies say these planes are necessary to conveniently and securely transport employees to distant facilities or meetings. Top executives “are really 24-hour-a-day, seven-day-a-week people,” notes Mike Nichols, an official with the National Business Aviation Association, a trade group. “These are really flying offices.”

But a comparison of golf scores and flight records, some of which are available from commercial aviation-data services, shows that companies also use their jets for another purpose: as airborne limousines to fly CEOs and other executives to golf dates or to vacation homes where they have golf-club memberships.

At some companies, hundreds of flights in recent years have involved golf, played either for business, pleasure or both. Among companies whose top executives have flown on corporate jets to golf destinations are Alltel Corp., Motorola Inc., General Dynamics Corp., McKesson Corp., Verizon Communications Inc., SLM Corp. (Sallie Mae), U.S. Steel Corp., Cintas Corp., PNC Financial Services Group Inc. and National City Corp.

I’m sure glad the WSJ has a Saturday edition so they can publish important investigative reports like this one. At least I had a good laugh while reading it.

Iowa-Mississippi River-New Orleans

September 9th, 2005

An article in the Register today has people asking about whether to rebuild New Orleans and how we should do it. I know you’re thinking that you’re tired of all this talk of New Orleans in Iowa.

I mean, who cares?

Well the folks at Stratfor do. Read this if you want to understand the importance of New Orleans to the United States.

The ports of South Louisiana and New Orleans, which run north and south of the city, are as important today as at any point during the history of the republic. On its own merit, the Port of South Louisiana is the largest port in the United States by tonnage and the fifth-largest in the world. It exports more than 52 million tons a year, of which more than half are agricultural products — corn, soybeans and so on. A larger proportion of U.S. agriculture flows out of the port. Almost as much cargo, nearly 57 million tons, comes in through the port — including not only crude oil, but chemicals and fertilizers, coal, concrete and so on.

A simple way to think about the New Orleans port complex is that it is where the bulk commodities of agriculture go out to the world and the bulk commodities of industrialism come in. The commodity chain of the global food industry starts here, as does that of American industrialism. If these facilities are gone, more than the price of goods shifts: The very physical structure of the global economy would have to be reshaped. Consider the impact to the U.S. auto industry if steel doesn’t come up the river, or the effect on global food supplies if U.S. corn and soybeans don’t get to the markets.

And if you still don’t get it let me give you an example.

Iowa currently exports about $3.7 billion in crops each year to overseas locations (See USDA Foreign Agricultural Service data for past five years). It’s hard to find data on how much of that goes down the Mississippi, but it looks like only a small percent (around 1%) goes to Canada. So if we assume 99% of that $3.7 billion floats down the Mississippi during the fall, that would be about $1200 per Iowa resident.
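
The per-resident figure is easy to check. Here is a rough sketch of the arithmetic; the $3.7 billion and the 99% share come from the paragraph above, while the population of roughly 2.95 million is my assumption for Iowa in 2005 and is not a number from the USDA data.

```ruby
# Rough per-resident value of Iowa crop exports that float down the Mississippi.
def export_value_per_resident(annual_exports_usd, river_share, population)
  annual_exports_usd * river_share / population
end

ANNUAL_EXPORTS_USD = 3.7e9      # from the post: about $3.7 billion a year
RIVER_SHARE        = 0.99       # from the post: assume 99% goes down the river
IOWA_POPULATION    = 2_950_000  # assumption: approximate 2005 population

per_resident = export_value_per_resident(ANNUAL_EXPORTS_USD, RIVER_SHARE, IOWA_POPULATION)
# Round to the nearest hundred dollars, since the inputs are only estimates
puts "About $#{per_resident.round(-2)} per Iowa resident"
```

Changing the river share to something more conservative, say 0.75, still leaves a number most households would notice.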

And that’s just the export portion. There are raw materials that come in via the Mississippi that are surely used in Iowa manufacturing.

So ask yourself, could you lose $1200 this year? How about every year? Think about those questions when someone asks whether we should rebuild New Orleans.

New Editor at DMR

August 31st, 2005

Well, it looks like the usual corporate ladder rearranging is happening today as the Des Moines Register hires Carolyn Washburn to be Editor.

As I mentioned before, the Register is a management training ground for newspaper editors within Gannett. You may think the Idaho Statesman is a Knight Ridder paper, but up until August of this year it was owned by Gannett. Gannett had owned the Idaho Statesman since 1971 and traded it in a swap with Knight Ridder at the beginning of August 2005. So Washburn is pure Gannett material through-and-through. The Statesman is a slightly smaller paper in terms of circulation (around 140,000 daily, 190,000 Sunday) than the Register. So this might be considered a step up for Washburn. Although, I believe the Rochester, NY paper where she was prior to the Statesman had bigger circulation numbers than the Register.

This still leaves open my desire for more Iowans in the top leadership of the news side of the Register. With Carol Hunter as editorial page editor (a native of Kansas, but from all over) and the new editor Carolyn Washburn, we have no strong Iowa connections at the helm of news (unless you count Dick Doak and Jerry Perkins).

In addition, newspapers need to start looking outside their own ranks if they want to tackle the problems they face (declining circulation, unhappy readers). People who’ve succeeded in the past in newspapers are part of the current problem. Please stop rewarding them with promotions.

Register Blogs

August 27th, 2005

As you know, the Des Moines Register’s teenage son, Juice, has been blogging for about four months now. And while the Juice blogs are continually crappy or just plain boring, we all knew at some point the full Register would have to join in the party. This past week the Register has turned on five initial blogs covering football (for both Hawkeyes and Cyclones), theater, music and books. There’s not much content up there now; we’ll see how the pages fill out. (I’m assuming Nancy Clark won’t be writing for the Des Moines Register Blogs.)

This leads to some interesting questions about blogs and the operation of newspapers.

  • Does blog content get fed back to premium research services like Newsbank and Nexis?
  • Does the library/archives staff at the Register print copies of the blogs for the official archives?
  • Will the Register sell you a reprint of the blog like they will sell you back issues of the paper? (Note: Don’t pay to search the Register archives when you have Google. Just add ’site:desmoinesregister.com’ to your Google search and you’ll pull up links to all the old Register stories without having to pay a dime.)

All these are holes that show up when old media crosses into the new media world. When Gannett starts buying bloggers, then we should be worried.

Marching Onward

August 24th, 2005

You won’t hear much from me in the next couple of weeks. A march to complete lots of software development by Labor Day is driving all my time away from the blog.

The conversation that started this effort went something like this near the end of July.

Boss: “How much development effort do you estimate is left in the project?”

Me: “About 10 person-weeks. With me as the sole developer, that would put us near October for completion.”

Boss: “Sounds good. Let’s get it done by Labor Day.”

As you can see, the expectations don’t match reality. Nonetheless, work expands and contracts to fill the time allotted to it.

Reserve your advertising space now – ‘Blog readers younger, richer’

August 16th, 2005

Computerworld mentions a recent comScore study which shows that:

Blog visitors are 11% more likely than the average Internet user to have incomes of $75,000 or more and are 30% more likely to live in households headed by someone between the ages of 18 and 34, the study found.

During the first quarter, the average blog visitor viewed 77% more Web pages than the average Internet user, and spent 23 hours per week online, compared with 13 hours per week for the average user, according to the study. Regarding e-commerce behavior, blog visitors are 30% more likely to shop online than the average user.

Just a note to all the incoming advertisers wanting to reserve space on my blog. Just email me your request for space and the terms of the contract and I’ll get your ad up and running immediately.

What is a named pipe?

August 15th, 2005

Many people come here searching for what a named pipe is. Since I’ve got some knowledge on this topic (and it’s the blog’s namesake), I thought I’d try to help them out finally. The article below requires some basic Unix knowledge.

To start off, a pipe (symbolized by |) is a way to transfer data between processes without writing it to an intermediate file on disk. It truly acts like a “virtual pipe” for the data coming out of one process to flow into the input of another process. Most shell users have used these with something like:

ls -l | head -5

This will show the first 5 lines of a directory listing. It pipes the directory listing from the ls -l command into the head command, and the data never touches the disk along the way (with the exception of reading the directory itself).
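
The same pipeline can be built from a program instead of the shell. This is a minimal sketch using Ruby's standard Open3 library; `first_listing_lines` is a name I made up for illustration, and `head -n 5` is the spelled-out form of `head -5`. It assumes a Unix system with ls and head on the PATH.

```ruby
require 'open3'

# Runs the equivalent of `ls -l DIR | head -n 5` and returns the output lines.
# The two commands are connected by an ordinary (unnamed) pipe.
def first_listing_lines(dir, count = 5)
  Open3.pipeline_r(["ls", "-l", dir], ["head", "-n", count.to_s]) do |last_stdout, _wait_threads|
    last_stdout.read.lines
  end
end

puts first_listing_lines("/")
```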

A named pipe (sometimes called a FIFO, for “first in, first out”) is a Unix pipe that acts like a file. It has a name so processes can refer to it (e.g. open it, read from it, write to it). But it’s still a pipe in that the data flows through it: data written to the pipe is read by processes that are reading from the pipe. The nice thing about named pipes is that multiple processes can read from and write to them. And a process doesn’t start sending data through the pipe until there is something there to consume it, because opening the pipe for writing normally blocks until a reader opens the other end.
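
To see that behavior in action, here is a small Ruby sketch of my own (it needs Ruby 2.3 or later for File.mkfifo, and a Unix system for fork). The pipe name `demo_pipe` and the method name are made up for illustration. One process writes into the named pipe and another reads from it; nothing but the pipe's directory entry lands on disk.

```ruby
require 'tmpdir'

# Sends lines from a writer process to a reader process through a named pipe
# and returns everything the reader received.
def roundtrip_through_fifo(lines)
  Dir.mktmpdir do |dir|
    fifo = File.join(dir, "demo_pipe")  # hypothetical pipe name
    File.mkfifo(fifo)                   # same effect as `mkfifo demo_pipe`

    # Child process: the writer. Opening the pipe for writing blocks
    # until some process opens it for reading.
    writer = fork do
      File.open(fifo, "w") { |pipe| lines.each { |line| pipe.puts(line) } }
    end

    # Parent process: the reader. It sees end-of-file once the writer closes.
    received = File.open(fifo, "r") { |pipe| pipe.read }
    Process.wait(writer)
    received
  end
end

puts roundtrip_through_fifo(["first in", "first out"])
```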

How to make a named pipe

FreeBSD – mkfifo [-m mode] pipename
Solaris – mkfifo [-m mode] pipename
Linux – mknod pipename p or mkfifo [-m mode] pipename

An example

A great way to use named pipes is to avoid writing intermediate files when compressing database backups. Usually people will dump their database to the filesystem and then compress it from there. But with named pipes you can dump your database to a named pipe and then compress what comes out of the named pipe. (I realize the example below is a little trivial since mysqldump writes to STDOUT, so a regular pipe would work fine.)


mkfifo dbdump

mysqldump -uroot -ppassword database >dbdump &

cat dbdump | gzip > database.gz

The above creates a named pipe called dbdump. It then starts the mysqldump command, which sends the database dump to the named pipe. This step is put in the background so you can run the compress command next. Finally, you send the output of the named pipe into the gzip command and direct the result to a compressed file. There you go: nothing is written to disk until you have the compressed data.
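
For anyone who would rather script this than type it, the same flow can be sketched in Ruby. This is only an illustration under a few assumptions: a forked child stands in for mysqldump and writes some made-up text, `compress_via_fifo` is a hypothetical name, and the standard Zlib library plays the role of gzip.

```ruby
require 'tmpdir'
require 'zlib'

# Streams text from a producer process through a named pipe straight into a
# gzip file, so the uncompressed data is never written to disk.
def compress_via_fifo(text, gz_path)
  Dir.mktmpdir do |dir|
    fifo = File.join(dir, "dbdump")
    File.mkfifo(fifo)

    # Stand-in for `mysqldump ... > dbdump &`
    producer = fork { File.open(fifo, "w") { |pipe| pipe.write(text) } }

    # Stand-in for `cat dbdump | gzip > database.gz`
    Zlib::GzipWriter.open(gz_path) do |gz|
      File.open(fifo, "r") { |pipe| IO.copy_stream(pipe, gz) }
    end
    Process.wait(producer)
  end
  gz_path
end

Dir.mktmpdir do |out_dir|
  fake_dump = "INSERT INTO demo VALUES (1);\n" * 1000  # made-up sample data
  path = compress_via_fifo(fake_dump, File.join(out_dir, "database.gz"))
  puts "compressed #{fake_dump.bytesize} bytes down to #{File.size(path)} bytes"
end
```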

References

Advanced Unix Programming, Marc J. Rochkind. Prentice Hall P T R, 1985.
Introduction to Named Pipes – Linux Journal
The lost art of named pipes – TechTarget

Anger’s gone – who’s next?

August 3rd, 2005

Des Moines Register Editor Paul Anger is leaving to be Editor at the Detroit Free Press. Here’s a link to the story.

This is great. Anger has really done nothing for the paper in terms of news coverage and quality. Sure he can brag about Kauffman’s Pulitzer nomination, but that’s an anomaly. Overall, the paper has sucked since he took over – that’s what happens when you move someone from sports into the Editor position.

Also, this further solidifies the Des Moines Register as a stepping-stone newspaper within Gannett. Of the 100 or so daily newspapers that Gannett owns, the Register is near the middle-bottom in terms of circulation (with about 150,000 daily subs) compared to Gannett’s other daily papers. And with Gannett owning another 600 or so non-daily newspapers, the Register operates as a perfect management training ground for people to step out of non-daily newspapers into the busy world of a sizable (but not too big) daily newspaper. If they succeed, they get promoted to one of Gannett’s flagship papers (like the Indianapolis Star or Arizona Republic).

Whoever gets picked for the job next should make a commitment to staying with the paper for at least 5 years. That way he or she can really focus on improving the quality of the news without worrying about where within Gannett they can get a bigger paycheck and more head count.

More information please

August 3rd, 2005

Last week I mentioned Vilsack creating the Heartland PAC and its prototype website. Now a reporter at the Mason City Globe Gazette is writing about the Heartland PAC’s fund raising activities so far (thanks to State 29 for the original link). But the reporter isn’t helping readers to make decisions for themselves; they are parroting the same old (but true) line about Democrats and labor being too close to each other.

First off, they should provide a link to the IRS 8872 filing for the Heartland PAC (or at least a link to where you can search for it on the IRS web site). Next, they should just print all the contributors and let you do your own analysis. This is not a long list; there were only 16 contributions (two from the same PAC) during this period. Remember, bandwidth is cheap.

Here is my 5-minute analysis of the $635,000 in contributions so far:

  • 7 Unions – $315,000 – 50%
  • 6 Corporations – $220,000 – 35%
  • 1 Individual – $50,000 – 8%
  • 1 PAC – $50,000 – 8%

Oh, go ahead and play with the spreadsheet I created yourself.
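
If you’d rather not open a spreadsheet, the percentages can be reproduced with a few lines of Ruby. This is just a sketch using the category totals listed above; the names `CONTRIBUTIONS` and `percent_shares` are mine, and it needs Ruby 2.4 or later.

```ruby
# Category totals from the list above, in dollars
CONTRIBUTIONS = {
  "Unions"       => 315_000,
  "Corporations" => 220_000,
  "Individual"   => 50_000,
  "PAC"          => 50_000
}

# Returns each category's share of the total, rounded to a whole percent.
def percent_shares(totals)
  grand_total = totals.values.sum
  totals.transform_values { |amount| (100.0 * amount / grand_total).round }
end

puts "Total: $#{CONTRIBUTIONS.values.sum}"
percent_shares(CONTRIBUTIONS).each do |category, percent|
  puts "#{category}: #{percent}%"
end
```

Because each share is rounded separately, the percentages add up to 101 rather than 100.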

Why can’t reporters do this stuff for us? This took me about 10 minutes to do from finding the IRS form to creating the spreadsheet. Why do they always have to pick and choose the stuff they think is interesting? (Even when it’s not interesting, “Shocking news, local Democrat gets lots of money from labor unions”, this is not news.)

More Local News

August 2nd, 2005

Back in March, the 3 largest newspaper chains agreed to buy a majority share in Topix.net. Think of Topix.net as a cross between Google News and Google Local – they offer news from a variety of sources that is aggregated by geography. (Yes, they also do this same sort of thing by topic – e.g. Business, Technology, auto accidents, etc.).

We are now seeing some of the results of this deal. On most articles on DesMoinesRegister.com (owned by Gannett, one of the 3 newspaper chains which purchased Topix.net), there is a box called “Related news from the Web”. This box pulls headlines from Topix.net based on keywords found in the article. It’s far from perfect. While you can usually get other related Des Moines or Iowa stories, you sometimes get stuff like this:

screenshot of topix.net links

The reason you’re offered news stories from Bucklin, MO is because of this quote in the story (my emphasis added):

Bob Bucklin, who works nearby, said the naked lady is great art. “It does show the female form, but so what?” he said. “It’s not as offensive as a Calvin Klein ad or a Victoria’s Secret catalog.”

This is really an attempt by newspaper companies to divert ad revenue from Google AdWords to something they control – Topix.net. Newspapers are scared of both Google and eBay as these companies eat retail advertising revenues and classified advertising revenues.

And just like the above example, the technology is not perfect. In the January 2004 State of the Union, President Bush mentioned eliminating steroid use from baseball. As I was reading the transcript of the address on washingtonpost.com, I noticed the 3 Google AdWord ads at the bottom of the page; they were all for places to buy anabolic steroids on the net.

Nonetheless, I think you should give Topix.net a try. Their coverage of Des Moines is fun cause you get to see stuff about Des Moines that shows up at other news sources. But note, they direct all their links through Topix.net and spoof the displayed URL so they can track where you are going from Topix.net.