Every computer science or engineering graduate has learned the basics of relational databases. This certainly includes some “advanced” skills like JOIN, COUNT with GROUP BY, sub-queries and almost certainly TRANSACTIONS. Students believe they will need these features in their real work.
How wrong!
Amazon’s e-commerce site uses key-value storage, consistent hashing and “always-write-succeed” instead of transactions, as described in this ground-breaking paper on Dynamo. Reddit, one of the world’s most popular sites, uses only two tables:
they keep a Thing table and a Data table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attributes like up/down votes, a type, and a creation date. The Data table has three columns: thing id, key, value. There’s a row for every attribute: a row for the title, the url, the author, spam votes, etc. When they add new features they don’t have to worry about the database anymore. They don’t have to add new tables for new things or worry about upgrades. That makes development, deployment and maintenance easier.
The price is that you can’t use the cool relational features. There are no joins in the database and you must manually enforce consistency. No joins means it’s really easy to distribute data across different machines. You don’t have to worry about foreign keys or doing joins, or about how to split the data up. It worked out really well. Worries about using a relational database are a thing of the past.
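To make the Thing/Data layout above concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The table and column names (`thing`, `data`, `thing_id`, `key`, `value`) and the helpers `create_thing` and `load_thing` are hypothetical, loosely modelled on the description; this is not Reddit’s actual schema or code.

```python
import sqlite3

# Hypothetical two-table schema modelled on the Thing/Data description;
# names and types are assumptions, not Reddit's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (
    thing_id   INTEGER PRIMARY KEY,
    type       TEXT NOT NULL,
    ups        INTEGER NOT NULL DEFAULT 0,
    downs      INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE data (
    thing_id INTEGER NOT NULL,
    key      TEXT NOT NULL,
    value    TEXT,
    PRIMARY KEY (thing_id, key)
);
""")


def create_thing(thing_type, attrs):
    """Insert one Thing row plus one Data row per attribute; return the new id."""
    cur = conn.execute("INSERT INTO thing (type) VALUES (?)", (thing_type,))
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
        [(thing_id, k, str(v)) for k, v in attrs.items()],
    )
    return thing_id


def load_thing(thing_id):
    """Rebuild a Thing as a dict: common columns plus its key/value rows (no JOIN)."""
    row = conn.execute(
        "SELECT type, ups, downs FROM thing WHERE thing_id = ?", (thing_id,)
    ).fetchone()
    if row is None:
        return None
    thing = {"type": row[0], "ups": row[1], "downs": row[2]}
    for key, value in conn.execute(
        "SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)
    ):
        thing[key] = value
    return thing


link_id = create_thing("link", {"title": "Hello", "url": "http://example.com"})
# A "new feature" just writes a new key: no ALTER TABLE, no migration.
conn.execute("INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
             (link_id, "spam_votes", "0"))
print(load_thing(link_id))
```

The trade-off shows up directly: adding `spam_votes` needed no schema change, but every `value` is untyped text, so type checks and consistency rules move into application code.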
In the age of big data, it’s very rare that the databases these students work on in the future will stay on a single disk. As soon as the data grows beyond the boundary of one physical machine, all that beautiful normalization, along with joins, sub-queries and transactions, becomes completely useless.
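Once data spans several machines, something has to decide which machine owns which key; the Amazon approach mentioned earlier uses consistent hashing for this. Below is a minimal Python sketch of a consistent-hash ring with virtual nodes. The class name `ConsistentHashRing`, the use of MD5 for ring positions and the 100 virtual nodes per server are arbitrary assumptions for illustration, not Dynamo’s actual parameters.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the hash ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def _node_points(self, node):
        # Each physical node appears on the ring many times ("virtual nodes")
        # so that keys spread more evenly across servers.
        return [ring_position(f"{node}#{i}") for i in range(self.vnodes)]

    def add_node(self, node):
        for point in self._node_points(node):
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        for point in self._node_points(node):
            idx = bisect.bisect_left(self._points, point)
            if idx < len(self._points) and self._points[idx] == point:
                del self._points[idx]
                self._owner.pop(point, None)

    def node_for(self, key):
        """Return the owner of the first ring point clockwise from the key."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


ring = ConsistentHashRing(["db1", "db2", "db3"])
keys = [f"thing:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("db4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding db4")
```

With a naive `hash(key) % number_of_servers`, going from three servers to four would remap about three quarters of the keys. With the ring, only keys that fall into db4’s new ranges move (roughly a quarter here), and every one of them moves to db4, which is what makes adding machines cheap.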
To prepare students for the future, I strongly recommend that universities also teach them the following:
- How to denormalize databases. Teach them Redis. Get them to think about how NOT to use locks. Teach them master/slave replication concepts.
- Explain how MongoDB’s sharding works. Let students work out how to pick the right indexes and shard keys.
- Experiment with Google App Engine, particularly the Datastore, so students learn the basics of BigTable and think more carefully about how to pick the best row key.
- Play around with Riak. Think about how vector clocks work and what “eventual consistency” really means.
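Riak, following Dynamo, has used vector clocks to detect when two replicas accepted conflicting writes. Here is a minimal Python sketch of the idea, assuming clocks are plain dicts from replica name to counter, with no clock pruning; the helper names `increment`, `merge`, `descends` and `compare` are hypothetical, and this is not Riak’s actual implementation.

```python
from typing import Dict

# A vector clock maps a replica name to the number of updates it has coordinated.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum: the clock of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'equal', 'after', 'before' or 'concurrent'."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"


v0 = increment({}, "A")   # first write, coordinated by replica A
v1 = increment(v0, "A")   # A updates the value again
v2 = increment(v0, "B")   # meanwhile B updates the old v0 copy
print(compare(v1, v0))    # after
print(compare(v1, v2))    # concurrent -> two conflicting siblings
resolved = increment(merge(v1, v2), "A")  # a client reconciles and writes via A
print(compare(resolved, v1), compare(resolved, v2))  # after after
```

The “concurrent” case is where eventual consistency becomes visible: the store cannot pick a winner by itself, so both versions are kept until a later write whose clock descends from both resolves them.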
(Again, if you are a student and have read *any* of the papers above, drop me a message. I’ve got a job for you.)
Completely agree with you there, Alex. I’m an IT student, and when I heard Reddit uses a two-table database I started to question what I was being taught. Otago Polytech very briefly covers NoSQL using MongoDB in the third year, which is a step in the right direction, but more could be done.
Hi Alex, for the article on eventual consistency, this one (http://www.allthingsdistributed.com/2008/12/eventually_consistent.html) might be a better choice.
I still think that the SQL model is the right model for data and Google F1 looks good on this topic.
Andy, Werner Vogels’s post *is* way better. Just replaced it.
I think you’re actually raising a different question. Universities are tending towards vocationalising their degrees, which means people are being taught practical, industry-applicable skills rather than root theory. If it’s an industry-focussed degree, I’ll put money on SQL being used by 90% of the likely employers. Sadly, in the enterprise they’ll rarely come across these technologies; it will be MS SQL Server or Oracle in 99% of cases.
So while I agree with you that they could be taught differently, I put forward for consideration that if the degree is vocational in nature, learning SQL IS the right answer if the institution’s goal (and the students’?) is instantly employable skills (which is very different from a well-rounded education).
Totally agree, Tim. Much of the knowledge taught in traditional database courses, like index design and the data structures used in database storage engines, is still valid too.
The point I was making above is universities shouldn’t only teach JOIN and subqueries. They should make it clear that there is another world out there for the students to explore. Right now, that part is kind of missing.
I agree with you, Tim, which is probably also why the Otago University CS department insists on using C or Java in most papers, the two languages at the top of the TIOBE index (they also use Oracle in their database paper!). It’s a bit sad that universities are so obsessed with the job market… where else are you going to learn the theory?
I realise my percentages leave a bit to be desired there… still, the point remains the same!