Shortcovers adds 1.8 million titles from Internet Archive

by Michael Tamblyn on December 14, 2009

If you’ve seen the release, you’ll know we are adding 1.8 million titles to the Shortcovers catalog through a partnership with the Internet Archive. Through their own scanning efforts and through relationships with other public domain repositories, they’ve amassed an incredible collection of free public domain works. We want to see those works in as many places, on as many devices as possible, so we fired up the development team and got to work. For those of you really interested in the details, here are some extra tidbits:

What Books, From Where?
* These titles have been scanned from 120 libraries in 5 countries.
* 180 languages
* Roughly 400 million pages
* Adding about 1,000 new titles every day

What Formats?
This week (tomorrow, if all goes well) we’ll have all of the PDF and downloadable ePUB titles available, so that you can download them for desktop or eInk device reading. Mobile access through the Shortcovers app is going to take a little longer — Jan/Feb is the plan right now — as we make some tweaks to our library-in-the-cloud to support downloading/archiving from Internet Archive.

The PDFs are especially cool because they’re the page scans for the books. Some of them are beautiful, some are very old, very weird, or both. You can see marginalia and notes from library patrons long gone, as if you’d just pulled the title off the shelf. (On the other hand, they are big files: 30-40 MB!)

The downloadable ePUB files are OCR’d versions of the original scans. Like all OCR endeavors, much depends on the quality of the source material. Plenty of them are great. Some of them are a little odd. Pushing a 17th-century woodcut-type book through an OCR program, no matter how good, is going to result in a bit of weirdness. If you download an ePUB that is difficult to read, check out the PDF: you may find there is a good reason for it.

What Can You See Through Shortcovers?
Tomorrow, we should have everything in English loaded up. Over the next few weeks, we’ll be adding the other languages as well. Our search engine needs a little tweaking to properly index non-English titles and authors.

DRM?
Nope. The PDFs are straight PDFs — read ‘em wherever you like. Same for the ePUB files.

Browsability
We’re working on it. For the English titles, we need to construct a usable Library of Congress-to-BISAC subject code mapping*. Not an easy thing, but definitely do-able. The non-English is going to be a bit trickier, but one way or another, we’ll find a way to make the majority of them perusable. We’ll keep you posted.
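To give a sense of what a subject mapping like this involves, here is a minimal Python sketch. It is purely illustrative and not the Shortcovers implementation: the table is a tiny hand-picked sample, the function name `lcc_to_bisac` is hypothetical, and it only maps the leading class letters of a Library of Congress call number to a top-level BISAC "General" code.

```python
import re

# Assumption: a tiny, hand-picked sample of Library of Congress
# Classification prefixes mapped to top-level BISAC "General" codes.
# A usable mapping would need far more entries and finer granularity.
LCC_TO_BISAC = {
    "D": "HIS000000",   # World history      -> History / General
    "K": "LAW000000",   # Law                -> Law / General
    "M": "MUS000000",   # Music              -> Music / General
    "N": "ART000000",   # Fine arts          -> Art / General
    "Q": "SCI000000",   # Science            -> Science / General
    "QA": "MAT000000",  # Mathematics        -> Mathematics / General
    "R": "MED000000",   # Medicine           -> Medical / General
}


def lcc_to_bisac(call_number):
    """Return a BISAC code for an LCC call number, or None if unmapped."""
    match = re.match(r"\s*([A-Za-z]{1,3})", call_number)
    if not match:
        return None
    letters = match.group(1).upper()
    # Try the most specific class prefix first ("QA" before "Q").
    for length in range(len(letters), 0, -1):
        code = LCC_TO_BISAC.get(letters[:length])
        if code is not None:
            return code
    return None


if __name__ == "__main__":
    for sample in ["QA76.9", "QC21", "DA30", "Z1001"]:
        print(sample, "->", lcc_to_bisac(sample))
```

Falling back from a specific subclass to its parent letter keeps more titles browsable, but it also shows why the job is hard: the two schemes were built for different purposes, so many classes have no clean one-to-one counterpart.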

Why?
Because, at the end of the day, we’re sold on this idea that you should be able to find and read every book, no matter where you are. Some of them we hope to sell you. Others are free and you should be able to access them quickly and easily. We’d like to be the place you come for both.

Technical Backstory
We started working with Internet Archive after I presented at their conference “Making Books Apparent” in October, when Peter Brantley and Brewster Kahle unveiled the BookServer project. We’re fans of OPDS and the work that BookServer is doing in terms of enhancing discoverability for books, so when Peter said that they had the whole collection in ePUB and ready to go, it seemed like a match made in heaven.

We are ingesting an OPDS feed that provides us with the catalog and updates. Internet Archive holds the files and serves them up on an as-needed basis. We index their catalog and merge it in with ours. As implementations go, it was the easiest 1.8M titles we’ve ever picked up. (Lawyers and agreements probably took longer than development ;-) ) The IA-to-mobile side is a bit trickier, but we’re on the case and wanted to make sure that people could start getting their hands on the books in the meantime.
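For readers curious what ingesting an OPDS feed can look like, here is a minimal Python sketch. It is not the Shortcovers pipeline. It assumes the feed is plain Atom with acquisition links marked by the `http://opds-spec.org/acquisition` relation (as in the published OPDS 1.x specification). The sample feed, the URLs, and the function names `parse_opds` and `merge_into_catalog` are all made up for illustration. Note that only the file URLs are stored, which matches the idea of the source holding the files and serving them on demand.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: acquisition links use the OPDS 1.x relation URI.
ACQUISITION_REL = "http://opds-spec.org/acquisition"

# Hypothetical sample feed; real feeds carry many more entries and fields.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example catalog</title>
  <entry>
    <id>urn:example:book-1</id>
    <title>An Example Title</title>
    <author><name>Example Author</name></author>
    <link rel="http://opds-spec.org/acquisition"
          type="application/epub+zip"
          href="http://example.org/book-1.epub"/>
    <link rel="http://opds-spec.org/acquisition"
          type="application/pdf"
          href="http://example.org/book-1.pdf"/>
  </entry>
</feed>
"""


def parse_opds(xml_text):
    """Turn an OPDS (Atom) feed into a list of simple catalog records."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall(ATOM + "entry"):
        # Keep only download links, keyed by MIME type; files stay remote.
        files = {
            link.get("type"): link.get("href")
            for link in entry.findall(ATOM + "link")
            if link.get("rel", "").startswith(ACQUISITION_REL)
        }
        records.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
            "files": files,
        })
    return records


def merge_into_catalog(catalog, records, source):
    """Add or update records in an in-memory catalog keyed by entry id."""
    for record in records:
        catalog[record["id"]] = dict(record, source=source)
    return catalog


if __name__ == "__main__":
    catalog = merge_into_catalog({}, parse_opds(SAMPLE_FEED), "internet-archive")
    for entry_id, record in catalog.items():
        print(entry_id, record["title"], sorted(record["files"]))
```

Keying the merged catalog on the entry id means a later feed update for the same title overwrites the earlier record instead of creating a duplicate.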

We’re excited about this because we get to meet a number of objectives at the same time. Most importantly, Shortcovers readers get another 1.8 million titles to read. But we also reinforce the importance of open standards in ebooks, support Internet Archive in their goal of preserving digital resources, and showcase OPDS as a means of making ebooks discoverable. Smiles all around.

Enjoy!
We’re constantly working to bring you more great books. If you have any thoughts, feedback, or concerns, don’t hesitate to drop me a note here in the forums or directly at “mt at shortcovers dot com” or @mtamblyn via Twitter.


* This is the point at which the librarians in the house fall off their chairs laughing.

{ 4 comments }

1 Devini 12.14.09 at 10:08 am

We are waiting for your announce coming tomorrow.
Kobo. eReading device. Canadians are “chomping” at the bit.
BTW, you definitely need a librarian with cataloging experience
to “optimize” your search engine. Seriously, it’s not easy to
find what you’ve actually got.

2 Mark 12.14.09 at 10:42 am

Nice job Michael. Are you the first of the major sellers to play nice with the IA? Thought so. Says a lot and means more.

3 Heather 12.14.09 at 3:16 pm

I find it difficult to browse your site for titles already. They are not divided by topics such as mystery, romance, non-fiction etc. How will I be able to browse the internet archive titles? I can already go to their site and download books to read. What is the advantage Shortcovers is offering? Not clear to me.

4 Michael Tamblyn 12.14.09 at 4:57 pm

Devini & Heather — search and browse are both about to get much, much better. Stay tuned!

Re: Browse and IA — like I said above, it’s going to take a bit of work, but we’re on the case. Many IA titles have Library of Congress subject coding, but turning those into an ecommerce-friendly browse is tricky. We’re working on it, but it’ll take a little time.
