Stephen Hart's Blog

February 28, 2015

In my (probably) final post about making historical sources accessible, I would like to talk about metadata.

There has been a lot about metadata in the news recently with regard to new data retention rules for ISPs (Disclaimer: I work for one). Since even our Attorney-General Mr Brandis seems to have no real idea what metadata is, I thought a recap might be in order.

Basically, metadata is additional information associated with a piece of data. As a practical example:

If you want to send an email to a friend saying "Gosh, Stephen Hart is a brilliant writer - you must read Cant, A Gentleman's Guide" (feel free to do this) there is extra information associated with it. Things like:

- Who the message is to
- Who the message is from
- What time it was sent
- Nerdy stuff like mail servers and routing information which you have absolutely no need to know about*

[* or, for pedants, about which you have absolutely no need to know]

None of this information has anything to do with your effort to convey my brilliance as a writer. It is 'meta' information or 'meta' data.
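For the technically curious, the split between the message and its metadata can be sketched in a few lines of Python using the standard email library. This is only an illustration: the addresses and the date are made-up placeholders, and a real message would carry many more headers (the nerdy routing stuff).

```python
from email.message import EmailMessage

# Build a message. The addresses and date are made-up placeholders.
msg = EmailMessage()
msg["To"] = "friend@example.com"
msg["From"] = "you@example.com"
msg["Date"] = "Sat, 28 Feb 2015 16:22:00 +1100"
msg.set_content("Gosh, Stephen Hart is a brilliant writer")

# The headers are the metadata: who, to whom, and when.
wanted = ("To", "From", "Date")
metadata = {name: str(value) for name, value in msg.items() if name in wanted}

# The body is the data itself.
body = msg.get_content().strip()

print(metadata)
print(body)
```

The point to notice is that the metadata dictionary says nothing at all about what the message contains.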

Having got that out of the way, let's move on to metadata for historical sources.

I touched on this back in Part IV on databases where you could filter results based on date. The date of publication is book metadata.

This sort of information is easy but you can do much more by adding categorization metadata. You do this currently with your own blog/tweets by adding subject tags, hashtags or whatever.

This is probably the single most useful thing you can do with your historical data. As an example, I went through every Cant term and put it into a category and subcategory - e.g. animals, occupations, sex (a popular one), transport and so forth.
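As a rough sketch of what this looks like underneath, here is a small Python example. The three entries and their category labels are illustrative placeholders rather than my actual data, and the function name is made up for the example.

```python
# Each entry carries its own metadata: a category and a subcategory.
# The entries and labels below are illustrative placeholders.
entries = [
    {"term": "prad", "meaning": "a horse",
     "category": "animals", "subcategory": "horses"},
    {"term": "rattler", "meaning": "a coach",
     "category": "transport", "subcategory": "coaches"},
    {"term": "prig", "meaning": "a thief",
     "category": "occupations", "subcategory": "thieves"},
]

def in_category(items, category):
    """Return the terms filed under the given category."""
    return [item["term"] for item in items if item["category"] == category]

print(in_category(entries, "transport"))
```

Once every entry has a category, pulling out 'everything to do with transport' is a one-line question rather than an afternoon's reading.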

Technically I haven't finished the job yet but, even on what I have done, a lot of people have written to say how useful they have found it. (Pats self on back, wrenching shoulder in the process)

This whole post fundamentally comes down to 'add as much extra information as you can, particularly categories'. You most likely already knew that, but it is useful to remind ourselves of the need and to make things easier for fellow researchers.

Use the term metadata liberally. Add it to the already long list of things you know about that our politicians don't. (I exempt Scott Ludlam, the Greens' Senator for Communications who actually knows quite a lot)

Thank you for your patience.
Published on February 28, 2015 16:22 • 292 views

January 24, 2015

Chatting about the weather is always acceptable in British Society and in a lot of other areas as well. (Although possibly not in Antarctica: "How's the weather today?" "Cold").

But if you are in the Georgian period, it can be tricky. If you were at the end of January in 1809 it would be a solecism not to mention the fact that floods on the Thames destroyed bridges at Eton and Windsor among others. It would be a hot topic of conversation.

The elderly, of course, would be unimpressed and would tell you at great length about the terrible flood of 1768 where the London-Exeter coach was washed away and all the passengers drowned.

Fear not, help is at hand for conversational gambits. A long list of weather events from 1700 to 1850 is now available at my site

The information has been taken (with permission) from Martin Rowley's weather site at - if you are interested in weather events from 4000BC the site is well worth a visit. All I have really done is make it easier to read.

A little study and you can remark that the terrible Scottish storms of October and November of 1829 caused the loss of many ships, which was tragic for those who had sunk money into coal transport (and, of course, for the sailors involved).

Have fun.
Published on January 24, 2015 17:06 • 169 views • Tags: weather-history-georgian

November 13, 2014

Harris's List of Covent-Garden Ladies was published annually between the years 1757 and 1795 as a directory for gentlemen seeking the services of ladies of negotiable virtue in and around the Covent Garden area of London.

As part of my ongoing 'Making Historical Sources Accessible' project, I have added the 1788 version to my website

By turns funny, tragic and downright bitchy, it provides some fascinating insights into the fair ladies practising the oldest profession in Georgian London.

Taking my own advice (see earlier posts), I have divided it into separate web pages, one for each of the ladies.

If you want some real insights into how gentleman rakes viewed women, or if you simply want to have some fun, just dive in and enjoy.
Published on November 13, 2014 15:34 • 197 views

November 1, 2014

I've recently discovered how to apply a discount to print copies of my books on Amazon.

Cant - A Gentleman's Guide, my book on the language of rogues in Georgian London, normally sells for US$8.99.

However, if you go to and enter the following code at the checkout, you can get it for US$6.99

Discount code: 6BXGQ9SW

Amazon has assured me this will work but please let me know if it doesn't.
Published on November 01, 2014 23:20 • 108 views

October 3, 2014

In this section of 'Making Historical Sources Accessible' I want to talk about databases. They can be a little scary if you have never used one before but the concepts are not too difficult. You may need to find a tame techno-nerd to set it up for you.

The simplest form of database is known as a 'key-value pair' type. This is a direct association between one thing and another. A dictionary is a classic example of this - the key is the word, the value is the definition. It is referred to as a one-to-one relationship.

There is an example on my site at for dictionaries of Georgian Thieves' Cant (the language of rogues - a sort of slang). There are text boxes for 'Cant' and for 'Meaning' and you can search on either of these. (Ignore the other stuff - I'll come to it later).

I use a very simple search mechanism to check for the exact sequence of letters you have entered, regardless of where they appear. In technical terminology this is called a substring search. 'String' is a fancy term for text. 'Substring' is a sequence of letters within that text. To make the search even easier, I ignore capitalisation.

It has its disadvantages - try entering 'cat' in the meaning box and you will get, in addition to cats, catholics, cattle, intoxication and so forth. It is, however, very flexible as you have no need to know the exact word.
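For the nerdishly inclined, here is a minimal Python sketch of a key-value store with a case-insensitive substring search on the value. It is not the code behind my site; the keys are invented stand-ins and the function name is hypothetical.

```python
# A tiny key-value store: the key is the cant term, the value its meaning.
# The entries are invented stand-ins, not taken from the real dictionaries.
cant = {
    "term one": "a cat",
    "term two": "a Catholic",
    "term three": "cattle",
    "term four": "intoxication",
    "term five": "a horse",
}

def search_meaning(store, text):
    """Return every key whose meaning contains the text, ignoring case."""
    needle = text.lower()
    return [key for key, value in store.items() if needle in value.lower()]

print(search_meaning(cant, "cat"))
```

Searching for 'cat' here returns the first four entries, intoxication included, which is exactly the disadvantage described above.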

The other main type of database is called 'Relational' which means, as you might expect, ways of 'relating' different pieces of information. It is much more powerful than the key-value pair as it allows you to relate many items to a single key (known as a one-to-many relationship) or even many items to many other items (a many-to-many relationship).

As a simple example, let's return to the Thieves' Cant database. Below the text boxes you can see four radio buttons, each with an associated date. I used three different sources for this data. Choose 'Any' - the default - and it will return results from any dictionary. However, if you are looking for material in the early period, the later sources will only get in your way.

When you select the date we now have a second variable on which the result is filtered. We couldn't have done this with just a key-value pair. You can make this sort of search as complex as you like.
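A sketch of the idea, using Python's built-in sqlite3 module, might look like this. The table names, the years and the entries are all assumptions made up for the example; my own database is laid out differently.

```python
import sqlite3

# An in-memory database with two related tables: one row per source
# dictionary, and many entries pointing back to each source.
# Table names, years and entries are placeholders for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE entry (
        cant TEXT,
        meaning TEXT,
        source_id INTEGER REFERENCES source(id)
    );
""")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, 1700), (2, 1750), (3, 1800)])
conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", [
    ("term one", "a cat", 1),
    ("term two", "cattle", 2),
    ("term one", "a cat", 3),
])

def search(meaning_text, year=None):
    """Substring search on meaning, optionally filtered by source year."""
    sql = ("SELECT entry.cant, source.year FROM entry "
           "JOIN source ON source.id = entry.source_id "
           "WHERE lower(entry.meaning) LIKE ?")
    params = ["%" + meaning_text.lower() + "%"]
    if year is not None:          # 'Any' means no extra filter
        sql += " AND source.year = ?"
        params.append(year)
    sql += " ORDER BY source.year, entry.cant"
    return conn.execute(sql, params).fetchall()

print(search("cat"))         # every source
print(search("cat", 1700))   # the earliest source only
```

The second variable is just one more condition on the query, which is why you can keep adding filters for as long as your patience lasts.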

I realise this post is a little nerdish and possibly more information than most people want but it helps to be aware of what can be done. If you want to set up a database of your own material and are having difficulty with it, just contact me and I will be happy to help you out.

In my next post, I will be discussing 'metadata' - information which is not part of the original document but which can greatly aid in finding what you want.
Published on October 03, 2014 18:07 • 158 views

July 25, 2014

In the first two parts of this series I discussed how we can rescue historical data simply by dividing existing texts into sections. In this post I want to look at how we display this information.

The way in which you display the text can make a significant difference to useability. As an example, I will make use of data from Survey of the Cities of London and Westminster, Borough of Southwark by one W. Stowe published in 1722. Stowe provides, among other things, details of coaching inns and when and where carriages depart and to which location. The entries are in a list, of which the following is the first line:
Abingdon, Co. Sarazen's Head in Breadstreet, th. s. Car. ditto, th.

By consulting the key we find that this means that if you want to go to Abingdon, you can get a coach from the Sarazen's Head in Breadstreet on Thursday or Saturday or you can go with a carrier on Thursdays only. It is not that hard to work out but by the time you get to Wooten-Basset and York it has been a lot of effort. Some of the entries are quite complex as well, for example:
Canterbury, Co. Spread Eagle, Gracechurch Street, t. th. s in Summer, m. th. Winter; Coach and Horses, Charing Cross, the same Days; Bell in Bellsavage Yard, Ludgate Hill, m. th. Car. Checquer, Charing Cross, t. th. s. Star by the Monument, the same Days.

Your options for going to Canterbury are therefore: by coach from either the Spread Eagle in Gracechurch Street or the Coach and Horses in Charing Cross on Tuesdays, Thursdays and Saturdays in Summer but only Mondays and Thursdays in Winter; or from the Bell in Bellsavage Yard, Ludgate Hill on Mondays and Thursdays; or, if you can't afford a coach, by Carrier from either the Checquer in Charing Cross or the Star by the Monument on Tuesdays, Thursdays and Saturdays.

It rapidly becomes all too hard for our little brains to handle. But brains can be used to help us if we let them. Brains like things to be organised.

The salient bits of information that Stowe provides are: Destination, Type of Vehicle, Departure Point, Day(s) of the Week, Special Cases (e.g. Summer or Winter). We can create a table to display each one separating out the types of vehicle and having one column for each day, plus one for the special cases.

To use our Abingdon example, this becomes:

To | Vehicle | From | S | M | T | W | T | F | S | Special
Abingdon | Carrier | Sarazen | . | . | . | . | T | . | . |
Abingdon | Coach | Sarazen | . | . | . | . | T | . | S |

Suddenly, everything becomes clear. Have a look at my website for the complete version.
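If you would rather not type the table out by hand, something like the following Python sketch can build each row from the separated pieces of information. The field names are my own invention for the example, and the two records are simply the Abingdon entry from above.

```python
# Days of the week in column order, with the single letter used as a mark.
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
LETTERS = ["S", "M", "T", "W", "T", "F", "S"]

# The Abingdon entry, split into its salient bits of information.
# The field names are assumptions made for this sketch.
services = [
    {"to": "Abingdon", "vehicle": "Carrier", "from": "Sarazen",
     "days": {"Thu"}, "special": ""},
    {"to": "Abingdon", "vehicle": "Coach", "from": "Sarazen",
     "days": {"Thu", "Sat"}, "special": ""},
]

def row(service):
    """Render one service as a table row, one column per day."""
    marks = [letter if day in service["days"] else "."
             for day, letter in zip(DAYS, LETTERS)]
    cells = [service["to"], service["vehicle"], service["from"],
             *marks, service["special"]]
    return " | ".join(cells).rstrip()

print("To | Vehicle | From | " + " | ".join(LETTERS) + " | Special")
for service in services:
    print(row(service))
```

The hard work is still in reading Stowe's abbreviations and filling in the records; the code only takes care of the layout.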

For extra points, you can re-order the table to sort by departure point instead of destination (see the site again) which is very useful but does require some work. Spreadsheets can help here.
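If your data is already in separate columns, the re-ordering itself is a one-liner. Here is a small Python sketch using a handful of rows taken from the Abingdon and Canterbury entries; the tuple layout (destination, vehicle, departure point) is just a choice made for the example.

```python
from operator import itemgetter

# (destination, vehicle, departure point)
rows = [
    ("Abingdon", "Coach", "Sarazen's Head"),
    ("Abingdon", "Carrier", "Sarazen's Head"),
    ("Canterbury", "Coach", "Spread Eagle"),
    ("Canterbury", "Coach", "Bell"),
    ("Canterbury", "Carrier", "Checquer"),
]

# Sort by departure point first, then destination, then vehicle.
by_departure = sorted(rows, key=itemgetter(2, 0, 1))

for destination, vehicle, departure in by_departure:
    print(f"{departure} | {destination} | {vehicle}")
```

A spreadsheet's sort function does the same job, of course, and requires no nerd.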

I did a similar thing with the list of imports and exports provided by Don Manoel (see previous post). You can find it at and what was a dense block of text now becomes an easy-to-read table.

In my next post we will take a look at databases. They can be a bit scary at first and take some getting used to but they are an extremely powerful information tool.
Published on July 25, 2014 23:39 • 207 views

July 16, 2014

In part I of this series, I discussed how the simple division of a book into separate webpages can help immensely in making the data in it more accessible. The example I used was John Timbs' Club Life of London, which was fairly easy to divide because each Club, Pub or Society formed a separate webpage. Not all books divide up so easily. In this post I would like to discuss producing a 'Good Bits' version and will briefly mention using the Wikipedia as a further tool.

I will use as my example a book published in 1745 entitled London in 1731, written by 'Don Manoel Gonzales' who may or may not have been Daniel Defoe. It contains a lot of information about London at this time, somewhat like a tourist guide, going through the city ward by ward but digressing periodically into topics of interest. The full title of the book is London in 1731. Containing a Description of the City of London; both in regard to its Extent, Buildings, Government, Trade, Etc. and it delivers.

However, it is a dense block of text and, like Timbs' work, is not something you will normally want to read from cover to cover. What we will try to do is make the most interesting parts more available.

The first thing, obviously, is to decide just what these parts are. I decided, somewhat arbitrarily, that I would focus on a) information on significant public places for which there is plenty of modern information but for which contemporary accounts are likely to be significant to the historian or historical novelist, and b) information which is otherwise quite difficult to find, particularly for this period.

You can find all the categories over at my website if you want more detail.

Let's look at a 'Type A' example. One interesting snippet is Don Manoel's Account of Christ's Hospital School - a charity school established in the 16th century. It contains information about the rules under which a child could be admitted to the school, how the governing committee worked, what would happen to the children when they finished their education and even what they had for each meal. We extract all this information into a single webpage and go across to our old friend Google to see how it works out.

Unlike the Salutation and Cat in the previous post, Christ's Hospital School produces a lot of results. The school has its own extensive website and there are references all over the place but nevertheless our webpage makes it in at number 26. Moreover, it contains information that is not on the School's own website.

Also, on the first page of the search we see that there is a Wikipedia entry for the school. If we go to the Wiki page we will find a section at the bottom of the page entitled External Links. Anyone can edit a Wikipedia page so we click on [edit] and add our link. Over time this will probably move us up the search ranking because our page is now linked to from an important site.

For a 'Type B' example, Don Manoel has a lot of information about British Trade. For example, we learn that Britain exported large quantities of bullion, lead and English cloth to India and China and received tea, china ware cabinets, silks, coffee and muslins among others. This is information that is quite hard to track down, even if you knew it existed in the first place. So we extract it into a single page as usual.

This sort of information is quite tricky to search for but if we enter "imports and exports georgian period" we find our page at number 1 on the list. Other combinations may not be so successful - it is easy for the search engine to return values for Georgia the country - but at least we have given our researcher a much better chance of finding the information.

What we have done with London in 1731 is not so different from what we did with Club Life of London but it has required a bit more work. Not too much more - the main difference has been deciding which bits are important enough to put in their own page. Once again, we have rescued some historical titbits from obscurity.

In my next post, I intend to discuss the power of formatting. Simply presenting information in different formats can enhance its useability immensely. To be continued.
Published on July 16, 2014 19:37 • 228 views

July 11, 2014

When you are researching matters historical, you often come across fascinating primary or early secondary sources. You extract the snippets that you need and move on. You probably make a note somewhere about the type of material it contains in case you need to go back to it. You might mention it to like-minded friends and colleagues but it then disappears under the ever-accumulating pile of data.

Now that we are living in the century of the fruitbat, it seems we should do better and, indeed, it is happening. It is something of a special interest of mine, combining interests in history and computing.

One of the prime examples of this is the Old Bailey Online project - - containing the Proceedings of the Old Bailey from 1674 to 1913, including 197,745 criminal trials. It is fully searchable, you can cross-reference the crimes to maps of London, read secondary accounts and anything else that might be around. It is an absolutely awesome resource.

Of course, it is supported by several Universities, has dedicated technical and project staff and, no doubt, hordes of students all keen to do their bit. Most of us do not have these resources but there is still a lot we can do. And it can be fun.

In this post, I would like to touch on how we can use the power of Google - the largest database search tool on the planet. (There are other search engines and they do a similar job. What works for Google works for them all). I am going to use my own website - - as an example, partly because I am familiar with it and partly because I want you all to know about it.

The site is dedicated to the 18th century and the Georgian period in general. I have done various things with databases, which I propose to discuss in a subsequent post, but even without databases you can do a lot.

In 1866, a man called John Timbs published a two-volume work entitled Club Life of London containing numerous anecdotes about London clubs and pubs during the 17th, 18th and 19th centuries. Some of the anecdotes are, frankly, tedious but others contain many gems of historical information. The text is available from Project Gutenberg so all you have to do is download it and read it through.

No, I didn't think so. Neither would I.

So, what I have done on my website is simply to divide the book into separate webpages, one for each location. For each webpage, you give it a title and description relating to the tavern name. (Click here if you want to have a look at it.) Now when Google searches it gives each page a reasonably high rating because it knows the information is specific to the topic.
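For those who like to see the machinery, here is a minimal Python sketch of producing one such page with its own title and description. The function name, the wording of the title and description, and the placeholder text are all assumptions for the example; it is not the code that built my site.

```python
from html import escape

def make_page(name, text, book="Club Life of London"):
    """Build a bare-bones HTML page for one tavern or club."""
    title = escape(f"{name} - {book}")
    description = escape(f"John Timbs on the {name}", quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"<title>{title}</title>\n"
        f'<meta name="description" content="{description}">\n'
        "</head>\n<body>\n"
        f"<h1>{escape(name)}</h1>\n"
        f"<p>{escape(text)}</p>\n"
        "</body>\n</html>\n"
    )

# The body text here is a placeholder, not Timbs' actual anecdote.
page = make_page("Salutation and Cat", "Text of the anecdote goes here.")
print(page)
```

Run that once per section of the book and each tavern gets a page whose title and description name the tavern itself.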

Let us suppose you are researching Charles Lamb and want to know a bit more about the Salutation and Cat where he would hang out with Coleridge. Enter "Salutation and Cat" into Google and there is our page, third from the top.

Google has undoubtedly indexed Timbs' book but it contains so much information about so many different topics that the search engine has no way of knowing that the information needs to be high on the list for that particular topic. Because it is in our title and description, it does.

Thus, we have rescued a small snippet of information from obscurity. Not only the Salutation and Cat but also the Turk's Head Coffee House, the Tzar of Muscovy's Head, the Essex Head Club and sundry Whitebait taverns and many others.

Whether Timbs' anecdotes deserved to be rescued is another question but it serves as an excellent illustration of what we can do. Because the wonderful people at Project Gutenberg have already provided the text for us it has not been too big a job. In a minor way, we have made things much easier and more interesting for our fellow travellers in History.

In my next post, I will be discussing the power of a "good bits" version of a text. Watch this space.
Published on July 11, 2014 19:10 • 265 views • Tags: primary-sources

July 9, 2014

I mentioned in a previous post the travels of a young Prussian cleric, Pastor Moritz, who visited England in 1782.

It is described in letters and is a fascinating account but perhaps a largish chunk to take in all at once.

To this end, I have done a 'good bits' version whereby I have extracted the more interesting sections into their own pages on my website.

For example a visit to Vauxhall, a night at the theatre, and so forth.

You can find all the pages at

If you have any interest in Georgian England, the good pastor is worth a visit.
Published on July 09, 2014 19:03 • 166 views • Tags: georgian-england

July 3, 2014

This is going to sound like a paid announcement but it isn't. I'd just like to acknowledge some fabulous customer service.

The people over at offer book templates for Microsoft Word, all beautifully laid out and very reasonably priced. I purchased one of them when I was producing Cant - A Gentleman's Guide.

However, I am now working on the print version of Jonathan Wild and the template is too 'open', that is, too few words per page. It would bring the book in at over five hundred pages.

There was no denser version on the site so I wrote to them for advice. My expectation was that they would do it for a fee and I hoped the fee wouldn't be too big.

The next day I got a reply saying sure, they offer a high density version of their templates as a service. Here it is attached.

Consider me gobsmacked. It is a pleasant change to find a business model that involves helping the customer rather than charging whatever the market will accept.

And you know, I suspect it works. I'll certainly be going back. And here I find myself telling my friends how great they are.

What better publicity could you get than that?
Published on July 03, 2014 19:29 • 220 views