Apache Solr 4 Cookbook

Apache Solr 4 can transform the effectiveness of your search engines and this book will show you how. Jump straight into the hands-on recipes and get a fast understanding of the latest and greatest in open source search.

In Detail

Apache Solr is a blazing fast, scalable, open source Enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, and relevancy tuning, amongst numerous other features.

"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance, and index and analyze your data to provide better, more precise, and more useful search results.

"Apache Solr 4 Cookbook" will make your search better, more accurate, and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.

With numerous practical chapters centered on important Solr techniques and methods, "Apache Solr 4 Cookbook" is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4, including the awesome capabilities of SolrCloud.

Approach

"Apache Solr 4 Cookbook" is written in a helpful, practical style with numerous hands-on recipes to help you master Apache Solr to get more precise search results and analysis, higher performance, and reliability.

Who this book is written for

This book is for developers who wish to learn how to master Apache Solr 4. It will specifically appeal to developers who wish to quickly get to grips with the changes and new features of Apache Solr 4. It is also handy as a practical guide to solving common problems and issues when using Apache Solr.

328 pages, Paperback

First published January 25, 2013

6 people are currently reading
23 people want to read

About the author

Rafał Kuć

6 books · 4 followers

Ratings & Reviews

Community Reviews

5 stars: 8 (29%)
4 stars: 10 (37%)
3 stars: 8 (29%)
2 stars: 1 (3%)
1 star: 0 (0%)
Displaying 1 - 8 of 8 reviews
Arthur
97 reviews · 6 followers
January 30, 2014
Before reading this book I had never even suspected that so many kung-fu things could be done with Apache Solr (Lucene)!

To name a few: using a spellchecker, indexing binary formats (such as PDF), breaking words apart, removing markup, handling plural and singular forms of words, incorporating geolocation, and integrating with an RDBMS.

This book definitely serves as a blueprint for doing development with Solr. It has all the necessary coverage, from properly setting up your environment and incorporating other technologies such as Nutch, to going live and fine-tuning your system, including an effective guide on how to deal with potential problems. As a bonus, there is a “Real-life” section that helps with additional scenarios the author faced in his practice, which made a lot of sense to me to include in a cookbook-style publication.

The book is thoughtfully structured: each recipe starts with a short description of the task at hand and how it was tackled, then explains in detail all the intricacies, such as which configuration needs to be changed.

The book is also very well structured overall. It starts with the basics, such as installation and setting up a few simple things, and moves on to more complex tasks as it progresses toward the end. What I mean is that Rafal has made this book very useful in electronic form: a developer (or administrator) can quickly find the needed information based on what they want to accomplish, and can even copy and paste the sample configuration for further editing. On top of that, the book gives good references to several relevant supporting and complementary tools. Needless to say, the author also delves deep enough into the how-tos of using the Solr GUI itself.

While I am still waiting for my older laptop to be rebuilt, I am eager to get my hands on Apache Solr; that is how much the book has whetted my appetite to explore it further now that I am nearly done reading.

Verdict: I am sure having this book beside you will make you feel more confident in delivering your project on time with the best possible results.

It is a full 5 out of 5 star rating. A big thank you to Rafal and Packt Publishing!

PS: Blogged the review at http://wblo.gs/eOv
Bill Jones
72 reviews · 2 followers
January 26, 2014
I would disagree about googling for the recipes in this cookbook; that is something anyone could say about any technical resource. I found this book to be extremely well written and organized. The recipe on parsing data for near real time searches was well worth the cost of the book. If you aren't comfortable with Apache or Linux you probably wouldn't be looking into this book, but don't let that stop you from getting to know Solr, because the author walks you through each project.

The real-life scenarios are also priceless and all very well written. I am not an extensive user of Solr, but these recipes cover the most common tasks and now I have a go-to source. The cloud setup was great, and with more and more going into cloud computing it was good to see that this made it into the book. A great book at a good price, worthy of the cookbook title!
Http://goo.gl/8nSr9V
1 review
April 24, 2013
I recently read this book and I am really impressed! This book provides a good understanding of Apache Solr for both developers and consultants.
The book starts off well with an introduction to Apache Solr, the web / app servers required, the role of Zookeeper, why clustering your data is vital, the various directory implementations, performance-oriented caching mechanisms, a sample crawler module which, coupled with Solr, gives a complete end-to-end solution, the role of Apache Tika as an extraction toolkit and the ease of customizing Solr. From then on, the book delves into the details.
The first step is indexing. It plays a vital part in the entire search solution. Data can be in the form of .txt, .pdf or any other format. It is imperative that all such formats are easily indexable. One of the widely used tools for extracting metadata and detecting language is Apache Tika. Data can also be present in a database, for which the Data Import Handler is handy. It comes in two variants: full and delta. Every detail is nicely explained with examples, which can shorten development time. DIH also lets us modify the data while importing, which I felt is a pretty neat feature! One of the nicest features included in Solr 4 is the ability to update a single field in a document. I am not sure why this wasn't included in the earlier versions, but it’s a classic case of better late than never.
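To make the single-field update concrete, here is a minimal sketch of how the request body for such an update could be built. It is not taken from the book: the unique key name `id`, the field name `price` and the helper `atomic_update_payload` are hypothetical, and the sketch only constructs the JSON rather than sending it. In a typical setup the body would be posted to the core's update handler as JSON, and atomic updates generally need the update log enabled and the document's fields stored.

```python
import json


def atomic_update_payload(doc_id, field, value, modifier="set"):
    """Build a JSON body that changes one field of an existing document.

    `modifier` is one of Solr's atomic update operations:
    "set" replaces the value, "add" appends to a multi-valued field,
    "inc" increments a numeric field.
    """
    if modifier not in {"set", "add", "inc"}:
        raise ValueError(f"unsupported modifier: {modifier}")
    # Assumption: the schema's unique key field is called "id".
    return json.dumps([{"id": doc_id, field: {modifier: value}}])


# Hypothetical document and field, for illustration only.
print(atomic_update_payload("book-1", "price", 29.99))
```

Only the named field is sent; the rest of the document is left as it was, which is the point of the feature the reviewer describes.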
The next step in the pipeline is data analysis, which is achieved through the use of analyzers and tokenizers. Use cases include eliminating HTML and XML tags, copying the contents of one field to another and stemming words, amongst others. The detail that has gone into explaining every concept, the examples and the associated step-by-step explanations are really helpful.
Now that the data is indexed and the data preparation is completed, it’s time to query Apache Solr! Searches can be performed on individual words or on a phrase. You can boost or elevate certain documents over others based on your requirements. Everything from simple concepts such as sorting and faceting of results to complex ones such as ignoring typos using n-grams and detecting duplicates is very simple to understand and perform. Faceting, in particular, is gaining momentum as it helps in implementing the auto-suggest feature and narrowing down the search criteria. The newly introduced pivot faceting was much needed, and it vastly simplifies certain faceting use cases. Solr provides immense capabilities when it comes to querying, and this book explains each of them in great detail using real-world examples.
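As a rough illustration of the faceting parameters mentioned above, the sketch below assembles a query string that asks for field facets and a pivot facet. The helper `facet_query` and the field names `category` and `author` are hypothetical and do not come from the book; `facet`, `facet.field` and `facet.pivot` are standard Solr request parameters. The query string would normally be appended to a core's select handler URL, which is not shown here.

```python
from urllib.parse import urlencode


def facet_query(q, facet_fields, pivot_fields=None, rows=0):
    """Return a URL-encoded query string requesting facet counts.

    rows=0 asks for the counts only, without returning documents.
    """
    params = [("q", q), ("rows", rows), ("facet", "true")]
    # One facet.field parameter per field to count values for.
    params += [("facet.field", name) for name in facet_fields]
    if pivot_fields:
        # A pivot facet nests counts: first field, then second within it.
        params.append(("facet.pivot", ",".join(pivot_fields)))
    return urlencode(params)


# Hypothetical field names, for illustration only.
print(facet_query("solr", ["category"], ["category", "author"]))
```

The pivot request replaces what would otherwise be one follow-up facet query per category value, which is the simplification the reviewer refers to.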
We indexed and queried the data. But as our application scales, we have to get our hands dirty and start fine-tuning performance in order to give a good user experience to our customers. This is where caches, with their various flavors and granularities, start to make sense. Caching always plays a major role in any deployment, and it is necessary to monitor Solr at all times to gauge its performance. This book has done a great job of clearly explaining the various types of caches, the commit operation and its impact on searchers, and how to overcome the resulting problems. This topic is really important for any real-world Solr deployment and this book has not let me down!
Apache Solr 4.0 introduced the most-awaited SolrCloud feature, which allows us to use distributed indexing and searching. Setting up a SolrCloud cluster along with a Zookeeper ensemble to enable replication, fault tolerance and high availability along with disaster recovery is a piece of cake now. I really appreciate the time and effort spent on documenting and explaining how to set up two collections inside a single cluster. It was a nightmare to find information on this particular topic when we were implementing SolrCloud for one of our customers, and I am sure that others referring to this book will save precious time. Adding nodes to or deleting nodes from a cluster is no longer a tedious task, as the entire process is automated through the presence of Zookeeper nodes. The author's in-depth knowledge of these topics is clearly visible and is of great help to all readers. A touch on Zookeeper rolling restarts, though off-topic, might have given readers a complete bird's-eye view of the entire cluster. Certain features such as soft commit and NRT search are explained in detail later (under Real-life Situations), but I felt that at least a mention earlier on would have provided much-needed continuity in that section. For geeky readers like me, a detailed description of load balancing across shards and replicas and its customizations, if any, would have added an extra amount of spice to this well-cooked food!
As with any other tool, a Solr deployment too will run into some kind of problem. This section details the common problems that are encountered and effective ways to overcome them. Shrinking the size of the index and allocating enough memory in advance are among the solutions that are explained in detail and clearly documented in this book.
Lastly, as every developer would have wanted, the real-world scenarios are described and the various Solr concepts that were explained in earlier sections are put together as part of a complete end-to-end solution.
Anyone trying Solr 4.0 should read this book in its entirety before recommending a Solr production architecture. As mentioned above, there are a few suggestions which, if incorporated in this book, would benefit readers. All in all, this book will be really helpful for developers and consultants alike!
3 reviews
April 10, 2013
I got myself a copy of Rafal Kuc's cookbook for Apache Solr 4 and found out that every recipe is really a must for fine-tuning your Solr implementation.

Within a few chapters you'll notice that Rafal likes to keep things simple, with no confusing explanations or examples. He does this by using the following setup for each recipe:

Every recipe starts with an introduction, explaining a real-life situation where the recipe could be used. This introduction consists of a few lines, not more than 10, probably fewer. Sometimes there is a section called “getting ready” where you will find all the resources needed for the recipe.

Next is the “how to do it” section, where the recipe describes a step-by-step solution / implementation with code snippets, path names and everything you need. Not too detailed though, because otherwise there would be nothing left to write in the “how it works” section!

The “how it works” section explains the “how to do it” section, but in a more detailed way. It also explains why certain configurations or decisions have been made.

Then, “there's more”! Most of the recipes explain alternative solutions or implementations. E.g.: instead of using Jetty as your servlet container, you could also use Tomcat. Rafal mentions this in the recipe for configuring Jetty, but also provides detailed instructions on how to use Tomcat instead.

Fun fact: most of the “there's more” sections actually start with the sentence “there is one more thing”, which sort of reminds me of Steve Jobs!

You will find that most of the questions that come up while reading these recipes are automatically answered. You'll notice that Rafal's recipes do not come from textbook experience only, but mainly from real-life situations, as described in the book's introduction.
One of the things I also really like about this book is that Rafal describes alternative solutions in some recipes. It also helps a lot that the book comes with support files containing code snippets / examples!

If I were asked to recommend any book on Apache Solr, I would certainly recommend Rafal's Cookbook!
Rubén Teijeiro suárez
2 reviews · 3 followers
March 26, 2013
A few days ago I started to read the Apache Solr 4 Cookbook, trying to learn more about Solr and apply this knowledge in the development of my Drupal projects.

The book consists of more than 100 recipes to improve Apache Solr performance, make it more reliable and obtain better results from your queries.

At first the book explains how to install Apache Solr with Jetty and Apache Tomcat, nothing new. But it also explains in a chapter how to configure a distributed Solr cluster installation with SolrCloud and ZooKeeper, something really important if you want a highly available, high-performance Solr environment.

Another interesting topic common to some recipes is data indexing. You can learn how to index your website pages as a web crawler using custom fields, also nothing new. But what about indexing binary files like your music or video files to create a music and video store? Also, you probably need to index your customers' invoice and bill files that are in PDF, ODT or DOC format. Well, just read a few recipes of the book and you will learn how to configure your Solr to do these tasks.

But my interest is focused on the chapter about improving Solr performance. It explains how to configure the document, query result and filter caches. It also takes a look at how to test Solr performance with Ganglia and Scalable Performance Monitoring.

I recommend you read this book if you are looking for a fast and easy answer to your Solr problem ;)

You can read a complete review of the book at http://drewpull.drupalgardens.com/blo...
Jeevanandam M.
2 reviews
May 16, 2013
I got an opportunity from Packt Publishing to review the Apache Solr 4 Cookbook and publish my comments with an unbiased opinion. I have done my due diligence and published detailed pointers about the book.

Summary:
- What are we getting from the cookbook?
- What could have been covered in the cookbook?

Detailed Review: Apache Solr 4 Cookbook Review

Ramzi Alqrainy
4 reviews · 3 followers
November 17, 2013
The recipes are sometimes not so different from what you see in the official docs. I'm a big fan of Rafal's work, but his other books go deeper into the subject. I know this is only a cookbook, but for me it didn't work as well as his other books, which are great.