
Kadaxis

139 Fulton Street, #703
New York, New York
Tools for Book Discovery and Marketing



Why a 500 keyword character limit is costing you book sales.

August 16, 2018 Chris Sim

I bet that if you’re reading this, you’ve always accepted 500 characters as the gold standard for maximizing book visibility through search - work your way up to 500 characters and you’ve joined the metadata elite. You’ve surpassed the majority of publishers who add only two or three hundred keyword characters, or, gasp, don’t add keywords at all. Well, dear reader, if your ONIX makes its way to Amazon, and Amazon is an important retailer for your sales, I have some news for you:

If you only send 500 keyword characters to Amazon, you’re probably missing out on sales.

Contrary to what you may have been told, Amazon accepts and uses far in excess of 500 characters per book. In this post we’ll uncover the origin of the 500 character limit, and provide actual hard data that shows Amazon does indeed accept keywords far beyond the 500 character limit. (In case you’re wondering why more keywords are better, the short answer: more keywords = more chances for a book to be found in search = more sales opportunities. A longer explanation is available here).

The "Best Practices For Keywords" Recommendation

The BISG’s metadata keyword best practice standard states a recommended limit of 500 characters. The guide is a fine document, and I highly recommend you give it a read if you haven’t come across it before. I was a member of the committee that authored the document and contributed to its content and discussions. If you’re tasked with “creating keywords for online retailers”, you should absolutely use this resource as a guide. When it comes to keywords destined for Amazon, though, it’s worth noting that Amazon wasn’t involved in authoring the standard, and the “best practices” are a general recommendation to the industry as a whole. They’re not prescriptive, and retailers can implement anything they wish. In fact the committee, correctly, ensured the recommendations weren’t too Amazon-centric. Long story short: even though 500 characters is the “recommended” limit, nothing requires Amazon, or any other retailer, to strictly abide by the recommendation.

Where Did The 500 Character Limit Come From?

I asked this question on the committee and the best answer I received was: “it’s always been in the ONIX standard”. As someone who has led teams to build book search engines, I wanted to understand what the technical rationale might be - was there some data-science-based research that underpinned this widely accepted “best practice”? My search led me to the good folks over at EDItEUR, who oversee the ONIX standard globally. From my exchanges with these publishing metadata veterans, I learned that:
- The keyword character limit, as defined in ONIX, has always been a “suggested limit”.
- The ONIX standard places no limit on how many keyword characters an ONIX sender can transmit.
- Publishers should be able to rely on receivers (such as retailers) accepting at least 500 characters.
- The limit used to be 100 characters and was then raised to 250 before the current suggestion of 500.
- The 500 limit considers old library systems that may not have the technological resources of retailers, and therefore are limited in what they can receive.
Again, a sensible limit to accommodate myriad publishing systems of various sophistication levels and vintage. But also, still nothing prescriptive about what a receiver (retailer) should do with a keyword field, other than accept at least 500 characters.
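If your keywords travel in ONIX, it’s easy to audit how many characters you’re actually sending. Below is a minimal sketch in Python (toy ONIX 3.0 fragment with namespaces omitted; the keywords are invented); scheme identifier 20 marks the keyword subject composite:

```python
import xml.etree.ElementTree as ET

# Toy ONIX 3.0 fragment; real feeds carry namespaces and many more elements.
onix = """
<Product>
  <DescriptiveDetail>
    <Subject>
      <SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
      <SubjectCode>FIC022000</SubjectCode>
    </Subject>
    <Subject>
      <SubjectSchemeIdentifier>20</SubjectSchemeIdentifier>
      <SubjectHeadingText>detective; noir; hard-boiled mystery</SubjectHeadingText>
    </Subject>
  </DescriptiveDetail>
</Product>
"""

def keyword_char_count(product_xml: str) -> int:
    """Total characters in the keyword (scheme 20) heading text."""
    root = ET.fromstring(product_xml)
    total = 0
    for subject in root.iter("Subject"):
        # Scheme 20 is the ONIX code for keywords; scheme 10 (BISAC) is skipped.
        if subject.findtext("SubjectSchemeIdentifier") == "20":
            total += len(subject.findtext("SubjectHeadingText") or "")
    return total

print(keyword_char_count(onix))  # 36 - well under even the oldest suggested limit
```

The same loop, pointed at a full catalog feed, gives you a per-title character count to compare against whatever limit you believe your retailers accept.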

It's Easy To Test The Limit, But You Have To Do It Properly

Search technology is complex and doesn’t operate in the same manner as, say, a bank transaction system. If you log in to your online banking website and send money to someone, they had better receive it. Every cent needs to be managed, accounted for and then auditable by all parties involved. Search engines are somewhat different. When a search engine indexes data, it may read everything available to it, but the use and visibility of the source data comes down to a whole host of factors that are far beyond the scope of this post. Suffice to say that if you supply Amazon with, say, 100 keywords, it will read all of them, but won’t necessarily use all of them for your book (understand more about this here). If your character-length test involves typing each of your keywords into Amazon search, paging through countless results to check for your book, then using the absence of your book in results as evidence that the keyword wasn’t indexed - you’re doing it wrong. (We haven’t even touched on partial term matches.) Each keyword is an opportunity, not a guarantee.

Cold, Hard, Data: How Many Keyword Characters Should I Provide?

[Infographic: keyword character counts]

Running a test to correctly identify whether a keyword improves search rank, involves analyzing hundreds of books, removing words found in the title, author names and category (BISAC/browse node) names from test queries and also searching for combinations of individual keyword terms (as a metadata expert, you already know that exact keyword matches are the tip of the search iceberg). We did all of this and here’s what we found:

Amazon indexes at least 1500 keyword characters from the keyword field in an ONIX file (or uploaded directly using Amazon's internal tools).
Kadaxis clients receive up to 1500 keyword characters, so this is the limit we tested. (Our clients receive an average of 1000 keyword characters per book.) We found books matching search queries all the way up to the high-1400s character count, after stripping away other metadata we know is indexed. This approach gives us confidence that the book’s presence in search is attributable to the keyword and not other sources (like BISAC names).

30-50% of matched keywords were found in the 500-1500 character range
Said another way: if you’re only adding 500 keyword characters, your book misses out on matching 30-50% of the search queries it would match with 1000+ characters.
 
Books with 1000-1500 keyword characters match 67% more search queries, compared to books with 500 characters or less.
One of our tests compared 100 books from two trade publishers - one using Kadaxis keywords, and one using an alternative service that maxes out at 500 characters. Both sets included a mix of good-selling fiction and non-fiction titles and were run through identical measurement systems. The Kadaxis publisher had an average of 1098 keyword characters (max 1500), which matched 67% more search queries than the publisher who added an average of 446 keyword characters (max 500).

Let's look at a couple of examples to illustrate further:

Title: Medical-Surgical Nursing Made Incredibly Easy (Incredibly Easy! Series)

This medical text matched numerous keywords in search, but one example of note is the keyword phrase "advanced pathophysiology". Neither of these terms is found anywhere in the book's metadata (as an aside, while this keyword is also not in the description text, know that the description isn't indexed for Amazon search - but that's a post for another day).

[Screenshot: Medical-Surgical Nursing Made Incredibly Easy product page]

The keyword itself is present in the keyword field at character position 964 (out of 1079 total keyword characters). We can find the book in the Books search engine on Amazon for the search query "advanced pathophysiology", wedged between two other pathophysiology books. Note that the term is relevant for people interested in the topic, as evidenced by review mentions.

[Screenshot: Amazon search results for "advanced pathophysiology"]

Let's take a look at one more example:

TITLE: The Greatest Story Ever Told--So Far: Why Are We Here?

Again, this title matches numerous keyword-derived search queries, but we'll focus on one keyword: "heisenberg uncertainty", which refers to Werner Heisenberg's Uncertainty Principle. This keyword isn't mentioned anywhere in the book's metadata, but is mentioned several times by readers, including examples where "The Greatest Story Ever Told" helped readers better understand Heisenberg's Uncertainty Principle.

[Screenshot: The Greatest Story Ever Told--So Far product page]

 

The keyword "heisenberg uncertainty" is present in the book's keyword field at character position 1101, and the book is found in the search results among related titles.

[Screenshot: Amazon search results for "heisenberg uncertainty"]

A few other examples, from hundreds in our set:

  • Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business, and the World. Keyword with search match: "smart contracts", character position 1142.
  • The Perfect You: A Blueprint for Identity. Keyword with search match: "neuroscience", character position 1267 (actually matches 5 search queries about neuroscience).
  • Darkfever (Fever Series, Book 1). Keyword with search match: "male characters", character position 1095.
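A helper like the following reproduces the character-position checks quoted above. The keyword string here is invented for illustration; positions are 1-based, matching the figures in this post:

```python
# Invented keyword string; real fields in our tests run to 1000+ characters.
keywords = "nursing fundamentals; patient assessment; advanced pathophysiology"

def char_position(keyword_string: str, phrase: str) -> int:
    """1-based character position of a phrase, or -1 if absent."""
    idx = keyword_string.find(phrase)
    return idx + 1 if idx >= 0 else -1

print(char_position(keywords, "advanced pathophysiology"))  # 43
print(len(keywords))  # 66 total keyword characters
```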

What does this all mean? More keywords equals better search visibility, which equals more chances to sell (after all, most books are sold through Amazon search).

Keyword ROI: It’s Worth It

1000 characters is a lot of work, 1500 characters even more so, but compared to the rest of the effort and expense that goes into making a book a success, it’s a comparatively small investment for the potential upside - especially when you consider the compounding value better search visibility earns a book. If you have questions about how we conducted our tests, how you can replicate our results or to learn more about our keyword services, please get in touch via the Contact form.

Thank you to the folks at Firebrand (Catherine Toolan, Steve Rutberg and Joshua Tallent) for their help with this article.

Tags amazon search, ONIX, keywords, metadata

BISG 2017 Annual Meeting - Rights, Metadata and Marketing Panel

September 29, 2017 Chris Sim
[Photo: BISG 2017 Annual Meeting panel]

I participated in a panel on "Rights, Metadata and Marketing" at the recent BISG 2017 Annual Meeting, held at the Harvard Club in New York City. Here are my responses to the questions I was asked:

What is currently working for Kadaxis?

Our approach combines machine learning techniques with a deep knowledge of Amazon search. We take a digital marketing approach to keyword creation by understanding how readers search for books. As a result, our publisher clients are experiencing success by using our keywords, and have seen how the right keywords can directly lead to an increase in search visibility.

Creating keywords by hand isn't hard, creating keywords using algorithms also isn't particularly difficult, but creating keywords with an understanding of how a specific search engine (Amazon in our case) uses them can be challenging. This difference in platform optimization is key in creating keywords that have an impact.

What trends do you see in rights, metadata, and marketing?

A strong trend is the move away from the traditional gut-instinct approach to metadata curation and marketing, toward decisions backed by hard data. The most effective data engineering we've seen from publishers comes from those who iterate over metadata changes quickly, then use tools to measure the impact. As a result, their internal expertise and specialization increases, leading to significant improvements in online visibility and sales.

What's not working as well, or where would you like help? What are the persistent problems you think the industry needs to solve?

Keywords have seen a significant increase in visibility for publishers this year, which has meant a significant increase in queries and interest in our service. But as with any new solution or technique, it's human nature to look for a silver bullet to solve a problem (in this case to boost search visibility and sales). Part of our engagement process for new clients is to set expectations that creating impactful keywords requires time and focus. While we can scale retail SEO expertise to work with tens of thousands of books at once, not every book will be boosted equally. Metadata optimization works best when using tools such as audience-driven keyword analysis, but it can take several iterations of the process to find what works best. A single metadata change is almost never enough, on its own, for a meaningful increase in sales.

Platform optimization is key here also - many companies in the space have come and gone, making the same mistakes: extracting keywords from the content of the book, ignoring the audience, and not optimizing for a platform (for us, that platform is Amazon search). Creating keywords from a body of text isn't technically challenging, but creating keywords with a high probability of working on a specific platform is our goal, which can be at odds with a gut-instinct approach to keyword optimization.

We've also worked hard on helping to educate publishers on the importance of measuring keyword success - many publishers take the "set, forget and hope" approach to metadata optimization. Without a methodology in place to measure and understand how changes impact a book's performance, it's impossible to know if the changes made a difference or not, and how the process can be improved upon.

How can BISG help in these areas? What should we be thinking about for 2018 and beyond?

Publishers' strengths have always been in identifying content and creating a product that resonates with an audience. This is the traditional art, the skill that hasn't changed, and I don't think it will or needs to. What has changed is how people find books: search and recommendation engines, social media, deals, and so forth. Audiences are reached through these newer systems via data, and are the perfect target for data analysis techniques.

The better data a publisher has about an audience - the better it can target them.

Think about how other types of data might be shared and accessed and eventually monetized - beyond book metadata. The more you understand an audience, the more likely you are to reach them. Consider all the rich data about reader interests and behaviour that exists online - think book bloggers, email list owners, retailers, and so forth. If this data was captured and available to publishers in a standard format - data owners could monetize their data while publishers and other service providers could access powerful insight into audiences in a standard (potentially real-time) method at a low cost.

Creating a standard around how this data might be shared would be powerful and create significant value for audience data owners and consumers.
Ideally if such data were decentralized and made available on a blockchain, it could facilitate the next generation of market intelligence and discovery services in publishing.

If you could ask the companies represented by the people assembled here today for help in one area, what might that be?

Share with us how you're experimenting with data to understand who a readership is so we can understand and learn with you. Publishers often aren't given enough credit for being innovative. During my time in the industry, I've learnt an incredible amount from many smart publishers and always welcome the opportunity to understand how publishers are redefining marketing.
 

Tags bisg, metadata, marketing, blockchain

Machine Learning and Bestseller Prediction: More Than Words Can Say

September 7, 2017 Chris Sim

There’s been much recent conjecture on whether book sales can be predicted by text analysis alone. My company, Kadaxis, has dedicated the past few years to machine learning research and product development for the publishing industry. In our early days, we set out to build an algorithm to predict bestsellers, and tested it in the wild. In this post, I’ll share my perspectives on why the text alone isn’t enough.

If You Publish It, Will They Come?

To predict book sales, you need to account for the factors that influence book sales. The text of a book is core to the product, but many other factors, such as sales and marketing, influence whether a customer will discover and buy it. An algorithm predicting book sales using only the text as input will only work in a book market meritocracy, where the best-written books always sell the most copies.

Author platform (brand awareness) is one such non-text factor that influences sales, as in the following examples:

– The Cuckoo’s Calling hits the top of Amazon’s bestseller list only after Robert Galbraith is revealed to be J.K. Rowling.
– Amy Schumer’s memoir, The Girl with the Lower Back Tattoo, hits the New York Times bestseller list in its first week—an improbable feat without her strong personal brand.
– Dave Eggers publishes multiple books and receives several award nominations prior to releasing bestseller The Circle, and does so after having appeared on television numerous times.

Even amongst well-discovered books, the relationship between reader satisfaction and sales volume can be tenuous. Consider Harry Potter and the Cursed Child and Go Set a Watchman, books that have sold millions of copies each, but have achieved star ratings of 3.6 and 3.4 on Amazon, respectively—scores below the average indicator of satisfied readers.

Many other factors might also influence book sales, such as the editorial process, cover design, marketing budget, seasonal trends and book metadata. A machine, just like a human, needs to consider which of these factors will make a book sell more, to make an accurate prediction.

Machine Reading

Assume for a moment a linear relationship exists between reader satisfaction, discoverability and sales (i.e. the best written books are found the most often and sell the most copies). In this author’s utopia, we can reliably predict sales volume directly from a book’s text, as long as we can measure what’s important to readers. As products go, books are nuanced and complex, and the reasons why they resonate with us are also complex (compared to, say, a toothbrush). How do we uniformly distill the unique traits of a book into data?

This is, of course, where machine learning helps us. One approach, which is also the method used by the authors of the much-talked-about The Bestseller Code, is topic analysis (or latent Dirichlet allocation). This technique allows us to define a book in terms of how much of a topic it contains, such as “Homicide – 8.7 percent.”

If you’d like to see the data a topic model creates, you can view an example from our systems here (or upload your own book for analysis at authorcheckpoint.com). Topic modeling gives us a good snapshot of the content of a book, and allows us to make apples-to-apples comparisons between them. It is also useful data to use as input to training a predictive algorithm.
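As a rough illustration of what topic analysis produces, here is a minimal LDA sketch using scikit-learn (not the tooling behind The Bestseller Code or our systems; the "books" are one-line toy texts, and a real model would train on full manuscripts with far more topics):

```python
# Minimal latent Dirichlet allocation example: each book becomes a vector
# of topic proportions, enabling apples-to-apples comparisons.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

texts = [
    "detective murder homicide investigation clue suspect",
    "love romance wedding heart passion marriage",
    "murder suspect detective romance love clue",
]

vec = CountVectorizer()
X = vec.fit_transform(texts)  # word-count matrix: books x vocabulary

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # rows: books, columns: topic proportions

# Each row sums to ~1.0 - "how much" of each topic a book contains,
# in the spirit of "Homicide - 8.7 percent".
for i, dist in enumerate(doc_topics):
    print(f"book {i}: " + ", ".join(f"topic {j}: {p:.0%}" for j, p in enumerate(dist)))
```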

The Curse of Dimensionality

Our machine reader might define thousands of topics for each book we’re analyzing. While more data points might seem like a good thing, the more we add, the more books we need to read in order to make reliable predictions. If, for example, we had 2,500 different data points about a book, we’d likely need several tens of thousands of books to be confident our algorithm is accurate. Even 20,000 books (the data set used in The Bestseller Code) is likely far too few books, and puts us at risk of the curse of dimensionality.

(A quick tech side-bar: even with cross-validation we’re still likely overfitting our data, and hold-out is no guarantee against this, especially when using heavily unbalanced classes such as “bestsellers” for classification.)

Too many data points, and not enough books, means our algorithm will probably find patterns to say whatever we want them to say. The patterns exist in the data, but they aren’t representative of the real cause of what we’re trying to predict. In the world of black-box trading systems, this phenomenon is well-known.
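This failure mode is easy to reproduce. The sketch below uses purely synthetic noise, with dimensions mirroring the example above: 2,500 random "topic" features, far fewer "books", and coin-flip "bestseller" labels:

```python
# A classifier fit to noise looks brilliant in-sample and useless out of it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2500))   # 200 "books", 2,500 random "topic" features
y = rng.integers(0, 2, size=200)   # coin-flip "bestseller" labels

# Weak regularization, so high dimensionality's separating power shows through.
model = LogisticRegression(C=1e6, max_iter=2000).fit(X, y)
print(model.score(X, y))  # training accuracy near 1.0 - on pure noise

# The "patterns" vanish on fresh noise:
X_new = rng.normal(size=(100, 2500))
y_new = rng.integers(0, 2, size=100)
print(model.score(X_new, y_new))  # near 0.5, i.e. chance
```

With more features than samples, random labels are almost always linearly separable, so the model "explains" them perfectly while predicting nothing.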

So is there value in analyzing the intrinsic qualities of books such an algorithm might identify as selling well? It might be an interesting exercise, and the similarities the algorithm finds might make sense to a human observer. But you couldn’t reliably conclude that those similarities were the reason the books both sold well. In a contrived example, we might conclude that books with a red cover, 250+ pages in length and featuring a dog instead of a cat, will sell more copies than those without.

There is, of course, a simple way to prove the efficacy of any predictive model, and that is to apply it to new, unseen books before publication.

Predicting What’s Important

Even with access to enough books in our author’s utopia, we of course need a reliable metric to measure. Bestseller lists are a weak proxy for actual sales volume for many reasons, not least because they reflect “fast sellers,” meaning a book on a list may sell fewer copies overall, over time, than a book that isn’t.

But rather than searching for a magic formula to help move more copies of a book, a more valuable and attainable goal is to solve for reader satisfaction. By tying together data about the content of a book, with data capturing a reader’s reaction to it (beyond tracking where they stopped reading), we can begin to understand the true impact a book has on a particular audience and why. Armed with this insight, we can better match books to readers (recommendation systems) and books to markets.

This article originally appeared on the DBW blog September 28, 2016

 

Tags machine learning, publishers, publishing, metadata, bestsellers

How Do Keywords Impact Sales?

May 22, 2017 Chris Sim

The question I receive most often from publishers is: “How do keywords impact sales?” While adding keywords to book metadata is considered best-practice, publishing businesses are naturally more interested in whether the practice will increase revenue. Keywords in this context are ‘off-page’ keywords, which are sent to retailers in an ONIX feed or added to a book via KDP, Amazon’s dashboard for Kindle books. Keywords aren’t visible to customers, but are indexed directly by retailer search engines (such as on Amazon), and allow publishers and authors to influence how readers find their books online.

At Kadaxis, we’ve added keywords to thousands of books, on behalf of a wide variety of publishers, and while some titles have seen significant short-term sales improvement, in most cases, publishers observe an average overall increase across a portfolio of titles over time. In this post we’ll cover the relationship between search traffic and sales, and outline how the title selection component of a keyword strategy can have an impact.

Keywords Direct Online Shoppers To Books

When purchasing a book online, a customer can take many paths in a session of book browsing. A typical path might involve:

  • Typing a search query
  • Viewing a list of search results
  • Clicking on a book
  • Viewing the book’s product information
  • Making a purchase

Keywords can assist at the start of this flow, by helping books to appear in search results more frequently. But getting customers from search result to purchase is dependent on their previous exposure to the book, product information and other factors. Readers need to discover a book three times before they’re ready to buy, says Peter Hildick-Smith of the Codex Group, and ranking in search presents them with that option.
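The funnel above can be sketched as back-of-envelope arithmetic (all figures invented for illustration, not measured rates):

```python
# Each stage multiplies: keywords grow impressions, but the product page
# controls the rates further down the funnel. All rates are invented.
impressions = 10_000        # times the book appears in search results
click_rate = 0.02           # searchers who click through to the product page
conversion_rate = 0.05      # product-page viewers who buy

sales = impressions * click_rate * conversion_rate
print(sales)  # 10.0

# Doubling impressions (more keyword matches) doubles sales only if the
# page holds its conversion rate; halving conversion cancels the gain.
print(2 * impressions * click_rate * (conversion_rate / 2))  # 10.0
```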

But the final responsibility for selling the book sits with the book’s product page. The stronger this page is, the higher the likelihood of converting search traffic to sales. Factors include an appealing title and cover, a well-written description, and positive customer reviews. A book with a bland, wordy description and few (or negative) reviews is unlikely to yield much return from adding keywords.

Sales Leads to Discoverability Leads to Sales

From a publisher’s perspective, a keyword’s core utility is to direct search traffic to books in the hope of selling more copies. If excellent, reader-focused keywords are assigned to a book, these keywords will only serve their function if the book appears in the search results of customers searching for books by those keywords. If the book doesn’t rank for those keywords, they are of no value.

So how do you determine whether a book will rank for its assigned keywords? The best predictor is sales. We consistently see a correlation between sales and the number of keywords a book ranks for: higher selling books also rank higher in search results. Generally, the more a book sells, and the more recently those sales occurred, the more discoverable it will be.
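That correlation can be illustrated with a toy calculation (the figures below are invented, not our measured data): on a log scale, sales and the number of queries a book ranks for track each other closely.

```python
# Invented figures illustrating the sales/discoverability correlation
# described above; real data is noisier but points the same way.
import numpy as np

monthly_sales  = np.array([5, 20, 80, 300, 1200])  # copies sold
ranked_queries = np.array([3, 10, 25, 60, 140])    # queries the book ranks for

r = np.corrcoef(np.log(monthly_sales), np.log(ranked_queries))[0, 1]
print(f"log-log correlation: {r:.2f}")  # close to 1.0
```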

It can be insightful to examine the intent of different search providers when understanding how search works. Ecommerce retailers, such as Amazon, use search to sell products, whereas search companies, like Google, use search to help people find content. The focus on selling in retailer search can strongly influence how discoverable books become. (See also: How do Amazon and Google use my book metadata in search?).

For many reasons, products that have sold well in the past have a high chance of selling well in the future. Amazon exploits this phenomenon in search (and across their site), by boosting the visibility of higher selling books in an attempt to maximize sales. They understand that the odds of a sale are higher if a customer is presented with a popular item, so search results are reordered based on sales data (and other signals, such as page views and conversion rate). This means even the most well reasoned keywords might not have any impact for some books, but for others, they’re afforded the opportunity to rank for disproportionately more search queries.

Maximizing Return Through Title Selection

The myriad factors influencing search visibility, conversion and buyer sentiment, make it challenging to determine which books will benefit most from keywords. But since the endeavor is relatively low cost compared to rewriting jacket copy or updating a cover, and the possible return is high, the most prudent strategy to maximize ROI is to add keywords to a number of high potential titles.

Tying the concepts above together, this means selecting titles with:

  • A high chance of converting: books with good publisher-provided metadata (to assist customers in their buying decision) and customer-created reviews and ratings (social proof).
  • A high chance of ranking in search: typically books with a solid sales history, ideally performing above the competition, with recent sales valued more highly (or pre-promotion).

Recurring ROI

Titles that respond positively from keywords will experience increased sales over time, while maintaining search visibility and accumulating social proof, criteria which positively reinforce each other. But this can take time to build, and the rate of improvement varies for different genres, audiences, titles, and is heavily influenced by the prevailing zeitgeist of the moment. It’s not uncommon for titles to “tip” after several months of gradual improvement, which is why it’s best to adopt a medium to long-term outlook for any keyword strategy. But once the right keywords take effect, the return can persist long after the keywords were put in place.

As with most sound marketing strategies, keywords aren’t a silver bullet to an overnight improvement in sales. But when applied strategically across a quality catalog, they can significantly impact discoverability, leading to an ongoing recurring increase in sales over time.

This article originally appeared on the DBW blog May 22, 2017

Tags keywords, amazon keywords, amazon search, metadata, online sales, off-page keywords

Who Uses the Keywords in Metadata?

March 4, 2017 Chris Sim

We often hear that keywords are important to help readers find and discover books. But what does that mean, and do keywords actually make a difference? In this post, we look at how keywords are used to search book websites (in particular, online booksellers), and their adoption by publishers. For this investigation, I had help from Pat Payton (Bowker) and Catherine Toolan (Firebrand). We set out to answer the following questions:

• Are publishers adding keywords to book metadata?
• Are they providing quality keywords?
• Do online booksellers use keywords in their search engines?

In this post, “keywords” refer to consumer-oriented terms to describe a book that are added to an ONIX feed and sent to third parties. These terms aren’t seen by the public and are primarily used for search indexing. Conversely, web search engines (such as Google) don’t make use of ONIX keywords, but analyze the text of public webpages to create search indexes. As book content isn’t public, search providers rely on metadata to help consumers locate books.

Keywords Help Consumers Find Books

Most retailers solve the simple use cases of finding a book by title, author or category. Many searches, however, are comprised of natural language queries that describe different elements of a book, such as its setting, characters, theme or an emotional response to its content. Keywords were designed to fill this gap, by allowing people knowledgeable of the book to specify additional terms by which to find it.

Books are multi-dimensional, complex products that are typically highly nuanced and represent multiple buy trigger points for different types of consumers. Books have much more depth than, say, a kettle or a toothbrush, and determining the best keywords is therefore proportionally complex.

Note that extracting keywords from the book’s text is a naïve approach to solving this problem. The most effective keywords relate to a reader’s experience with a book, and the language she uses to describe it.

Are Publishers Adding Keywords to Their Books?

Bowker analyzed the keywords added to ONIX files from roughly 150,000 publishers, ranging from reprint and self-publishing service providers to university presses and trade, school and audio publishers. Of these publishers, about 23,000 (15.3 percent) had added keywords to at least one book. And of these, smaller publishers (fewer than 100 titles) typically had a higher percentage of keyword coverage than larger publishers.

Over the past 10 years, though, publishers have increased the number of titles with keywords from approximately 25,000 to approximately 114,000 (as of 2015). But this number is still a very small proportion of all books available.

How Sophisticated Are Publishers’ Efforts to Choose and Maintain Keywords?

While keywords have been part of the ONIX standard for many years, they rose markedly in importance around 2013. As publishers had whole backlists without keywords, obtaining coverage was (and still is) a resource-intensive task. In order to achieve high keyword coverage across a catalog, many publishers took a stopgap approach, copying other metadata into keywords (from title/subtitle, subject codes, contributors, product format, and audience) - data already available to search providers, and therefore unlikely to help with search visibility. To improve keyword quality and to recommend against practices such as keyword stuffing, the Book Industry Study Group (BISG) published the “Best Practices for Keywords in Metadata” in 2014 to guide publishers on choosing effective keywords.
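That kind of redundancy is easy to audit in code. A sketch (the book record is invented; a real audit would normalize punctuation and stem terms):

```python
# Flag keyword phrases made up entirely of terms retailers already index
# from other metadata fields - the stopgap practice described above.

def field_terms(book: dict, field: str) -> set:
    values = book.get(field, [])
    if isinstance(values, str):
        values = [values]
    terms = set()
    for value in values:
        terms |= set(value.lower().replace("/", " ").split())
    return terms

def redundant_keywords(book: dict) -> list:
    indexed = set()
    for field in ("title", "subtitle", "contributors", "subjects"):
        indexed |= field_terms(book, field)
    # Redundant: every term in the phrase is already indexed elsewhere.
    return [kw for kw in book["keywords"] if set(kw.lower().split()) <= indexed]

book = {
    "title": "The Quiet Garden",
    "contributors": ["Jane Doe"],
    "subjects": ["FICTION / Literary"],
    "keywords": ["quiet garden", "literary fiction", "small town secrets"],
}
print(redundant_keywords(book))  # ['quiet garden', 'literary fiction']
```

Only "small town secrets" adds anything a search engine couldn't already get from the title or subject codes.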

Keyword quality is still low today, though. One example from Bowker shows the use of the keyword “audiobook” (relating to form, not content) in just about 12,000 of the approximately 114,000 titles sampled from 2015.

Do Online Retailers Use Keywords?

Every book search implementation is proprietary, so the exact use of keywords is generally not public knowledge. It is possible, however, to determine whether keywords, when used as search queries, return the books they’re associated with in ONIX.

Kadaxis tested 13 websites that consume ONIX and provide book search, and found that only Amazon showed books returned in search results for keywords attributed to the book in ONIX.

Keywords are central to Amazon’s search capability across all its product lines. The site receives keywords of wildly varying quality from a huge number of product suppliers (from individuals to large companies), which means its capability for filtering, cleaning and incorporating keywords into a search index and mapping these to consumer search queries is sophisticated.

As the quality of keywords provided by publishers is generally low, it is a challenging endeavor for other websites, without this history and experience, to use the data as extensively.

Are Keywords Worth the Investment?

From the research above, Amazon is the only online bookseller making use of keywords today. If increasing sales of books on Amazon is important, then investing in keywords may be worthwhile. As most publishers aren’t adding keywords to their titles (and of those that are, the quality is typically low), there also appears to be a window of opportunity in which publishers can gain a ranking advantage in Amazon’s search by adding keywords to titles.

Conclusion

While some publishers (see here and here) are quietly providing effective, consumer-oriented keywords, most aren't investing significant resources. But doing so might represent a low-cost, low-risk investment with a potentially strong, recurring return, at least until a better solution emerges that takes the onus of keyword curation away from publishers and authors.

Additional thanks to Chris Saynor from OnixSuite.

This article originally appeared on the DBW blog March 4, 2016

Tags keywords, publishing, publishers, metadata

What are off-page keywords?

September 30, 2015 Chris Sim

In the world of publishing metadata, when we talk about keywords, we're talking about structured off-page keywords, often sent in an ONIX file from a publisher to a retailer like Amazon. The retailer indexes the keywords and matches them against customer search queries, in order to display relevant books. Keywords are phrases used to describe a book, and their purpose is to give a search engine clues about how to show a book to consumers. We call them "off-page" because the retailer uses them directly and doesn't display them to customers, as it does with other book metadata such as the title or description.
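To make that concrete, here's a minimal sketch of pulling keywords out of an ONIX 3.0 record. The fragment and keyword phrases below are invented, but ONIX does carry keywords in a Subject composite with scheme identifier 20; real files typically add namespaces and more elements, which this sketch omits:

```python
import xml.etree.ElementTree as ET

# Invented ONIX 3.0 fragment: keywords travel in a Subject composite
# with SubjectSchemeIdentifier 20 ("Keywords"), semicolon-separated.
onix = """
<Product>
  <DescriptiveDetail>
    <Subject>
      <SubjectSchemeIdentifier>20</SubjectSchemeIdentifier>
      <SubjectHeadingText>space opera; found family; slow burn romance</SubjectHeadingText>
    </Subject>
  </DescriptiveDetail>
</Product>
"""

root = ET.fromstring(onix)
keywords = []
for subject in root.iter("Subject"):
    if subject.findtext("SubjectSchemeIdentifier") == "20":
        text = subject.findtext("SubjectHeadingText", "")
        keywords.extend(k.strip() for k in text.split(";") if k.strip())

print(keywords)  # ['space opera', 'found family', 'slow burn romance']
```

A retailer ingesting this file would index those three phrases against the book, without ever showing them on the product page.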

Web search engines, such as Google, work out on their own what a piece of content such as a web page is about, and how people might search for it. Off-page keywords put this burden on publishers or authors, who face the complex task of anticipating how readers might search, and then how a search engine will use the keywords they provide.

A typical book search engine that reads ONIX will index various metadata fields - the title, author, categories and so forth - data whose primary purpose is to inform consumers about the book; it's public data. It needs to be appealing to readers, and also constructed in a way that a search engine can work with.

Conversely, the primary purpose of off-page keywords is to directly inform a search engine how to match a book against search queries. The intended audience is a machine, and the data is hidden from consumers - it is "off" the product "page". This private nature gives publishers a lot of freedom to test and experiment.

Here's an official, dry, textbook definition of keywords in publishing:
“Keywords are words or phrases to describe the theme or content of a book. They are assigned by the metadata creator to supplement title, author, description or other consumer facing data.”

While accurate, it leaves out the motivation behind why we use keywords at all.

On the surface, keywords are just a metadata element. But used properly, they can be a powerful discovery mechanism to capture a reader’s experience with a book, in a way that facilitates sharing that experience with others.

Creating effective keywords is an exercise in studying reader psychology and linguistics, requiring empathy and insight into how people communicate about books with each other. If you’re able to think and talk like your audience, you’re more likely to reach them.

Keywords are used to sell all kinds of products online, but creating them is probably toughest for publishers, as books are far more complex and subjectively experienced than other products, like toothbrushes or hair dryers. So figuring out which elements to express can be challenging.

How do search engines use keywords?

Search engines are just computer programs written to find information for us. We type a query, and the engine thumbs through large swathes of metadata to decide which books to display. The richer the metadata, the more search queries the book might match.

A book with only basic metadata (title, author and so forth) will show up in fewer search results than the same book with 50 or even 100 good keywords. Every keyword you add is an opportunity to widen the search funnel, suggesting to the search engine another way consumers can find your book.
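A toy inverted index makes this concrete. The books and keywords below are made up, but the mechanism - each keyword becomes another query a book can surface for - is the standard one retail search engines are built on:

```python
# Toy inverted index: each keyword a book carries becomes another
# query it can match. Book B has richer metadata than book A.
books = {
    "A": ["mystery"],
    "B": ["mystery", "cozy mystery", "small town sleuth", "amateur detective"],
}

index = {}
for book, keywords in books.items():
    for kw in keywords:
        index.setdefault(kw, set()).add(book)

# Book A surfaces for one query; book B surfaces for four.
print(sorted(index["mystery"]))                         # ['A', 'B']
print(index["small town sleuth"])                       # {'B'}
print(sum(1 for found in index.values() if "B" in found))  # 4
```

Real engines add ranking, stemming and query expansion on top, but the funnel-widening effect of extra keywords holds at every layer.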

Most books are sold online, and most people find books through search (per Amazon). If you can improve a book's visibility in search, you improve its likelihood of selling more copies. A recent study by Recode, a tech news website, found that more shoppers begin their product search on Amazon (55%) than on Google (28%).

Tags off-page keywords, metadata, amazon search, books, publishing

A Publisher's Advantage Over Indie Authors

May 23, 2015 Chris Sim


When it comes to book discovery and retail search, traditional publishers have two advantages over indie publishers.

More Keywords

The first is the ability to add more keywords to a book. Most independent authors will be able to add 5-7 keywords to their book's metadata. Each keyword (or phrase) provides an opportunity for the book to be matched with more search queries. The more search queries a book matches, the more times it will show up in search results, which of course means an increase in the potential customers who will see the book.

How is this possible?

Each online retailer (such as Amazon) accepts book metadata through different channels, and processes it for its search engine to use. Most independent authors add their data through a website (such as http://kdp.amazon.com). These websites are coded with specific rules about what data can be added and restrict, for example, how many keywords can be entered. (This is necessary to ensure a minimum level of quality in the data, which can impact search results.)

Enter ONIX

Publishers, on the other hand, typically send book metadata to retailers in bulk, using an industry file format called ONIX. Under the ONIX standard, the keyword field has no restriction on the number of keywords that can be added to a book.


Practically speaking, adding a very large number of keywords will yield diminishing discovery benefit, as each online retailer will only parse and process up to a maximum number of keywords.

But retailers almost certainly accept more than 7 keywords. The ONIX standard suggests up to 250 characters for the keyword field. The BISG working group dedicated to book keyword best practices, however, recommends using 500 characters. This working group comprises members from Amazon and Barnes & Noble, the consumers of this book metadata, who use it in their search engines (we covered this group in our last post). Given this recommendation, and the contributors behind it, it's unlikely book retailers would restrict keywords to fewer than 500 characters.

So how does this compare to the 5-7 keywords self-published authors are allowed? 500 characters equates to approximately 80 words, which is at least 27 keywords (assuming phrases of 3 words each - more if 2-word phrases are included). That's almost 4 times as many keywords.
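The back-of-envelope arithmetic can be sketched as follows; the six-characters-per-word figure (roughly five letters plus a separator) is a rough assumption, not a measured average:

```python
# Rough check of the 500-character budget: ~5 letters plus a
# separator per English word, 3-word keyword phrases.
CHAR_BUDGET = 500
chars_per_word = 6

words = CHAR_BUDGET // chars_per_word    # approximately 80 words
three_word_phrases = words // 3          # at least 27 keywords

print(words, three_word_phrases)  # 83 27
```

Shorter phrases push the keyword count higher still, which is why the 27 figure is a floor rather than a ceiling.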

Isn't having too many keywords bad for SEO?

Retail search engines process keywords from metadata in a structured manner (as opposed to web search engines, which extract keywords from content such as web pages), and are unlikely to be subject to keyword dilution - the idea that using more keywords reduces an individual keyword's value. Keyword dilution can be a problem for a web page, because the breadth of topics a web page covers is likely narrower than that of an entire book.

It makes sense to use many keywords to describe the many topics in a book.


Does all this equate to a discovery advantage?

It is highly likely. If you use one keyword, your book will be matched to related search queries about that one topic. If you use 20 or more keywords, your book has 20 or more opportunities to be matched against different types of search queries, significantly increasing the number of customers likely to see it.

Better Search Data

The second advantage, for publishers who sell on Amazon, is data to help with Amazon SEO. Publishers with a high enough sales volume will be invited to apply for access to ARA (Amazon Retail Analytics), which provides insight into the search queries customers use on Amazon. Core to any effective SEO strategy is the ability to evaluate different keywords (search terms) for search volume (this is what Google Analytics provides free). ARA provides this data, albeit in a slightly obfuscated value called 'Search Frequency', along with data about conversions. Access to this data improves the efficiency and accuracy of keyword selection, as it allows publishers to decide whether to apply a long- or short-tail keyword to a book (depending on its sales rank), and to assess which types of books convert best for each search term.


We can speculate about why these differences exist; they are likely rooted in the histories of traditional publishing and of self-publishing. In the early days of self-publishing, independent authors were much less sophisticated than they are today, as were the service providers who helped them publish. It's likely that disparities such as these, which provide considerable advantage to traditional publishers, will become less pronounced as the self-publishing industry matures.

Tags publishing, metadata, keywords, amazon, barnes and noble, onix, seo

Why Keywords Are So Important

May 21, 2015 Chris Sim


Crafting effective keywords to add to a book's metadata could be one of the highest-return marketing activities for increasing online sales potential. This post examines why keywords are so important, and how they affect discovery on Amazon.

Let's break the logic down:
• Amazon is the biggest bookseller in the world.
• Around two thirds of online book sales are made through Amazon.
• Search is how most customers find products on Amazon.
• Keywords directly influence a book's visibility in Amazon's product search.

In Amazon's own words (link requires a seller central login):

Search is the primary way that customers use to locate products on Amazon. Customers search by entering keywords, which are matched against the search terms you enter for a product. Well-chosen search terms increase a product's visibility and sales. The number of views for a product detail page can increase significantly by adding just one additional search term—if it's a relevant and compelling term.

We differentiate between keywords derived from web page text (Google, Bing, etc.) and keywords added to a book's metadata for consumption by a book retailer (Amazon, Barnes & Noble). Web search engines crawl web pages to derive keywords and concepts, to help users find information. Book product search engines consume book metadata (which includes keywords), provided by the publisher or author, and help customers find books to purchase.

As Google's executive chairman, Eric Schmidt lamented:

People don’t think of Amazon as search, but if you are looking for something to buy, you are more often than not looking for it on Amazon.

Why can't the machines just figure it all out?

So why the difference between a web and a book (product) search engine? Why can't Amazon read a book's text to figure out what to index, just like Google crawls a web page? There are two core reasons for this:

1. Human classification beats machine classification, when done properly. People are better than machines at describing books in terms other people relate to. The technology exists to understand the topical content of a book (we know, we've built it), but for a product search engine, it's more effective for Amazon to put the burden of describing a book in keywords onto the author or publisher. The author/publisher, in turn, has a strong incentive to increase their book's discoverability in search.

2. It's easier for Amazon to do. Pretend you're a technical superstar tasked with building a search engine for millions of books. Which solution would be easier to build? One where you index 5-20 human-curated keywords that describe each book, or one where you index tens (or hundreds) of thousands of words per book to work out what it's about? Leveraging an incentivised crowd to manually add descriptive terms in a structured format is a much smarter and technically simpler solution.

Isn't it a search engine, not a discovery engine?


It has been said that search is not discovery, but this perspective doesn't consider the complex task search engines undertake to discern user intent (we've talked about the different user intents when searching before). Let's look at the distinction between book discovery and book search (within the context of a search engine), and how different elements of metadata support different user intents:

Book Search


Searching for a specific book or title supports a customer who has 'discovered' a book through another channel, and is simply visiting a book retailer to purchase it. In this case, the user intent is obvious, and the implementation is a basic, nuts and bolts 'search' engine. As a publisher or author, you really don't have much to do to optimize for this use case. Your book title and author (contributor) name are specified in the metadata. The engine performs a simple match of these fields against a customer's search query. This is why there is no need to include the book title and author name in your keywords.

Book Discovery


Book discovery, in the context of a 'search' engine supports many cases of different user intent, where a customer isn't searching for a specific book. The engine helps the customer discover books that satisfy their query. For example, customers might use a book search engine to discover:
• a new book to read in their favorite genre ('contemporary romance new releases')
• a book to learn about a trending topic ('books about the islamic state')
• a book to solve a problem ('back pain')

The metadata that directly influences book discovery on Amazon search are keywords.

Cases exist where subtitles and category names impact discovery, but keywords are designed for, and have a direct relation to, book discovery. Other discovery mechanisms also exist, of course, such as bestseller lists and item-to-item similarity recommendations, but these are often outside the control of an author/publisher.

Codifying how customers think about books


Amazon categories are influenced by the way customers naturally group books together, and how they express these categorizations when searching for books. Book categories are continually refined to adapt to shifts in customers' tastes and collective interests. Books are categorized by manually curated metadata (BISAC codes or Browse Nodes - Amazon's equivalent of categories), as well as by analyzing a book's keywords. Many categories require a book to be associated with certain keywords in order to qualify. Analyzing the Science Fiction & Fantasy category requirements, we'll see keywords such as: angels, demons, dragons, vampire, aliens, horror and magic. These are all broad book discovery terms, designed to satisfy users looking for books by search terms other than title and author.
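A category-qualification rule of this kind can be imagined as a simple set test. The check below is a guess at how such a rule might work - Amazon's actual logic isn't public - using the example terms above and an invented book:

```python
# Hypothetical qualification check: a book qualifies for a browse
# category if its keywords contain at least one required term.
required_for_category = {
    "angels", "demons", "dragons", "vampire", "aliens", "horror", "magic",
}

book_keywords = {"dragons", "epic fantasy", "coming of age"}

qualifies = bool(required_for_category & book_keywords)
print(qualifies)  # True
```

Under a rule like this, a fantasy novel missing all of the trigger terms from its keywords could fail to surface in the category at all, however well it fits editorially.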

There is a clear link between how customers mentally label and group books, and how they express their intent when trying to find books. Amazon attempts to replicate this organization via its search engine and associated categorical data. By using the language and terms customers actively use to search for books, it can more accurately answer book queries at scale.

The bulk of the complexity of a successful book search engine lies not in basic title/author matching, but in deciphering a user's intent when broad terms are used for discovery. Helping a customer find and purchase a book when they're unsure exactly what they're looking for is big business.

After all, the 'search experience team' believes it's about "finding, not searching".

Do readers even discover new books through search?

Unless you have access to internal search query and purchase data from a major online retailer, it's not possible to make an absolute assertion one way or the other. So let's consider some visible signals:

The industry believes so

The BISG has created a working group dedicated purely to defining best practices for keywords in book metadata. These keywords (in almost all cases) are curated by a person, to be stored with the rest of the book's metadata, and used by retailers (such as Amazon and Barnes & Noble) to help consumers find books. These are not the keywords that web search engines, such as Google, extract from the content of book descriptions on product pages.

This working group has published a guide for publishers to use when defining keywords, which is available for download (via free registration). A summary, that doesn't require registration, is also available.

The group comprises members from all the publishing service provider heavyweights (Ingram, Bowker, etc.), all big five publishers (plus many others), Library of Congress, Barnes & Noble and also Amazon.

Most large publishers have also allocated in-house resources (of varying expertise) specifically to curating keywords for their books.

Amazon has invested heavily in Search and Sales Business Intelligence

Access to this data is only available to a small number of organizations that sell a lot online, through a product called Amazon Retail Analytics (ARA). Its goal is to help vendors optimize their product listings to sell more, largely through data optimization for search. Here's a screenshot.

ARA provides publishers with data on how often keywords are searched for (volume), click-through rates and conversion rates. It has its limits, but is far more information than most smaller publishers and independent authors have access to.

When considering the investment and focus the publishing industry has dedicated to keywords, which are created for the sole purpose of helping consumers find books - it's challenging to dismiss the vital role they perform in selling books online.

A sales panacea?

Will the perfect keywords alone magically whisk a book to the bestsellers list? No. The fundamentals need to be executed well, which results in a quality, professional product with market demand. Quality can't be faked over the long term, and short term hacks won't lead to sustainable ongoing sales.

Effective keywords increase a book's chance of being found by the right customer, and help augment success achieved through other marketing channels. While keywords can increase a book's exposure, whether customers are discovering a quality book will ultimately be reflected in unit sales and reviews.

Conclusion

We've analyzed how keywords work and why they're important - which is to help sell more books in the marketplace where most books are sold. The industry acknowledges the importance of this correlation, as evidenced by its focus and investment in keyword standardization and dedication of resources (at publisher and retailer level). Yet most authors and publishers don't create effective keywords for their books or update them very often. Compared to the effort and resources involved in publishing a title, a well-implemented keyword strategy can be one of the highest ROI marketing activities for a book. In many cases, this represents a strong, currently missed, opportunity for increased book discovery.

Sign-up for Author Checkpoint and find keywords for any book.

Tags amazon, keywords, book discovery, metadata
