March 25, 2023

How Airbnb leverages ML to derive guest interest from unstructured text data and provide personalized recommendations to Hosts

By: Joy Jing and Jing Xia

At Airbnb, we endeavor to build a world where anyone can belong anywhere. We strive to understand what our guests care about and match them with Hosts who can provide what they’re looking for. What better source for guest preferences than the guests themselves?

We built a system called the Attribute Prioritization System (APS) to listen to our guests’ needs in a home: What are they requesting in messages to Hosts? What are they commenting on in reviews? What are common requests when calling customer support? And how does it differ by the home’s location, property type, price, as well as guests’ travel needs?

With this personalized understanding of what home amenities, facilities, and location features (i.e. “home attributes”) matter most to our guests, we advise Hosts on which home attributes to acquire, merchandize, and verify. We can also display to guests the home attributes that are most relevant to their destination and needs.

We do this through a scalable, platformized, and data-driven engineering system. This blog post describes the science and engineering behind the system.

What do guests care about?

First, to determine what matters most to our guests in a home, we look at what guests request, comment on, and contact customer support about the most. Are they asking a Host whether they have wifi, free parking, a private hot tub, or access to the beach?

To parse this unstructured data at scale, Airbnb built LATEX (Listing ATtribute EXtraction), a machine learning system that can extract home attributes from unstructured text data like guest messages and reviews, customer support tickets, and listing descriptions. LATEX accomplishes this in two steps:

The named entity recognition (NER) module uses textCNN (convolutional neural network for text) and is trained and fine-tuned on human-labeled text data from various data sources within Airbnb. In the training dataset, we label each phrase that falls into one of the following five categories: Amenity, Activity, Event, Specific POI (e.g. “Lake Tahoe”), or generic POI (e.g. “post office”).

The entity mapping module uses an unsupervised learning approach to map these phrases to home attributes. To achieve this, we compute the cosine distance between the candidate phrase and the attribute label in the fine-tuned word embedding space. We consider the closest mapping to be the referenced attribute, and can calculate a confidence score for the mapping.
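To make the mapping step concrete, here is a minimal sketch of nearest-attribute lookup by cosine similarity (maximizing similarity is equivalent to minimizing cosine distance). The three-dimensional vectors, the attribute list, and the function names are made up for illustration, and using the similarity itself as the confidence score is an assumption; the real system works in a fine-tuned word embedding space that is not described in detail here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_phrase_to_attribute(phrase_vec, attribute_vecs):
    """Return (attribute, confidence) for the attribute label closest to the phrase.

    Assumption: the cosine similarity of the best match doubles as the confidence score.
    """
    best_attr, best_sim = None, -1.0
    for attr, vec in attribute_vecs.items():
        sim = cosine_similarity(phrase_vec, vec)
        if sim > best_sim:
            best_attr, best_sim = attr, sim
    return best_attr, best_sim


# Hypothetical embeddings for three attribute labels.
ATTRIBUTE_VECS = {
    "hot tub": [0.9, 0.1, 0.0],
    "free parking": [0.0, 0.8, 0.6],
    "wifi": [0.1, 0.0, 0.95],
}

# Hypothetical embedding of a phrase extracted by the NER step, e.g. "jacuzzi".
attribute, confidence = map_phrase_to_attribute([0.8, 0.2, 0.1], ATTRIBUTE_VECS)
print(attribute, round(confidence, 3))
```

In practice a low best-match similarity could be used to flag phrases that do not correspond to any supported attribute yet.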

We then calculate how frequently an entity is referenced in each text source (i.e. messages, reviews, customer service tickets), and aggregate the normalized frequency across text sources. Home attributes with many mentions are considered more important.
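The aggregation can be sketched roughly as follows. This is an illustration under stated assumptions, not the production logic: each source is normalized so that its attribute frequencies sum to 1, and sources are then averaged with equal weights. The source names and mention lists are invented.

```python
from collections import Counter


def normalized_frequencies(mentions):
    """Share of mentions per attribute within one text source."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()} if total else {}


def aggregate_across_sources(mentions_by_source):
    """Average each attribute's normalized frequency over all sources.

    Assumption: every source carries equal weight; an attribute that is
    absent from a source contributes 0 for that source.
    """
    per_source = [normalized_frequencies(m) for m in mentions_by_source.values()]
    attributes = set().union(*per_source)
    return {
        attr: sum(freqs.get(attr, 0.0) for freqs in per_source) / len(per_source)
        for attr in attributes
    }


# Hypothetical attribute mentions already extracted from each source.
mentions = {
    "messages": ["wifi", "wifi", "parking", "hot tub"],
    "reviews": ["hot tub", "hot tub", "wifi", "parking"],
    "support_tickets": ["wifi", "parking"],
}

scores = aggregate_across_sources(mentions)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Normalizing within each source first keeps a high-volume source, such as messages, from drowning out lower-volume ones.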

With this system, we’re able to gain insight into what guests are interested in, even highlighting new entities that we may not yet support. The scalable engineering system also allows us to improve the model by onboarding additional data sources and languages.

An example of a listing’s description with keywords highlighted and labeled by the Latex NER model.

What do guests care about for different types of homes?

What guests look for in a mountain cabin is different from what they look for in an urban apartment. Gaining a more complete understanding of guests’ needs in an Airbnb home enables us to provide more personalized guidance to Hosts.

To achieve this, we calculate a unique ranking of attributes for each home. Based on the characteristics of a home (location, property type, capacity, luxury level, etc.), we predict how frequently each attribute will be mentioned in messages, reviews, and customer service tickets. We then use these predicted frequencies to calculate a customized importance score that is used to rank all possible attributes of a home.

For example, let us consider a mountain cabin that can host six people with an average daily price of $50. In determining what is most important for potential guests, we learn from what is most talked about for other homes that share these same characteristics. The result: hot tub, fire pit, lake view, mountain view, grill, and kayak. In contrast, the most important attributes for an urban apartment are parking, restaurants, grocery stores, and subway stations.

Image: An example image of a mountain cabin home
An example of home attributes ranked for a mountain cabin vs an urban apartment.
Image: An example of an urban apartment home

We could directly aggregate the frequency of keyword usage among similar homes. But this approach would run into issues at scale: the cardinality of our home segments could grow exponentially large, with sparse data in very unique segments. Instead, we built an inference model that uses the raw keyword frequency data to infer the expected frequency for a segment. This inference approach remains scalable as we use finer and more dimensions to characterize our homes. This allows us to help our Hosts best highlight their unique and diverse collection of homes.
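The form of the inference model is not specified here, so the following is only a sketch of one simple way to infer a frequency for a sparse or unseen segment. It shrinks the segment's raw mention rate toward a prior built from the rates observed for each individual characteristic. The class name, the `prior_strength` parameter, and the sample records are all hypothetical.

```python
from collections import defaultdict


class SegmentFrequencyModel:
    """Shrinkage estimate of how often one attribute is mentioned in a segment.

    Not the production model: a stand-in showing how sparse segments can
    borrow strength from per-characteristic (marginal) rates.
    """

    def __init__(self, prior_strength=50.0):
        # Pseudo-count of texts given to the prior; must be > 0.
        self.prior_strength = prior_strength
        self.marginal = defaultdict(lambda: [0, 0])  # (dim, value) -> [mentions, texts]
        self.segment = defaultdict(lambda: [0, 0])   # full segment -> [mentions, texts]
        self.total = [0, 0]

    def fit(self, records):
        """records: iterable of (features dict, mention count, text count)."""
        for features, mentions, texts in records:
            keys = list(features.items())
            for store, key in [(self.marginal, k) for k in keys] + [
                (self.segment, frozenset(keys))
            ]:
                store[key][0] += mentions
                store[key][1] += texts
            self.total[0] += mentions
            self.total[1] += texts
        return self

    def global_rate(self):
        return self.total[0] / self.total[1] if self.total[1] else 0.0

    def predict(self, features):
        """Expected mention rate for a segment, seen or unseen."""
        rates = []
        for item in features.items():
            m, t = self.marginal.get(item, (0, 0))
            rates.append(m / t if t else self.global_rate())
        prior = sum(rates) / len(rates) if rates else self.global_rate()
        m, t = self.segment.get(frozenset(features.items()), (0, 0))
        k = self.prior_strength
        return (m + k * prior) / (t + k)


# Hypothetical "hot tub" mention counts: (segment, mentions, texts analyzed).
records = [
    ({"type": "cabin", "location": "mountain"}, 30, 100),
    ({"type": "cabin", "location": "lake"}, 20, 100),
    ({"type": "apartment", "location": "urban"}, 2, 200),
]
model = SegmentFrequencyModel().fit(records)

seen = model.predict({"type": "cabin", "location": "mountain"})
unseen = model.predict({"type": "apartment", "location": "mountain"})
print(round(seen, 3), round(unseen, 3))
```

A segment with plenty of data stays close to its own observed rate, while a segment with no data at all still receives an estimate from its individual characteristics.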

How can guests’ preferences help Hosts improve?

Now that we have a granular understanding of what guests want, we can help Hosts showcase what guests are looking for by recommending which home attributes to acquire, merchandize, and clarify.

But to make these recommendations relevant, it’s not enough to know what guests want. We also need to be sure about what’s already in the home. This turns out to be trickier than asking the Host, because of the 800+ home attributes we collect. Most Hosts aren’t able to immediately and accurately add all of the attributes their home has, especially since amenities like a crib mean different things to different people. To fill in some of the gaps, we leverage guests’ feedback on amenities and facilities they have seen or used. In addition, some home attributes are available from trustworthy third parties, such as real estate or geolocation databases that can provide square footage, bedroom count, or whether the home overlooks a lake or beach. We’re able to build a truly complete picture of a home by leveraging data from our Hosts, guests, and trustworthy third parties.

We utilize several different models, including a Bayesian inference model that increases in confidence as more guests confirm that the home has an attribute. We also leverage a supervised neural network WiDeText machine learning model that uses features about the home to predict the likelihood that the next guest will confirm the attribute’s existence.
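The Bayesian idea can be illustrated with a Beta-Bernoulli update, a standard textbook model and an assumption on our part rather than a description of the production system. Each guest confirmation or denial updates a Beta distribution over the probability that the home has the attribute, and the variance shrinks as evidence accumulates. The uniform Beta(1, 1) prior and the feedback sequence are made up.

```python
from dataclasses import dataclass


@dataclass
class AttributeBelief:
    """Beta(alpha, beta) belief that a home has a given attribute."""

    alpha: float = 1.0  # prior pseudo-count of confirmations
    beta: float = 1.0   # prior pseudo-count of denials

    def update(self, confirmed: bool) -> None:
        """Fold in one guest response."""
        if confirmed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        """Posterior probability that the attribute exists."""
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        """Posterior variance; smaller means more confident."""
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))


belief = AttributeBelief()
prior_variance = belief.variance

# Hypothetical guest feedback: eight confirmations and one denial.
for confirmed in [True, True, True, False, True, True, True, True, True]:
    belief.update(confirmed)

print(round(belief.mean, 3), belief.variance < prior_variance)
```

A prior other than Beta(1, 1) could encode what is already known, for example whether the Host listed the attribute.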

Together with our estimate of how important certain home attributes are for a home, and the likelihood that the home attribute already exists or needs clarification, we’re able to give personalized and relevant recommendations to Hosts on what to acquire, merchandize, and clarify when promoting their home on Airbnb.
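One way to picture how the two signals combine is a simple decision rule, sketched below. The thresholds, the function name, and the sample scores are hypothetical; the actual recommendation logic is not described in this post.

```python
def recommend_action(importance, p_exists, min_importance=0.3, likely=0.8, unlikely=0.2):
    """Suggest a Host action for one attribute, or None if it is not a priority.

    importance: importance score of the attribute for this home (0 to 1).
    p_exists:   estimated probability that the home already has the attribute.
    All thresholds are illustrative assumptions.
    """
    if importance < min_importance:
        return None            # guests rarely care about it for this kind of home
    if p_exists >= likely:
        return "merchandize"   # the home very likely has it: highlight it
    if p_exists <= unlikely:
        return "acquire"       # the home very likely lacks it: consider adding it
    return "clarify"           # uncertain: ask the Host to confirm the details


# Hypothetical (importance, probability of existence) pairs for a mountain cabin.
candidates = {
    "hot tub": (0.9, 0.95),
    "fire pit": (0.7, 0.05),
    "pool": (0.6, 0.50),
    "subway station": (0.05, 0.0),
}

recommendations = {
    attr: recommend_action(importance, p_exists)
    for attr, (importance, p_exists) in candidates.items()
}
print(recommendations)
```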

Cards shown to Hosts to better promote their listings.

What’s next?

This is the first time we’ve known what attributes our guests want down to the home level. What’s important varies greatly based on home location and trip type.

This full-stack prioritization system has allowed us to give more relevant and personalized advice to Hosts, to merchandize what guests are looking for, and to accurately represent popular and contentious attributes. When Hosts accurately describe their homes and highlight what guests care about, guests can find their perfect vacation home more easily.

We’re currently experimenting with highlighting amenities that are most important for each type of home (e.g. kayak for mountain cabin, parking for urban apartment) on the home’s product description page. We believe we can leverage the knowledge gained to improve search and to determine which home attributes are most important for different categories of homes.

On the Host side, we’re expanding this prioritization methodology to encompass more tips and insights into how Hosts can make their listings even more desirable. This includes actions like freeing up popular nights, offering discounts, and adjusting settings. By leveraging unstructured text data to help guests connect with their perfect Host and home, we hope to foster a world where anyone can belong anywhere.

If this type of work interests you, check out some of our related positions at Careers at Airbnb!

It takes a village to build such a robust full-stack platform. Special thanks to (alphabetical by last name) Usman Abbasi, Dean Chen, Guillaume Guy, Noah Hendrix, Hongwei Li, Xiao Li, Sara Liu, Qianru Ma, Dan Nguyen, Martin Nguyen, Brennan Polley, Federico Ponte, Jose Rodriguez, Peng Wang, Rongru Yan, Meng Yu, Lu Zhang for their contributions, dedication, expertise, and thoughtfulness!