US9582766B2 - Clustering query refinements by inferred user intent - Google Patents
Clustering query refinements by inferred user intent Download PDFInfo
- Publication number
- US9582766B2 US9582766B2 US15/075,957 US201615075957A US9582766B2 US 9582766 B2 US9582766 B2 US 9582766B2 US 201615075957 A US201615075957 A US 201615075957A US 9582766 B2 US9582766 B2 US 9582766B2
- Authority
- US
- United States
- Prior art keywords
- query
- refinement
- node
- search
- refinements
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 claims abstract description 53
- 238000000638 solvent extraction Methods 0.000 claims abstract description 9
- 230000007704 transition Effects 0.000 claims description 56
- 239000013598 vector Substances 0.000 claims description 28
- 239000011159 matrix material Substances 0.000 claims description 18
- 230000004044 response Effects 0.000 claims description 15
- 238000012545 processing Methods 0.000 claims description 11
- 238000004590 computer program Methods 0.000 abstract description 15
- 238000004422 calculation algorithm Methods 0.000 description 35
- 238000009826 distribution Methods 0.000 description 22
- 238000005295 random walk Methods 0.000 description 21
- 230000009471 action Effects 0.000 description 12
- 230000006399 behavior Effects 0.000 description 11
- 238000013459 approach Methods 0.000 description 8
- 230000008569 process Effects 0.000 description 8
- YMHOBZXQZVXHBM-UHFFFAOYSA-N 2,5-dimethoxy-4-bromophenethylamine Chemical compound COC1=CC(CCN)=C(OC)C=C1Br YMHOBZXQZVXHBM-UHFFFAOYSA-N 0.000 description 7
- 241000282372 Panthera onca Species 0.000 description 7
- 241001323321 Pluto Species 0.000 description 7
- 241000545067 Venus Species 0.000 description 7
- 230000008901 benefit Effects 0.000 description 6
- 238000009472 formulation Methods 0.000 description 6
- 239000000203 mixture Substances 0.000 description 6
- 238000004891 communication Methods 0.000 description 5
- 230000001143 conditioned effect Effects 0.000 description 5
- 235000009508 confectionery Nutrition 0.000 description 5
- 238000010276 construction Methods 0.000 description 4
- 230000001419 dependent effect Effects 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 229940050561 matrix product Drugs 0.000 description 4
- 230000001052 transient effect Effects 0.000 description 4
- 238000010521 absorption reaction Methods 0.000 description 3
- 235000019219 chocolate Nutrition 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 241001061257 Emmelichthyidae Species 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 2
- 206010033307 Overweight Diseases 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- QSHDDOUJBYECFT-UHFFFAOYSA-N mercury Chemical compound [Hg] QSHDDOUJBYECFT-UHFFFAOYSA-N 0.000 description 2
- 229910052753 mercury Inorganic materials 0.000 description 2
- 238000005065 mining Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000005192 partition Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000013515 script Methods 0.000 description 2
- 238000000926 separation method Methods 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 229910052729 chemical element Inorganic materials 0.000 description 1
- 238000007621 cluster analysis Methods 0.000 description 1
- 230000001427 coherent effect Effects 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000009193 crawling Effects 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 238000011524 similarity measure Methods 0.000 description 1
- 239000000758 substrate Substances 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
-
- G06N7/005—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
-
- G06F17/30389—
-
- G06F17/30463—
-
- G06F17/30598—
-
- G06F17/30958—
-
- G06F17/30979—
Definitions
- the subject matter of this specification relates generally to search systems.
- Web search engines today often complement the search results with a list of related search queries. For example, given the query “mars,” a search engine can return the related queries “mars god of war,” “mars planet,” “venus,” “jupiter,” etc. These related search queries help users to find and explore information related to the original query. Furthermore, because users often provide short queries with little or no context, related queries allow users to further specify their information needs. For example, by clicking on “mars god of war,” a user signals interest in the Roman god as opposed to the planet Mars.
- Related queries are typically mined from the query logs by finding other queries that co-occur in sessions with the original query. Specifically, query refinements, a particular kind of related queries, are obtained by finding queries that are most likely to follow the original query is a user session. For many popular queries, there may be hundreds of related queries mined from the logs using this method. However, given the limited available space on a search results page, search engines typically only choose to display a few of the related queries.
- This specification describes technologies relating to clustering of query refinements of a user search query.
- One of the goals of the technologies described in this specification is to group refinements of a search query into clusters that are likely to represent distinct information needs.
- the clusters computed by the presently proposed algorithm in this specification can be used to improve the selection and placement of the query suggestions presently proposed by a search engine, and can also serve to summarize the different aspects of information relevant to the original user query.
- the problem of clustering query refinements is defined as a graph clustering problem.
- the graph captures the users' behavior with transitions between pairs of queries and queries to documents.
- the graph effectively incorporates both content based similarity as well as session co-occurrence similarity mined from the query logs.
- the graph has a natural probabilistic interpretation as a Markov model.
- the presently proposed algorithm clusters refinements based on their likely underlying user intents by combining document click and session co-occurrence information. Principally, the presently proposed algorithm operates by performing multiple random walks on a Markov graph that approximates user search behavior. The random walks lead to a set of most likely visited documents for each query and the algorithm then clusters queries based on this information.
- the presently proposed algorithm relies on complete-link clustering, but the presently proposed model and algorithm are generally flexible to utilize many different types of clustering techniques.
- a computer-implemented method comprising: identifying a plurality of refinements of a first search query, each refinement being a search query that follows the first search query in at least one session of queries submitted to a search system; identifying a document set of each of the refinements, the document set of a refinement being documents that are search results presented in response to the refinement by the search system and that have received user selections while being presented as the search results; building a representation of a graph for the first search query, wherein the graph has a node for the first search query, a node for each of the refinements, and a node for each document in the document sets of the refinements, and wherein the graph has edges from the first search query node to each of the refinement nodes, edges from the first search query node to each document in the respective document set of the first search query, edges from each refinement to each document in the respective document set of the refinement, and
- the methods further include the actions of: receiving the first search query as a query in a search session; and providing, in a response to the first search query, each of one or more of the refinement clusters as a search suggestion.
- each search suggestion is provided as a selectable user interface element on a graphic user interface.
- each search suggestion is provided as a selectable hyperlink having anchor text matching one of the refinements in the refinement cluster of the search suggestion.
- identifying a plurality of refinements R(q) of a first search query q each refinement r ⁇ R(q) being a search query that follows the first search query q in at least one session of queries submitted to a search system
- identifying a document set D(r) of each of the refinements r the document set of a refinement being the documents d that are search results presented in response to the refinement by the search system and that have received user selection while being presented as the search results
- building a representation of a graph G for the first search query q wherein the graph G has a node for the first search query q, a node for each of the refinements r, and a node for each document d in the document sets of the refinements, and wherein the graph G has edges from the first search query node q to each of the refinement nodes r, edges from the first search query q to each document in
- the methods further include the action of building a transition probability matrix P for the graph G that includes the following elements:
- n s (r i , r j ) is the number of sessions in which r i and r j co-occur
- the methods further include the action of: calculating a visit probability vector for each refinement in the plurality of refinements R(q) from the transition probability matrix P, where each vector has elements representing a probability for each document in the document set D(q) and the document sets of the refinements R(q); and clustering the refinements into refinement clusters by partitioning the visit probability vectors into proper sub sets.
- identifying a plurality of refinements R(q) of a first search query q each refinement r ⁇ R(q) being a search query that follows the first query q in at least one session of queries submitted to a search system; identifying a document set D(r) of each of the refinements r, the document set of a refinement being the documents d that have been presented as search results in response to the refinement by the search system and that have received user selections while being presented as the search results; building a representation of a graph G for the first search query q, wherein the graph G has a node for the first search query q, a node for each of the refinements r, a node for each document d in the document sets of the refinements, and an off-topic node for an off-topic state f, and wherein the graph G has edges from the first search query node q to each of the refinement no
- the methods further include the action of building a transition probability matrix P for the graph G that includes the following elements:
- the methods further include the actions of: calculating a visit probability vector for each refinement in the plurality of refinements R(q) from the transition probability matrix P, where each vector has elements representing a probability for each document in the document set D(q) and the document sets of the refinements R(q); and clustering the refinements into refinement clusters by partitioning the visit probability vectors into proper sub sets.
- a more diverse set of query refinements can be selected using the techniques presently proposed in this specification as compared to the query refinements selected using the conventional techniques referred to in this specification.
- Conventional solutions to selecting related queries rely more on frequency than on diversity.
- the presently proposed clustering method as described in this specification can produce results indicating that for “mars,” while the most popular cluster pertains to names of planets, the second most popular cluster pertains to the Mars chocolate bar, followed by clusters that pertain to facts about the planet Mars, then the Mars rovers, and so on.
- a related query from each cluster can be selected and presented in a search result page.
- a search engine that relies on query frequency in selecting query refinements would not likely present any related queries in the second cluster (e.g., “mars candy” or “mars chocolate bar”) in the top refinements for the query due to the queries' relatively low query frequency as compared to the related queries in the first cluster.
- clustering can be used to improve the placement of related queries on the search results page.
- Related queries are often placed in rows or columns.
- users can better understand the significance and meaning of the query refinements that are presented.
- Such cluster-aware layouts of query refinements have the potential for increasing the number of related queries that can be displayed as the layouts will appear less cluttered and less likely to pose an information overload for users.
- related-query suggestions or query refinements can be improved across user sessions. For example, if a user poses the query “Pluto” after “mars,” it is more likely that the user is interested in the solar system rather than the Disney character Pluto the Dog. Hence, it makes more sense to propose related searches for “pluto” that pertain to planets or facts about planets, rather than Disney characters.
- the clusters provide a summary of the possible diverse interests and information needs that people may have about a given query (as expressed by the queries they pose). For example, our cluster analysis indicates that for “mars,” there are large distinct clusters of user queries about planets in the Solar System and Mars the Roman god, and in addition, there are clusters of refinements about Mars candy bar and a Japanese comic strip. About the planet itself, there are distinct non-trivial clusters about facts about the planet, about the rovers sent by NASA, and about the speculation about life and water on the planet. Such summaries of the information relevant to a query can form the basis for other search-result interfaces, such as mashups that provide topics summaries.
- the presently proposed algorithms can be effectively used to improve query suggestions over conventional approaches that are based on only session or only document click information.
- FIG. 1A illustrates an example graph that models user search behavior.
- FIG. 1B illustrates an example partitioning of the graph illustrated in FIG. 1A .
- FIG. 2 illustrates an example of normalizing transition probabilities.
- FIG. 3 illustrates example random walks through a graph.
- FIG. 4 illustrates adding a transition to an absorbing off-topic state in the graph illustrated in FIG. 1A .
- FIG. 5 illustrates an example search system.
- FIG. 6 illustrates an example method for presenting search suggestions in response to a query.
- FIG. 7 illustrates an example graphical user interface that presents search results and search suggestions in response to a query.
- the two queries “venus” and “jupiter” are unlikely to retrieve any search results in common with each other or with the query “mars.” Because the queries “venus,” “jupiter,” and “mars” have few search results in common, the common search results that have also received user selections (e.g., user clicks) are even rarer. Consequently, these three queries are not likely to be clustered together based on the above basic query clustering techniques, even though they do reflect the same user intent of researching planets.
- a second approach to query clustering would be to use query session logs to cluster query refinements. Specifically, two refinements could be clustered together if they co-occur in sessions that have similar queries.
- This approach is pursued by B. M. Fonseca, P. Golgher, B. Pôssas, B. Ribeiro-Neto, and N. Ziviani, Concept - based interactive query expansion ( in CIKM ' 05 : Proceedings of the 14 th ACM international conference on information and knowledge management , pages 696-703, New York, N.Y., 2005), where given an initial query q, related queries q′ co-occurring in sessions are grouped into clusters. Users are then asked to specify the cluster that best describes their information need.
- This approach is effective for clustering queries that are unrelated content-wise, such as “venus” and “jupiter.”
- session logs can be sparse (in comparison to document clicks), especially for infrequent queries, making it hard to infer statistically significant query relationships.
- the graph includes nodes for related queries and nodes for documents that users navigate to from these queries.
- the graph captures the users' behavior as a Markov model over possible transitions between pairs of queries and between queries and documents.
- clusters in the graph can capture the different users' intent that is relevant to a particular query.
- the graph captures the intuition of content-based similarity with edges from queries to documents, and also captures similarity based on session co-occurrence with edges connecting pairs of queries.
- click-through information e.g., recorded user selections of search results
- session information are combined in a coherent fashion in the model used for query clustering.
- Query clustering is pursued in the context of a single original user query rather than in the global context of many queries or a query collection.
- a Markov model is employed to capture the users' behavior in manners that are different from previous random walk models for query log mining. For example, (1) it combines both the click-through and session co-occurrence information; and (2) it is an absorbing Markov model which makes limiting distributions of a random walk be dependent on the start node (which is a feature exploited for the current problem definition).
- any clustering algorithm can be applied to the graph to find the user intents.
- search session data e.g., search queries and user selections of search results
- search query logs are completely anonymized so that the data cannot be associated with the users.
- each query can be associated with a unique 128-bit number that is not associated with any user.
- Queries in the logs are divided into sessions. In general, a session is a period during which a user submits queries.
- a session can be measured in a number of ways including, for example, by a specified period of time (for example, thirty minutes), by a specified number of queries (for example, fifteen queries), until a specified period of inactivity (for example, ten minutes without submitting a query), while a user is logged-in to a system, or while a user submits queries that relate to similar topics.
- the goal is to cluster query refinements.
- a query r is said to be a refinement of a query q, if r follows q in a session.
- R(q) denotes all the refinements of a query q.
- q does not need to be the first query in a session in order for r to be a refinement of q.
- the model described herein is agnostic to how query refinements are collected, but Definition 1 is provided because it is the one of the most common methods for determining refinements. In some implementations, other methods of collecting query refinements R(q) of a query q are possible.
- Definition 2 The set of co-occurring queries of q i , denoted by Q(q i ), is the set of queries q i , such that q i and q j occur in the same session.
- query log specifies which documents were selected (e.g., clicked) as a search result presented in response to a query:
- Definition 3 The document set of a query q i , denoted D(q i ), is the set of documents that users click on while the documents are presented as search results responsive to q i (e.g., search results responsive to q i ).
- D(q i ) is collected from all sessions that include q i .
- User behavior is modeled as a graph. Given the query logs and a query q, the goal is to cluster the query refinements of q into a set of different information needs.
- the intuition underlying the presently proposed clustering algorithm is that two refinements are similar if they lead to the same content within a typical search session.
- the presently proposed graph model is based on the following typical user behavior.
- V the query q
- R refinements
- R the query refinements
- E the set of edges
- edges (q, r) i.e., the edges from the query q to each of its refinements
- edges (r, d) i.e., the edges from r to each of its clicked result documents
- edges (r, r i ) i.e., the edges connecting co-occurring queries).
- a respective node e.g., nodes r 1 , . . . , r 7
- a respective node for each document e.g., documents d 1 , . . . , d 8
- an edge from the node representing the query q to the node representing each of q's query refinements e.g., the edges (q, r 1 ), . . .
- an edge from the node representing each query to the node representing each of the query's clicked documents e.g., the edges (r 1 , d 1 ), (r 1 , d 2 ), (r 2 , d 1 ), (r 2 , d 3 ), (r 3 , d 2 ), (r 3 , d 3 ), (r 3 , d 4 ), (r 4 , d 4 ), (r 4 , d 4 ), (r 4 , d 5 ), (r 5 , d 5 ), (r 5 , d 6 ), (r 6 , d 7 ), (r 6 , d 8 ), (r 7 , d 7 ), and (r 7 , d 8 )).
- edges (r 1 , r 2 ), (r 1 , r 3 ), (r 4 , r 5 ), (r 2 , r 6 ), and (r 6 , r 7 ) there is an edge between each pair of queries that are co-occurring queries.
- weights are assigned to edges in G based on information available in the query logs.
- edges of the form (r, d) the edges are assigned weights proportional to the probability of user clicking on d as a search result of r.
- edges of the form (r, r i ) where r is either the query q or one of its refinements, the edges are assigned weights proportional to the probability of r and r i co-occurring in a search session.
- the probabilities are not conditioned on sessions starting with the query q. This is permissible because it is unlikely that such conditioned probabilities can be obtained reliably, given the sparsity of such information in the query logs. However, if such conditional probabilities can be obtained, they could be used, in addition or alternatively.
- edges from documents to queries there are no edges from documents to queries (e.g., as shown in FIG. 1A ). This is not a limitation, because if a user were to proceed as follows: q ⁇ r 1 ⁇ d ⁇ r 2 . . . , then r 2 is also, by definition, a refinement of q; and all subsequent user interactions after r 2 will also be accounted for in G.
- the above intuition is captured by the hypothesis that the set of documents reachable from a query refinement r i , after starting from the query q, are representative of the user's underlying intent in selecting r i after q.
- the query clustering problem can hence be formulated as clustering by intent. Specifically, if there is an edge (r i , r j ) in the graph G, or if both r i and r j have edges to the same documents, then users in s typical session will reach the same documents as users in r j 's typical session.
- FIG. 1B illustrates a partition of the query nodes (e.g., the nodes r 1 , . . . , r 7 ) into 3 clusters (e.g., the clusters 202 a , 202 b , and 202 c ).
- the (q, r i ) edges are excluded since the goal here is to cluster the refinements.
- clustering objective can be defined as follows:
- the goal here is to cluster only the nodes corresponding to query refinements, whereas the document nodes are not included in the clusters. This would result in a more efficient algorithm than if all nodes are included in the clusters.
- the graph formulation of the problem presented herein can also be mapped to the correlation clustering described in N. Bansal, A. Blum, and S. Chawla, Correlation clustering, Machine Learning, 56(1 3):89 113, 2004.
- the mapping is very similar to the reduction to the min k-cut.
- Correlation clustering motivated by document clustering, gives an advantage over the classical min k-cut in that it does not require a pre-specified k number of target clusters.
- the solution we propose herein works for the correlation clustering formulation as well as it does for the min k-cut formulation.
- a key insight underlying the presently proposed clustering technique is that the graph of transitions, G, has a very natural interpretation as a Markov model, describing transition probabilities between states. Furthermore, the absorbing states of this Markov model are the nodes in G corresponding to clicked documents. As a result, each of the query-refinement nodes in G can be characterized by the vector of probabilities of reaching each of the absorbing states. Clustering the query refinement nodes based on these vectors is consistent with the intuition of clustering by user intent as described above. Furthermore, as an added benefit, the computational complexity of clustering is significantly lower than that of min k-cut.
- each node can be viewed as a Markov state with the edge weights being the transition probabilities between the corresponding states.
- the outgoing edge weights from each node can be normalized to sum up to 1. This can be done by defining a parameter ⁇ , the document escape probability, which represents the probability that there will be a transition from a query-refinement node to a document node. Consequently, with probability 1 ⁇ , there will be transition from a refinement node to another refinement node (as illustrated in FIG. 2 ). Although the value of 8 does not significantly affect the results, the usefulness of ⁇ in our model will be discussed in more details later in this specification.
- n s (r i , r j ) does not need to be restricted to only the sessions where r i follows r j , and instead, all sessions in which they co-occur can be considered.
- the transitions between refinement nodes can be restricted to be only those in the context of the original query q, i.e., its set of refinements R. Transitions to nodes that are not in R will be discussed below.
- each refinement node is a transient state (since the transitions between refinements are bi-directional), while each document node is an absorbing state (only self-transitions). Moreover, since at least one absorbing state is accessible from each of the transient states, the Markov chain is said to be absorbing. In other words, if one were to perform an infinite-step random walk on this Markov chain (starting at any state), one will always escape the refinement states and be absorbed by one of the document states.
- the probability of absorption in a given absorbing state depends on the initial state. As shown in FIG. 3 , if a random walker starts at r 3 , she is likely to satisfy her search intent at the documents close to r 3 , i.e., d 4 , d 2 , d 1 , and d 3 (e.g., as shown in the upper graph in FIG. 3 ). On the other hand, if she starts at r 7 , she is likely to satisfy her search intent at the documents close to r 7 , i.e., d 8 , d 7 , d 3 , and d 1 (e.g., as shown in the lower graph in FIG. 3 ). Indeed, the limiting distribution is highly dependent on the starting node. In the description below, the term “limiting distribution” and “absorption distribution” will be used interchangeably. Because the Markov chain is absorbing, the two distributions are equivalent.
- the fact that the limiting distribution is conditioned on the start node can be used to determine which documents are most descriptive of a query. Specifically, a random walk can be performed starting from each of the refinements r i of the query q, to obtain the specific limiting distribution vector ⁇ right arrow over (l) ⁇ i for the refinement r i . Each entry in ⁇ right arrow over (l) ⁇ i will correspond to a document node and equal the probability of reaching that document at the end of an infinite random walk starting from r i . Then to measure the similarity between two refinements ri and rj, their corresponding limiting distribution vectors ⁇ right arrow over (l) ⁇ i and ⁇ right arrow over (l) ⁇ j can be compared.
- the above process allows refinements to be clustered as points in some n-dimensional space and can be used to determine which refinements are likely to represent the same user intent.
- Algorithm 1 An example pseudo-code of the algorithm is shown in Algorithm 1 below.
- the inputs are the graph G(q) constructed as described earlier and the number of desired clusters k.
- the transition matrix is initialized as already described. Each of the other steps of the algorithm is described below.
- Algorithm 1 clusterRefinementsByIntent(G(q), k, ⁇ , n)
- this random walk clustering formulation and the min k-cut formulation are explained as follows. Recall the goal of minimizing the edge weights between the clusters in the random walk clustering formation. By performing a random walk from each of the refinements, nodes that are in the vicinity of the refinements can be discovered and the probabilities that these nodes are likely to be visited can be determined. However, it is not the topology of the graph which influences the clustering decisions most, but the transition probabilities. These probabilities effectively enforce the objectives of minimizing the weight between the clusters. To see why this is the case, consider the graph in FIG. 3 again.
- ⁇ r 1 ,r 2 ,r 3 ⁇ may either get merged with ⁇ r 6 , r 7 ⁇ or stay separate.
- the random walk by pushing the probability mass down the edges with more weight, forces refinements connected by high-weight edges to be clustered together. This way, edges with high weight end up inside of the clusters and edges with low weight end up between the clusters, thus minimizing the cost function in Problem Statement 1 (Formula 1).
- the matrix product P ⁇ P is such that its [i,j] entry will be the 2-step transition probability from state i to state j.
- P n [i, j] has the n-step transition probability from i to j.
- P n approaches the limiting distribution.
- the lim n ⁇ P n [i, j] entry is equal to the limiting fraction of time the random walker spends in state j when starting from state i.
- the visit probability i.e., the probability of visiting an absorbing state d within an n-step random walk starting at some state r i , is equal to the probability of transitioning from state r i to state din n steps.
- Proposition 1 Given the transition matrix P, computed for the presently proposed Markov model, the row of lim n ⁇ P n corresponding to a node v is the visit probability distribution vector of the random walk started at v.
- the visit probability distribution for a refinement r i captures the probability that a user will eventually reach different documents during a search session that starts with q and includes r i . It is thus representative of the hypothesized user intent underlying r i in the context of q.
- a random walk calculation can be performed via a simple matrix product shown in Algorithm 2 below.
- Algorithm 2 the limiting distribution is approximated by P n for a suitably chosen n (described below).
- Algorithm 2 calculateLimitingDistribution(P, n)
- Algorithm 3 extractLimitDistributions(R, P n ) for r i ⁇ R do ⁇ right arrow over (l) ⁇ i ⁇ vector of size
- the parameter ⁇ in our Markov model controls how likely a user is to click on a document from any refinement node. In a sense, this parameter controls how “exploratory” the user is believed to be, and in practice it can be used to control the convergence rate of the above proposed algorithms.
- a high number of iterations (parameter n of the algorithm), just like low value of ⁇ , allows the discovery of many remote refinements, thus suggesting an exploratory nature of user behavior.
- a low number of iterations and respectively high value of ⁇ makes random walks shorter, indicating more focused user intents.
- lower numbers of iterations e.g., 3-5
- higher values of e.g., 0.5-0.7
- a new state f can be added to the proposed Markov model described above to signify the user's transition off topic (e.g., see the added node f in the graph 402 in FIG. 4 ). Then, from each of the query refinements r i ⁇ R(q), an off-topic transition (r i , j) can be added. Once the user switches off-topic, it is assumed that she does not come back to the original topic, thus f is modeled as an absorbing state with no outgoing edges.
- the transition probability is determined for (r i , f).
- the original model described above without the off-topic drift is essentially equivalent to that probability being zero.
- One option would be to set the probability to some constant, and then respectively normalize other transition probabilities.
- using a constant can be inaccurate.
- the varying off-topic drift plays an interesting role in the clustering.
- the entire transition probability mass of r can be pushed by the random walk to other nodes that do not quite merit it.
- r i has a high off-topic probability
- r j only one on-topic transition to a different refinement r j .
- Ignoring the off-topic transitions can imply that all the probability mass is transferred from r i to r j .
- This effect can be exacerbated by the probabilities being pushed transitively along the Markov chain to other refinement nodes, thereby rendering the clustering less effective.
- One of the advantages of the presently proposed models is that they can employ any algorithm for clustering Euclidean vectors (e.g., hierarchical, density based, partitional, or graph based).
- Algorithm 4 shows an example clustering algorithm.
- sim(r i , r j ) is the cosine similarity between the limiting distributions of refinements r i and r j .
- the above algorithm works by picking the pair of current clusters that have the highest value for completelink similarity and merging them. This proceeds until only the required number of clusters remain.
- Cosine similarity is effective for comparing two discrete probability distributions. Meanwhile, complete-link clustering avoids chaining and provides guarantees on similarity within each cluster.
- Algorithm 4 clusterVectors(G, L, k) R ⁇ 0 for r i ⁇ R do R ⁇ R ⁇ ⁇ r i ⁇ end for while
- the size of the vectors can be limited as follows. First, a limited number of query refinements are considered (e.g., up to 80 query refinements can be considered). Second, the number of document states off of each refinement can be limited (e.g., to 15 documents per refinement). Together, these limiting conditions can lead to an upper bound of 1200 on the dimensionality of limiting distribution vectors. These limitations are justified in practice because refinements beyond the top 80 usually have probability mass of less than 0.002, and document clicks beyond the top 15 are rare and most of the time statistically insignificant. Hence, such filtering not only simplifies the clustering, but also eliminates potential “noise.”
- Some of the user queries have ambiguous related queries that may bias clustering as well as the limiting distributions themselves.
- the query “kobe bryant.” one of its related queries is “kobe.” Because “kobe” is synonymous to “kobe bryant”, the query “kobe” may co-occur with many related queries R of “kobe bryant”.
- every related query of “kobe bryant” is transitively related to every query in R.
- a random walk is performed from any related query in R, all documents of all R queries can be reached within two steps.
- different queries content-wise now may have similar limiting distributions which will bias the clustering.
- jaguar This problem is not specific to synonyms.
- Some refinement-like related queries may also be ambiguous. For example, consider the query “jaguar,” which has become a canonical example for disambiguation. This query may refer to an animal, a car brand, or an Apple operating system. Now suppose “jaguar” has a related query “jaguar facts.” Although “jaguar facts” is unlikely to refer to the operating system, it is still ambiguous. For example, it could refer to the car brand as well as to the animal. Accordingly, “jaguar facts” may co-occur with almost every other related query of “jaguar” biasing the random walk and, effectively, biasing the clustering.
- a string edit distance e.g., Levenshtein distance
- Levenshtein distance a string edit distance heuristic
- similarity thresholds can be used to control the clustering.
- a custom similarity threshold as a function of the query parameters can be used (indeed, “jaguar” may have more user intents than “kobe bryant stats”).
- the clusters can be used, for example, by a search engine, to provide better query refinement suggestions for users.
- FIG. 5 illustrates an example search system 514 for providing search results relevant to submitted queries as can be implemented in an Internet, an intranet, or other client and server environment.
- the search system 514 is an example information retrieval system.
- a user 502 interacts with the search system 514 through a client device 504 .
- the client device 504 can be or include a computer (e.g., a personal computer, a mobile phone, etc.) coupled to the search system 514 through a wired or wireless local area network (LAN) or wide area network (WAN), e.g., the Internet.
- LAN local area network
- WAN wide area network
- the search system 514 and the client device 504 are both implemented in the same machine.
- a user can install a desktop search application on the client device 504 .
- the client device 504 will generally include a random access memory (RAM) 506 and a processor 508 .
- RAM random access memory
- a user 502 submits a query 510 to a search engine 530 within a search system 514 .
- the query 510 is transmitted through a network to the search system 514 .
- the search system 514 can be implemented as, for example, computer programs running on one or more computers in one or more locations that are coupled to each other through a network.
- the search system 514 includes an index database 522 and a search engine 530 .
- the search system 514 responds to the query 510 by generating search results 528 , which are transmitted through the network to the client device 504 in a form that can be presented to the user 502 (e.g., in a search results web page to be displayed in a web browser running on the client device 204 ).
- the search results web page can also include one or more search suggestions for the query 510 .
- the search engine 530 identifies documents that match the query 510 .
- the search engine 530 will generally include an indexing engine 520 that indexes documents (e.g., web pages, images, multimedia content, or news articles on the Internet) found by the search system 514 , for example, documents found while crawling the Internet, an index database 522 that stores the index information, and a ranking engine 552 (or other software) to rank the documents that match the query 510 .
- the search engine 530 transmits the search results 528 through the network to the client device 504 for presentation to the user 502 .
- the search system can further include a search suggestion engine 560 that identifies and presents search suggestions to the user.
- the search suggestions can be derived from the clustered query refinements for the query, as will be described below with reference to FIG. 6 .
- FIG. 6 illustrates an example method 600 for presenting search suggestions in response to a user-submitted search query.
- the method 600 will be described with reference to a system that performs the method.
- the system can be, for example, the search system 514 described above with reference to FIG. 5 .
- the system receives a query ( 602 ), for example, as described above with reference to FIG. 5 .
- the system obtains clustered query refinements for the query ( 604 ).
- the clustered query refinements are generated on the fly, in response to receiving a query, using the algorithms described above.
- the clustered query refinements are generated in advance using the algorithms described above, and then stored, for example, in a database.
- obtaining the clustered query refinements includes obtaining the refinements from the database.
- the system then presents one or more of the clusters of query refinements as search suggestions ( 606 ).
- the search engine presents each of the clusters as a search suggestion by presenting one or more of the query refinements from one or more of the clusters as search suggestions. An example presentation is described below, with reference to FIG. 7 .
- the one or more clusters include all of the clusters. In other implementations, the one or more clusters are less than all of the clusters. In these implementations, the one or more clusters can be chosen, for example, by selecting the clusters having query refinements with the highest aggregate transition probabilities for the query refinements in the cluster. The transition probabilities can be determined, for example, as described above. Other techniques for selecting the one or more clusters can also be used. For example, the popularity for each query refinement in a cluster can be combined (e.g., summed or averaged), and the clusters having the highest overall probability can be selected. The probability of a query refinement is the number of times users submit queries for the query refinement.
- the one or more query refinements from each cluster can be all query refinements in the cluster, or can be a selected number of query refinements.
- the one or more query refinements can be a top number of query refinements from each cluster, where the top number is all query refinements whose transition probabilities satisfy a threshold, or a predetermined number of query refinements with the highest transition probabilities in each cluster.
- the transition probabilities can be determined, for example, as described above.
- Each search suggestion can be presented as a selectable element in a graphical user interface.
- the search suggestions can be hyperlinks in a search results page.
- the anchor text of the hyperlink can correspond to the text of the search suggestion.
- the search engine presents search results responsive to a query for the search suggestion.
- the user can be presented with additional search suggestions for query refinements from the same cluster.
- the search suggestions from different clusters are visibly separated in the user interface.
- the search suggestions from one or more of the clusters can be presented in a separate column.
- each column includes a selectable user interface element representative of the cluster. When a user selects the user interface element, the user can be provided with additional search suggestions for the cluster.
- FIG. 7 illustrates an example graphical user interface 700 that presents search results 704 in response to the query “mars” 702 , and also presents search suggestions 706 for the query 702 .
- the search results 704 are identified by a search engine, for example, as described above with reference to FIG. 5 .
- a user will not always be satisfied with the search results 704 generated in response to a query.
- Users can be unsatisfied, for example, when the queries they submit are too broad. For example, when a user submits “mars” but is really looking for “the planet Mars,” the search engine may identify search results that are relevant to other uses of the word “mars” but are not relevant to the planet. Users can also be unsatisfied, for example, when the queries they submit use non-standard or incorrect terminology. Other reasons for user dissatisfaction are also possible.
- the user interface 700 includes a group of search suggestions 706 , e.g., related queries that a user may find have responsive search results that are more relevant to the user's interests.
- the search suggestions 706 are divided into columns according to their corresponding clusters. For example, column 708 corresponds to the cluster for Mars the Roman god, column 710 corresponds to the cluster for Mars the planet, and column 712 corresponds to the cluster for Mars the candy bar.
- the search engine When a user selects one of the search suggestions 706 , the search engine presents a new set of search results responsive to the search suggestion in the user interface 700 and may optionally present a new group of search suggestions for the selected search suggestion.
- Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus.
- the program instructions can be encoded on a propagated signal that is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
- the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- data processing apparatus encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
- the apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- the apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
- the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- a processor will receive instructions and data from a read-only memory or a random access memory or both.
- the essential elements of a computer are a processor for performing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few.
- PDA personal digital assistant
- GPS Global Positioning System
- USB universal serial bus
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- semiconductor memory devices e.g., EPROM, EEPROM, and flash memory devices
- magnetic disks e.g., internal hard disks or removable disks
- magneto-optical disks e.g., CD-ROM and DVD-ROM disks.
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- a display device e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor
- keyboard and a pointing device e.g., a mouse or a trackball
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a
- Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
- LAN local area network
- WAN wide area network
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Operations Research (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
P[d,d]=1,
where ε is a numerical parameter between 0 and 1.
P[d,d]=1,
where ε is a numerical parameter between 0 and 1.
where 1{c} is an indicator variable equal to 1 if condition c is true and 0 otherwise; and where w(ri, rj) is the weight for the edge between ri and rj, and w(ri, d) is the weight for the edge between ri and d.
-
- for each (ri, d), where dεD(ri), and nd(d|ri) is the number of times a user clicks on the document d, a result of the query ri, then:
-
- for each (ri, rj), where rj is a refinement of both ri and q (i.e., riεR(q)∩R(ri)), and ns(ri, rj) is the number of sessions in which ri and rj co-occur, then:
-
- for each document d (all of which are terminal in G), self-transitions are added:
P[d,d]=1 (Formula 4)
- for each document d (all of which are terminal in G), self-transitions are added:
-
- P←initializeTransitionMatrix(G(q), ε)
- P′←calculateLimitingDistributions(P, n)
- L←extractAbsorptionDistributions(G(q), P′)
- R←clusterVectors(L, k)
- return R
-
- P′←Pn
- return P′
Algorithm 3: extractLimitDistributions(R, Pn) |
for ri ∈ R do | |||
{right arrow over (l)}i ← vector of size|∪ri ∈R(q) D(ri)| | |||
for d ∈ ∪r |
|||
{right arrow over (l)}i[d] = Pn[ri, d] | |||
end for | |||
end for | |||
L = {{right arrow over (l)}l,...{right arrow over (l)}r}, where r =|R(q)| | |||
return L | |||
completelink(R l ,R m)=min sim(r i ,r j) (Formula 7)
where the minimum is taken over all ri in Rl and over all rj in Rm.
Algorithm 4: clusterVectors(G, L, k) |
R ← 0 | |||
for ri ∈ R do | |||
R ← R ∪ {ri} | |||
end for | |||
while |R| > k do | |||
Rl,Rm ← arg max Ri ≠ Rm ∈ R completelink(Rl, Rm) | |||
Rl ← Rl ∪ Rm | |||
R ← R − Rm | |||
end while | |||
return R | |||
-
- Before Initialize TransitionMatrix, select a set of ambiguous related queries A⊂R using a heuristic.
- After Initialize TransitionMatrix, for any related query riεR and an ambiguous query aεA, set w(ri, f)←w(ri, f)+w(ri, a) and w(ri, a)=0 (i.e., remove transitions to ambiguous queries).
- Invoke Cluster Vectors only on R−A.
- Invoke Cluster Vectors again on R∪A.
Claims (11)
P[d,d]=1
P[d,d]=1
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/075,957 US9582766B2 (en) | 2009-11-02 | 2016-03-21 | Clustering query refinements by inferred user intent |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25743509P | 2009-11-02 | 2009-11-02 | |
US12/938,205 US8423538B1 (en) | 2009-11-02 | 2010-11-02 | Clustering query refinements by inferred user intent |
US13/854,275 US9323806B2 (en) | 2009-11-02 | 2013-04-01 | Clustering query refinements by inferred user intent |
US15/075,957 US9582766B2 (en) | 2009-11-02 | 2016-03-21 | Clustering query refinements by inferred user intent |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/854,275 Division US9323806B2 (en) | 2009-11-02 | 2013-04-01 | Clustering query refinements by inferred user intent |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160203411A1 US20160203411A1 (en) | 2016-07-14 |
US9582766B2 true US9582766B2 (en) | 2017-02-28 |
Family
ID=48049263
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/938,205 Expired - Fee Related US8423538B1 (en) | 2009-11-02 | 2010-11-02 | Clustering query refinements by inferred user intent |
US13/854,275 Active US9323806B2 (en) | 2009-11-02 | 2013-04-01 | Clustering query refinements by inferred user intent |
US15/075,957 Active US9582766B2 (en) | 2009-11-02 | 2016-03-21 | Clustering query refinements by inferred user intent |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/938,205 Expired - Fee Related US8423538B1 (en) | 2009-11-02 | 2010-11-02 | Clustering query refinements by inferred user intent |
US13/854,275 Active US9323806B2 (en) | 2009-11-02 | 2013-04-01 | Clustering query refinements by inferred user intent |
Country Status (1)
Country | Link |
---|---|
US (3) | US8423538B1 (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI687820B (en) * | 2017-10-10 | 2020-03-11 | 香港商阿里巴巴集團服務有限公司 | Random walk, cluster-based random walk method, device and equipment |
US10901971B2 (en) | 2017-10-10 | 2021-01-26 | Advanced New Technologies Co., Ltd. | Random walking and cluster-based random walking method, apparatus and device |
US11256759B1 (en) | 2019-12-23 | 2022-02-22 | Lacework Inc. | Hierarchical graph analysis |
US11637849B1 (en) | 2017-11-27 | 2023-04-25 | Lacework Inc. | Graph-based query composition |
US11770464B1 (en) | 2019-12-23 | 2023-09-26 | Lacework Inc. | Monitoring communications in a containerized environment |
US11792284B1 (en) | 2017-11-27 | 2023-10-17 | Lacework, Inc. | Using data transformations for monitoring a cloud compute environment |
US11831668B1 (en) | 2019-12-23 | 2023-11-28 | Lacework Inc. | Using a logical graph to model activity in a network environment |
US11909752B1 (en) | 2017-11-27 | 2024-02-20 | Lacework, Inc. | Detecting deviations from typical user behavior |
US11954130B1 (en) | 2019-12-23 | 2024-04-09 | Lacework Inc. | Alerting based on pod communication-based logical graph |
US11979422B1 (en) | 2017-11-27 | 2024-05-07 | Lacework, Inc. | Elastic privileges in a secure access service edge |
US12021888B1 (en) | 2017-11-27 | 2024-06-25 | Lacework, Inc. | Cloud infrastructure entitlement management by a data platform |
US12034754B2 (en) | 2017-11-27 | 2024-07-09 | Lacework, Inc. | Using static analysis for vulnerability detection |
US12058160B1 (en) | 2017-11-22 | 2024-08-06 | Lacework, Inc. | Generating computer code for remediating detected events |
US12095879B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Identifying encountered and unencountered conditions in software applications |
US12095794B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Universal cloud data ingestion for stream processing |
US12095796B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Instruction-level threat assessment |
US12126643B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Leveraging generative artificial intelligence (‘AI’) for securing a monitored deployment |
US12126695B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Enhancing security of a cloud deployment based on learnings from other cloud deployments |
US12130878B1 (en) | 2017-11-27 | 2024-10-29 | Fortinet, Inc. | Deduplication of monitored communications data in a cloud environment |
US12267345B1 (en) | 2017-11-27 | 2025-04-01 | Fortinet, Inc. | Using user feedback for attack path analysis in an anomaly detection framework |
US12309185B1 (en) | 2024-01-18 | 2025-05-20 | Fortinet, Inc. | Architecture for a generative artificial intelligence (AI)-enabled assistant |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8515975B1 (en) * | 2009-12-07 | 2013-08-20 | Google Inc. | Search entity transition matrix and applications of the transition matrix |
US10204163B2 (en) * | 2010-04-19 | 2019-02-12 | Microsoft Technology Licensing, Llc | Active prediction of diverse search intent based upon user browsing behavior |
US8868591B1 (en) * | 2011-06-22 | 2014-10-21 | Google Inc. | Modifying a user query to improve the results |
EP2754073A1 (en) * | 2011-09-08 | 2014-07-16 | Google, Inc. | Exploring information by topic |
US9405856B2 (en) * | 2011-12-30 | 2016-08-02 | Microsoft Technology Licensing, Llc | Task-oriented query-completion suggestions with shortcuts |
US20140047089A1 (en) * | 2012-08-10 | 2014-02-13 | International Business Machines Corporation | System and method for supervised network clustering |
US9602623B2 (en) * | 2012-09-11 | 2017-03-21 | Nokia Corporation | Method and apparatus for caching local mashup service parameters |
US9098487B2 (en) * | 2012-11-29 | 2015-08-04 | Hewlett-Packard Development Company, L.P. | Categorization based on word distance |
US9652797B2 (en) * | 2013-01-18 | 2017-05-16 | 24/7 Customer, Inc. | Intent prediction based recommendation system using data combined from multiple channels |
US9355140B1 (en) | 2013-03-13 | 2016-05-31 | Google Inc. | Associating an entity with a search query |
WO2014153776A1 (en) * | 2013-03-29 | 2014-10-02 | Hewlett-Packard Development Company, L.P. | Query features and questions |
US9336277B2 (en) * | 2013-05-31 | 2016-05-10 | Google Inc. | Query suggestions based on search data |
US9116952B1 (en) | 2013-05-31 | 2015-08-25 | Google Inc. | Query refinements using search data |
US9348815B1 (en) * | 2013-06-28 | 2016-05-24 | Digital Reasoning Systems, Inc. | Systems and methods for construction, maintenance, and improvement of knowledge representations |
EP2908255B1 (en) * | 2014-02-13 | 2018-07-25 | Amadeus S.A.S. | Increasing search result validity |
US20160092506A1 (en) * | 2014-09-29 | 2016-03-31 | Linkedin Corporation | Generating suggested structured queries |
US10255358B2 (en) * | 2014-12-30 | 2019-04-09 | Facebook, Inc. | Systems and methods for clustering items associated with interactions |
US10489470B2 (en) * | 2015-03-03 | 2019-11-26 | Samsung Electronics Co., Ltd. | Method and system for filtering content in an electronic device |
US9892167B2 (en) * | 2015-03-31 | 2018-02-13 | Rovi Guides, Inc. | Methods and systems for generating cluster-based search results |
US10831771B2 (en) * | 2015-07-06 | 2020-11-10 | Sap Se | Interactive exploration of large graphs |
US10769182B2 (en) * | 2016-06-10 | 2020-09-08 | Apple Inc. | System and method of highlighting terms |
US10831763B2 (en) | 2016-06-10 | 2020-11-10 | Apple Inc. | System and method of generating a key list from multiple search domains |
EP3488367B1 (en) | 2016-07-21 | 2022-03-30 | Koninklijke Philips N.V. | Annotating medical images |
US10467229B2 (en) | 2016-09-30 | 2019-11-05 | Microsoft Technology Licensing, Llc. | Query-time analytics on graph queries spanning subgraphs |
US10545945B2 (en) | 2016-10-28 | 2020-01-28 | Microsoft Technology Licensing, Llc | Change monitoring spanning graph queries |
US10402403B2 (en) * | 2016-12-15 | 2019-09-03 | Microsoft Technology Licensing, Llc | Utilization of probabilistic characteristics for reduction of graph database traversals |
US10445361B2 (en) | 2016-12-15 | 2019-10-15 | Microsoft Technology Licensing, Llc | Caching of subgraphs and integration of cached subgraphs into graph query results |
US10242223B2 (en) | 2017-02-27 | 2019-03-26 | Microsoft Technology Licensing, Llc | Access controlled graph query spanning |
CN107402954B (en) * | 2017-05-26 | 2020-07-10 | 百度在线网络技术(北京)有限公司 | Method for establishing sequencing model, application method and device based on sequencing model |
CN107480162B (en) * | 2017-06-15 | 2021-09-21 | 北京百度网讯科技有限公司 | Search method, device and equipment based on artificial intelligence and computer readable storage medium |
US10831809B2 (en) * | 2017-08-31 | 2020-11-10 | Ca Technologies, Inc. | Page journey determination from web event journals |
CN109697282B (en) | 2017-10-20 | 2023-06-06 | 阿里巴巴集团控股有限公司 | Sentence user intention recognition method and device |
US11586939B2 (en) * | 2019-02-28 | 2023-02-21 | Entigenlogic Llc | Generating comparison information |
CN111752433B (en) * | 2020-06-16 | 2021-09-07 | 深圳梅沙科技有限公司 | Information display method and device, electronic equipment and readable storage medium |
CN112000788B (en) * | 2020-08-19 | 2024-02-09 | 腾讯云计算(长沙)有限责任公司 | Data processing method, device and computer readable storage medium |
US20220138407A1 (en) * | 2020-10-29 | 2022-05-05 | Giving Tech Labs, LLC | Document Writing Assistant with Contextual Search Using Knowledge Graphs |
CN112650907B (en) * | 2020-12-25 | 2023-07-14 | 百度在线网络技术(北京)有限公司 | Search word recommendation method, target model training method, device and equipment |
CN112800048B (en) * | 2021-03-17 | 2021-08-06 | 电子科技大学 | A Completion Method for Communication Records of Communication Network Users Based on Graph Representation Learning |
US12050612B2 (en) * | 2021-08-27 | 2024-07-30 | Graphite Growth, Inc. | Generation and use of topic graph for content authoring |
US11934476B2 (en) * | 2021-10-28 | 2024-03-19 | Toyota Research Institute, Inc. | System and method for contextualizing and improving understanding of web search results |
CN114385816A (en) * | 2022-01-12 | 2022-04-22 | 阿里巴巴(中国)有限公司 | Conversation flow mining method and device, electronic equipment and computer storage medium |
US12147420B2 (en) | 2022-01-31 | 2024-11-19 | Walmart Apollo, Llc | Systems and methods for detecting and resolving ambiguous search queries |
US20240256532A1 (en) * | 2023-01-31 | 2024-08-01 | Walmart Apollo, Llc | System and method for processing cross-lingual search queries |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070067279A1 (en) | 2004-07-06 | 2007-03-22 | Icosystem Corporation | Methods and Apparatus for Interactive Searching Techniques |
US7565627B2 (en) | 2004-09-30 | 2009-07-21 | Microsoft Corporation | Query graphs indicating related queries |
US8504437B1 (en) | 2009-11-04 | 2013-08-06 | Google Inc. | Dynamically selecting and presenting content relevant to user input |
US8631004B2 (en) | 2009-12-28 | 2014-01-14 | Yahoo! Inc. | Search suggestion clustering and presentation |
-
2010
- 2010-11-02 US US12/938,205 patent/US8423538B1/en not_active Expired - Fee Related
-
2013
- 2013-04-01 US US13/854,275 patent/US9323806B2/en active Active
-
2016
- 2016-03-21 US US15/075,957 patent/US9582766B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070067279A1 (en) | 2004-07-06 | 2007-03-22 | Icosystem Corporation | Methods and Apparatus for Interactive Searching Techniques |
US7565627B2 (en) | 2004-09-30 | 2009-07-21 | Microsoft Corporation | Query graphs indicating related queries |
US8504437B1 (en) | 2009-11-04 | 2013-08-06 | Google Inc. | Dynamically selecting and presenting content relevant to user input |
US8631004B2 (en) | 2009-12-28 | 2014-01-14 | Yahoo! Inc. | Search suggestion clustering and presentation |
Non-Patent Citations (24)
Title |
---|
A. K. Jain et al., "Data Clustering: A Review," Sep. 1999, ACM Computing Surveys, vol. 31, No. 3, pp. 264-323. |
Bodo Billerbeck et al., "Query Expansion using Associated Queries," 2003, in CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp. 2-9. |
Bruno M. Fonseca et al., "Concept-Based Interactive Query Expansion," 2005, in CIKM '05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, 8 pages. |
Chien-Kang Huang et al., "Relevant Term Suggestion in Interactive Web Search Based on Contextual Information in Query Session Logs," May 2003, Journal of the American Society for Information Science and Technology, 54(7), pp. 638-649. |
Christopher D. Manning et al., "Introduction to Information Retrieval," 2008, Cambridge University Press, pp. 346-356. |
Doug Beeferman et al., "Agglomerative clustering of a search engine query log," 2000, in KDD '00: Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 407-416. |
Hang Cui et al., "Probabilistic Query Expansion Using Query Logs," 2002, in WWW '02: Proceedings of the 11th International Conference on World Wide Web, 8 pages. |
Howard M. Taylor et al., "An Introduction to Stochastic Modeling," Revised Edition, 1994, Academic Press, Inc., pp. 89-240. |
Huanhuan Cao et al., "Context-Aware Query Suggestion by Mining Click-Through and Session Data," 2008, in KDD '08: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 9 pages. |
Huzur Saran et al., "Finding k-cuts within Twice the Optimal," 1991, in SFCS '91: Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pp. 743-751. |
Ji-Rong Wen et al., "Clustering User Queries of a Search Engine," 2001, in WWW '01: Proceedings of the 10th International Conference on World Wide Web, pp. 162-168. |
Nick Craswell et al., "Random Walks on the Click Graph," 2007, in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 8 pages. |
Nikhil Bansal et al., "Correlation Clustering," 2004, Machine Learning, 56, pp. 89-113. |
Nir Ailon et al., "Aggregating Inconsistent Information: Ranking and Clustering," Oct. 2008, Journal of the ACM, vol. 55, No. 5, Article 23, 27 pages. |
Olivier Goldschmidt et al., "Polynomial Algorithm for the k-CUT Problem," 1988, in SFCS '88: Proceedings of the 29th Annual Symposium on Foundations of Computer Science, pp. 444-451. |
Paolo Boldi et al., "The Query-flow Graph: Model and Applications," 2008, in CIKM '08: Proceeding of the 17th ACM conference on Information and Knowledge Management, 9 pages. |
Peter Anick, "Using Terminological Feedback for Web Search Refinement-A Log-based Study," 2003, in SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 88-95. |
Qi He et al., "Web Query Recommendation via Sequential Query Prediction," 2009, in ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering, pp. 1443-1454. |
Qiaozhu Mei et al., "Query Suggestion Using Hitting Time," 2008, in CIKM '08: Proceeding of the 17th ACM Conference on Information and Knowledge Management, pp. 469-477. |
Ricardo Baeza-Yates et al., "Query Recommendation Using Query Logs in Search Engines," Nov. 2004, vol. 3268/2004 of Lecture Notes in Computer Science, pp. 588-596. |
Silviu Cucerzan et al., "Extracting Semantically Related Queries by Exploiting User Session Information," 2005, Technical Report, Microsoft Research, 10 pages. |
Steve Chien et al., "Semantic Similarity Between Search Engine Queries Using Temporal Correlation," 2005, in WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 2-11. |
Xuanhui Wang et al., "Mining Broad Latent Query Aspects from Search Sessions," 2009, in KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 867-875. |
Zhiyong Zhang et al., "Mining Search Engine Query Logs for Query Recommendation," May 2006, in WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 1039-1040. |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10776334B2 (en) | 2017-10-10 | 2020-09-15 | Alibaba Group Holding Limited | Random walking and cluster-based random walking method, apparatus and device |
US10901971B2 (en) | 2017-10-10 | 2021-01-26 | Advanced New Technologies Co., Ltd. | Random walking and cluster-based random walking method, apparatus and device |
TWI687820B (en) * | 2017-10-10 | 2020-03-11 | 香港商阿里巴巴集團服務有限公司 | Random walk, cluster-based random walk method, device and equipment |
US12058160B1 (en) | 2017-11-22 | 2024-08-06 | Lacework, Inc. | Generating computer code for remediating detected events |
US12095794B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Universal cloud data ingestion for stream processing |
US12120140B2 (en) | 2017-11-27 | 2024-10-15 | Fortinet, Inc. | Detecting threats against computing resources based on user behavior changes |
US11689553B1 (en) | 2017-11-27 | 2023-06-27 | Lacework Inc. | User session-based generation of logical graphs and detection of anomalies |
US12267345B1 (en) | 2017-11-27 | 2025-04-01 | Fortinet, Inc. | Using user feedback for attack path analysis in an anomaly detection framework |
US11792284B1 (en) | 2017-11-27 | 2023-10-17 | Lacework, Inc. | Using data transformations for monitoring a cloud compute environment |
US12244621B1 (en) | 2017-11-27 | 2025-03-04 | Fortinet, Inc. | Using activity monitored by multiple data sources to identify shadow systems |
US11882141B1 (en) | 2017-11-27 | 2024-01-23 | Lacework Inc. | Graph-based query composition for monitoring an environment |
US11909752B1 (en) | 2017-11-27 | 2024-02-20 | Lacework, Inc. | Detecting deviations from typical user behavior |
US12206696B1 (en) | 2017-11-27 | 2025-01-21 | Fortinet, Inc. | Detecting anomalies in a network environment |
US11979422B1 (en) | 2017-11-27 | 2024-05-07 | Lacework, Inc. | Elastic privileges in a secure access service edge |
US11991198B1 (en) | 2017-11-27 | 2024-05-21 | Lacework, Inc. | User-specific data-driven network security |
US12021888B1 (en) | 2017-11-27 | 2024-06-25 | Lacework, Inc. | Cloud infrastructure entitlement management by a data platform |
US12034754B2 (en) | 2017-11-27 | 2024-07-09 | Lacework, Inc. | Using static analysis for vulnerability detection |
US12130878B1 (en) | 2017-11-27 | 2024-10-29 | Fortinet, Inc. | Deduplication of monitored communications data in a cloud environment |
US12034750B1 (en) | 2017-11-27 | 2024-07-09 | Lacework Inc. | Tracking of user login sessions |
US11637849B1 (en) | 2017-11-27 | 2023-04-25 | Lacework Inc. | Graph-based query composition |
US12095879B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Identifying encountered and unencountered conditions in software applications |
US12126695B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Enhancing security of a cloud deployment based on learnings from other cloud deployments |
US12095796B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Instruction-level threat assessment |
US11677772B1 (en) | 2017-11-27 | 2023-06-13 | Lacework Inc. | Using graph-based models to identify anomalies in a network environment |
US12126643B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Leveraging generative artificial intelligence (‘AI’) for securing a monitored deployment |
US11256759B1 (en) | 2019-12-23 | 2022-02-22 | Lacework Inc. | Hierarchical graph analysis |
US12032634B1 (en) | 2019-12-23 | 2024-07-09 | Lacework Inc. | Graph reclustering based on different clustering criteria |
US11954130B1 (en) | 2019-12-23 | 2024-04-09 | Lacework Inc. | Alerting based on pod communication-based logical graph |
US11831668B1 (en) | 2019-12-23 | 2023-11-28 | Lacework Inc. | Using a logical graph to model activity in a network environment |
US11770464B1 (en) | 2019-12-23 | 2023-09-26 | Lacework Inc. | Monitoring communications in a containerized environment |
US12309185B1 (en) | 2024-01-18 | 2025-05-20 | Fortinet, Inc. | Architecture for a generative artificial intelligence (AI)-enabled assistant |
Also Published As
Publication number | Publication date |
---|---|
US20150161201A1 (en) | 2015-06-11 |
US8423538B1 (en) | 2013-04-16 |
US20160203411A1 (en) | 2016-07-14 |
US9323806B2 (en) | 2016-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9582766B2 (en) | Clustering query refinements by inferred user intent | |
US8972394B1 (en) | Generating a related set of documents for an initial set of documents | |
US8099417B2 (en) | Semi-supervised part-of-speech tagging | |
US8631004B2 (en) | Search suggestion clustering and presentation | |
US8346754B2 (en) | Generating succinct titles for web URLs | |
US8244737B2 (en) | Ranking documents based on a series of document graphs | |
US7818315B2 (en) | Re-ranking search results based on query log | |
US9378247B1 (en) | Generating query refinements from user preference data | |
US20120317088A1 (en) | Associating Search Queries and Entities | |
US8650172B2 (en) | Searchable web site discovery and recommendation | |
US20100318531A1 (en) | Smoothing clickthrough data for web search ranking | |
CN107066589B (en) | Entity semantics and word frequency ordering method and device based on comprehensive knowledge | |
US8484193B2 (en) | Look-ahead document ranking system | |
CN103049528A (en) | Personalized web page searching and sorting method on basis of interest vectors of user | |
Li et al. | A feature-free search query classification approach using semantic distance | |
Yan et al. | Designing focused crawler based on improved genetic algorithm | |
Pavani et al. | A novel web crawling method for vertical search engines | |
Zhu et al. | Clickrank: learning session-context models to enrich web search ranking | |
Wang et al. | QIRM: A quantum interactive retrieval model for session search | |
Xu | Web mining techniques for recommendation and personalization | |
Thakur et al. | Performance based novel techniques for semantic web mining | |
RU2839310C1 (en) | System and method for ranking search results in search engine | |
Jay et al. | An approach to identify user interest by reranking personalize web | |
Geetharani et al. | Location-based Ranking Method (LBRM) for ranking search results in search engines | |
Kim et al. | Developing a Meta-Suggestion Engine for Search Queries |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: GOOGLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SADIKOV, ELDAR;MADHAVAN, JAYANT;HALEVY, ALON;REEL/FRAME:038058/0974 Effective date: 20101112 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: GOOGLE LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044097/0658 Effective date: 20170929 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |