CN104778202B - The analysis method and system of event evolutionary process based on keyword - Google Patents
- Publication number
- CN104778202B (CN201510062379.3A)
- Authority
- CN
- China
- Prior art keywords
- peak
- time period
- search results
- window
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- User Interface Of Digital Computer (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a keyword-based method and system for analyzing an event evolution process. The method includes: counting the number of search results in each time period to obtain a sequence of search-result counts over the time periods; performing peak detection on the sequence to obtain at least one peak window; performing text analysis on the search results corresponding to each peak window to obtain a peak description; and displaying the sequence and showing each peak description at the position of its peak. By counting and displaying the sequence together with the peak descriptions, the embodiments show how the number of microblog posts changes over time, and that trend reflects the evolution of the events related to the keyword. From the peak descriptions, a user can learn the main content of the posts without reading them in detail, which increases the amount of information conveyed by the displayed search results.
Description
Technical field
The embodiments of the present invention relate to the field of computer technology, and in particular to a keyword-based method and system for analyzing an event evolution process.
Background
With the development of computer technology, microblogging software has become a widely used social application and an important source of information.
In the prior art, microblogging software runs a matching search on a keyword entered by the user and displays the microblog posts closely related to that keyword on the user terminal.
Microblog posts carry a time attribute, but existing result displays cannot show how the posts change over time. As a result, the displayed search results convey little information and cannot show the evolution of the events related to the keyword.
Summary of the invention
Embodiments of the present invention provide a keyword-based method and system for analyzing an event evolution process, so as to increase the amount of information conveyed by displayed search results and to show the evolution of the events related to the keyword.
One aspect of the embodiments of the present invention provides a keyword-based method for analyzing an event evolution process, including:
counting the number of search results in each time period to obtain a sequence of the number of search results over the time periods;
performing peak detection on the sequence to obtain at least one peak window, where each peak window includes a subsequence, the subsequence includes one peak, and the peak is a local maximum of the number of search results in the sequence;
performing text analysis on the search results corresponding to the peak window to obtain a peak description; and
displaying the sequence of the number of search results over the time periods, and displaying the peak description at the position of the peak.
Another aspect of the embodiments of the present invention provides a keyword-based system for analyzing an event evolution process, including:
a statistics module, configured to count the number of search results in each time period to obtain a sequence of the number of search results over the time periods;
a detection module, configured to perform peak detection on the sequence to obtain at least one peak window, where each peak window includes a subsequence, the subsequence includes one peak, and the peak is a local maximum of the number of search results in the sequence;
a text analysis module, configured to perform text analysis on the search results corresponding to the peak window to obtain a peak description; and
a display module, configured to display the sequence of the number of search results over the time periods, and to display the peak description at the position of the peak.
The method and system provided by the embodiments of the present invention count the sequence of the number of search results over the time periods, obtain the local maxima of the sequence by peak detection, analyze why each local maximum occurs to obtain a peak description, display the sequence, and display each peak description at the position of its peak. The displayed results therefore show how the number of microblog posts changes over time, and that trend reflects the evolution of the events related to the keyword. From the peak descriptions, a user can learn the main content of the posts without reading them in detail, which increases the amount of information conveyed by the displayed search results.
Description of drawings
FIG. 1 is a flowchart of the keyword-based method for analyzing an event evolution process provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a peak window provided by another embodiment of the present invention;
FIG. 3 is a schematic diagram of a peak window provided by another embodiment of the present invention;
FIG. 4 is a schematic diagram of displayed search results provided by another embodiment of the present invention;
FIG. 5 is a structural diagram of the keyword-based system for analyzing an event evolution process provided by an embodiment of the present invention;
FIG. 6 is a structural diagram of the keyword-based system for analyzing an event evolution process provided by another embodiment of the present invention.
Detailed description
FIG. 1 is a flowchart of the keyword-based method for analyzing an event evolution process provided by an embodiment of the present invention. For microblog posts found by a keyword search, the embodiment counts the number of posts and provides a keyword-based method for analyzing an event evolution process. The specific steps of the method are as follows:
Step S101: count the number of search results in each time period to obtain a sequence of the number of search results over the time periods.
Before counting the number of search results in each time period, the method further includes:
searching by a keyword and obtaining search results related to the keyword, where the search results include time information.
Counting the number of search results in each time period includes:
counting the number of search results in each time period according to the time information.
A search on the keyword entered by the user returns the microblog posts related to that keyword, specifically the posts that contain the keyword. Each post includes its content and its publication time. Posts are counted per predetermined time period according to their publication time. Taking a time period of one day as an example, the method counts the posts published on a given day that contain the keyword, and does so for each of multiple time periods. Each time period corresponds to exactly one count, and the time periods together with their counts form the sequence of the number of microblog posts over the time periods. The time periods do not overlap one another.
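The counting in step S101 can be illustrated with a minimal Python sketch. It assumes one-day time periods and that each search result has already been reduced to its publication time; the function name `count_per_day` and its parameters are illustrative and do not appear in the patent.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def count_per_day(timestamps, start, end):
    """Return [(day, count), ...] for every day from start to end inclusive.

    timestamps: publication times (datetime) of the posts matching the keyword.
    Days without any post get a count of 0, so the periods never overlap
    and each period maps to exactly one count.
    """
    per_day = Counter(ts.date() for ts in timestamps)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=offset) for offset in range(n_days)]
    return [(day, per_day.get(day, 0)) for day in days]


# Hypothetical publication times of three matching posts.
posts = [
    datetime(2014, 3, 8, 9, 0),
    datetime(2014, 3, 8, 21, 30),
    datetime(2014, 3, 10, 12, 0),
]
sequence = count_per_day(posts, date(2014, 3, 8), date(2014, 3, 10))
```

Emitting zero-count days keeps the sequence evenly spaced, which a later peak-detection pass would likely rely on.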
Step S102: perform peak detection on the sequence to obtain at least one peak window, where each peak window includes a subsequence, the subsequence includes one peak, and the peak is a local maximum of the number of search results in the sequence.
Peak detection is performed on the sequence obtained in the previous step, that is, the sequence formed by the time periods and their microblog post counts. The detection finds the local maxima of the post count in the sequence. Each local maximum represents a peak, and the sequence has at least one local maximum, so a sequence includes at least one peak. Peak detection obtains all peaks of the sequence. Each peak is obtained through a peak window, which is a subsequence that contains the peak.
Step S103: perform text analysis on the search results corresponding to the peak window to obtain a peak description.
Text analysis is performed on all microblog posts in the peak window. The text is segmented into words, and the words that occur most frequently are used as the peak description, which explains why the peak occurred.
Step S104: display the sequence of the number of search results over the time periods, and display the peak description at the position of the peak.
The sequence of microblog post counts over the time periods is displayed, with each peak description shown at the position of its peak. The user can then see at a glance how the number of posts related to the entered keyword changes, how the situation related to the keyword develops, and how much public attention it receives.
In this embodiment, the sequence of the number of search results over the time periods is counted, the local maxima of the sequence are obtained by peak detection, the cause of each local maximum is analyzed to obtain a peak description, and the sequence is displayed with each peak description at the position of its peak. The displayed results therefore show how the number of microblog posts changes over time, and that trend reflects the evolution of the events related to the keyword. From the peak descriptions, a user can learn the main content of the posts without reading them in detail, which increases the amount of information conveyed by the displayed search results.
FIG. 2 is a schematic diagram of a peak window provided by another embodiment of the present invention; FIG. 3 is a schematic diagram of a peak window provided by another embodiment of the present invention. On the basis of the above embodiment, the peak window includes the time-period identifier at which the window starts, the time-period identifier of the peak, and the time-period identifier at which the window ends.
Performing peak detection according to the time periods and the number of search results in each time period to obtain a peak window includes:
initializing the parameter $mean = C_1$, where $C_1$ denotes the number of search results in the first time period;
if [condition missing in the source], updating the initialization parameter [update formula missing in the source], where $C_i$ denotes the number of search results in the $i$-th time period and $n$ denotes the total number of time periods;
if [condition missing in the source], determining that the time-period identifier at which the window starts is $i$; if [condition missing in the source], setting $j = j + 1$ and continuing to check whether [condition missing in the source] holds, until [condition missing in the source], at which point the time-period identifier at which the window ends is determined to be $j$; computing $C_k$, $i \le k \le j$, such that [inequalities missing in the source] all hold, where $k$ denotes the time-period identifier of the peak;
the peak window is window(i,k,j);
updating the initialization parameter [update formula missing in the source], and continuing to obtain the peak windows of the sequence by the same method used to obtain window(i,k,j).
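The conditions and update formulas of this procedure are not legible in the text, so the following Python sketch should be read as one possible instantiation rather than the patented rule. It keeps what the text does state (`mean` starts at $C_1$, a window is reported as window(i,k,j) with 1-based identifiers, the peak is the maximum inside the window, and scanning resumes after the window) and fills the gaps with assumptions: a window starts when a count exceeds `tau * mean`, it extends while counts stay above `mean`, and `mean` is updated by exponential smoothing with weight `alpha`. The names `tau` and `alpha` and their default values are hypothetical.

```python
def find_peak_windows(counts, tau=2.0, alpha=0.125):
    """Return peak windows as 1-based (start, peak, end) tuples.

    ASSUMPTIONS (not from the source): start condition counts[i] > tau * mean,
    extension condition counts[j + 1] > mean, exponential-smoothing update.
    """
    n = len(counts)
    if n == 0:
        return []
    mean = float(counts[0])  # mean = C_1, as stated in the text
    windows = []
    i = 1
    while i < n:
        if counts[i] > tau * mean:  # assumed start-of-window test
            start = j = i
            # Assumed extension test: keep growing while above the running mean.
            while j + 1 < n and counts[j + 1] > mean:
                j += 1
            # The peak is the largest count inside the window.
            k = max(range(start, j + 1), key=lambda m: counts[m])
            windows.append((start + 1, k + 1, j + 1))
            # Assumed update: fold the window's counts into the running mean.
            for m in range(start, j + 1):
                mean = alpha * counts[m] + (1 - alpha) * mean
            i = j + 1  # resume scanning after the window
        else:
            mean = alpha * counts[i] + (1 - alpha) * mean
            i += 1
    return windows


# Hypothetical daily counts with a burst in periods 3 to 6.
demo_counts = [10, 10, 30, 90, 40, 20, 10, 10, 10, 10]
demo_windows = find_peak_windows(demo_counts)  # [(3, 4, 6)]
```

Under these assumptions a large burst raises `mean` substantially, so a later and smaller burst may fail the start test in a single forward pass.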
As shown in FIG. 2, if $C_4$ is significantly larger than the other counts, the initialization parameter increases accordingly, which makes the later peak windows of the sequence hard to detect. To solve this problem, the embodiment of the present invention further includes: arranging the sequence of the number of search results over the time periods in reverse order to obtain a reversed sequence;
obtaining the peak windows of the reversed sequence by the same method used to obtain window(i,k,j); and
merging the peak windows of the sequence and the peak windows of the reversed sequence into a peak window set, in which a repeated peak window is recorded only once.
The sequence in FIG. 2 is arranged in reverse order to obtain the reversed sequence, and its peak windows are obtained by the same method used to obtain window(3,4,6). Specifically, peak windows can be detected along the negative direction of the X axis, using the same detection steps as above. The first peak window detected is window(8,10,12). Because window(8,10,12) contains no significantly larger value, the initialization parameter does not increase significantly, so a second window, window(3,4,6), can also be detected.
For the sequence in FIG. 2, detection along the positive direction of the X axis yields window(3,4,6), and detection along the negative direction yields window(8,10,12) and window(3,4,6). The peak windows detected in the two directions are merged into the peak window set {window(3,4,6), window(8,10,12)}, in which the repeated peak window window(3,4,6) is recorded once. As shown in FIG. 3, this set is the final set of peak windows of the sequence.
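The forward and reverse passes and the merge can be sketched as follows. This is a minimal illustration under stated assumptions: `detect` stands for any single-pass detector that returns 1-based (start, peak, end) tuples, and the index mapping assumes that windows found on the reversed sequence are numbered in reversed positions and must be converted back. Both the function name and that convention are assumptions, not details given in the patent.

```python
def merge_bidirectional(counts, detect):
    """Run `detect` forward and on the reversed sequence, then merge.

    detect: hypothetical callable mapping a list of counts to a list of
            1-based (start, peak, end) windows.
    Returns the de-duplicated windows in original coordinates, sorted.
    """
    n = len(counts)
    forward = detect(counts)
    backward = detect(counts[::-1])
    # Position p in the reversed sequence is position n + 1 - p in the
    # original; start and end swap roles when the direction flips.
    mapped = [(n + 1 - j, n + 1 - k, n + 1 - i) for (i, k, j) in backward]
    # A set records each repeated window only once.
    return sorted(set(forward) | set(mapped))
```

With 12 time periods, a reverse pass that reports (1,3,5) and (7,9,10) in reversed positions maps to window(8,10,12) and window(3,4,6), so merging with a forward result of window(3,4,6) gives the two-window set described above.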
This embodiment provides a specific method for obtaining peak windows. Detecting the peak windows of the sequence in both the forward and the reverse direction greatly increases the probability that the peaks in the sequence are detected and improves the calculation accuracy.
FIG. 4 is a schematic diagram of displayed search results provided by another embodiment of the present invention. On the basis of the above embodiment, performing text analysis on the search results corresponding to the peak window to obtain the peak description includes:
obtaining the search results corresponding to the peak window, and using a word segmentation tool to obtain the segmented words of those search results; calculating the term frequency-inverse document frequency (TFIDF) value of each segmented word; and, if a TFIDF value is greater than a threshold, using the corresponding segmented word as part of the peak description.
Text analysis is performed on all microblog posts in the peak window. A word segmentation tool extracts the segmented words from the text, and the TFIDF value of each segmented word is calculated. The TFIDF value of the $h$-th segmented word is defined by formulas (1), (2) and (3):
$TFIDF_h = TF_h \cdot IDF_h$ (1)
$TF_h = N_{h,t} / \sum_m N_{m,t}$ (2)
$IDF_h = 1 / DF_h$ (3)
where $N_{h,t}$ denotes the number of times the $h$-th segmented word occurs in the $t$-th peak window, $\sum_m N_{m,t}$ denotes the total number of segmented words occurring in the microblog posts of the $t$-th peak window, and $DF_h$ denotes the number of microblog posts that contain the $h$-th segmented word.
The segmented words whose TFIDF values exceed the threshold are used as the peak description. Specifically, the TFIDF values are sorted in descending order and the segmented words with the five largest values are used, so the peak description consists of five segmented words. These are the five highest-scoring segmented words in the microblog posts, and they summarize the main content of the event.
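Formulas (1) to (3) can be worked through in a short Python sketch. It assumes the posts have already been segmented into token lists (the patent leaves the segmentation tool unspecified) and that $DF_h$ is counted over all search results rather than over the window alone, which the text does not settle. The function name `peak_description` and its parameters are illustrative.

```python
from collections import Counter


def peak_description(window_posts, all_posts, top_n=5):
    """Rank the tokens of one peak window by TFIDF_h = TF_h * IDF_h.

    window_posts: tokenized posts inside the peak window (list of token lists).
    all_posts:    tokenized posts used for DF_h (ASSUMPTION: all search results).
    """
    # N_{h,t}: occurrences of each token in the window.
    term_counts = Counter(tok for post in window_posts for tok in post)
    total = sum(term_counts.values())  # sum over m of N_{m,t}
    # DF_h: number of posts containing the token (each post counted once).
    doc_freq = Counter(tok for post in all_posts for tok in set(post))
    scores = {
        tok: (cnt / total) * (1.0 / doc_freq[tok])
        for tok, cnt in term_counts.items()
        if doc_freq[tok] > 0
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
    return ranked[:top_n]


# Hypothetical tokenized posts.
in_window = [["MH370", "lost"], ["MH370", "plane"]]
everything = in_window + [["plane", "ticket"], ["plane", "delay"]]
description = peak_description(in_window, everything)
```

In this toy data "plane" scores lowest because it also appears in posts outside the window, which is the effect the $1/DF_h$ factor is meant to produce.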
Displaying the sequence of the number of search results over the time periods includes: connecting the points of the sequence into a curve, and displaying the curve.
Take the Malaysia Airlines incidents as an example. In FIG. 4, the horizontal axis represents time and the vertical axis represents the number of original microblog posts, many of which report on the Malaysia Airlines incidents. The time period for counting original posts is one day. Specifically, from March 1, 2014 to July 31, 2014, the number of original posts about the Malaysia Airlines incidents published on the day (the full 24 hours) is counted once every one or two days, which gives the sequence of the number of original posts over the time periods. The points of the sequence are connected into a curve. All peak windows of the sequence and the peak description of each window are obtained by the method of the above embodiments, and each peak description is displayed at the position of its peak. As shown in FIG. 4, the sequence includes two peaks. The first peak description is: Malaysia Airlines, MH370, lost contact, plane, praying. The second peak description is: Malaysia Airlines, MH17, Ukraine, shot down, crashed.
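As a minimal sketch of the display step, the following Python function pairs each peak with its position on the curve and its label, leaving the actual drawing to whatever plotting layer is used. The function name, the dictionary keys, and the sample data are illustrative assumptions; windows are 1-based (start, peak, end) tuples as in window(i,k,j).

```python
def build_annotations(counts, windows, descriptions):
    """Return one annotation per peak: where to draw it and what to show.

    counts:       post count per time period (the curve's y values).
    windows:      1-based (start, peak, end) tuples.
    descriptions: one list of descriptive words per window, in the same order.
    """
    return [
        {
            "period": peak,              # x position of the label
            "count": counts[peak - 1],   # y position of the label
            "label": ", ".join(words),   # text shown at the peak
        }
        for (_, peak, _), words in zip(windows, descriptions)
    ]


# Hypothetical sequence with a single peak in period 3.
annotations = build_annotations(
    [1, 2, 9, 3, 1],
    [(2, 3, 4)],
    [["MH370", "lost contact"]],
)
```

Keeping the annotation data separate from rendering means the same peak descriptions can be attached to a line chart, a table, or a tooltip.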
In this embodiment, entering a keyword yields a continuous curve of the number of original microblog posts containing that keyword over time. The curve shows the evolution of the events related to the keyword, and the peak descriptions displayed at the peaks of the curve summarize the content of the posts, which increases the amount of information conveyed by the displayed search results.
FIG. 5 is a structural diagram of the keyword-based system for analyzing an event evolution process provided by an embodiment of the present invention. The system can execute the processing flow provided by the method embodiment. As shown in FIG. 5, the keyword-based system 50 for analyzing an event evolution process includes a statistics module 51, a detection module 52, a text analysis module 53 and a display module 54. The statistics module 51 is configured to count the number of search results in each time period to obtain a sequence of the number of search results over the time periods. The detection module 52 is configured to perform peak detection on the sequence to obtain at least one peak window, where each peak window includes a subsequence, the subsequence includes one peak, and the peak is a local maximum of the number of search results in the sequence. The text analysis module 53 is configured to perform text analysis on the search results corresponding to the peak window to obtain a peak description. The display module 54 is configured to display the sequence of the number of search results over the time periods, and to display the peak description at the position of the peak.
In this embodiment, the sequence of the number of search results over the time periods is counted, the local maxima of the sequence are obtained by peak detection, the cause of each local maximum is analyzed to obtain a peak description, and the sequence is displayed with each peak description at the position of its peak. The displayed results therefore show how the number of microblog posts changes over time, and that trend reflects the evolution of the events related to the keyword. From the peak descriptions, a user can learn the main content of the posts without reading them in detail, which increases the amount of information conveyed by the displayed search results.
FIG. 6 is a structural diagram of the keyword-based system for analyzing an event evolution process provided by another embodiment of the present invention. On the basis of FIG. 5, the peak window includes the time-period identifier at which the window starts, the time-period identifier of the peak, and the time-period identifier at which the window ends. The detection module 52 is specifically configured to: initialize the parameter $mean = C_1$, where $C_1$ denotes the number of search results in the first time period; if [condition missing in the source], update the initialization parameter [update formula missing in the source], where $C_i$ denotes the number of search results in the $i$-th time period and $n$ denotes the total number of time periods; if [condition missing in the source], determine that the time-period identifier at which the window starts is $i$; if [condition missing in the source], set $j = j + 1$ and continue to check whether [condition missing in the source] holds, until [condition missing in the source], at which point the time-period identifier at which the window ends is determined to be $j$; compute $C_k$, $i \le k \le j$, such that [inequalities missing in the source] all hold, where $k$ denotes the time-period identifier of the peak; take window(i,k,j) as the peak window; and update the initialization parameter [update formula missing in the source], $i = j + 1$, and continue to obtain the peak windows of the sequence by the same method used to obtain window(i,k,j).
The system 50 further includes a reverse-ordering module 55, configured to arrange the sequence of the number of search results over the time periods in reverse order to obtain a reversed sequence. The detection module 52 is further configured to obtain the peak windows of the reversed sequence by the same method used to obtain window(i,k,j), and to merge the peak windows of the sequence and the peak windows of the reversed sequence into a peak window set, in which a repeated peak window is recorded only once.
The system 50 further includes a search module 49, configured to search by a keyword and obtain search results related to the keyword, where the search results include time information. The statistics module 51 is specifically configured to count the number of search results in each time period according to the time information.
The text analysis module 53 is specifically configured to: obtain the search results corresponding to the peak window, and use a word segmentation tool to obtain the segmented words of those search results; calculate the term frequency-inverse document frequency (TFIDF) value of each segmented word; and, if a TFIDF value is greater than a threshold, use the corresponding segmented word as part of the peak description.
The display module 54 is specifically configured to connect the points of the sequence of the number of search results over the time periods into a curve, and to display the curve.
The keyword-based system for analyzing an event evolution process provided by this embodiment can be used to execute the method embodiment of FIG. 1 above; its specific functions are not repeated here.
This embodiment provides a specific method for obtaining peak windows. Detecting the peak windows of the sequence in both the forward and the reverse direction greatly increases the probability that the peaks in the sequence are detected and improves the calculation accuracy. Entering a keyword yields a continuous curve of the number of original microblog posts containing that keyword over time. The curve shows the evolution of the events related to the keyword, and the peak descriptions displayed at the peaks of the curve summarize the content of the posts, which increases the amount of information conveyed by the displayed search results.
In summary, the embodiments of the present invention count the sequence of the number of search results over the time periods, obtain the local maxima of the sequence by peak detection, analyze the cause of each local maximum to obtain a peak description, and display the sequence with each peak description at the position of its peak. The displayed results therefore show how the number of microblog posts changes over time, that trend reflects the evolution of the events related to the keyword, and a user can learn the main content of the posts from the peak descriptions without reading them in detail, which increases the amount of information conveyed by the displayed search results. The embodiments also provide a specific method for obtaining peak windows, and detecting peak windows in both the forward and the reverse direction greatly increases the probability that the peaks in the sequence are detected and improves the calculation accuracy. Finally, entering a keyword yields a continuous curve of the number of original microblog posts containing that keyword over time, with peak descriptions that summarize the content of the posts displayed at the peaks of the curve.
In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional units.
An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional modules above is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. For the specific working process of the device described above, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, and some or all of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510062379.3A CN104778202B (en) | 2015-02-05 | 2015-02-05 | The analysis method and system of event evolutionary process based on keyword |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104778202A (en) | 2015-07-15 |
CN104778202B (en) | 2018-08-14 |
Family
ID=53619666
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510062379.3A Active CN104778202B (en) | 2015-02-05 | 2015-02-05 | The analysis method and system of event evolutionary process based on keyword |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104778202B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108255860B (en) * | 2016-12-29 | 2020-07-31 | 北京国双科技有限公司 | Keyword analysis processing method and device |
CN111708938B (en) * | 2020-05-27 | 2023-04-07 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and storage medium for information processing |
CN113553407B (en) * | 2021-06-18 | 2022-09-27 | 北京百度网讯科技有限公司 | Event tracing method and device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1879107A (en) * | 2003-09-30 | 2006-12-13 | Google公司 | Information retrieval based on historical data |
CN101364426A (en) * | 2007-08-08 | 2009-02-11 | 联发科技股份有限公司 | Memory control circuit and method thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9424511B2 (en) * | 2011-11-09 | 2016-08-23 | Qualcomm Incorporated | Methods and apparatus for unsupervised neural component replay by referencing a pattern in neuron outputs |
Non-Patent Citations (2)
Title |
---|
"基于时间序列相似性的股价趋势预测研究";孙建乐;《万方数据 企业知识服务平台》;20140925;全文 * |
"基于语义统计分析的网络舆情挖掘技术研究";万源;《中国博士学位论文全文数据库 信息科技辑》;20121115(第11期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN104778202A (en) | 2015-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2017408801B2 (en) | User keyword extraction device and method, and computer-readable storage medium | |
US12189702B2 (en) | Expert detection in social networks | |
CN113590645B (en) | Searching method, searching device, electronic equipment and storage medium | |
US20140258283A1 (en) | Computing device and file searching method using the computing device | |
CN106886567B (en) | Microblog emergency detection method and device based on semantic extension | |
CN112559747B (en) | Event classification processing method, device, electronic equipment and storage medium | |
CN108804642A (en) | Search method, device, computer equipment and storage medium | |
CN107408115B (en) | Web site filter, method and medium for controlling access to content | |
CN105930527B (en) | Searching method and device | |
CN109299235B (en) | Knowledge base searching method, device and computer readable storage medium | |
CN112364625A (en) | Text screening method, device, equipment and storage medium | |
CN104778202B (en) | The analysis method and system of event evolutionary process based on keyword | |
CN106933878B (en) | Information processing method and device | |
US20160248724A1 (en) | Social Message Monitoring Method and Apparatus | |
CN108875050B (en) | Text-oriented digital evidence-obtaining analysis method and device and computer readable medium | |
CN110019763B (en) | Text filtering method, system, equipment and computer readable storage medium | |
CN103955526B (en) | Data storage method and device | |
CN103092838B (en) | A kind of method and device for obtaining English words | |
Zhang et al. | Effective and Fast Near Duplicate Detection via Signature‐Based Compression Metrics | |
CN107169065B (en) | Method and device for removing specific content | |
CN116561402A (en) | Method, device and server for acquiring target content information in webpage | |
CN111966948B (en) | Information delivery method, device, equipment and storage medium | |
CN114997136A (en) | Text matching method, knowledge base construction method and device | |
CN113360696A (en) | Image pairing method, device, equipment and storage medium | |
CN103793448A (en) | Article information providing method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by SIPO to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |