1 string.getWordFrequency
<string.getWordFrequency
minwordlength='minwordlength'
maxwordlength='maxwordlength'
ignore-xml-tags='ignore-xml-tags'
>
<text /> !
</string.getWordFrequency>
Attributes
Name | Type | Required | Default | Description
---|---|---|---|---
minwordlength | string | | | Minimum number of characters of the words to count.
maxwordlength | string | | | Maximum number of characters of the words to count.
ignore-xml-tags | boolean | | | Whether to ignore XML tags so that they are not processed.
Arguments
Name | Type | Required | Unique | Nullable | Description
---|---|---|---|---|---
text | string | | | |
Returns
Type | Description
---|---
HashMap | Returns the map of {word, frequency} pairs. The attributes minwordlength and maxwordlength restrict the result: any word shorter than minwordlength or longer than maxwordlength is not counted.
Exceptions
requires 1 arguments, received: ...
The input argument has not been specified.
Count how many times each word in a text occurs.
<xsql-script name='string._getWordFrequency_sample1'>
    <body>
        <set name='text'>
            test1 asdf asdf test2 sd test1 asdf asdf test2 sd
        </set>
        <println>
            <string.getWordFrequency minwordlength='3' maxwordlength='15'>
                <text/>
            </string.getWordFrequency>
        </println>
    </body>
</xsql-script>
Returns the map of {word, frequency} pairs. The attributes minwordlength and maxwordlength restrict the result: in this case, any word with fewer than three characters or more than fifteen is not counted, which is why sd does not appear in the output.
{test1=2, test2=2, asdf=4}
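The length filter can be modelled outside XSQL to check the expected counts. The following is a minimal Python sketch, not the implementation behind string.getWordFrequency: the helper name `word_frequency`, the `\w+` tokenizer, and the case-sensitive comparison are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_frequency(text, min_len=1, max_len=None):
    """Hypothetical model: count words whose length is within [min_len, max_len]."""
    # Assumption: a "word" is a run of letters, digits or underscores.
    words = re.findall(r"\w+", text)
    return Counter(
        w for w in words
        if len(w) >= min_len and (max_len is None or len(w) <= max_len)
    )

sample = "test1 asdf asdf test2 sd test1 asdf asdf test2 sd"
freq = word_frequency(sample, min_len=3, max_len=15)
print(dict(freq))
```

Under these assumptions the model yields the same three pairs as the sample output above, and the two-character word sd is dropped by the minimum length.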
Count how many times each word in a string occurs, with the attribute ignore-xml-tags='false'.
<xsql-script name='string._getWordFrequency_sample2'>
    <body>
        <string.getWordFrequency minwordlength='3' ignore-xml-tags='false'>
            <string>A linux command to see the content of the directory <code>ls</code>. It serves to show the content.</string>
        </string.getWordFrequency>
    </body>
</xsql-script>
The output with ignore-xml-tags='false' will be:
command=1|linux=1|for=2|content=2|directory=1|code=2|Serves=1|show=1|
Count how many times each word in a string occurs, with the attribute ignore-xml-tags='true'.
<xsql-script name='string._getWordFrequency_sample3'>
    <body>
        <string.getWordFrequency minwordlength='3' ignore-xml-tags='true'>
            <string>A linux command to see the content of the directory <code>ls</code>. It serves to show the content.</string>
        </string.getWordFrequency>
    </body>
</xsql-script>
The output with ignore-xml-tags='true' will be:
command=1|linux=1|for=2|content=2|directory=1|Serves=1|show=1|
Here the XML tags, in this case <code> and </code>, have not been processed, so code no longer appears in the result.
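The effect of ignore-xml-tags on the code entry can be illustrated with a small Python model. This is only a sketch under stated assumptions: the helper `word_frequency`, the `<[^>]+>` tag pattern, and the `\w+` tokenizer are hypothetical and say nothing about how string.getWordFrequency parses its input. It shows only the tag handling and does not try to reproduce the other entries of the listings above.

```python
import re
from collections import Counter

# Assumption: a tag is anything between "<" and the next ">".
TAG = re.compile(r"<[^>]+>")

def word_frequency(text, min_len=1, ignore_xml_tags=False):
    """Hypothetical model of the ignore-xml-tags switch."""
    if ignore_xml_tags:
        # Replace each tag with a space so neighbouring words stay separate.
        text = TAG.sub(" ", text)
    return Counter(w for w in re.findall(r"\w+", text) if len(w) >= min_len)

text = ("A linux command to see the content of the directory "
        "<code>ls</code>. It serves to show the content.")

with_tags = word_frequency(text, min_len=3)
without_tags = word_frequency(text, min_len=3, ignore_xml_tags=True)
print(with_tags["code"], without_tags["code"])
```

When tags are kept, the opening and closing tag each contribute one occurrence of code. When they are ignored, the entry disappears, while ordinary words such as content keep their count.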
This example obtains the frequency of every word of at least 3 characters and uses the results to index the pages of the documentation.
<xsql-script name='string._getWordFrequency_sample4'>
    <body>
        <set name='m_map'>
            <string.getWordFrequency minwordlength='3'>
                <string><p_some_text /> </string>
            </string.getWordFrequency>
        </set>
        <!-- Security protection if the getWordFrequency returns null. -->
        <if>
            <expr><isnull><m_map /></isnull></expr>
            <then>
                <return />
            </then>
        </if>
        <iterator name='m_i' type='key'>
            <in>
                <m_map/>
            </in>
            <do>
                <set name='m_key'>
                    <m_i/>
                </set>
                <set name='m_frq'><map.get name='m_map'><m_key /></map.get></set>
                <!-- Each one of the tokens with the frequency are inserted -->
                <insert table='wic_word_tokens'>
                    <column name='obj_id'><p_obj_id /></column>
                    <column name='doc_id'><p_doc_id /></column>
                    <column name='doc_word'><m_key /></column>
                    <column name='doc_freq'><m_frq /></column>
                </insert>
            </do>
        </iterator>
    </body>
</xsql-script>
string.getWordFrequency returns the map of {word, frequency} pairs. Because only minwordlength='3' is set in this case, any word with fewer than three characters is not counted.