Converts a text into a map of words and their frequencies. Returns the map as word/frequency pairs.

1 string.getWordFrequency

<string.getWordFrequency
    minwordlength='minwordlength'
    maxwordlength='maxwordlength'
    ignore-xml-tags='ignore-xml-tags'
>
    <text /> !
</string.getWordFrequency>

Exceptions

requires 1 arguments, received: ...

The input parameter has not been specified.

Example

Show how many times each word in a text is repeated.

<xsql-script name='string._getWordFrequency_sample1'>
    <body>
        <set name='text'>
            test1 asdf asdf test2 sd test1 asdf asdf test2 sd
        </set>
        <println>
            <string.getWordFrequency minwordlength='3' maxwordlength='15'>
                <text/>
            </string.getWordFrequency>
        </println>
    </body>
</xsql-script>

Returns the map of word/frequency pairs. The attributes minwordlength and maxwordlength restrict the result: in this case, any word shorter than three characters or longer than fifteen is excluded, which is why sd does not appear in the output.

{test1=2, test2=2, asdf=4}
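
The length filtering described above can be modeled outside XSQL. The following Python sketch is not the function's implementation. It assumes that words are runs of letters, digits and underscores, and the names word_frequency, min_len and max_len are hypothetical stand-ins for the function and its minwordlength and maxwordlength attributes.

```python
import re
from collections import Counter


def word_frequency(text, min_len=1, max_len=None):
    """Count words whose length lies within [min_len, max_len].

    Assumption: a word is a run of letters, digits or underscores.
    The real tokenizer of string.getWordFrequency may differ.
    """
    words = re.findall(r"\w+", text)
    kept = [
        w for w in words
        if len(w) >= min_len and (max_len is None or len(w) <= max_len)
    ]
    return dict(Counter(kept))


# Same input and limits as the first sample script.
freq = word_frequency(
    "test1 asdf asdf test2 sd test1 asdf asdf test2 sd",
    min_len=3,
    max_len=15,
)
print(freq)
```

Under these assumptions the two-character word sd is dropped and the remaining counts match the sample output.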

Example

Show how many times each word in a string is repeated when the attribute ignore-xml-tags='false' is set.

<xsql-script name='string._getWordFrequency_sample2'>
    <body>
        <string.getWordFrequency
            minwordlength='3'
            ignore-xml-tags='false'
        >
            <string>A linux command to see the content of the directory <code>ls</code>. It serves to show the content.</string>
        </string.getWordFrequency>
    </body>
</xsql-script>

The output with ignore-xml-tags='false' will be:

command=1|linux=1|for=2|content=2|directory=1|code=2|Serves=1|show=1|

Example

Show how many times each word in a string is repeated when the attribute ignore-xml-tags='true' is set.

<xsql-script name='string._getWordFrequency_sample3'>
    <body>
        <string.getWordFrequency
            minwordlength='3'
            ignore-xml-tags='true'
        >
            <string>A linux command to see the content of the directory <code>ls</code>. It serves to show the content.</string>
        </string.getWordFrequency>
    </body>
</xsql-script>

The output with ignore-xml-tags='true' will be:

command=1|linux=1|for=2|content=2|directory=1|Serves=1|show=1|

Here the XML tags (<code> and </code> in this example) have not been processed, so code no longer appears in the result.
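
The effect of ignore-xml-tags can be illustrated with a small Python model. This is a sketch under stated assumptions, not the function's implementation: tags are removed with a simple regular expression, words are runs of ASCII letters and digits, and the names word_frequency, min_len and ignore_xml_tags are hypothetical. The sample string is a shortened version of the one used above, and the model does not reproduce any other filtering the real function may apply.

```python
import re
from collections import Counter

# Assumption: a tag is "<" or "</", a letter, then anything up to ">".
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def word_frequency(text, min_len=1, ignore_xml_tags=False):
    """Count words, optionally removing XML tags before tokenizing."""
    if ignore_xml_tags:
        # Replace each tag with a space so neighbouring words stay separate.
        text = TAG_PATTERN.sub(" ", text)
    words = re.findall(r"[A-Za-z0-9]+", text)
    return dict(Counter(w for w in words if len(w) >= min_len))


sample = "the content of the directory <code>ls</code>."

with_tags = word_frequency(sample, min_len=3, ignore_xml_tags=False)
without_tags = word_frequency(sample, min_len=3, ignore_xml_tags=True)

print(with_tags.get("code"))     # tag names are counted as words
print(without_tags.get("code"))  # tags were removed before counting
```

In this model the tag name code is counted twice when tags are kept (once for the opening tag and once for the closing tag) and not at all when they are ignored. The content ls is dropped in both cases because it is shorter than three characters.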

Example

This example obtains the frequency of every word of at least 3 characters and uses the results to index the pages of the documentation.

<xsql-script name='string._getWordFrequency_sample4'>
    <body>
        <set name='m_map'>
            <string.getWordFrequency minwordlength='3'>
                <string><p_some_text /> </string>
            </string.getWordFrequency>
        </set>
        <!-- Guard in case getWordFrequency returns null. -->
        <if>
            <expr><isnull><m_map /></isnull></expr>
            <then>
                <return />
            </then>
        </if>
        <iterator name='m_i' type='key'>
            <in>
                <m_map/>
            </in>
            <do>
                <set name='m_key'> <m_i/> </set>
                <set name='m_frq'><map.get name='m_map'><m_key /></map.get></set>
                <!-- Each one of the tokens with the frequency are inserted -->
                <insert table='wic_word_tokens'>
                    <column name='obj_id'><p_obj_id /></column>
                    <column name='doc_id'><p_doc_id /></column>
                    <column name='doc_word'><m_key /></column>
                    <column name='doc_freq'><m_frq /></column>
                </insert>
            </do>
        </iterator>
    </body>
</xsql-script>

The function returns the map of word/frequency pairs. Because only minwordlength='3' is set here, words shorter than three characters are excluded and no upper length limit is applied.
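
The indexing step of this example (one row per word, with a guard for an empty result) can be sketched in Python with an in-memory SQLite database. The table and column names come from the script above. The column types, the helper name index_tokens and the sample identifiers are assumptions made only for illustration.

```python
import sqlite3


def index_tokens(conn, obj_id, doc_id, freq_map):
    """Insert one row per (word, frequency) pair; return the rows written."""
    # Mirrors the null guard in the script: nothing to index, nothing to do.
    if not freq_map:
        return 0
    rows = [(obj_id, doc_id, word, count) for word, count in freq_map.items()]
    conn.executemany(
        "INSERT INTO wic_word_tokens (obj_id, doc_id, doc_word, doc_freq) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
# Assumed schema: the real wic_word_tokens definition is not shown in the source.
conn.execute(
    "CREATE TABLE wic_word_tokens ("
    "obj_id INTEGER, doc_id INTEGER, doc_word TEXT, doc_freq INTEGER)"
)

inserted = index_tokens(conn, 1, 10, {"test1": 2, "asdf": 4})
stored = conn.execute(
    "SELECT doc_word, doc_freq FROM wic_word_tokens ORDER BY doc_word"
).fetchall()
print(inserted, stored)
```

Passing None or an empty map returns 0 without touching the table, which corresponds to the early return in the script.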