The Content Classification Engine is available in FileCloud version 19.2 and later.
The Content Classification Engine is only available for enterprise licenses, or standard licenses with CCE+PATTERNSEARCH components. For information on the different license types, read about the Key Features on the Pricing page.
The Content Classification Engine (CCE) is a rule-driven content classification system that enables the generic labeling of files with metadata. This labeling enables key operations within FileCloud such as contextual file search and Data Leak Prevention.
CCE automates, streamlines, and strengthens the overall level of data leak prevention for an organization. Administrators and users can upload files and folders with the knowledge that they can be automatically classified according to their content, which helps ensure that sensitive data is immediately covered by the criteria outlined in the DLP plan. CCE rules are also applied retroactively to data that was uploaded before the rules were created, helping organizations protect legacy data.
Smart Classification is only available for files that are 1MB or larger.
Read more about managing metadata.
Read more about Smart DLP.
Before You Start
CCE will only function properly if Solr has been configured and your storage has been indexed. Additionally, administrators must have created at least one set of metadata in order for the classification process to operate.
- Since rules that apply to the same metadata attribute often result in unexpected classification, each rule should have a unique metadata attribute.
- To prevent overwriting metadata intentionally added by users, as of FileCloud Version 184.108.40.20609, CCE does not overwrite metadata it didn’t add itself. Users must remove manually added metadata set values to allow CCE to add its own metadata.
- As of FileCloud Version 220.127.116.1109, CCE uses Perl Compatible Regular Expressions (PCRE), which enables it to support a richer set of regular expressions. For example, the character class \d, which matches a single digit, is now usable.
If you upgrade from a previous version of FileCloud, review your CCE rules and existing patterns to confirm that they still classify as expected.
- As of FileCloud Version 18.104.22.16809, CCE updates classification if a file no longer meets the condition of a rule after it is updated and re-uploaded. For example, if a file with a credit card number that is classified as PII is re-uploaded without the credit card number, the PII classification is removed.
- Empty files cannot be indexed and classified.
- As of FileCloud Version 22.214.171.12409, the default maximum size for indexed files is 10MB; therefore, by default, files larger than 10MB are not classifiable by CCE and are not available for content search.
- As of FileCloud Version 20.3, if you have OCR enabled, CCE scans image and PDF files for matching patterns. To enable OCR, see Enabling Solr OCR.
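The behaviors listed above can be illustrated with a minimal sketch. It is not FileCloud code: the `classify` function, the `PII` attribute name, the card-number pattern, and the `cce_owned` set are all hypothetical, and Python's `re` module stands in for PCRE (both support `\d`). The sketch uses the byte length of the extracted text as a stand-in for file size, and assumes CCE tracks which attributes it set itself.

```python
import re

# Hypothetical pattern for a 16-digit card number; \d matches a single digit.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")

# Default indexing limit described above (10MB).
MAX_INDEXED_BYTES = 10 * 1024 * 1024


def classify(text: str, metadata: dict[str, str], cce_owned: set[str]) -> None:
    """Update `metadata` in place for one file's extracted text.

    `cce_owned` holds the attribute names that the classifier set itself,
    so that values added manually by users are never overwritten or removed.
    """
    size = len(text.encode("utf-8"))
    if size == 0 or size > MAX_INDEXED_BYTES:
        return  # empty or oversized files are not indexed, so not classified

    user_set = "PII" in metadata and "PII" not in cce_owned
    if user_set:
        return  # leave manually added metadata untouched

    if CARD_PATTERN.search(text):
        metadata["PII"] = "yes"
        cce_owned.add("PII")
    else:
        # The file no longer meets the rule: remove the earlier classification.
        metadata.pop("PII", None)
        cce_owned.discard("PII")


# Usage: classify an upload, then the re-upload without the card number.
meta, owned = {}, set()
classify("Card: 4111 1111 1111 1111", meta, owned)
first_pass = dict(meta)
classify("Card number removed", meta, owned)
second_pass = dict(meta)
```

In this sketch the first call adds the classification and the second removes it, mirroring the re-upload behavior described in the list. Keeping one metadata attribute per rule, as recommended above, avoids two rules competing for the same value.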
Get Started with CCE
The CCE Crawler is an automated tool that classifies files and folders after a rule has been enabled. This helps to ensure that all content is classified according to the defined and enabled rules regardless of when the upload occurred or will occur.
Automating the Crawler
Administrators can use Cron jobs to automate the classification process and to choose when the crawler runs.
The CCE crawler will not run unless manually enabled or executed by a Cron job.
Manually Run the Crawler
To manually run the crawler, click the blue button on the row of the rule you would like the crawler to use for classification. The amount of time needed for the completion of the crawl will depend on the number and size of files being classified, as well as the complexity of enabled CCE rules.
You may manually execute a rule that is not enabled (Auto-classification Enabled is FALSE). After you click the arrow, your screen displays the message "This rule is disabled but it can classify files when manually executed. Proceed to execute the rule?" Click OK to execute the rule.
If you edit a currently executing rule and click Save, rule execution is aborted and Status is set to Unexecuted.