One way to make information findable is to tag it with keywords. The problem with having people tag their own work is that the keywords are often incomplete, miss relevant nuance, or reflect a very narrow point of view. The Termscape tool mines unstructured text to build a conceptual landscape in which hidden connections can be found.
To use the Termscape tool, you must have Elements that contain a "large" amount of information in a long-text property. "Description", "Abstract", or "Summary" fields are ideal for collecting this kind of information. By a large amount of information, we mean at a minimum 3-4 sentences from at least 5-6 Elements.
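As a rough illustration of the minimum suggested above, the following sketch checks a list of description texts before you build a Termscape. It assumes the guideline means at least 3 sentences in each of at least 5 Elements, which is one reading of the text. The function names and the naive sentence splitter are hypothetical and are not part of the tool.

```python
import re

def count_sentences(text):
    """Rough sentence count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])

def enough_text(descriptions, min_sentences=3, min_elements=5):
    """Return True if enough Elements carry enough sentences (assumed thresholds)."""
    usable = [d for d in descriptions if count_sentences(d) >= min_sentences]
    return len(usable) >= min_elements

# Example: five Elements with three sentences each pass the check.
sample = ["First point. Second point. Third point."] * 5
print(enough_text(sample))
```

A check like this only counts sentences; it says nothing about whether the text is varied enough to produce meaningful clusters.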
Creating a New Termscape
The tool is located behind the Admin button in the left-hand panel as "Analyst - Text Analysis".
Clicking this link will take you to the tool where you can start your Analysis. Select "Create New".
Step 1: Title and set the parameters of your Termscape
Give your Termscape a Title. A subtitle and description are not required, but may become useful as you build out several versions of a Termscape. There is a default list of Stop Words: words that will be excluded from the analysis because they occur so frequently in any documentation. You can also create a Termscape using a controlled Vocabulary (e.g. only Data Science Methodology terms), which maps the landscape of your unstructured text using only those specified words. Note: A controlled vocabulary list will override the stop words.
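The difference between stop words and a controlled vocabulary can be sketched in a few lines. This is a minimal illustration and not the tool's implementation: the stop-word list, the tokenizer, and the function names are assumptions, and the sketch handles single-word vocabulary entries only.

```python
import re

# Hypothetical stop-word list; the tool's actual default list is not shown here.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def candidate_terms(text, stop_words=STOP_WORDS, vocabulary=None):
    """Return the tokens that would be kept for analysis."""
    tokens = tokenize(text)
    if vocabulary is not None:
        # A controlled vocabulary overrides the stop words:
        # only the listed words are kept, whatever the stop list says.
        allowed = {word.lower() for word in vocabulary}
        return [t for t in tokens if t in allowed]
    # No vocabulary: keep everything except the stop words.
    return [t for t in tokens if t not in stop_words]

text = "The clustering of the data is a step in the regression analysis"
print(candidate_terms(text))
print(candidate_terms(text, vocabulary={"clustering", "regression"}))
```

With the default stop words, every remaining word is a candidate term. With the vocabulary, only the two listed words survive, which is why a controlled vocabulary produces a much narrower landscape.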
Items First - When this box is checked, the Termscape clusters together the Documents, Summaries, or Descriptions from which the information is being pulled. If this box is unchecked, the generated terms are clustered together instead.
Set Min-Max Clusters - This range sets the minimum and maximum number of clusters in your Termscape. Once you have created a Termscape, adjusting this setting may provide a more relevant set of conceptual clusters.
Set Number of SVD Dimensions - Keep this value at the default of 10 unless you have a small number of Elements that you will be pulling information from. This is your first troubleshooting step if/when the Termscape algorithm fails.
Set the Max Number of Terms - Choose this value based on how complex you want your final result to be.
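The parameter guidance above can be sketched as a small pre-flight check. It rests on a general fact about SVD: a term-by-document matrix cannot have more meaningful dimensions than the smaller of its two sides. It is an assumption, not something this guide states, that the algorithm fails for that reason when there are few Elements. The function name and the default cluster range are hypothetical.

```python
def clamp_parameters(n_documents, n_terms, svd_dims=10,
                     min_clusters=2, max_clusters=10, items_first=True):
    """Shrink requested settings so they fit the amount of data available."""
    # An SVD cannot yield more useful dimensions than min(rows, columns).
    rank_limit = min(n_documents, n_terms)
    dims = max(1, min(svd_dims, rank_limit))

    # Items First clusters the documents; otherwise the terms are clustered.
    n_clustered = n_documents if items_first else n_terms
    upper = max(1, min(max_clusters, n_clustered))
    lower = max(1, min(min_clusters, upper))

    return {"svd_dims": dims, "min_clusters": lower, "max_clusters": upper}

# Six Elements with 40 distinct terms: the default of 10 dimensions is too many.
print(clamp_parameters(n_documents=6, n_terms=40))
```

If a run fails with a small data set, lowering the number of SVD dimensions toward the number of Elements is consistent with the troubleshooting advice above.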
Step 2: Select the properties containing your data
Clicking the "Document Patterns" button allows you to select the Element Types and Properties on those Element Types that contain the text you want to analyze.
Click "Add Pattern" and select the Element Type (Choose Data Source), in much the same way as you would select a query in the Query Editor in the Exploratory Viewer. You can select more than one Element Type.
Once you have selected the Element Type(s) that contain the text properties you want to analyze, click the dropdown next to "Features:" to select the properties that contain the text. You can select more than one property.
Clicking the "Test Pattern" button will let you know how many results your query returns.
If your Element Type includes a property that distinguishes the Elements from each other, you can specify that property using the "Group By" button. This interface pulls the Element Types you selected in your Document Pattern and suggests properties on those Element Types that can be used as an alternative way to color the clusters.
Once you have followed each of these steps, click "Create & Run".
Evaluating your Termscape
Once the algorithm has run, you should see a page that looks like the image below:
Click on the link labelled "Run Embedded, Show All". An example Termscape is shown in the image below. You can also select Termscapes that show just the terms (Run Embedded, Hide Docs) or that show just the Documents (Run Embedded, Hide Terms).
Search - You can search for a term in the text box, and can then see and explore documents that contain that term.
Group - If you selected one or more properties using "Group By", they will be shown here.
Legend - Clusters (or other groups) will be shown in the bottom left. The terms listed for each cluster are the words that occur together most frequently within that context. Clicking on the terms will show both the related terms and the documents that the terms are derived from.
Term - In the Termscape, each term is represented by a circle. Clicking on that node will show you its relationships throughout the graph.
Document - Each rectangle represents a document. Clicking on that node will allow you to read the text from which the terms were generated, as well as see the relationships between documents based on their conceptual similarities.
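To give an intuition for what "terms that occur together" means in the Legend, the sketch below counts how often pairs of terms appear in the same document. This is a deliberately simplified stand-in: the tool itself uses SVD and clustering, and nothing here reproduces its actual algorithm. The function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for each pair of terms, the number of documents containing both."""
    counts = Counter()
    for terms in documents:
        # Sort the unique terms so each pair is always stored in the same order.
        unique = sorted(set(terms))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts

# Each inner list stands for the terms extracted from one document.
docs = [
    ["clustering", "regression", "model"],
    ["clustering", "model"],
    ["regression", "survey"],
]
counts = cooccurrence_counts(docs)
print(counts.most_common(1))
```

Pairs with high counts are the ones you would expect to land in the same cluster, and clicking a term in the Termscape lets you inspect those relationships along with the source documents.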