You can also see Java, Python, Cython, Swift, C, C++, or C# repository.
To check if you have a compatible version of Node.js installed, use the following command:
node -v
You can find the latest version of Node.js here.
Install the latest version of Git.
npm install nlptoolkit-corpus
In order to work on code, create a fork from GitHub page. Use Git for cloning the code to your local or below line for Ubuntu:
git clone <your-fork-git-link>
A directory called util will be created. Or you can use below link for exploring the code:
git clone https://github.com/starlangsoftware/corpus-js.git
Steps for opening the cloned project:
- Start IDE
- Select File | Open from main menu
- Choose
Corpus-Jsfile - Select open as project option
- Couple of seconds, dependencies will be downloaded.
To store a corpus in memory
a = Corpus("derlem.txt");
If this corpus is split with dots but not in sentences
constructor(fileName: string = undefined, splitterOrChecker: SentenceSplitter = undefined)
To eliminate the non-Turkish sentences from the corpus
constructor(fileName: string = undefined, splitterOrChecker: LanguageChecker = undefined)
The number of sentences in the corpus
sentenceCount(): number
To get ith sentence in the corpus
getSentence(index: number): Sentence
TurkishSplitter class is used to split the text into sentences in accordance with the . rules of Turkish.
split(line: string): Array<Sentence>
- main and types are important when this package will be imported.
"main": "dist/index.js",
"types": "dist/index.d.ts",
- Dependencies should be maximum (not only direct but also indirect references should also be given), everything directly in the code should be given here.
"dependencies": {
"nlptoolkit-corpus": "^1.0.12",
"nlptoolkit-dictionary": "^1.0.14",
"nlptoolkit-morphologicalanalysis": "^1.0.19",
"nlptoolkit-xmlparser": "^1.0.7"
}
- Compiler flags currently includes nodeNext for importing.
"compilerOptions": {
"outDir": "dist",
"module": "nodeNext",
"sourceMap": true,
"noImplicitAny": true,
"removeComments": false,
"declaration": true,
},
- tests, node_modules and dist should be excluded.
"exclude": [
"tests",
"node_modules",
"dist"
]
- Should include all ts classes.
export * from "./CategoryType"
export * from "./InterlingualDependencyType"
export * from "./InterlingualRelation"
export * from "./Literal"
- Add data files to the project folder. Subprojects should include all data files of the parent projects.
- Classes should be defined as exported.
export class JCN extends ICSimilarity{
- Do not forget to comment each function.
/**
* Computes JCN wordnet similarity metric between two synsets.
* @param synSet1 First synset
* @param synSet2 Second synset
* @return JCN wordnet similarity metric between two synsets
*/
computeSimilarity(synSet1: SynSet, synSet2: SynSet): number {
- Function names should follow caml case.
setSynSetId(synSetId: string){
- Write getter and setter methods.
getRelation(index: number): Relation{
setName(name: string){
- Use standard javascript test style.
describe('SimilarityPathTest', function() {
describe('SimilarityPathTest', function() {
it('testComputeSimilarity', function() {
let turkish = new WordNet();
let similarityPath = new SimilarityPath(turkish);
assert.strictEqual(32.0, similarityPath.computeSimilarity(turkish.getSynSetWithId("TUR10-0656390"), turkish.getSynSetWithId("TUR10-0600460")));
assert.strictEqual(13.0, similarityPath.computeSimilarity(turkish.getSynSetWithId("TUR10-0412120"), turkish.getSynSetWithId("TUR10-0755370")));
assert.strictEqual(13.0, similarityPath.computeSimilarity(turkish.getSynSetWithId("TUR10-0195110"), turkish.getSynSetWithId("TUR10-0822980")));
});
});
});
- Enumerated types should be declared with enum.
export enum CategoryType {
MATHEMATICS, SPORT, MUSIC, SLANG, BOTANIC,
PLURAL, MARINE, HISTORY, THEOLOGY, ZOOLOGY,
METAPHOR, PSYCHOLOGY, ASTRONOMY, GEOGRAPHY, GRAMMAR,
MILITARY, PHYSICS, PHILOSOPHY, MEDICAL, THEATER,
ECONOMY, LAW, ANATOMY, GEOMETRY, BUSINESS,
PEDAGOGY, TECHNOLOGY, LOGIC, LITERATURE, CINEMA,
TELEVISION, ARCHITECTURE, TECHNICAL, SOCIOLOGY, BIOLOGY,
CHEMISTRY, GEOLOGY, INFORMATICS, PHYSIOLOGY, METEOROLOGY,
MINERALOGY
}
- If there are multiple constructors for a class, define them as constructor1, constructor2, ..., then from the original constructor call these methods.
constructor1(symbol: any){
constructor2(symbol: any, multipleFile: MultipleFile) {
constructor(symbol: any, multipleFile: MultipleFile = undefined) {
if (multipleFile == undefined){
this.constructor1(symbol);
} else {
this.constructor2(symbol, multipleFile);
}
}
- Importing should be done via import method with referencing the node-modules.
import {Corpus} from "nlptoolkit-corpus/dist/Corpus";
import {Sentence} from "nlptoolkit-corpus/dist/Sentence";
- Use xmlparser package for parsing xml files.
var doc = new XmlDocument("test.xml")
doc.parse()
let root = doc.getFirstChild()
let firstChild = root.getFirstChild()
