Introduced in 1997, FrameNet (Lowe, 1997; Baker et al., 1998; Fillmore and Atkins, 1998; Johnson et al., 2001) has been developed by the International Computer Science Institute in Berkeley, California. It is a growing computational lexicography project that offers in-depth semantic information on English words and predicates. Based on Fillmore's theory of Frame Semantics (Fillmore and others, 1976; Fillmore, 2006), FrameNet offers semantic information on predicate-argument structure in a way that is loosely similar to WordNet (Kilgarriff and Fellbaum, 2000).
In FrameNet, predicates and related lemmas are categorized under frames. The notion of frame is thoroughly described in Frame Semantics as a schematic representation of an event, state or relationship. These semantic information packets, called frames, are composed of individual lemmas (also known as Lexical Units) and frame elements (such as the agent, theme, instrument, duration, manner, direction, etc.). Frame elements can be described as semantic roles that are related to the frame. Lexical Units, or lemmas, are linked to a frame through a single sense. For instance, the lemma "roast" can mean to criticise harshly or to cook by exposing to dry heat. With the latter meaning, "roast" belongs to the Apply_heat frame.
With this release, we aimed to produce a version of Turkish FrameNet that captures at least a considerable majority of the most frequent predicates, thus offering a valuable and practical resource from day one. Because Turkish is a low-resource language, it was important to ensure that FrameNet had enough coverage to be incorporated into NLP solutions as soon as it was released to the public.
We took a closer look at Turkish WordNet and designated 8 domains that would likely contain the most frequent predicates in Turkish: Activity, Cause, Change, Motion, Cognition, Perception, Judgement and Commerce. For the first phase, the focus was on the thorough annotation of these domains. Frames from English FrameNet were adopted when possible and new frames were created when needed. In the next phase, a team of annotators will tackle the Turkish predicate compilation offered by TRopBank and KeNet for a lemma-by-lemma annotation process. This way, both the penetration and the coverage of Turkish FrameNet will be increased.
You can also see the Java, Python, Cython, Swift, C, JavaScript, PHP, or C# repositories.
To check whether you have a compatible C++ compiler installed:
- Open CLion IDE
- Preferences > Build, Execution, Deployment > Toolchain
Install the latest version of Git.
In order to work on the code, create a fork from the GitHub page. Use Git to clone the code to your local machine, or run the following command on Ubuntu:
git clone <your-fork-git-link>
A directory called TurkishFrameNet-CPP will be created. Alternatively, you can use the link below to explore the code:
git clone https://github.com/starlangsoftware/TurkishFrameNet-CPP.git
To import projects from Git with version control:

- Open CLion IDE and select Get From Version Control.

- In the Import window, click the URL tab and paste the GitHub URL.

- Click Open as Project.

Result: The imported project is listed in the Project Explorer view and its files are loaded.
From IDE
After downloading and opening the project, select the Build Project option from the Build menu. After the compilation process finishes, you can run TurkishFrameNet-CPP.
To read FrameNet and keep all frames in memory:

FrameNet a = FrameNet();

To iterate over the frames one by one:

for (int i = 0; i < a.size(); i++){
    Frame frame = a.getFrame(i);
}

To find the frames that a verb (given by its synset id) belongs to:

vector<Frame> frames = a.getFrames("TUR10-1234560");

To get the lexical units of a frame:

string getLexicalUnit(int index)

To get the frame elements of a frame:

string getFrameElement(int index)
@inproceedings{marsan20,
title = {{B}uilding the {T}urkish {F}rame{N}et},
year = {2021},
author = {B. Marsan and N. Kara and M. Ozcelik and B. N. Arican and N. Cesur and A. Kuzgun and E. Saniyar and O. Kuyrukcu and O. T. Y{\i}ld{\i}z},
booktitle = {Proceedings of GWC 2021}
}
- First, install Conan:
pip install conan
Instructions are given on the following page:
https://docs.conan.io/2/installation.html
- Add the Conan remote 'ozyegin' (IP: 104.247.163.162) with the following command:
conan remote add ozyegin http://104.247.163.162:8081/artifactory/api/conan/conan-local --insert
- Use the command conan list to check for installed packages. There will probably be no installed packages yet.
conan list
- Put the correct dependencies in the requires part of conanfile.py:
requires = ["math/1.0.0", "classification/1.0.0"]
- Default settings are:
settings = "os", "compiler", "build_type", "arch"
options = {"shared": [True, False], "fPIC": [True, False]}
default_options = {"shared": True, "fPIC": True}
exports_sources = "src/*", "Test/*"
def layout(self):
    cmake_layout(self, src_folder="src")

def generate(self):
    tc = CMakeToolchain(self)
    tc.generate()
    deps = CMakeDeps(self)
    deps.generate()

def build(self):
    cmake = CMake(self)
    cmake.configure()
    cmake.build()

def package(self):
    copy(conanfile=self, keep_path=False, src=self.source_folder, dst=join(self.package_folder, "include"), pattern="*.h")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "lib"), pattern="*.a")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "lib"), pattern="*.so")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "lib"), pattern="*.dylib")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "bin"), pattern="*.dll")

def package_info(self):
    self.cpp_info.libs = ["Math"]
- Set the C++ standard with compiler flags.
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_FLAGS "-O3")
- Dependent packages should be given with find_package.
find_package(util_c REQUIRED)
find_package(data_structure_c REQUIRED)
- For the library part, use the add_library and target_link_libraries commands. Link the m library for the math functions on Linux.
add_library(Math src/Distribution.cpp src/Distribution.h src/DiscreteDistribution.cpp src/DiscreteDistribution.h src/Vector.cpp src/Vector.h src/Eigenvector.cpp src/Eigenvector.h src/Matrix.cpp src/Matrix.h src/Tensor.cpp src/Tensor.h)
target_link_libraries(Math util_c::util_c data_structure_c::data_structure_c m)
- For executable tests, use the add_executable and target_link_libraries commands. Link the m library for the math functions on Linux.
add_executable(DiscreteDistributionTest src/Distribution.cpp src/Distribution.h src/DiscreteDistribution.cpp src/DiscreteDistribution.h src/Vector.cpp src/Vector.h src/Eigenvector.cpp src/Eigenvector.h src/Matrix.cpp src/Matrix.h src/Tensor.cpp src/Tensor.h Test/DiscreteDistributionTest.cpp)
target_link_libraries(DiscreteDistributionTest util_c::util_c data_structure_c::data_structure_c m)
- Add data files to the cmake-build-debug folder.
- If needed, the comparison operators == and < should be implemented for use with map and set data structures.
bool operator==(const Word &anotherWord) const{
    return (name == anotherWord.name);
}

bool operator<(const Word &anotherWord) const{
    return (name < anotherWord.name);
}
- Do not forget to comment each function.
/**
 * A constructor of the Word class which gets a string name as input and assigns it to the name attribute.
 *
 * @param _name String input.
 */
Word::Word(const string &_name) {
    name = _name;
}
- Function names should follow camel case.
int Word::charCount() const
- Write getter and setter methods.
string Word::getName() const
void Word::setName(const string &_name)
- Use catch.hpp for testing purposes. Add
#define CATCH_CONFIG_MAIN // This tells Catch to provide a main() - only do this in one cpp file
line in only one of the test files. Add
#include "catch.hpp"
line in all test files. An example test file is given below:
TEST_CASE("DictionaryTest") {
    TxtDictionary lowerCaseDictionary = TxtDictionary("lowercase.txt", "turkish_misspellings.txt");
    TxtDictionary mixedCaseDictionary = TxtDictionary("mixedcase.txt", "turkish_misspellings.txt");
    TxtDictionary dictionary = TxtDictionary();
    SECTION("testSize"){
        REQUIRE(29 == lowerCaseDictionary.size());
        REQUIRE(58 == mixedCaseDictionary.size());
        REQUIRE(62113 == dictionary.size());
    }
    SECTION("testGetWord"){
        for (int i = 0; i < dictionary.size(); i++){
            REQUIRE_FALSE(nullptr == dictionary.getWord(i));
        }
    }
    SECTION("testLongestWordSize"){
        REQUIRE(1 == lowerCaseDictionary.longestWordSize());
        REQUIRE(1 == mixedCaseDictionary.longestWordSize());
        REQUIRE(21 == dictionary.longestWordSize());
    }
}
- Enumerated types should be declared with enum class.
enum class Pos {
    ADJECTIVE,
    NOUN,
    VERB,
    ADVERB
};
- Every header file should start with
#ifndef MATH_DISTRIBUTION_H
#define MATH_DISTRIBUTION_H
and end with
#endif //MATH_DISTRIBUTION_H
- Do not forget to use const references for parameters if they will not be changed in the function.
void Word::setName(const string &_name);
- Do not forget to declare methods const if they do not modify any class attribute. Also use [[nodiscard]]:
[[nodiscard]] bool isPunctuation() const;
- Use the xmlparser package for parsing XML files.
auto* doc = new XmlDocument("test.xml");
doc->parse();
XmlElement* root = doc->getFirstChild();
XmlElement* firstChild = root->getFirstChild();
- Data structures: use unordered_map for a hash map, map for an ordered (tree-based) map, vector for an array list, and unordered_set for a hash set.
