Introduced in 1997, FrameNet (Lowe, 1997; Baker et al., 1998; Fillmore and Atkins, 1998; Johnson et al., 2001) has been developed by the International Computer Science Institute in Berkeley, California. It is a growing computational lexicography project that offers in-depth semantic information on English words and predicates. Based on Fillmore's theory of Frame Semantics (Fillmore and others, 1976; Fillmore, 2006), FrameNet offers semantic information on predicate-argument structure in a way that is loosely similar to WordNet (Kilgarriff and Fellbaum, 2000).
In FrameNet, predicates and related lemmas are categorized under frames. The notion of frame is thoroughly described in Frame Semantics as a schematic representation of an event, state or relationship. These semantic information packets, called frames, are composed of individual lemmas (also known as Lexical Units) and frame elements (such as the agent, theme, instrument, duration, manner, direction, etc.). Frame elements can be described as semantic roles that are related to the frame. Lexical Units, or lemmas, are linked to a frame through a single sense. For instance, the lemma "roast" can mean to criticise harshly or to cook by exposing to dry heat. With the latter meaning, "roast" belongs to the Apply_heat frame.
With this release, we aimed to produce a version of Turkish FrameNet that captures at least a considerable majority of the most frequent predicates, thus offering a valuable and practical resource from day one. Because Turkish is a low-resource language, it was important to ensure that FrameNet had enough coverage to be incorporated into NLP solutions as soon as it was released to the public.
We took a closer look at Turkish WordNet and designated 8 domains that would likely contain the most frequent predicates in Turkish: Activity, Cause, Change, Motion, Cognition, Perception, Judgement and Commerce. For the first phase, the focus was on the thorough annotation of these domains. Frames from English FrameNet were adopted when possible and new frames were created when needed. In the next phase, a team of annotators will tackle the Turkish predicate compilation offered by TRopBank and KeNet for a lemma-by-lemma annotation process. This way, both the penetration and the coverage of Turkish FrameNet will be increased.
You can also see the Java, Python, Cython, Swift, C, JavaScript, PHP, or C# repositories.
To check whether you have a compatible C++ compiler installed:
- Open CLion IDE
- Preferences > Build, Execution, Deployment > Toolchain
Install the latest version of Git.
In order to work on the code, create a fork from the GitHub page. Use Git to clone the code to your local machine, or run the following command on Ubuntu:
git clone <your-fork-git-link>
A directory called TurkishFrameNet-CPP will be created. Alternatively, you can use the link below to explore the code:
git clone https://github.com/starlangsoftware/TurkishFrameNet-CPP.git
To import projects from Git with version control:

- Open CLion IDE and select Get From Version Control.

- In the Import window, click the URL tab and paste the GitHub URL.

- Click Open as Project.

Result: The imported project is listed in the Project Explorer view and its files are loaded.
From IDE
After downloading and opening the project, select the Build Project option from the Build menu. After the compilation process finishes, you can run TurkishFrameNet-CPP.
To read FrameNet and keep all frames in memory:

FrameNet a = FrameNet();

To iterate over the frames one by one:

for (int i = 0; i < a.size(); i++){
    Frame frame = a.getFrame(i);
}

To find the frames that a verb (given by its synset id) belongs to:

vector<Frame> frames = a.getFrames("TUR10-1234560");

To get the lexical units of a frame:

string getLexicalUnit(int index)

To get the frame elements of a frame:

string getFrameElement(int index)
@inproceedings{marsan20,
title = {{B}uilding the {T}urkish {F}rame{N}et},
year = {2021},
author = {B. Marsan and N. Kara and M. Ozcelik and B. N. Arican and N. Cesur and A. Kuzgun and E. Saniyar and O. Kuyrukcu and O. T. Y{\i}ld{\i}z},
booktitle = {Proceedings of GWC 2021}
}
- First, install Conan:
pip install conan
Instructions are given on the following page:
https://docs.conan.io/2/installation.html
- Add the Conan remote 'ozyegin' (IP: 104.247.163.162) with the following command:
conan remote add ozyegin http://104.247.163.162:8081/artifactory/api/conan/conan-local --insert
- Use the command conan list to check for installed packages. There will probably be no installed packages yet.
conan list
- Put the correct dependencies in the requires part of conanfile.py:
requires = ["math/1.0.0", "classification/1.0.0"]
- Default settings are:
settings = "os", "compiler", "build_type", "arch"
options = {"shared": [True, False], "fPIC": [True, False]}
default_options = {"shared": True, "fPIC": True}
exports_sources = "src/*", "Test/*"
def layout(self):
    cmake_layout(self, src_folder="src")

def generate(self):
    tc = CMakeToolchain(self)
    tc.generate()
    deps = CMakeDeps(self)
    deps.generate()

def build(self):
    cmake = CMake(self)
    cmake.configure()
    cmake.build()

def package(self):
    copy(conanfile=self, keep_path=False, src=self.source_folder, dst=join(self.package_folder, "include"), pattern="*.h")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "lib"), pattern="*.a")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "lib"), pattern="*.so")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "lib"), pattern="*.dylib")
    copy(conanfile=self, keep_path=False, src=self.build_folder, dst=join(self.package_folder, "bin"), pattern="*.dll")

def package_info(self):
    self.cpp_info.libs = ["Math"]
- Set the C++ standard with compiler flags.
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_FLAGS "-O3")
- Dependent packages should be given with find_package.
find_package(util_c REQUIRED)
find_package(data_structure_c REQUIRED)
- For the library part, use the add_library and target_link_libraries commands. Link the m library for the math functions on Linux.
add_library(Math src/Distribution.cpp src/Distribution.h src/DiscreteDistribution.cpp src/DiscreteDistribution.h src/Vector.cpp src/Vector.h src/Eigenvector.cpp src/Eigenvector.h src/Matrix.cpp src/Matrix.h src/Tensor.cpp src/Tensor.h)
target_link_libraries(Math util_c::util_c data_structure_c::data_structure_c m)
- For executable tests, use the add_executable and target_link_libraries commands. Link the m library for the math functions on Linux.
add_executable(DiscreteDistributionTest src/Distribution.cpp src/Distribution.h src/DiscreteDistribution.cpp src/DiscreteDistribution.h src/Vector.cpp src/Vector.h src/Eigenvector.cpp src/Eigenvector.h src/Matrix.cpp src/Matrix.h src/Tensor.cpp src/Tensor.h Test/DiscreteDistributionTest.cpp)
target_link_libraries(DiscreteDistributionTest util_c::util_c data_structure_c::data_structure_c m)
- Add data files to the cmake-build-debug folder.
- If needed, the comparison operators == and < should be implemented for use with map and set data structures.
bool operator==(const Word &anotherWord) const{
    return (name == anotherWord.name);
}

bool operator<(const Word &anotherWord) const{
    return (name < anotherWord.name);
}
- Do not forget to comment each function.
/**
 * A constructor of the Word class which gets a string name as input and assigns it to the name attribute.
 *
 * @param _name String input.
 */
Word::Word(const string &_name) {
    name = _name;
}
- Function names should follow camel case.
int Word::charCount() const
- Write getter and setter methods.
string Word::getName() const
void Word::setName(const string &_name)
- Use catch.hpp for testing purposes. Add
#define CATCH_CONFIG_MAIN // This tells Catch to provide a main() - only do this in one cpp file
line in only one of the test files. Add
#include "catch.hpp"
line in all test files. An example test file is given below:
TEST_CASE("DictionaryTest") {
    TxtDictionary lowerCaseDictionary = TxtDictionary("lowercase.txt", "turkish_misspellings.txt");
    TxtDictionary mixedCaseDictionary = TxtDictionary("mixedcase.txt", "turkish_misspellings.txt");
    TxtDictionary dictionary = TxtDictionary();
    SECTION("testSize"){
        REQUIRE(29 == lowerCaseDictionary.size());
        REQUIRE(58 == mixedCaseDictionary.size());
        REQUIRE(62113 == dictionary.size());
    }
    SECTION("testGetWord"){
        for (int i = 0; i < dictionary.size(); i++){
            REQUIRE_FALSE(nullptr == dictionary.getWord(i));
        }
    }
    SECTION("testLongestWordSize"){
        REQUIRE(1 == lowerCaseDictionary.longestWordSize());
        REQUIRE(1 == mixedCaseDictionary.longestWordSize());
        REQUIRE(21 == dictionary.longestWordSize());
    }
}
- Enumerated types should be declared with enum class.
enum class Pos {
    ADJECTIVE,
    NOUN,
    VERB,
    ADVERB
};
- Every header file should start with
#ifndef MATH_DISTRIBUTION_H
#define MATH_DISTRIBUTION_H
and end with
#endif //MATH_DISTRIBUTION_H
- Do not forget to use const references for parameters if they will not be changed in the function.
void Word::setName(const string &_name);
- Do not forget to declare methods const if they do not modify any class attribute. Also use [[nodiscard]]:
[[nodiscard]] bool isPunctuation() const;
- Use the xmlparser package for parsing XML files.
auto* doc = new XmlDocument("test.xml");
doc->parse();
XmlElement* root = doc->getFirstChild();
XmlElement* firstChild = root->getFirstChild();
- Data structures: use unordered_map for a hash map, map for an ordered (tree-based) map, vector for an array list, and unordered_set for a hash set.
