Intern, Linguistic Data Engineering Team - Summer 2018
Sorry, this job was removed at 11:56 a.m. (EST) on Saturday, May 12, 2018
About the Job
Basis Technology is seeking a Linguistic Data Engineering Intern to join a growing data team that supports several text analytics products. The intern will work with multiple engineering teams, providing quality data to evaluate and further the development of natural language processing tools, and will consult on the language-specific aspects of multilingual text.
Responsibilities:
- Assist with managing large-scale text mining, data acquisition and annotation projects
- Derive meaningful metrics from data annotation tasks
- Describe and demonstrate linguistic phenomena in a variety of languages
- Survey and catalogue new data releases and best practices in data maintenance, conversion and analytics
Qualifications:
- Strong scripting abilities, especially Python
- Ability to write and revise annotation guidelines
- Knowledge of linguistics, including:
  - tokenization
  - parts of speech
  - morphology
  - grammar structures
- Familiarity with linguistic community resources, especially treebanks, but also CredBank, ClueWeb, CommonCrawl and other AWS-hosted datasets
- Experience working with and modifying annotation tools such as WebAnno, brat and GATE
Nice to have:
- Experience working with crowdsourcing platforms such as Mechanical Turk and Crowdflower
- Experience with finite-state automata, especially Xerox FST
- Proficiency in at least one language in addition to English
- Experience with conversion, storage, version control and maintenance tasks for large multilingual text collections