This description is supported by a more extensive requirements document.
Overview
We are looking to build a text classification model that maps a dictionary (JSON) of trip types to locales or destinations, each of which is described in its own "article" in Wiki Voyage (WV). Three specific outcomes can be achieved progressively across a few sprints:
Extract and capture keywords from each WV article representing a locale
Classify each locale as "a good place to visit" based on a separate set of keywords or dictionary that will be provided
Classify each locale as an appropriate place to experience each trip type
A schema for the output JSON documents has been defined, and we will confirm it with the ML engineer.
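The three outcomes above can be illustrated with a minimal sketch, assuming a simple term-frequency approach. The stopword list, function names, sample article text, and the 0.5 threshold are all hypothetical and chosen only for illustration; the actual method will be agreed with the ML engineer.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "for", "with", "on", "are", "its"}


def extract_keywords(article_text, top_n=10):
    """Return the top_n most frequent non-stopword tokens in an article."""
    tokens = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def keyword_score(locale_keywords, reference_keywords):
    """Fraction of the reference keywords that appear among the locale's keywords."""
    reference = set(reference_keywords)
    if not reference:
        return 0.0
    return len(reference & set(locale_keywords)) / len(reference)


def classify(locale_keywords, reference_keywords, threshold=0.5):
    """Label a locale as a match when its overlap score meets the (assumed) threshold."""
    return keyword_score(locale_keywords, reference_keywords) >= threshold


# Invented sample text standing in for one cleaned WV article.
article = (
    "The beach town is known for surfing. Surfing schools line the beach, "
    "and the beach cafes serve seafood."
)
keywords = extract_keywords(article)

# Invented keyword list standing in for one trip type (or the "good place to visit" list).
surf_trip = ["surfing", "beach", "waves", "seafood"]
score = keyword_score(keywords, surf_trip)
is_match = classify(keywords, surf_trip)
```

The same `classify` function could serve both classifications, since only the reference keyword list changes between "a good place to visit" and an individual trip type.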
Content Sources
Locale content consists of approximately 30,000 Wiki Voyage articles, which will already have been "cleaned" by the time we start in order to minimize pre-processing. This will be confirmed, but the total corpus should not exceed 1 GB
Trip Type dictionary is a JSON document with a consistent structure to facilitate extraction of keywords and phrases
"Good Place to Visit" keyword list
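As a sketch of how keywords might be pulled from the Trip Type dictionary, the example below walks a nested JSON document and collects keywords per trip type. The field names (`name`, `keywords`, `children`), the sample data, and the choice to have nested trip types inherit their parent's keywords are all assumptions made for illustration; the real structure is not shown here.

```python
import json

# Hypothetical structure; the real Trip Type dictionary schema may differ.
TRIP_TYPES_JSON = """
{
  "name": "Adventure",
  "keywords": ["hiking", "rafting"],
  "children": [
    {"name": "Winter Sports", "keywords": ["skiing", "snowboarding"], "children": []}
  ]
}
"""


def collect_keywords(node, inherited=()):
    """Map each trip type name to its own keywords plus those of its ancestors."""
    keywords = list(inherited) + node.get("keywords", [])
    result = {node["name"]: keywords}
    for child in node.get("children", []):
        # Nested trip types inherit the parent's keywords (an assumed design choice).
        result.update(collect_keywords(child, keywords))
    return result


trip_keywords = collect_keywords(json.loads(TRIP_TYPES_JSON))
```

Whether nested trip types should inherit keywords from their parents is an open question that the initial validation work could help answer.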
Implementation
We anticipate organizing the work into a few short sprints that align the outputs with a logical NLP/text classification process, but will perform final planning to confirm this with the ML engineer. The model will be implemented in AWS, and we would like it to be automatically retrained as new locale content is collected and as the trip type dictionary expands or is updated. We will explain our architecture to enable this.
The developer can develop in our AWS environment or using their own tools or sandbox, but the model must run on AWS and be managed in the future based on our DevOps standards.
We will provide the content structure of the WV articles and other data in S3, and will work with the ML engineer to confirm the implementation and configuration within our AWS environment.
Skills and Collaboration
Knowledge and experience performing text classification modeling and putting models into production (working with an architect and cloud engineer)
Python to perform data source preprocessing
Proactive communication and affinity for collaboration
Secondary Objectives
Although not part of the ML engineer's scope, we will be happy to have insight that will help us with our secondary objectives, including:
Data Set Adjustments Insight
Initial validation of the key word definition to determine approaches to enrich the Trip Type dictionary for improved mapping in future iterations, including understanding the strengths and weaknesses of the existing Wiki Voyage data, article quality and variability, key word format or structure, and key words existing in nested Trip Types
Gain perspective on Locale hierarchy levels at which to pursue mapping (region, country, subdivision, city, district)
Consider implications for "related" trip types
Near Term Model Tuning Priorities and Opportunities
Gain perspective on how Locales can be prioritized for a Trip Type
Evaluate "good place to visit" key words to determine if categories should be established
Understand how to process "inconclusive" scores, for example…
Advanced Model Expansion
Evaluate each Locale based on a combination of the two classifications
Understand the relative differences, advantages, and future opportunities between Bag of Words and Word Embeddings
Clarity as to the nuances of mapping to improve the model and next steps, including applying supervised learning
Opportunity and considerations for adopting neural networks as a more advanced approach
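To make the Bag of Words versus Word Embeddings question concrete, here is a minimal bag-of-words sketch using cosine similarity over raw word counts. The function names and sample phrases are invented for illustration. It shows only the bag-of-words side: texts that use different words for the same idea share no dimensions and score zero, which is the gap that word embeddings are generally intended to address.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Represent a text as a multiset of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample phrases.
locale = bag_of_words("quiet beach with surfing and snorkeling")
same_words = bag_of_words("surfing beach")
synonyms = bag_of_words("seaside wave riding")

overlap_score = cosine_similarity(locale, same_words)  # shared words give a positive score
synonym_score = cosine_similarity(locale, synonyms)    # no shared words, so the score is zero
```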
Architecture Standards
Confirm data structure and "end points" to support Traveler interaction (associated with the item above regarding form of output)
Begin to formalize tools and architecture that fit into the broader platform architecture