Python Data Engineer

Job Category: Engineering
Job Type: Full Time
Job Location: Princeton


At Semandex, we are dedicated to creating innovative products and solutions to help our customers derive actionable information from documents, images, video, and sensor observations faster than ever before.

We are looking for a problem solver who is eager to learn while working with a passionate team of like-minded professionals. We are proud to provide an open collaborative environment that encourages professional growth and task leadership.

The ideal candidate should have extensive, hands-on experience designing, implementing, testing, and scaling Python software, as well as broad knowledge of AI/ML algorithms and applications.

Responsibilities and Duties

  • Understanding the functional requirements of the software and exploring available options for implementing them
  • Presenting and discussing proposed approaches with the team and arriving at the best viable option
  • Designing and implementing Python microservices
  • Testing and deploying the services as part of the overall software
  • Documenting the research and relevant data
  • Collaborating with the team and contributing to a learning environment


Must Have

  • 2-5 years of experience with Python software development
  • Bachelor’s degree in Computer Science, Data Science or a related field
  • Strong problem-solving skills with an emphasis on product development
  • Strong experience in processing data and drawing insights from large data sets
  • Good familiarity with one or more data libraries such as pandas, NumPy, or SciPy
  • In-depth knowledge of spaCy and similar NLP libraries such as NLTK and textacy
  • Experience with Python development and visualization tools, including but not limited to Jupyter and Google Colab notebooks, Matplotlib, Plotly, and geoplotlib
  • Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and their proper usage, etc.) and experience applying statistical methods
  • Experience creating and using advanced machine learning and statistical methods: regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc.
  • Strong experience with a variety of machine learning techniques (clustering, decision tree learning, neural networks, deep learning, etc.) and their advantages and drawbacks
  • Excellent written and verbal communication skills for documenting, reporting and presenting to customers
  • A drive to learn and master new technologies and techniques
  • US Citizenship

Nice if you have

  • Familiarity with one or more of TensorFlow, PyTorch, scikit-learn, Keras, Gensim, and the rest of Python’s AI ecosystem
  • Good knowledge of NLP concepts such as TF-IDF, bag of words (BOW), word vectors, named entity recognition, and part-of-speech tagging
  • Some familiarity with the latest research in embedding models, transfer learning, and supervised and unsupervised learning
  • Experience working with and creating data architectures
  • Experience with distributed data and computing tools: MapReduce, Hadoop, Hive, Spark, MongoDB
  • Familiarity with modern microservice architectures, cloud environments, and related tooling, e.g. Docker, AWS, GCP, Jenkins, TeamCity
  • Master’s degree in Computer Science, Data Science or a related field
  • High degree of analytical and problem-solving skills

Benefits

  • Possibility of remote work
  • Medical, dental and vision coverage, FSA
  • Company matched 401K
  • Gym membership program
  • Tuition assistance

Semandex is an equal opportunity employer.

