[VCK] Senior Data Engineer (AI Ingestion Platform)

100% Remote Full-time Open now

Company Description

We are Software Mind, an awesome team of engineers who are ready to ramp up any top-notch company’s projects! Our aim? To always be one step ahead. Become part of a multicultural company in constant growth with an excellent work environment certified by Great Place To Work!

Job Description

About the Project

Software Mind is building a private, tenant-isolated AI assistant for the real estate title and settlement industry. The platform is a retrieval-first (RAG) system that ingests historical email, documents, and structured metadata into a per-tenant vector index, and serves grounded, cited, expert-weighted answers through a chat-style Q&A interface with single sign-on and full audit logging. The platform is AWS-native, with a Python/FastAPI backend, a Vue.js frontend, an OpenSearch/Pinecone vector store, and OpenAI/Anthropic/Bedrock as LLM providers.

You will join a senior, cross-functional, LATAM-based team where hands-on AI delivery experience, not just familiarity, is the baseline expectation. You own the ingestion and processing backbone of the platform: the pipelines that transform raw email and document corpora into clean, PII-minimised, chunked, and indexed data in the per-tenant vector store. This is the foundational layer the AI extraction gateway depends on; quality here directly determines system accuracy.

Your Responsibilities

- Build and own the historical email ingestion pipeline via Microsoft Graph API
- Implement the SharePoint / OneDrive document ingestion pipeline with scoped folder access
- Design and implement the PII minimisation pre-processing layer
- Build the vector store indexing workflow (OpenSearch/Pinecone) with per-tenant data isolation
- Define and implement the data processing schema; produce and maintain schema documentation
- Build the OCR routing orchestrator and integrate an OCR service for scanned documents
- Implement the raw text / content extraction layer for all supported document types
- Define and prototype the push vs. pull ingestion strategy, from one-time PoC through to incremental nightly pipeline
- Ensure data lineage and audit traceability are built into pipeline outputs from the outset

Tech Stack

Python, Microsoft Graph API, AWS (S3, DynamoDB, Lambda), OpenSearch, Pinecone, OCR tooling, PII libraries, NER libraries, Docker, Jira, Confluence

Qualifications

Must-Have Skills & Experience

- +90% English, written and oral (at least B2 level), with excellent communication skills
- 6+ years in data engineering; strong pipeline and ETL/ELT experience required
- Proficiency in Python for data pipeline development
- Experience with Microsoft Graph API or similar enterprise email/document APIs (M365, Exchange Online)
- AWS data services: S3, DynamoDB, Glue, and/or Lambda-based event-driven processing
- Familiarity with PII detection and data minimisation techniques (regex-based, NER-based, or purpose-built libraries)
- Experience with vector store indexing or semantic search pipeline construction

Additional Information

Nice-to-Have

- Prior experience building ingestion pipelines specifically for AI/ML, NLP, or LLM-based platforms
- OCR tooling experience: AWS Textract, Tesseract, or commercial OCR services
- Understanding of per-tenant data isolation patterns, tenant-scoped encryption, and row-level security
- Familiarity with LangChain document loaders, embedding pipelines, or vector index management

We are accepting applications from LATAM countries.
