[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-senior-gpu-systems-ai-infrastructure-engineer-nyc":3,"similar-senior-gpu-systems-ai-infrastructure-engineer-nyc":40},{"id":4,"slug":5,"title":6,"skills":7,"budget":24,"duration":25,"location":26,"onsitePercent":27,"contractType":28,"foundAt":29,"category":30,"description":34,"rawText":35,"webTitle":36,"webText":37,"language":38,"projectId":25,"sourceUrl":39},7744,"senior-gpu-systems-ai-infrastructure-engineer-nyc","Senior GPU Systems \u002F AI Infrastructure Engineer (NYC)",[8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23],"CUDA programming","GPU kernel optimization","parallel computing","distributed systems","C++","Rust","Python","PyTorch","JAX","NCCL","MPI","Ray","performance profiling","Nsight","Triton","HIP","Competitive + equity",null,"New York City",75,"permanent","2026-06-03T04:48:20+00:00",{"id":31,"slug":32,"label":33},3,"ai_ml","AI & Machine Learning","Senior-level engineer role to build and optimize next-generation AI infrastructure for large-scale model training and inference. Focus on GPU systems, kernel optimization, distributed compute, and high-performance AI workloads. Work directly on the performance layer of modern AI stacks where milliseconds matter.","\u003Ch2>Senior GPU Systems \u002F AI Infrastructure Engineer (NYC)\u003C\u002Fh2>\n\u003Cp>\u003Cstrong>Location:\u003C\u002Fstrong> New York City (Hybrid \u002F On-site preferred)\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Comp:\u003C\u002Fstrong> Competitive + equity (Series A-C \u002F high-growth AI infra)\u003C\u002Fp>\n\u003Ch3>\u003Cstrong>About the Role\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cp>We&#8217;re hiring a senior-level engineer to build and optimise next-generation AI infrastructure powering large-scale model training and inference. 
This role sits at the intersection of \u003Cstrong>GPU systems, kernel optimisation, distributed compute, and high-performance AI workloads\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cp>You&#8217;ll work directly on the performance layer of modern AI stacks, where milliseconds matter, GPUs are saturated, and inefficiencies translate directly into cost and latency at scale.\u003C\u002Fp>\n\u003Cp>This is a deeply technical role for engineers who are comfortable working close to the metal and care about squeezing every ounce of performance out of modern accelerators (NVIDIA, AMD, and emerging architectures).\u003C\u002Fp>\n\u003Chr \u002F>\n\u003Ch3>\u003Cstrong>What You&#8217;ll Work On\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cul>\n\u003Cli>Design and optimise \u003Cstrong>GPU kernels (CUDA \u002F Triton \u002F HIP)\u003C\u002Fstrong> for large-scale AI workloads\u003C\u002Fli>\n\u003Cli>Build and tune \u003Cstrong>high-performance inference and training pipelines\u003C\u002Fstrong> for LLMs and multimodal models\u003C\u002Fli>\n\u003Cli>Work on \u003Cstrong>distributed systems for AI training (multi-node, multi-GPU clusters)\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Improve \u003Cstrong>memory bandwidth utilisation, kernel fusion, and compute efficiency\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Contribute to or extend frameworks like \u003Cstrong>PyTorch, JAX, or custom runtimes\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Build tooling for \u003Cstrong>profiling, benchmarking, and performance regression detection\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Collaborate closely with ML researchers and infra engineers to remove system bottlenecks\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Chr \u002F>\n\u003Ch3>\u003Cstrong>What We&#8217;re Looking For (Core Profile \u002F MPC Fit)\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cp>You&#8217;re likely a strong match if you have:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>5-10+ years in \u003Cstrong>systems engineering, HPC, GPU computing, 
or AI infrastructure\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Deep experience with \u003Cstrong>CUDA programming and GPU kernel optimisation\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Strong understanding of \u003Cstrong>parallel computing, memory hierarchies, and compute bottlenecks\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Experience with \u003Cstrong>distributed systems (Ray, MPI, NCCL, custom cluster orchestration, etc.)\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Background in \u003Cstrong>high-performance C++ \u002F Rust \u002F Python systems\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Experience working on \u003Cstrong>training or inference stacks for large-scale ML models\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Strong intuition for \u003Cstrong>performance profiling (Nsight, perf, flamegraphs, etc.)\u003C\u002Fstrong>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Chr \u002F>\n\u003Ch3>\u003Cstrong>Nice to Have\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cul>\n\u003Cli>Experience with \u003Cstrong>Triton, TVM, or MLIR-based compiler stacks\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Exposure to \u003Cstrong>kernel fusion, graph compilation, or runtime optimisation\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Experience at \u003Cstrong>AI infra startups, hyperscalers, or HPC environments\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Familiarity with \u003Cstrong>quantisation, KV caching, or inference acceleration techniques\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Contributions to \u003Cstrong>open-source ML systems or GPU libraries\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Background in \u003Cstrong>CUDA graph execution, stream scheduling, or warp-level optimisation\u003C\u002Fstrong>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Chr \u002F>\n\u003Ch3>\u003Cstrong>Why This Role\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cul>\n\u003Cli>Work on the \u003Cstrong>critical performance layer of AI systems (not application-level 
ML)\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Direct impact on \u003Cstrong>cost, latency, and scalability of frontier AI models\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>High autonomy: own entire subsystems (kernel → runtime → distributed execution)\u003C\u002Fli>\n\u003Cli>NYC-based team building at the forefront of \u003Cstrong>AI infrastructure and compute optimisation\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>Opportunity to shape systems used at \u003Cstrong>massive scale in production ML workloads\u003C\u002Fstrong>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.\u003C\u002Fp>\nContact: Reece Waldon\nEmail: Reece.Waldon@darwinrecruitment.com\nPhone: +44 1277 287285","Senior GPU Systems \u002F AI Infrastructure Engineer","We are seeking a senior-level engineer to build and optimize next-generation AI infrastructure powering large-scale model training and inference. This role sits at the intersection of GPU systems, kernel optimization, distributed compute, and high-performance AI workloads. You will work directly on the performance layer of modern AI stacks where milliseconds matter, GPUs are saturated, and inefficiencies translate directly into cost and latency at scale. This is a deeply technical role for engineers who are comfortable working close to the metal and care about squeezing every ounce of performance out of modern accelerators, including NVIDIA, AMD, and emerging architectures. 
Key responsibilities include designing and optimizing GPU kernels using CUDA, Triton, and HIP for large-scale AI workloads, building and tuning high-performance inference and training pipelines for LLMs and multimodal models, and working on distributed systems for AI training across multi-node, multi-GPU clusters. You will improve memory bandwidth utilization, kernel fusion, and compute efficiency while contributing to or extending frameworks like PyTorch, JAX, or custom runtimes. The role involves building tooling for profiling, benchmarking, and performance regression detection, and collaborating closely with ML researchers and infrastructure engineers to remove system bottlenecks. We are looking for candidates with 5-10+ years in systems engineering, HPC, GPU computing, or AI infrastructure, deep experience with CUDA programming and GPU kernel optimization, and strong understanding of parallel computing, memory hierarchies, and compute bottlenecks. Experience with distributed systems, high-performance programming languages, and working on training or inference stacks for large-scale ML models is essential. 
This position offers the opportunity to work on critical performance layers of AI systems with direct impact on cost, latency, and scalability of frontier AI models.","en","https:\u002F\u002Fwww.darwinrecruitment.com\u002Fjob\u002F3871756305391-gpu-systems-ai-infra-engineer-new-york-new-york\u002F",{"items":41},[42,58,72,83,96,111,128,146,169,180,190,208,226,244,265],{"id":43,"slug":44,"title":45,"skills":46,"budget":25,"duration":25,"location":25,"onsitePercent":25,"contractType":55,"foundAt":56,"category":57},10464,"computer-vision-engineer-for-robotics-perception-stack","Computer Vision Engineer for Robotics Perception Stack",[47,48,49,50,15,51,52,53,54],"Computer vision","Sensor fusion","LiDAR","Cameras","TensorFlow","Object detection","Tracking","Scene understanding","contracting","2026-06-03T06:06:17+00:00",{"id":31,"slug":32,"label":33},{"id":59,"slug":60,"title":61,"skills":62,"budget":25,"duration":25,"location":25,"onsitePercent":25,"contractType":55,"foundAt":70,"category":71},10449,"infrastructure-engineer-for-distributed-model-training","Infrastructure Engineer for Distributed Model Training",[63,19,64,65,66,67,68,69],"PyTorch Distributed","CUDA","HPC networking","InfiniBand","RDMA","GPU computing","LLM training pipelines","2026-06-03T06:06:04+00:00",{"id":31,"slug":32,"label":33},{"id":73,"slug":74,"title":75,"skills":76,"budget":25,"duration":25,"location":25,"onsitePercent":25,"contractType":55,"foundAt":81,"category":82},10417,"ai-hardware-security-engineer-2","AI Hardware Security Engineer",[77,78,79,80],"Secure firmware","Hardware root of trust","Trusted execution environments","Low-level systems programming","2026-06-03T06:05:34+00:00",{"id":31,"slug":32,"label":33},{"id":84,"slug":85,"title":86,"skills":87,"budget":25,"duration":25,"location":25,"onsitePercent":25,"contractType":55,"foundAt":94,"category":95},10401,"ai-inference-platform-engineer-confidential-computing","AI Inference Platform Engineer - Confidential 
Computing",[88,89,90,13,91,12,92,93],"Kubernetes","GPU clusters","Confidential computing","Go","AI inference","ML infrastructure","2026-06-03T06:05:20+00:00",{"id":31,"slug":32,"label":33},{"id":97,"slug":98,"title":99,"skills":100,"budget":25,"duration":25,"location":25,"onsitePercent":25,"contractType":55,"foundAt":109,"category":110},10385,"confidential-ai-systems-engineer-with-tee-expertise","Confidential AI Systems Engineer with TEE expertise",[101,102,103,104,105,106,107,15,64,108],"TEEs","SGX","SEV","TrustZone","Secure boot","Hardware attestation","Confidential containers","AI workloads","2026-06-03T06:05:04+00:00",{"id":31,"slug":32,"label":33},{"id":112,"slug":113,"title":114,"skills":115,"budget":25,"duration":123,"location":124,"onsitePercent":125,"contractType":55,"foundAt":126,"category":127},10341,"ai-engineer-llm-and-rag-systems","AI Engineer - LLM and RAG Systems",[14,116,117,118,119,120,121,122],"LLMs","RAG","embeddings","prompt engineering","AWS","vector databases","microservices","3 months (extension expected, ~1 year total duration)","Utrecht",50,"2026-06-03T06:04:26+00:00",{"id":31,"slug":32,"label":33},{"id":129,"slug":130,"title":131,"skills":132,"budget":25,"duration":25,"location":25,"onsitePercent":25,"contractType":28,"foundAt":144,"category":145},9009,"senior-npu-kernel-operator-engineer","Senior NPU Kernel \u002F Operator Engineer",[133,14,134,135,136,137,138,139,140,141,142,143],"C\u002FC++","Tensor computation","Neural network operators","Memory hierarchy","Bandwidth and latency analysis","Cache\u002FSRAM behaviour","Parallelism and synchronization","Data locality and vectorization","Performance optimization","Accelerator programming","GPU\u002FNPU development","2026-06-03T05:31:14+00:00",{"id":31,"slug":32,"label":33},{"id":147,"slug":148,"title":149,"skills":150,"budget":25,"duration":25,"location":166,"onsitePercent":125,"contractType":55,"foundAt":167,"category":168},8140,"ai-and-telco-architect","AI and Telco 
Architect",[151,152,153,154,155,156,157,158,159,160,161,162,163,164,165],"OSS","Assurance","Fulfillment","Inventory","Fault management","Capacity planning","AI\u002FML technologies","Real-time telemetry","Streaming technologies","Kafka","gNMI","OpenTelemetry","Enterprise architecture","Integration","Stakeholder communication","Netherlands","2026-06-03T05:07:08+00:00",{"id":31,"slug":32,"label":33},{"id":170,"slug":171,"title":172,"skills":173,"budget":25,"duration":25,"location":25,"onsitePercent":25,"contractType":55,"foundAt":178,"category":179},7629,"ai-compute-cluster-engineer","AI Compute Cluster Engineer",[65,88,174,175,176,177],"GPU scheduling","AI compute clusters","networking optimization","storage optimization","2026-06-03T04:37:11+00:00",{"id":31,"slug":32,"label":33},{"id":181,"slug":182,"title":183,"skills":184,"budget":186,"duration":25,"location":187,"onsitePercent":125,"contractType":28,"foundAt":188,"category":189},7608,"ai-telco-architect","AI Telco Architect",[151,152,153,154,155,156,157,158,159,160,161,162,163,185],"Integration experience","up to 90,000 EUR\u002Fyear","Amsterdam","2026-06-03T03:53:26+00:00",{"id":31,"slug":32,"label":33},{"id":191,"slug":192,"title":193,"skills":194,"budget":25,"duration":203,"location":204,"onsitePercent":205,"contractType":55,"foundAt":206,"category":207},7605,"ai-fullstack-engineer","AI Fullstack Engineer",[195,196,197,14,198,116,199,200,201,202],"React","TypeScript","Java","AI\u002FML","AI agents","LangChain","Vector Databases","Fullstack development","Initial 3 Months","Berlin",0,"2026-06-03T03:52:36+00:00",{"id":31,"slug":32,"label":33},{"id":209,"slug":210,"title":211,"skills":212,"budget":222,"duration":25,"location":223,"onsitePercent":25,"contractType":28,"foundAt":224,"category":225},7562,"ai-spezialist-mwd-ai-specialist","AI Specialist (m\u002Ff\u002Fd)",[14,213,214,215,216,217,120,218,219,220,221],"R","AI tools","Machine 
Learning","Data processing","Cloud technologies","Azure","Google Cloud","Data protection","Compliance","at least 75,000 EUR\u002Fyear","Wien","2026-06-03T00:01:40+00:00",{"id":31,"slug":32,"label":33},{"id":227,"slug":228,"title":229,"skills":230,"budget":25,"duration":25,"location":240,"onsitePercent":241,"contractType":28,"foundAt":242,"category":243},7518,"manager-ki-und-prozessautomatisierung-mwd","Manager, AI and Process Automation (m\u002Ff\u002Fd)",[231,232,233,234,218,235,236,237,238,239],"AI","Process automation","Microsoft Copilot","Power Automate","ERP integration","SAP","Change Management","Digitalization","Large Language Models","Stephanskirchen",100,"2026-06-02T14:26:02+00:00",{"id":31,"slug":32,"label":33},{"id":245,"slug":246,"title":247,"skills":248,"budget":25,"duration":261,"location":25,"onsitePercent":262,"contractType":55,"foundAt":263,"category":264},7433,"ai-data-engineer-im-bereich-wissensmanagement-bots","AI Data Engineer for Knowledge Management Bots",[249,14,250,251,252,253,254,160,255,256,257,258,259,260,217],"PostgreSQL","ETL\u002FELT pipelines","Big Data","SQL","Airflow","dbt","Spark","Data Engineering","Pandas","PySpark","Data Quality","Observability","6M+",20,"2026-06-02T09:30:40+00:00",{"id":31,"slug":32,"label":33},{"id":266,"slug":267,"title":268,"skills":269,"budget":25,"duration":25,"location":223,"onsitePercent":125,"contractType":55,"foundAt":274,"category":275},7414,"machine-learning-engineer-mwd","Machine Learning Engineer (m\u002Ff\u002Fd)",[215,51,15,14,213,270,120,219,271,272,273],"Apache Airflow","Data management","NLP","Computer Vision","2026-06-02T08:26:02+00:00",{"id":31,"slug":32,"label":33}]