I know some people are having success with ollama on their own hardware. Anyone doing this for a code assistant that they're actually using to help with real work? What kind of minimum hardware is necessary for a decently (not necessarily top tier) performing system?
I'm using helix ( https://helix-editor.com/ ), which currently can only be extended via LSP servers, so I've added lsp-ai ( https://github.com/SilasMarvin/lsp-ai/ ) to connect it to ollama ( https://ollama.com/ ) and interact with Google's gemma2 ( https://ai.google.dev/gemma/prohibited_use_policy ), which has better licence terms for me than Meta's llama3.1, etc.
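For anyone curious what's happening underneath: lsp-ai ultimately just talks to ollama's local HTTP API. Here's a rough Python sketch of the kind of request involved, assuming `ollama serve` is running on its default port and you've done `ollama pull gemma2` — the prompt and options are placeholders I picked for illustration, not what lsp-ai actually sends:

```python
# Rough sketch of a completion request against a local ollama server.
# Assumes `ollama serve` is running on the default port 11434 and the
# model has been pulled with `ollama pull gemma2`.
import json
import time
import urllib.request

payload = {
    "model": "gemma2",
    "prompt": "def fibonacci(n):",    # code fragment to complete (placeholder)
    "stream": False,                  # return one JSON object instead of streaming
    "options": {"num_predict": 64},   # cap the completion length
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

start = time.monotonic()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.monotonic() - start

print(f"completion ({elapsed:.2f}s): {body['response']!r}")
```

Timing the round trip like this is also a quick way to sanity-check the latency differences I mention below.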
Note that the UX is not as complete as GitHub Copilot in Visual Studio Code, but I do get LLM-generated completions, and they are useful suggestions some of the time (feels like 10-20% of the time; usually I'll go with a suggestion from the language-specific LSP server instead).
These completions are very fast, almost instant, on my M3 MacBook Pro (2023) and even on a Dell Precision 5550 (2020), but they take a second or so on my desktop PC, which has AMD GPUs (2022).
I think you mostly need to make sure your model can fit into GPU RAM (I _think_, I could be wrong about this), so GPU RAM matters more than GPU clock speed. A rough sketch of the maths is below.
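A back-of-the-envelope way to check whether a model will fit: the weights take roughly (parameter count × bytes per parameter) of VRAM, plus some headroom for the KV cache and runtime. The 20% overhead factor below is just my own guess, not an official number, and the model sizes are only illustrative:

```python
# Back-of-the-envelope VRAM estimate: weights ~= params * bytes-per-param,
# plus headroom for KV cache / runtime. The 20% overhead factor is a guess.
def estimate_vram_gb(params_billion: float, bits_per_param: float, overhead: float = 0.20) -> float:
    weight_bytes = params_billion * 1e9 * (bits_per_param / 8)
    return weight_bytes * (1 + overhead) / 1e9

# gemma2 9B at a 4-bit quantisation (roughly what the default `ollama pull gemma2` gives you, I believe)
print(f"gemma2 9B @ 4-bit:  ~{estimate_vram_gb(9, 4):.1f} GB")
# the same model unquantised at 16-bit, for comparison
print(f"gemma2 9B @ 16-bit: ~{estimate_vram_gb(9, 16):.1f} GB")
```

As far as I understand, if the model doesn't fit, ollama will offload some layers to the CPU instead, which still works but is noticeably slower.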