Building a 4-Card V100 Server for Under RMB10K: Complete Tutorial for Running 70B Large Models with 64GB VRAM

Under RMB10K! 4 GPUs! 64GB VRAM! Hand-Built Water-Cooled 4-Card V100 Server Tutorial

Author: Second-Rate Programmer
Published: June 14, 2025, 16:05
Location: Beijing

Preface

I recently hand-built a server with 4 GPUs and 64GB of VRAM that runs the 70B DeepSeek model at 25 tokens/s - incredibly smooth. And I do mean hand-built: during assembly I scraped a fingernail hard enough to separate nail from flesh, and it still hurt days later! I can't let that effort go to waste, so here is the full tutorial.

[Photo: Completed 4-card V100 server assembly]

Hardware Configuration List

| Component | Model/Specification | Description |
|---|---|---|
| Motherboard | Supermicro 7048GR | Server board also used in builds around high-end cards like the H100 and H200 |
| Graphics Cards | Tesla V100 × 4 | 16GB HBM2 VRAM each, 64GB total |
| Power Supply | 2000W | Multi-GPU machines require a high-wattage PSU |
| Cooling | 480mm-radiator water-cooling system | Mounted externally |
| Adapter Cards | SXM2-to-PCIe × 4 | V100-specific adapter cards |

Note: The motherboard is a Supermicro 7048GR. Servers built around high-end GPUs like the H100 and H200 also commonly use Supermicro boards. Second-hand servers bought online may be branded Inspur or Sugon, but the motherboards are all the same, and the cases are similar too.

Why Choose V100?

Performance Characteristics

The V100 was once the computational king of the data center. IBM built the Summit supercomputer around these GPUs, and for a time it was the world's #1 supercomputer. Now that V100s have been retired from data centers and are available to ordinary buyers, their performance remains strong:

  • Computing Power: roughly equivalent to an RTX 2080 Ti
  • VRAM Capacity: 16GB of HBM2 high-bandwidth memory per card
  • Overall Performance: four V100s deliver compute roughly comparable to a single RTX 4090, but far more VRAM - enough for a quantized 70B model (see the estimate sketch below)
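
To see why the 64GB matters, here is a rough back-of-the-envelope estimate in Python. The bytes-per-parameter figures and the flat 20% runtime overhead are rule-of-thumb assumptions of mine, not exact numbers:

```python
# Rough VRAM estimate for LLM inference at several quantization levels.
# Bytes-per-parameter and the flat 20% overhead (KV cache, buffers) are
# rule-of-thumb assumptions; real usage varies with context length.

def vram_needed_gb(params_billion: float, bytes_per_param: float,
                   overhead: float = 0.20) -> float:
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 B/param ~ 1 GB
    return weights_gb * (1 + overhead)

TOTAL_VRAM_GB = 4 * 16  # four V100s with 16GB HBM2 each

for label, bpp in [("FP16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    need = vram_needed_gb(70, bpp)
    verdict = "fits" if need <= TOTAL_VRAM_GB else "does NOT fit"
    print(f"70B @ {label}: ~{need:.0f} GB -> {verdict} in {TOTAL_VRAM_GB} GB")
```

By this estimate only a 4-bit quantized 70B model fits in 64GB, which is consistent with the 4-bit builds most local-inference tools ship for 70B models.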

Performance Comparison

[Image: GeekBench OpenCL comparison - one 4090 scores roughly the same as two V100s]

Performance comparison between V100 and 4090 in different application scenarios:

[Image: Computational chemistry comparison - the 4090 performs worse than the V100 in some computing packages]

Cost-Performance Advantage

The V100 is absolutely the cost-performance king in the current GPU market. However, using this GPU well is not easy because:

  1. Interface Issues: the card was designed for data centers and uses the SXM2 socket, so it needs an adapter card to fit a PCIe slot
  2. Cooling Requirements: high power consumption (up to 300W per card) demands excellent cooling
  3. Installation Complexity: requires manual modification and precise assembly

Cooling Solution Selection

Power Consumption Calculation

  • Maximum power per V100 GPU: 300W
  • Total power consumption for 4 cards: 1200W
  • Cooling estimate: a 240mm radiator can handle two cards, so four cards need a 480mm radiator
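
The same arithmetic explains the 2000W power supply in the parts list. A minimal sanity check in Python, where the CPU and miscellaneous figures are my assumptions rather than measured values:

```python
# PSU headroom sanity check. The V100's 300W TDP is from its spec sheet;
# the CPU and miscellaneous budgets below are assumptions, not measurements.

GPU_TDP_W = 300   # per V100, maximum
NUM_GPUS = 4
CPU_W = 150       # assumed CPU budget
MISC_W = 150      # assumed: fans, pump, drives, motherboard

peak_w = GPU_TDP_W * NUM_GPUS + CPU_W + MISC_W
PSU_W = 2000
print(f"Estimated peak draw: {peak_w} W "
      f"({peak_w / PSU_W:.0%} of the {PSU_W} W PSU)")
```

Roughly 75% load at estimated peak leaves the PSU comfortable headroom.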

Actual Installation

After the parts arrived, I found the 480mm radiator was indeed too large to fit inside the server, so it had to be mounted externally.

[Photo: External 480mm radiator cooling system]

Detailed Installation Process

Adapter Card Installation

[Photo: SXM2-to-PCIe adapter card in its original state]

Each V100 GPU requires two parts to be installed:

  • SXM2-to-PCIe adapter card
  • Custom water-cooling block

[Photo: V100 with adapter card and water-cooling block installed]

Case Installation

[Photo: Final installation of the four V100s inside the case]

The four cards installed side by side still leave some clearance, and the water blocks take up relatively little space. Routing the water tubing, however, is fiddly precision work, and it made this the hardest part of the build. Once the tubing is done, it's worth verifying that all four cards show up, as in the check below.
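
A minimal enumeration check, assuming a CUDA-enabled PyTorch install (plain `nvidia-smi` shows the same information):

```python
# Verify that all four V100s are visible over the SXM2-to-PCIe adapters.
import torch

count = torch.cuda.device_count()
print(f"CUDA devices detected: {count}")
for i in range(count):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")

assert count == 4, "A card is missing - reseat its adapter before proceeding"
```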

Temperature Testing

[Photo: nvidia-smi readout before the water loop was fully installed - temperatures around 50+ °C]

Temperature performance after installation was complete:

  • Idle state: close to room temperature
  • High load: under 60°C at 200+ watts per card
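
To keep an eye on the loop under sustained load, a small watcher script is handy. A minimal sketch using NVML via the nvidia-ml-py package (`pip install nvidia-ml-py`); the one-second poll interval is arbitrary:

```python
# Poll per-GPU temperature and power draw once a second until Ctrl-C.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        readings = []
        for i, h in enumerate(handles):
            temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
            power = pynvml.nvmlDeviceGetPowerUsage(h) / 1000  # milliwatts -> watts
            readings.append(f"GPU{i}: {temp}C {power:.0f}W")
        print(" | ".join(readings))
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```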

Performance Testing

DeepSeek Model Testing

The 32b DeepSeek model runs with excellent fluidity in actual use:

[Video: 4-card V100 running DeepSeek demonstration]

Metrics for the larger 70b model (reproducible with the sketch below):

  • Model: DeepSeek 70b
  • Output Speed: 25 tokens/s
  • Running State: very smooth
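
The tokens/s figure can be measured with a short script. This sketch assumes the model is served with Ollama (the post does not name its serving stack); Ollama's non-streaming /api/generate response reports eval_count (tokens generated) and eval_duration (nanoseconds). The model tag below is a placeholder for whatever you have pulled locally:

```python
# Measure decode speed against a local Ollama server (assumed setup).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:70b",  # placeholder tag - use your local model name
        "prompt": "Explain HBM2 memory in two sentences.",
        "stream": False,
    },
    timeout=600,
)
data = resp.json()
tokens_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"Generated {data['eval_count']} tokens at {tokens_per_s:.1f} tokens/s")
```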

Summary

This hand-built 4-card V100 server has the following advantages:

Hardware Advantages

  • High Cost-Performance: total cost under RMB 10K
  • Large VRAM: 64GB HBM2 VRAM, suitable for large model inference
  • Stable Performance: Data center-grade hardware reliability

Applicable Scenarios

  • Individual AI developers
  • Small and medium enterprise AI applications
  • Research institutes and project teams
  • Local deployment of large models

Technical Features

  • Supports 70B parameter large models
  • Inference speed reaches 25 tokens/s
  • Excellent temperature control, stable long-term operation

Future Plans

If there is enough interest in a more detailed installation tutorial, I can cover it module by module:

  • Hardware selection and procurement guide
  • Detailed installation steps
  • System configuration and software environment setup
  • Performance tuning and troubleshooting


If you also want a high cost-performance AI server like this one, feel free to contact the author for technical support and custom builds!

This tutorial shows how to build a professional-grade AI compute server at an extremely low cost. The assembly demands some hands-on skill, but the resulting performance and cost advantages are worth it. For AI workloads that need large VRAM and high compute, the V100 remains an excellent choice.

Contact Author

Scan the QR code to add the author on WeChat.