Jakho


Full Stack developer.

The bumpy journey of using the Tesla V100 graphics card on an old motherboard

Background

Front-end development doesn't demand much computing power, so I previously spent less than 1,000 yuan on Xianyu (a second-hand marketplace) on an i5-6600 with a Mini-ITX motherboard in a small case, and it has been my main development machine ever since. After I upgraded it to 32GB of memory, even the integrated graphics were more than enough for daily development, and I have a MacBook for iOS work. Recently AI has become extremely popular, and some of the company's business happens to use Stable Diffusion for small image-generation features. After trying it out, I found it quite interesting. Since the company's development environment runs on shared computing resources, I wondered whether I could upgrade my old hardware with a graphics card good enough to start experimenting with AI.

Component Selection

AI image generation needs a lot of video memory (VRAM) to hold the model and the intermediate data produced during generation (plus training data, if you train). The more VRAM you have, the larger the models and image resolutions you can handle, so VRAM capacity was a key factor in my choice. From the many posts I read online, 8GB seems to be the minimum for AI image generation, and 16GB or more is recommended if you want higher-resolution images without running out of memory.
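To see why 8GB is a floor rather than a comfortable target, it helps to separate the weights from everything else. The sketch below only estimates the memory for model weights (parameters × bytes per parameter). The parameter counts are an assumption taken from the approximate figures in the Stable Diffusion v1 model card (860M UNet, 123M text encoder); the VAE and the activations created during generation, which grow with image resolution, are not counted.

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# Assumption: approximate Stable Diffusion v1 parameter counts (860M UNet,
# 123M text encoder). The VAE and generation-time activations are excluded.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

SD_V1_PARAMS = {
    "unet": 860_000_000,
    "text_encoder": 123_000_000,
}


def weights_gib(num_params: int, dtype: str) -> float:
    """Memory needed to store num_params weights of the given dtype, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    total = sum(SD_V1_PARAMS.values())
    for dtype in ("fp32", "fp16"):
        print(f"{dtype}: about {weights_gib(total, dtype):.1f} GiB for the weights alone")
```

The weights come to only a few GiB, so most of the remaining VRAM goes to activations, which is why higher resolutions push you toward 16GB.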

Searching Xianyu, I found many old compute cards, such as the P102, P104, P106, P40, M40, and P100, that offer excellent value for money: 8GB or even 16GB of VRAM for under 1,000 yuan, some for as little as 200 to 300 yuan. I originally planned to go with the P100. However, the Pascal architecture is getting old, and the V100 is the next generation after the P100, with stronger performance at only a modest price premium, so I settled on the V100 SXM2 16GB server compute card (I explain why I chose SXM2 below).

Build Configuration

Here is the configuration for this build:

Motherboard: Gigabyte GA-B250M-DS3H
CPU: Intel Core i5-6600
Graphics Card: NVIDIA Tesla V100 SXM2 16GB
Memory: Guangwei Hanjiang 16GB DDR4-2666 x2
Hard Drive: Seagate Cool Play 520 1TB SSD

I bought the motherboard to reuse my 6th-generation i5, and since the graphics card is large, I chose an M-ATX board for its size; it cost just over 100 yuan. The old machine was a compact build with a DC power supply, so it wouldn't have fetched much if sold. Instead, I picked up a low-power G4400T on Xianyu for a little over 20 yuan, put it in the old machine, and installed Debian to use it as a NAS. I went with cheaper memory because the old motherboard doesn't support frequencies above 2400MHz anyway. The SSD came from the old machine; I had bought the 1TB drive on JD.com for 299 yuan before prices skyrocketed. Looking at prices now, I regret not buying a couple more.

The graphics card is the star of this build: a Tesla V100, specifically the SXM2 version, which needs an SXM2-to-PCIe adapter card to work on a home PC motherboard. The adapter costs roughly as much as the card itself, around 1,300+ yuan. I didn't buy the PCIe version directly because, while its price was similar, it was still a bit more expensive. Moreover, if the card becomes outdated, I can keep the adapter and swap in another SXM2 card, and SXM2 cards are generally cheaper than their PCIe counterparts.

Why Not Choose a Modified Graphics Card

As is well known, the modded RTX 2080 Ti 22GB offers better value for rendering and AI image generation, and its Turing architecture is newer. However, because its memory chips are replaced aftermarket, there are many complaints about its stability online. Weighing everything, I chose a dedicated server compute card.

Modifying the BIOS to Support Above 4G

After assembling everything, I powered on the machine, but to my surprise it wouldn't boot. It went straight to the BIOS screen and showed a long error message saying there were insufficient PCI resources to drive the PCI devices.

[Screenshot: BIOS error reporting insufficient PCI resources]

I hurriedly searched online for similar cases and found a solution: the motherboard BIOS has a switch called Above 4G (Above 4G Decoding), and enabling it resolves the error. If your BIOS also offers Resizable BAR, enable that too, as it may improve performance.
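The error makes sense once you consider address space. Without Above 4G decoding, the firmware tries to place every device's memory window (BAR) below the 4 GiB boundary, a range that is also shared with RAM and other devices. The sketch below is a deliberately simplified model (it ignores BAR alignment, the real size of the MMIO hole, and other devices), and it assumes the V100 16GB requests a BAR roughly the size of its VRAM.

```python
# Simplified model of 32-bit PCI resource assignment (ignores BAR alignment,
# the exact size of the MMIO hole, and other devices' windows).
FOUR_GIB = 2**32  # the whole 32-bit address space, in bytes


def fits_below_4g(bar_bytes: int, already_used: int = 0) -> bool:
    """True if a BAR of bar_bytes could still fit under the 4 GiB boundary."""
    return bar_bytes <= FOUR_GIB - already_used


if __name__ == "__main__":
    v100_bar = 16 * 2**30      # assumption: BAR roughly the size of the 16 GB VRAM
    typical_bar = 256 * 2**20  # a classic 256 MiB consumer-GPU window, for comparison
    print("16 GiB BAR fits below 4G:", fits_below_4g(v100_bar))
    print("256 MiB BAR fits below 4G:", fits_below_4g(typical_bar))
```

A window larger than the entire 32-bit space can never be placed there, so the firmware has to be allowed to map it above 4G.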

Note also that enabling Above 4G requires the system to boot in UEFI mode, with the CSM option in the BIOS disabled. There are plenty of simple tutorials online for reinstalling the system in UEFI mode, so I won't go into detail here.

In summary, three options need to be set:

  1. Enable Above 4G
  2. Enable Resizable BAR (if available)
  3. Disable CSM (Compatibility Support Module) so the system boots in UEFI mode
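Before disabling CSM, it is worth confirming how the current system actually booted. On Windows, the "BIOS Mode" line in System Information (msinfo32) shows UEFI or Legacy. For a Linux machine, such as the Debian NAS, a minimal sketch relies on the fact that the kernel exposes /sys/firmware/efi only when booted through UEFI; the function name is hypothetical.

```python
from pathlib import Path


def booted_via_uefi(efi_dir: str = "/sys/firmware/efi") -> bool:
    """On Linux, /sys/firmware/efi exists only when the kernel was booted via UEFI."""
    return Path(efi_dir).is_dir()


if __name__ == "__main__":
    mode = "UEFI" if booted_via_uefi() else "legacy BIOS/CSM (or not a Linux system)"
    print("Boot mode:", mode)
```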

The catch was that my Gigabyte B250M motherboard had no such option. I looked at many cases involving similar motherboards online but found no solution, so I had to modify the BIOS to expose the hidden option.

Using AMIBCP to Enable Hidden Options on the Motherboard

First, remove the graphics card from the motherboard; otherwise, the machine cannot boot into the operating system.

Once the system has booted, download the latest BIOS file for your motherboard from the manufacturer's official website to use as the base for the modification.

Download AMIBCP and open the original BIOS file you just downloaded. Note that the Open dialog filters by file type by default; select the option to show all file types, or the BIOS file won't appear.

[Screenshots: opening the BIOS file in AMIBCP with all file types shown]

Then, as shown in the screenshots, find the Above 4G option, change its Access/Use value to User, and set the last two columns (the default values) to Enabled.

[Screenshots: the Above 4G option in AMIBCP before and after the change]

After making the changes, save the result as a new BIOS file with a distinct name, so you don't confuse it with the original when you need to revert later.
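Keeping the two images apart can be made mechanical. The sketch below is a hypothetical helper, not part of AMIBCP: it copies the official BIOS once to a `.original` backup and compares SHA-256 hashes, so you can confirm the modified file really differs from the one you would revert to. The file names and the `.original` naming scheme are assumptions.

```python
import hashlib
import shutil
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def keep_original(original: Path) -> Path:
    """Copy the untouched BIOS once to '<stem>.original<suffix>' and return that path."""
    backup = original.with_name(f"{original.stem}.original{original.suffix}")
    if not backup.exists():
        shutil.copy2(original, backup)
    return backup


def differs(official: Path, modified: Path) -> bool:
    """True if the modified image is not byte-identical to the official one."""
    return sha256_of(official) != sha256_of(modified)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python bios_check.py <official_bios> <modified_bios>
    official, modified = Path(sys.argv[1]), Path(sys.argv[2])
    backup = keep_original(official)
    print(f"{backup.name}: {sha256_of(backup)}")
    print(f"{modified.name}: {sha256_of(modified)}")
    if not differs(backup, modified):
        print("The files are identical; the AMIBCP changes may not have been saved.")
```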

Using AFUWINGUI to Flash the BIOS

On a Gigabyte motherboard, a modified BIOS file typically can't be installed through the official update method, so it has to be flashed with a third-party tool. There are several ways to do this, such as using a hardware programmer, but I chose the most convenient one: flashing directly from Windows.

First, download and open AFUWINGUI.

[Screenshot: the AFUWINGUI main window]

Click the open button and select the modified BIOS file saved earlier.

Then click the Flash button on the right and wait for flashing to finish. When the status shows Done, the modified BIOS has been flashed successfully.

After rebooting, you can check in GPU-Z, as shown below, that Above 4G is now enabled.

[Screenshot: GPU-Z showing Above 4G enabled]

At this point, you can power down, install the graphics card, and restart.

Postscript

As the title suggests, the journey wasn't that simple. After I installed the graphics card, it was recognized normally, and after installing the NVIDIA driver, the VRAM clock and everything else looked normal.

What puzzled me was that after a reboot, running nvidia-smi from the command line reported that no devices were found. In Device Manager, the graphics card showed a yellow warning triangle. Rebooting several more times didn't help, and it looked like a hardware fault. I then uninstalled the driver, rebooted, and reinstalled it, and to my surprise the card was recognized again. After the next reboot, however, the yellow warning triangle came back.
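Checking after every reboot whether the card is still visible is easy to script. This minimal sketch runs nvidia-smi with its real `--query-gpu` and `--format=csv,noheader` options and treats the "No devices were found" message, a non-zero exit code, or a missing nvidia-smi as "no GPU visible". The function names are hypothetical.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpus(output: str) -> list[str]:
    """One entry per non-empty CSV line; the 'No devices were found' message yields []."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if any("No devices were found" in line for line in lines):
        return []
    return lines


def list_gpus() -> list[str]:
    """Run nvidia-smi if it is installed; any failure is treated as 'no GPU visible'."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpus(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        for gpu in gpus:
            print("Found:", gpu)
    else:
        print("nvidia-smi sees no GPU; check Device Manager for the warning icon.")
```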

Had I missed some setting? I was completely baffled and spent several days looking for the cause, but found no similar cases online.

Then one day, looking back at the photos I had taken of the graphics card, I noticed that a capacitor had fallen off its upper-right corner. I immediately contacted the seller for a replacement, suspecting this might be the cause.

[Photo: the original card, with a capacitor missing from the upper-right corner]


After a long wait for the replacement, I was surprised to find that the new card had the same issue: after every Windows 10 reboot, I had to uninstall and reinstall the driver before the card would work. I suspected the motherboard was too old or something was wrong with the system drivers. Since I rarely shut down, I put up with it and planned to try Windows 11 or dig further when I had time.


Later, I discovered that the fix was simple: after the driver installs successfully once, disable ECC in the NVIDIA driver settings. I tested it, and after rebooting, the card no longer drops out.
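The author toggled ECC through the NVIDIA driver settings; a command-line route also exists, because nvidia-smi can report ECC state (`ecc.mode.current`, `ecc.mode.pending`) and change it with `-e 0`. The sketch below is that swapped-in alternative, not the method used above: it reads the current and pending modes and only prints the disable command rather than running it. Changing ECC needs administrator rights and takes effect after a reboot, which is why a pending mode is reported. The helper names are hypothetical.

```python
import shutil
import subprocess

ECC_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,ecc.mode.current,ecc.mode.pending",
    "--format=csv,noheader",
]


def parse_ecc(output: str) -> dict[int, tuple[str, str]]:
    """Map GPU index -> (current ECC mode, pending ECC mode) from nvidia-smi CSV output."""
    modes: dict[int, tuple[str, str]] = {}
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip blank lines and messages such as "No devices were found".
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        modes[int(fields[0])] = (fields[1], fields[2])
    return modes


def disable_ecc_command(index: int) -> list[str]:
    """Command to request ECC off for one GPU; needs admin rights and a reboot to apply."""
    return ["nvidia-smi", "-i", str(index), "-e", "0"]


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    out = subprocess.run(ECC_QUERY, capture_output=True, text=True).stdout
    for index, (current, pending) in parse_ecc(out).items():
        print(f"GPU {index}: ECC current={current}, pending={pending}")
        if current == "Enabled" and pending != "Disabled":
            # Printed rather than executed, so nothing changes without your say-so.
            print("  to disable:", " ".join(disable_ecc_command(index)))
```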

Finally, I can happily render. The journey was arduous, taking up two weeks of my after-work time, and I even bought an extra motherboard for testing. Still, it was a valuable experience, and I learned a lot about building PCs.
