Jakho

Full Stack developer.

The Bumpy Journey of Using Tesla V100 Graphics Card on an Old Motherboard

Background#

Front-end development does not demand much from a computer, so I spent less than 1,000 yuan on Xianyu for a small-form-factor machine with an i5-6600 and a Mini-ITX motherboard, and it has been my main development machine ever since. After I upgraded it to 32GB of memory, the integrated graphics were more than enough for daily development, and I also have a MacBook for iOS work. AI has become very popular recently, and our company happens to use Stable Diffusion for some drawing applications. I tried it and found it interesting. Since the computing power in the company's development environment is shared, I wondered whether I could add a graphics card to my old hardware and meet the threshold for playing with AI myself.

Component Selection#

AI drawing needs a lot of graphics memory to hold models and training data, so graphics memory capacity is one of the most important factors when selecting components: the more you have, the larger the models and images you can work with. After reading many posts online, the general recommendation is at least 8GB of graphics memory for AI drawing tasks. To draw higher-resolution images without running out of graphics memory, 16GB or more is recommended.
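As a rough way to reason about these thresholds, here is a minimal sketch that estimates how much graphics memory the model weights alone would take. The parameter count and the overhead figure are made-up illustration values, not measurements of Stable Diffusion or any real model.

```python
def weights_gib(n_params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return n_params * bytes_per_param / 2**30


def fits_in_vram(n_params: int, bytes_per_param: int,
                 vram_gib: float, overhead_gib: float) -> bool:
    """True if weights plus an assumed working overhead fit in the card's memory.

    overhead_gib is a hypothetical allowance for activations, images,
    and framework buffers; real usage depends on resolution and batch size.
    """
    return weights_gib(n_params, bytes_per_param) + overhead_gib <= vram_gib


if __name__ == "__main__":
    hypothetical_params = 1_000_000_000  # illustration only
    for name, width in (("fp32", 4), ("fp16", 2)):
        print(name, round(weights_gib(hypothetical_params, width), 2), "GiB")
```

Halving the bytes per parameter halves the weight footprint, which is why the same card can hold a larger model in half precision than in full precision.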

When I searched on Xianyu, I found that many old compute cards such as the P102, P104, P106, P40, M40, and P100 offer very good value. They sell for less than a thousand yuan, some for only two to three hundred, and they clear the 8GB or even 16GB graphics memory threshold. I originally planned to choose the P100 for this upgrade. However, the Pascal architecture is getting old, and the V100 is the next generation after the P100, with stronger performance at a price difference that is not too large. So I gritted my teeth and chose the V100 SXM2 16GB server compute card (the reason for choosing SXM2 comes later).

Machine Configuration#

Here is the configuration of this build:

Motherboard: Gigabyte GA-B250M-DS3H
CPU: Intel Core i5-6600
Graphics Card: NVIDIA V100 SXM2 16G
Memory: G.Skill Ripjaws 16G 2666MHz (two sticks)
Hard Drive: Seagate FireCuda 520 1TB SSD

I bought the motherboard so that I could reuse the 6th-generation i5 from the old machine. Because I needed room for a large graphics card, I chose an M-ATX board, which cost only a little over a hundred yuan more. The old machine is not worth much because it is small and DC-powered, so I bought a low-power G4400T for about 20 yuan on Xianyu and installed Debian on it to use as a NAS. I bought cheap memory because the old motherboard does not support frequencies above 2400MHz. The SSD came from the old machine: a 1TB Seagate FireCuda 520 that I bought for 299 yuan from Dongzi's international store before prices went up. Seeing how much the price has risen since, I regret not buying a couple more.

The highlight of the build is the graphics card, a Tesla V100 in the SXM2 form factor. It needs a PCIe adapter card to work on a consumer PC motherboard, and the adapter costs about as much as the card itself, a little over 1,300 yuan each. The PCIe version of the V100 costs about the same as this combination, or slightly more, so I did not buy it directly. With the adapter, if the card becomes outdated I can keep the adapter and swap in another SXM2 card, and SXM2 cards are generally cheaper than their PCIe versions.

Why Not Choose Modified Graphics Cards#

As we all know, the 2080 Ti modified to 22GB is more cost-effective for alchemy (model training) and drawing, and its Turing architecture is newer. However, the modification replaces the memory chips, and according to many online reviews the stability is not very good. Taking everything into consideration, I chose a compute card designed for servers.

Modifying BIOS to Enable Above 4G Support#

After assembling the components above, I turned on the computer. To my surprise, it would not boot and went straight to the BIOS screen, which displayed a long English message saying that PCI resources were insufficient to drive the installed PCI devices.

image

I immediately went online to search for similar cases and found a solution: find the switch called "Above 4G" in the motherboard BIOS and set it to "Enabled". If your BIOS also lets you enable Resizable BAR, even better, as it may improve performance.
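The arithmetic behind the "insufficient PCI resources" message can be sketched as follows. Without Above 4G, the firmware has to place every device's memory window (BAR) in the address space below 4 GiB. The sizes here are assumptions for illustration: I have not measured this card's BARs, and the 1 GiB window is a hypothetical figure for what an old board might leave free. Alignment rules are ignored.

```python
GIB = 2**30
MIB = 2**20


def fits_below_4g(bar_sizes_bytes, mmio_window_bytes):
    """True if all BARs could be packed into the 32-bit MMIO window.

    Simplified: real firmware also has to respect alignment.
    """
    return sum(bar_sizes_bytes) <= mmio_window_bytes


# Assumption: a compute card exposing a BAR as large as its 16 GiB of memory.
ASSUMED_COMPUTE_CARD_BARS = [16 * GIB, 32 * MIB]
# Assumption: a typical consumer card with a 256 MiB window.
ASSUMED_CONSUMER_CARD_BARS = [256 * MIB, 32 * MIB]
# Hypothetical space the board leaves for devices below 4 GiB.
ASSUMED_WINDOW = 1 * GIB

if __name__ == "__main__":
    print("consumer card fits:", fits_below_4g(ASSUMED_CONSUMER_CARD_BARS, ASSUMED_WINDOW))
    print("compute card fits:", fits_below_4g(ASSUMED_COMPUTE_CARD_BARS, ASSUMED_WINDOW))
```

Under these assumptions, a 16 GiB window cannot fit below 4 GiB no matter how the firmware arranges things, which is why the option has to allow mapping above that boundary.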

Note that enabling Above 4G means the system has to boot in UEFI mode, so the CSM option in the motherboard BIOS needs to be set to "Disabled". There are many simple tutorials online on how to reinstall the system in UEFI mode, so I won't go into detail here.
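Before disabling CSM, it may help to check how the current installation was booted. The following is a small sketch, not part of the original procedure: on Windows it calls the `GetFirmwareType` function from kernel32, and on Linux it checks whether `/sys/firmware/efi` exists. The helper name `boot_mode` is my own.

```python
import ctypes
import os
import sys


def boot_mode() -> str:
    """Return 'uefi', 'bios', or 'unknown' for the running system."""
    if sys.platform.startswith("linux"):
        # The kernel creates this directory only when booted through UEFI.
        return "uefi" if os.path.isdir("/sys/firmware/efi") else "bios"
    if sys.platform == "win32":
        firmware_type = ctypes.c_uint(0)
        ok = ctypes.windll.kernel32.GetFirmwareType(ctypes.byref(firmware_type))
        if ok:
            # FIRMWARE_TYPE enum: 1 = BIOS (legacy), 2 = UEFI.
            return {1: "bios", 2: "uefi"}.get(firmware_type.value, "unknown")
    return "unknown"


if __name__ == "__main__":
    print("boot mode:", boot_mode())
```

If this reports a legacy BIOS boot, the existing installation will most likely not start once CSM is disabled, so the reinstall should come first.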

In summary, three options need to be set:

  1. Enable Above 4G
  2. Enable Resizable BAR (if available)
  3. Disable CSM compatibility for system boot

The catch is that my Gigabyte B250M motherboard, surprisingly, does not expose the Above 4G option. I searched for cases involving the same motherboard, but found no solution, so I had to unlock the hidden option in the BIOS myself.

Enabling Hidden Options in the BIOS Using AMIBCP#

First, remove the graphics card from the motherboard; otherwise you cannot boot into the system.

After successfully booting into the system, download the latest BIOS file for the motherboard from the official website.

Download AMIBCP and use it to open the BIOS file you just downloaded. Note that the file dialog filters by extension by default, so choose the option to show all file types to find the file.

image

image

Then, as shown in the screenshots, find the Above 4G option, change its Access/Use setting to "User", and set the last two options to "Enabled" so that it is enabled by default.

image

image

After making the changes, save the file, or save it as a new BIOS file. Keep it clearly distinguishable from the original BIOS file so that you can find the right one if you ever want to flash back to the original.
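One way to keep the two images apart is to record a checksum for each. This is a minimal sketch; the file names in the usage part are hypothetical, and the expectation that the modified image keeps the same size as the original is my assumption, not something the tool guarantees.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a firmware image held in memory."""
    return hashlib.sha256(data).hexdigest()


def compare_images(original: bytes, modified: bytes) -> dict:
    """Summarize how two BIOS images relate to each other."""
    return {
        "original_sha256": sha256_hex(original),
        "modified_sha256": sha256_hex(modified),
        "same_size": len(original) == len(modified),
        "identical": original == modified,
    }


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    original_path = Path("bios_original.bin")
    modified_path = Path("bios_above4g.bin")
    if original_path.exists() and modified_path.exists():
        report = compare_images(original_path.read_bytes(), modified_path.read_bytes())
        for key, value in report.items():
            print(f"{key}: {value}")
```

If the report says the files are identical, the change was never saved. Writing the two digests down also makes it easy to tell later which file is which.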

Flashing the BIOS Using AFUWINGUI#

Taking a Gigabyte motherboard as an example: if the modified BIOS file cannot be installed with the official update method, it has to be flashed with a third-party method. There are many of these, such as using a hardware programmer, but I chose the most convenient one, which is to flash directly from Windows.

First, download and open AFUWINGUI.

image

Click the "Open" button and select the modified BIOS file from the previous step.

Then click the "Flash" button on the right and wait until the status shows "Done". The modified BIOS has then been flashed successfully.

You can open GPU-Z and go to the options shown in the following screenshot to confirm that Above 4G has been enabled.

image
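GPU-Z is the check I used on Windows. If the machine runs Linux instead, a similar check can be sketched by reading the card's `resource` file under `/sys/bus/pci/devices/`, where each line lists the start address, end address, and flags of one region. The PCI address in the usage part and the sample values in the comment are hypothetical.

```python
from pathlib import Path

GIB = 2**30


def regions_above_4g(resource_text: str) -> list:
    """Return (start, end) pairs of PCI regions mapped at or above 4 GiB.

    Expects the sysfs 'resource' format: three hex numbers per line.
    """
    regions = []
    for line in resource_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        start, end, _flags = (int(p, 16) for p in parts)
        if end == 0:
            continue  # unused region
        if start >= 4 * GIB:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Hypothetical PCI address; find the real one with lspci.
    path = Path("/sys/bus/pci/devices/0000:01:00.0/resource")
    if path.exists():
        for start, end in regions_above_4g(path.read_text()):
            size_gib = (end - start + 1) / GIB
            print(f"0x{start:x}-0x{end:x} ({size_gib:.2f} GiB)")
```

A non-empty result means at least one of the card's windows was placed above the 4 GiB boundary, which is what the BIOS option is meant to allow.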

At this point, you can shut down the computer, install the graphics card, and then restart.

Afterword#

As the title suggests, the process was not as simple as expected. After I installed the graphics card, it was recognized during boot, and after I installed NVIDIA's driver, the graphics memory, clock frequencies, and so on all looked normal.

What surprised me was that after a restart, running nvidia-smi on the command line reported that there was no device. I checked Device Manager and found a yellow triangle on the graphics card. Restarting several times did not help, and it looked like a failure. So I uninstalled the driver, restarted, and installed it again. Unexpectedly, the graphics card was recognized and working immediately after the installation. After another restart, the yellow triangle appeared again.
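To avoid checking by hand after every restart, the nvidia-smi call can be wrapped in a small script. This is only a sketch: it uses nvidia-smi's `--query-gpu` and `--format` options, and treats a missing executable or a non-zero exit code as "no GPU visible". The function name `list_gpus` is my own.

```python
import shutil
import subprocess


def list_gpus() -> list:
    """Return one 'name, memory' line per GPU that nvidia-smi can see.

    Returns an empty list if nvidia-smi is missing, fails, or finds no device.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (subprocess.TimeoutExpired, OSError):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        for gpu in gpus:
            print("found:", gpu)
    else:
        print("no NVIDIA GPU visible to the driver")
```

Running this at login makes it obvious right away whether the driver lost the card again after a reboot.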

Was there a setting I had missed? I was very puzzled, and several days of searching online turned up neither the cause nor any similar cases.

Then one day, looking back at the photos I had taken of the graphics card, I noticed that a choke had fallen off its upper right corner. I immediately contacted the seller for a replacement, guessing that this might be the problem.

image


After a long wait for the replacement, the new graphics card showed the same issue: after every restart of Windows 10, I had to uninstall and reinstall the driver before I could use the card. The cause was probably that the motherboard is too old, or a problem with the system driver. Since I rarely shut down my computer, I just made do with it, and planned to try Windows 11 or look for another solution when I had time.


Later, I found that after installing the driver successfully once, I could go into the NVIDIA driver settings and disable ECC. I tested this and confirmed that after a restart, the driver no longer disappeared.
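I changed this setting through the driver's own interface. As an alternative that I did not use in this build, nvidia-smi can also query and change the ECC mode from the command line: `-e 0` disables it for the GPU selected with `-i`. Changing it normally needs administrator rights, and the new mode only takes effect after a reboot. The sketch below only builds the commands and runs the read-only query; the helper names are my own.

```python
import shutil
import subprocess


def ecc_query_command() -> list:
    """Command that reports the current and pending ECC mode of each GPU."""
    return ["nvidia-smi",
            "--query-gpu=ecc.mode.current,ecc.mode.pending",
            "--format=csv,noheader"]


def ecc_disable_command(gpu_index: int) -> list:
    """Command that asks the driver to disable ECC on one GPU (applies after reboot)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-e", "0"]


def run(command: list):
    """Run a command and return its output, or None if it cannot be run."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    print("ECC mode (current, pending):", run(ecc_query_command()))
    # Uncomment to actually change the setting (needs administrator rights):
    # print(run(ecc_disable_command(0)))
```

Keep in mind that this only works while the driver can see the card, so it has to be done right after a successful driver installation, before the next restart.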

Finally, I can happily do alchemy. It was a difficult journey that took up two weeks of my after-work time, and I even bought an extra motherboard for testing. Still, I learned a lot about computer assembly along the way.
