Bullshit. It has to be way more than that.
As stated above, our study focuses on the inference (i.e. deployment) stage in the model life cycle,
And this is why.
The model cards for Stable Diffusion 1.5 and 2.1 estimate the CO2 emissions of training at 11.25 tons and 12 tons respectively. The XL card lacks the info.
A transatlantic round-trip flight is about 1 ton per passenger. So, while every little bit helps, ML is not where you can make the big gains in lowering emissions.
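To put those numbers side by side, here is a quick sketch using only the model-card estimates quoted above and the rough ~1 ton CO2 per round-trip-passenger figure (both are ballpark assumptions, not measurements):

```python
# Comparing Stable Diffusion training emissions to transatlantic flights,
# using the model-card estimates quoted above (tons of CO2).
training_emissions_tons = {"SD 1.5": 11.25, "SD 2.1": 12.0}
flight_tons_per_passenger = 1.0  # round-trip transatlantic, rough ballpark

for model, tons in training_emissions_tons.items():
    flights = tons / flight_tons_per_passenger
    print(f"{model} training ~= {flights:.0f} round-trip passengers")
```

So training one of these models is on the order of a dozen passengers' worth of one round trip, which is the point being made.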
Wait so then does playing a game that maxes out my GPU for two hours use enough power to charge 1000 smartphones?
Because that’s a lot.
A high(er)-end smartphone has a battery capacity of approx. 0.019 kWh (5,000 mAh); an RTX 3080 has a max power draw of 320 W, so running that (at max load) for two hours is 0.64 kWh, which is equivalent to fully charging ~34 smartphones.
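That arithmetic checks out; a minimal sketch with the same assumed figures (0.019 kWh battery, 320 W board power, 2 hours at full load):

```python
# GPU energy over a gaming session vs. smartphone charges,
# with the figures assumed in the comment above.
battery_kwh = 0.019   # ~5,000 mAh high-end phone battery
gpu_watts = 320       # RTX 3080 max board power
hours = 2

session_kwh = gpu_watts * hours / 1000   # 0.64 kWh
charges = session_kwh / battery_kwh
print(round(charges))  # -> 34
```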
So the headline must be false, since you can generate a lot more than 34 generative AI images on a 3080 in 2 hours. That’s if you just include inference though.
I wonder if they are somehow trying to factor in the training costs.
This is outdated in a big way with Stable Diffusion Turbo and the recent LCM models that can render images at 30 fps on a 3090.
360 W * 1 s / 60 seconds a minute / 60 minutes an hour = 0.1 Wh/image
30 images a second? 0.033 Wh
A phone battery is 3000 mAh * 3.5 V = 10.5 Wh
318 images per phone charge
My math is probably off, but you get the idea.
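For what it's worth, the first conversion above is easy to check (360 W for one second, converted to watt-hours):

```python
# One second of a 360 W GPU, converted to watt-hours.
joules = 360 * 1        # power (W) x time (s)
wh = joules / 3600      # 3600 J per Wh
print(wh)  # -> 0.1
```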
You’re off by 3 orders of magnitude.
30 * 0.1 Wh = 3 Wh
That’s (fixed, messed up mAh conversion) 0.1 Wh for a second of 3090 time / 30 images a second.
If a 3090 drew 3 watt hours in 1/30th of a second it would melt.
Possibly off by one order of magnitude though… Editing post to see, and it looks like I was: 3,000 images per charge instead of 300.
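Putting the corrected numbers from this sub-thread in one place (same assumptions as above: 360 W, 30 images/s, a 3,000 mAh battery at 3.5 V):

```python
# End-to-end check of the corrected arithmetic in this thread.
wh_per_second = 360 / 3600         # 360 W for 1 s = 0.1 Wh
wh_per_image = wh_per_second / 30  # 30 images/s -> ~0.0033 Wh/image
battery_wh = 3.0 * 3.5             # 3,000 mAh (3 Ah) at 3.5 V = 10.5 Wh
images_per_charge = battery_wh / wh_per_image
print(round(images_per_charge))    # -> 3150, i.e. roughly 3,000 per charge
```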
It’s probably a net savings compared to a digital artist creating the same images, given the speed. Just powering a monitor for the much longer working time would take more power.
The referenced part of the paper, for those interested in the maths.
Text-based tasks are, all things considered, more energy-efficient than image-based tasks, with image classification requiring less energy (median of 0.0068 kWh for 1,000 inferences) than image generation (1.35 kWh) and, conversely, text generation (0.042 kWh) requiring more than text classification (0.0023 kWh). For comparison, charging the average smartphone requires 0.012 kWh of energy, which means that the most efficient text generation model uses as much energy as 16% of a full smartphone charge for 1,000 inferences, whereas the least efficient image generation model uses as much energy as 950 smartphone charges (11.49 kWh), or nearly 1 charge per image generation, although there is also a large variation between image generation models, depending on the size of image that they generate.
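The headline comparisons in that excerpt can be re-derived from the quoted figures alone:

```python
# Re-deriving the paper's smartphone-charge comparisons from its own numbers.
phone_charge_kwh = 0.012      # average smartphone charge, per the paper
worst_image_gen_kwh = 11.49   # least efficient model, per 1,000 images

charges_per_1000 = worst_image_gen_kwh / phone_charge_kwh
print(round(charges_per_1000))       # ~957: the "950 smartphone charges"
print(charges_per_1000 / 1000)       # ~0.96: "nearly 1 charge per image"
```

Note these are medians per 1,000 inferences for the worst model, so the per-image figure only applies at the extreme end of the range the paper reports.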