02:12 imirkin: pendingchaos: i don't have a clean answer. the point of modifiers is that they can be incrementally added on (e.g. neg neg neg neg x == x). you could model things as being able to access h1/h0 separately, in which case it'd make sense to have them as modifiers. otherwise it may make more sense to just do it as subops, like MADSP is done.
02:12 imirkin: i.e. if it's all just fixed sequences
08:16 stevecam: i was trying to work out why nouveau wasn't loading, turns out i had it blacklisted, problem solved :-)
08:27 mupuf: stevecam: crazy how systems are behaving how they should sometimes ;)
08:52 karolherbst: imirkin: the problem with that on xmad is, that we already have those tons of subops anyway, but maybe we could simply have subops be a bitfield and enable all the variants or something
08:54 karolherbst: azaki: decompressing your trace really kills low memory systems :D
08:54 karolherbst: on my 32GB machine I have 1.2GB swap used and it is still decompressing
08:55 karolherbst: I think it is more caused by having also file encrpytion
08:55 karolherbst: *encryption
08:56 karolherbst: or uhm, full disc encryption rather
09:01 stevecam: mupuf, ill say
09:53 karolherbst: seems to happen around 26GB or something
10:20 pendingchaos: imirkin: not sure what you mean by "access h1/h0 separately" and "fixed sequences"?
10:22 pendingchaos: I think I'm leaning towards putting all XMAD's flags in the subOp as a bitfield and adding some special handling in swapSources
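The bitfield idea can be sketched roughly as follows, in Python rather than the C++ of the actual compiler. All flag names here are made up for illustration and are not the real XMAD flags or nouveau's subOp encoding. `swap_sources` is a hypothetical stand-in for the special handling in swapSources: when the two multiplicands trade places, any per-source flags have to trade places with them.

```python
from enum import IntFlag


class XmadFlag(IntFlag):
    # Hypothetical flag layout, for illustration only.
    SRC0_HI = 1 << 0   # read the high 16 bits of source 0
    SRC1_HI = 1 << 1   # read the high 16 bits of source 1
    SIGNED0 = 1 << 2   # treat source 0 as signed
    SIGNED1 = 1 << 3   # treat source 1 as signed


# Per-source flags that must trade places when the sources are swapped.
_SWAP_PAIRS = [
    (XmadFlag.SRC0_HI, XmadFlag.SRC1_HI),
    (XmadFlag.SIGNED0, XmadFlag.SIGNED1),
]


def swap_sources(sub_op):
    """Return the subOp bitfield that matches swapped sources 0 and 1."""
    old = int(sub_op)
    new = old
    for a, b in _SWAP_PAIRS:
        new &= ~(int(a) | int(b))      # clear both bits of the pair
        if old & int(a):
            new |= int(b)              # flag moves from source 0 to source 1
        if old & int(b):
            new |= int(a)              # flag moves from source 1 to source 0
    return XmadFlag(new)
```

Swapping twice gives back the original bitfield, which is the kind of invariant a pass that reorders sources would rely on.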
10:26 karolherbst: pendingchaos: "fixed sequences" meaning if the usage of the stuff is always the same
10:27 karolherbst: pendingchaos: like if you use hi/lo only for a certain optimization and nothing will ever touch it again
10:27 karolherbst: things are more complicated if you want passes to optimize operations with access to hi/lo values
10:27 karolherbst: but if it is always the same, then it doesn't really matter
10:28 karolherbst: in the end we could just add those optimizations for maxwell as being fixed replacements of the original ops (e.g. imul and imad)
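As an illustration of what a "fixed replacement" of a 32-bit imul by 16-bit multiply-adds relies on, here is a minimal arithmetic sketch. It only demonstrates the identity a*b mod 2^32 = a_lo*b_lo + ((a_lo*b_hi + a_hi*b_lo) << 16); the helper `mad16` is hypothetical and this is not claimed to be the actual XMAD instruction sequence the compiler emits.

```python
MASK32 = 0xFFFFFFFF


def mad16(a16, b16, c):
    """Hypothetical helper: 16x16-bit multiply plus a 32-bit addend, wrapped to 32 bits."""
    return (a16 * b16 + c) & MASK32


def imul32_via_16bit(a, b):
    """Low 32 bits of a*b, built only from 16-bit partial products."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    # Cross terms. Only their low 16 bits survive the shift below.
    cross = mad16(a_lo, b_hi, 0)
    cross = mad16(a_hi, b_lo, cross)
    # a_hi * b_hi would be shifted by 32 bits, so it never reaches the result.
    return mad16(a_lo, b_lo, (cross << 16) & MASK32)
```

Because the sequence is always the same three multiply-adds, it fits the "always the same, nothing will ever touch it again" case described above.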
10:29 tomtomgps_: Hi! I have a rom from a NVIDIA GPU and I'm trying to understand what It does. I've opened the rom in a hex editor, what architecture reads this code ?
10:30 karolherbst: tomtomgps_: your CPU
10:30 tomtomgps_: is the hex code x86 ?
10:30 karolherbst: you mean the normal video bios?
10:30 karolherbst: it's no microcode
10:30 karolherbst: it is just data
10:31 karolherbst: it does contain some 16 bit bios/uefi blob for video initialization or something like that, though
10:31 karolherbst: tomtomgps_: we have a tool called nvbios which is able to read parts of the video bios
10:36 tomtomgps_: What is a rom ? does it contain the video bios ?
10:36 karolherbst: well, it depends on what you actually fetched
10:36 RSpliet: tomtomgps_: the majority of the VBIOS consists of data structures that describe board-specific parameters (display output ports, clocks and DRAM parameters, external devices connected...) and some initialisation routines. I believe there's a little bit of 16-bit Intel code in there enough to parse and execute some of the initialisation routines... which is why in the past you had to buy separate graphics cards for PowerPC Mac ;-)
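For anyone poking at such a dump, a small sketch of how the images inside a PCI expansion ROM can be listed may help. It assumes the generic PCI layout (a 0x55 0xAA signature, a pointer at offset 0x18 to a "PCIR" structure, and a code type byte that distinguishes x86, Open Firmware and EFI images). The function name is made up, and vendor ROMs can deviate from this plain layout, so treat it as a starting point rather than a parser for NVIDIA ROMs.

```python
import struct

# Code type values from the generic PCI data structure.
CODE_TYPES = {0x00: "x86 PC-AT (legacy BIOS)", 0x01: "Open Firmware", 0x03: "EFI"}


def list_rom_images(rom):
    """Return (offset, length_in_bytes, code_type_name) for each image found."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        if rom[offset:offset + 2] != b"\x55\xaa":
            break                                   # no ROM signature here
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break                                   # no PCI data structure
        (blocks,) = struct.unpack_from("<H", rom, pcir + 0x10)  # 512-byte units
        code_type = rom[pcir + 0x14]
        indicator = rom[pcir + 0x15]
        images.append((offset, blocks * 512, CODE_TYPES.get(code_type, hex(code_type))))
        if (indicator & 0x80) or blocks == 0:
            break                                   # last image, or bogus length
        offset += blocks * 512
    return images
```

Calling it on the bytes of a dump, e.g. `list_rom_images(open("card.rom", "rb").read())` with a hypothetical file name, shows whether the ROM carries only an x86 image or additional ones.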
10:40 tomtomgps_: RSpliet: The reason I'm asking is because Intel Macs need NVIDIA cards with modded roms to display at boot. A company sells modified cards and I'm curious to know how they can achieve this.
10:40 RSpliet: I'm not sure anyone here knows the ins and outs of that...
10:41 karolherbst: tomtomgps_: by doing dirty hacks
10:41 karolherbst: well
10:41 karolherbst: you can be a bit more clever and try to only replace certain things
10:42 karolherbst: but normally those GPUs are just flashed with modified roms where the mac is happy with the result
10:42 tomtomgps_: Someone posted one of these modified roms and I'm trying to compare it with a normal rom and see where the changes have been made, and what they do
10:43 karolherbst: you won't get much by comparing those
10:43 RSpliet: karolherbst: I bet it's no dirtier than the hack of sticking x86 code into a graphics card VBIOS for the sake of initialisation :-P It's likely just small changes to meet different conventions
10:43 karolherbst: RSpliet: depends
10:44 karolherbst: you might just have some weird rom files which work for certain gpus, but you never really checked
10:44 RSpliet: but karolherbst is right: there are too many other variables that change. Unless you have a 100% guarantee that the two roms are used on the exact *same* graphics card (not just the chip, the whole *card*), you'll get too many differences to make sense of the results
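If someone still wants to try the comparison, a byte-level diff is easy to sketch. The function below is illustrative only (its name and the file names in the usage note are made up), and it compares offset by offset, so any inserted or shifted data will show up as one long run of differences, which is exactly the problem described above.

```python
def diff_ranges(a, b):
    """Return [(start, end)] half-open byte ranges where a and b differ.

    Only the common prefix length is compared; a size difference has to be
    checked separately by the caller.
    """
    ranges = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i              # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, n))
    return ranges
```

Usage would look like `diff_ranges(open("stock.rom", "rb").read(), open("modded.rom", "rb").read())`. A handful of short ranges suggests patched fields; thousands of ranges usually mean the two ROMs are for different boards or that data was shifted.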
10:45 karolherbst: anyway, I know that such cards usually work
10:45 karolherbst: but the lifetime of such cards is questionable
10:45 karolherbst: they are still totally overpriced afaik
10:46 karolherbst: tomtomgps_: on linux it only matters until the real GPU driver gets loaded, right?
10:50 tomtomgps_: karolherbst: not sure I can answer that question as I have little knowledge
10:51 karolherbst: well, the painful part here is that the x86 bios initialization code doesn't work on OpenROM, so you kind of need the "x86" GPU with the OpenROM stuff
10:51 karolherbst: it is kind of a painful process and requires deeper knowledge of the vbios structure
10:52 karolherbst: as you might need more space
10:52 karolherbst: or something
10:52 karolherbst: also, the PPC cards had bigger roms afaik
10:52 karolherbst: and you have to make it fit the x86 ones
10:53 tomtomgps_: from what I've been told it sometimes necessary to change the rom chips with bigger ones on modified Nvidia gpus for intel Macs
10:53 tomtomgps_: it is*
10:53 karolherbst: for intel macs?
10:53 karolherbst: why though?
10:54 karolherbst: on intel macs GPUs should just work
10:54 karolherbst: afaik
10:54 karolherbst: maybe requiring the nvidia driver package, but otherwise it should just work
10:58 tomtomgps_: The GPUs work in the sense that they provide 3D acceleration after installing the Nvidia web drivers, but when booting a Mac the boot screen will not appear on a regular PC Nvidia card. The boot screen provides functionality for changing boot options, reinstalling from the recovery partition....
10:59 karolherbst: mhh
10:59 karolherbst: weird
10:59 tomtomgps_: Also under windows without a modified rom the card will run at x8 speeds
10:59 karolherbst: I highly doubt that
11:00 karolherbst: I wouldn't trust anything in this area before trying it out myself
11:00 tomtomgps_: karolherbst: makes sense
11:00 karolherbst: the vbios isn't _that_ relevant
11:00 karolherbst: sure the driver parses that
11:01 karolherbst: and the vbios might state that x8 is the most
11:01 karolherbst: but then it should behave the same on either OS
11:01 karolherbst: otherwise the driver of one is buggy
11:01 karolherbst: and I doubt it is the windows one
11:02 karolherbst: I know that the vbios of GPUs from macs are a bit weird in some ways
11:02 karolherbst: and that things are optimized towards consuming less power
11:02 karolherbst: by trading off peak performance
11:03 karolherbst: maybe capping to x8 saves 0.2W or something with no perf gain in practise
11:03 karolherbst: who knows
11:06 tomtomgps_: karolherbst: no, from what I understand since this occurs on a Mac Pro tower power consumption is not an issue. AMD PC cards also require a resistor mod to run at full speed in the Mac Pro under Mac OS. Nvidia on the other hand is able to run at x16 just by installing the drivers on Mac OS. Under windows though with modification both GPUs run at x8.
11:07 tomtomgps_: without modification*
11:07 karolherbst: "resistor mod" sounds like messing with the power consumption/thermal meter
11:08 karolherbst: well if it runs at x16 under mac os, the driver is buggy there
11:08 karolherbst: or modded
11:08 karolherbst: I wouldn't be surprised if they add non conformant changes to the vbios
11:10 tomtomgps_: I'm trying to understand, when a PC boots the PC reads the VBIOS from the GPU ?
11:10 karolherbst: kind of
11:11 karolherbst: the bios doesn't know how to initialize GPUs
11:11 karolherbst: because there are too many and nobody bothered to have sane interfaces
11:11 karolherbst: so it needs to read some of the vbios to properly initialize it
11:11 karolherbst: and usually there is just 16 bit x86 code on it
11:11 karolherbst: and I think an emulator for uefi systems to execute that code
11:12 karolherbst: and that code might also read some parts of the vbios for knowing how to talk with certain displays or whatever
11:12 karolherbst: it is more or less complete afaik
11:18 tomtomgps_: I'm not sure how a driver can choose to use x8 or x16. Isn't x8 or x16 speed initialized during the boot sequence ?
11:20 karolherbst: tomtomgps_: it is just the pcie lanes
11:20 karolherbst: it practically doesn't matter anyway
11:20 karolherbst: for laptops it does a bit
11:21 karolherbst: and you can kind of switch the amount of lanes on the fly
11:21 karolherbst: on mac os x it seems to switch depending on the load
11:21 karolherbst: but that's simply because of the vbios
11:21 karolherbst: you can kind of state this per performance level
11:23 tomtomgps_: Yes under Mac OS it seems to change when the card is under load, that is for cards which have a Mac EFI, normal PC Cards running on a Mac will stick to x8.
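On Linux, the negotiated lane count can be observed from userspace, which makes it easy to check whether it changes under load. A minimal sketch, assuming the standard `current_link_width` and `max_link_width` sysfs attributes of a PCI device; the function name and the device address in the usage note are only examples.

```python
from pathlib import Path


def read_link_width(device_dir):
    """Return (current, maximum) PCIe lane counts from a PCI device's sysfs directory."""
    d = Path(device_dir)
    current = int((d / "current_link_width").read_text().strip())
    maximum = int((d / "max_link_width").read_text().strip())
    return current, maximum
```

For example, `read_link_width("/sys/bus/pci/devices/0000:01:00.0")` (substitute the GPU's actual address) could be polled while idle and while rendering to see whether the width differs.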
15:58 imirkin: karolherbst: the vbios is 16-bit real mode x86 code.
16:22 nyef: ... So, if you had a 286 strapped to a PCI bus, could you drop an nvidia card in there and expect it to run? (-:
16:23 nyef: Well, at least CRAWL?
16:34 karolherbst: imirkin: okay sure, but I guess most of it is just data, no?
16:40 nyef: karolherbst: Might depend on how many "init scripts" there are. Are those "just" data, are they code, or does it depend on which way you're looking at them?
17:11 karolherbst: pendingchaos: HdkR will love your perf improvements :p
17:11 karolherbst: pendingchaos: are your tests with a reclocked GPU?
17:11 pendingchaos: no
17:11 pendingchaos: I don't think there is any with Pascal?
17:12 karolherbst: ahh, pascal
17:12 karolherbst: yes, there isn't
17:12 karolherbst: well, integer math isn't _that_ common
17:12 karolherbst: pendingchaos: I assume affected shaders are mainly dolphin and those feral ported games like tomb raider?
17:12 karolherbst: those do a bit of integer math inside compute shaders
17:12 pendingchaos: I don't have any feral ported games, so I haven't tested
17:13 karolherbst: right, but I meant you can kind of guess what applications are affected by looking at the shader-db output
17:13 pendingchaos: it seems to improve dolphin a good bit
17:13 karolherbst: uhm....
17:14 HdkR: I love everyone's perf improvements
17:14 karolherbst: pendingchaos: https://github.com/karolherbst/shader-db/commit/88aeaa2d6e7d3480c3b87055a487f2e7b8a91036
17:14 karolherbst: I guess I don't need that split-to-files.py thing anymore
17:15 karolherbst: pendingchaos: https://github.com/karolherbst/shader-db/commit/482d93424927e82e70be7ae9366e14ee9300f301
17:15 orbea: karolherbst: pendingchaos would these dolphin perf improvements work with kepler? Can I test? :P
17:15 karolherbst: orbea: no
17:15 karolherbst: maxwell/pascal only
17:15 orbea: oh, okay
17:16 karolherbst: they have a weirdo 16 but not really 16 bit IMAD instruction
17:16 karolherbst: which is faster then the full 32 bit imad
17:16 karolherbst: *than
17:16 pendingchaos: It does have some Fermi+ changes for constant folding
17:16 HdkR: One might say it is significantly faster
17:17 pendingchaos: but that's only for some multiplications by immediates
17:17 karolherbst: pendingchaos: sure, but that won't give a 50% speed boost :)
17:17 pendingchaos: probably not
17:17 orbea: i'd be happy with like 5% :P
17:17 pendingchaos: *probably
17:18 pendingchaos: I'm doing a shader-db run right now btw
17:18 orbea: dolphin-emu + reclocking is mostly full speed here, but not by much
17:19 pendingchaos: and I just realized that I should have updated the shader-db numbers on the patches
17:19 pendingchaos: it's not a huge deal though
17:20 karolherbst: pendingchaos: well, you can apply my patch and nv-report will print which shaders are affected
17:21 karolherbst: and with some cut/sort -u magic you can even do nice stats
17:21 karolherbst: we probably want to improve the script properly
17:24 pendingchaos: for >20 changes in instruction counts, I'm getting dolphin, alien isolation, dirt rally, everspace, f1 2015, hitman pro and tomb raider
17:25 Lyude: In terms of i2c stuff on nouveau, what does pad stand for?
17:26 karolherbst: pendingchaos: yeah, dolphin + feral ported games :D
17:26 karolherbst: everspace isn't though
17:26 karolherbst: alien isolation? I don't think so as well
17:29 Lyude: btw karolherbst: are some of the hwmon sensors on i2c?
17:29 karolherbst: Lyude: yes, the power ones
17:29 Lyude: alright
17:29 karolherbst: and uhm
17:29 karolherbst: we also have some GPUs with the volt pwm being on the i2c
17:30 karolherbst: but that shouldn't be of any concern for runpm
17:30 karolherbst: but uhm
17:30 karolherbst: or well
17:30 karolherbst: maybe we start reading the voltage out from there then?
17:30 pendingchaos: karolherbst: seems it is: https://www.feralinteractive.com/en/games/alienisolation/
17:30 karolherbst: yeah..
17:30 karolherbst: I am not quite sure these days
17:30 karolherbst: I know that everspace isn't
17:31 karolherbst: but those feral shaders look quite alike
17:31 karolherbst: like they use integer math a lot :)
17:31 HdkR: Sounds like you need more OpenCL kernels :D
18:23 karolherbst: ohhh, reuse can be used across movs
18:26 azaki: karolherbst: sorry about the file compression; my upload speed isn't too good, so yeah. i actually also have 32GB ram (ddr3), and it was using about 6 or so while it was compressing with --threads=8
18:26 karolherbst: ahh
18:26 karolherbst: yeah, seems like xz uses more ram when decompressing
18:27 azaki: at first i was trying to actually compress it even more, by using -e (--extreme) and -9, but that was taking way too long even with 8 threads, so i gave up on that.
18:27 azaki: and just used default compression settings.
18:27 azaki: which is -6 i think
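For reference, the same presets are exposed by Python's standard `lzma` module, where the default preset is indeed 6 and the extreme variant is a flag OR-ed into the preset. The sketch below is only an illustration: the helper names are made up, and the streaming decompressor simply avoids holding the whole output in memory at once; it makes no claim about where the memory use seen with the trace actually came from.

```python
import lzma
import shutil


def compress_bytes(data, preset=lzma.PRESET_DEFAULT):
    """Compress to .xz format; pass 9 | lzma.PRESET_EXTREME for the slow 'xz -9e' setting."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)


def decompress_stream(src_path, dst_path, chunk_size=1 << 20):
    """Decompress an .xz file to dst_path in fixed-size chunks."""
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Higher presets use larger dictionaries, which raises memory use on both the compressing and the decompressing side.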
19:51 nyef: Do we have an archive of tesla context-switch microprograms (and possibly initial context values) by chipset somewhere, or is that something that I'll need to build myself if I want such a thing?
20:27 Lyude: skeggsb_: if I made changes to nouveau that involved adding a new nvif callback, would i need to submit it to the out of tree kernel module's repo on github?
22:12 nyef: ... I'm seeing ctxprog register hits (in the kernel source, even) that don't match anything I see in rnndb?
22:13 nyef: Either I'm not looking in the right place, or rnndb is somehow "behind", even if it's "we don't know what this is, but it appears to be necessary on such-and-such chipsets".
22:17 azaki: karolherbst: by the way, i should mention, the apitrace was already like over 20GB or so before i even got in-game, since it took awhile to load. so the part where i'm actually in-game is probably towards the last few GB of the trace. i only stayed in-game for maybe 20 seconds or so.