00:35karolherbst: imirkin: quick review on the neg/abs -> add patches?
03:04imirkin: karolherbst: + return def(0).getFile() != FILE_PREDICATE && src(0).getFile() != FILE_PREDICATE;
03:04imirkin: i'd say == FILE_GPR for both
03:04imirkin: since there's also FILE_FLAGS
03:04imirkin: which you presumably also don't want to hit
03:05karolherbst: imirkin: I want to allow const mem though
03:05imirkin: hm. well, FILE_FLAGS you probably also want to skip.
03:06imirkin: so white-list FILE_GPR and FILE_CONSTMEM or whatever it's called
03:06karolherbst: yeah, probably the best idea
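The allow-list idea above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the `DataFile` enum and both helper names are hypothetical stand-ins for the codegen's register-file enum, which has more entries and different spellings.

```cpp
#include <cassert>

// Hypothetical stand-in for the register-file enum (assumption, not Mesa's).
enum class DataFile { Gpr, Predicate, Flags, ConstMem, Immediate };

// Allow-list check: only files that were explicitly considered pass.
inline bool isAllowedFile(DataFile f)
{
   return f == DataFile::Gpr || f == DataFile::ConstMem;
}

// Deny-list check, in the spirit of the first version of the patch:
// anything that is not a predicate passes, which silently includes Flags.
inline bool isNotPredicate(DataFile f)
{
   return f != DataFile::Predicate;
}
```

With the deny-list, every file added to the enum later is accepted by default. With the allow-list, it is rejected until someone opts it in.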
03:06imirkin: is there no ABS modifier with IADD?
03:07karolherbst: there is not
03:07karolherbst: think about it ;)
03:07karolherbst: it's super complicated to actually implement it for free
03:07karolherbst: if not at all impossible
03:07imirkin: not _that_ hard
03:07karolherbst: it is
03:07imirkin: given that neg is an option
03:08imirkin: oh, neg is also not an option?
03:08karolherbst: neg is, but neg is easy
03:08imirkin: how is neg harder than abs?
03:08karolherbst: I meant, neg is there
03:09karolherbst: but no abs
03:09karolherbst: abs is really annoying because it's not a single simple operation
03:09imirkin: but as far as the hw is concerned, if you can do neg, it's not much harder to do abs
03:10imirkin: i get that they didn't implement it
03:10imirkin: but it's not a difficulty-of-hw thing
03:10karolherbst: well, modifiers have to be kind of for free
03:10karolherbst: otherwise they are a bit pointless
03:10imirkin: abs is no harder than neg
03:10karolherbst: it actually is for integers
03:11karolherbst: for neg you don't have to check the sign
03:11karolherbst: you just do the neg
03:11karolherbst: this you won't need for floats
03:11karolherbst: abs and neg are essentially simple bit operations on floats
03:12imirkin: so your point is that for float, abs is just & 0x7fffffff
03:12karolherbst: neg more or less for ints as well
03:12HdkR: I'd expect it to be a pipeline size optimization. takes more effort to invert a single bit versus doing an integer negate :P
03:12imirkin: while for int, it's conditional on the high bit? that doesn't seem too bad.
03:12karolherbst: imirkin: it's bad enough so that you can't implement it for free on a modifier ;)
03:12imirkin: or they chose not to do it based on other parameters
03:12karolherbst: might be
03:12karolherbst: but if you look at lowering code, the ints abs is the most complex one
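The float-versus-int difference discussed above can be made concrete with a small sketch. It assumes 32-bit IEEE 754 floats and two's complement integers. The helper names are made up for illustration and say nothing about how the hardware implements its modifiers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline uint32_t floatBits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
inline float bitsToFloat(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Float abs and neg only touch the sign bit.
inline float floatAbs(float x) { return bitsToFloat(floatBits(x) & 0x7fffffffu); }
inline float floatNeg(float x) { return bitsToFloat(floatBits(x) ^ 0x80000000u); }

// Integer neg is unconditional: invert all bits and add one.
inline uint32_t intNeg(uint32_t x) { return ~x + 1u; }

// Integer abs has to look at the sign bit first and negate only if it is set.
inline uint32_t intAbs(uint32_t x) { return (x & 0x80000000u) ? intNeg(x) : x; }
```

In this sketch integer abs is the only operation with a data-dependent choice, which matches the point that it is more than a single simple operation.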
03:12HdkR: Potentially, how often are you negating integers compared to floats anyway?
03:13imirkin: like "who the fuck takes abs of an integer"
03:13karolherbst: imirkin: well, you still have i2i.abs ;)
03:13imirkin: yeah, gotta throw it in _somewhere_
03:13imirkin: but doesn't mean you have to do it everywhere - still uses gates/etc
03:13karolherbst: well, you could force the compiler to lower it
03:14karolherbst: I guess it would just be a big waste of hardware for questionable benefit
03:14karolherbst: maybe it's different with volta/turing now... I never checked
03:18karolherbst: HdkR: actually, do people complain a lot that with Turing older games (using float only shaders) are slower?
03:18HdkR: Why would they be slower?
03:18karolherbst: because they are
03:18HdkR: Are you talking about dual-issue being removed?
03:19karolherbst: that was the case for maxwell already
03:19karolherbst: I saw some benchmarks around showing that
03:20imirkin: HdkR: less fps. as opposed to moar fps.
03:20karolherbst: HdkR: or could be that it was perf/money
03:20karolherbst: something like that
03:20HdkR: perf/money has definitely gotten worse
03:20karolherbst: well, not for engines doing a lot of integer operations
03:20HdkR: As far as I'm aware, instruction latency has only improved
03:21karolherbst: sure, because the ALUs are simpler
03:22HdkR: So. Instruction latency is improved, throughput is the same. You're saying the performance has decreased?
03:23karolherbst: never tested it myself
03:23karolherbst: I could though :/
03:24karolherbst: ohh actually, I can't
03:24karolherbst: I have no high perf pascal card
03:25karolherbst: imirkin: any other comments on the patches?
03:26HdkR: karolherbst: I think you're reading in to the internet too much. People are upset by the higher price which causes perf/$ to drop, and complaining that the RTCore and Tensor cores add nothing to compute so they should have just slapped more SMs and given people more raw compute
03:29imirkin: karolherbst: + mod = cvt->src(0).mod ? NV50_IR_MOD_NEG_ABS : NV50_IR_MOD_NEG;
03:29imirkin: that seems ... dangerous
03:29imirkin: why not just check if mod == ABS
03:29imirkin: and if not, bail
03:30imirkin: also does Modifier() have a default constructor? if not, you have to initialize mod.
03:30imirkin: counting cvt instructions seems weird. reasonable exercise here, but not something to generally keep track of...
03:31karolherbst: imirkin: what's wrong with neg abs?
03:32imirkin: it just seems dangerous to assume that the modifier is "abs"
03:32imirkin: you just test it for not being neg
03:32karolherbst: ohh, I see
03:32karolherbst: yeah, I can be more explicit here
03:32karolherbst: but it's not like anything else would be legal
03:32imirkin: yea i know
03:32imirkin: but ... being explicit makes the code safer and easier to read
03:33karolherbst: "cvt->src(0).mod.abs() ? " :)
03:33karolherbst: ohh mhh, abs() only checks if the bit is set :/
03:34karolherbst: mmhhh I really don't like the Modifier class, it has so many pitfalls
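A sketch of the "check for ABS explicitly, otherwise bail" suggestion follows. The bit values and the `foldNegInto` helper are assumptions made for illustration. They are not the `Modifier` class or the `NV50_IR_MOD_*` definitions from the tree.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical modifier bits (assumption, for illustration only).
constexpr uint8_t MOD_NONE = 0;
constexpr uint8_t MOD_ABS = 1 << 0;
constexpr uint8_t MOD_NEG = 1 << 1;
constexpr uint8_t MOD_NEG_ABS = MOD_NEG | MOD_ABS;

// Fold an outer neg into the source modifier of the instruction it wraps.
// Only the two cases we understand are handled; everything else bails out.
inline std::optional<uint8_t> foldNegInto(uint8_t srcMod)
{
   if (srcMod == MOD_NONE)
      return MOD_NEG;
   if (srcMod == MOD_ABS)
      return MOD_NEG_ABS;
   return std::nullopt; // unexpected modifier: leave the instruction alone
}
```

The caller skips the optimization when the result is empty, so a modifier nobody thought about cannot be silently treated as "abs".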
03:36karolherbst: imirkin: yeah... dunno about counting cvts, I basically wrote the patch because I actually wanted to check how many the patch eliminates
03:36karolherbst: so I did the work anyhow
03:36imirkin: but i think perhaps it's best to leave that out
03:36karolherbst: yeah maybe
03:38imirkin: it wasn't tons and tons of work, so hopefully you won't feel too bad about it :)
03:41karolherbst: not really
03:46karolherbst: imirkin: mhh, we could potentially allow this optimization for f64
03:47imirkin: why would that be disallowed?
03:47karolherbst: well you left the comment on the first version that I should disable it for non 32 bit operations ;)
03:48karolherbst: ohh, actually I checked == 8 before
03:59karolherbst: I kind of want to clean up the if case in case OP_NEG: a bit :/
03:59karolherbst: but I can't come up with a nice way
04:02karolherbst: if (!isFloatType(cvt->sType) && cvt->src(0).mod) return;
04:02karolherbst: if (isFloatType(cvt->sType) && (cvt->src(0).mod && cvt->src(0).mod != Modifier(NV50_IR_MOD_ABS))) return;
04:02imirkin: there's a .abs() no?
04:03karolherbst: yeah, but abs() returns true for neg_abs as well
04:03imirkin: right..... yeah
04:03imirkin: yeah, that seems fine
04:03imirkin: just add a little comment
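The pitfall behind the two `if` lines above is that a bit test and an equality test answer different questions. The sketch below uses a deliberately tiny, hypothetical `Mod` struct, not the real `Modifier` class, to show why the guard compares the whole value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical modifier bits and wrapper (assumption, for illustration only).
constexpr uint8_t MOD_ABS = 1 << 0;
constexpr uint8_t MOD_NEG = 1 << 1;
constexpr uint8_t MOD_NEG_ABS = MOD_NEG | MOD_ABS;

struct Mod {
   uint8_t bits;
   // Bit test: also true for NEG|ABS.
   bool abs() const { return (bits & MOD_ABS) != 0; }
   // Whole-value comparison: true only for exactly the same modifier.
   bool operator==(Mod o) const { return bits == o.bits; }
};

// Guard in the spirit of the two checks above: integer sources must carry no
// modifier, float sources may carry none or exactly ABS.
inline bool canFold(bool isFloat, Mod m)
{
   if (m.bits == 0)
      return true;
   // Compare against ABS as a whole value; abs() would also accept NEG|ABS.
   return isFloat && m == Mod{MOD_ABS};
}
```

Here `Mod{MOD_NEG_ABS}.abs()` is true while `Mod{MOD_NEG_ABS} == Mod{MOD_ABS}` is false, which is the case the short comment in the patch should call out.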
04:03karolherbst: I think I will redesign the entire Modifier class to be more sane overall, and much easier to use :p
04:04karolherbst: that "ambiguous overload for 'operator!='" compiler error is annoying as well
04:04karolherbst: if you skip the Modifier() thing
04:06imirkin: yeah =/
04:07imirkin: ideally NV50_IR_MOD_ABS would somehow be of the right type
04:07imirkin: i don't have a great answer
04:07karolherbst: that getOp() thing is totally dangerous as well :(
04:10imirkin: just ... try not to break nv50 ;)
04:10karolherbst: I won't :p
04:11karolherbst: essentially, this class should have a method "applyToMOV()" where you can merge the modifier into the mov and everything gets done for you automatically
04:11karolherbst: oh well... will probably not have time for it this week, and my bed awaits
04:13imirkin: i pushed out the screen->text tracking logic changes btw
06:01television: [drm:drm_calc_timestamping_constants [drm]] *ERROR* crtc 41: Can't calculate constants, dotclock = 0!
06:01television: any idea why this occurs?
06:02television: it seems to happen at random, happened twice this month so far
06:02television: never happened at all before 2019
06:02imirkin: i have literally never seen that
06:03imirkin: not just myself, i've never seen anyone reporting that for any drivers
06:03imirkin: television: is it just an annoying error message, or is there any sort of additional failure?
06:04television: more than an annoying error message
06:04television: so it goes like this: boot up, laptop works fine for days worth of uptime, suddenly I get that dotclock = 0 thing
06:04television: then every time after that point
06:04television: when playing videos of any sort
06:04television: whether from mpv, VLC, youtube, anything
06:05television: even an advertisement
06:05television: the screen flashes to black
06:05television: then flashes back again
06:05imirkin: full-screen, or even windowed?
06:05television: even windowed.
06:05television: (i rarely fullscreen... if at all even)
06:06imirkin: can you pastebin your xorg log? that will tell me a lot about your system
06:06television: seeking through video causes it to flicker to black at the point where i click and when i un-click
06:07television: and i get dell wmi messages when the screen flashes... but that normally happens when changing video inputs and etc
06:07television: dell_wmi: Unknown key with type 0x0011 and code 0xffd1 pressed
06:07television: dell_wmi: Unknown key with type 0x0011 and code 0xffd0 pressed
06:07television: sure, one sec
06:08imirkin: well, changing video inputs would certainly cause issues :)
06:08imirkin: also, you say never before 2019... did you update your kernel?
06:08television: i update the kernel quite often
06:09imirkin: do you know the last version it worked ok
06:09television: also to be clear, I haven't changed video inputs
06:09imirkin: and the first version it didn't?
06:09imirkin: (yeah, i get that - i just mean if for some reason the system is changing video inputs behind your back due to some crazy reason ...)
06:09television: I'm just saying I get the same message when the screen flashes due to an input change
06:09television: i gotta check arch's history of kernel updates
06:10television: it won't be exact either, i could give it a random stab of when "im sure it worked then"
06:10imirkin: well, the next step after that would be
06:10imirkin: identify a working version, identify a broken version, do a kernel bisect
06:10imirkin: although, you can't really repro at will ... hm
06:11television: I'd guess sometime around 4.19.5 was working fine
06:11television: im on 4.20.0 now
06:11television: and oddly the Xorg log is last from Jan 8th?
06:12imirkin: it's hard to identify when an xorg log is from
06:12imirkin: there are lots of dates
06:12imirkin: and most of them aren't the ones you want
06:12television: im looking at the last modified time
06:12television: of the file
06:13imirkin: [ 7.602] (==) Log file: "/var/log/Xorg.0.log", Time: Tue Jan 8 01:32:44 2019
06:13imirkin: you did manage to pick out the right time, well done ;)
06:13television: yeah thats strange because... my uptime says "4 days"
06:13imirkin: that was a failed start though
06:13television: so... wtf?
06:13imirkin: check in ~/.local ?
06:13imirkin: if you're using systemd, it does crazy shit
06:13imirkin: i haven't quite mastered it
06:14imirkin: (lack of interest on my part)
06:14television: same tbh.
06:14imirkin: but i don't use it. do you? :)
06:14television: i do.
06:15television: ahhh thats more like it.
06:16television: also I *think* this bug may occur during sleep/wake
06:16imirkin: wow, interesting
06:16imirkin: GT216 + eDP
06:16television: haven't had it happen to me enough times to make a good correlation as to what's causing it
06:16imirkin: without actually looking at kernel code, it does sound like the edid has bad/missing data for some reason
06:17imirkin: with DP, the edid comes in via an auxiliary channel
06:17imirkin: but there's also weird provisions for having the EDID stored as part of the vbios
06:17imirkin: also you're using the modesetting ddx... i'd recommend xf86-video-nouveau
06:17television: with this laptop, i could upgrade to a S-IPS eDP 1920x1080 panel if i wanted to
06:17imirkin: not sure if that plays into it or not
06:17television: instead of this TN
06:18television: ok I'll try it
06:18television: I installed it, is there anything else I need to do besides reboot?
06:19imirkin: just restart X
06:19imirkin: no need to reboot
06:23television: bleh another annoyance: I press WinKey+L to lock the laptop, and the screen goes black as expected/normal
06:23television: when i move the cursor/tap the keyboard, the screen flashes
06:23television: 5 times in a row
06:23television: before finally calming down
06:24television: seems to cause two more of those dotclock errors in dmesg one right after the other
06:24imirkin: i wonder if there's something loose in your keyboard
06:24imirkin: or something funky with the wmi driver
06:24imirkin: i hate to blame hardware on things like that ...
06:24imirkin: but in this case, it sounds like it might be related
06:25imirkin: check to see if an old kernel that you're pretty sure was fine still works ok
06:26television: wmi driver maybe- because dell pushed an update that f***ed up my keyboard backlight and I keep getting-
06:26television: woah hey i see more nouveau errors earlier in my dmesg
06:27television: this might be useful
06:27imirkin: can you try reverting that update?
06:28television: heh that was waaaaaaaay back in mid-2018 when that started
06:28imirkin: ah =/
06:28television: and i didn't get this flashy screen dotclock stuff back then
06:29imirkin: right, so probably not it.
06:31television: it actually goes on a lot longer than what i pasted
06:31television: but i couldn't easily pastebin like 10 pages from journalctl exactly
06:32television: it seems really repetitive though
06:32television: it ends with this
06:33imirkin: that's surprising
06:40television: what does it mean?
06:40television: (noob here, sorry)
06:41imirkin: not sure what it means
06:42imirkin: it means there was a data desync somewhere, which actually happens on nv50
06:42imirkin: or ... hm. dunno.
06:42imirkin: LOCAL_LIMIT_WRITE trap means a shader did something silly
06:43imirkin: however the 1000f010 is actually most likely a data value to a query-related thing
06:43imirkin: which ... shouldn't mix
06:43imirkin: this happens on nv50 sometimes, i have some ideas as to why, but haven't gotten around to implementing anything useful to fix it
06:43imirkin: there's some belief that it's more likely to happen with xf86-video-nouveau due to how it implements vsync
06:44television: hm well i'm open to apply crazy patches to my laptop if you need a GT216 to test with
06:44imirkin: i have a G84 plugged in too
06:45television: im happy nouveau is improving... I had an awful bug where it'd kernel panic on a GT216 with doing literally nothing.
06:45imirkin: i just need ... time.
06:45television: without fail, every 15m to an hour, it would kpanic
06:45imirkin: and motivation. both of which are lacking.
06:45television: without touching the keyboard or doing anything at all
06:45television: i know that feel
06:46television: i booted gparted's livecd to repartition something on my laptop, unaware that it used nouveau, and it kernelpanic'd right in the middle of the operation
06:46imirkin: i'm glad things improved. panics (or lack thereof) are the work of skeggsb, glad things improved for you
06:46television: no idea what was causing that, but a few months back, the panics were gone
06:47television: also i got good at sensing when it was about to panic, the cursor would hiccough in the middle of moving, then I'd SysRq REISUB right away before it actually panic'd
06:48television: if i waited a few secs more it would panic
06:48television: was considering using firescope to debug it
06:48television: but lack energy to do so...
06:49imirkin: esp since it appears to be fixed now
06:49television: so i switched to proprietary, but then in April, some changes were made that made nvidia's stuff unusable... so i had to switch back to nouveau :D
06:49imirkin: i've had good success with netconsole for most things
06:50television: imirkin, skeggsb, thank you both very much
06:50television: without you all, i wouldn't be using my laptop in linux or i would have held my stuff back to some ancient version of xorg and nvidia-proprietary
06:51television: i appreciate this project a lot :D
06:51imirkin: cool :)
06:52television: idk what time it is by you but its 1:51am here, im gonna head to bed
06:54imirkin: we're in the same TZ
07:25maxthecat: Can anyone tell me where I can find the page table structure of NVF1 in source code?
07:28imirkin: have a look at https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c
07:29imirkin: and https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/subdev/mmu/vmmgk104.c
07:32maxthecat: Got it, thank you!
07:43imirkin: maxthecat: you may also find https://envytools.readthedocs.io/en/latest/hw/memory/g80-vm.html useful to read through
07:43imirkin: the gf100+ vm is slightly different, but still relatively similar.
08:09maxthecat: actually I have already read the envytools g80-vm, I just don't know how to walk the page table of NVF1.
08:11imirkin: right. so it's a little different -- there's a quicker way to switch between VM's
08:11imirkin: so you have to get the PDE entries from a slightly different place
08:11imirkin: but iirc the format is identical
08:12imirkin: skeggsb would know more off the top of his head though
08:13imirkin: also look in vmm.c in that same dir
08:20maxthecat: emmm, thank you. I'll dive into the implementation of that.
15:20karolherbst: imirkin: I will run your text patch through piglit today, just to make sure.
15:25pmoreau: karolherbst: Did you get a chance to run the clover series on Radeon?
15:25karolherbst: not yet
15:26pmoreau: Okay, I’ll give it another shot on mine, see if I can get something on the screen at least (or via SSH).
15:27karolherbst: yeah.. I am not feeling too well today, so I stayed at home
15:32RSpliet: Better be fit for FOSDEM you
15:38mupuf: karolherbst: get better soon!
15:53pmoreau: karolherbst: Arf, hope you’ll get better soon! :-/
15:53karolherbst: uff... piglit crashed my kernel
15:55karolherbst: skeggsb, imirkin: https://gist.github.com/karolherbst/27d37bf3e202e91cb80056d99dd6af16
16:20pmoreau: karolherbst: I am not going to get that far: `get_compiler_options()` is not set for r600 (nor for any driver). We need 1) some additional checks to avoid getting a NULL pointer dereference, 2) get at least one user for this series.
16:22karolherbst: pmoreau: get_compiler_options is only important if you support nir
16:22pmoreau: Wait, I’m on the wrong branch
16:23pmoreau: Nevertheless, it shouldn’t be crashing.
16:24karolherbst: it shouldn't
16:24karolherbst: where does it crash though?
16:25karolherbst: pmoreau: right, but wrong branch
16:25karolherbst: most patches aren't exactly well written there
16:25karolherbst: the clover ones
16:25karolherbst: get_compiler_options is only set if you actually support NIR
16:25pmoreau: Yes, that’s what I said earlier “17:22:24 pmoreau │ Wait, I’m on the wrong branch”
16:26karolherbst: so there is no point getting calling spirv_to_nir for r600 in the first place ;)
16:26karolherbst: *to call
16:26pmoreau: Right, but it is going to be called for every single device, regardless of what they support (with the current code).
16:27karolherbst: pmoreau: yeah.. but that's not part of your patches, right?
16:27pmoreau: No, that’s part of the second series.
16:27karolherbst: ohh, okay
16:27karolherbst: which patch added that?
16:28pmoreau: Hum, not part of my second series, but probably the one that you and Rob will be sending along to get clients. https://github.com/karolherbst/mesa/commit/647547a4d97200fedd16bd3f33464cbcd8e53732
16:29karolherbst: yeah, that's what I thought
16:29karolherbst: it even says "WIP" :p
16:30karolherbst: I wanted to take care of most of the clover patches after we've got the infra done in nir + your clover patches are merged
16:30pmoreau: Makes sense.
16:31pmoreau: Going to ping again on the list once I’m done testing the patch. But I doubt it will make it into 19.0.
16:51pmoreau: It “works”, i.e. an easy OpenCL test does run properly, but I am getting some weird messages like “unsupported call to function _Z13get_global_idj”. I wonder if it’s something wrong 1) in the way I use SPIRV-LLVM-Translator, 2) in SPIRV-LLVM-Translator itself, or 3) in the LLVM backend of clover.
16:52karolherbst: pmoreau: libcl installed?
16:52karolherbst: but yeah...
16:52karolherbst: that could get weird
16:52pmoreau: I do have it.
16:52karolherbst: yeah.. dunno
16:52karolherbst: pmoreau: you might have to tell clover to link against libcl
16:52pmoreau: Does it need to match an LLVM version?
16:52karolherbst: could be?
16:53karolherbst: it does have *.bc files
17:15pmoreau: karolherbst: Comparing the LLVM IR I get between compiling from source and using SPIR-V -> LLVM IR, I get `%0 = tail call i32 @llvm.r600.read.tgid.x() #2` (from source) and `%3 = call spir_func i64 @_Z13get_global_idj(i32 0) #1` (from SPIR-V).
17:15karolherbst: yeah, looks like some linking issues
17:16pmoreau: Both are before linking
17:17karolherbst: pmoreau: check the llvm/invocation.cpp compile() function
17:17pmoreau: So I might need to generate directly for r600, or add an extra conversion layer from SPIR-V to R600.
17:17karolherbst: there it links against libcl
17:18karolherbst: and the other libcl bits
17:19pmoreau: Hum, yeah, that might help. The header bits I can just ignore though.
17:20karolherbst: I think the header bits are the more important parts
17:21pmoreau: It’s a C header, and the input is SPIR-V, how is it going to help? I don’t plan on doing SPIR-V -> C. :-D
17:22karolherbst: ohhhh, right
17:22karolherbst: that sounds like a big issue actually
17:22karolherbst: pmoreau: they could overload/hide the clang functions to resolve symbols differently in the headers :/
17:22karolherbst: no idea how that's done correctly
17:23pmoreau: Yeah, me neither :-/
17:23HdkR: pmoreau: Doesn't spirv-cross support cross-translation to functional C++?
17:24HdkR: `Convert SPIR-V to debuggable C++ [EXPERIMENTAL]`
17:24HdkR: spirv-cross is mental
17:24HdkR: I need to figure out how that works
17:25pmoreau: HdkR: So you are saying one should do OpenCL C -> SPIR-V -> C++ -> LLVM IR -> R600 (assuming that even works)?
17:25HdkR: If you could figure out how to stick a translation through nir in there somewhere it would be perfect
17:26karolherbst: we could replace C++ with nir
17:26karolherbst: and that's what we could do actually :p
17:34pmoreau: I think I’ll ask on the ML. There was some discussion some time ago about having the clc functionalities as SPIR-V binaries.
17:46karolherbst: pmoreau: but the discussion was still about specialized ones :/
17:46karolherbst: so one spv file for each supported chipset or something
17:46karolherbst: which I don't really get the point of
17:47pmoreau: Just had a look at the SPIRV-LLVM-Translator, and it seems to be hardcoded to generate spir(64)?-unknown-unknown target. So nothing I can do at that level.
17:48karolherbst: well, we can't change that
17:48karolherbst: we can change it for the CLC -> SPIR-V path though inside clover
17:51pmoreau: Sure, but I don’t care as much for that one, since 1) that’s in the next pull request, not this one, and 2) there will be Nouveau and? freedreno as consumers, so I don’t feel bad about not supporting Radeon or whoever else.
17:52HdkR: pmoreau: I presume it is hardcoded to that so when it goes through the clang path it does all the specialization work necessary
17:59karolherbst: imirkin: mhhh... I just ran piglit over my negabs patches and... there are fails
17:59karolherbst: "neg ftz f32 $r1 c0[0x0]" -> "add ftz f32 $r1 $r255 neg c0[0x0]" I don't see anything wrong with it
18:00imirkin_: seeeems reasonable
18:00imirkin_: unless we're emitting the neg modifier wrong
18:00imirkin_: but then we'd be in serious trouble
18:00HdkR: https://hastebin.com/edojacucaj.cpp spirv-cross generates some interesting C++ code
18:01karolherbst: imirkin_: FADD.FTZ R1, RZ, -c[0x0][0x0] ; in nvdisasm
18:01karolherbst: I already tried disabling the sched stuff as well
18:01imirkin_: that's pretty convincing :)
18:02imirkin_: and nvdisasm on the native neg one?
18:02imirkin_: also, do you know which value it fails for? is it like a funny value, like NaN or infinity?
18:03karolherbst: F2F.FTZ.F32.F32 R1, -c[0x0][0x0] ;
18:03karolherbst: imirkin_: piglit/generated_tests/spec/arb_shader_bit_encoding/execution/built-in-functions/fs-floatBitsToInt-neg.shader_test
18:04imirkin_: karolherbst: which one fails?
18:04imirkin_: you have to bisect it
18:04imirkin_: comment half out
18:04karolherbst: ohh, this way you meant
18:04karolherbst: the first one
18:05imirkin_: makes sense.
18:05HdkR: It's probably my fault yea
18:05imirkin_: actually, hm
18:05imirkin_: karolherbst: you sure it's the first one?
18:05imirkin_: and not the third one?
18:06imirkin_: 0 - 0 -> 0. whereas it wants -0
18:06imirkin_: karolherbst: also, what's the fail color?
18:06imirkin_: oh wait, it always just sets red.
18:06karolherbst: this is... annoying
18:06imirkin_: try to figure out what's broken there. it's probably more subtle.
18:07imirkin_: anyways, generically, -x != 0-x, it seems :(
18:07imirkin_: at least, not bit-for-bit
18:07karolherbst: if I set expected to -0 it fails as well though
18:07imirkin_: ok, so you commented out all the tests except the first one?
18:07imirkin_: and then go into the shader and comment out all of the if's except the first one
18:08karolherbst: so we have a 0.0 - -0.0
18:08karolherbst: imirkin_: all checks fail
18:08imirkin_: pastebin the current state of your failing shader?
18:09imirkin_: (the shader_test file)
18:09karolherbst: do you mean the actual tests or the if clauses within the shader :/
18:10imirkin_: i mean the contents of the shader_test file you are presently testing with
18:11karolherbst: ohh well, I found my mistake. the -0.0 test actually passes
18:11imirkin_: but the 0.0 test fails?
18:11karolherbst: -1.0 fails
18:12imirkin_: pastebin the test :p
18:12karolherbst: heh, it sometimes passes
18:13imirkin_: are there shader faults?
18:13karolherbst: imirkin_: https://gist.github.com/karolherbst/d1fb9ebac03120e68b72a951d3380abc
18:13imirkin_: comment out the -0.0 test
18:13karolherbst: now it passes
18:13imirkin_: otherwise a no-op draw will result in a pass
18:13imirkin_: since it doesn't clear
18:14karolherbst: it's the -0.0 one randomly failing
18:14karolherbst: sometimes it passes with -1.0
18:15karolherbst: sometimes with 0.0
18:15karolherbst: as the expected result
18:15karolherbst: mixed sometimes passes as well of course
18:15karolherbst: wow, that's evil
18:16imirkin_: can you figure out what happens if you comment out some if's in the shader itself?
18:16imirkin_: e.g. only check the .x one?
18:16karolherbst: like I thought, mhh, let's be smart and move the source to the second slot, because you can support c with it :(
18:17karolherbst: imirkin_: the x and xy checks seem to return 0.0 in all cases so far, testing xyz it starts to sometimes fail
18:18imirkin_: what if you ONLY have the .xyz check?
18:20karolherbst: heh.. wait, let me start from a cleaned shader_test file for the sake of sanity
18:21karolherbst: I think that 0.0 in the expected uniform might confuse shader_runner as well
18:25karolherbst: imirkin_: okay... -0.0 and -1.0 are actually fine. that randomness was something else (I guess 0.0 for ivecs is undefined in piglit)
18:26karolherbst: 0.0 is the first test failing
18:26karolherbst: so 0.0 -0.0 is apparently 0.0, not -0.0
18:30pmoreau: HdkR: Indeed, that’s some interesting C++
18:30HdkR: pmoreau: Just tested it on a fairly large local group size. It actually threads out really well
18:32HdkR: Completely saturates my 2990WX with a workgroup size set to 128 :D
18:33HdkR: Obviously that means I need to upgrade to Rome to really push it
18:47karolherbst: imirkin_: mhh, so I guess I can't do it for c then and have to leave the source where it is
18:47karolherbst: although, for ints that would be fine
18:48imirkin_: karolherbst: dunno
18:48imirkin_: i think it's an inexact opt
18:48karolherbst: for floats?
18:48karolherbst: I think (fneg a) -> (fadd neg a 0) should be fine
18:49imirkin_: i think that'd still end up as 0.0
18:49imirkin_: i.e. -0.0 + 0 == 0
18:49imirkin_: so neg(0) == 0
18:50karolherbst: imirkin_: nvidia is smart
18:50imirkin_: instead of -0
18:50karolherbst: "FADD R2, -RZ, -c[0x0][0x144] ;"
18:50karolherbst: how though?
18:50karolherbst: I thought two negs aren't legal?
18:50imirkin_: for IADD
18:53karolherbst: imirkin_: they always seem to use -RZ for neg
18:53karolherbst: "FADD R2, -R2, -RZ ;"
18:53imirkin_: presumably only in these circumstances
18:54karolherbst: in which other ones they wouldn't have to?
18:55imirkin_: e.g. if you wrote code like 0 - x
18:55karolherbst: then you end up with an (fadd 0 neg x) anyway
18:55karolherbst: because apparently that's not a (fneg x)
18:55imirkin_: i know
18:55imirkin_: but in that case it'd be RZ
18:56karolherbst: but yeah, for a 0 - x they end up with "FADD R0, RZ, -c[0x0][0x144] ;"
18:56karolherbst: but so would we
18:56imirkin_: i think for the CVT -> ADD thing, you can always set it
18:57imirkin_: er, NEG -> ADD
18:57karolherbst: at least the test passes now
18:57karolherbst: fun issue
18:57imirkin_: that's a subtle failure.
18:57imirkin_: and a subtle fix!
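The failure and the fix come down to IEEE 754 signed-zero arithmetic, which can be reproduced on the CPU. The sketch below assumes IEEE 754 floats, the default round-to-nearest mode, and a compiler that is not run with fast-math style flags. The function names are illustrative. `volatile` is only there to keep the compiler from folding the additions away.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline uint32_t bitsOf(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }

// neg lowered as (+0.0) + (-x), like "FADD RZ, -x": loses the sign for x = +0.0.
inline float negViaPlusZero(float x)
{
   volatile float zero = 0.0f;
   return zero + (-x);
}

// neg lowered as (-0.0) + (-x), like "FADD -RZ, -x": bit-exact for both zeros.
inline float negViaMinusZero(float x)
{
   volatile float zero = -0.0f;
   return zero + (-x);
}
```

For `x = +0.0` the first form yields `+0.0` (bits `0x00000000`), while a true negate yields `-0.0` (bits `0x80000000`). The second form gives `-0.0` for `+0.0`, `+0.0` for `-0.0`, and leaves every nonzero value unchanged apart from the sign flip.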
19:00karolherbst: ohhh, for (iadd neg a neg b) I still have that -> (iadd3 neg a neg b 0) opt :)
19:01karolherbst: because it totally makes sense to support two negs with iadd3, but not with iadd :)
19:01karolherbst: or it was one at 1/2 and one at 3?
19:01karolherbst: so -> iadd(neg a 0 neg b)?
19:01karolherbst: something like that
19:03imirkin_: but you don't need that here
19:03imirkin_: since -0 does nothing for ints
19:04karolherbst: imirkin_: no, I meant for supporting iadd with two neg modifiers
19:04imirkin_: that's a separate thing
19:04karolherbst: I just remembered I had patches somewhere
19:04imirkin_: you can do it on maxwell+
19:04karolherbst: but yeah, that neg -> add mitigates the penalty for not having it already a bit :)
19:27karolherbst: much better, no regressions now :)
19:30karolherbst: imirkin_: sat -> fadd(0, neg a)
19:30karolherbst: uhm sat(neg a) ->
19:30imirkin_: presumably -0, neg a
19:30imirkin_: not that it super-matters in this case
19:30karolherbst: although I am not quite sure if we even support mods on a sat
19:31imirkin_: we don't
19:31imirkin_: or ... shouldn't
19:31karolherbst: yeah, we don't
19:31karolherbst: still wrote the code to support mods while doing the sat -> add conversions
19:32imirkin_: should probably enable that, dunno
19:32imirkin_: i'm sure it has like zero impact
19:38karolherbst: I think I found an nvidia compiler bug.. maybe
19:38karolherbst: fabs(0.0) returns -0.0
19:41karolherbst: "FADD R2, -RZ, |c[0x0][0x144]|" for fabs(a) doesn't sound right
19:42karolherbst: not that it matters much though
19:47karolherbst: heh "FADD.SAT R0, -RZ, -|c[0x0][0x144]|"
19:48HdkR: karolherbst: What glsl version?
19:48karolherbst: HdkR: CL 1.2
19:49karolherbst: it seems like the compiler always puts a -RZ for CVT -> FADD
19:49karolherbst: maybe it doesn't matter for abs?
19:49karolherbst: but returning -0.0 for abs is kind of weird
19:50HdkR: Little strange yea
19:51karolherbst: I didn't verify it at runtime
19:51karolherbst: the binary just looks like it would
19:52imirkin_: karolherbst: why not right?
19:52imirkin_: you mean the -RZ?
19:52imirkin_: probably doesn't end up playing..
19:53karolherbst: like what if you do fabs(0.0)/0.0
19:53imirkin_: i mean, -0 + 0 = +0
19:53imirkin_: so it doesn't matter
19:54karolherbst: -0.0 / 0.0 is what? was it +inf or -inf?
19:54imirkin_: -inf i think? or nan
19:54imirkin_: probably nan.
19:55karolherbst: yeah.. hopefully
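Both guesses above can be checked on the CPU under IEEE 754 rules. The sketch assumes IEEE 754 floats with default rounding and non-trapping floating-point exceptions. The helper names are illustrative. `volatile` keeps the compiler from constant-folding the operations.

```cpp
#include <cassert>
#include <cmath>

// abs lowered as (-0.0) + |x|, like "FADD -RZ, |x|".
inline float absViaMinusZero(float x)
{
   volatile float zero = -0.0f;
   return zero + std::fabs(x);
}

// Plain float division, kept away from constant folding.
inline float divide(float a, float b)
{
   volatile float n = a, d = b;
   return n / d;
}
```

`(-0.0) + (+0.0)` rounds to `+0.0`, so the `-RZ` operand does not leak a negative zero out of `fabs`. A zero divided by a zero is NaN regardless of either sign, so neither `+inf` nor `-inf` comes out of that case.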
20:48mslusarz: I wonder whether firmware signature verification is vulnerable to https://yifan.lu/images/2019/01/Injecting_Software_Vulnerabilities_with_Voltage_Glitching.pdf ;)
21:10karolherbst: mslusarz: it most likely is
21:10karolherbst: question is rather, are you able to adjust the voltage ;)
21:10karolherbst: on pascal afaik that's locked down
21:10karolherbst: but then we speak about falcon voltage here
22:03karolherbst: imirkin_: is there something special about predicates in nv50?
22:03karolherbst: on nvc0 we seem to use mov for reg <-> predicate stuff
22:04karolherbst: ohh wait, my patch doesn't apply to nv50 anyway
22:04karolherbst: but still
22:06HdkR: karolherbst: Surely six predicates is enough for anyone and you never need to spill ;)
22:08imirkin_: karolherbst: nv50 doesn't have predicates
22:08imirkin_: it has flags registers
22:08imirkin_: which are multi-bit regs
22:08imirkin_: which can be used to predicate things
22:08imirkin_: based on e.g. sign or equal or whatever flags
22:08karolherbst: ahhh, I see
22:08imirkin_: kinda like EFLAGS on x86
22:08imirkin_: but not as many bits :)
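A rough sketch of the difference described above: a flags register holds several condition bits produced by one operation, and a later instruction is predicated on a condition derived from them. The bit layout, the condition names, and the helpers `set_flags` and `should_execute` are assumptions made for illustration, not the nv50 encoding.

```python
# Hypothetical bit layout, for illustration only.
ZERO = 1 << 0
SIGN = 1 << 1

def set_flags(result: int, bits: int = 32) -> int:
    """Compute a multi-bit flags value from an integer result."""
    value = result & ((1 << bits) - 1)
    flags = 0
    if value == 0:
        flags |= ZERO
    if value >> (bits - 1):
        flags |= SIGN
    return flags

def should_execute(flags: int, cond: str) -> bool:
    """Decide whether an instruction predicated on 'cond' would run."""
    if cond == "eq":
        return bool(flags & ZERO)
    if cond == "ne":
        return not flags & ZERO
    if cond == "lt":
        return bool(flags & SIGN)
    raise ValueError(f"unknown condition {cond!r}")

flags = set_flags(3 - 5)             # the subtraction sets the sign bit
print(should_execute(flags, "lt"))   # the same flags value answers several conditions
print(should_execute(flags, "eq"))
```

A single-bit predicate, by contrast, stores only the answer to one comparison chosen when it was written.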
22:09imirkin_: HdkR: 7
22:09HdkR: Are you considering PT to be a true predicate?
22:10imirkin_: otherwise it'd be 8
22:10HdkR: Oh derp, I misremembered how bits work
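The 7-versus-8 count above falls out of the encoding, assuming a 3-bit predicate field in which one encoding is reserved for the always-true PT. The constant and list names below are hypothetical.

```python
PRED_FIELD_BITS = 3                      # assumed width of the predicate field

encodings = 1 << PRED_FIELD_BITS         # 8 possible encodings
writable = encodings - 1                 # one encoding is taken by PT
names = [f"P{i}" for i in range(writable)] + ["PT"]

print(encodings, writable, names)
```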
22:10karolherbst: volta will be fun
22:10HdkR: I blame meetings
22:10imirkin_: whatever the problem, that's a good blame target
22:11karolherbst: volta will be tons of fun for codegen
22:11HdkR: imirkin_: Don't worry, on Turing you get DOUBLE the predicates
22:11HdkR: woop woop, party
22:11karolherbst: like instructions suddenly start to return two carry bits :)
22:11imirkin_: *thats* why its so much more expensive
22:11HdkR: Yea, it explains the price hike all by itself
22:12karolherbst: or turings uniform registers/predicates
22:12karolherbst: also fun
22:12HdkR: Yea, that's what I alluded to. I guess it isn't /technically/ double predicates when they are just UPs
22:12karolherbst: which essentially every op can read from
22:12karolherbst: HdkR: there are less UPs actually
22:13karolherbst: I think
22:13karolherbst: I am sure there are only 64 UGPRs
22:13HdkR: Love that phase when still trying to figure things out
22:13karolherbst: 63 if you consider the always 0 one
22:14karolherbst: also no builtin library possible afaik
22:14HdkR: builtin library?
22:14imirkin_: we have a library of builtins
22:14karolherbst: no function calls
22:14imirkin_: to do things like intdiv, etc
22:14karolherbst: sure, but volta has no call anymore
22:14imirkin_: karolherbst: btw, did we ever merge the fp64 library things?
22:14HdkR: Oh, that's what you're concerned about
22:15karolherbst: imirkin_: and for volta that's super useless anyway
22:15karolherbst: or we save the regs ourselves
22:15karolherbst: anyway, fun
22:15karolherbst: uhm wait
22:15karolherbst: we have to save those anyway
22:15karolherbst: but we have no RET
22:15karolherbst: _that_s the problem
22:15karolherbst: so we have to save the return address
22:15imirkin_: can you load the IP?
22:16karolherbst: also they changed to this other bfe/bfi format
22:16imirkin_: to the BFM thing?
22:16karolherbst: yeah, I think so, no idea how either one is called
22:16karolherbst: we have both in nir
22:17HdkR: Divergent data analysis is something I'm super curious about doing
22:17karolherbst: the one we use today, the other one used in volta+
22:17imirkin_: bfm = generate a mask
22:17imirkin_: and then the next instruction consumes the mask
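A minimal sketch of the split described above: one step builds a bit-field mask, and a following step consumes it to insert or extract a field. The function names and the 32-bit width are assumptions; this models the arithmetic only, not any particular instruction encoding.

```python
MASK32 = 0xFFFFFFFF

def bfm(width: int, offset: int) -> int:
    """Generate a mask of 'width' set bits starting at bit 'offset'."""
    return (((1 << width) - 1) << offset) & MASK32

def insert_with_mask(base: int, field: int, mask: int, offset: int) -> int:
    """Consume a mask: replace the masked bits of 'base' with 'field'."""
    return ((base & ~mask) | ((field << offset) & mask)) & MASK32

def extract_with_mask(value: int, mask: int, offset: int) -> int:
    """Consume a mask: pull the masked field out of 'value'."""
    return (value & mask) >> offset

mask = bfm(8, 4)
print(hex(mask))
print(hex(insert_with_mask(0xFFFFFFFF, 0xAB, mask, 4)))
print(hex(extract_with_mask(0x12345678, mask, 4)))
```

Keeping the mask as a separate value means it can be computed once and reused by several consumers.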
22:17karolherbst: I think in the worst case one instruction can actually write to two predicates and read from three predicates at the same time :)
22:17karolherbst: or swapped
22:17karolherbst: something insane like that
22:19HdkR: That's a good one
22:19HdkR: I love an instruction that can touch five predicates
22:22imirkin_: like PSETP?
22:24HdkR: PSETP, PLOP3
22:24karolherbst: imirkin_: instructions just take two predicates for no reason
22:24karolherbst: and generate two
22:24karolherbst: and then you have predicated execution
22:25karolherbst: they just take two
22:25karolherbst: don't ask why
22:26karolherbst: even a FSETP has 1 input, 1 predicate, 2 outputs
22:26karolherbst: mhh, actually that's the case for the maxwell ISA as well
22:27karolherbst: mhhh, I do misremember quite a bit already
22:28imirkin_: PLOP - that's a good opcode name
22:28HdkR: I like plop
22:29imirkin_: what does it do? checks if a logic op returns 0 or not?
22:29imirkin_: for like if (a & b || c & d || ... )
22:31karolherbst: P: predicate LO: logic operation P: predicate, and maybe does return $p1 || $p2? .... let me check
22:31imirkin_: karolherbst: yeah, i'm sure it does
22:31imirkin_: or a configurable "addition" with another predicate
22:31imirkin_: yeah, i bet its pD = (pA op1 pB) op2 pC
22:32HdkR: What's that fifth predicate used for? :)
22:32imirkin_: that one's just kind of extra :)
22:33HdkR: lol, encoded but dead
22:33karolherbst: two return ones
22:34karolherbst: where the second is !first
22:34karolherbst: and now we predicated that PLOP3 :)
22:34karolherbst: $p0 plop3.or.or $p1 $p2 ($p3 || $p4) || $p5 :)
22:35imirkin_: gonna run out of predicates pretty soon!
22:35HdkR: karolherbst: What is the definition of first in that statement? :P
22:35imirkin_: p1 + p2 are both dests
22:35imirkin_: p1 = the thing. p2 = !p1
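The guess above, pD = (pA op1 pB) op2 pC with a second destination holding the inverse, can be written out as a small model. This is a sketch of the guess made in the channel, not a confirmed description of the instruction; the function name `pred_logic` and the op names are hypothetical, and the meaning of the second destination is unconfirmed.

```python
import operator

OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}

def pred_logic(op1: str, op2: str, pa: bool, pb: bool, pc: bool):
    """Return (first, second) where first = (pa op1 pb) op2 pc.

    The second value follows the 'p2 = !p1' guess from the discussion.
    """
    first = OPS[op2](OPS[op1](pa, pb), pc)
    return first, not first

print(pred_logic("or", "or", False, False, True))
print(pred_logic("and", "or", True, False, False))
```

Counting the three sources, the two destinations and the predicate guarding the instruction itself is one way to arrive at the larger predicate totals joked about above.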
22:36imirkin_: although that's the first time i've heard that.
22:36karolherbst: imirkin_: it's there since nvc0 :)
22:36imirkin_: i knew about its existence
22:36imirkin_: just never knew what the second pred dest was
22:36HdkR: I'll say that your definition is wrong then :D
22:36karolherbst: :p probably
22:37karolherbst: no idea where I've actually read that
22:37karolherbst: I am sure it's the right thing though
22:37HdkR: tsk tsk tsk, gotta do your homework before you can play with that second dest
22:38pmoreau: HdkR: Evil person! :-D
22:38imirkin_: homework sucks
22:38imirkin_: i didn't do it in school, i'm not going to do it now...
22:42HdkR: btw, anyone in here going to Fosdem?
22:48imirkin_: i think karol might be?
22:48karolherbst: HdkR: but you could actually point us to any officially available documentation and statements, right?
22:51HdkR: big oof
22:52HdkR: Cuda docs are rude enough to say it is available but that is it
22:53karolherbst: apparently that's some super magic never being used
22:53karolherbst: maybe it's just for fun there and has no apparent reason
22:54HdkR: It's almost like pseudocode for how instructions operate would be greatly appreciated
22:54karolherbst: sadly we only get that for ptx
22:55HdkR: You know, pass ptx in to cuda tools, disassemble it to get assembly, glare at it until it looks like it is fine
22:56HdkR: Although I really enjoy the CFG that the tools can output
22:57imirkin_: HdkR: we have that with hwtests :)
22:57imirkin_: only for nv50 though
22:57imirkin_: need to hook it up for nvc0+, which is more painful
22:57HdkR: reading branchy code without the CFG visible is hard stuff
22:57imirkin_: and apparently there's a limit to how much time mwk is willing to waste on this stuff
22:57karolherbst: I could probably just hijack a generated cuda thing and just check what it outputs
22:58karolherbst: should be done in 5 minutes, I am just too lazy
22:58imirkin_: HdkR: https://github.com/envytools/envytools/blob/master/nvhw/sfu_tab.c
23:00HdkR: Looks like a couple of tables
23:00imirkin_: it is.
23:00imirkin_: but determining those tables took a lot of effort, i think
23:00karolherbst: like the hardware is doing anything else :p
23:01mwk: imirkin_: these tables can be read straight from the SM register space, the algorithm was a tougher nut to crack :p
23:01imirkin_: mwk: really?
23:01imirkin_: i thought you RE'd the tables
23:01mwk: not these ones
23:01imirkin_: oh boo =/
23:02mwk: those, on the other hand...
23:02HdkR: Ah. Similar sort of thing had to be done to figure out IBM's Gekko float transcendental instructions
23:02pmoreau: HdkR: I’ll be going to FOSDEM. Are you attending? :-)
23:02HdkR: Haven't really had any plans to
23:02imirkin_: HdkR: the g80 fpu: https://github.com/envytools/envytools/blob/master/hwtest/g80_fp.cc
23:03HdkR: Trying to find a reason to spend the money on a plane ticket
23:03karolherbst: ohh, the second predicate is probably only used for nan and num and indicates whether just one of them is num/nan, but you don't know which one :p
23:03imirkin_: HdkR: support the economy!
23:03HdkR: Support the US airlines more like
23:03pmoreau: HdkR: Support the global warming while enjoying Belgian beers?
23:03imirkin_: HdkR: the more you buy, the more you save!
23:03pmoreau: Good one, Ilia! :-D
23:04HdkR: The plane tickets are so friggin expensive though
23:04karolherbst: HdkR: swim
23:05karolherbst: mhh, but that would probably take a little longer
23:05HdkR: Might have to start now
23:05pmoreau: Might have had to start last month
23:06RSpliet: Swim? Is that still an open route yeah? Someone should really build a wall around the coast lines...
23:06karolherbst: I think even walking and swimming that little part between alaska/russia would be faster
23:07karolherbst: RSpliet: with the UK situation right now, I doubt you should joke about other countries' miseries :p
23:07HdkR: Huh, $1,016 for a roundtrip flight. Quite a bit less than when I last checked
23:08HdkR: 27h return flight path. ooow
23:08RSpliet: karolherbst: not my country bruv
23:08karolherbst: might have to leave though if they just get out :p
23:08RSpliet: Yep, but I do have this "get out of jail free" card called a Dutch passport ;-) I'll be fine :-D
23:08karolherbst: (but I still hold on to my prediction, that this entire thing is a nightmare and people just wake up the day before)
23:09karolherbst: and it was all a big joke
23:09karolherbst: I even believe that hardcore british people have that kind of humour
23:10karolherbst: RSpliet: also, I know a few brits who've got the german citizenship now, so they are fine as well :p
23:11karolherbst: HdkR: anyhow, $1k is nothing :p
23:11karolherbst: just do it
23:12karolherbst: and you wouldn't have to pay for it anyway...
23:12HdkR: lol, you're funny
23:12karolherbst: just ask nicely and say it's like super important
23:13RSpliet: Booking three days in advance is rarely a good strategy for cheap tickets. Not to mention I don't think the US has an Easyjet equivalent... which airport you'd be flying from HdkR?
23:13HdkR: RSpliet: It would be SFO
23:15RSpliet: Ouch, so many stops...
23:15HdkR: yea, it's not great
23:15karolherbst: you have to be smart ;) here
23:16HdkR: karolherbst: It would be a personal trip for me anyway. I wouldn't be able to convince anyone at my job to pay for me going there :P
23:16karolherbst: then you have to spend more time in Brussels, otherwise the flight is too expensive for just a few days :p
23:17HdkR: eh, vacation time is just a waste of money for me regardless
23:17karolherbst: :D those americans
23:17imirkin_: RSpliet: Spirit Airlines is the Easyjet of the USA
23:18HdkR: LAX->BRU has some better flight times but that would require figuring out a way to LAX
23:18imirkin_: HdkR: just take a cab :)
23:18imirkin_: only like a 6h drive?
23:18RSpliet: HdkR: eh, Aer Lingus does a flight for $921. but that's a 30 hour flight each way.
23:18HdkR: yea, painful
23:20karolherbst: HdkR: ha, I have the best idea
23:20RSpliet: Actually, JustFly claims to have a phone-only fare for $645 that takes IDK how long with timezone differences
23:20karolherbst: mhh, but no better flights actually
23:21imirkin_: HdkR: fly norwegian to gatwick, and go from there...
23:21imirkin_: something like easyjet should be easy to attach onto that
23:22karolherbst: wow, that's quite cheap actually
23:22karolherbst: 850€ with reasonable flights
23:22karolherbst: but then london -> brussels can be annoying
23:23imirkin_: wow - Aer Lingus SFO -> LGW for $500. but i picked semi-random dates
23:24imirkin_: yeah ok. you're not going to do better than $1k.
23:24imirkin_: HdkR: https://www.google.com/flights#flt=SFO./m/02rnbv.2019-02-01*/m/02rnbv.SFO.2019-02-04;c:USD;e:1;sd:1;t:f
23:26RSpliet: If you fly on Thursday you scrape $65 off... and I think that'll imply landing on Friday in time for the Friday eve "Delirium café" meet
23:28karolherbst: yeah... being there on the 1st is a must