03:09 imirkin_: skeggsb: 2 people reporting refcount issues with the new vmm stuff
03:21 skeggsb: yeah, i'm aware
07:06 TheXzoron: I had a kernel panic after coming back to my computer and trying to resume from xscreensaver
07:07 TheXzoron: I don't have anything in xorg.conf other than defining the nouveau driver
07:08 TheXzoron: could it be related to setting it to the lowest power state?
07:10 TheXzoron: because some of the hacks on xscreensaver were running really slowly
07:11 TheXzoron: but that doesn't seem like causality
07:18 TheXzoron: what should I do to log what would be relevant if it happens again?
08:10 oday: So, I extracted my vbios from system bios using mmtool
08:11 oday: It has the legacy rom header at the start, and no type 3/efi header
08:11 oday: But there are crypto sigs at the end, like you'd find in an efi rom
08:12 oday: How do I check if I have an efi rom with its type3 header truncated, or just a legacy bios rom?
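A sketch for checking what a dump contains, assuming it follows the PCI Firmware Specification's option ROM layout. Each image starts with the 0x55 0xAA signature, and the 16-bit word at offset 0x18 points to a "PCIR" data structure. Within that structure, PCIR+0x10 holds the image length in 512-byte units, the byte at PCIR+0x14 is the code type (0x00 legacy x86, 0x03 EFI), and bit 7 of PCIR+0x15 marks the last image. `list_rom_images` is a hypothetical helper. A vbios pulled out of system firmware may not follow this layout exactly, so treat the output as a hint rather than proof.

```python
import struct

# Code type byte at PCIR+0x14, per the PCI Firmware Specification.
CODE_TYPES = {0x00: "x86 legacy", 0x01: "Open Firmware", 0x02: "PA-RISC", 0x03: "EFI"}


def list_rom_images(rom: bytes):
    """Walk the image chain of an option ROM dump.

    Returns a list of (offset, code type, size in bytes, is-last-image flag).
    """
    images = []
    offset = 0
    while offset + 0x1A <= len(rom) and rom[offset:offset + 2] == b"\x55\xaa":
        # Offset 0x18 of the image header points at the PCI Data Structure.
        pcir = offset + struct.unpack_from("<H", rom, offset + 0x18)[0]
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break
        size = struct.unpack_from("<H", rom, pcir + 0x10)[0] * 512  # 512-byte units
        code_type = rom[pcir + 0x14]
        last = bool(rom[pcir + 0x15] & 0x80)  # indicator bit 7: last image
        images.append((offset, CODE_TYPES.get(code_type, hex(code_type)), size, last))
        if last or size == 0:
            break
        offset += size
    return images
```

If the walk yields a single x86 image whose last-image bit is clear and no further 0x55AA header follows, a truncated chain is one possible explanation. If the bit is set, the dump most likely holds a legacy-only ROM.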
08:17 oday: As I said earlier, the proprietary drivers look in the system bios for optimus machines' vbios, which obviously does not work inside a vm
08:18 oday: So I'm trying to patch the _ROM call in the ssdt to read from a buffer hardcoded inside the ssdt instead of where it's normally supposed to
08:20 oday: But then I have to pass a bunch of other tables, most of which don't correspond to any hardware on my vm
08:21 oday: So instead I'm thinking about reversing / patching the relevant part of the windows nvidia driver
08:21 oday: Which approach would you recommend?
10:22 karolherbst: pmoreau: for those OpIs... float operands, we have something like "x must be a scalar or vector of floating-point type." as the operand
10:23 karolherbst: did you have to parse something like that already?
10:26 Booti386: Hello
10:27 Booti386: I just got a freeze of the whole screen
10:27 Booti386: https://hastebin.com/ucurarupof.go
10:28 Booti386: But I don't think these logs will help much
10:31 karolherbst: not really
10:32 Booti386: And another one, with Linux 4.14, that happens too often and made me go back to 4.13: https://hastebin.com/rutamurofi.go
10:45 karolherbst: what software are you running when this happens?
10:50 Booti386: Sometimes Firefox, sometimes a text editor, sometimes only the desktop (cinnamon).
10:52 karolherbst: okay
10:55 gnarface: compositing is on by default in cinnamon, isn't it?
10:59 Booti386: Yes
11:20 pmoreau: karolherbst: Yes: all the arithmetic operations like OpIAdd work like that, OpAnd & co as well
11:40 karolherbst: pmoreau: ahh, thanks
11:40 karolherbst: pmoreau: so that SpirVValue type contains all the information about what it actually is, right?
11:42 pmoreau: Yes, https://github.com/karolherbst/mesa/blob/nouveau_spirv_support/src/gallium/drivers/nouveau/codegen/nv_ir_from_spirv.cpp#L194-L199
11:42 karolherbst: nice
11:43 pmoreau: There might be a difference between the type and what is actually stored when bitcasting is happening.
11:44 karolherbst: mhh, now I am wondering
11:44 karolherbst: how do I express IsFinite and IsNormal in nvir?
11:44 karolherbst: IsInf and IsNaN should be easy
11:44 karolherbst: but the other two?
11:45 pmoreau: In some of the memory tests (I think the ones with structures), for example, the spv::Op::OpVariable pointing to a struct gets bitcast to a pointer to chars, which are then copied back to global/shared memory.
11:45 karolherbst: ahh, there is testp in ptx
11:46 karolherbst: maybe I check what nvidia is doing here
11:46 pmoreau: Which was a pain to deal with when the pointer was to data in actual registers rather than to some region in memory.
11:46 karolherbst: before I try and guess
11:46 karolherbst: mhh
11:46 karolherbst: I see
11:46 pmoreau: Should be easy to check what the blob is doing.
11:49 pmoreau: You could always implement IsFinite as !(IsInf() || IsNaN()) I guess
11:51 pmoreau: Ah yeah, testp has flags for all those functions, nice
11:55 karolherbst: mhh, how do I define a predicate in ptx...
11:55 karolherbst: ahh .pred
11:56 pmoreau: yup
11:56 karolherbst: mhh I can't store a pred...
11:56 karolherbst: LD
11:56 karolherbst: :D
11:57 pmoreau: What do you mean by “store”?
11:57 karolherbst: I guess I have to cvt
11:57 karolherbst: st.
11:57 pmoreau: Oh, yeah
11:57 karolherbst: st.global [dest], res;
11:57 pmoreau: I think there is another way than cvt
11:57 karolherbst: mhh
11:57 karolherbst: mov?
11:58 pmoreau: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#manipulating-predicates
11:58 karolherbst: I highly doubt that
11:58 pmoreau: selp
11:58 karolherbst: selp is select
11:58 karolherbst: based on predicate
11:58 karolherbst: well a cvt would do as well
11:58 pmoreau: “There is no direct conversion between predicates and integer values, and no direct way to load or store predicate register values.”
11:59 pmoreau: They recommend using selp
11:59 karolherbst: ahhh
11:59 karolherbst: cvta
11:59 karolherbst: ohh right
11:59 karolherbst: cvta can only do u32 and u64
11:59 karolherbst: mhh messy
11:59 karolherbst: and cvt can't do pred
11:59 karolherbst: ...
12:00 karolherbst: selp it is then
12:00 pmoreau: “The cvta instruction converts addresses between generic and const, global, local, or shared state spaces.” why would you want to use it?
12:01 karolherbst: ohh right, I thought it is a convert and store thing...
12:02 karolherbst: LOP32I.AND R0, c[0x0][0x148], 0x7fffffff; ISETP.LT.U32.AND P0, PT, R0, c[0x2][0x0], PT;
12:02 karolherbst: ...
12:03 karolherbst: for testp.finite
12:03 karolherbst: weird
12:04 karolherbst: c[0x0][0x148] is the float, but mhh
12:05 karolherbst: ohh wow
12:06 karolherbst: c[0x2][0x0] is 0x7f800000
12:06 karolherbst: they just "opted" it into c2 space
12:07 karolherbst: .nv.constant2.test: .byte 0x00, 0x00, 0x80, 0x7f
12:08 karolherbst: ahh PT is predicate true
12:08 pmoreau: Yes
12:09 karolherbst: so LOP32I.AND R0, float, 0x7fffffff; ISETP.LT.U32.AND P0, true, R0, 0x7f800000, true;
12:09 karolherbst: LOP32I.AND is a simple AND I assume
12:09 pmoreau: I would assume so
12:09 karolherbst: okay, so that makes at least sense
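Reading the dump back as plain integer arithmetic: the .nv.constant2 bytes are +inf's bit pattern in little-endian order, and testp.finite reduces to clearing the sign bit and doing an unsigned less-than against that pattern. The Python model below is a sketch of that reading (the function names are hypothetical). It works because, for non-negative IEEE floats, ordering the bit patterns as unsigned integers matches ordering the values, and every NaN pattern sits above +inf.

```python
import struct

# The 4 bytes placed in .nv.constant2, read as a little-endian u32.
INF_BITS = int.from_bytes(bytes([0x00, 0x00, 0x80, 0x7F]), "little")  # 0x7f800000
ABS_MASK = 0x7FFFFFFF  # clears the sign bit (the LOP32I.AND step)


def f32_bits(x: float) -> int:
    """Round x to f32 and return its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_finite_f32(x: float) -> bool:
    # The ISETP.LT.U32 step: unsigned compare of |x|'s bits against +inf's bits.
    return (f32_bits(x) & ABS_MASK) < INF_BITS


if __name__ == "__main__":
    for v in (1.0, -0.0, float("inf"), float("-inf"), float("nan")):
        print(v, is_finite_f32(v))
```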
12:09 karolherbst: although I take it we also have to deal with f64 here
12:10 pmoreau: And potentially fp16
12:11 karolherbst: can we get fp16 as input already?
12:11 karolherbst: oh wow
12:11 karolherbst: that f64 version is smart
12:11 pmoreau: Well, GM20B has fp16 support, and then all Pascal have it as well, to some degree.
12:12 karolherbst: ISUB RZ.CC, float.lo, RZ; LOP32I.AND R0, float.hi, 0x7fffffff; ISETP.LT.U32.X.AND P0, true, R0, 0x7ff00000, true;
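One way to read the f64 sequence, offered as a sketch: it is an unsigned 64-bit `|x| < +inf` compare split into 32-bit halves. Because the constant's low word is 0, the low half can never be "less than" it, so the high-word compare alone decides the result. The two hypothetical functions below check that the full 64-bit compare and the high-word-only compare agree.

```python
import struct
from typing import Tuple


def f64_words(x: float) -> Tuple[int, int]:
    """Split a double into its (hi, lo) 32-bit words, as in a register pair."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 32, bits & 0xFFFFFFFF


def is_finite_f64_full(x: float) -> bool:
    # Reference: one unsigned 64-bit compare of |x| against +inf (0x7ff0000000000000).
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits & 0x7FFFFFFFFFFFFFFF) < 0x7FF0000000000000


def is_finite_f64_split(x: float) -> bool:
    # Same compare on 32-bit halves. The constant's low word is 0, and an
    # unsigned low word is never less than 0, so only the high word matters.
    hi, _lo = f64_words(x)
    return (hi & 0x7FFFFFFF) < 0x7FF00000
```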
12:13 pmoreau: And SPIR-V has fp16 support
12:13 karolherbst: well right
12:15 pmoreau: And cl_khr_fp16 was an available extension already in OpenCL 1.0 it seems
12:16 karolherbst: mhh
12:16 karolherbst: well
12:16 karolherbst: if we don't support that extension it is fine for now
12:16 karolherbst: I am sure we have to deal with a lot of stuff in codegen before even thinking about that in opencl
12:16 pmoreau: I do agree with that :-)
12:17 pmoreau: fp64 is another extension, so we don’t have to support it right now either.
12:17 karolherbst: OpIsInf is basically the same as OpIsFinite, just the setp is different :)
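A sketch of why the three tests can share code (names hypothetical): all of them clear the sign bit and compare against +inf's bit pattern, and only the comparison changes: `<` for IsFinite, `==` for IsInf, `>` for IsNaN. It also shows that IsFinite is the negation of (IsInf or IsNaN).

```python
import struct

ABS_MASK = 0x7FFFFFFF
INF_BITS = 0x7F800000  # exponent all ones, mantissa zero


def abs_bits_f32(x: float) -> int:
    """Raw f32 bit pattern of x with the sign bit cleared."""
    return struct.unpack("<I", struct.pack("<f", x))[0] & ABS_MASK


def is_inf_f32(x: float) -> bool:
    return abs_bits_f32(x) == INF_BITS  # exponent all ones, mantissa zero


def is_nan_f32(x: float) -> bool:
    return abs_bits_f32(x) > INF_BITS  # exponent all ones, mantissa non-zero


def is_finite_f32(x: float) -> bool:
    return abs_bits_f32(x) < INF_BITS  # exponent not all ones
```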
12:17 karolherbst: ahhh
12:17 pmoreau: (Let me just double check that)
12:17 karolherbst: but well
12:17 karolherbst: you have it in the code at some places
12:17 karolherbst: and adding fp64 support is much easier than fp16
12:17 pmoreau: Indeed
12:18 pmoreau: Since it’s already there thanks to OpenGL extensions?
12:18 karolherbst: that as well
12:20 ylwghst: Hi, what does this error mean: GPU lockup - switching to software fbcon?
12:38 karolherbst: pmoreau: IsNormal seems to be a bit more complicated though
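A sketch of IsNormal with the same bit-pattern approach (hypothetical names): a value is normal when its exponent field is neither 0 nor all ones, which takes two unsigned compares on the sign-cleared bits. For f64 only the masks change, and since both bounds have a zero low word, the high word suffices. The single-compare variant, which shifts the range down with u32 wrap-around, is an alternative trick, not something taken from the blob.

```python
import struct

F32_MIN_NORMAL_BITS = 0x00800000  # smallest normal f32: exponent field == 1
F32_INF_BITS = 0x7F800000
F64_MIN_NORMAL_HI = 0x00100000    # high word of the smallest normal f64
F64_INF_HI = 0x7FF00000


def _abs_bits_f32(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0] & 0x7FFFFFFF


def is_normal_f32(x: float) -> bool:
    a = _abs_bits_f32(x)
    # Two unsigned compares ANDed: exponent != 0 and exponent != all ones.
    return F32_MIN_NORMAL_BITS <= a < F32_INF_BITS


def is_normal_f32_one_cmp(x: float) -> bool:
    a = _abs_bits_f32(x)
    # Shift the range to start at 0 (u32 wrap-around), then one unsigned compare.
    return ((a - F32_MIN_NORMAL_BITS) & 0xFFFFFFFF) < (F32_INF_BITS - F32_MIN_NORMAL_BITS)


def is_normal_f64(x: float) -> bool:
    hi = (struct.unpack("<Q", struct.pack("<d", x))[0] >> 32) & 0x7FFFFFFF
    # Both bounds have a zero low word, so comparing the high word is enough.
    return F64_MIN_NORMAL_HI <= hi < F64_INF_HI
```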
12:40 ylwghst: Why do I get "nouveau E[ drm] gpu lockup switching to software fbcon" when I try to run a GNOME Wayland session from LightDM?
12:41 karolherbst: ylwghst: most likely something bad happened
12:41 ylwghst: Yeah, I wonder what's wrong.
12:41 ylwghst: It does actually work with GDM. I wonder what's different.
12:43 karolherbst: pmoreau: I am currently wondering if we should do those 64bit optimisations in the lowering already, the reduction to 32bit, or should we just create 64bit ops there and let codegen optimize it?
12:44 karolherbst: ylwghst: no idea really, could be anything. and I also have no idea how to correctly debug this
12:45 karolherbst: pmoreau: and how should I check against TypeArray, just use isCompoundType?
12:46 karolherbst: or check that spv::Op type thing?
12:47 ylwghst: karolherbst: there's nothing very interesting in /var/log/lightdm :/
12:48 npnth: I have a silly question. The Bugs page says I should run the latest version of "all the pieces". The kernel link on the front page is to a github repo that hasn't been updated in over two years. Am I supposed to be running -mainline, or is there another repo I should use?
12:51 karolherbst: npnth: you can use mainline, drm-next is also an option
12:51 karolherbst: or https://github.com/skeggsb/linux/tree/linux-4.15
12:52 npnth: karolherbst: sounds good, thanks.
12:52 karolherbst: npnth: you most likely just checked the master branch, right?
12:52 npnth: karolherbst: indeed, I didn't look at the tags at all.
13:00 karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/65ec23edfd5132e10d3eb438b2e960946f2e89de
13:00 karolherbst: pmoreau: I think that "spvValues.emplace(resId, SpirVValue{ SpirvFile::IMMEDIATE, resType, values, resType->getPaddings() });" is wrong,
13:01 karolherbst: I guess I need to put something like SpirvFile::REGISTER?
13:06 pmoreau: karolherbst: Sorry, I had some students around.
13:06 pmoreau: So, to answer your questions
13:06 karolherbst: ohh wait, it is TEMPORARY?
13:06 pmoreau: TEMPORARY is for regs, yes
13:07 karolherbst: and can I simply remove that padding thing?
13:07 pmoreau: I no longer remember why I picked TEMPORARY for them, but, we can always change it
13:08 karolherbst: or what is that padding about anyway?
13:08 pmoreau: Try removing it, and then test *all* memory tests from SPVTES. I think I needed it for some of the packed structure cases.
13:08 karolherbst: mhh
13:08 karolherbst: I see
13:08 pmoreau: Cause it was a real pain to get right, and I think I am still missing some cases.
13:08 karolherbst: well okay
13:09 pmoreau: I would love to get rid of padding and clean up the whole thing, but haven’t had time for that.
13:09 karolherbst: at least now I just have to generate those instructions
13:10 pmoreau: Regarding the 64-bit operations: not sure which ones you are talking about so hard to say; the ones that are specific to SPIR-V should probably be optimised in the SPIR-V to NVIR pass, but the other ones are probably better left to codegen.
13:11 karolherbst: well
13:11 pmoreau: You don’t want to have to do the 64-bit MUL/MAD lowering in the SPIR-V frontend, the TGSI one, the PTX one, etc.
13:11 pmoreau: for example
13:11 karolherbst: nvidias does this for testp.finite.f64: ISUB RZ.CC, float.lo, RZ; LOP32I.AND R0, float.hi, 0x7fffffff; ISETP.LT.U32.X.AND P0, true, R0, 0x7ff00000, true;
13:12 karolherbst: and I think I would simply do a 64bit and and a 64bit setp
13:12 pmoreau: What’s currently done in Nouveau for GLSL? I would assume there is a finite function there as well, no?
13:12 karolherbst: uhm
13:12 karolherbst: I don't think TGSI knows isFinite
13:13 karolherbst: so we get the lowered stuff
13:13 karolherbst: let me check
13:13 karolherbst: mhh
13:13 karolherbst: maybe it already gets lowered away in glsl
13:13 pmoreau: When you said “how should I check against TypeArray, just use isCompoundType”, what do you want to know exactly and in which situation?
13:14 karolherbst: well
13:14 karolherbst: isInf and the other ones are only valid for floats and vectors
13:14 karolherbst: and I thought I would need different code for both cases
13:14 karolherbst: but it seems like I don't
13:15 pmoreau: Just do like OpIAdd & co: iterate over the number of elements provided by the type. The validator will take care of throwing the binary out if it is trying to do it with the wrong type
13:18 pmoreau: “I think that "spvValues.emplace(resId, SpirVValue{ SpirvFile::IMMEDIATE, resType, values, resType->getPaddings() });" is wrong” right, TEMPORARY is the one to use. IMMEDIATE really is just for the result of mkImm i.e. it isn’t even stored in a register.
13:19 pmoreau: I had to do that since global variables might have an initial value, and trying to allocate a register outside of any functions does not make much sense.
13:25 karolherbst: okay, will take a deeper look at how OpIAdd is doing stuff
13:30 pmoreau: (Let me have a quick look as well) :-D
13:31 pmoreau: Yup, this is how I would do it for IsFinite & co, by copy/pasting the code from OpIAdd and only changing the operation applied within the loop.
13:33 pmoreau: There is a bit of code that could be factorised between the different opcodes, like between OpIAdd and OpSRem. So, if you have any ideas, I’m all ears.
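A model of the factoring idea, using values rather than emitted nv50_ir instructions: the loop over the type's component count is written once, and each opcode contributes only its scalar operation. `COMPONENT_OPS` and `lower_componentwise` are hypothetical names, not part of the branch; in the real converter the table entries would build instructions instead of computing results.

```python
from typing import Callable, Dict, List, Sequence

MASK32 = 0xFFFFFFFF


def to_s32(v: int) -> int:
    """Interpret a u32 bit pattern as a signed 32-bit integer."""
    return v - (1 << 32) if v & 0x80000000 else v


def iadd32(a: int, b: int) -> int:
    return (a + b) & MASK32


def srem32(a: int, b: int) -> int:
    # OpSRem: a non-zero result takes the sign of the first operand.
    sa, sb = to_s32(a), to_s32(b)
    r = abs(sa) % abs(sb)
    return (-r if sa < 0 else r) & MASK32


# Hypothetical opcode table: only the per-component operation differs.
COMPONENT_OPS: Dict[str, Callable[..., int]] = {
    "OpIAdd": iadd32,
    "OpSRem": srem32,
}


def lower_componentwise(opcode: str, n_components: int,
                        srcs: Sequence[Sequence[int]]) -> List[int]:
    """Shared loop: apply one opcode's scalar op to each vector component."""
    op = COMPONENT_OPS[opcode]
    return [op(*(src[i] for src in srcs)) for i in range(n_components)]
```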
13:59 perfinion: hey all, i have an ancient 7700GT that i stuffed in my new machine and when starting X i get this error: nouveau 0000:41:00.0: gr: intr 00100000 [ERROR] nsource 00000010 [LIMIT_COLOR] nstatus 04000000 [PROTECTION_FAULT] ch 1 [000ea000 X[13987]] subc 5 class 008a mthd 0408 data 0001c800
14:00 perfinion: anyone have any idea what it means? the screen goes all garbled, sometimes like snow (not moving tho) and sometimes was black
14:02 perfinion: the console all works fine tho, it's a nice high resolution and all that, switches to nouveaufb during boot. it only goes weird when i start X
14:04 imirkin: huh. surprising.
14:06 imirkin: perfinion: class 8a is NV10_IFC (image from cpu)... let's see what method 408 is
14:06 perfinion: NV10? isn't the 7700GT an NV40?
14:07 imirkin: just the class name.
14:07 perfinion: ah
14:07 imirkin: a single chip may have classes from different gens
14:07 imirkin: right, and 0x400- is the "COLOR" data for the image.
14:08 imirkin: i think the error is that it's writing off the end of the surface, hitting the limit? not 100% sure.
14:08 perfinion: it's a dell ultrasharp connected over DVI
14:08 imirkin: i haven't done a ton of debugging on that pre-G80 hardware
14:08 imirkin: is it an extremely-wide display?
14:08 imirkin: e.g. more than 4K pixels?
14:08 perfinion: would lowering resolution or color depth make a diff?
14:08 perfinion: no it's a regular 1080p display
14:09 imirkin: yeah that works fine all the way back to NV5 (Riva TNT2)
14:09 imirkin: (how do i know? i tested it)
14:09 perfinion: hehe
14:09 imirkin: (no TNT's here... they were all AGP. o well.)
14:10 perfinion: it's really just a stopgap i dug out of an ancient computer so i can use my threadripper machine until GPU prices are reasonable :P
14:10 imirkin: well, if you're trying to run a modern DE on ancient hardware, you're going to be disappointed
14:11 perfinion: i mean i don't expect to game or anything
14:11 imirkin: at what point do you get the error?
14:11 imirkin: heh
14:11 imirkin: i mean like running a modern KDE or GNOME is way too much for these
14:11 imirkin: at least on top of nouveau
14:11 imirkin: a lot of those have become full-fledged feature-hungry GL applications
14:11 perfinion: i do /etc/init.d/xdm start and it blinks and switches to VT7 and looks like it's gonna work
14:11 perfinion: then it goes wonky
14:11 perfinion: one time it looked like i could kinda see a password box
14:12 perfinion: others it goes black for a while then snow with a few bands
14:12 perfinion: once it locked up the machine completely had to poweroff
14:12 perfinion: other times ssh keeps going just fine
14:13 perfinion: the LIMIT_COLOR part was the same all the times i saw it tho
14:13 imirkin: what happens if you just run like ... xinit
14:14 imirkin: which should start X and pop up an xterm
14:16 perfinion: no idea, lemme try
14:19 perfinion: oh, it's a 7900GT, i typoed earlier
14:19 perfinion: not that it probably makes much difference
14:19 imirkin: not at all
14:24 perfinion: imirkin: okay, xinit was identical. black screen with the cursor in top left for a while, then snow with some bands then the display went off into powersaving mode
14:24 perfinion: same protection fault exactly
14:24 perfinion: then i get a few of these: nouveau 0000:41:00.0: X[44882]: failed to idle channel 2 [X[44882]
14:24 imirkin: *interesting*
14:24 imirkin: i wonder if we broke something
14:24 imirkin: what kernel are you on?
14:25 perfinion: machine is still responsive over SSH
14:25 perfinion: imirkin: 4.14.4
14:25 imirkin: can you try a smattering of older kernels?
14:25 perfinion: imirkin: i would be on 4.15 rc's if ZFS worked on them
14:25 imirkin: i've just never heard of anyone having issues like this
14:25 perfinion: imirkin: perhaps? not too far back tho, it's a threadripper so newer ones are better
14:26 imirkin: well, this isn't to use for long-term, this is to test nouveau :)
14:26 imirkin: presumably any ol' kernel boots on them, no? x86 and all...
14:27 perfinion: well, power management and stuff perhaps? it'll probably work okay just not optimal
14:27 perfinion: imirkin: any in particular you want me to try
14:27 perfinion: 4.13.16 is latest in the gentoo repo for that branch
14:28 imirkin: whatever's easy for you, i'd say
14:28 perfinion: i was thinking about testing the nvidia driver but that's such a pain
14:28 imirkin: i haven't had a NV4x plugged in since ... a little while. maybe 4.9 or so?
14:28 perfinion: imirkin: well, more like how far back would you prefer
14:28 perfinion: i could try 4.9.67 i guess
14:28 imirkin: doesn't have to be super-far back. no more than a year, definitely.
14:29 perfinion: or 4.12.14
14:29 imirkin: oh
14:29 imirkin: wait, just had a thought
14:30 imirkin: can you pastebin xorg log?
14:30 imirkin: (from any of those attempts)
14:31 perfinion: imirkin: https://paste.pound-python.org/show/t36J3E9lIJ3FFg47ZAUB/
14:31 perfinion: latest xinit attempt
14:31 imirkin: nope. was afraid you were using modesetting.
14:32 perfinion: aren't i? doesn't everything need KMS now?
14:33 perfinion: or do you mean non-kernel modesetting?
14:33 imirkin: the xorg modesetting "ddx" driver, which uses GL to do its dirty work
14:33 imirkin: not that there's anything intrinsically wrong with that approach, but it won't work well on nv4x hardware.
14:34 imirkin: (due to buggy nouveau GL drivers)
14:34 perfinion: ah
14:50 karolherbst: ohh set can only take short imms
14:50 karolherbst: imirkin: what is the best way to do something like this? set u32 %r53 lt %r52 0x7f800000
14:51 imirkin: karolherbst: uhm ... i presume there's something wrong with just doing that?
14:51 karolherbst: the emitter doesn't like me
14:51 imirkin: there's probably a f32 missing in there
14:51 karolherbst: or does it complain about and u32 $r0 $r0 0x7fffffff
14:51 karolherbst: mhh
14:51 imirkin: i.e. set u32 %r53 lt f32 $r52 0x7f800000
14:51 karolherbst: let me check
14:51 karolherbst: no
14:51 karolherbst: I compare ints
14:51 imirkin: oh ok
14:51 karolherbst: nvidia: ISETP.LT.U32.AND P0, true, R0, 0x7f800000, true;
14:52 karolherbst: this is for IsFinite/isNaN/....
14:52 imirkin: that's not what that op will emit
14:52 karolherbst: I know
14:52 imirkin: that op will emit ISET
14:52 karolherbst: yeah, I know
14:52 imirkin: if you want ISETP, you need a predicate dst
14:52 karolherbst: I still need to change it into a predicate
14:52 karolherbst: just wanted to explain why I do u32
14:52 imirkin: so ... what gets emitted?
14:53 imirkin: what is this for again? isFinite?
14:53 karolherbst: set u32 $r0 lt $r6 0x7f800000 (8)
14:53 imirkin: so it masks off the sign bit
14:54 karolherbst: yeah, I already checked what nvidia does
14:54 karolherbst: just wondering why the emitter crashes
14:54 imirkin: well, i don't think there's a ISET32I
14:54 imirkin: are you sticking the imm directly into the instruction maybe?
14:54 karolherbst: assert(!(val & 0xfff00000) || (val & 0xfff00000) == 0xfff00000);
14:54 karolherbst: val is 0x7f800000
14:54 imirkin: always use bld.loadImm()
14:54 imirkin: yeah, 0x7f800000 is not a short imm for a uint op
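The emitter assert quoted above, restated as a predicate for experimenting (hypothetical name): it accepts only values whose top 12 bits are all zeros or all ones. Both 0x7f800000 and 0x7fffffff fail it, so, following imirkin's advice, they have to be materialised with `bld.loadImm()` rather than placed directly in the instruction.

```python
def fits_short_imm(val: int) -> bool:
    """Mirror of the emitter assert: the top 12 bits of the u32 value
    must be all zeros or all ones."""
    top = val & 0xFFF00000
    return top == 0 or top == 0xFFF00000


if __name__ == "__main__":
    for v in (0x7F800000, 0x7FFFFFFF, 0x00012345, 0xFFFFF000):
        print(hex(v), "short imm" if fits_short_imm(v) else "needs a register")
```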
14:54 karolherbst: ahhhh
14:54 karolherbst: okay, that explains
14:55 karolherbst: I was already wondering why nvidia puts it into c2 space
14:55 karolherbst: they basically load the imm from c[0x2][0x0]
14:55 karolherbst: keeps me wondering if that is indeed faster than loading it into a reg and using that
14:56 imirkin: but ... you should never ever ever ever ever ever stick an imm directly into an op on IR input
14:56 karolherbst: okay
14:56 imirkin: (with *very* rare exceptions, like where the arg *has* to be an imm, e.g. SHLADD or EXTBF/INSBF for their last op)
14:57 imirkin: actually nevermind - it doesn't have to be an imm for EXTBF/INSBF
14:57 imirkin: so pretty much just SHLADD... maybe PFETCH
14:58 karolherbst: does tgsi know opcodes like IsInf?
14:58 imirkin: nope
14:58 imirkin: i do believe glsl has those
14:58 imirkin: but they get lowered way ahead of tgsi
14:59 imirkin: http://docs.gl/sl4/isnan
14:59 imirkin: (and isinf)
15:00 karolherbst: what is the best way to build a predicate Value for input IR?
15:00 karolherbst: okay
15:00 karolherbst: that's what I thought
15:00 karolherbst: well, spir-v knows those
15:00 karolherbst: which shouldn't be any surprise
15:01 imirkin: you don't
15:01 imirkin: let the optimizer handle it
15:01 karolherbst: well, why not?
15:02 imirkin: mmmmm
15:02 imirkin: well, there's some annoyance with nv50 vs nvc0
15:02 karolherbst: the spir-v kind of tells me I should create a boolean value, not an int or a float.
15:02 karolherbst: well
15:02 imirkin: i think it's possible sometimes
15:02 karolherbst: we don't care about pre nvc0 with spir-v for now anyway
15:02 imirkin: so have a look at how the tgsi importer deals with it
15:02 imirkin: see if it uses FILE_PREDICATE
15:02 imirkin: why not?
15:02 karolherbst: val0 = new_LValue(func, FILE_PREDICATE);
15:03 imirkin: for which op?
15:03 karolherbst: VOTE
15:03 imirkin: bad example
15:03 karolherbst: KILL_IF, but I doubt that is any better
15:03 imirkin: much better
15:03 karolherbst: Converter::exportOutputs also uses that
15:04 imirkin: so yeah, then FILE_PREDICATE is sufficiently handled by nv50 lowering
15:04 karolherbst: okay, can I do getScratch(1, FILE_PREDICATE); or should I use that new LValue thing?
15:05 imirkin: wtf? nv50 hw doesn't have alpha tests?
15:05 karolherbst: I think tesla can do OpenCL 1.1 at most anyway, no idea how many care about OpenCL below 1.2
15:05 imirkin: oh. of course. only for certain formats.
15:06 imirkin: why support it for all formats, when you can just support it for some.
15:06 imirkin: karolherbst: nice to expose something...
15:06 imirkin: also vk on tesla should be doable
15:06 karolherbst: okay, right
15:06 imirkin: tesla supports images btw
15:06 karolherbst: mhh, and u32 %r7 %p4 %r9
15:06 imirkin: not sure if that's the differentiating factor of 1.2 or not
15:06 imirkin: yeah you can't do that.
15:07 karolherbst: yeah I know
15:07 karolherbst: just wondering if the issue is me or the handling of other opcodes
15:07 imirkin: i think there are a handful of ops that support predicates as args
15:07 imirkin: and might be one of them
15:07 imirkin: but it'd have to be *both* args
15:08 karolherbst: ohh, I don't actually implement OpSelect right now
15:08 karolherbst: well, I do the same as IsInf for OpSelect :D
15:08 karolherbst: that's why the result gets used as the input again
15:08 karolherbst: messy
15:08 pmoreau: karolherbst: I guess people might care about OpenCL support vs no OpenCL support for a particular hardware.
15:09 karolherbst: pmoreau: maybe, yes
15:09 imirkin: OpSelect might be the same as SELP or CMP
15:09 karolherbst: but seriously, if you care about OpenCL, you don't care about OpenCL on tesla gen nvidia hw, right?
15:09 karolherbst: but well
15:09 pmoreau: Yeah
15:09 imirkin: karolherbst: you got it backwards
15:09 karolherbst: I still don't want to make it harder to support it
15:09 karolherbst: but I don't care much myself to support it there
15:09 imirkin: if you have a tesla gpu, you want to make the most of it
15:09 pmoreau: ^
15:09 karolherbst: well right
15:09 karolherbst: but that's mostly not involving opencl
15:11 karolherbst: but yeah, I am fine with fixing bugs regarding tesla hw
15:11 karolherbst: but well
15:11 karolherbst: I doubt many will care
15:11 karolherbst: I like those parts in the spir-v specs
15:11 karolherbst: "Result Type must be a scalar or vector."
15:13 karolherbst: pmoreau: the code for OpSelect might get messy...
15:13 karolherbst: pmoreau: by the way, the spv tools do all the validation for us, right?
15:14 karolherbst: so we can assume that things like "It must have the same number of components as Result Type." are already checked and covered, right?
15:14 pmoreau: It is lacking things, but it should.
15:14 karolherbst: okay
15:14 karolherbst: that makes things easier
15:14 pmoreau: Yeah, those should be covered (I assume)
15:14 pmoreau: If they aren’t, we should open bug reports/submit PRs to get it fixed.
15:15 karolherbst: okay
15:15 karolherbst: we need to take care of convincing people to package that at some point as well
15:15 karolherbst: well maybe in 0.5y or something
15:15 karolherbst: dunno how much time we need to start pushing stuff
15:15 karolherbst: don't want to have a situation where things like that get added to mesa and people can't really include it
15:16 karolherbst: and those tools are useful without it anyway
15:16 pmoreau: I will work for the rest of the week and during the weekend on getting patches in SPIRV-Tools: fixes to the decoration manager, exporting a pkg-config file and validating required/optional capabilities for OpenCL.
15:16 karolherbst: nice
15:17 pmoreau: I might have an idea on how to get rid of storageFile and paddings from the SpirVValue, not sure when I’ll be able to try it though.
15:17 karolherbst: ohh wait, OpSelect might be easier than I thought
15:20 pmoreau: Well, you’ll need a double loop: loop over each component of the vector, and for each component, loop over each of its members, but otherwise it should be pretty straightforward I think.
15:21 karolherbst: imirkin: what is the best opcode for a select like this: OpSelect Type dst CondCode src0 src1?
15:21 karolherbst: pmoreau: I don't think I need to double loop, why should I?
15:22 imirkin: karolherbst: what's CondCode?
15:22 karolherbst: imirkin: if CondCode is true, src0 is returned otherwise src1
15:22 imirkin: so just true/false?
15:22 karolherbst: yeah
15:22 imirkin: then SELP
15:22 karolherbst: okay
15:22 karolherbst: nice
15:22 karolherbst: and predicate goes to third src, right?
15:22 pmoreau: You can have a vector: you need to do the select for each element of it. Each element might not be just an int, but for example a struct { int, long, char }. So you need to do the select for each member as well.
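A minimal model of the nested loop pmoreau describes, using nested Python tuples to stand in for vectors and structs (the representation and the function name are assumptions, not the converter's types). The outer level walks the components, each with its own condition, and the recursion walks each member using that component's condition.

```python
from typing import Any


def select(cond: Any, a: Any, b: Any) -> Any:
    """OpSelect modelled on nested tuples (vectors/structs) of scalars.

    cond is a bool, or a tuple of bools with one entry per top-level
    component; a scalar cond is broadcast over every member below it.
    """
    if isinstance(a, tuple):
        conds = cond if isinstance(cond, tuple) else (cond,) * len(a)
        # Outer loop over components; the recursion is the inner loop over members.
        return tuple(select(c, x, y) for c, x, y in zip(conds, a, b))
    return a if cond else b
```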
15:22 imirkin: don't remember offhand, sorry
15:23 karolherbst: pmoreau: ... meh
15:23 pmoreau: ;-)
15:24 karolherbst: imirkin: is this even a good idea for input IR?
15:24 karolherbst: OP_SELP I mean
15:24 karolherbst: I thought it isn't there for nv50, or is it?
15:25 imirkin: it's not, but iirc we handle it in lowering
15:25 imirkin: if we don't it'd be trivial to do so
15:25 karolherbst: okay
15:26 pmoreau: “selp.f64 requires sm_13 or higher”, and selp itself was introduced in PTX ISA 1.0, so they must have had some easy way to do it if there was no hw instruction for it.
15:26 karolherbst: well right
15:26 karolherbst: the thing is, we don't really do any SELP things anywhere...
15:27 pmoreau: Time to get to it then :-D
15:27 karolherbst: well there is lowering code
15:27 karolherbst: and emit code
15:27 karolherbst: but I don't see how we create those SELP instructions at all
15:28 karolherbst: ohh, some lowering code
15:28 karolherbst: and it uses mkOp3
15:28 karolherbst: not mkCmp
15:29 karolherbst: pmoreau: wondering if we can get bools as input in OpSelect
15:29 karolherbst: well
15:29 karolherbst: I am sure we do
15:29 karolherbst: we might want to handle stuff like that
15:29 pmoreau: It doesn’t look like there is a restriction on it, true
15:30 karolherbst: OP_SELP with all sources as predicates
15:30 karolherbst: I am sure this just won't work
15:30 pmoreau: Yeah
15:31 karolherbst: nice
15:31 karolherbst: it is looking very good
15:31 pmoreau: We could do something like getOp that automatically inserts a mov to a reg if the value is an immediate.
15:32 pmoreau: So, if you ask for a predicate as an int, use selp to move it to a reg.
15:32 karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/c7b32e6cc837801ca57efc0d1e5e70bbb6216609
15:33 pmoreau: But you also need to move it back to a predicate afterwards, as the output of OpSelect is supposed to be of the same type as its inputs.
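A model of the round trip pmoreau suggests, assuming PTX-like `selp` semantics (`d = p ? a : b`). Each predicate is widened to a u32 with `selp 1, 0`, the select happens on the integers, and a compare against 0 turns the result back into a predicate. The helper names are hypothetical; in nv50_ir the last step would be a set writing a predicate.

```python
def selp(a: int, b: int, p: bool) -> int:
    """PTX-style selp: d = p ? a : b."""
    return a if p else b


def pred_to_u32(p: bool) -> int:
    return selp(1, 0, p)


def u32_to_pred(r: int) -> bool:
    return r != 0  # a "ne 0" compare writing a predicate


def select_preds(cond: bool, a: bool, b: bool) -> bool:
    """OpSelect on bool operands: widen both to u32, select, narrow back."""
    return u32_to_pred(selp(pred_to_u32(a), pred_to_u32(b), cond))
```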
15:33 karolherbst: pmoreau: https://gist.github.com/karolherbst/0512976ebd54b7398be1bf161fb96a45
15:33 karolherbst: yeah
15:33 karolherbst: something like that
15:33 karolherbst: or
15:33 karolherbst: we are smart
15:33 karolherbst: and use OP_SLCT
15:34 karolherbst: I think
15:34 karolherbst: mhh, wait
15:34 karolherbst: I am sure we can be smarter somehow
15:35 karolherbst: pmoreau: well ignoring structs and so on, the OpSelect thing should look correct, right?
15:35 pmoreau: I was going to say that it wouldn’t work on structures :-D
15:35 karolherbst: well right
15:35 karolherbst: I want to fix the test first
15:35 karolherbst: then look into structures
15:36 pmoreau: Sure
15:36 karolherbst: doesn't help that I do the same for IsInf and IsNan...
15:36 karolherbst: :p
15:36 pmoreau: O:-)
15:36 karolherbst: the painful op is IsNormal though
15:36 pmoreau: Looking good otherwise
15:36 karolherbst: because it adds two SETs
15:37 karolherbst: and I need to support f64
15:37 pmoreau: You should add a TODO note about aggregate types and booleans.
15:37 karolherbst: but that just changes those masks
15:37 pmoreau: (For OpSelect)
15:45 karolherbst: imirkin: what should be the dType of a set writing to a predicate?
15:45 karolherbst: or doesn't it matter
15:48 karolherbst: pmoreau: something is weird here
15:48 karolherbst: ohhh
15:48 karolherbst: pmoreau: I passed the first subtest \o/
15:48 karolherbst: but the second test fails to compile
15:49 karolherbst: Failed to parse the SPIR-V binary:
15:49 karolherbst: nv50_ir_generate_code: ret = -2
15:49 karolherbst: %float_0x1p_128 = OpConstant %float 0x1p+128 ?
15:50 karolherbst: what is that
15:50 karolherbst: OpConstant %float 0x1.fffffep+128
15:50 karolherbst: but that sounds like double stuff
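A hedged reading of those constants: the largest finite f32 is 0x1.fffffep+127, so 0x1p+128 cannot be a finite float. If the disassembler prints the all-ones exponent as +128 (SPIRV-Tools appears to do this for inf/NaN, but its hex-float code is worth checking), then 0x1p+128 is +inf and 0x1.fffffep+128 is the NaN pattern 0x7fffffff. Both are plain 32-bit values, which would suit an IsInf/IsNaN test. The decoder below is a sketch under that assumption, and its name is hypothetical.

```python
import re
import struct

_HEXFLOAT_128 = re.compile(r"(-?)0x1(?:\.([0-9a-fA-F]{1,6}))?p\+128")


def hexfloat128_to_f32_bits(text: str) -> int:
    """Decode '[-]0x1[.frac]p+128' into raw f32 bits, assuming exponent +128
    stands for the all-ones exponent field (inf if frac is 0, else NaN)."""
    m = _HEXFLOAT_128.fullmatch(text)
    if m is None:
        raise ValueError(f"not an exponent-128 hex float: {text!r}")
    sign = 0x80000000 if m.group(1) else 0
    frac = (m.group(2) or "").ljust(6, "0")  # 6 hex digits = 24 bits
    mantissa = int(frac, 16) >> 1            # keep the top 23 bits
    return sign | 0x7F800000 | mantissa


def f32_from_bits(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

As a double, 0x1p+128 is an ordinary finite value (2**128), which may be why it reads like f64 at first glance.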
15:50 karolherbst: pmoreau: how hard would it be to get proper support for fp64?
15:54 karolherbst: mhh
15:54 karolherbst: the second test also just does float stuff
15:54 karolherbst: weird
15:57 karolherbst: imirkin: mind taking a look at this again? Just to make sure there is nothing in there we shouldn't do for input IR: https://github.com/karolherbst/mesa/commit/c7b32e6cc837801ca57efc0d1e5e70bbb6216609
15:58 pmoreau: karolherbst: I’ll have a look at it later. Need to go to my Swedish course + oral exam :s
15:58 karolherbst: uhm
15:58 karolherbst: imirkin: wrong commit: https://github.com/karolherbst/mesa/commit/2fc47567cc1ace38cb5069985b0487308a43c0c4
15:59 karolherbst: pmoreau: k, have fun
16:02 karolherbst: mhh, spvBinaryParse returns SPV_UNSUPPORTED
16:04 perfinion: imirkin: okay, i've got 4.12 and 4.13 built, what kind of tests should i do? just xinit again or anything special?
16:05 perfinion: imirkin: also, how do i turn debugging to the highest? echo into some /sys entry?
16:06 imirkin_: btw, if you want to move between predicates and regs, you can use cvt
16:07 imirkin_: perfinion: yeah, same thing
16:07 imirkin_: perfinion: nouveau.debug=debug will get some more stuff...
16:07 imirkin_: perfinion: but we just don't have a ton of debugging info
16:07 imirkin_: [that would be relevant to what you're seeing]
16:09 karolherbst: imirkin_: ohh, good idea
16:09 imirkin_: i don't really have time to look at your patches
16:09 imirkin_: sorry
16:10 karolherbst: no worries
16:10 karolherbst: but we might have some potential optimisation opportunities here: https://gist.githubusercontent.com/karolherbst/c216b97eb889c1bebe2274c1cc90a97c/raw/df9dca8b0daa3d866b23d07b609c4a1d83f4b20f/gistfile1.txt
16:10 karolherbst: set u8 %p78 eq u32 and two immediates
16:12 karolherbst: might be due to the fact we don't loop through those
16:12 karolherbst: oh well
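On the "we don't loop through those" point, one generic way to catch a set whose operands only become immediates after another pass has run is to rerun the passes until nothing changes. The sketch below is minimal and works on a hypothetical toy IR: the `Insn`/`Imm`/`Reg` classes and both pass names are invented for illustration and are not nv50_ir's API. It assumes SSA-like code.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical toy IR for illustration only (not nv50_ir's types).


@dataclass
class Imm:
    value: int


@dataclass
class Reg:
    name: str


Operand = Union[Imm, Reg]


@dataclass
class Insn:
    op: str              # "mov" or "set_eq" (compare for equality)
    dst: str
    srcs: List[Operand]


def fold_constants(insns: List[Insn]) -> bool:
    """Fold set_eq of two immediates into a mov of 0/1; return True on change."""
    changed = False
    for insn in insns:
        if insn.op == "set_eq" and all(isinstance(s, Imm) for s in insn.srcs):
            a, b = insn.srcs
            insn.op, insn.srcs = "mov", [Imm(int(a.value == b.value))]
            changed = True
    return changed


def propagate_immediates(insns: List[Insn]) -> bool:
    """Replace uses of registers known to hold an immediate; return True on change."""
    known = {}
    changed = False
    for insn in insns:
        new_srcs = []
        for s in insn.srcs:
            if isinstance(s, Reg) and s.name in known:
                new_srcs.append(Imm(known[s.name]))
                changed = True
            else:
                new_srcs.append(s)
        insn.srcs = new_srcs
        if insn.op == "mov" and isinstance(insn.srcs[0], Imm):
            known[insn.dst] = insn.srcs[0].value
        else:
            known.pop(insn.dst, None)  # dst redefined by something non-constant
    return changed


def optimize(insns: List[Insn]) -> int:
    """Run both passes until a fixpoint is reached; return the number of rounds."""
    rounds = 0
    while True:
        rounds += 1
        changed = fold_constants(insns)
        changed |= propagate_immediates(insns)
        if not changed:
            return rounds


prog = [Insn("mov", "%r1", [Imm(5)]),
        Insn("set_eq", "%p2", [Reg("%r1"), Imm(5)])]
optimize(prog)
print(prog[1])
```

A single fold-then-propagate round would leave `set_eq %p2, 5, 5` unfolded. Only the second round turns it into `mov %p2, 1`.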
16:27 karolherbst: I don't get that ISUB RZ.CC, c[0x0][0x148], RZ; part...
16:27 karolherbst: why would CC have any value besides 0 if I subtract 0 from anything?
16:30 imirkin_: it's an opt gone wrong
16:31 imirkin_: CC will always be empty
16:31 imirkin_: my guess is that the second arg isn't always RZ
16:31 karolherbst: I see
16:32 imirkin_: it's conceivable that my understanding is wrong
16:32 imirkin_: actually
16:32 imirkin_: i wonder if CC will be set if c[] is negative
16:32 imirkin_: hmmm ... no. doubtful.
16:32 imirkin_: coz it's not 0 - c[], it's c[] - 0
16:33 karolherbst: yeah, it looks like this
16:33 karolherbst: in the non opted code they have some moves
16:33 karolherbst: MOV32I R2, 0x0; MOV R7, R2; ISUB RZ.CC, R0, R7;
16:33 imirkin_: yeah ok
16:33 karolherbst: ohhhhhh
16:33 karolherbst: that is the low bits of the float
16:33 karolherbst: that 0
16:34 karolherbst: so
16:34 imirkin_: that makes more sense for detecting nan
16:34 karolherbst: ISUB RZ.CC, float.hi, float.lo;
16:34 karolherbst: they have a MOV32I R3, 0x7ff00000; below that MOV32I R2, 0x0;
16:34 karolherbst: and that is the constant from the shader
16:34 imirkin_: this is fp64?
16:34 karolherbst: yeah
16:35 karolherbst: ohh wait, mhh
16:35 karolherbst: no wait
16:35 imirkin_: so ...
16:35 karolherbst: 0x7ff00000 is the mask they use in the set
16:35 karolherbst: mhh
16:35 imirkin_: i recommend you read up about fp representation
16:35 imirkin_: and how inf / nan are represented
16:35 imirkin_: and then that code will make a LOT more sense
16:35 imirkin_: there's not a single nan repr
16:35 karolherbst: well, the code already makes sense
16:35 imirkin_: it's basically exp = something, and mantissa != 0
16:36 karolherbst: just wondering why they kept doing that carry thing
16:36 karolherbst: yeah... dunno, I don't see how the low bits matter at all
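This is the point imirkin_ is making. In fp64 the 52-bit mantissa is split across both 32-bit words: 20 bits are in the high word and 32 bits are in the low word. As a result, the high word alone cannot distinguish +inf from NaN. The standard-library Python sketch below demonstrates this; the helper names are illustrative.

```python
import math
import struct


def f64_from_words(hi: int, lo: int) -> float:
    """Build a double from its high and low 32-bit words (hi holds sign/exponent)."""
    bits = (hi << 32) | lo
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_nan_from_words(hi: int, lo: int) -> bool:
    """fp64 NaN test on the two halves: exponent all ones and a nonzero mantissa.

    20 mantissa bits live in hi and 32 in lo, so a NaN can have an
    all-zero high mantissa as long as the low word is nonzero.
    """
    hi_abs = hi & 0x7FFFFFFF  # ignore the sign bit
    return hi_abs > 0x7FF00000 or (hi_abs == 0x7FF00000 and lo != 0)


# 0x7ff00000:0x00000000 (the constant from the shader) is +inf ...
print(f64_from_words(0x7FF00000, 0x00000000))  # inf
# ... but the same high word with a nonzero low word is NaN.
print(f64_from_words(0x7FF00000, 0x00000001))  # nan
```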
16:37 imirkin_: <imirkin_> i recommend you read up about fp representation
16:37 imirkin_: <imirkin_> and how inf / nan are represented
16:37 karolherbst: that's all in the high bits, right?
16:37 imirkin_: should i paste again?
16:37 imirkin_:thinks he's made his point.
16:37 karolherbst: well, the low bits don't matter
16:38 imirkin_: ok, so now i *know* you didn't read up about it.
16:38 imirkin_: good.
16:38 imirkin_: go read the IEEE-754 wikipedia page which talks about encodings and all this stuff, and let me know when you're done.
16:38 karolherbst: well I do
16:39 karolherbst: and I still don't see why the low bits matter
16:39 imirkin_: forget fp64
16:39 imirkin_: how do you represent nan in fp32?
16:41 karolherbst: exponent all 1s, and the high bit of the mantissa 1, or at least I thought so; well, the sign bit should be 0 as well
16:41 imirkin_: give me the hex representation.
16:41 karolherbst: I think mantissa all 1 is also a valid one
16:41 karolherbst: 0x7f900000
16:42 imirkin_: >>> np.uint32(0x7f900000).view(np.float32)
16:42 imirkin_: nan
16:42 imirkin_: good.
16:42 imirkin_: is there any other representations?
16:43 karolherbst: [0x7f900000, 0x7fffffff]
16:43 imirkin_: are you sure?
16:43 karolherbst: I am not saying those are all
16:43 imirkin_: those are definitely some.
16:43 imirkin_: but most importantly, https://hastebin.com/kevipubise.go
16:43 karolherbst: might be that [0xff900000, 0xffffffff] is NaN as well
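For reference, a binary32 value is NaN exactly when the exponent field is all ones and the 23-bit mantissa is nonzero, with either sign. That gives the full sets [0x7f800001, 0x7fffffff] and [0xff800001, 0xffffffff]. The all-zero-mantissa patterns 0x7f800000 and 0xff800000 are ±inf. The sketch below is a quick standard-library Python check with illustrative helper names.

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_nan_bits32(bits: int) -> bool:
    """binary32 NaN: exponent 0xff and nonzero mantissa, sign bit ignored."""
    return (bits & 0x7FFFFFFF) > 0x7F800000


for bits in (0x7F800000, 0x7F800001, 0x7F900000, 0x7FC00000,
             0xFF800000, 0xFF800001, 0xFFFFFFFF):
    print(hex(bits), is_nan_bits32(bits), math.isnan(f32_from_bits(bits)))
```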
16:43 karolherbst: ahh, right
16:44 karolherbst: ohh
16:44 imirkin_: this is all spelled out quite clearly on the wikipedia page
16:44 imirkin_: so if you were really reading it and didn't pick up on that ... seems unlikely
16:44 imirkin_: sorry, i just don't have time for this right now.
16:45 karolherbst: k
16:45 karolherbst: ohhhhhhhhhhh, I think I know what happens
16:46 karolherbst: ISUB 0.CC, float.lo, 0; throws something into CC if float.lo contains anything except 0
16:46 karolherbst: because it writes into RZ
16:46 karolherbst: smart...
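The trick can be modeled generically as a 64-bit unsigned compare split into two 32-bit steps chained through flags. The low-word subtraction discards its result, just as writing to RZ does, and keeps only the flags. The high-word step then consumes those flags. When the constant's low word is 0, no borrow can occur, so the low step reduces to "lo != 0". This Python sketch is an illustrative model only. Whether Maxwell's ISUB.CC and the following extended compare behave exactly like this is an assumption, not something established in this log.

```python
from typing import NamedTuple

MASK32 = 0xFFFFFFFF


class Flags(NamedTuple):
    zero: bool    # low-word difference was zero
    borrow: bool  # low-word subtraction needed a borrow (a_lo < b_lo)


def sub32_flags(a: int, b: int) -> Flags:
    """Model subtracting into a zero register: the difference is dropped, flags survive."""
    diff = (a - b) & MASK32
    return Flags(zero=(diff == 0), borrow=(a < b))


def gt64_split(a: int, b: int) -> bool:
    """Unsigned a > b on 64-bit values, as a low-word step plus a high-word step."""
    a_hi, a_lo = a >> 32, a & MASK32
    b_hi, b_lo = b >> 32, b & MASK32
    cc = sub32_flags(a_lo, b_lo)            # step 1: low words, result thrown away
    low_gt = not cc.zero and not cc.borrow  # equivalent to a_lo > b_lo
    return a_hi > b_hi or (a_hi == b_hi and low_gt)  # step 2: high words use the flags


INF_BITS = 0x7FF0000000000000  # |x| bits above this are NaN for fp64
print(gt64_split(0x7FF0000000000001, INF_BITS))  # True
print(gt64_split(INF_BITS, INF_BITS))            # False
```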
18:07 perfinion: imirkin: i tried 4.14.4, 4.13.16, 4.12.14, all behave identically and show the same snowy band things on the screen
19:30 pmoreau: imirkin: I’m glad I’m not the only one whose links/explanations he doesn’t read. :-D
21:10 oday: Alright, so I've decided to reverse / patch the nvidia windows drivers to load the vbios using pci rombar even when it detects an Optimus device, instead of fucking with acpi tables any further
21:11 oday: Looks like I'd basically have to write tables entirely from scratch, or pull in pretty much my entire acpi configuration (most of which doesn't make sense to the VM)
21:13 oday: (The original intent was to replace the copy from system memory with an equivalent copy from a buffer defined within the ssdt, containing the contents of the vbios byte by byte)