00:00 RSpliet: But what's just as important is to realise that with instruction scheduling you can reduce live sets, removing the need to spill in the first place.
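To make RSpliet's point concrete, here is a minimal Python sketch (a toy, not nouveau's IR). It measures the peak number of simultaneously live values for two orderings of the same straight-line SSA code. It assumes that a source whose last use is the current instruction frees its register for that instruction's result, and it ignores block inputs.

```python
def max_live(schedule):
    """Peak number of simultaneously live values in a straight-line SSA schedule.

    Each instruction is (dest, srcs). A value is live from its definition to its
    last use. Block inputs (used but never defined here) are not counted.
    """
    last_use = {}
    for i, (_, srcs) in enumerate(schedule):
        for s in srcs:
            last_use[s] = i
    live, peak = set(), 0
    for i, (dest, srcs) in enumerate(schedule):
        # Sources used for the last time die here, so dest may reuse their register.
        live -= {s for s in srcs if last_use[s] == i}
        live.add(dest)
        peak = max(peak, len(live))
        if dest not in last_use:  # defined but never used: dead right away
            live.discard(dest)
    return peak

# Same computation, g = (a + b) + (c + d), in two different orders.
loads_first = [("a", []), ("b", []), ("c", []), ("d", []),
               ("e", ["a", "b"]), ("f", ["c", "d"]), ("g", ["e", "f"])]
interleaved = [("a", []), ("b", []), ("e", ["a", "b"]),
               ("c", []), ("d", []), ("f", ["c", "d"]), ("g", ["e", "f"])]
```

Issuing all four loads up front peaks at four live values, while interleaving the first add peaks at three. With one fewer register available, the first order would have to spill and the second would not.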
00:46 karolherbst: imirkin_: mhh, still thinking about the case where we want to check against limms before converting an instruction. Well, really it's the insnCanLoad problem without actually having the instruction
00:47 imirkin: could change the api around
00:48 karolherbst: was thinking about having something more complex
00:49 karolherbst: like isInsValid(op, SRC_REG | SRC_MOD_NEG, SRC_LIMM, ...) and an isLimm(op, value) or something?
00:50 karolherbst: which is in the end what we actually want to know anyway: whether the given combination of stuff makes up an instruction we are able to emit
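A minimal Python sketch of the query karolherbst proposes, answering "can this combination be emitted?" without building an instruction first. The names `is_ins_valid`, `is_limm`, `Src`, `RULES` and `LIMM_BITS` are hypothetical, and the per-opcode rules and immediate widths are illustrative values only. They are not the real Maxwell encoding constraints or nouveau's API.

```python
from enum import IntFlag

class Src(IntFlag):
    REG = 1      # general-purpose register
    MOD_NEG = 2  # negate modifier
    MOD_ABS = 4  # absolute-value modifier
    LIMM = 8     # long immediate
    CONST = 16   # c[] constant-buffer access

BASE_KINDS = Src.REG | Src.LIMM | Src.CONST

# Hypothetical per-opcode rules: for each source slot, the flags that slot may
# carry. Illustrative values only, NOT the real hardware encoding constraints.
RULES = {
    "add": (Src.REG | Src.MOD_NEG,
            Src.REG | Src.MOD_NEG | Src.LIMM | Src.CONST),
    "add3": (Src.REG | Src.MOD_NEG,
             Src.REG | Src.MOD_NEG | Src.LIMM | Src.CONST,
             Src.REG | Src.MOD_NEG),
}

# Hypothetical long-immediate field widths in bits, per opcode.
LIMM_BITS = {"add": 32, "add3": 32}

def is_ins_valid(op, *srcs):
    """True if op can be emitted with these per-slot source kinds/modifiers."""
    slots = RULES.get(op)
    if slots is None or len(srcs) != len(slots):
        return False
    for s, allowed in zip(srcs, slots):
        base = s & BASE_KINDS
        if bin(int(base)).count("1") != 1:  # exactly one operand kind per slot
            return False
        if (s | allowed) != allowed:        # s must be a subset of allowed
            return False
    return True

def is_limm(op, value):
    """True if value fits op's long-immediate field (as signed or unsigned)."""
    bits = LIMM_BITS.get(op)
    if bits is None:
        return False
    return -(1 << (bits - 1)) <= value < (1 << bits)
```

Keeping the rules in a table means the same data could answer both the "before converting" question and the insnCanLoad-style question, without an instruction object to inspect.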
00:52 imirkin: yeah, something like that
01:37 karolherbst: imirkin: well... at least we can do the add+add -> add3 optimization without immediates, since in those cases we already know we will be able to construct an add3
01:39 karolherbst: mhh, wait. Only one immediate or c[] access :/
02:31 karolherbst: imirkin: ohhh, we moved LoadPropagation after LateAlgebraicOpt, so we don't actually have to check for immediates :)
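A rough Python sketch of the add+add -> add3 fold with the constraint mentioned above (at most one immediate or c[] source in the result). The `Operand`/`Instr` types and `try_fold_add3` are hypothetical. The requirement that the inner add has exactly one user is an added assumption, so that the fold removes an instruction instead of duplicating work. This is not nouveau's actual pass.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class Operand:
    kind: str                # "reg", "imm" or "cb" (a c[] constant-buffer access)
    value: Union[str, int]   # register name, immediate value, or c[] offset

@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    srcs: Tuple[Operand, ...]

def try_fold_add3(outer: Instr, inner: Instr, inner_uses: int) -> Optional[Instr]:
    """Rewrite outer = add(inner.dst, x), inner = add(a, b) as add3(a, b, x).

    Bails out unless inner's result has exactly one user (an assumption of this
    sketch) and the combined instruction has at most one immediate or c[] source.
    """
    if outer.op != "add" or inner.op != "add" or inner_uses != 1:
        return None
    link = Operand("reg", inner.dst)
    if link not in outer.srcs:
        return None
    rest = list(outer.srcs)
    rest.remove(link)
    srcs = inner.srcs + tuple(rest)
    if sum(s.kind in ("imm", "cb") for s in srcs) > 1:
        return None
    return Instr("add3", outer.dst, srcs)
```

If load propagation runs after the fold, as karolherbst notes, the sources are still plain registers at this point, so the immediate/c[] check rarely fires.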
03:20 imirkin: hmmmmmm... since we do jumps in a predicated manner, i wonder if we really need to split critical edges -- we can just insert movs with the same predicates ... hm.
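A sketch of the idea floated above, in Python and not taken from nouveau. A critical edge runs from a block with several successors to a block with several predecessors, so phi copies for that edge fit in neither block. Because the branch is predicated, the copies can go into the predecessor under the same predicate instead. The predicate and `mov` syntax are hypothetical. The sketch also assumes the predicate is still live at that point, and it ignores the parallel-copy (swap) problem.

```python
def critical_edges(succs):
    """Edges (u, v) where u has several successors and v several predecessors."""
    npreds = {}
    for vs in succs.values():
        for v in vs:
            npreds[v] = npreds.get(v, 0) + 1
    return [(u, v) for u, vs in succs.items() for v in vs
            if len(vs) > 1 and npreds[v] > 1]

def predicated_copies(succs, edge_pred, phi_copies):
    """Place phi copies for critical edges in the predecessor, guarded by the
    predicate under which it branches along that edge, instead of splitting it.

    edge_pred[(u, v)] is that predicate, e.g. "p0" or "!p0" (hypothetical syntax).
    phi_copies[(u, v)] is a list of (dst, src) register copies for the edge.
    """
    placed = {}
    for u, v in critical_edges(succs):
        pred = edge_pred[(u, v)]
        for dst, src in phi_copies.get((u, v), []):
            placed.setdefault(u, []).append(f"({pred}) mov {dst}, {src}")
    return placed

# A branches to B (when p0) or C (when !p0); B falls through to C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
```

Here the edge A -> C is critical. Its phi copy becomes a `(!p0) mov` in A, and no new block is created.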
09:28 CrystalGamma[m]: so … Maxwell 1 (GTX750) … what can I do with it using nouveau? specifically, without non-free firmware …
17:11 imirkin: CrystalGamma[m]: what do you want to do with it? and what do you consider to be "non-free firmware"
17:14 joepublic: non-free firmware is firmware not released under a free license (for instance, with no permission to reverse-engineer or modify it). This matters mostly because Linux-libre, Parabola, Trisquel and Debian do not ship firmware that has a non-free license. For Debian you can manually add non-free to your sources, but for the others listed that isn't possible. So in these environments, only hardware that works without any non-free firmware works.
17:14 imirkin: and what is firmware?
17:15 imirkin: like ... if a board shipped with a data blob (embedded on the board) that contained JVM bytecodes that were meant for the CPU to execute, would that be OK?
17:15 imirkin: (it's not a purely hypothetical question, btw)
17:16 joepublic: The FSF, whose standards these products and distributions follow (to whatever degree), maintains that something contained within a hardware device is part of the hardware. If it contained code to run on the host CPU, its license and terms would determine its software freedom.
17:17 CrystalGamma[m]: tough question, but I'd say no, bytecodes for the CPU are effectively driver code
17:17 CrystalGamma[m]: and I would prefer not needing it
17:17 imirkin: ;)
17:17 imirkin: wouldn't we all
17:18 imirkin: however the VBIOS contains initialization instructions in a high-level language which are meant to be interpreted on the CPU
17:18 imirkin: (not JVM specifically, but i was just using something people would be familiar with)
17:19 CrystalGamma[m]: I thought it was little more than tables for the memory map and power states and such?
17:19 imirkin: the tables are there too
17:19 joepublic: BIOS and other generally non-volatile things are considered hardware freedom issues, not software freedom issues, unless they are loaded onto the card from the computer.
17:19 imirkin: but it's also initialization code, as well as code meant to run on modeset
17:19 imirkin: (and at reclock time iirc?)
17:20 CrystalGamma[m]: I guess I'd put the limit at Turing-completeness …
17:21 CrystalGamma[m]: but yes, data and code are sometimes hard to distinguish
17:22 imirkin: CrystalGamma[m]: i haven't done the proof or anything, but i'm fairly sure it's turing-complete
17:23 imirkin: you can write to memory, and have conditional jumps
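imirkin's argument (memory writes plus conditional jumps are enough for loops, and hence for general computation given enough memory) can be illustrated with a toy interpreter in Python. The opcodes below are hypothetical and are not the real VBIOS init-script opcodes. Besides plain writes and a conditional jump, the sketch assumes a read-modify-write `add`. The step limit exists because, for a Turing-complete script, the interpreter cannot know in advance whether the script halts.

```python
def run(program, max_steps=10_000):
    """Interpret a toy script (hypothetical opcodes, NOT real VBIOS ones):
      ("write", addr, value)   mem[addr] = value
      ("add", dst, a, b)       mem[dst] = mem[a] + mem[b]  (read-modify-write)
      ("jnz", addr, target)    jump to target if mem[addr] != 0
      ("end",)
    Returns the final memory contents.
    """
    mem = {}
    pc = 0
    for _ in range(max_steps):
        ins = program[pc]
        op = ins[0]
        if op == "end":
            return mem
        if op == "write":
            mem[ins[1]] = ins[2]
        elif op == "add":
            mem[ins[1]] = mem.get(ins[2], 0) + mem.get(ins[3], 0)
        elif op == "jnz":
            if mem.get(ins[1], 0) != 0:
                pc = ins[2]
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    raise RuntimeError("step limit reached; the script may never halt")

# Sum 5 + 4 + 3 + 2 + 1 into mem[1] using a loop built from a conditional jump.
sum_to_5 = [
    ("write", 0, 5),   # counter
    ("write", 1, 0),   # accumulator
    ("write", 2, -1),  # constant -1
    ("add", 1, 1, 0),  # acc += counter
    ("add", 0, 0, 2),  # counter -= 1
    ("jnz", 0, 3),     # loop while counter != 0
    ("end",),
]
```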
17:23 CrystalGamma[m]: either way I'm just looking for something that is less blobby than modern radeon cards …
17:24 imirkin: well, GM107 should mostly work
17:24 imirkin: but there's a big delta between "mostly" and "completely"
17:24 CrystalGamma[m]: yeah I recently bought one of those (hasn't been delivered yet though)
17:25 imirkin: all sorts of software that has no need for GL-based accel is now using GL, so you get to hit issues a lot more often than you used to
17:25 CrystalGamma[m]: oops I meant GK104
17:25 imirkin: you said GTX 750, no?
17:25 imirkin: that's universally GM107
17:25 CrystalGamma[m]: GM107 I'm thinking about for lighter loads
17:26 imirkin: GK104 should work ok too
17:26 CrystalGamma[m]: AKA accelerated desktop
17:26 imirkin: both of those support reclocking
17:26 imirkin: modesetting
17:26 imirkin: and a GL driver
17:26 CrystalGamma[m]: but so far reclocking is manual, right?
17:26 imirkin: GK104 will also get you video decoding accel if you use non-free firmware.
17:26 imirkin: yeah. which is a lot better than non-existent.
17:27 imirkin: it's not 100% reliable, so making it automated is scary
17:27 imirkin: a _ton_ of effort went into kepler reclocking though, so it's mostly good
17:27 CrystalGamma[m]: ah, I was wondering why there was no simple policy mechanism yet :)
17:27 CrystalGamma[m]: what kind of things should I expect if it fails?
17:27 imirkin: system hang
17:28 CrystalGamma[m]: system or just GPU?
17:28 imirkin: system
17:28 imirkin: depends, i suppose
17:28 imirkin: but system at worst
17:28 CrystalGamma[m]: hmm no worse than my Intel GM45 then :P
17:29 orbea: most of the time when I saw nouveau crash I could still ssh in.
17:29 CrystalGamma[m]: that sometimes hangs the system when blanking …
17:29 joepublic: ssh in and reclock to a lower state?
17:29 orbea: ssh in and reboot
17:29 imirkin: again, it depends on the specifics
17:29 imirkin: sometimes you can do that
17:30 imirkin: sometimes you can't
17:30 imirkin: if the reclock just messes up e.g. display, then sure
17:30 imirkin: if the reclock messes up the vram, then not so much
17:30 joepublic: I have a GK104 that I can freely reclock, very satisfied with it, thanks for all you do
17:30 imirkin: if the reclock messes up the PCIe state machine, then you get a system hang
17:30 imirkin: (or causes a hang in it or whatever)
17:31 CrystalGamma[m]: how much work do you think it would be if I wanted to make nouveau work with non-4k page sizes? (aside from the scary prospect of getting into kernel dev)
17:31 orbea: also, I'm not sure I ever got a crash because of reclocking; more likely it was some program doing things that made nouveau unhappy
17:53 imirkin: CrystalGamma[m]: i think some recent rework has actually made that substantially more possible
17:53 imirkin: CrystalGamma[m]: what's your target page size? 64K?
17:54 CrystalGamma[m]: yeah
17:54 imirkin: i wonder if one approach would be to just always use large pages, and set the large page size to 64k
17:54 CrystalGamma[m]: I have a ppc64(le) system
17:54 imirkin: large pages may be (globally) configured at 64 or 128k
17:54 imirkin: we default to 128k
17:55 CrystalGamma[m]: I mean it can operate with 4k pages obviously, but 64k is the default in some environments
17:55 imirkin: yeah
17:55 CrystalGamma[m]: is that GPU pages you are talking about?
17:55 imirkin: having 4k gpu pages and 64k cpu pages just makes for some really awkward situations
17:55 imirkin: yeah
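Some rough arithmetic behind the page-size discussion, as a Python sketch. It assumes a sysmem mapping can only use GPU large pages when the CPU address, the GPU VA and the size are all multiples of the large-page size. That is a simplification, and the helper names are hypothetical.

```python
KIB = 1024

def ptes_needed(size, page):
    """GPU page-table entries needed to map size bytes with the given page size."""
    return -(-size // page)  # ceiling division

def can_use_large_page(cpu_addr, gpu_va, size, large_page):
    """Large pages are usable only if address, VA and size are all aligned."""
    return all(x % large_page == 0 for x in (cpu_addr, gpu_va, size))
```

With 4 KiB GPU pages, each 64 KiB CPU page takes 16 GPU PTEs. With the default 128 KiB large pages, a single 64 KiB CPU page can never fill a large page. With the large-page size set to 64 KiB, CPU and GPU pages line up 1:1, which is the appeal of the approach suggested above.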
17:56 imirkin: G80+ GPUs have their own VM
17:56 CrystalGamma[m]: makes sense if you want bindless :)
17:56 imirkin: which can allow GPU VAs to address either vram or system memory
17:56 imirkin: well ... more like if you want to be able to seamlessly move resources back and forth between sysmem and vram
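A toy Python model of why a GPU-side VM makes moving resources seamless. A GPU VA page maps to a page in either VRAM or system memory, so migrating a buffer only rewrites its page-table entry and the VA used by shaders and command buffers stays the same. The `GpuVm` class, the single-level page table and the 4 KiB page size are assumptions of this sketch. Copying the data and flushing the GPU TLB are left out.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed GPU page size for this sketch

@dataclass
class Pte:
    aperture: str   # "vram" or "sysmem"
    phys_page: int

class GpuVm:
    """Toy single-level GPU virtual address space (one instance per process)."""

    def __init__(self):
        self.ptes = {}

    def map(self, va, aperture, phys):
        self.ptes[va // PAGE] = Pte(aperture, phys // PAGE)

    def translate(self, va):
        pte = self.ptes[va // PAGE]
        return pte.aperture, pte.phys_page * PAGE + va % PAGE

    def migrate(self, va, aperture, phys):
        # Only the PTE changes; the GPU VA the shader uses is unchanged.
        # (Copying the contents and flushing the GPU TLB are omitted here.)
        self.map(va, aperture, phys)
```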
17:57 imirkin: you can do bindless without any of that (as in ARB_bindless_texture)
17:57 CrystalGamma[m]: wait … you can use textures over PCIe?
17:57 imirkin: sure
17:58 imirkin: almost anything
17:58 CrystalGamma[m]: that sounds really slow
17:58 imirkin: it is
17:58 imirkin: "doctor, it hurts when i do that"
17:58 CrystalGamma[m]: and about bindless … how do you keep processes separate (basic security) without an MMU?
17:59 imirkin: poorly
17:59 imirkin: each process has its own GPU-side VM
17:59 CrystalGamma[m]: well I expected worse honestly :S
18:00 imirkin: but again, i don't see how this has anything to do with bindless
18:00 imirkin: [and i implemented bindless support in nouveau, so i'd like to think i know at least a bit about it]
18:00 CrystalGamma[m]: I thought in bindless the shader can just supply a pointer to the texture (or am I misunderstanding something?)
18:01 imirkin: it's an opaque handle
18:01 CrystalGamma[m]: so there is some indirection?
18:01 imirkin: but even if it's a pointer (which at least for nouveau it isn't, although i'm thinking of redoing it)
18:01 imirkin: then it would be a pointer into GPU VA
18:01 imirkin: which is per-process
18:02 imirkin: thing is, a texture is a lot more than an address
18:02 imirkin: there's a whole descriptor structure that needs to be in a table
18:02 imirkin: and all you do is provide an offset into that table
18:02 imirkin: [on nvidia gpus]
18:02 imirkin: (stuff like width/height/format and a bunch of other harder-to-describe parameters)
18:03 CrystalGamma[m]: makes sense
18:03 imirkin: on e.g. radeon, it's a pointer to that descriptor structure in memory
18:03 imirkin: on nvidia, you can't just supply those values willy-nilly, they have to be in a table
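A minimal Python sketch of the indirection described above. On NVIDIA the bindless handle selects an entry in a descriptor table, and the shader never supplies width/height/format itself. The class and field names are hypothetical, and real descriptors carry many more (and harder-to-describe) fields. Radeon-style handles would instead point directly at a descriptor in memory.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TexDescriptor:
    address: int   # GPU VA of the texel data (per-process, like all GPU VAs)
    width: int
    height: int
    fmt: str       # e.g. "RGBA8"; real descriptors have many more fields

class DescriptorTable:
    """Toy per-context descriptor table; a bindless handle is just an index."""

    def __init__(self):
        self._entries = []

    def make_handle(self, desc):
        self._entries.append(desc)
        return len(self._entries) - 1

    def lookup(self, handle):
        # The shader provides only the handle; the descriptor comes from the table.
        return self._entries[handle]
```

Because the shader can only choose an offset into a table the driver filled in, it cannot fabricate arbitrary texture parameters.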
18:04 imirkin: afaik the blob driver has a pointer to a descriptor which contains those offsets (and some other things which are necessary for a fully-correct implementation, which is why i'm thinking about redoing it in the first place)
18:05 imirkin: [like if you have a 2d view of one of the layers of a 3d texture ... we don't support that.]
18:05 imirkin: [but as i said before... "doctor, it hurts when i do that" ... "DON'T DO THAT!"]
18:08 CrystalGamma[m]: thanks anyway for now … I actually have to receive my card (well, a working one) before I can think about doing any development anyway (assuming I take time off my other projects)
22:25 imirkin: skeggsb: let me know when you're around and have time to talk about how FIFO works on nv50
22:25 imirkin: i'm noticing a number of differences against what i see in blob traces