00:00RSpliet: But what's just as important is to realise that with instruction scheduling you can reduce live sets, invalidating the need to spill in the first place.
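A minimal sketch of the point about scheduling and live sets, assuming a toy straight-line program where each instruction is a `(dest, sources)` pair. The names `max_live`, `LOADS_FIRST` and `INTERLEAVED` are made up for illustration and have nothing to do with nouveau's actual scheduler or register allocator.

```python
def max_live(schedule):
    """Return the peak number of simultaneously live values.

    A value counts as live from the instruction that defines it until
    its last use. Values that are never used are not counted.
    """
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for i, (_dest, srcs) in enumerate(schedule):
        for s in srcs:
            last_use[s] = i

    live = set()
    peak = 0
    for i, (dest, srcs) in enumerate(schedule):
        # Sources whose last use is this instruction die here.
        for s in srcs:
            if last_use[s] == i:
                live.discard(s)
        # The result becomes live only if something reads it later.
        if dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


# Same computation, r = (a + b) + (c + d), in two different orders.
LOADS_FIRST = [
    ("a", []), ("b", []), ("c", []), ("d", []),
    ("t0", ["a", "b"]),
    ("t1", ["c", "d"]),
    ("r", ["t0", "t1"]),
]
INTERLEAVED = [
    ("a", []), ("b", []),
    ("t0", ["a", "b"]),
    ("c", []), ("d", []),
    ("t1", ["c", "d"]),
    ("r", ["t0", "t1"]),
]
```

With all four loads hoisted to the top, four values are live at once; consuming `a` and `b` before loading `c` and `d` brings the peak down to three, which is the kind of reduction that can avoid a spill.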
00:46karolherbst: imirkin_: mhh, still thinking about the situation where we want to check against limms before converting an instruction. Well actually the insnCanLoad problem without actually having the instruction
00:47imirkin: could change the api around
00:48karolherbst: was thinking about having something more complex
00:49karolherbst: like isInsValid(op, SRC_REG | SRC_MOD_NEG, SRC_LIMM, ...) and a isLimm(op, value) or something?
00:50karolherbst: which is in the end what we actually want to know anyway, if the given combination of stuff makes up for an instruction we are able to emit
00:52imirkin: yeah, something like that
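A rough sketch of what such a query could look like, written in Python rather than the driver's own language to keep it short. Everything here is hypothetical: the `Src` flags, the `CAPS` table and its entries are invented for illustration and do not describe real hardware encodings or any existing nouveau API.

```python
from enum import IntFlag


class Src(IntFlag):
    REG = 1       # plain register source
    LIMM = 2      # long immediate
    MOD_NEG = 4   # negate modifier


# Hypothetical capability table: for each opcode, the flags each
# source slot accepts. The entries are made up for this example.
CAPS = {
    "add": [Src.REG | Src.MOD_NEG, Src.REG | Src.LIMM | Src.MOD_NEG],
}


def is_ins_valid(op, *srcs):
    """Check an operand combination without building an instruction."""
    slots = CAPS.get(op)
    if slots is None or len(srcs) != len(slots):
        return False
    # Every flag requested for a slot must be allowed in that slot.
    return all((int(src) & ~int(allowed)) == 0
               for src, allowed in zip(srcs, slots))
```

The point of the shape is that the caller describes the operands it would like to use, for example `is_ins_valid("add", Src.REG | Src.MOD_NEG, Src.LIMM)`, and gets an answer before any instruction has been converted.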
01:37karolherbst: imirkin: well... at least we can have the add+add -> add3 optimizations without immediates as in other cases we already know we will be able to construct an add3
01:39karolherbst: mhh, wait. Only one immediate or c access :/
02:31karolherbst: imirkin: ohhh, we moved LoadPropagation after LateAlgebraicOpt, so we don't actually have to check for immediates :)
03:20imirkin: hmmmmmm... since we do jumps in a predicated manner, i wonder if we really need to split critical edges -- we can just insert mov's with the same predicates ... hm.
09:28CrystalGamma[m]: so … Maxwell 1 (GTX750) … what can I do with it using nouveau? specifically, without non-free firmware …
17:11imirkin: CrystalGamma[m]: what do you want to do with it? and what do you consider to be "non-free firmware"
17:14joepublic: non-free firmware is firmware not released under a free license. This is a consideration mostly because Linux-libre, parabola, trisquel, debian do not ship firmware that has a non-free license (no permission to reverse-engineer or to modify, for instance.) For debian you can manually modify your sources to include non-free, but for these others listed, it's not possible to do so. Thus, only hardware that works with no non-free firmware needed works in these environments.
17:14imirkin: and what is firmware?
17:15imirkin: like ... if a board shipped with a data blob (embedded on the board) that contained JVM bytecodes that were meant for the CPU to execute, would that be OK?
17:15imirkin: (it's not a purely hypothetical question, btw)
17:16joepublic: The fsf, whose standards these products and distributions follow (to whatever degree), maintains that something contained within a hardware device is part of the hardware. if it contained code to run on the host cpu, its license and terms would determine its software freedom.
17:17CrystalGamma[m]: tough question though, but I'd say no, bytecodes for the CPU are actually driver
17:17CrystalGamma[m]: and I would prefer not needing it
17:17imirkin: wouldn't we all
17:18imirkin: however the VBIOS contains initialization instructions in a high-level language which are meant to be interpreted on the CPU
17:18imirkin: (not JVM specifically, but i was just using something people would be familiar with)
17:19CrystalGamma[m]: I thought it was little more than tables for the memory map and power states and such?
17:19imirkin: the tables are there too
17:19joepublic: in the case of bios and other generally non-volatile things, they are considered to be freedom issues for the hardware, but are not software freedom issues unless they are loaded onto the card from the computer.
17:19imirkin: but it's also initialization code, as well as code meant to run on modeset
17:19imirkin: (and at reclock time iirc?)
17:20CrystalGamma[m]: I guess I'd put the limit at Turing-completeness …
17:21CrystalGamma[m]: but yes, data and code are sometimes hard to distinguish
17:22imirkin: CrystalGamma[m]: i haven't done the proof or anything, but i'm fairly sure it's turing-complete
17:23imirkin: you can write to memory, and have conditional jumps
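To illustrate why memory writes plus conditional jumps already allow looping computation, here is a toy interpreter. The opcodes (`write`, `add`, `sub_imm`, `jump_if_nonzero`) are invented for this sketch and are not VBIOS init-script opcodes; it is an illustration, not a proof of Turing-completeness.

```python
def run(script, mem, max_steps=1000):
    """Interpret a list of (opcode, *args) tuples against a dict 'memory'."""
    pc = 0
    steps = 0
    while pc < len(script):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")
        op, *args = script[pc]
        if op == "write":              # mem[addr] = value
            addr, value = args
            mem[addr] = value
        elif op == "add":              # mem[dst] += mem[src]
            dst, src = args
            mem[dst] += mem[src]
        elif op == "sub_imm":          # mem[addr] -= value
            addr, value = args
            mem[addr] -= value
        elif op == "jump_if_nonzero":  # conditional backward/forward jump
            addr, target = args
            if mem[addr] != 0:
                pc = target
                continue
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return mem


# Multiply 3 by 5 using only writes, adds and a conditional jump.
MULTIPLY = [
    ("write", 0, 0),            # accumulator
    ("write", 1, 3),            # value added on each iteration
    ("write", 2, 5),            # loop counter
    ("add", 0, 1),
    ("sub_imm", 2, 1),
    ("jump_if_nonzero", 2, 3),  # loop back to the add while counter != 0
]
```

Running `run(MULTIPLY, {})` leaves the product in memory cell 0, computed by a loop that the script itself controls.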
17:23CrystalGamma[m]: either way I'm just looking for something that is less blobby than modern radeon cards …
17:24imirkin: well, GM107 should mostly work
17:24imirkin: but there's a big delta between "mostly" and "completely"
17:24CrystalGamma[m]: yeah I recently bought one of those (hasn't been delivered yet though)
17:25imirkin: all sorts of software that has no need for GL-based accel is now using GL, so you get to hit issues a lot more often than you used to
17:25CrystalGamma[m]: oops I meant GK104
17:25imirkin: you said GTX 750, no?
17:25imirkin: that's universally GM107
17:25CrystalGamma[m]: GM107 I'm thinking about for lighter loads
17:26imirkin: GK104 should work ok too
17:26CrystalGamma[m]: AKA accelerated desktop
17:26imirkin: both of those support reclocking
17:26imirkin: and a GL driver
17:26CrystalGamma[m]: but so far reclocking is manual, right?
17:26imirkin: GK104 will also get you video decoding accel if you use non-free firmware.
17:26imirkin: yeah. which is a lot better than non-existent.
17:27imirkin: it's not 100% reliable, so making it automated is scary
17:27imirkin: a _ton_ of effort went into kepler reclocking though, so it's mostly good
17:27CrystalGamma[m]: ah, I was wondering why there was no simple policy mechanism yet :)
17:27CrystalGamma[m]: what kind of things should I expect if it fails?
17:27imirkin: system hang
17:28CrystalGamma[m]: system or just GPU?
17:28imirkin: depends, i suppose
17:28imirkin: but system at worst
17:28CrystalGamma[m]: hmm no worse than my Intel GM45 then :P
17:29orbea: most of the times I saw nouveau crash I could still ssh in.
17:29CrystalGamma[m]: that sometimes hangs the system when blanking …
17:29joepublic: ssh in and reclock to a lower state?
17:29orbea: ssh in and reboot
17:29imirkin: again, it depends on the specifics
17:29imirkin: sometimes you can do that
17:30imirkin: sometimes you can't
17:30imirkin: if the reclock just messes up e.g. display, then sure
17:30imirkin: if the reclock messes up the vram, then not so much
17:30joepublic: I have a GK104 that I can freely reclock, very satisfied with it, thanks for all you do
17:30imirkin: if the reclock messes up the PCIe state machine, then you get a system hang
17:30imirkin: (or causes a hang in it or whatever)
17:31CrystalGamma[m]: how much work do you think it would be if I wanted to make nouveau work with non-4k page sizes? (aside from the scary prospect of getting into kernel dev)
17:31orbea: also, I'm not sure I ever got a crash because of reclocking, more likely some program was doing things that made nouveau unhappy
17:53imirkin: CrystalGamma[m]: i think some recent rework has actually made that substantially more possible
17:53imirkin: CrystalGamma[m]: what's your target page size? 64K?
17:54imirkin: i wonder if one approach would be to just always use large pages, and set the large page size to 64k
17:54CrystalGamma[m]: I have a ppc64(le) system
17:54imirkin: large pages may be (globally) configured at 64 or 128k
17:54imirkin: we default to 128k
17:55CrystalGamma[m]: I mean it can operate with 4k pages obviously, but 64k is the default in some environments
17:55CrystalGamma[m]: is that GPU pages you are talking about?
17:55imirkin: having 4k gpu pages and 64k cpu pages just makes for some really awkward situations
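The mismatch is easy to see with plain arithmetic. This is only a sketch of the page-size ratios mentioned above (4k small GPU pages, 64k or 128k large GPU pages, 64k CPU pages); the helper names are made up, and it says nothing about how nouveau actually manages its mappings.

```python
KIB = 1024


def gpu_pages_per_cpu_page(cpu_page, gpu_page):
    """How many GPU pages it takes to cover one CPU page (at least 1)."""
    return max(1, cpu_page // gpu_page)


def cpu_pages_per_gpu_page(cpu_page, gpu_page):
    """How many CPU pages one GPU page spans (at least 1)."""
    return max(1, gpu_page // cpu_page)


def round_up(size, align):
    """Round size up to the next multiple of align."""
    return -(-size // align) * align
```

With 64 KiB CPU pages, one CPU page spans 16 small (4 KiB) GPU pages, so a 4 KiB system-memory allocation still rounds up to 64 KiB on the CPU side. Setting the GPU large page size to 64 KiB makes the ratio 1:1, while the 128 KiB default means one GPU large page spans two CPU pages.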
17:56imirkin: G80+ GPUs have their own VM
17:56CrystalGamma[m]: makes sense if you want bindless :)
17:56imirkin: which can allow GPU VA's to address either vram or system memory
17:56imirkin: well ... more like if you want to be able to seamlessly move resources back and forth between sysmem and vram
17:57imirkin: you can do bindless without any of that (as in ARB_bindless_texture)
17:57CrystalGamma[m]: wait … you can use textures over PCIe?
17:58imirkin: almost anything
17:58CrystalGamma[m]: that sounds really slow
17:58imirkin: it is
17:58imirkin: "doctor, it hurts when i do that"
17:58CrystalGamma[m]: and about bindless … how do you keep processes separate (basic security) without an MMU?
17:59imirkin: each process has its own GPU-side VM
17:59CrystalGamma[m]: well I expected worse honestly :S
18:00imirkin: but again, i don't see how this has anything to do with bindless
18:00imirkin: [and i implemented bindless support in nouveau, so i'd like to think i know at least a bit about it]
18:00CrystalGamma[m]: I thought in bindless the shader can just supply a pointer to the texture (or am I misunderstanding something?)
18:01imirkin: it's an opaque handle
18:01CrystalGamma[m]: so there is some indirection?
18:01imirkin: but even if it's a pointer (which at least for nouveau it isn't, although i'm thinking of redoing it)
18:01imirkin: then it would be a pointer into GPU VA
18:01imirkin: which is per-process
18:02imirkin: thing is, a texture is a lot more than an address
18:02imirkin: there's a whole descriptor structure that needs to be in a table
18:02imirkin: and all you do is provide an offset into that table
18:02imirkin: [on nvidia gpu's]
18:02imirkin: (stuff like width/height/format and a bunch of other harder-to-describe parameters)
18:03CrystalGamma[m]: makes sense
18:03imirkin: on e.g. radeon, it's a pointer to that descriptor structure in memory
18:03imirkin: on nvidia, you can't just supply those values willy-nilly, they have to be in a table
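The table-plus-offset scheme described above can be sketched as follows. The field names and the `DescriptorTable` class are hypothetical, chosen only to show that the handle is an index into a table of descriptors rather than a raw address; the real descriptor layout has many more parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureDescriptor:
    gpu_va: int   # address in the GPU virtual address space
    width: int
    height: int
    fmt: str      # stand-in for the real format encoding


class DescriptorTable:
    """A table of texture descriptors; handles are offsets into it."""

    def __init__(self):
        self._entries = []

    def add(self, desc):
        """Store a descriptor and return its handle (its table index)."""
        self._entries.append(desc)
        return len(self._entries) - 1

    def lookup(self, handle):
        """Resolve a handle; out-of-range handles are rejected."""
        if not 0 <= handle < len(self._entries):
            raise IndexError("handle outside descriptor table")
        return self._entries[handle]
```

A shader-side handle in this model is just the integer returned by `add`, and everything the sampler needs (address, width, height, format) comes from the table entry it selects. In the pointer-based variant, the handle would instead be the memory address of the descriptor itself.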
18:04imirkin: afaik the blob driver has a pointer to a descriptor which contains those offsets (and some other things which are necessary for a fully-correct implementation, which is why i'm thinking about redoing it in the first place)
18:05imirkin: [like if you have a 2d view of one of the layers of a 3d texture ... we don't support that.]
18:05imirkin: [but as i said before... "doctor, it hurts when i do that" ... "DONT DO THAT!"]
18:08CrystalGamma[m]: thanks anyway for now … I actually have to receive my card (well, a working one) before I can think about doing any development anyway (assuming I take time off my other projects)
22:25imirkin: skeggsb: let me know when you're around and have time to talk about how FIFO works on nv50
22:25imirkin: i'm noticing a number of differences against what i see in blob traces