00:06imirkin_: skeggsb: anything else exciting on your horizon?
02:44pabs3: imirkin_: the pstate thing worked fine. slight flash of black when changing pstate but no hangs
02:46imirkin: yeah, we still don't have it 100% right
02:46imirkin: you see some flashes
02:46imirkin: linebuffer or something
02:46pabs3: not a big deal IMO :)
02:46imirkin: until it hangs
02:47pabs3: oh, the flashes can be associated with hangs?
02:47imirkin: not sure.
02:47skeggsb: no, they're a result of not completing the clock change within a vblank period
02:47imirkin: should be a whole lot faster though, with 5ghz vs 600mhz vram
02:47imirkin: skeggsb: what's the linebuffer stuff then?
02:48skeggsb: to prevent that happening and give more time :P
02:48imirkin: right ok
02:48skeggsb: we can't sync to vblank at all on kepler, you need to configure isohub properly to give us a "ok, go now!" signal
02:49imirkin: those signals are gone from the pmu?
02:49imirkin: i guess they realized it was folly with multiple screens...
02:49skeggsb: yeah, that'd be my guess too
02:49skeggsb: the addition of the linebuffer works around that
02:50pabs3: I see, thanks for the info
02:50skeggsb: configuring that stuff is tricky, and hard to follow in traces... it's done in several (3? maybe?) different stages
08:02skeggsb: imirkin_: did you fix up demmt for the binary driver btw?
08:02kwizart: hello, I have an issue with nouveau on tegra armv7hl on 4.16.x kernels. I have the following message:
08:02kwizart: [ 1140.014947] nouveau 57000000.gpu: fifo: SCHED_ERROR 20 
08:04kwizart: this occurs many times per second, and the device is frozen (so I need to boot with modprobe.blacklist=nouveau)
08:10skeggsb: kwizart: i don't think any of us outside of nvidia employees actually has tegra boards setup, what'd be most useful is to bisect the kernel and track down when it broke
08:11skeggsb: that error doesn't make a lot of sense, it's referring to a feature we don't use in nouveau, so it's probably some kind of memory corruption issue
08:12kwizart: skeggsb, okay, I think the whole 4.16 is broken for me, I will try to bisect around the early 4.16 PR first and try to see (reported in #tegra also)
08:13kwizart: btw, unrelated, I saw recent pascal gpu got a nvdec firmware, are there plans to enable hw decode capability with these gpus at some point ?
08:13skeggsb: it's not proper firmware, unfortunately
08:14skeggsb: it's a small piece of code that has something to do with secure boot that they run on nvdec for some reason
10:23RSpliet: imirkin: clock change on single-monitor GT2xx should not give a flashing screen (apart from the first time on some DDR3 GT218s due to link training)
10:23RSpliet: I'd like to move that link training sequence to boot - where nobody cares, but when are we ever 100% confident about stability of this routine...
12:48karolherbst: imirkin_: currently I toy a little around with compiling builtins through codegen, but convertToSSA already messes things up: https://gist.githubusercontent.com/karolherbst/a7e9bb6e14fa3010ed3a1d0da8812e76/raw/f2da5f71b6f3e476fcea6dffd10ff61e06ba0e2e/gistfile1.txt
12:48karolherbst: it seems like the outputs are handled fairly well, but not the inputs
12:48karolherbst: any ideas?
12:50karolherbst: I guess I could just change convertToSSA in a way that it doesn't add those, because everything after that should be able to handle those just fine
12:57karolherbst: I tried tricking it with a NOP to the reg, but that didn't work out so well
13:07karolherbst: checking reg.id seems to be enough though
13:08karolherbst: mhh and a small RA fail
13:09karolherbst: mhh, that works here
13:10karolherbst: some opt removes the first mov
13:10karolherbst: LoadPropagation I guess
13:13imirkin_: skeggsb: i wasn't even trying ... i was trying to fix it up for nouveau
13:13skeggsb: imirkin_: oh, i thought you were doing both for some reason - nevermind!
13:13imirkin_: the plan is to eventually do both
13:13imirkin_: unless someone beats me to it
13:14imirkin_: at this rate, should be ready by the year 2025
13:24karolherbst: imirkin_: looks better already ;) https://gist.github.com/karolherbst/4319ffa785460a1176583f168e3c71ab
13:24karolherbst: although I think the sat is wrong..
13:24karolherbst: shl b32 $r2 $r3 clamp $r2 (what does the clamp do here?)
13:24imirkin_: i forget
13:25imirkin_: oh, whether the shift should be clamped or not?
13:25karolherbst: I suppose
13:25imirkin_: i.e. if you pass in a value of 37
13:25imirkin_: should it &0x1f, or clamp
13:25karolherbst: ohh, subOp = 0 is clamp, 1 wrap and 2 high
13:25imirkin_: not 100% sure though
13:25karolherbst: so clamp might be default
13:25imirkin_: it is.
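The two behaviours being weighed here for an out-of-range shift amount can be sketched as follows. This is a minimal illustration, not nouveau codegen: the function names are hypothetical, and reading "clamp" as saturating the shift amount at the register width (so a 32-bit left shift by 37 shifts every bit out) is an assumption about the hardware semantics.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def shl_wrap(value, amount):
    """Left shift using only the low 5 bits of the amount (amount & 0x1f)."""
    return (value << (amount & 0x1F)) & MASK32


def shl_clamp(value, amount):
    """Left shift with the amount clamped to 32 (assumed meaning of 'clamp')."""
    amount = min(amount, 32)
    return (value << amount) & MASK32
```

With a shift amount of 37, the wrapping variant shifts by 5 while the clamping variant yields 0; for amounts below 32 both agree.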
13:26karolherbst: sadly the binary isn't dumped for the library :(
13:27karolherbst: or is it?...
13:27karolherbst: "add $r2 (mul high u32 $r2 u32 $r3) $r2" fun
13:30karolherbst: ahh, that's imad.hi
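The nested "add (mul high ...)" form above computes the upper 32 bits of a 64-bit unsigned product and then adds a third operand. A rough sketch of that arithmetic, with hypothetical helper names and assuming unsigned 32-bit operands with wrap-around on the final add:

```python
MASK32 = 0xFFFFFFFF


def mul_high_u32(a, b):
    """Upper 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def imad_hi_u32(a, b, c):
    """mul-high followed by add, truncated to 32 bits (assumed wrap-around)."""
    return (mul_high_u32(a, b) + c) & MASK32
```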
13:31imirkin_: what are you trying to do?
13:33karolherbst: writing builtin code via codegen API
13:34karolherbst: and not uploading precompiled binaries
14:07karolherbst: imirkin_: nice, it works for u32 div :)
14:08imirkin_: hmmmmm ... why?
14:08karolherbst: it isn't that hard to fix codegen for this, only really minor issues
14:08karolherbst: or were you asking why I do this?
14:10imirkin_: why are you doing this
14:10imirkin_: i like the library thing... dunno
14:11karolherbst: 1. no manual sched calculations 2. we might benefit from opts we never thought of 3. free support for new ISAs
14:11skeggsb:likes the idea of not rewriting it by hand every time we add new SASS support
14:11karolherbst: 4. no code duplication 5. no annoyance regarding different envyas syntax
14:12karolherbst: still messy, but: https://github.com/karolherbst/mesa/commit/6ba49760b2fd9f492a4c7bed54de6465348cfd44
14:12imirkin_: the envyas issue is self-created
14:12karolherbst: who wants to fix it?
14:13karolherbst: but then again, you have to manually fix the sched opcodes
14:13karolherbst: maybe we get a better calculator later and have to redo all those
14:13karolherbst: I just see doing it in codegen as less work overall
14:18RSpliet: karolherbst: ... I imagine twenty steps in the future we could use shadercache in DDX if we want to avoid recompilation every single boot?
14:18karolherbst: imirkin_: but we already talked about all that some months ago regarding porting the fp64 stuff to maxwell+
14:19karolherbst: RSpliet: well we could also just use codegen as an offline compiler
14:19RSpliet: Yeah, but that's only 10 steps in the future :-P
14:19karolherbst: then you don't need to compile it at runtime
14:20karolherbst: maybe I combine all that with moving codegen out of gallium
14:21karolherbst: and then we can use nv50_ir.h as a C API and just write tools against that
14:21karolherbst: and one of them would generate the files for the builtins
14:21imirkin_: well hopefully the next arch will look similar to the maxwell one in terms of envyas syntax
14:21imirkin_: since it's based on nvdisasm
14:21RSpliet: envyas has the questionable benefit of being written as a compiler generator - meaning it tends to be a good source of documentation
14:22karolherbst: RSpliet: it shares the same stuff with envydis
14:22karolherbst: and we want envydis anyway
14:23RSpliet: karolherbst: yes. Not to mention it targets some other ISAs as well. Like fμc and HWSQ
14:23karolherbst: also later if we do proper scheduling we would want to rewrite those builtins maybe anyway. The divs are pretty trivial, but the fp64 are kind of complex
14:24karolherbst: and I don't want to go through all the asm builtin things every time we fix a bug in the sched code calculator as well
14:24karolherbst: or improve things
14:24karolherbst: or whatever
14:24karolherbst: and with codegen we get all this for free
14:25RSpliet:is all in favour ;-)
14:25RSpliet: I bet you'll find a nice selection of codegen bugs in the process too!
14:25karolherbst: not really
14:25karolherbst: I think I already hit all I assumed I would hit
14:26karolherbst: well the LoadPropagation bug was a small surprise
14:28imirkin_: pendingchaos: btw, i was going to test your patches, but my box at home died. i think i have to reseat the cpu heatsink.
14:28imirkin_: so ... i haven't forgotten, just some temporary trouble :)
14:28imirkin_: (and i decided i'd rather go to sleep and deal with it over the weekend)
14:28pendingchaos_: I think I'll try to get NVC0_3D_MACRO_QUERY_BUFFER_WRITE to work with predicate 64-bit outputs and optionally write the availability of the query's result instead while I'm at it
14:29pendingchaos_: I'll probably end up passing some parameters through scratch
14:29imirkin_: pendingchaos_: iirc it supports the availability thing ... somehow. i forget.
14:29imirkin_: pendingchaos_: yeah, that was the thing - i ran out of registers :)
14:29imirkin_: and i so didn't want to use scratch
14:29imirkin_: note that macros can also write to scratch registers
14:29imirkin_: (and then read back out)
14:29imirkin_: i think i do that in one of the indirect macros
14:30imirkin_: where there were just too many values to hold at once
14:30pendingchaos_: it checks the availability, but I don't think it can write it
14:30imirkin_: ah, maybe not
14:30pendingchaos_: "exit braz $r6 #qbw_ready"
14:30pendingchaos_: it exits if the result is not available
14:31imirkin_: well, also keep in mind that additional macros are cheap (until we run out of global macro code space, but i don't think we're anywhere in the vicinity)
14:31imirkin_: so having a separate similar one would not be out of the question
14:31karolherbst: skeggsb: btw, are you able to do any mmt stuff with the newest nvidia drivers?
14:31karolherbst: I just get an empty trace
14:31imirkin_: karolherbst: update your valgrind
14:32imirkin_: new blob uses openat instead of open
14:32imirkin_: patch courtesy of pendingchaos_
14:32RSpliet: I've been restricting my workstation to 367.27 for ages because I couldn't trace w/ anything newer
14:32karolherbst: will try out later!
14:33karolherbst: pendingchaos_: I hope you keep up with fixing mmt bugs :p
14:33RSpliet: two ages, to be precise
14:33pendingchaos_: note that I don't think demmt works with newer blobs
14:34RSpliet: Newer blobs or newer GPUs?
14:34karolherbst: yeah, I kind of think so as well
14:34karolherbst: they changed something
14:34RSpliet: For 367.27 I was able to use demmt properly for cards < GK110
14:34karolherbst: I couldn't trace stuff even before this switch apparently
14:34pendingchaos_: with my pascal gpu, 375.26 works fine, 390.25 doesn't work with demmt
14:35pendingchaos_: so I think newer blobs
14:36RSpliet:shakes his fist at NVIDIA like a 70yo senile
14:36pmoreau: There was them adding an extra field in the structure used for identifying the chipset, but there is probably more to it.
14:39karolherbst: RSpliet: :D
14:46imirkin_: demmt is variously broken for newer blob versions due to differing ioctl's
14:55karolherbst: mhh, seems like more bugs: "mov u32 $r1 $r0 (8) mov u32 $r0 $r1 (8)"
14:55imirkin_: there are fixed var movs
14:55imirkin_: that you have to remove
14:57karolherbst: The input for RA is like this: "mov u32 %r12 $r0 (0) mov u32 %r13 $r1 (0)" I kind of hoped that would be enough?
14:58karolherbst: ohh, you mean I have to remove those?
14:58imirkin_: they implement the calling convention
14:58imirkin_: since you're not calling anything, you have to remove them, along with the clobbers
14:58karolherbst: imirkin_: is there something which has to be done to mark values as function inputs?
14:59imirkin_: fixed movs
14:59karolherbst: ahh so ->fixed = 1 as well as using moveFromReg?
14:59imirkin_: moveFromReg / ToReg
14:59imirkin_: fixed = 1 is different
14:59karolherbst: okay, I am already using those
15:00imirkin_: you shouldn't be.
15:00imirkin_: if you're inlining the implementation
15:00imirkin_: note that the division stuff is already implemented in the nv50 lowering
15:00karolherbst: I don't plan to
15:00imirkin_: although it's done differently
15:00imirkin_: since nv50 doesn't have the same instruction set
15:00karolherbst: mhh, why is there both then?
15:01imirkin_: because library seemed (and, to me, still seems) like a good idea
15:01karolherbst: yeah, I am not against having a library. It would be just much nicer to write that with the nv50ir API we have
15:01imirkin_: the one-time cost of doing this thing per-isa is fairly low
15:02karolherbst: only if you don't keep later adjustments in mind
15:02imirkin_: it's stable
15:02karolherbst: or if performance doesn't matter
15:03karolherbst: I wouldn't mind if we only do the compilation manually or something. I am more interested in getting all the benefits of codegen and later improvements basically for free
15:04karolherbst: and that we can easily recompile on demand
15:04karolherbst: and demand might be: improved passes/sched opcode calculations/whatever
15:05karolherbst: and with passes I also mean instruction scheduling and so on
15:05imirkin_: this is all nice in theory
15:05imirkin_: but ... these are just super-stable implementations
15:05imirkin_: they don't change
15:05imirkin_: and they are hand-optimized
15:06imirkin_: you're investing a bunch of effort for what seems like no reason
15:06imirkin_: there are hundreds of things to do, and you're picking the things that already work
15:06skeggsb:isn't getting the "no reason" thing, karolherbst has listed heaps of good reasons
15:06karolherbst: I don't really think it is so much effort
15:06imirkin_: it's not
15:09imirkin_: you could, instead, create a scheduling pass. this would be tremendously useful.
15:10karolherbst: yeah, but a lot more work
15:10imirkin_: you could figure out how to fix the RA bugs that arise from spilling merged nodes
15:10karolherbst: I was searching for a 10 hours task on my train ride home ;)
15:10imirkin_: these are actual issues that actually cause problems
15:11imirkin_: (and/or should yield significant benefits)
15:11karolherbst: yeah I know, but right now I don't have the head for digging into the scheduling thing. I actually want to work on the scheduling thing quite soonish
15:11imirkin_: you could work on a kernel uAPI to allocate bo's at a specific VA
15:11karolherbst: skeggsb wants to do that :p
15:11imirkin_: skeggsb wants to do lots of things
15:12karolherbst: well I am optimistic that we might end up with a vulkan driver this year, it is quite high on the priority list
15:12imirkin_: unfortunately his eyes are bigger than his stomach -- too many things on his plate means it can take a long time for any particular thing to actually happen
15:58pmoreau: Speaking of Vulkan driver, any news from the student who wanted to work on it for GSoC?
15:59pmoreau: And any news from the other GSoC interested student (I can’t remember what the student wanted to work on though)?
19:31karolherbst: pmoreau: uhh. my message didn't make it through. No news afaik. Maybe mupuf knows more
19:32karolherbst: but due to the lack of communication I am kind of leaning towards a no even if that student fulfils the requirements
19:32pmoreau: Okay; I’m not going to mentor anyone, but I was just curious about the current status.
19:32pmoreau: Sounds reasonable, and I think the proposal deadline for the students is quite soon.
19:33karolherbst: the deadline was already
19:33karolherbst: March 27 16:00 UTC
19:35pmoreau: BTW, I tried your updated OpenCL branch, and it definitely works better.
19:35pmoreau: I do have a couple of SPIR-V fails in the spirv_to_nir code, for some of the tests with structs in global mem.
19:35karolherbst: would be bad if it would be worse than my older branches :p
19:36karolherbst: pmoreau: yeah... pointer stuff is still annoying
19:36karolherbst: robclark wants to rework it
19:36karolherbst: and also add support for generic pointers
19:36karolherbst: pmoreau: but structs inside global mem should work in most cases though
19:37robclark: karolherbst, well.. I kinda already did.. half-way at least ;-)
19:37robclark: (just need to do the lower_io part of it)
19:37karolherbst: robclark: right :)
19:37karolherbst: you mean the parameter loading?
19:37karolherbst: or general derefing?
19:38robclark: actually I don't think we need to add anything to load fxn param anymore (with the deref-instr stuff)
19:38robclark: it was already added
19:39karolherbst: huh? you mean in master?
19:39karolherbst: or did you mean the changes I've made?
19:40karolherbst: I kind of expected some changes there if you have deeper deref chains, but maybe that just works now (tm)
19:40karolherbst: because you just load the base + deref instructions
19:41pmoreau: karolherbst: https://hastebin.com/tiyacicaqi.hs on GK107
19:42robclark: karolherbst, not in master, in nir-deref-instr-vN branch
19:42robclark: (which I'm rebasing pointer stuff on.. since it is *soo* much saner)
19:42karolherbst: robclark: yeah okay, I lost track of which patches are inside which branches :)
19:44robclark: karolherbst, fwiw https://github.com/freedreno/mesa/commits/nir-deref-instr-v3 needs some cleanup, but is more or less working (ie. just disable nir_validate since it has some issues still) if you want to try nvir/nir stuff with that..
19:45karolherbst: I think I would need to make some changes for nouveau nir anyway... because all the image stuff should use derefs now with that, I think
19:46karolherbst: I need to focus more on getting one thing done though :D
19:49karolherbst: robclark: but there is no OpenCL stuff on that branch, or is there?
19:52robclark: karolherbst, no.. that was just if you wanted to see if the deref instr had any impact on nvir
19:53robclark: fwiw, for ir3 it was pretty minor (but it will be somewhat more for radv/radeonsi since they translate deref's to llvm.. although I guess the result will be simpler than what they have now)
19:59karolherbst: imirkin_: I think we really have to support 64 bit bindless handles, because you can just do random math on that
20:11imirkin_: karolherbst: sure
20:11imirkin_: certainly in nir you do
20:12imirkin_: in nouveau, it just has to support the handles that the driver produces
20:13karolherbst: but what if you do >> 0x10 on the handle ;)
20:13imirkin_: it's 64-bit until it's fed into tex
20:13karolherbst: mhh, right
20:13karolherbst: I forgot about that
20:13imirkin_: tex just ignores the high word
20:13imirkin_: coz it's never anything useful
20:13karolherbst: ohh, okay
20:14imirkin_: (because of what values i return for tex/img handles)
20:17karolherbst: imirkin_: what though if something sets the upper bits on purpose and we just don't fail or something
20:20imirkin_: program termination is allowed.
20:20imirkin_: but not required :)
21:02stratact: I'm planning to upgrade to a gtx 1080, does nouveau work with that card or should I use the nvidia drivers?
21:04mupuf: stratact: well, it will work, but performance will be terrible
21:04mupuf: and not all features are there
21:05pmoreau: But you can control the LEDs (if it has some)!
21:09stratact: I guess I'll go with the nvidia driver then
21:19imirkin_: you definitely want to use nvidia driver for any production use
21:19imirkin_: (or even better - get an AMD board)
21:28lachs0r: stratact: I’d avoid nvidia if I were you. the proprietary drivers are getting worse and worse
21:29lachs0r: 4.16 broke them again, and they’re already causing random freezes and data loss for me under specific circumstances
21:32stratact: lachs0r: oh okay, good to know
21:35stratact: Sigh, it seems like everywhere things are breaking. Especially with amdgpu and DC. I've tried the Vega but I had issues with the performance in Budgie and sometimes when waking up my monitors from being blank, I either end up with a glitched X session or loss of display which forces me to do a hard reset...
21:37imirkin_: stratact: work with the amd guys to fix your problems
21:38imirkin_: they're responsive
21:40stratact: imirkin_: I've already returned the card, so I don't have it anymore
21:41imirkin_: stratact: ok. well with nouveau, you're going to have a not-great experience. it should work though.
21:42stratact: hmm :/
21:44imirkin_: dunno what the blob driver situation is. i hear people complaining, but i suspect people for whom it works don't run around singing its praises
21:47lachs0r: tbf it works okay most of the time
21:47pendingchaos_: the blob works well for me (with Fedora 27 w/ GTX 1060) as long as I keep it updated so it works with the current kernel
21:47lachs0r: if you need to switch displays on and off… beware
21:48lachs0r: (standby/dpms is not an issue though)
21:48pendingchaos_: the boot screen and virtual terminals look rather ugly with it though (the terminals are also a little small iirc)
21:48imirkin_: doesn't it have some kind of drm adapter now?
21:48imirkin_: so that fbcon can modeset
21:48lachs0r: well last time I tried that it was very unstable
21:49lachs0r: but it might work nowadays
21:49lachs0r: don’t think nvidia has tried very hard though, given that it’s not used by default
21:49imirkin_: ah i see
21:50lachs0r: I also tried sway (wayland compositor) with nvidia during the brief period that it attempted to support it. ran at 0.2 fps and crashed the kernel on logout
21:51lachs0r: I don’t think nvidia cares about people who are not running centos and use their linux support only to do big enterprise things
21:52lachs0r: there’s no real incentive for them to fix things anyway
21:53lachs0r: impossible to get official support for anything if you’re a mere mortal :D
21:53lachs0r: I tried to remind them that they had enabled 30bpp support for linux on all geforce 8+ GPUs when their drivers truncated everything to 8 bits anyway, but only on newer models
21:54lachs0r: 3 months and 4 sample programs (even compiled the binaries for their ancient systems) later they finally ignored me
21:55lachs0r: the issue did eventually get fixed, but I don’t think it had anything to do with the bug I filed
21:56lachs0r: complete opposite of what I’ve seen with open source drivers (those which are in mainline kernel anyway)
21:57stratact: which nvidia blob version should I use with the 4.16 kernel?
21:58lachs0r: I just mentioned it doesn’t work with 4.16 yet
21:58lachs0r: you need to stick to 4.15.x until someone patches the module for nvidia
21:59imirkin_: lachs0r: well, 30bpp on pre-kepler is a bit odd -- hard to get more than 8 bpc out of the hw
21:59lachs0r: imirkin_: fair enough, but you at least get *dithered* 8 bits
21:59lachs0r: on newer hardware it ignored the dithering setting
21:59imirkin_: yeah, that one's easy to get wrong
21:59imirkin_: it's all *very* confusing
22:00lachs0r: apparently the LUT size was fixed to 8 bpc or something
22:00lachs0r: I don’t remember the details
22:00imirkin_: coz you have a 10bpc buffer, an 8bpc LUT, and a 12bpc HDMI thing
22:00imirkin_: pre-kepler can only do 8bpc HDMI i think
22:00imirkin_: and it's an 8-bit lut, but can have more dynamic range than 8bpc
22:01imirkin_: it's all quite confusing.
22:01imirkin_: i think i kinda have my head wrapped around it all
22:01lachs0r: that sums up nvidia hardware
22:01imirkin_: just have to go and implement it.
22:01imirkin_: mmmm ... more like the problem space.
22:01lachs0r: I guess. colors are hard
22:01lachs0r: nobody gets it right
22:03imirkin_: yeah, just different levels of wrong :)
22:04lachs0r: ← uses a color-managed video player…
22:04imirkin_: problem is that there's not a lot of good information about all this stuff
22:04lachs0r: can’t even make a *proper* screenshot without browser users complaining that the blacks are crushed
22:04imirkin_: so someone just coming into it can very quickly and easily make rookie mistakes
22:05lachs0r: because of course browsers pretend all displays are sRGB in the absence of a profile
22:05lachs0r: but video players don’t really do that
22:06lachs0r: I wonder if chrome still uses bt.601 on all video except when it’s youtube with its custom metadata
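Why the choice of matrix matters can be seen from a small sketch of the YCbCr to RGB conversion. The luma coefficients below are the published BT.601 and BT.709 values; the function name, the normalized full-range inputs, and the dictionary layout are illustrative assumptions, not how any particular browser or player implements it.

```python
# Luma coefficients (Kr, Kb) from the BT.601 and BT.709 recommendations.
COEFFS = {"bt601": (0.299, 0.114), "bt709": (0.2126, 0.0722)}


def ycbcr_to_rgb(y, cb, cr, matrix):
    """Convert normalized full-range YCbCr to RGB.

    y is in [0, 1]; cb and cr are in [-0.5, 0.5].
    """
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    # Solve y = kr*r + kg*g + kb*b for g.
    g = (y - kr * r - kb * b) / kg
    return (r, g, b)
```

Neutral grey decodes identically under both matrices, but any pixel with non-zero chroma comes out with a different colour if the wrong matrix is assumed.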
22:09imirkin_: and then you start worrying about whether you're using rgb or yuv data over hdmi
22:09imirkin_: definitely not confusing at all.
22:10lachs0r: well I’m absolutely sure my mpv is more competent at yuv → rgb color conversion than whatever secret sauce is in TVs :>
22:11lachs0r: but then there are those silly display controllers that convert back to yuv, do some fancy filtering, then convert to rgb for display
22:11imirkin_: but that's my point
22:11imirkin_: you could be handing an rgb buffer to the card
22:11imirkin_: but then it uses a yuv 4:2:0 mode to talk to your screen
22:11imirkin_: and does the conversion
22:11lachs0r: oh urgh that
22:11imirkin_: like i said, "not confusing at all" :)
22:12lachs0r: or you get stuff like with the raspberry pi
22:12lachs0r: which might default to limited range rgb no matter what the display tells it to do
22:12lachs0r: so you get washed out colors
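The washed-out effect follows from the range mapping itself. A minimal sketch for 8-bit values, assuming the usual 16..235 limited range; the function names are hypothetical:

```python
def full_to_limited(v):
    """Map an 8-bit full-range code (0..255) to limited range (16..235)."""
    return 16 + round(v * 219 / 255)


def limited_to_full(v):
    """Inverse mapping applied by a sink that expects limited range.

    Codes outside 16..235 are clamped to the 0..255 output range.
    """
    return max(0, min(255, round((v - 16) * 255 / 219)))
```

If the source sends limited range but the display treats the signal as full range, black arrives as code 16 and white as 235 and neither is expanded, so blacks look grey and whites look dim. The opposite mismatch crushes blacks and clips whites.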
22:13lachs0r: been told this happened to many windows users with hdmi displays as well, not sure if amd or nvidia drivers or both
22:13lachs0r: including overscan fun
22:15lachs0r: it’s quite strange how much we pretend our digital displays are analog CRTs
22:17lachs0r: what the heck are front/back porch and all that stuff even good for on a digital display