07:12 mardikene: morning guys, let us mildly also talk how instruction arbitration works, fetch LSU and decode stages are handled via wavefront queues, they have a dispatcher which allocates wavefronts/warps dynamically, next thing is whole CU opcodes are stored in issue flops
07:14 mardikene: there they get arbitrated in FIFO manner whose instructions only are executed in the order from the issue flops, they are executed only if the operands have ever changed
07:14 mardikene: the underlying data out value
07:15 mardikene: that works so that opcode is fetched from input_flops to PS_issue_alu_flops.c and serially provided to alu_control which finally executes it, before LSU opcode decoder redirects pointers and opcode manager uses the new incoming pc
07:16 mardikene: on AMD one CU has 512 opcode flops in issue0 that belongs to that CU
07:20 mardikene: those details actually should not matter much, but it is just for confirmation or illustration purposes
07:20 mardikene: final code to exploit the hw, is actually long since known by most, and does not need very much knowledge of the hw innerworks, as it is actually pathetically thin
07:25 mardikene: the whole point is, if you had something among LSU opcodes to change the value of pointer flops in private or dest pools, i.e. private and global persistent respectively, then the instruction executes, otherwise the fifo will skip it
07:25 mardikene: and it can be done based of the VADDR in the handler, so every reg is unique by then
07:25 mardikene: why so, because hw flops this address
07:28 mardikene: in miaow code those flops are stored in LSU, issue and finally in vgpr module
07:30 mardikene: respectively in 2to1 mux that flips the regfile source and destination data, so private reg operand1 is the virtual address and operand2 is the data
07:39 mardikene: but those details are slightly more complex, it will fill automatically a VADDR reg that you had marked with the data from previous iteration, readback stage works in interleaved fashion can be viewed also as fifo, but it does not work entirely parallel, since two readbacks can collide
07:40 mardikene: so once you get that virtual address, you'd really want to have no physical location to it, i.e pagefault the texturelookup, and level1 indirection overwrites the data value
07:46 mardikene: it is worthwhile to remember that the private pool can only be accessed with pointers and writebacks can be broadcast from those pointers
07:47 mardikene: without pointers they would need wavefronts to be present though
07:48 mardikene: but then the final private reg is unique and you can not play with it unless you captured it
07:49 mardikene: usable pointers/arrays are not a concept that came along with opencl, but with arb fragment program already, i see even 945gme had those, just a texture lookup
07:52 mardikene: embarrassingly enough it was also me, with my longer career, who messed this thing up too, quite badly
07:53 mardikene: i got hazardous and did not spot it either back then, and was going crazy
07:54 mardikene: but it seemed that most except the hw designers or whoever aaron lefoun was , badly missed or messed up the concept there
07:55 mardikene: lefon , damn
07:56 mardikene: anyhow i was under extreme pressure, and i did not spot that paper which was best paper of 2005 and its preceding one, and quite frankly had no hw experience either
07:59 mardikene: well despite that i messed it up being younger, i want newcomers to know how to write code correctly in graphics world
08:04 mardikene: and as you figured there are two ways of ensuring that, make the end user developer more aware, or even better way, force the correct on transparently from driver
08:10 mardikene: i did look at brookgpu back then and figured something, but largely i screwed up as expected under pressure cause of losses that i did not handle back then
08:14 mardikene: if i only started 10 years before as originally planned with educating myself, i wouldn't have, i am a smart guy, but those plans were ruined, and it got intense for me, i lost the momentum everywhere due to this, also had to use my own precious money while being very disappointed, anxious and stuff, to do the research, grand total was expected failure
08:14 mardikene: i dunno why others messed up, but surely you know that yourself somehow too
08:17 mardikene: knowledge and experience come with exercising or studying, it's best to start from early teenage, so the pressure is off later to a larger degree, because it is well handled there, but if at that time large humiliation is faced instead, i did expect to have drawbacks too
08:24 mardikene: living according to plans is something i was criticized about, i liked to have clear plans of what i try to achieve from young ages, all-round attack different problems to be solved, to grow up as not a doll but mature, i had physical performance covered early, but it has disadvantage to go wanting to achieve according to plans
08:24 mardikene: people can build fences and hurdles to block you, cause it's all readable that one wants to achieve yet more
08:32 mardikene: thanks anyone who felt like i should learn how difficult it is to lose all days, thanks dads, i did not want to feel that but thanks anyways, some have coded this to organs, eurosport commercial showed it too, losing is pretty pathetic thing to do, show me the second, i'll show you the loser
08:59 mardikene: how fast it goes from 512 opcode flops skipping stuff...yeah as they have instances and execution is present only from the core when operands did in fact change, vgpr flops are parallel, yeah very fast
09:06 mardikene: GLSL compiler has had all the features for a longer period, as inlining/flattening and stuff like that, compiler wise i do not have much to try to elaborate on, cause this you all know, the precompiler stuff is more code, but it's not difficult, there is not much point
09:06 mardikene: to stop on elaborating this stuff much
09:14 mardikene: and the thread-safety stuff i talked about, it's os that gives mesa heap and stack and such, concepts of thread private and shared memory, all should know that, when you map the heap dynamic allocation to the same address is considered to be shared, stack that mesa gives is hence either private or shared depending how dispatcher pins it
09:15 mardikene: shared memory needs locking, cause threads access the same memory, private memory needs no locking in the application
09:33 mardikene: what i expect is the stack being thread private, copied so by default, and only global and static variables are exceptions, i can not remember where they considered heap allocations, probably in that form anyways
09:45 mardikene: however i am not sure, it's depending on i think if the dispatch is tls or not
10:04 mardikene: was it something like a shared context right, this is what opengl mandates what objects can have shared state, if i were to design the multithreading i would do that via dispatcher
10:04 mardikene: and tls, i.e. threaded dispatcher, i dunno or vaguely remember, perhaps mareko already did this
10:05 mardikene: i do not attend in mailing lists, but perhaps i saw something on phoronix
10:08 mardikene: long story short, i do not need that cpu multithreading much later anyways in opengl
10:13 mardikene: my intended linker don't care about cpu threads, it works on programs x86/arm disassembly, and precompiles stuff upfront ignoring the threaded code if needed
10:53 mardikene: if someone meant where the power save stuff comes from, then again, you can execute a flopped alu with pointers without wavefront and dispatcher being in use, they entirely sleep later, so once again an instruction can be executed either by wavefronts or without .. then using pointers instead
10:53 mardikene: so if major blocks do not do any work, then they do not use any power , but that is more sophisticated
11:00 mardikene: as you see by default it starts with fetch, then decode , then dispatch from ibuffer that fetched from i-cache if needed all those parts will stall
11:01 mardikene: but i go now, so soon i'd say happy gaming computer enthusiasts your dream was noted ;) , bye.
12:50 imirkin: skeggsb: don't forget about my hwmon patch. less significant than your recent push, but important to me :)
13:00 imirkin: skeggsb: hm, so a bunch of those "virtualize" commits should actually bump perf a bunch on non-totally-shitty boards, right?
13:02 skeggsb: not a damn clue.. presumably nvidia go to such trouble for *some* reason
13:07 karolherbst: skeggsb: how far did you go with testing on various different chipsets? Looks like you touched quite a lot of fermi+ code
13:07 skeggsb: i tested everything i have, compared against nvidia, and did full piglit runs on most too
13:07 karolherbst: okay
13:08 imirkin: skeggsb: looks like a handful should probably also be directed to stable
13:08 imirkin: e.g. ttm: don't dereference nvbo::cli, it can outlive client
13:08 skeggsb: yeah, i planned on it
13:08 imirkin: coolio
13:09 karolherbst: does anybody know how patches are selected for stable? I get mails quite often that some of my patches went to stable branches without me doing anything about it...
13:09 karolherbst: auto picked?
13:09 imirkin: someone nominated
13:09 karolherbst: ohh, okay
13:09 imirkin: and you didn't object loudly
13:10 karolherbst: I am actually surprised, because even my mmiotrace fix got picked
13:12 imirkin: also if you stick in 'cc: stable'
13:22 imirkin: skeggsb: hah, nice you added overlay on nv50+! did you test it out at all with modetest (i.e. does it work)?
13:22 skeggsb: yes, it works
13:22 imirkin: neat
13:22 skeggsb: not that its terribly useful
13:23 imirkin: no. but that's not your fault :)
13:23 skeggsb: but i wrote it before i had volta hw as NVDisplay windows required similar changes
13:24 imirkin: do you restrict the scaling somewhere?
13:24 imirkin: or do you just allow it to not work / work poorly?
13:24 skeggsb: yeah, the call to some atomic helper will check it and fail
13:24 imirkin: ah
13:25 imirkin: right. that little friend.
13:25 skeggsb: it doesn't work on NVDisplay yet either, which surprised me a bit
13:25 skeggsb: the class supports it, the HW capabilities say no scaler though...
13:25 imirkin: it does support h-scaling though, no?
13:25 skeggsb: EVO does, yes
13:25 imirkin: which is actually more common than you'd think
13:25 imirkin: like DVD's that were encoded at 720x480
13:25 imirkin: but the underlying image had a different AR
13:26 skeggsb: oh sorry
13:26 karolherbst: 720x576?
13:26 skeggsb: no, evo does width-only scaling
13:26 imirkin: karolherbst: in PAL country :p
13:26 karolherbst: :D
13:26 karolherbst: right
13:26 imirkin: skeggsb: is AU PAL or NTSC? Or something crazy like SECAM?
13:26 skeggsb: PAL
13:27 imirkin: and 50hz a/c?
13:27 skeggsb: something like that, yeah
13:27 imirkin: never checked it yourself? :)
13:27 skeggsb: can't say as i have!
13:28 skeggsb: 240v 50/60hz is written on shit though
13:28 imirkin: huh, surprising
13:28 imirkin: i guess most stuff doesn't care nowadays
13:28 imirkin: probably 240/50 is the "native". in us it's 120/60
13:28 imirkin: pal is 50hz, ntsc is 60hz
13:28 imirkin: so yeah.
13:29 imirkin: anyways ... 720x480 scales out into like 640x480 for 4:3 content
13:29 skeggsb: i may also be misremembering :P you're probably right
13:29 imirkin: and ... something wider for 16:9 and more movie-friendly AR's
13:29 imirkin: but of course as screens are now also occasionally higher than 480 pixels, vertical scaling might be nice too
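The width-only scaling described above can be sketched in C. This is only an illustration under assumptions: the helper name `display_width` is hypothetical, and it assumes the stored height is kept while the width is derived from the display aspect ratio, rounded to the nearest pixel.

```c
#include <assert.h>

/* Hypothetical helper: width at which a stored frame should be shown,
 * keeping the stored height and applying the display aspect ratio
 * ar_num:ar_den. Rounds to the nearest pixel. */
unsigned display_width(unsigned stored_height,
                       unsigned ar_num, unsigned ar_den)
{
    assert(ar_den != 0);
    return (stored_height * ar_num + ar_den / 2) / ar_den;
}
```

For a 720x480 frame this gives 640 pixels for 4:3 content and 853 pixels for 16:9 content, so the same 720 stored columns are squeezed or stretched horizontally depending on the intended aspect ratio.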
13:31 imirkin: all this aside, congrats on polishing up the gv100 support!
13:31 skeggsb: i wouldn't say it's polished, but it "works" :P
13:31 imirkin: more polished than before you had pushed :)
13:33 skeggsb:now just has to get the GL support out
13:33 karolherbst: skeggsb: good luck with that :p
13:33 karolherbst: but could we kind of have the same internal API for the emitter?
13:34 karolherbst: mhh
13:34 karolherbst: well
13:34 skeggsb: what do you mean same internal api?
13:34 karolherbst: we have to rework a lot of peephole anyway
13:34 karolherbst: skeggsb: well all the emitters are kind of different
13:34 skeggsb: i've copied gm107's, and been modifying it
13:34 karolherbst: okay, should be good enough
13:34 skeggsb: for the new encodings
13:37 karolherbst: imirkin: ohh, ping on the RA regression patch
13:37 imirkin: fuck, forgot about it :(
13:37 imirkin: i don't even remember any of the details by now
13:38 imirkin: and i don't have time, about to head out
14:40 mardikene: i forgot, so why do we need dest/persistent pool lsu flops in hw, i answered that too on radeon?
14:41 mardikene: let us imagine we capture free lane worth of data in to the buffer as growing stack worth of regs
16:57 karolherbst: imirkin: any bug you would rate higher than the bug we hit with for example Plasma or having too many GL applications at the same time?
16:58 imirkin_: define 'rate'
16:58 karolherbst: importance
16:58 imirkin_: to whom? :)
16:58 karolherbst: users
16:58 karolherbst: :p
16:58 imirkin_: i don't use plasma, so ... that one's totally irrelevant.
16:58 karolherbst: well, it causes issues elsewhere anyway afaik
16:58 imirkin_: anyways
16:59 karolherbst: or piglit doesn't work reliably
16:59 karolherbst: if run multithreaded
16:59 imirkin_: there's the concept of "bang for the buck"
16:59 imirkin_: there are some outright horrible bugs which are ... fixable
16:59 imirkin_: there are also some horrible bugs which aren't fixable without rewriting all the command-handling code
17:00 karolherbst: well I would like to fix some core issues next, which tackle fundamental issues and kind of affect everybody
17:00 imirkin_: (although such a rewrite would obviously learn from our earlier implementation, so wouldn't all be from scratch)
17:00 imirkin_: ok, well, rewriting things so that we don't use libdrm_nouveau would be a biggie
17:00 imirkin_: (and fixing threads/etc while you're at it)
17:01 karolherbst: what would be the big benefit of not using libdrm_nouveau?
17:01 imirkin_: fixing threading :)
17:01 karolherbst: well..
17:02 karolherbst: wouldn't it also mean we can't have custom syscalls? or what would be the replacement for libdrm_nouveau? or how does it differ/integrate with libdrm?
17:02 imirkin_: it's entirely separate from libdrm
17:02 pmoreau_: skeggsb: That was quite a few patches pushed out! :o I’ll try to test them on my MCP79 laptop, though it shouldn’t be that different from the Tesla cards you tested, as you didn’t touch the memory subsystem this time.
17:02 imirkin_: other than the name, there's no connection.
17:03 karolherbst: imirkin_: I see
17:03 imirkin_: skeggsb: btw, are you still tracking that vmm bug which appears to still persist for people?
17:03 imirkin_: [something like rounddown vs ALIGN_DOWN]
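For context on why those two kinds of helper are not interchangeable, here is a minimal sketch. The two functions are local stand-ins written for illustration, not the kernel's macros, and the log does not say whether this difference is the actual cause of the vmm bug.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins written for illustration; not the kernel's macros. */

/* Modulo-based rounding: correct for any non-zero step. */
uint64_t round_down_any(uint64_t x, uint64_t step)
{
    assert(step != 0);
    return x - (x % step);
}

/* Mask-based rounding: only correct when align is a power of two. */
uint64_t align_down_pow2(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}
```

Both agree for a power-of-two alignment such as 16, but for a step such as 12 the mask-based version silently returns a value that is not a multiple of the step.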
17:04 karolherbst: imirkin_: okay, and I guess with libdrm we have to implement some kernel functions to be able to use that and rewrite our winsys stuff?
17:04 imirkin_: errr ... not sure what you mean
17:04 imirkin_: but yes, we'd have to write a winsys
17:05 imirkin_: it'd be a substantial rewrite of all the infra
17:05 imirkin_: libdrm_nouveau is unfortunately woven into the fabric of the current driver
17:05 karolherbst: I always thought that the libdrm_nouveau thing is just an addon to libdrm to use custom syscalls or something
17:05 imirkin_: it is...
17:05 imirkin_: but the interface it presents causes us all sorts of trouble
17:05 imirkin_: it was more designed for xf86-video-nouveau
17:06 karolherbst: I see
17:06 imirkin_: where it performs quite aptly
17:06 karolherbst: and libdrm API is enough for all our needs and if not we could just add a new set of custom APIs for nouveau, correct?
17:06 imirkin_: we can just call those directly, no need for a libdrm helper
17:06 imirkin_: it doesn't help anything :)
17:07 karolherbst: uhm.. yeah well, but maybe we don't want to do linux syscalls directly
17:07 karolherbst: just a thought
17:07 imirkin_: i think they normally go through a drmIoctl() wrapper
17:08 imirkin_: anyhow ...
17:08 imirkin_: there are smaller things too
17:08 imirkin_: like the whole texture thing
17:08 imirkin_: as well as the texture buffer object on tesla/fermi thing
17:13 imirkin_: and there was some other thing, i think, which was pretty important ... i forget what it was
17:14 karolherbst: texture thing as it causes bugs like in hitman where we basically run out of "space" for textures?
17:14 imirkin_: no
17:14 imirkin_: texture thing where the validation of textures is screwed up with marek's "recent" cso changes
17:14 imirkin_: recent in quotes since it was like a year ago
17:15 karolherbst: ohhh, right
17:15 karolherbst: that bug
17:15 karolherbst: yeah, might be worth to look into this before fixing that threading bug
17:15 karolherbst: what is the issue with texture buffer objects on tesla/fermi?
17:16 imirkin_: they don't work :)
17:16 imirkin_: they implicitly reference sampler 0
17:16 karolherbst: :)
17:16 imirkin_: so if sampler 0 is unbound
17:16 karolherbst: ahh
17:16 imirkin_: then ka-boom
17:16 karolherbst: does it have severe enough consequences?
17:16 imirkin_: so we have to ensure that *something* is bound to sampler 0. i think it can literally have any contents.
17:16 karolherbst: I mean, where does it cause problems?
17:16 imirkin_: severe enough that someone reported it
17:17 imirkin_: it causes problems when you use texture buffer objects?
17:17 karolherbst: well, that is valid for basically every bug ;)
17:17 imirkin_: (not sure what you're looking for here)
17:17 karolherbst: mhh, okay
17:17 karolherbst: so basically every GL3.1+ application might be affected
17:18 imirkin_: yeah. but specifically shaders which don't have a non-tbo bound to slot 0
17:18 karolherbst: oky
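The workaround described above (make sure something is bound to sampler 0 whenever a texture buffer object is in use) could look roughly like this. It is a sketch only: `struct tex_state`, `validate_sampler0` and `dummy_sampler` are hypothetical names, not the driver's actual structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SAMPLERS 16

struct sampler { int id; };            /* hypothetical sampler object */

struct tex_state {
    const struct sampler *samplers[MAX_SAMPLERS];
    bool uses_tbo;                     /* is any texture buffer object bound? */
};

/* Placeholder; per the discussion its contents should not matter. */
static const struct sampler dummy_sampler = { -1 };

/* Binds the placeholder to slot 0 if a TBO is in use and slot 0 is empty.
 * Returns true if the placeholder had to be bound. */
bool validate_sampler0(struct tex_state *st)
{
    if (st->uses_tbo && st->samplers[0] == NULL) {
        st->samplers[0] = &dummy_sampler;
        return true;
    }
    return false;
}
```

Running the check at validation time keeps the fix out of the application's sight: slots that are already populated are left alone, and state without a TBO is never touched.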
17:18 HdkR: Threading issue? :)
17:19 imirkin_: HdkR: the one where when you try to draw from multiple threads, then boom
17:19 HdkR: I was hoping for that :D
17:19 imirkin_: HdkR: simple solution - don't do that
17:19 karolherbst: as always ;)
17:19 imirkin_: HdkR: not like the hw will go any faster
17:20 imirkin_: in fact, it'll go slower coz of all the inter-context extra unnecessary validation
17:20 karolherbst: I guess it could help with CPU based bottlenecks
17:20 HdkR: ^
17:20 HdkR: That's my single use case these days
17:20 imirkin_: and if the hw truly uses multiple hw contexts, then it'll be a LOT slower
17:20 imirkin_: since hw context switches are suh-loooowwww
17:20 karolherbst: imirkin_: I think that basically changed with Volta, not quite sure though
17:21 imirkin_: could be.
17:21 imirkin_: i have a NV34 plugged in at home :p
17:21 karolherbst: they support full preemption at least
17:21 imirkin_: heh
17:21 imirkin_: supporting a feature doesn't mean it's fast :p
17:21 karolherbst: like pausing shaders mid execution :p
17:21 imirkin_: a gr context is megabytes of state iirc
17:21 karolherbst: yeah, dunno
17:21 karolherbst: anyway, with Volta stuff changes there
17:21 imirkin_: a cpu context is what ... 1K at the outside?
17:22 RSpliet: the "non-preemptive" context isn't too bad... 60-200KiB. If you need to push out your register file, local memory and rasteriser state you can start thinking in the order of MiBs
17:22 karolherbst: maybe they have direct storage for multiple hw contexts and can copy to VRAM async? I could imagine if you are smart, you can make that less painful
17:23 RSpliet: Presumably they go for a "preemption on workgroup boundary" model to limit the overhead while reducing the latency... that's what I would do at least
17:23 karolherbst: RSpliet: yeah. maybe
17:24 karolherbst: uhh actually they changed stuff with Pascal already
17:24 karolherbst: <100us they say
17:24 HdkR: imirkin_: A shared context + context priority makes me happy :D
17:25 RSpliet: karolherbst: Yeah... on Fermi/Kepler (non-preemptive) that's <25μs
17:25 karolherbst: RSpliet: the 100us are for preemptive
17:25 HdkR: (Although true async compute or async rendering would make me happier)
17:27 karolherbst: RSpliet: with "preemption on workgroup boundary" you mean to context switch the entire workgroup, right?
17:27 karolherbst: not wait until one workgroup is done
17:27 RSpliet: karolherbst: for certain values of preemptive :-) That means they've either found ways of achieving massively higher bandwidth, or they somehow managed to limit the increase in context to about 5x the size of non-preemptive. The latter seems unlikely if it's "stop the clock this clock-cycle and stow away your work" full preemptive
17:28 skeggsb: imirkin_: you know libdrm_nouveau isn't the issue right? it's how mesa uses it (read: very very badly).. but yes, not opposed to doing it directly in mesa regardless
17:28 karolherbst: RSpliet: uhm... they talk about 100us for instruction level preemption
17:28 karolherbst: full stop
17:28 RSpliet: karolherbst; they're not giving away the secret sauce obvs.
17:28 karolherbst: of course not
17:28 skeggsb: pretty sure pascal added instruction-level preemption for compute / pixel-level for graphics.. we don't enable it actually
17:28 karolherbst: but that's what they talk about
17:29 karolherbst: skeggsb: correct
17:29 karolherbst: we might want to support it to make the system more stable in terms of cycling shaders
17:29 karolherbst: or something
17:29 RSpliet: with "workgroup boundary" I mean finish the current in-flight workgroups and save yourself the effort of storing the reg file (256KiB/SM) and local memory (16-48KiB/SM)
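The per-SM figures quoted above can be turned into a rough estimate of how much extra state full preemption has to save. This is a back-of-the-envelope sketch: the function name is hypothetical, the SM count is a free parameter, and fixed-function state (rasteriser, texture units) is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define KIB 1024u

/* Per-SM figures quoted in the discussion. */
#define REGFILE_PER_SM_KIB   256u
#define LOCAL_MIN_PER_SM_KIB  16u
#define LOCAL_MAX_PER_SM_KIB  48u

/* Extra bytes to save for full preemption on top of the base context,
 * for a given SM count and per-SM local memory size (in KiB). */
uint64_t extra_preempt_bytes(unsigned sm_count, unsigned local_kib)
{
    assert(local_kib >= LOCAL_MIN_PER_SM_KIB &&
           local_kib <= LOCAL_MAX_PER_SM_KIB);
    return (uint64_t)sm_count * (REGFILE_PER_SM_KIB + local_kib) * KIB;
}
```

For a hypothetical 8-SM part this comes to between 2176 KiB and 2432 KiB, which is in line with the "order of MiBs" estimate and roughly ten times the 60-200KiB non-preemptive context.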
17:29 karolherbst: RSpliet: yeah, then no
17:29 karolherbst: instruction level preemption is what I talk about ;)
17:29 skeggsb: karolherbst: nvidia don't seem to enable it everywhere, i actually haven't seen a mmiotrace where they do, but i didn't try too hard either
17:30 skeggsb: presumably there's some kind of drawback
17:30 karolherbst: yeah.. I imagine
17:30 karolherbst: skeggsb: I think they only do so for compute
17:30 karolherbst: and for graphics they just do pixel level preemption
17:30 karolherbst: or do you mean even then, they don't?
17:30 skeggsb: i haven't traced compute, but the graphics bits i traced, they use wait-for-idle preemption
17:30 skeggsb: "preemption" * sorry
17:31 skeggsb: so, none
17:31 karolherbst: mhh
17:31 skeggsb: i didn't trace any real kind of app though in recent traces, so i might just have missed it
17:31 karolherbst: maybe only supported on high level cards?
17:31 karolherbst: maybe only P100
17:31 skeggsb: nope, plus, i have high-level cards
17:32 karolherbst: right
17:32 karolherbst: maybe they just start with it if you have enough applications or something..
17:32 skeggsb: i bet if i traced cuda, or a real app, i'd see it.
17:32 karolherbst: might be worth to ask the correct person
17:32 karolherbst: skeggsb: yeah, most likely
17:33 skeggsb: i thought about implementing it in the recent series, but delayed it until later
17:33 karolherbst: I think it only makes sense to enable it if there is a chance to hang a thread
17:33 karolherbst: or maybe it doesn't have to be enabled and it is kind of transparent?
17:33 RSpliet: karolherbst: all I'm saying is that for compute to be fully preemptive on Kepler, you need about a 10x increase in context. If swapping that takes 4x more time I'm unsure what their tricks are. For GL workloads 10x is an underestimate because the fixed-function logic (texture processor, rasteriser) contain additional state.
17:33 skeggsb: it does have to be enabled, it's an explicit per-context option
17:34 karolherbst: RSpliet: yeah, dunno. All I know is, they say it is much faster than before
17:34 RSpliet: Double-buffering to speed up the process might work, but is a cheat as it eats away from DRAM bandwidth during execution of the new thread... I can see that leading to unreliable framerates and occasional jerkiness
17:34 karolherbst: skeggsb: I see
17:35 karolherbst: RSpliet: well, you actually only want to do it if some work gets stalled anyway, so anything is better than not switching
17:35 skeggsb: some of the weird graphics context buffers have to be setup differently too, i think i took care of that part already
17:35 karolherbst: ahh, nice
17:36 karolherbst: the bigger question is, does it matter while executing stuff
17:36 karolherbst: if you get 5% less perf on avg, then yeah, this might be a trade-off to think about
17:36 karolherbst: but if it is <1%, meh
17:37 RSpliet: karolherbst: their motivation was something along the lines of "we must swap to w/e context (compositor? ctx 0? a bit vague in my memory) at some point during this frame. If preemption can be delayed for a long time, that means we need to play it safe and invoke the preempt early. Now that we can reduce the delay, we can postpone our preempt longer leading to more compute time for the heavy workload"
17:42 imirkin_: skeggsb: i'm not convinced that there's a non-very-badly way of using libdrm_nouveau in mesa, given what all it has to deal with.
17:43 imirkin_: but i'm happy to be surprised ;)
17:50 imirkin_: skeggsb: https://bugs.freedesktop.org/show_bug.cgi?id=106334 https://bugs.freedesktop.org/show_bug.cgi?id=105687 https://bugs.freedesktop.org/show_bug.cgi?id=105174
17:51 imirkin_: all bugs in nouveau_mem_host
17:51 skeggsb: i think those are fixed by the commit you mentioned earlier about sending to stable
17:51 skeggsb: i hit it during testing, and that's what i found
17:52 imirkin_: oh. neat-o.
17:52 imirkin_: i didn't even make the connection between those
17:52 karolherbst: skeggsb: maybe it is only enabled for shaders with a chance to get stuck, like loop breaks depend on outside data or something
17:53 skeggsb: imirkin_: it's not obvious, but i was running with kasan enabled at the time i hit it
17:57 imirkin_: skeggsb: i still don't get how that affects nouveau_mem_host...
17:57 imirkin_: does nouveau_mem_host call those functions and they got inlined?
17:57 imirkin_: [not from a quick look, but what do i know...]
17:59 skeggsb: struct nouveau_cli *cli = mem->cli;
17:59 imirkin_: right ... but your patch doesn't change that
17:59 skeggsb: that mem->cli is set from &drm->master in nouveau_mem_new(), and the drm pointer was wrong
18:01 imirkin_: because by the time nouveau_gart_manager_new was called, nvbo->cli was just fubar, and hence its drm pointer was too.
18:01 skeggsb: yep
18:01 imirkin_: the drm pointer was never dereferenced explicitly until much later
18:01 imirkin_: so the true bug didn't "appear" in the place that caused it
18:01 imirkin_: i see.
18:01 imirkin_: not exactly trivially obvious :)
18:02 skeggsb: no, but kasan was pretty explicit about it :P
18:02 imirkin_: i hate the bugs where you store some wrong pointer, and then half an hour later you get it and use it and it blows up
18:03 skeggsb: those should never have been left in the final version of the code, it's a remnant from a different approach to things that i abandoned for now
18:03 imirkin_: yeah, seems that way, since you can get a nouveau_drm directly from a bo
18:03 skeggsb: never managed to hit it in initial testing.. was very surprised to see it happen when testing gm107 piglit for recent changes
18:05 imirkin_: anyways, well-found. a bunch of people will be happy about that
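The class of bug discussed above (a pointer stored early, used much later, after the object it points to has gone away) can be modelled safely in a few lines. Everything here is a hypothetical stand-in rather than the nouveau structures: an `alive` flag models the freed client so that the sketch never performs a real use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the nouveau structures. */
struct drm_dev { int id; };

struct client {
    struct drm_dev *drm;
    bool alive;              /* models "not yet freed" without real UB */
};

struct bo {
    struct drm_dev *drm;     /* device pointer, valid as long as the bo is */
    struct client  *cli;     /* creator; may be gone while the bo lives on */
};

/* Fragile: trusts bo->cli, which the bo can outlive. */
struct drm_dev *drm_via_client(const struct bo *bo)
{
    return bo->cli->alive ? bo->cli->drm : NULL;  /* NULL stands in for garbage */
}

/* Robust: takes the device from the bo and never touches the client. */
struct drm_dev *drm_via_bo(const struct bo *bo)
{
    return bo->drm;
}

/* Models client teardown while its buffers are still referenced. */
void client_fini(struct client *cli)
{
    cli->alive = false;
    cli->drm = NULL;
}
```

In the real driver the stale value is not a tidy NULL but whatever happens to be in the freed memory, which is why the crash showed up far from the code that stored the pointer and why a tool like kasan pointed at it so directly.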
18:05 imirkin_: btw - anx9805 - that's just g200 & co, right?
18:05 skeggsb: yeah, stupidly rare
18:05 imirkin_: there's still someone who's suffering from the zero-address transaction thing
18:05 imirkin_: on gm200
18:05 imirkin_: (or gm20x)
18:06 imirkin_: https://bugs.freedesktop.org/show_bug.cgi?id=103351
18:07 imirkin_: the fun bit is that it USED to work before you fixed it for gf119+
18:07 imirkin_: where it was jamming random bits in place
18:07 imirkin_: (or all 1's or whatever)
18:09 skeggsb: yeah, i don't really have a good idea about what to do with that without being able to reproduce
18:09 imirkin_: this is a stupid question, but you checked on gm20x+ that the address-only txn is the same bit?
18:10 skeggsb: i use DP on basically everything when testing, so, it works
18:11 skeggsb: including a GTX980 (like in the bug) that had busted vram (even bios screen scrambled) and can't do much before dying horribly
18:16 imirkin_: skeggsb: stupid question, but ...
18:16 imirkin_: gm200_i2c = {
18:16 imirkin_: .pad_x_new = gf119_i2c_pad_x_new,
18:16 imirkin_: .pad_s_new = gm200_i2c_pad_s_new,
18:16 imirkin_: should that be gm200_i2c_pad_x_new ?
18:18 imirkin_: i'd be lying if i knew what a "pad" or "x" or "s" were...
18:18 imirkin_: but otherwise gm200_i2c_pad_x_new does not have call sites
18:21 glennk: depth charge pointers traversal
18:26 skeggsb: imirkin_: i don't think it matters on that hw, it'll always hit the _s path if there's an aux channel
18:27 skeggsb: so the end result will be the same, however, perhaps should either remove the unused function, or point at it just-in-case
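Why the choice of `.pad_x_new` entry may not matter can be illustrated with a simplified function table. This is a sketch under assumptions: the constructor signature, the `select_pad_new` helper and all function names are hypothetical, and the selection rule is taken only from the statement above that the _s path is always hit when an aux channel exists.

```c
#include <assert.h>
#include <stdbool.h>

enum pad_kind { PAD_X, PAD_S };

/* Hypothetical, much-simplified constructor signature. */
typedef enum pad_kind (*pad_new_fn)(void);

enum pad_kind pad_x_new_old(void) { return PAD_X; }  /* older chip's _x variant */
enum pad_kind pad_x_new_new(void) { return PAD_X; }  /* newer, currently unused */
enum pad_kind pad_s_new_new(void) { return PAD_S; }

struct i2c_func {
    pad_new_fn pad_x_new;
    pad_new_fn pad_s_new;
};

/* Picks the pad constructor: the _s path whenever an aux channel exists. */
pad_new_fn select_pad_new(const struct i2c_func *f, bool has_aux)
{
    return has_aux ? f->pad_s_new : f->pad_x_new;
}
```

With an aux channel present the result is the same whichever `_x` function the table names, so pointing the table at the newer variant (or deleting it) only changes behaviour on the path without an aux channel.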
18:58 imirkin_: skeggsb: ok, well i pinged the bug just in case
20:22 pmoreau_: imirkin_, skeggsb: Could one of you please ping the nouveau_mem_host bugs with the patch that fixes it?
20:30 imirkin_: pmoreau_: https://github.com/skeggsb/nouveau/commit/bdc36dcf3fe469e6bb2a1366452dcb16b84e8bcf
20:30 pmoreau_: Ah, that one, okay
20:32 pmoreau_: I can probably ping the bug reports, though I should stay as much as possible out of the loop, due to my current job.
20:32 imirkin_: oh
20:37 pmoreau_: imirkin_: I am not allowed to contribute to Nouveau for the next 6 months (during my internship @ NV), but should be able to continue contributing after that, as long as I’m not revealing anything that is not already publicly known.
20:52 imirkin_: cool
20:52 imirkin_: pmoreau_: how much longer are you in school for?
20:52 karolherbst: pmoreau_: what about mesa?
20:53 karolherbst: pmoreau_: and is this a "should" as in probably or a "should" as in, "I know for sure"?
20:53 pmoreau_: imirkin_: About 2-3 more years until the end of my PhD
20:53 Lyude: could someone review the patch I posted to fix some rpm issues with nouveau? https://patchwork.freedesktop.org/series/42603/ i've been getting reports from some people that they've been having issues with their kernel deadlocking on boot due to that issue
20:54 pmoreau_: karolherbst: That’s what I’ve been told by HR/managers, so unless they decide to change their minds. As for Mesa, bits that aren’t specific to Nouveau, I’ll need to check. I haven’t really had time to work on it any way.
20:55 karolherbst: pmoreau_: things like that are usually super tricky. Sometimes they can disallow you to do "competitive" work, sometimes that's totally illegal (regarding what you do in your spare time). Then do you have to sign an NDA? Then it might be valid for your entire life, but only includes the stuff mentioned in that NDA, etc...
20:55 karolherbst: pmoreau_: well... luckily for them, what they say, means nothing
20:56 karolherbst: or there is no way to prove otherwise
20:56 karolherbst: you signed something, that's valid
20:56 karolherbst: of course you can say, but they told me something else... but then again, where is the proof?
20:57 karolherbst: if you have that in an email with a clear statement, then yes, otherwise?
20:57 imirkin_: Lyude: i think ben just pushed out a similar patch
20:57 Lyude: imirkin_: links?
20:57 imirkin_: and when i say similar, i think he just took your patch :)
20:57 Lyude: oh cool
20:57 imirkin_: https://github.com/skeggsb/nouveau/commit/f84973838226b8a599bce64376665ec8f0087629
20:57 imirkin_: (i just had not paid attention to the fact that you were the author when i saw it)
20:58 pmoreau_: karolherbst: I think I have that in an email.
20:58 Lyude: oh nice!
20:58 karolherbst: pmoreau_: all I am saying here is, that you should be careful with things like that and that good will accounts for nothing if you get sued for violating an NDA or violating IP.
20:58 karolherbst: pmoreau_: be sure you have that ;)
21:00 karolherbst: pmoreau_: or even ask for a clear statement. If nothing is fishy they can just provide that, otherwise.. they might try to weasel out of it