00:00 nodots: I can't find anywhere on the nouveau site that tells me whether or not I can run different gen cards on the same driver. Is it possible?
00:04 RSpliet: Don't see a reason why not, but multi-GPU 3D accel is not a thing last time I checked
00:05 nodots: It's so I can have 3 monitors, not for acceleration.
00:06 nodots: I am learning C and need the real estate.
00:12 nodots: I will try it. If it doesn't work, I'll join the mailing list I guess. Thanks RSpliet for answering me in under half an hour. (Not joking.)
00:13 nyef`: Worst comes to worst, there are cards out there with four outputs.
00:39 dboyan: imirkin_: mpfr test shows that all 2ulp differences have 1ulp error on CPU and 1ulp error on GPU, at least in the 64M tests that I've run. There are about 33k of them
00:40 dboyan: That's pretty good news i think
00:43 dboyan: I'll leave the machine running more tests while I go out for some morning exercise :)
01:04 notanoob: Is it possible to use nouveau on arm processors that have pci(e) support?
01:29 gnarface: i can't imagine why not
01:30 gnarface: i mean, in theory it should be possible, that doesn't mean anyone has added support for those devices
01:30 gnarface: (necessarily)
01:30 gnarface: you found an ARM computer with desktop nvidia hardware in it?
01:30 nyef`: Might be a bit more rough if the ARM is in big-endian mode, though.
01:31 dboyan: there is nouveau support on arm (tegra), not pcie though
01:32 gnarface: i'm still trying to figure out if such a device actually exists
01:32 gnarface: (an ARM computer with pcie slots)
01:32 Horizon_Brave: hey everyone
01:32 gnarface: like, is this a realistic concern IRL or just a hypothetical?
01:33 airlied: gnarface: yes you can get one
01:34 airlied: at least an aarch64 one
01:34 gnarface: aarch64 == ARM64?
01:34 airlied: yes
01:35 gnarface: hmm, neat concept
01:35 gnarface: such a thing would make a sweet steam in-home streaming client
01:35 gnarface: shame there's no ARM64 build for Steam
01:35 dboyan: gnarface: http://www.96boards.org/product/cello/
01:36 gnarface: amd *opteron* ARM64???
01:36 gnarface: that's quite enticing, dboyan ... too bad it's vaporware
01:37 dboyan: yeah
01:37 gnarface: do such things actually exist in the wild right now though, i meant
01:37 gnarface: ?
01:37 gnarface: could i actually just buy one, rather than hope for the day when i could theoretically pre-order one in the UK then have someone trustworthy mail it to me
01:38 airlied: there are apm mustang boards as well that had pcie
01:38 airlied: but really the fact they exist, doesn't really mean you want to use them
01:39 gnarface: well, obviously that's the second hurdle
01:39 gnarface: finding a *good* one
01:48 Horizon_Brave: has ARM moved away from 32bit chips? are they still developing ARM 32 bit exclusive programs? or has it transitioned to mostly all 64?
01:49 airlied: ARM doesn't make chips
01:49 airlied: they still make 32-bit and 64-bit cores available
01:49 notanoob: Most android targeted soc's with arm have 32 and 64 bit support now.
01:50 notanoob: Also there is no such laptop, I was thinking of making it though https://www.olimex.com/Products/DIY%20Laptop/ I just wanted to know if software support was started.
01:51 airlied: it's pci express, I'm guessing it's more whether that works more than the drivers
01:51 airlied: I don't think there is anything to start on the driver side until someone goes and does it
01:56 gnarface: Horizon_Brave: ARM64 is still new on the street. most ARM stuff in the wild is 32-bit still, though I expect that going forward it will quickly become rarely manufactured compared to the 64-bit stuff, it's not really up to ARM, they aren't like Intel selling chips. they just license tech to others.
01:57 gnarface: Horizon_Brave: so in a very real sense, the public response to these early ARM64 implementations will be a huge determining factor in how much more ARM64 stuff gets built in the near future
01:59 Horizon_Brave: oh right... since you can't go out and buy an ARM chip... it'd be more up to whether consumers go after devices like phones/tablets, ras pi's etc. that are making use of the 64 bit version rather than devices still using the 32bit
01:59 gnarface: Horizon_Brave: i get the impression that price:performance ratios will not disappoint, but the (mostly Linux) software that *should* work is largely untested, and the offerings by other vendors are more vaporous than that cello board linked above
02:03 notanoob: Ok my warranty is now void :P. When nouveau's kernel module communicates with the hardware is it using ISA's per device or nvidia's ptx? Excluding GM20x and newer of course.
02:03 notanoob: Or am I misunderstanding the hardware-software model?
02:03 airlied: notanoob: the latter
02:04 airlied: misunderstanding
02:04 dboyan: notanoob: there are command buffers
02:04 dboyan: and shader code is only part of that
02:05 dboyan: shader binary is in native isa, so no ptx in nouveau
02:06 notanoob: Are the command buffers exported with hardware bringup or something the kernel does?
02:07 airlied: what does "exported with hardware bringup" mean?
02:07 dboyan: i wonder, too
02:08 gnarface: i think he's asking if the hardware outputs a mapping anywhere when powered on, or if lacking explicit knowledge of them means you can't use them
02:09 notanoob: ^^
02:09 airlied: the latter
02:10 gnarface: though i've often wondered if there would be any way to "brute force" the information out of a card
02:10 gnarface: i have always assumed if that was possible it'd be done already
02:10 airlied: easier to RE the driver
02:13 notanoob: Yes, an oscillator and hardware pirate. So I would have to RE the hardware to the point of making it usable or even useful, before we even enter kernel space and then GNU userland c/*gl* programs.
02:16 dboyan: imirkin_: I ran 1.3 billion tests of rsq. All of the cases where 2ulp difference happens (516ppm) are 1ulp error on CPU and 1ulp error on GPU.
02:16 dboyan: now I'm rather confident on my algorithm
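
The check dboyan describes compares a CPU result and a GPU result against a correctly rounded reference and counts how many float32 steps (ulps) apart they are. A minimal sketch of that measurement follows. The helper names and the sample bit patterns are made up for illustration and are not taken from dboyan's test tool; NaN and infinity are not handled.

```python
import struct


def f32_to_bits(x):
    """Round a Python float to float32 and return its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ordered_key(bits):
    """Map a float32 bit pattern to an integer that grows with the value.

    Positive floats already sort by their bit pattern; negative floats
    are mirrored so that +0.0 and -0.0 both map to 0.
    """
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits & 0x80000000 else magnitude


def ulp_distance(a_bits, b_bits):
    """Number of representable float32 steps between two finite values."""
    return abs(ordered_key(a_bits) - ordered_key(b_bits))


# Hypothetical case: the reference sits between the two results.
reference = f32_to_bits(1.0) + 5  # stand-in for the correctly rounded value
cpu = reference - 1               # one step below the reference
gpu = reference + 1               # one step above the reference

cpu_vs_gpu = ulp_distance(cpu, gpu)
cpu_err = ulp_distance(cpu, reference)
gpu_err = ulp_distance(gpu, reference)
print(cpu_vs_gpu, cpu_err, gpu_err)
```

In this made-up case the two results differ from each other by 2 ulp while each is only 1 ulp away from the reference, which is the situation reported above.
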
02:19 notanoob: @dboyan 'shader binary' I thought nouveau could be blob free to the bios hardware init on x86 prior to GM20x?
02:22 dboyan: notanoob: I guess so.
02:24 airlied: notanoob: shader binaries are generated by userspace rendering libraries
02:24 airlied: they aren't blobs
02:24 dboyan: yeah, shader program is another thing
02:25 dboyan: notanoob: there are also firmwares pre gm20x, but nouveau uses its own version. no longer possible since gm20x
02:26 notanoob: By own version you mean I can see the source code and modify it? Or like modified versions of nvidia's stuff with stuff stripped out?
02:26 airlied: notanoob: the former
03:33 bob_: nick bob
03:41 bob_: topics
03:42 bob_: HELP
03:45 Horizon_Brave: ?
05:00 gnarface: poor bob_
05:00 gnarface: he needed to use /help
05:09 dboyan: yeah, he seems to be missing slashes in all the commands
05:19 Satchelboi: Could I get a pointer real quick? I'm looking at the gm200 NV_conservative_raster ticket, and I'm a bit lost on what exactly it's asking to be implemented or patched.
05:36 dboyan: Satchelboi: implementing GL_NV_conservative_raster extension and probably its family on nouveau. Although what I said doesn't seem very helpful.
05:37 dboyan: You have to understand what that extension does, and how it should be implemented in various places
05:37 dboyan: And finally, what's the hardware interface
05:37 dboyan: luckily, there are examples and traces, which are quite helpful, and I think that's why it's marked as "easy"
06:15 Satchelboi: dboyan: Thanks, that's actually a big help. I spent yesterday figuring out how conservative rasterization works, but I was stuck on the next step. I know what to work on next
06:38 dboyan: Satchelboi: you may be interested in how INTEL_conservative_rasterization was implemented
06:39 dboyan: although it is not the same, and also for intel hardware
06:39 Satchelboi: dboyan: That was recommended to me last night too actually, but I'm only just getting to how to actually implement it now. As soon as I get an idea how to do that
06:42 dboyan: I'm not familiar with this extension, but generally the steps are: wire things up in core mesa and (probably) the state tracker; you can get an idea about the former from the intel one
06:43 dboyan: and then write something in nouveau that generates the needed command
06:43 dboyan: you can see what the command is in mmt traces, and how to generate it in the nvc0 directory of nouveau
09:48 tjaalton: 1.0.14 tarball is missing
09:50 tjaalton: Lyude: ^
14:48 tarragon: does the new mesa texture on disk cache work with nouveau?
14:49 karolherbst: tarragon: afaik no, but somebody could work on that. But I am sure we have more critical issues right now :/
14:50 karolherbst: if there is somebody who wants to work on that, we won't say no
14:50 tarragon: gaming isn't critical?
14:50 tarragon: karolherbst: which language does need nouveau?
14:50 pq: like, don't die on stupid threads?
14:50 tarragon: wouldn't copy paste off radeon work?
14:50 pq:hides
14:50 karolherbst: tarragon: it isn't about gaming, but do you prefer games which run fast or games without graphical issues?
14:51 karolherbst: or don't crash on multi context situations (aka multiple applications doing OpenGL)
14:51 karolherbst: tarragon: well, there is a lot of gallium/mesa core ground work
14:51 karolherbst: and nouveau could use this
14:51 karolherbst: it's just, other things are of higher priority right now
14:52 imirkin_: pq: i have a solution to that: don't call gl from multiple threads!
14:52 pq: imirkin_, if only it was my app. Or if only there was even source code...
14:52 karolherbst: well ben fixes that right now
14:52 karolherbst: tarragon: nouveau uses C on the kernel side and libdrm and C/C++ inside mesa
14:53 pq: ...that said, I wonder if it actually would be possible to make sure no threading happens with LWJGL
14:53 karolherbst: It's java, there are threads
14:53 karolherbst: and there are thousands of threads in addition to that
14:53 karolherbst: :p
14:53 pq: but ones that call into GL
14:54 karolherbst: it's java ;)
14:54 karolherbst: there are threads, just take it as given
14:54 imirkin_: pq: amusingly enough, i suspect that mesa_glthread will "fix" that :)
14:54 karolherbst: :D
14:54 imirkin_: since it moves all GL calls onto a separate dispatch thread
14:54 pq: I mean, once I get past the game start-up, it's more solid than my ass.
14:54 imirkin_: thus eliminating or minimizing the amount of cross-thread stuff
14:55 pq: and by that I mean I play as long as my butt can take it, the game is stable
14:55 pq: imirkin_, hm, that's... an unexpected side-effect :-D
14:55 karolherbst: yeah, that may be true, because most java apps tend to use like all the CPU at startup
14:55 karolherbst: and race conditions trigger more likely
14:56 pq: I was looking forward to maybe trying a Mesa branch that just added a big GL lock or something.
14:58 pq: a big lock sounds like less overhead than mesa_glthread, and considering there is contention only on start-up
14:59 tarragon: karolherbst: I see, if I want to see shaders/textures cache on disk I need to learn C.
14:59 tarragon: darn! I wish my mother had spoken to me in C as a child :/
14:59 karolherbst: well, more like C++ because the compiler in mesa is C++
14:59 imirkin_: what problem are you trying to fix?
14:59 tarragon: instead of a useless human language
15:00 imirkin_: (or create)
15:00 karolherbst: tarragon: well, that's how development of nouveau works: random people having fun hack on nouveau and publish the patches
15:00 tarragon: imirkin_: on-disk compressed cache of textures/shaders and opencl would be nice too.
15:00 karolherbst: tarragon: well true, you are free to pay us for our work, and we do whatever you want to have :p
15:01 imirkin_: compressed cache of textures? because things are too fast and you want to slow them down?
15:02 pq: imirkin_, ISTR you made some experiments on a simple locking scheme just to get over crashes or something, where is it? I won't bother you with it, I promise. :-)
15:02 imirkin_: pq: i had to take it down. people started including it in distro builds.
15:02 pq: oof
15:02 karolherbst: odd
15:02 karolherbst: my patches never landed inside distro builds :(
15:02 imirkin_: (and then not honoring my request to not do that.)
15:04 karolherbst: tarragon: sorry for things being that way, but nobody gets paid here except for one developer and he works on more basic things, like display, new GPUs and stuff
15:05 dboyan: karolherbst: I might want to take a look after I finish the fp64 series
15:08 karolherbst: dboyan: nice :)
15:09 imirkin_: it should be straightforward
15:09 imirkin_: just needs someone to do it
15:09 imirkin_: and i STILL haven't unpacked my desktop
15:10 pmoreau: imirkin_: Enjoying life without computers? :-)
15:11 imirkin_: work desktop, laptop at home... i still have computers.
15:15 dboyan: imirkin_: it seems we are missing a lot of rounding mode flags in cvt f2f in envydis
15:15 imirkin_: feel free to fix the situation ;)
15:16 dboyan: according to my test the frm2a flag applies whenever src and dst disagree
15:16 imirkin_: you can also look at gf100.c which is much better done
15:16 dboyan: when types of src and dst are the same, it becomes the .PASS flag in nvdisasm
15:16 dboyan: no idea what that means
15:17 imirkin_: it means it passes through all this shit
15:17 imirkin_: the way MOV might
15:17 dboyan: yeah, that's sane
15:18 imirkin_: btw, whatever tool you wrote to test accuracy of rcp/etc -- that may be useful to put up somewhere
15:19 dboyan: well, the assembly syntax in the series i sent today might be dead then...
15:19 imirkin_: no worries.
15:20 dboyan: surely i won't
15:23 dboyan: imirkin_: silly question again. Is there an imm form of cvt in gf100?
15:24 imirkin_: there better be!
15:25 imirkin_: dboyan: https://github.com/envytools/envytools/blob/master/envydis/gf100.c#L913
15:25 imirkin_: the way that's communicated to the instruction can differ
15:25 imirkin_: on gk110 the imm forms are "separate" from the reg/const forms. on gf100 they're all together.
15:26 dboyan: ah, I see, my mind was still stuck at the gk110 stereotype
15:28 john_cephalopoda: Hey
15:37 dboyan: imirkin_, I know what the "missing rounding mode flag" means. Rounding mode doesn't make any difference when converting like from f32 to f64
15:37 xerpi: hi! I was wondering if there's a list of hw supporting atomic drm
15:38 imirkin_: dboyan: i tend to agree ;)
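
The point about f32 to f64 can be checked directly: f64 has a wider significand (53 bits against 24) and a wider exponent range than f32, so every finite f32 value is exactly representable as an f64 and nothing is left to round. A small sketch, assuming a handful of hand-picked bit patterns (they are illustrative and not from dboyan's tests):

```python
import struct


def f32_from_bits(bits):
    """Interpret a 32-bit pattern as float32; Python holds the result as a double."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_bits(x):
    """Round a double to float32 and return the raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# Smallest denormal, largest denormal, 1.0 plus one ulp, largest finite
# value, and a negative denormal.
samples = [0x00000001, 0x007FFFFF, 0x3F800001, 0x7F7FFFFF, 0x80000001]

# Widen each pattern to a double, then narrow it back to float32.
round_trips = [f32_bits(f32_from_bits(bits)) for bits in samples]
lossless = round_trips == samples
print(lossless)
```

If the widening step had rounded anything, narrowing back would not reproduce the original bit patterns.
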
15:38 imirkin_: xerpi: G80 and newer
15:38 imirkin_: (in kernel 4.10)
15:38 xerpi: I have a GeForce 8400M GS (NVIDIA G86) and I guess it's too old for atomic
15:38 imirkin_: nope, should work
15:38 xerpi: :O
15:39 xerpi: awesome then
15:40 xerpi: imirkin_, I'm using arch with 4.10.1, should I compile the upstream kernel myself to get the most recent nouveau changes?
15:41 imirkin_: xerpi: with 4.10.1 you should have atomic.
15:42 xerpi: yeah, I'll follow the next kernel releases closely to see if there are any nouveau bugfixes/improvements
15:42 imirkin_: are you having issues with atomic?
15:42 imirkin_: i'm not aware of any fixes...
15:43 xerpi: haven't tried it yet, but I like to be on the bleeding edge hehe
15:43 xerpi: anyways, any boot flags I need to set to enable atomic? (like i915.nuclear_pageflip=1 on intel)
15:43 imirkin_: nope. just boot. it'll be there.
15:44 imirkin_: skeggsb didn't chicken out unlike danvet :)
15:44 xerpi: fantastic!
15:44 mlankhorst: imirkin_: <<
15:44 imirkin_: (ignorance is bliss!)
15:45 john_cephalopoda: (ignorance is strength)
15:45 mlankhorst: and i915 supports atomic by default on gen5+ - -vlv -braswell
15:45 xerpi: kinda unrelated, but sometimes (when opening too many chrome tabs) I get hard poweroffs, can't even get the kernel log from those crashes
15:46 imirkin_: xerpi: well, among other things, those G86M's are right in the sweet spot for chips with bad solder joints
15:46 imirkin_: but also nouveau sucks at error recovery etc
15:46 imirkin_: so could be a number of things
15:47 xerpi: it happens when for example I switch to another tab, so it could be related to bad handling of a page fault?
15:47 imirkin_: sure, could be related to any number of things.
15:48 imirkin_: there are also some errors that have yet to be diagnosed that affect all tesla's
15:48 xerpi: damn if I could just get the kernel log
15:48 imirkin_: basically you get a CACHE_ERROR and then things go south
15:48 imirkin_: something with RAMHT lookup ends up dying
15:48 imirkin_: which in turn makes the chip sad.
15:48 imirkin_: no clue why it happens.
15:48 xerpi: I get "random" fifo: CACHE_ERROR
15:48 imirkin_:blames ctxsw
15:48 imirkin_: yeah, those. sometimes the chip recovers. sometimes not.
15:49 xerpi: interesting, I guess graphics hw is hard to get right
15:50 john_cephalopoda: Any kind of hw is hard to get right.
15:50 john_cephalopoda: I am working with 3d cams on a firewire card. It's extremely unstable and tends to crash stuff. And sometimes even the computer.
15:51 xerpi: :/
15:52 john_cephalopoda: It's not as complex as a graphics card, but it isn't produced in large batches. So only one lib for controlling those cams exist.
15:53 xerpi: I see
15:55 john_cephalopoda: Hmm, got some weird glitches with my nvidia card. I'll run an apitrace.
16:05 imirkin_: dboyan: uhhhh... are you sure about that neg30? i thought it was like neg3b or something
16:07 imirkin_: dboyan: there might actually be 2 negs :)
16:07 imirkin_: the neg which is the high bit of the FIMM and the neg which acts on top of that. i dunno. double-check though.
16:08 imirkin_: dboyan: also please flip the order of c60 and c5c
16:08 imirkin_: and lastly, the mask should probably be f7c not ffc
16:13 dboyan: imirkin_, it is neg30
16:14 dboyan: 00001405 c5410020 -> @P0 F2F.F16.F16 R1, 2.NEG;
16:14 imirkin_: then what's 3b?
16:14 imirkin_: ah
16:14 imirkin_: so it's the "second" neg
16:14 dboyan: nothing
16:14 dboyan: it seems
16:14 imirkin_: check again with F32 source
16:14 imirkin_: it's the 20th bit
16:17 dboyan: interesting
16:17 imirkin_: =]
16:17 dboyan: 2c09 c54101fc -> @P0 F2F.F64.F32 R2, 1.NEG;
16:17 imirkin_: yeah
16:17 imirkin_: vs -1 if you set 0x3b bit
16:17 dboyan: 2c09 cd4001fc -> @P0 F2F.F64.F32 R2, -1;
16:17 imirkin_: and you can even do -1.NEG :)
16:17 imirkin_: so 30 really is the neg bit
16:18 imirkin_: while 3b is the high bit of the 20-bit immediate
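
The two bits under discussion can be pulled out of the encodings pasted above. This is only a sketch: it assumes the two hex words in each paste are the low and high 32-bit halves of one 64-bit instruction, in that order, and the helper names are hypothetical.

```python
NEG_BIT = 0x30       # bit 48: the NEG modifier ("neg30")
IMM_SIGN_BIT = 0x3B  # bit 59: high bit of the immediate ("neg3b")


def insn(lo, hi):
    """Combine two 32-bit words into one 64-bit word (low word first: assumption)."""
    return (hi << 32) | lo


def bit(word, n):
    """Return bit n of word as 0 or 1."""
    return (word >> n) & 1


# Encodings and nvdisasm output as pasted in the conversation.
samples = {
    "F2F.F16.F16 R1, 2.NEG": insn(0x00001405, 0xC5410020),
    "F2F.F64.F32 R2, 1.NEG": insn(0x00002C09, 0xC54101FC),
    "F2F.F64.F32 R2, -1": insn(0x00002C09, 0xCD4001FC),
}

decoded = {
    text: (bit(word, NEG_BIT), bit(word, IMM_SIGN_BIT))
    for text, word in samples.items()
}
for text, (neg30, imm3b) in decoded.items():
    print(f"{text:24} bit 0x30 = {neg30}  bit 0x3b = {imm3b}")
```

Under that word-order assumption, both `.NEG` encodings have bit 0x30 set and bit 0x3b clear, while the `-1` encoding has bit 0x3b set and bit 0x30 clear, which matches the conclusion reached above.
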
16:18 dboyan: how should it be fixed though?
16:19 imirkin_: goooood question
16:20 imirkin_: not sure.
16:20 imirkin_: just add both in ;)
16:20 dboyan: well, neg3b + neg30?
16:21 imirkin_: yes.
16:21 imirkin_: double-check what it does for an integer source
16:21 dboyan: so we can say something like cvt f64 $r0d f32 neg neg 0x3f800000?
16:21 imirkin_: yes.
16:21 dboyan: that's funny
16:21 imirkin_: yes.
16:22 imirkin_: but so is -1.NEG
16:22 dboyan: so at least i won't add this flat for f16 input
16:23 dboyan: *flag
16:23 imirkin_: well, again, double-check what it does
16:23 dboyan: to fit it somehow into a shader?
16:23 imirkin_: no
16:23 imirkin_: just see what it decodes as
16:24 imirkin_: (with nvdisasm)
16:24 dboyan: well, I'd check it tomorrow, I'm too sleepy now for that.
16:25 dboyan:has been up for ~19 hours
16:25 imirkin_: :)
16:26 dboyan: btw, do I need to create a new branch or just add fixes to the current one?
16:26 dboyan: I'm not so sure about github conventions
16:26 karolherbst: dboyan:
16:26 karolherbst: ....
16:26 karolherbst: sorry for the noise, I mistyped
16:27 imirkin_: dboyan: wtvr, i don't care. about branches, or, tbh, github conventions.
16:27 imirkin_: i try to use github as little as humanly possible.
16:27 dboyan: okay, I see
16:27 imirkin_: they try to suck you into their universe with their various bs
16:28 imirkin_: i don't want to be constrained to their universe
16:33 john_cephalopoda: It's strange, that after all those years of programming, nobody has yet found a way to work on code collaboratively, with a tool that everybody understands.
16:33 imirkin_: huh?
16:33 imirkin_: people do it all the time
16:33 john_cephalopoda: I have to look up git commands so often.
16:33 imirkin_: just ... not with github :)
16:34 john_cephalopoda: github actually helps by letting me merge things automatically. I wouldn't know how to do that by hand :þ
16:34 imirkin_: i recommend gaining a solid understanding of git's underlying model. then the commands will make sense and you'll remember them.
16:34 john_cephalopoda: I should probably. :D
16:34 john_cephalopoda: There is an xkcd about that.
16:34 imirkin_: until then you're basically just banging on the keyboard until something works
16:35 imirkin_: which is, understandably, frustrating.
16:35 john_cephalopoda: https://xkcd.com/1597/
16:35 imirkin_: you'll find that it's 30 minutes of your time well invested.
16:42 Echelon9: dboyan: You can make local edits, then do 'git add .' 'git commit --amend' and then 'git push <upstream_repo> <branch_name> --force'
16:42 Echelon9: No need for a new branch or a new GitHub PR
17:19 john_cephalopoda: https://lut.im/7NAOzv8uT0/Co9AQZud9DWEwvSH.png
17:29 john_cephalopoda: I am horrible at debugging. Never used apitrace before.
17:50 imirkin_: well, sounds like you found the draw call that goes wrong
18:55 john_cephalopoda: imirkin_: Now I have to trace back, why it went wrong.
18:55 imirkin_: what are you using as a reference?
18:55 imirkin_: you might be interested in apitrace's scripts/contrib/tracediff.py
19:04 john_cephalopoda: I don't really know much about openGL and apitrace. I am trying to find out, where that weird bug came from.
19:04 john_cephalopoda: So I search through the textures and try to find the part where it was created.
19:12 imirkin_: john_cephalopoda: if you use a tracediff, and have a reference impl that works (e.g. llvmpipe, or intel) on the same machine at the same time, then it will tell you where the first difference is
19:12 john_cephalopoda: Unfortunately, I haven't got a working implementation.
19:12 imirkin_: then what were the two screenshots?
19:12 imirkin_: oh, just previous draw and next draw?
19:13 john_cephalopoda: Yes
19:13 imirkin_: i see.
19:19 john_cephalopoda: I'll check out mesa 17.
19:20 imirkin_: oh, yeah, you definitely want to be using latest. stuff gets fixed.
19:21 imirkin_: not much point in re-debugging previously fixed issues
19:21 imirkin_: (latest = git head btw)
19:22 john_cephalopoda: What should I use from git? mesa3d only or also the xorg module and similar?
19:22 imirkin_: only mesa
19:23 imirkin_: well - any item you plan on hacking on
19:23 imirkin_: sounds like mesa for now
19:23 imirkin_: i prefer to keep my system install at some released version
19:24 imirkin_: and then i have a separate install elsewhere with my currently-hacking version
19:24 imirkin_: also gives me a nice baseline to compare against
19:24 john_cephalopoda: I'll try mesa 17 and see what happens. When it doesn't work, I'll try something else.
19:25 john_cephalopoda: Got to wait a bit. I'm on a source-based distro, got to compile.
19:25 imirkin_:uses gentoo
19:25 john_cephalopoda:uses crux
19:25 imirkin_: sounds like slackware of yore
19:26 imirkin_: bsd-style init means rc.S, rc.M, etc?
19:26 karolherbst: imirkin_: guess how I made hitman run on nouveau without issues now: :D
19:26 imirkin_: karolherbst: rebooted?
19:26 karolherbst: no
19:26 karolherbst: MESA_VENDOR_OVERRIDE="ATI Technologies Inc."
19:26 karolherbst: :)
19:26 imirkin_: ah, of course.
19:26 karolherbst: I could try to upstream that patch
19:26 imirkin_: shoulda done that up front :)
19:26 karolherbst: yeah
19:27 karolherbst: so engine bug I guess
19:27 john_cephalopoda: imirkin_: Yep, that kind of init.
19:27 karolherbst: I will write them I suppose
19:27 imirkin_: john_cephalopoda: i love that... i was sad when they switched to sysvinit
19:27 imirkin_: [they = slackware]
19:28 imirkin_: karolherbst: that'd be nice :)
19:31 orbea: imirkin_: slackware still has a bsd-styled init, at least rc.S, rc.M and etc are still used.
19:33 imirkin_: orbea: really? i could have sworn it moved to sysvinit...
19:33 imirkin_: i stopped using it when my HD died in ... 2004?
19:33 orbea: it did, but it still is somewhat bsd-styled :P
19:33 orbea: *its
19:34 imirkin_: (and i installed gentoo at that point on my home box)
19:35 orbea: I can't say I spent enough time with what came before to appreciate the differences, but it's not sysvinit like what debian used to do
19:36 john_cephalopoda: Reminds me, I should update my laptop to crux 3.3.
19:37 imirkin_: orbea: well, you used to just stick stuff directly into rc.*. then it became just scripts to run the various /etc/rc.d scripts
19:38 orbea: yea, it now uses rc.d http://termbin.com/gpgl
19:39 imirkin_: but do those just run stuff directly, or do they call /etc/init.d junk?
19:39 orbea: i think its mostly calling stuff directly, /etc/init.d is very sparse on my system
19:41 orbea: seems to only have a fedora 'functions' script which can be used if someone is so inclined, but is not really used by the slackware init scripts in /etc/rc.d http://termbin.com/mvad
19:45 imirkin_: ah ok
19:45 imirkin_: well, it was over a decade ago, i could be misremembering the details ;)
19:45 orbea: heh, fair enough
19:46 karolherbst: imirkin_: GL_ARB_bindless_texture is the fancy new bindless thing?
19:47 imirkin_: not that new, but yeah
19:49 airlied: it even has a game using it now
19:50 imirkin_: which one?
19:50 karolherbst: airlied: can you confirm that hitmanpro uses it?
19:51 airlied: hakzsam: will know :)
19:52 hakzsam: it doesn't
19:52 hakzsam: by default
19:52 karolherbst: hakzsam: does it on nvidia?
19:52 hakzsam: no
19:52 karolherbst: mhhh
19:52 karolherbst: okay, then that's not the problem
19:52 hakzsam: it should be disabled by default
19:52 imirkin_: karolherbst: we'd also be getting errors galore if they were trying to call those endpoints
19:53 karolherbst: I see
19:53 karolherbst: doesn't matter anyway, the nvidia game path doesn't work
19:53 karolherbst: the AMD does
19:53 karolherbst: was able to run a benchmark without any issues
19:53 imirkin_: cool beans
19:53 karolherbst: except perf
19:53 karolherbst: but this would be solved by having a better compiler
19:53 karolherbst: memory is like bored
19:53 imirkin_: really?
19:53 karolherbst: yeah
19:54 karolherbst: memory load < 10%
19:54 imirkin_: hmmm
19:54 karolherbst: core at 100%
19:54 imirkin_: ok yea
19:54 imirkin_: probably missing some dumb opt on a pattern that they use a lot
19:54 karolherbst: yeah
19:54 karolherbst: I could try with my mad opt
19:56 imirkin_: heh
19:56 karolherbst: loading times are huge
19:56 imirkin_: i doubt it'll be something simple like that.
19:57 karolherbst: well true, but that's such a generally used thing that engine bound games should get a small perf boost
19:57 karolherbst: only around 0.5% but still
19:57 karolherbst: :D
19:58 karolherbst: avg FPS is at 0.5 with 1280x720 :/
19:58 karolherbst: ...
19:58 karolherbst: 9.5
19:59 karolherbst: ohhh, game is crashing again
19:59 imirkin_: 9.5 is better than 0.5 :)
19:59 Pie_Mage_: getting better all the time!
19:59 imirkin_: unfortunately it means that i should be expecting 0.5 on my hw =/
20:00 karolherbst: yes
20:01 karolherbst: hakzsam: do you know if hitmanpro does MT stuff?
20:01 hakzsam: it uses many contexts
20:01 karolherbst: splendid
20:03 karolherbst: I think the game does something really terrible... because it kind of has really fast and really slow frames
20:03 imirkin_: or nouveau sync's too much
20:03 karolherbst: or this
20:03 imirkin_: or just at the wrong time :)
20:03 karolherbst: how can I disable their silly crash reporter?
20:04 hakzsam: even in super low?
20:04 karolherbst: no clue, I use the phoronix benchmark params
20:05 hakzsam: presumably there is low setting
20:06 karolherbst: yeah well... perf is no concern now... the benchmark crashes ;)
20:06 karolherbst: I was just lucky once
20:06 hakzsam: not surprising :)
20:07 karolherbst: the game produces this silly binary crash dump.... even preloading libSegfault doesn't help... annoying
20:07 hakzsam: you should ask skeggsb about his multithread work, maybe it's ready now
20:07 imirkin_: it's not. and he's out for a while on vacation.
20:07 hakzsam: ok
20:08 karolherbst: will try to check if there are any reasonable settings I can disable
20:09 karolherbst: jo, SMAA enabled
20:09 karolherbst: ....
20:09 karolherbst: and SSAO as well
20:17 karolherbst: imirkin_: "WARNING: out of code space, evicting all shaders."
20:17 imirkin_: that's not a real problem
20:17 imirkin_: unless it happens a lot
20:17 karolherbst: twice
20:17 karolherbst: but the game dead locked
20:17 imirkin_: that's fine
20:17 imirkin_: yeah, while it reuploads a few shaders
20:17 imirkin_: or you mean in general?
20:18 karolherbst: no, it just dead locked right now
20:18 imirkin_: with multiple concurrent contexts, that could end badly ;)
20:18 karolherbst: okay
20:19 karolherbst: what can we do about those "gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 1000e [OOR_ADDR]" ?
20:19 imirkin_: that means you accessed a constbuf that wasn't there
20:19 imirkin_: which could either mean that we're configuring things wrong
20:19 imirkin_: or the game is accessing stuff out of bounds
20:20 karolherbst: the deadlocked threads are inside "exit_to_usermode_loop+0x52/0x80"
20:20 karolherbst: :(
20:20 karolherbst: I see
20:21 imirkin_: erm, in-kernel dead locked threads? that's unfortunate.
20:21 karolherbst: yeah
20:21 imirkin_: are you using a kernel with skeggsb's improved recovery stuff?
20:21 karolherbst: mhh I use master rebased for 4.10
20:22 karolherbst: I use this tree: https://github.com/karolherbst/nouveau/tree/clk_update_v2
20:22 karolherbst: seems like it is in there
20:23 imirkin_: ok cool
20:23 karolherbst: sadly there is no dead thread on the kernel side
20:23 karolherbst: so maybe it's just something terribly wrong
20:23 karolherbst: mhh, I can start another OpenGL application on nouveau
20:33 karolherbst: noooo, the game ate my cursor :(
20:34 karolherbst: imirkin_: looks like a MT issue to me: https://gist.github.com/karolherbst/c0b60a04528676c2fa2628d097f19c2d
20:35 karolherbst: mhh or maybe not?
20:35 karolherbst: most likely is
20:36 karolherbst: still odd that there is just one thread doing nouveau stuff
20:38 imirkin_: karolherbst: yep, two writes happening at the same time.
20:38 karolherbst: well, where is the second one?
20:39 imirkin_: oh wait.
20:39 imirkin_: that IS thread 24
20:39 karolherbst: ;)
20:39 karolherbst: yes
20:39 imirkin_: well, maybe the other thread is done
20:39 karolherbst: maybe
20:39 imirkin_: or maybe i messed something up
20:39 imirkin_: both popular options
20:39 karolherbst: I will try again
20:39 imirkin_: size=901120
20:39 imirkin_: that's a lot of size
20:39 imirkin_: PUSH_DATAp (push=0x5b3b9a0, size=2046
20:39 imirkin_: ok, so that's the max iirc
20:41 karolherbst: it seems to happen at random scenes though
20:45 karolherbst: imirkin_: same place same backtraces
20:45 karolherbst: mhhhh
20:45 karolherbst: interesting
20:45 karolherbst: there is a significant difference in Thread 24
20:46 karolherbst: #2 nve4_p2mf_push_linear (nv=<optimized out>, dst=<optimized out>, offset=1048576, domain=<optimized out>, size=901120, data=<optimized out>)
20:46 karolherbst: #2 nve4_p2mf_push_linear (nv=<optimized out>, dst=<optimized out>, offset=0, domain=<optimized out>, size=901120, data=<optimized out>)
20:46 karolherbst: why did the offset change...
20:46 imirkin_: different buffer
20:46 imirkin_: different place in the buffer
20:46 karolherbst: mhh, true, makes sense
20:46 imirkin_: offset is relative to the bo
20:47 imirkin_: ohhhhh hrm
20:47 imirkin_: hold on
20:47 imirkin_: data=0xffffffff
20:47 imirkin_: ok, that's CLEARLY not good ;)
20:47 karolherbst: hihihi
20:47 imirkin_: i don't know much, but i know that much.
20:48 karolherbst: sadly mesa isn't a debug build right now, but should I check something out?
20:48 karolherbst: I can dig through the gdb session
20:49 imirkin_: go back up to nouveau_buffer_transfer_flush_region
20:49 imirkin_: and print the transfer
20:49 karolherbst: {resource = 0x7ffdab18b730, level = 0, usage = 6146, box = {x = 0, y = 0, z = 0, width = 901120, height = 1, depth = 1}, stride = 0, layer_stride = 0}
20:49 imirkin_: right... not interesting.
20:49 imirkin_: sec
20:50 imirkin_: where's tx->map?
20:50 imirkin_: p *tx
20:51 tarragon: karolherbst: alright, so where do I start to contribute? Since the code is laid out already, I imagine it is a little porting exercise
20:51 karolherbst: imirkin_: optimized out :(
20:52 imirkin_: p *(struct nouveau_transfer *)transfer
20:52 karolherbst: {base = {resource = 0x7ffdab18b730, level = 0, usage = 6146, box = {x = 0, y = 0, z = 0, width = 901120, height = 1, depth = 1}, stride = 0, layer_stride = 0}, map = 0xffffffff <error: Cannot access memory at address 0xffffffff>, bo = 0x0, mm = 0xbb3000000000, offset = 0}
20:52 imirkin_: okkkk
20:52 imirkin_: so ... map = 0xffffffff. that is ... odd.
20:52 karolherbst: tarragon: first find issues you wanna see fixed or which annoy you :)
20:52 karolherbst: tarragon: then ask questions about where to look and try to figure out how that stuff works
20:53 karolherbst: imirkin_: yeah.... gdb already thinks it's odd ;)
20:53 karolherbst: also
20:53 imirkin_: the mm is set... but tx->bo is null?!
20:54 tarragon: karolherbst: as I said the on-disk shaders/texture caching.
20:55 imirkin_: karolherbst: p *(struct nouveau_mm_allocation *)0xbb3000000000
20:55 karolherbst: imirkin_: invalid memory
20:55 imirkin_: ok, so, yeah. that's not great.
20:56 imirkin_: someone did something funky
20:56 imirkin_: i'm guessing it's a multi-threading issue... it's trying to flush a transfer that's already deleted? dunno.
20:56 karolherbst: mhh
20:56 karolherbst: should I create an apitrace?
20:56 karolherbst: although that won't help I figure
20:56 imirkin_: yeah, that may be helpful in identifying the issue.
20:56 karolherbst: okay
20:56 imirkin_: i.e. it may not be able to repro
20:56 imirkin_: or, conversely, it may
20:56 imirkin_: if it can, then it's great
20:57 imirkin_: since there are a lot fewer things that go wrong when replaying traces
21:04 karolherbst: uhhhhhh, what was that :O
21:04 karolherbst: [15113.526881] asynchronous wait on fence nouveau:HitmanPro[4312]:8004e14a timed out
21:05 imirkin_: that was the recovery kicking in :)
21:05 karolherbst: nice
21:06 karolherbst: sad thing is just that intel hard blocks now :O
21:06 karolherbst: airlied: ! I need your patch again
21:06 imirkin_: or maybe that was something else. dunno.
21:06 imirkin_: maybe intel was waiting on something from nouveau
21:06 karolherbst: could be, but intel syncs with nouveau now
21:06 karolherbst: since 4.9 or so
21:06 karolherbst: super annoying
21:06 karolherbst: if nouveau renders at 9 fps, so does intel
21:08 karolherbst: imirkin_: https://gist.github.com/karolherbst/17289325dcaed8c8b9a3259f6c5b91d5
21:08 karolherbst: kick timeouts
21:08 karolherbst: the usual stuff
21:08 karolherbst: BIND_ERROR 03 [UNBIND_WHILE_RUNNING]
21:18 karolherbst: hum.... okay
21:18 karolherbst: imirkin_: the trace stays black :(
21:18 imirkin_: disable ARB_buffer_storage
21:18 karolherbst: ohh... right
21:18 karolherbst: I am sure the game would crash then
21:18 imirkin_: one way to find out =]
21:19 karolherbst: yep, crash
21:20 karolherbst: I don't feel like reporting random bs like that to feral though ... :/
21:20 imirkin_: yea
21:20 imirkin_: wait, how did the other apitraces work?
21:20 karolherbst: I didn't faked the GL_VENDOR ;)
21:20 karolherbst: *fake
21:20 imirkin_: gah
21:21 imirkin_: GL_VENDOR should not be used to enable/disable features :(
21:21 karolherbst: I bet the rendering path without buffer_storage is simply broken
21:21 karolherbst: yeah
21:21 karolherbst: I guess they do it so that no new extensions get added which may break the game for some time
21:22 karolherbst: and they don't want to be annoyed by bug reports?
21:22 karolherbst: or they just didn't care enough
21:22 karolherbst: meh... no crash with apitrace
21:22 imirkin_: sounds like their detection code makes wild assumptions about drivers
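[editor's note] The point about GL_VENDOR can be illustrated with a small sketch: decide on a code path from the advertised extension list and never look at the vendor string. The helper names `has_extension` and `use_persistent_mapping` are hypothetical and not taken from the game or from mesa; in a real program the list would be filled from `glGetStringi(GL_EXTENSIONS, i)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `name` appears in the extension list.
 * Exact string comparison, so "GL_ARB_buffer" does not match
 * "GL_ARB_buffer_storage". */
bool has_extension(const char *const *exts, size_t count, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (exts[i] != NULL && strcmp(exts[i], name) == 0)
            return true;
    }
    return false;
}

/* Hypothetical decision function: choose the upload path from the
 * advertised extensions only; the vendor string is deliberately ignored. */
bool use_persistent_mapping(const char *const *exts, size_t count)
{
    return has_extension(exts, count, "GL_ARB_buffer_storage");
}
```

With a check like this, a driver that advertises the extension gets the same path regardless of what GL_VENDOR reports.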
21:22 karolherbst: really
21:23 karolherbst: mhh
21:23 karolherbst: in the trace is stuff like this: glDrawElementsInstancedBaseVertexBaseInstance
21:23 imirkin_: my favorite method!
21:24 karolherbst: glBindVertexBuffers 3x glBindBufferRange
21:24 imirkin_: (coz it's the longest)
21:24 karolherbst: random noise
21:24 karolherbst: glEnable GL_BLEND 1-3
21:24 imirkin_: along with glGetFramebufferAttachmentParameteriv
21:24 karolherbst: :D
21:25 karolherbst: trace looks quite boring though
21:26 karolherbst: glBlendEquationSeparatei and glBlendFuncSeparatei? what's that
21:30 imirkin_: lets you set separate blend equations/funcs per RT
21:30 karolherbst: imirkin_: what is odd though is that within gdb the game crashes at exactly the same frame
21:31 imirkin_: yeah... i mean ... i don't really know what to do with that information
21:31 imirkin_: it's probably not some mt
21:31 imirkin_: failure
21:31 imirkin_: but it's probably something dumb and tricky to work out
21:31 karolherbst: mhh
21:31 imirkin_: the transfer's getting released too early
21:31 imirkin_: but why
21:32 karolherbst: I will try to make an apitrace running inside gdb.... :/
21:32 glennk: what does helgrind say?
21:32 imirkin_: it says "hell, i'm slow!"
21:32 karolherbst: uhhh... no
21:32 imirkin_: hasn't used it much
21:33 karolherbst: it takes like 3 minutes to even get to the frame
21:33 karolherbst: I am sure it takes an hour with anything valgrind related :O
21:36 karolherbst: \o/ I got the same crash now... let's hope the trace crashes as well within gdb
21:40 karolherbst: damn...
21:59 karolherbst: imirkin_: okay, so meh, I don't know if that's really debuggable through an apitrace, because I can't get it to crash there as well
21:59 karolherbst: any other ideas?
22:00 imirkin_: start adding prints with backtraces when allocating and freeing transfer objects
22:02 imirkin_: and print tx->mm on allocation
22:02 imirkin_: and at free time
22:02 imirkin_: and make damn sure they're the same
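[editor's note] A minimal sketch of the check suggested here: record a watched field when an object is allocated and compare it again just before the object is freed. The names `track_alloc`, `track_free` and the fixed-size table are assumptions made for illustration, not mesa code, and the table is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TRACK_SLOTS 64  /* arbitrary capacity for this sketch */

struct track_entry {
    const void *obj;    /* address of the tracked object, NULL if the slot is free */
    const void *field;  /* value of the watched field at allocation time */
};

static struct track_entry track_table[TRACK_SLOTS];

/* Call once the object is allocated and the watched field has been set. */
bool track_alloc(const void *obj, const void *field)
{
    assert(obj != NULL);
    for (size_t i = 0; i < TRACK_SLOTS; i++) {
        if (track_table[i].obj == NULL) {
            track_table[i].obj = obj;
            track_table[i].field = field;
            fprintf(stderr, "alloc %p field=%p\n", (void *)obj, (void *)field);
            return true;
        }
    }
    return false;  /* table full */
}

/* Call right before the object is freed.
 * Returns true only if the field still has the value seen at allocation. */
bool track_free(const void *obj, const void *field)
{
    assert(obj != NULL);
    for (size_t i = 0; i < TRACK_SLOTS; i++) {
        if (track_table[i].obj == obj) {
            bool same = (track_table[i].field == field);
            fprintf(stderr, "free  %p field=%p%s\n", (void *)obj, (void *)field,
                    same ? "" : "  <-- changed since alloc");
            track_table[i].obj = NULL;
            return same;
        }
    }
    fprintf(stderr, "free  %p was never tracked\n", (void *)obj);
    return false;
}
```

In the situation discussed above, the object would be the transfer and the watched field would be `tx->mm`; a `false` return from `track_free` marks the spot where the value diverged or the free was unexpected.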
22:03 karolherbst: okay
22:03 karolherbst: through what functions are those allocated and freed?
22:03 karolherbst: release_allocation for free I guess?
22:04 imirkin_: transfer_map and transfer_unmap iirc
22:04 karolherbst: ohh okay
22:04 karolherbst: nouveau_buffer_transfer_map?
22:04 imirkin_: only for buffer transfers, not texture transfers
22:04 imirkin_: yes.
22:04 karolherbst: ahh "MALLOC_STRUCT(nouveau_transfer)"
22:05 imirkin_: that's the one!
22:05 karolherbst: is there a handy function in mesa to print the stacktrace?
22:05 karolherbst: uhhh backtrace? whatever term applies
22:05 imirkin_: no, but it's in like libc
22:06 karolherbst: yeah I know
22:06 karolherbst: with that void** mess
22:06 imirkin_: int backtrace(void **buffer, int size);
22:06 karolherbst: I already wrote a printer function for that, wasn't fun
22:06 imirkin_: just use the example.
22:06 imirkin_: void *buffer[BT_BUF_SIZE];
22:06 imirkin_: char **strings;
22:06 imirkin_: nptrs = backtrace(buffer, BT_BUF_SIZE);
22:06 imirkin_: strings = backtrace_symbols(buffer, nptrs);
22:06 karolherbst: mhh, okay, seems easy enough
22:07 imirkin_: for (j = 0; j < nptrs; j++)
22:07 imirkin_: printf("%s\n", strings[j]);
22:07 imirkin_: oh. there's a u_debug for it
22:07 imirkin_: kind of a pain to use, wtvr
22:08 imirkin_: u_debug_backtrace if you want it
22:08 karolherbst: somewhere I have a really good one
22:08 karolherbst: which could do like everything
22:08 imirkin_: the above should be enough.
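[editor's note] The fragments pasted above come from the example in the glibc `backtrace(3)` man page. Assembled into one function they might look like the sketch below; the wrapper name `print_backtrace` is an assumption for illustration. It relies on glibc's `<execinfo.h>`, so it is not portable to every libc.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define BT_BUF_SIZE 64

/* Print the current call stack to `out`, one frame per line.
 * Returns the number of frames printed, or 0 if symbol lookup failed. */
int print_backtrace(FILE *out)
{
    void *buffer[BT_BUF_SIZE];

    assert(out != NULL);

    /* Collect up to BT_BUF_SIZE return addresses from the current stack. */
    int nptrs = backtrace(buffer, BT_BUF_SIZE);

    /* Translate the addresses to strings; the array is malloc'ed by glibc. */
    char **strings = backtrace_symbols(buffer, nptrs);
    if (strings == NULL)
        return 0;

    for (int j = 0; j < nptrs; j++)
        fprintf(out, "%s\n", strings[j]);

    free(strings);  /* only the array itself needs freeing, not each string */
    return nptrs;
}
```

Calling `print_backtrace(stderr)` at the allocation and free sites gives one stack dump per event. Function names only show up for symbols the dynamic linker can see, so linking with `-rdynamic` usually makes the output more readable.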
22:13 karolherbst: yeah sad, the traces are pretty useless
22:16 karolherbst: imirkin_: okay, it wasn't freed after an allocation, but it was freed before one
22:16 karolherbst: huh
22:17 glennk: traces are run in a single thread, aren't they?
22:17 imirkin_: they are.
22:17 karolherbst: it is no trace
22:17 karolherbst: but yeah
22:17 imirkin_: well, same address may be reused.
22:17 karolherbst: imirkin_: https://gist.github.com/karolherbst/272183e61ba67c92613085bd02cccef1
22:17 imirkin_: i.e. if you free() then malloc the same amount, you might get the same ptr back
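[editor's note] Because the allocator may hand the same address back, a pointer value alone cannot tell two allocations apart in a log. One common workaround, sketched here with hypothetical names (`tagged_obj`, `tagged_alloc`, `tagged_free`), is to stamp every allocation with a serial number that is never reused and print that next to the pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object type carrying a per-allocation serial number. */
struct tagged_obj {
    uint64_t serial;  /* unique per allocation, never reused */
    int payload;
};

static uint64_t next_serial = 1;  /* not thread-safe; fine for a sketch */

/* Allocate an object and give it the next serial number. */
struct tagged_obj *tagged_alloc(void)
{
    struct tagged_obj *obj = calloc(1, sizeof(*obj));
    if (obj != NULL)
        obj->serial = next_serial++;
    return obj;
}

/* Free an object and return its serial so the caller can log it. */
uint64_t tagged_free(struct tagged_obj *obj)
{
    assert(obj != NULL);
    uint64_t serial = obj->serial;
    free(obj);
    return serial;
}
```

Even if a free followed by an allocation of the same size returns the same address, the two objects carry different serials, so alloc and free lines in the log can still be paired up unambiguously.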
22:17 karolherbst: figures
22:18 imirkin_: print tx->mm in nouveau_buffer_transfer_map as well
22:19 imirkin_: (after it gets set with nouveau_transfer_staging or whatever)
22:20 imirkin_: interesting, it allocs 3 tx's without freeing them.
22:22 imirkin_: (nothing wrong with that. just interesting.)
22:22 karolherbst: where did I put my backtrace parser :( it could even parse out the symbols out of non debug libraries :/
22:32 karolherbst: imirkin_: you won't believe this, but tx->mm is never allocated
22:32 imirkin_: what's it set to?
22:32 imirkin_: do we forget to clear it?
22:32 karolherbst: after allocating the tx you mean?
22:33 imirkin_: are you printing it in the proper place?
22:33 imirkin_: pastebin patch
22:33 karolherbst: I print inside nouveau_transfer_staging
22:33 imirkin_: where
22:33 karolherbst: after tx->mm = nouveau_mm_allocate(nv->screen->mm_GART, size, &tx->bo, &tx->offset);
22:34 imirkin_: and tx->mm is null?
22:34 imirkin_: or never set in the first place?
22:34 karolherbst: it is never called
22:34 imirkin_: ok
22:34 imirkin_: can you do it where i asked
22:34 imirkin_: in nouveau_buffer_transfer_map?
22:35 imirkin_: hold on
22:35 imirkin_: ok. so we never clear tx->map
22:35 imirkin_: at the top, can you do
22:36 imirkin_: tx->map = NULL;
22:36 karolherbst: in nouveau_buffer_transfer_map?
22:36 imirkin_: (after the malloc obviously)
22:36 imirkin_: yes
22:36 karolherbst: same for mm or just map?
22:36 imirkin_: oh wait, we do
22:36 imirkin_: in nouveau_buffer_transfer_init
22:37 karolherbst: true
22:37 imirkin_: so... hrm
22:37 imirkin_: we never clear tx->mm
22:37 imirkin_: but if tx->bo is null
22:37 imirkin_: then it shouldn't ever play
22:37 imirkin_: and if tx->bo is set, we will always have set tx->mm
22:39 imirkin_: right... so presumably buf->data would have been set
22:39 imirkin_: er
22:39 imirkin_: no.
22:42 imirkin_: something weird's happening
22:42 imirkin_: i want to know where that tx->map value is coming from.
22:47 karolherbst: should I just check all tx->map assignments?
22:48 imirkin_: i guess
22:48 imirkin_: i'm guessing something is overwriting that thing
22:48 karolherbst: uhh, I think I found a potential place
22:48 karolherbst: let me check
22:55 karolherbst: okay
22:55 karolherbst: it is overwritten
22:55 karolherbst: most likely
22:57 karolherbst: or never set
22:57 karolherbst: imirkin_: after an allocation of the transfer, there should be an assignment to tx->map, shouldn't there?
23:00 karolherbst: those are my current changes to mesa: https://gist.github.com/karolherbst/82464d5f8b329d41cb39d8759f1796e2