00:17karolherbst: imirkin: do you have some hardware where we aren't able to correctly detect the hdmi max clock we can use?
00:17karolherbst: or uhm, other issues like that?
00:19karolherbst: I mean.... we might be able to check for that, no?
00:19karolherbst: skeggsb_: ^?
00:20karolherbst: there is also a NV907D_CORE_NOTIFIER_3_CAPABILITIES_CAP_SOR0_20_DUAL_TMDS
01:38imirkin: karolherbst: not personally
01:38imirkin: there was someone with a fermi that supposedly could do 297MHz but everything under the sun reported 225MHz
01:38imirkin: karolherbst: those are features of a SOR though
01:38imirkin: a particular SOR could do dual-link DVI
01:39imirkin: at 165MHz and single-link HDMI at 225MHz
01:41skeggsb_: imirkin: the 297 case, were they sure the binary driver wasn't being sneaky and making a new reduced-blanking mode that fits under 225?
01:42skeggsb_: i've witnessed it doing such things
01:42imirkin: skeggsb_: sure enough. it already was a reduced-blanking mode iirc.
01:42skeggsb_: was it gf119?
01:42imirkin: GF106 (or GF116)
01:43imirkin: skeggsb_: https://bugs.freedesktop.org/show_bug.cgi?id=91236
01:43imirkin: but that's not the original guy
01:44imirkin: the original guy sent patches
01:44imirkin: 2560x1440@56 -- i suspect he thought of reduced blanking ;)
01:45imirkin: the cvt -r modeline for 2560x1440 is 241MHz, so pretty far
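The arithmetic behind the "2560x1440@56" suspicion can be sketched roughly as below. It assumes the usual CVT reduced-blanking totals for 2560x1440 (2720 pixels per line, 1481 lines per frame); those totals are not quoted in the log, and the real vertical total shifts slightly with the refresh rate, so the results are approximate.

```python
def pixel_clock_mhz(htotal, vtotal, refresh_hz):
    """Pixel clock in MHz: total pixels per frame times frames per second."""
    return htotal * vtotal * refresh_hz / 1e6


def max_refresh_hz(clock_limit_mhz, htotal, vtotal):
    """Highest refresh rate that keeps the pixel clock under the limit."""
    return clock_limit_mhz * 1e6 / (htotal * vtotal)


# Assumed CVT reduced-blanking totals for 2560x1440 (not taken from the log).
HTOTAL, VTOTAL = 2720, 1481

print(pixel_clock_mhz(HTOTAL, VTOTAL, 60))   # 60 Hz needs well over 225 MHz
print(max_refresh_hz(225, HTOTAL, VTOTAL))   # refresh rate that fits in 225 MHz
print(max_refresh_hz(297, HTOTAL, VTOTAL))   # refresh rate that fits in 297 MHz
```

With these totals a 225 MHz limit lands just under 56 Hz, which is consistent with someone picking 56 Hz to squeeze the mode under that limit.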
02:27nyef: In attempting to corral details about tesla context-switching, I came to the conclusion that tesla hasn't been sufficiently reverse-engineered beyond a certain point.
02:28nyef: I'm not sure what that point is, other than "probably about halfway".
02:28mooch2: i doubt it's even CLOSE to halfway, honestly
02:29nyef: I meant halfway through the evolution of tesla.
02:29mooch2: ah lol
02:29nyef: So early-model teslas probably work more-or-less okay, while late-model teslas are trivially easy to lock up.
02:30mooch2: eh, still. i think tesla RE has only just BEGUN in terms of the amount of stuff to uncover about the hardware
02:30mooch2: and i don't just mean "getting the cards working in drivers"
02:30mooch2: i mean shit like being able to accurately EMULATE the dang things, while having them running windows
02:30mooch2: *windows drivers
02:33nyef: Oh, wouldn't THAT be nice. (-:
02:35nyef: Anyway, the implication is that trying to get my NVAF working reliably is going to involve a LOT more work than I had anticipated.
02:38mooch2: yeah, unfortunately :c
02:38nyef: ... and probably some investment in more tesla hardware to work with.
02:39mooch2: though, tbh, even with the nv3, it hasn't been reversed enough to run the windows drivers in emulation :c
02:39nyef: That sucks.
02:39mooch2: the current vid_nv_riva128.c in 86box is the result of my emulation trials, and mwk's expertise on the hardware to attempt to emulate this card
02:39mooch2: it doesn't even run the LINUX drivers yet
02:40mooch2: though it DOES run vesa
02:40mooch2: so linux with a vesa driver works
02:40mooch2: which is likely to be the only nv3-compatible driver modern distros ship lol
02:41mooch2: then again, 86box doesn't support PAE, even on P6 cpus, so distros like ubuntu won't work by default
03:09nyef: 86box looks interesting, modulo the "uses C++" and "is x86oid only" bits.
03:17imirkin: nyef: check with mwk
03:18imirkin: he knows many things.
03:18imirkin: and occasionally shares such knowledge
03:34mooch2: nyef, well, 86box is MOSTLY in c90
03:34mooch2: including almost all of the core emulation
03:34mooch2: also, the interpreter works on other platforms, but dynarec is mandatory for pentium and up anyway so :/
03:35mooch2: imirkin, wait, i didn't think mwk worked on tesla that much
03:39nyef: Oh, I'm not complaining about host CPU type limitations, I'm complaining that it doesn't support non-x86oid guests. d-:
03:47mooch2: oh why?
03:47mooch2: it's specifically DESIGNED to only emulate x86 guests
03:47mooch2: it's based off of pcem, which only emulates ibm pcs and their descendants
03:48mooch2: ...which all use x86
03:48nyef: You got as far as PCI, which is used in other architectures.
03:48mooch2: so of course it's not going to support something like mips
03:48mooch2: nyef, so did pcem lol
03:48mooch2: also, the code REALLY isn't NEARLY flexible enough to support other architectures anyway
03:48mooch2: for instance, our dynarecs currently emit PURE MACHINE CODE
03:49mooch2: no irs, no optimization, no emitters even
03:49mooch2: just straight up addbyte(x) or something
03:49mooch2: i shit you not
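To illustrate what "straight up addbyte(x)" code generation looks like, here is a minimal sketch. It is not 86box code: only the name `addbyte` comes from the log, while `addlong`, `emit_mov_eax_imm32` and `emit_ret` are hypothetical helpers. The x86 encodings used (0xB8 for `mov eax, imm32`, 0xC3 for `ret`) are the standard ones.

```python
import struct

# Output buffer that receives raw x86 machine code, one byte at a time.
code = bytearray()


def addbyte(value):
    """Append a single byte of machine code."""
    code.append(value & 0xFF)


def addlong(value):
    """Append a 32-bit little-endian immediate (hypothetical helper)."""
    code.extend(struct.pack("<I", value & 0xFFFFFFFF))


def emit_mov_eax_imm32(imm):
    """mov eax, imm32 is encoded as opcode 0xB8 followed by the immediate."""
    addbyte(0xB8)
    addlong(imm)


def emit_ret():
    """Near return is the single byte 0xC3."""
    addbyte(0xC3)


emit_mov_eax_imm32(0x12345678)
emit_ret()
print(code.hex())  # b878563412c3
```

There is no intermediate representation here: every guest instruction handler writes host bytes directly, which is why retargeting such a dynarec to another host or guest architecture is hard.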
03:56nyef: It really just feels like a waste to not support other guest architectures if you're going to the trouble of supporting a goodly amount of infrastructure that would be required for the other guest architectures anyway.
03:57nyef: ATA and SCSI bus and drive emulation, SCSI controller chips, PCI bus, serial ports, networking, and so on.
03:59nyef: Plus you already have the basics for the UI bits, the screen, keyboard and mouse interfaces, even if the simulated hardware would need changing out.
03:59mooch2: eh, we support the mca ps/2s
03:59mooch2: ehhhh, our ui, threading, and input don't work on anything other than windows
03:59mooch2: sure, it can be run through wine, but still
04:00nyef: For a couple of architectures, it's almost down to the point of throwing a new CPU and host bus adaptor into the mix.
04:00mooch2: ehhh, that's not 86box's goal tho
04:00mooch2: our goal is to ONLY emulate x86
04:00mooch2: also, our cpu code is STILL not flexible enough
04:00mooch2: too many globals that should really be in structs
04:01mooch2: also, cpus AREN'T device_ts yet, so
04:01mooch2: yeah, if you want different guest archs, you're gonna want the fork varcem, where it's at least part of the project goals
04:01mooch2: even though it's not implemented yet
04:03mooch2: oh, and nvidia emulation is not in varcem
04:03mooch2: mostly due to the code being deemed "too dirty"
04:03mooch2: even though, according to coverity, varcem currently has MORE buffer overflows than 86box, by a WIDE margin
04:03nyef: More and more reasons to write my OWN emulator. Again. /-:
04:04mooch2: how so?
04:04mooch2: also, mame has this in its project scope
04:04mooch2: so really, the best help you could be would be to rewrite their new pci code to be a slot device, so that we could work together to write nv3 emulation
04:04nyef: So many things to explore.
04:04mooch2: mame has EVERYTHING in its project scope lol
04:05mooch2: though it currently doesn't support processors with more than 32 address bits fully
04:05mooch2: but oh well
04:05mooch2: that's only really needed for pentium pro and up anyway
04:05mooch2: and even 86box doesn't accurately emulate ALL of the features of the pentium mmx
04:05mooch2: hell, it doesn't even emulate the 386 debug registers AT ALL
04:05mooch2: due to performance concerns
04:05mooch2: though i do hope to rectify this
04:05nyef: MIPS R12000 requires more than 32 address bits.
04:06mooch2: i don't think that was ever released lol
04:06mooch2: oh wait, it was
04:06mooch2: well, mame doesn't even emulate an r10000 right now anyway
04:06mooch2: of course, you're welcome to contribute it, provided it's been tested
04:06nyef: Mmmhmm. I also have R10000, R14000, and maybe R16000 systems.
04:07mooch2: oh nice
04:08mooch2: then yeah, if you dumped them, and added drivers (mame's term for emulator cores) for them, then i'm sure they'd be accepted!
04:08mooch2: hell, i even wrote an iphone 2g driver lol
04:08mooch2: and mame's debugger actually helped me get rid of some hacks in my own iphone 2g emulator
04:10mooch2: oh, and mame also needs an x86 dynarec
04:10mooch2: currently, its interpreter is even SLOWER than 86box's
04:12mooch2: nyef, da heck do you think about all of this?
04:12nyef: I don't know what to think anymore, really.
04:15mooch2: oh? why not?
04:15mooch2: at least mame has the scope you want AND the ability to make your wishes come true :/
04:17nyef: I'll add mame to my list of things to re-investigate. I remember not being a fan about a decade and a half ago, but maybe things have changed (or maybe I have changed).
04:18mooch2: they did
04:18mooch2: mame merged with mess
04:18mooch2: and in the process became FAR more generic
04:18mooch2: and their scope became FAR larger
04:18mooch2: as in, their scope became basically EVERYTHING with a cpu, and even some things WITHOUT a cpu
04:19mooch2: oh, and they're on github now
04:19mooch2: and accept pull requests
04:19mooch2: they still don't have a dedicated mailing list, or dedicated forums really
04:19mooch2: then again, 86box and varcem don't either lol
04:21mooch2: hell, mame even has a semi-working sony ps2 driver now
04:21mooch2: which only happened in the last two weeks
04:21mooch2: runs at like 3% of real speed, because it needs CYCLE-BY-CYCLE scheduling in mame's framework for some reason, and EE-specifics aren't implemented in the drc but STILL
04:22mooch2: or at least, it runs at that speed on my shitty old i3-3210 lol
04:38mooch2: uh, it turns out that mame's i386 core doesn't implement the 386 debug registers EITHER
08:29karolherbst: skeggsb_: do you know if you have a Fermi card which can do more than 165/225MHz?
08:30karolherbst: because in that case I would just write a patch to read out the cap for this as well and we can check how well that works generally
11:12RSpliet: karolherbst: Think you can add a little tool to envytools to read out relevant caps? I've got a few Fermis and could gather some data for you if you want, but please make my life easy ;-)
11:12RSpliet: Or are these caps not mapped to regular registers - and only accessible through an EVO channel?
11:12karolherbst: RSpliet: there is a reg 0x61c000 or something
11:13karolherbst: RSpliet: rnndb already has the bit
11:13karolherbst: but we kind of want to do that through evo anyway as things might change for some reasons
11:14RSpliet: The max clocks as well? Oh well, in that case send me a list of regs to poll and I'll check my Fermis at home for the capabilities you're interested in
11:14karolherbst: ohh, it seems to miss the max clock thing
11:14karolherbst: RSpliet: just that one
11:14karolherbst: I think...
11:14karolherbst: let me check
11:15karolherbst: or maybe there is no mmio reg for that
11:15karolherbst: let me dig through the stuff a bit
11:16karolherbst: RSpliet: I think I will simply write a nouveau patch and add a printk
11:17RSpliet: That'll take me significantly longer to gather data for you...
11:18RSpliet: (And I'm chronically low on time)
11:18karolherbst: right, but I don't know if there is a reg for that at all
11:20karolherbst: on volta there seems to be something
11:21karolherbst: anyway, if there were a known reg, we would already use it
11:22RSpliet: There's lots of known regs we don't use :-P
11:22karolherbst: RSpliet: patches would be here btw: https://github.com/karolherbst/nouveau/commits/fix_interlaced_reject
11:23RSpliet: Yeah saw the one on the ML, good stuff.
11:23karolherbst: well the top patch is for reading out a bit more and doing a printk
11:24karolherbst: it contains 4 8 bit values
11:24karolherbst: 7:0 CAP_SOR0_21_DP_CLK_MAX
11:24karolherbst: 23:16 CAP_SOR0_21_TMDS_LVDS_CLK_MAX
11:24karolherbst: other values are reserved
11:25RSpliet: I always read "reserved" as "yet undocumented" in NVIDIA docs O:-)
11:25karolherbst: maybe, yeah
11:25karolherbst: but if we collect a bit of data on those values, we might be able to figure something out :)
11:29karolherbst: 113c0036 on my maxwell
11:29karolherbst: 0x36: 54 = 540 MHz?
11:30karolherbst: or 5.4 Gbit/s?
11:30karolherbst: which is still the same
11:30karolherbst: it looks like a sane value to me
11:30karolherbst: 0x3c: 60 which should be 600MHz, which is also sane for HDMI 2.0
11:30RSpliet: 600MHz for DP?
11:31karolherbst: 0x11 : 17
11:31karolherbst: that one is a reserved field
11:32karolherbst: on newer hardware
11:32karolherbst: SOR0_21_TMDS_CLK_MAX 23:16
11:32karolherbst: SOR0_21_LVDS_CLK_MAX 31:24
11:32karolherbst: so lvds has 170MHz?
11:33karolherbst: value was split in the 957d class
11:33karolherbst: which might be GM200
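The field layout described above can be sketched as a small decoder. The bit ranges (7:0, 23:16, 31:24) and the sample value 113c0036 come from the log; the function names and the "raw value times 10 MHz" unit are assumptions, the latter being only the guess made in the discussion.

```python
def decode_sor_caps(word, split_lvds=False):
    """Split the 32-bit capability word into its 8-bit clock fields.

    split_lvds=True uses the newer layout where TMDS (23:16) and
    LVDS (31:24) are separate; otherwise 23:16 is the shared
    TMDS/LVDS field and 31:24 is treated as reserved.
    """
    fields = {"dp_clk_max": word & 0xFF}
    tmds = (word >> 16) & 0xFF
    if split_lvds:
        fields["tmds_clk_max"] = tmds
        fields["lvds_clk_max"] = (word >> 24) & 0xFF
    else:
        fields["tmds_lvds_clk_max"] = tmds
    return fields


def to_mhz(raw):
    """Assumed unit: 10 MHz per step (a guess, not documented)."""
    return raw * 10


caps = decode_sor_caps(0x113C0036, split_lvds=True)
for name, raw in caps.items():
    print(name, raw, to_mhz(raw), "MHz (assumed unit)")
# dp_clk_max 54, tmds_clk_max 60, lvds_clk_max 17
```

Under the assumed unit this reproduces the 540, 600 and 170 MHz readings from the log.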
11:34karolherbst: 90: GF110 91: GK104 92: GK110 94: GM107 95: GM200 97: GP100 98: GP102
11:35karolherbst: RSpliet: sooo yeah, the code only works for GF110+ anyway
11:37karolherbst: which _makes_ sense
11:39karolherbst: imirkin: remember for what GPUs we added that hdmimhz parameter? So we might be able to tell on GF110+ hardware what clocks we can set for DP, TMDS and LVDS
19:41karolherbst: imirkin: do we cap HDMI at 300MHz for all gens currently?
19:41karolherbst: or is there a way to get higher clocks without setting the module parameter?
19:41karolherbst: skeggsb_: ^^
19:43karolherbst: ohh, we do a *2 when dual link is possible but that only gives us 594, not 600
19:43karolherbst: or is single link 600MHz even possible?
19:43karolherbst: HDMI 2.0 gives us 600MHz
21:13pendingchaos: imirkin, karolherbst: shouldn't this code have a join/joinat: https://hastebin.com/abicoromig.txt? without flattening it or adding a join/joinat, things seem to break
21:24karolherbst: pendingchaos: join is just a synchronized branch
21:26karolherbst: azaki: I think the uploaded trace is broken
21:26karolherbst: if I replay it with glretrace it crashes at the end
21:26karolherbst: size is 35099911603
21:33nyef: I'm getting close to finishing an initial document about the Tesla context-switch microengine. Most of the basic structure seems to hang together, but there are so bloody many unknowns. /-:
21:36mooch3: OH NICE nyef
21:36mooch3: i just finished a new feature for mame's i386 core that's never been in ANY other i386 emulator in full
21:51karolherbst: azaki: f46c186c5ac2d229c46b1220b6e12497 is the md5
21:53pendingchaos: karolherbst: I'm not sure if that answers my question?
21:55karolherbst: pendingchaos: well, did you mean breaking as in misrendering?
21:55karolherbst: anyway, the shader needs more context
21:59nyef: mooch3: Neat. I think that it doesn't get implemented very often because of how hard it seems to implement efficiently and how infrequently it gets used outside of a debugging context.
21:59mooch3: nyef, yeah, but here, there's NO measurable speed penalty
21:59mooch3: speed literally went from 220-something% to exactly the same range on my i3
21:59mooch3: hey, btw, do you mind if i pm you?
22:00pendingchaos: broken: https://hastebin.com/ehusozodup.txt working: https://hastebin.com/jawofisuxo.txt
22:00pendingchaos: it's the pixmark_piano shader btw
22:04pendingchaos: (yes = yes, breaking as in misrendering)
22:06HdkR: That's a huge shader
22:06mooch3: yeah, it is
22:06mooch3: holy crap
22:07pendingchaos: a lot of shadertoy shaders are
22:07pendingchaos: it renders a complex scene with just a fullscreen quad/triangle and does everything in the fragment shader
22:08HdkR: Ah, right, of course
22:09mooch3: wait, are you working on nouveau's compiler? o.o
22:09karolherbst: I don't like such shaders, quite painful to debug :D
22:09mooch3: godspeed, if you are
22:09pendingchaos: yes, I am
22:10mooch3: oh jesus
22:10mooch3: welp, godspeed and all that
22:10HdkR: They implemented an optimization that I've been looking forward to ;)
22:10HdkR: Can't wait for that to be merged
22:11azaki: karolherbst: that's the md5 of my apitrace file, just calculated it.
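For a trace of this size (the 35099911603 bytes quoted earlier), a checksum is best computed in chunks rather than by reading the whole file into memory. A minimal sketch, where the function name `file_md5` and the 1 MiB chunk size are arbitrary choices and the path is whatever the trace file is called locally:

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Both sides can then compare the hex digest, along with the file size, to rule out a truncated or corrupted upload before blaming the trace itself.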
22:11karolherbst: azaki: weird
22:11karolherbst: azaki: does it crash when you run it through glretrace?
22:13azaki: uhm, it's running the game and replaying everything i did. wow, this is kind of cool. but no it hasn't crashed yet.
22:14azaki: i see now why you said that this can't be shared publicly. =p
22:15mooch3: why can't it?
22:15mooch3: it's just a series of gl calls
22:15azaki: i mean, the game logged back in without me doing anything.
22:16mooch3: oh weird
22:16mooch3: i'm stupid lol
22:18karolherbst: azaki: it isn't running the game :p
22:18karolherbst: azaki: it just replays every gl call
22:19mooch3: oh that
22:19karolherbst: but normally apitraces are kind of okayish to share as you basically record what the game pushes through the OpenGL API
22:19azaki: yeah, sorry. i figured it out after a few minutes XD
22:19azaki: this is how the apitrace ends for me
22:20azaki: but it quits at about the point that i was in-game though
22:20azaki: so i dunno why it says segfault
22:20karolherbst: because the file is corrupt or something
22:20azaki: i had gotten in game and demonstrated the issue by that point though. hm
22:20karolherbst: maybe apitrace can only record 32GB
22:21azaki: i don't think i stayed in-game for much longer than that, since you said to get in and quit right away
22:21azaki: but the missing ground textures are visible there at the end
22:21karolherbst: well I see the issue kind of
22:21karolherbst: "4322984: warning: error: Too many fragment shader texture samplers"
22:22karolherbst: which is what we already know
22:22karolherbst: azaki: thing is just that I can't open it with qapitrace
22:23karolherbst: mhhhh actually
22:24karolherbst: I mean, I have the shaders
22:24karolherbst: those wine shaders always look fun :D
22:25azaki: i'm trying it in qapitrace now. it's "loading" now. 50% so far
22:30karolherbst: "uniform sampler2D ps_sampler36;" nice
22:30karolherbst: no "layout(binding = 31)" declaration after the 31 anymore
22:30karolherbst: oh wow
22:30karolherbst: what an insane shader
22:34azaki: well, it's bethesda.
22:34mooch3: karolherbst, are you guys debugging wolfenstein 2?
22:34mooch3: oh? which game then?
22:34azaki: elder scrolls online
22:35mooch3: how does wolf 2 run on nouveau tho?
22:35mooch3: slowly, i bet?
22:35karolherbst: no clue
22:36karolherbst: it isn't native, or is it?
22:36azaki: i guess it'd depend if you had reclocking support on the card.
22:36azaki: it's not native, but it does support vulkan i think. but nouveau doesn't have vulkan yet right?
22:36mooch3: oh true lol
22:38mooch3: why doesn't it have vulkan, btw? is there just not enough knowledge about the card? is nobody interested?
22:39mooch3: ah, yeah, fair enough :/
22:40azaki: so far this gt630 is holding up pretty decently. i might be able to ride this until the gpu prices start dropping (they're dropping way more slowly in canada than in the usa unfortunately)
22:40azaki: it's not a very good card, but it does work better than i expected. i honestly didn't think it would work even with graphics settings turned all the way down to "low"
22:41azaki: but it does ok. and reclocking hasn't really burned me yet.
22:42azaki: it's pretty awesome how far nouveau has come =D
22:45HdkR: Time is all that is necessary for writing a new driver :P
22:48pendingchaos: IIRC, looking at the video, the terrain bug in elder scrolls looked like the ground was either not being rendered at all or it was reading from the current framebuffer
22:48pendingchaos: I don't see how having too many samplers would cause it to not be rendered as the shader doesn't seem to discard?
22:48pendingchaos: and reading from the current framebuffer sounds unlikely
22:48mooch3: well, karolherbst, da heck would i need in order to make a VERY BASIC vulkan driver, like, one that just logs function calls and creates a window context?
22:50karolherbst: pendingchaos: more than 32 samplers
22:50karolherbst: mesa basically just supports 32
22:50karolherbst: and some shaders require 37 there
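A quick way to spot shaders that run into this limit is to count sampler declarations in the dumped GLSL. The sketch below is only a rough text scan, not how mesa checks it: the regex ignores sampler arrays and samplers inside structs, the limit of 32 is the number given in the log, and the generated shader text is a stand-in shaped like the `ps_samplerN` declarations quoted earlier.

```python
import re

MAX_SAMPLERS = 32  # per-stage limit mentioned in the discussion

# Matches e.g. "uniform sampler2D ps_sampler36;" with an optional layout(...) prefix.
SAMPLER_DECL = re.compile(
    r"^\s*(?:layout\s*\([^)]*\)\s*)?uniform\s+sampler\w*\s+(\w+)\s*;",
    re.MULTILINE,
)


def sampler_names(glsl_source):
    """Return the names of all plain sampler uniforms declared in the source."""
    return SAMPLER_DECL.findall(glsl_source)


# Stand-in shader: 32 samplers with explicit bindings, 5 without.
lines = [
    f"layout(binding = {i}) uniform sampler2D ps_sampler{i};" for i in range(32)
]
lines += [f"uniform sampler2D ps_sampler{i};" for i in range(32, 37)]
source = "\n".join(lines)

names = sampler_names(source)
if len(names) > MAX_SAMPLERS:
    print(f"{len(names)} samplers declared, limit is {MAX_SAMPLERS}")
```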
22:50karolherbst: mooch3: dunno, I doubt this would take much time
22:51mooch3: karolherbst, oh? how so?
22:52mooch3: are there any official guides for vulkan driver devs, or do i have to literally just read the spec over and over again?
22:52karolherbst: I guess reading the spec?
22:52karolherbst: with vulkan things are a bit different as you already have the loader
22:53airlied: cp src/amd/vulkan or src/intel/vulkan and cut lots of stuff out :-P
22:54mooch3: ah, good point
22:56airlied: nouveau has a bit of a blocker in that it needs a new kernel API to allow the vulkan memory management semantics
22:56airlied: but I think you could still write a big chunk of boilerplate before getting to that problem :-P
22:57skeggsb_: i've got a branch with the boilerplate written somewhere :P
22:57skeggsb_: other stuff keeps blocking the new apis :/
23:11karolherbst: azaki: so, the compiler is fixed now (hopefully)
23:11karolherbst: now checking what messes up at actual runtime
23:11mooch3: airlied, which kernel api would that be?
23:13karolherbst: azaki: so uhm.. I guess there is a bit more to it
23:13karolherbst: azaki: _but_ it seems to be much faster
23:13karolherbst: azaki: https://github.com/karolherbst/mesa/commit/9b78797d33bf7c906bc7eee7ed92cd9a1c2c5e7b
23:13karolherbst: could you test this patch?
23:14karolherbst: and see if loading times improve by a lot?
23:16azaki: karolherbst: ok, i'll give it a try
23:21airlied: mooch3: the ability I think for the userspace to allocate virtual memory addresses for objects
23:22mooch3: yeah :/