00:17 karolherbst: imirkin: do you have some hardware where we aren't able to correctly detect the hdmi max clock we can use?
00:17 karolherbst: or uhm, other issues like that?
00:19 karolherbst: I mean.... we might be able to check for that, no?
00:19 karolherbst: skeggsb_: ^?
00:20 karolherbst: there is also a NV907D_CORE_NOTIFIER_3_CAPABILITIES_CAP_SOR0_20_DUAL_TMDS
01:38 imirkin: karolherbst: not personally
01:38 imirkin: there was someone with a fermi that supposedly could do 297MHz but everything under the sun reported 225MHz
01:38 imirkin: karolherbst: those are features of a SOR though
01:38 imirkin: a particular SOR could do dual-link DVI
01:39 imirkin: at 165MHz and single-link HDMI at 225MHz
01:41 skeggsb_: imirkin: the 297 case, were they sure the binary driver wasn't being sneaky and making a new reduced-blanking mode that fits under 225?
01:42 skeggsb_: i've witnessed it doing such things
01:42 imirkin: skeggsb_: sure enough. it already was a reduced-blanking mode iirc.
01:42 skeggsb_: ah
01:42 skeggsb_: was it gf119?
01:42 imirkin: GF106 (or GF116)
01:43 imirkin: skeggsb_: https://bugs.freedesktop.org/show_bug.cgi?id=91236
01:43 imirkin: but that's not the original guy
01:44 imirkin: the original guy sent patches
01:44 imirkin: 2560x1440@56 -- i suspect he thought of reduced blanking ;)
01:45 imirkin: the cvt -r modeline for 2560x1440 is 241MHz, so pretty far
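The 241MHz figure can be roughly reproduced with a simplified version of the CVT reduced-blanking (v1) arithmetic. This is a sketch, not the `cvt` tool's code: the function name is hypothetical, and the constants (160 pixels of horizontal blanking, a 460 microsecond minimum vertical blanking interval, a 0.25 MHz clock step, 8-pixel cell granularity) are assumed from the CVT-RB v1 rules.

```python
import math

def cvt_rb_pixel_clock_mhz(h_active, v_active, refresh_hz):
    """Approximate CVT reduced-blanking (v1) pixel clock in MHz (sketch)."""
    H_BLANK = 160            # assumed fixed horizontal blanking, in pixels
    MIN_V_BLANK_US = 460.0   # assumed minimum vertical blanking, in microseconds
    CLOCK_STEP_MHZ = 0.25    # assumed pixel clock granularity
    CELL_GRAN = 8            # assumed character cell width, in pixels

    # Round the active width down to a whole number of character cells.
    h_rounded = (h_active // CELL_GRAN) * CELL_GRAN

    # Estimated time per scanline once the vertical blanking time is removed.
    h_period_us = (1_000_000.0 / refresh_hz - MIN_V_BLANK_US) / v_active

    # Pixels per microsecond is MHz; round down to the clock step.
    raw_mhz = (h_rounded + H_BLANK) / h_period_us
    return math.floor(raw_mhz / CLOCK_STEP_MHZ) * CLOCK_STEP_MHZ

print(cvt_rb_pixel_clock_mhz(2560, 1440, 60))
print(cvt_rb_pixel_clock_mhz(2560, 1440, 56))
```

Under these assumptions the 60 Hz mode comes out at 241.5 MHz, while a 56 Hz refresh lands at 225.0 MHz, which would explain why a 2560x1440@56 mode was chosen for a 225MHz limit.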
02:27 * nyef sighs.
02:27 nyef: In attempting to corral details about tesla context-switching, I came to the conclusion that tesla hasn't been sufficiently reverse-engineered beyond a certain point.
02:28 nyef: I'm not sure what that point is, other than "probably about halfway".
02:28 mooch2: i doubt it's even CLOSE to halfway, honestly
02:29 nyef: I meant halfway through the evolution of tesla.
02:29 mooch2: ah lol
02:29 nyef: So early-model teslas probably work more-or-less okay, while late-model teslas are trivially easy to lock up.
02:30 mooch2: eh, still. i think tesla RE has only just BEGUN in terms of the amount of stuff to uncover about the hardware
02:30 mooch2: and i don't just mean "getting the cards working in drivers"
02:30 mooch2: i mean shit like being able to accurately EMULATE the dang things, while having them running windows drivers
02:33 nyef: Oh, wouldn't THAT be nice. (-:
02:35 nyef: Anyway, the implication is that trying to get my NVAF working reliably is going to involve a LOT more work than I had anticipated.
02:38 mooch2: yeah, unfortunately :c
02:38 nyef: ... and probably some investment in more tesla hardware to work with.
02:39 mooch2: though, tbh, even with the nv3, it hasn't been reversed enough to run the windows drivers in emulation :c
02:39 nyef: That sucks.
02:39 mooch2: the current vid_nv_riva128.c in 86box is the result of my emulation trials, and mwk's expertise on the hardware to attempt to emulate this card
02:39 mooch2: it doesn't even run the LINUX drivers yet
02:40 mooch2: though it DOES run vesa
02:40 mooch2: so linux with a vesa driver works
02:40 mooch2: which is likely to be the only nv3-compatible driver modern distros ship lol
02:41 mooch2: then again, 86box doesn't support PAE, even on P6 cpus, so distros like ubuntu won't work by default
03:09 nyef: 86box looks interesting, modulo the "uses C++" and "is x86oid only" bits.
03:17 imirkin: nyef: check with mwk
03:18 imirkin: he knows many things.
03:18 imirkin: and occasionally shares such knowledge
03:34 mooch2: nyef, well, 86box is MOSTLY in c90
03:34 mooch2: including almost all of the core emulation
03:34 mooch2: also, the interpreter works on other platforms, but dynarec is mandatory for pentium and up anyway so :/
03:35 mooch2: imirkin, wait, i didn't think mwk worked on tesla that much
03:39 nyef: Oh, I'm not complaining about host CPU type limitations, I'm complaining that it doesn't support non-x86oid guests. d-:
03:47 mooch2: oh why?
03:47 mooch2: it's specifically DESIGNED to only emulate x86 guests
03:47 mooch2: it's based off of pcem, which only emulates ibm pcs and their descendants
03:48 mooch2: ...which all use x86
03:48 nyef: You got as far as PCI, which is used in other architectures.
03:48 mooch2: so of course it's not going to support something like mips
03:48 mooch2: nyef, so did pcem lol
03:48 mooch2: also, the code REALLY isn't NEARLY flexible enough to support other architectures anyway
03:48 mooch2: for instance, our dynarecs currently emit PURE MACHINE CODE
03:49 mooch2: no irs, no optimization, no emitters even
03:49 mooch2: just straight up addbyte(x) or something
03:49 mooch2: i shit you not
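For readers unfamiliar with the style being described, here is a minimal sketch of emitting machine code byte by byte with no IR or emitter layer. Only the name `addbyte` comes from the chat; its signature, the `addlong` helper, and `emit_return_constant` are assumptions made for illustration and are not 86box's code. The bytes encode the x86 instructions `MOV EAX, imm32` (opcode 0xB8) and `RET` (opcode 0xC3).

```python
import struct

code = bytearray()  # stands in for the dynarec's output buffer

def addbyte(value):
    """Append one raw byte of machine code (assumed signature)."""
    code.append(value & 0xFF)

def addlong(value):
    """Append a 32-bit little-endian immediate (hypothetical helper)."""
    for b in struct.pack("<I", value & 0xFFFFFFFF):
        addbyte(b)

def emit_return_constant(imm32):
    """Emit `mov eax, imm32; ret` directly as opcode bytes."""
    addbyte(0xB8)    # MOV EAX, imm32
    addlong(imm32)   # the immediate operand
    addbyte(0xC3)    # RET

emit_return_constant(0x12345678)
print(code.hex())  # b878563412c3
```

The drawback mooch2 points at is visible even here: every opcode is a hard-coded x86 byte, so nothing in the emitter can be retargeted to another host architecture.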
03:56 nyef: It really just feels like a waste to not support other guest architectures if you're going to the trouble of supporting a goodly amount of infrastructure that would be required for the other guest architectures anyway.
03:57 nyef: ATA and SCSI bus and drive emulation, SCSI controller chips, PCI bus, serial ports, networking, and so on.
03:59 nyef: Plus you already have the basics for the UI bits, the screen, keyboard and mouse interfaces, even if the simulated hardware would need changing out.
03:59 mooch2: eh, we support the mca ps/2s
03:59 mooch2: ehhhh, our ui, threading, and input don't work on anything other than windows
03:59 mooch2: sure, it can be run through wine, but still
04:00 nyef: For a couple of architectures, it's almost down to the point of throwing a new CPU and host bus adaptor into the mix.
04:00 mooch2: ehhh, that's not 86box's goal tho
04:00 mooch2: our goal is to ONLY emulate x86
04:00 mooch2: also, our cpu code is STILL not flexible enough
04:00 mooch2: too many globals that should really be in structs
04:01 mooch2: also, cpus AREN'T device_ts yet, so
04:01 mooch2: yeah, if you want different guest archs, you're gonna want the fork varcem, where it's at least part of the project goals
04:01 mooch2: even though it's not implemented yet
04:01 nyef: Hrm.
04:03 mooch2: oh, and nvidia emulation is not in varcem
04:03 mooch2: mostly due to the code being deemed "too dirty"
04:03 mooch2: even though, according to coverity, varcem currently has MORE buffer overflows than 86box, by a WIDE margin
04:03 nyef: More and more reasons to write my OWN emulator. Again. /-:
04:04 mooch2: how so?
04:04 mooch2: also, mame has this in its project scope
04:04 mooch2: so really, the best help you could be would be to rewrite their new pci code to be a slot device, so that we could work together to write nv3 emulation
04:04 nyef: So many things to explore.
04:04 mooch2: mame has EVERYTHING in its project scope lol
04:05 mooch2: though it currently doesn't support processors with more than 32 address bits fully
04:05 mooch2: but oh well
04:05 mooch2: that's only really needed for pentium pro and up anyway
04:05 mooch2: and even 86box doesn't accurately emulate ALL of the features of the pentium mmx
04:05 mooch2: hell, it doesn't even emulate the 386 debug registers AT ALL
04:05 mooch2: due to performance concerns
04:05 mooch2: though i do hope to rectify this
04:05 nyef: MIPS R12000 requires more than 32 address bits.
04:06 mooch2: i don't think that was ever released lol
04:06 mooch2: oh wait, it was
04:06 mooch2: well, mame doesn't even emulate an r10000 right now anyway
04:06 mooch2: of course, you're welcome to contribute it, provided it's been tested
04:06 nyef: Mmmhmm. I also have R10000, R14000, and maybe R16000 systems.
04:07 mooch2: oh nice
04:08 mooch2: then yeah, if you dumped them, and added drivers (mame's term for emulator cores) for them, then i'm sure they'd be accepted!
04:08 mooch2: hell, i even wrote an iphone 2g driver lol
04:08 mooch2: and mame's debugger actually helped me get rid of some hacks in my own iphone 2g emulator
04:10 mooch2: oh, and mame also needs an x86 dynarec
04:10 mooch2: currently, its interpreter is even SLOWER than 86box's
04:12 mooch2: nyef, da heck do you think about all of this?
04:12 nyef: I don't know what to think anymore, really.
04:15 mooch2: oh? why not?
04:15 mooch2: at least mame has the scope you want AND the ability to make your wishes come true :/
04:17 nyef: I'll add mame to my list of things to re-investigate. I remember not being a fan about a decade and a half ago, but maybe things have changed (or maybe I have changed).
04:18 mooch2: they did
04:18 mooch2: mame merged with mess
04:18 mooch2: and in the process became FAR more generic
04:18 mooch2: and their scope became FAR larger
04:18 mooch2: as in, their scope became basically EVERYTHING with a cpu, and even some things WITHOUT a cpu
04:18 mooch2: seriously
04:19 mooch2: oh, and they're on github now
04:19 mooch2: and accept pull requests
04:19 mooch2: they still don't have a dedicated mailing list, or dedicated forums really
04:19 mooch2: then again, 86box and varcem don't either lol
04:21 mooch2: hell, mame even has a semi-working sony ps2 driver now
04:21 mooch2: which only happened in the last two weeks
04:21 mooch2: runs at like 3% of real speed, because it needs CYCLE-BY-CYCLE scheduling in mame's framework for some reason, and EE-specifics aren't implemented in the drc but STILL
04:22 mooch2: or at least, it runs at that speed on my shitty old i3-3210 lol
04:38 mooch2: uh, it turns out that mame's i386 core doesn't implement the 386 debug registers EITHER
04:38 mooch2: weird
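As background on what emulating the debug registers involves, the sketch below decodes the x86 DR7 control register into its enabled hardware breakpoints. The function name and the returned dictionary layout are hypothetical; the bit positions (L/G enable bits in bits 0 to 7, R/W and LEN fields starting at bit 16) follow the architectural definition of DR7. It does not reflect how any of the emulators mentioned implement this.

```python
def decode_dr7(dr7):
    """Return the breakpoints enabled in a DR7 value (illustrative sketch)."""
    # R/W encoding 2 is I/O on Pentium and later with CR4.DE; undefined on the 386.
    access_names = {0: "execute", 1: "write", 2: "io", 3: "read/write"}
    # LEN encoding 2 is undefined on 32-bit processors, so no length is reported.
    length_bytes = {0: 1, 1: 2, 2: None, 3: 4}

    enabled = []
    for n in range(4):
        local = bool((dr7 >> (2 * n)) & 1)        # Ln bit
        glob = bool((dr7 >> (2 * n + 1)) & 1)     # Gn bit
        if not (local or glob):
            continue
        rw = (dr7 >> (16 + 4 * n)) & 0x3          # R/Wn field
        length = (dr7 >> (18 + 4 * n)) & 0x3      # LENn field
        enabled.append({
            "index": n,
            "local": local,
            "global": glob,
            "access": access_names[rw],
            "length": length_bytes[length],
        })
    return enabled

# Breakpoint 0, locally enabled, 4-byte write watchpoint.
print(decode_dr7(0x000D0001))
```

An emulator then has to compare every matching memory access against DR0 to DR3 for each enabled entry, which is where the performance concern comes from.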
08:29 karolherbst: skeggsb_: do you know if you have a Fermi card which can do more than 165/225MHz?
08:30 karolherbst: because in that case I would just write a patch to read out the cap for this as well and we can check how well that works generally
11:12 RSpliet: karolherbst: Think you can add a little tool to envytools to read out relevant caps? I've got a few Fermis I could gather some data from for you if you want, but please make my life easy ;-)
11:12 RSpliet: Or are these caps not mapped to regular registers - and only accessible through an EVO channel?
11:12 karolherbst: RSpliet: there is a reg 0x61c000 or something
11:13 karolherbst: RSpliet: rnndb already has the bits
11:13 karolherbst: but we kind of want to do that through evo anyway as things might change for some reason
11:14 RSpliet: The max clocks as well? Oh well in that case send me a list of regs to poll and I'll check my Fermis at home for the capabilities you're interested in
11:14 karolherbst: ohh, it seems to miss the max clock thing
11:14 karolherbst: RSpliet: just that one
11:14 karolherbst: I think...
11:14 karolherbst: let me check
11:14 karolherbst: hum
11:15 karolherbst: or maybe there is no mmio reg for that
11:15 karolherbst: let me dig through the stuff a bit
11:16 karolherbst: RSpliet: I think I will simply write a nouveau patch and add a printk
11:17 RSpliet: That'll take me significantly longer to gather data for you...
11:18 RSpliet: (And I'm chronically low on time)
11:18 karolherbst: right, but I don't know if there is a reg for that at all
11:20 karolherbst: on volta there seems to be something
11:21 karolherbst: anyway, if there would be a known reg, we would already use it
11:22 RSpliet: There's lots of known regs we don't use :-P
11:22 karolherbst: RSpliet: patches would be here btw: https://github.com/karolherbst/nouveau/commits/fix_interlaced_reject
11:23 RSpliet: Yeah saw the one on the ML, good stuff.
11:23 karolherbst: well the top patch is for reading out a bit more and doing a printk
11:24 karolherbst: it contains four 8-bit values
11:24 karolherbst: 7:0 CAP_SOR0_21_DP_CLK_MAX
11:24 karolherbst: 23:16 CAP_SOR0_21_TMDS_LVDS_CLK_MAX
11:24 karolherbst: other values are reserved
11:25 RSpliet: I always read "reserved" as "yet undocumented" in NVIDIA docs O:-)
11:25 karolherbst: maybe, yeah
11:25 karolherbst: but if we collect a bit of data on those values, we might be able to figure something out :)
11:29 karolherbst: 113c0036 on my maxwell
11:29 karolherbst: 0x36: 54 = 540 MHz?
11:30 karolherbst: or 5.4 Gbit/s?
11:30 karolherbst: which is still the same
11:30 karolherbst: but
11:30 karolherbst: it looks like a sane value to me
11:30 karolherbst: 0x3c: 60 which should be 600MHz, which is also sane for HDMI 2.0
11:30 RSpliet: 600MHz for DP?
11:30 karolherbst: HDMI
11:31 karolherbst: TMDS_LVDS
11:31 karolherbst: 0x11 : 17
11:31 karolherbst: mhh
11:31 karolherbst: that one is a reserved field
11:32 karolherbst: ohhohooho
11:32 karolherbst: on newer hardware
11:32 karolherbst: SOR0_21_TMDS_CLK_MAX 23:16
11:32 karolherbst: SOR0_21_LVDS_CLK_MAX 31:24
11:32 karolherbst: so lvds has 170MHz?
11:33 karolherbst: value was split in the 957d class
11:33 karolherbst: which might be GM200
11:34 karolherbst: 90: GF110 91: GK104 92: GK110 94: GM107 95: GM200 97: GP100 98: GP102
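The field layout and the sample value above can be put together in a small decoder. This is only a sketch: the function name, the `split_tmds_lvds` parameter, and the dictionary keys are hypothetical, and the "units of 10 MHz" scaling is karolherbst's guess from the chat rather than a documented fact. The class-to-chip table is copied from the line above.

```python
# Display class prefix to first chip using it, as listed in the chat.
CLASS_TO_CHIP = {
    0x90: "GF110", 0x91: "GK104", 0x92: "GK110", 0x94: "GM107",
    0x95: "GM200", 0x97: "GP100", 0x98: "GP102",
}

def decode_sor_cap21(value, split_tmds_lvds):
    """Decode the CAP_SOR0_21 notifier word (sketch, assumes 10 MHz units).

    split_tmds_lvds: True for classes where the TMDS and LVDS limits are
    separate fields (957d and later, per the chat); False for older classes
    with a single TMDS_LVDS_CLK_MAX field.
    """
    caps = {"dp_clk_max_mhz": (value & 0xFF) * 10}      # bits 7:0
    tmds = ((value >> 16) & 0xFF) * 10                  # bits 23:16
    if split_tmds_lvds:
        caps["tmds_clk_max_mhz"] = tmds
        caps["lvds_clk_max_mhz"] = ((value >> 24) & 0xFF) * 10  # bits 31:24
    else:
        caps["tmds_lvds_clk_max_mhz"] = tmds
    return caps

# The value read on karolherbst's Maxwell card.
print(decode_sor_cap21(0x113C0036, split_tmds_lvds=True))
```

With the guessed scaling, 0x113c0036 decodes to 540 MHz for DP, 600 MHz for TMDS and 170 MHz for LVDS, matching the numbers worked out by hand in the chat.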
11:35 karolherbst: RSpliet: sooo yeah, the code only works for GF110+ anyway
11:37 karolherbst: which _makes_ sense
11:39 karolherbst: imirkin: remember for what GPUs we added that hdmimhz parameter? So we might be able to tell on GF110+ hardware what clocks we can set for DP, TMDS and LVDS
19:41 karolherbst: imirkin: do we cap HDMI at 300MHz for all gens currently?
19:41 karolherbst: or is there a way to get higher clocks without setting the module parameter?
19:41 karolherbst: skeggsb_: ^^
19:43 karolherbst: ohh, we do a *2 when dual link is possible but that only gives us 594, not 600
19:43 karolherbst: or is single link 600MHz even possible?
19:43 karolherbst: HDMI 2.0 gives us 600MHz
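The gap being described is simple arithmetic. In the sketch below, the 297 MHz single-link figure is inferred from the 594 mentioned above (594 / 2) and the 600 MHz figure is the HDMI 2.0 limit quoted in the chat; the variable names are illustrative only.

```python
SINGLE_LINK_MHZ = 297       # inferred: the doubled value quoted above is 594
HDMI_2_0_MAX_MHZ = 600      # limit quoted in the chat for HDMI 2.0

doubled_mhz = 2 * SINGLE_LINK_MHZ
shortfall_mhz = HDMI_2_0_MAX_MHZ - doubled_mhz

print(doubled_mhz, shortfall_mhz)  # 594 6
```

So doubling the single-link limit leaves modes that need the last 6 MHz rejected unless the limit is overridden some other way.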
21:13 pendingchaos: imirkin, karolherbst: shouldn't this code have a join/joinat? https://hastebin.com/abicoromig.txt -- without flattening it or adding a join/joinat, things seem to break
21:24 karolherbst: pendingchaos: join is just a synchronized branch
21:26 karolherbst: azaki: I think the uploaded trace is broken
21:26 karolherbst: if I replay it with glretrace it crashes at the end
21:26 karolherbst: size is 35099911603
21:33 nyef: I'm getting close to finishing an initial document about the Tesla context-switch microengine. Most of the basic structure seems to hang together, but there are so bloody many unknowns. /-:
21:36 mooch3: OH NICE nyef
21:36 mooch3: i just finished a new feature for mame's i386 core that's never been in ANY other i386 emulator in full
21:36 mooch3: https://github.com/mamedev/mame/pull/3761
21:51 karolherbst: azaki: f46c186c5ac2d229c46b1220b6e12497 is the md5
21:53 pendingchaos: karolherbst: I'm not sure if that answers my question?
21:55 karolherbst: pendingchaos: well, did you mean breaking as in misrendering?
21:55 karolherbst: anyway, the shader needs more context
21:55 pendingchaos: yes
21:59 nyef: mooch3: Neat. I think that it doesn't get implemented very often because of how hard it seems to implement efficiently and how infrequently it gets used outside of a debugging context.
21:59 mooch3: nyef, yeah, but here, there's NO measurable speed penalty
21:59 mooch3: speed literally went from 220-something% to exactly the same range on my i3
21:59 mooch3: hey, btw, do you mind if i pm you?
22:00 pendingchaos: broken: https://hastebin.com/ehusozodup.txt working: https://hastebin.com/jawofisuxo.txt
22:00 pendingchaos: it's the pixmark_piano shader btw
22:04 pendingchaos: (yes = yes, breaking as in misrendering)
22:06 HdkR: That's a huge shader
22:06 mooch3: yeah, it is
22:06 mooch3: holy crap
22:07 pendingchaos: a lot of shadertoy shaders are
22:07 pendingchaos: it renders a complex scene with just a fullscreen quad/triangle and does everything in the fragment shader
22:08 HdkR: Ah, right, of course
22:09 mooch3: wait, are you working on nouveau's compiler? o.o
22:09 karolherbst: I don't like such shaders, quite painful to debug :D
22:09 mooch3: godspeed, if you are
22:09 pendingchaos: yes, I am
22:10 mooch3: oh jesus
22:10 mooch3: welp, godspeed and all that
22:10 HdkR: They implemented an optimization that I've been looking forward to ;)
22:10 HdkR: Can't wait for that to be merged
22:11 azaki: karolherbst: that's the md5 of my apitrace file, just calculated it.
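Comparing checksums like this is a quick way to rule out a corrupted upload. For a trace of this size the file has to be hashed in chunks rather than read into memory at once; a minimal sketch follows, where the function name and the chunk size are arbitrary choices and only `hashlib.md5` is a real library call.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Both sides would run this (or `md5sum`) on their copy of the trace and compare the resulting hex strings.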
22:11 karolherbst: azaki: weird
22:11 karolherbst: azaki: does it crash when you run it through glretrace?
22:13 azaki: uhm, it's running the game and replaying everything i did. wow, this is kind of cool. but no it hasn't crashed yet.
22:14 azaki: i see now why you said that this can't be shared publicly. =p
22:15 mooch3: ?
22:15 mooch3: why can't it?
22:15 mooch3: it's just a series of gl calls
22:15 azaki: i mean, the game logged back in without me doing anything.
22:16 mooch3: oh weird
22:16 mooch3: OH
22:16 mooch3: durr
22:16 mooch3: i'm stupid lol
22:18 karolherbst: azaki: it isn't running the game :p
22:18 karolherbst: azaki: it just replays every gl call
22:19 mooch3: oh that
22:19 mooch3: yeah
22:19 karolherbst: but normally apitraces are kind of okayish to share as you basically record what the game pushes through the OpenGL API
22:19 karolherbst: but...
22:19 azaki: yeah, sorry. i figured it out after a few minutes XD
22:19 karolherbst: :D
22:19 azaki: https://paste.fedoraproject.org/paste/W8g5RaSDhfuGI-IPPdrpMQ/raw
22:19 azaki: this is how the apitrace ends for me
22:20 azaki: but it quits at about the point that i was in-game though
22:20 azaki: so i dunno why it says segfault
22:20 karolherbst: ;)
22:20 karolherbst: because the file is corrupt or something
22:20 azaki: i had gotten in game and demonstrated the issue by that point though. hm
22:20 karolherbst: maybe apitrace can only record 32GB
22:20 karolherbst: dunno
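The reported trace size can be checked against that guess with a little arithmetic. This is only a sanity check on the numbers from the chat; whether apitrace actually has any size limit is not established here.

```python
TRACE_SIZE_BYTES = 35_099_911_603   # size reported earlier in the chat
GIB = 2 ** 30

size_gib = TRACE_SIZE_BYTES / GIB
bytes_over_32_gib = TRACE_SIZE_BYTES - 32 * GIB

print(f"{size_gib:.2f} GiB, {bytes_over_32_gib} bytes past 32 GiB")
```

The file is about 32.69 GiB, roughly 740 MB past the 32 GiB mark, so it is near that boundary but not cut off exactly at it.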
22:21 azaki: i don't think i stayed in-game for much longer than that, since you said to get in and quit right away
22:21 azaki: but the missing ground textures are visible there at the end
22:21 karolherbst: mhhh
22:21 karolherbst: well I see the issue kind of
22:21 karolherbst: "4322984: warning: error: Too many fragment shader texture samplers"
22:22 karolherbst: which is what we already know
22:22 karolherbst: azaki: thing is just that I can't open it with qapitrace
22:23 karolherbst: mhhhh actually
22:24 karolherbst: I mean, I have the shaders
22:24 karolherbst: those wine shaders always look fun :D
22:25 azaki: i'm trying it in qapitrace now. it's "loading" now. 50% so far
22:30 karolherbst: "uniform sampler2D ps_sampler36;" nice
22:30 karolherbst: no "layout(binding = 31)" declaration after the 31 anymore
22:30 karolherbst: oh wow
22:30 karolherbst: what an insane shader
22:34 azaki: well, it's bethesda.
22:34 mooch3: karolherbst, are you guys debugging wolfenstein 2?
22:34 karolherbst: no
22:34 mooch3: oh? which game then?
22:34 azaki: elder scrolls online
22:34 mooch3: ah
22:35 mooch3: how does wolf 2 run on nouveau tho?
22:35 mooch3: slowly, i bet?
22:35 karolherbst: no clue
22:36 karolherbst: it isn't native, or is it?
22:36 azaki: i guess it'd depend if you had reclocking support on the card.
22:36 azaki: it's not native, but it does support vulkan i think. but nouveau doesn't have vulkan yet right?
22:36 mooch3: oh true lol
22:38 karolherbst: right
22:38 mooch3: why doesn't it have vulkan, btw? is there just not enough knowledge about the card? is nobody interested?
22:39 karolherbst: time
22:39 mooch3: ah, yeah, fair enough :/
22:40 azaki: so far this gt630 is holding up pretty decently. i might be able to ride this until the gpu prices start dropping (they're dropping way more slowly in canada than in the usa unfortunately)
22:40 azaki: it's not a very good card, but it does work better than i expected. i honestly didn't think it would work even with graphics settings turned all the way down to "low"
22:41 azaki: but it does ok. and reclocking hasn't really burned me yet.
22:42 azaki: it's pretty awesome how far nouveau has come =D
22:45 HdkR: Time is all that is necessary for writing a new driver :P
22:48 pendingchaos: IIRC, looking at the video, the terrain bug in elder scrolls looked like the ground was either not being rendered at all or it was reading from the current framebuffer
22:48 pendingchaos: I don't see how having too many samplers would cause it to not be rendered as the shader doesn't seem to discard?
22:48 pendingchaos: and reading from the current framebuffer sounds unlikely
22:48 mooch3: well, karolherbst, da heck would i need in order to make a VERY BASIC vulkan driver, like, one that just logs function calls and creates a window context?
22:50 karolherbst: pendingchaos: more than 32 samplers
22:50 karolherbst: mesa basically just supports 32
22:50 karolherbst: and some shaders require 37 there
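A quick way to spot such shaders in a dumped trace is to count the sampler uniforms in the GLSL source and compare against the limit. The sketch below is a rough text scan, not how mesa counts samplers: the function name and regex are hypothetical, arrays of samplers are ignored, and the limit of 32 is taken from the chat.

```python
import re

MAX_FRAGMENT_SAMPLERS = 32  # limit quoted in the chat

# Matches declarations such as "uniform sampler2D ps_sampler36;".
SAMPLER_DECL = re.compile(r"\buniform\s+[iu]?sampler\w+\s+(\w+)\s*;")

def sampler_names(glsl_source):
    """Return the names of all non-array sampler uniforms in a shader."""
    return SAMPLER_DECL.findall(glsl_source)

# Build a shader resembling the wine-generated one: ps_sampler0..ps_sampler36.
source = "\n".join(f"uniform sampler2D ps_sampler{i};" for i in range(37))
names = sampler_names(source)
excess = len(names) - MAX_FRAGMENT_SAMPLERS

print(len(names), "samplers,", excess, "over the limit")  # 37 samplers, 5 over the limit
```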
22:50 karolherbst: mooch3: dunno, I doubt this would take much time
22:51 mooch3: karolherbst, oh? how so?
22:52 mooch3: are there any official guides for vulkan driver devs, or do i have to literally just read the spec over and over again?
22:52 karolherbst: I guess reading the spec?
22:52 karolherbst: with vulkan things are a bit different as you already have the loader
22:53 airlied: cp src/amd/vulkan or src/intel/vulkan and cut lots of stuff out :-P
22:54 mooch3: ah, good point
22:54 mooch3: thanks
22:56 airlied: nouveau has a bit of a blocker in that it needs a new kernel API to allow the vulkan memory management semantics
22:56 airlied: but I think you could still write a big chunk of boilerplate before getting to that problem :-P
22:57 skeggsb_: i've got a branch with the boilerplate written somewhere :P
22:57 skeggsb_: other stuff keeps blocking the new apis :/
23:11 karolherbst: azaki: so, the compiler is fixed now (hopefully)
23:11 karolherbst: now checking what messes up at actual runtime
23:11 mooch3: airlied, which kernel api would that be?
23:13 karolherbst: azaki: so uhm.. I guess there is a bit more to it
23:13 karolherbst: azaki: _but_ it seems to be much faster
23:13 karolherbst: azaki: https://github.com/karolherbst/mesa/commit/9b78797d33bf7c906bc7eee7ed92cd9a1c2c5e7b
23:13 karolherbst: could you test this patch?
23:14 karolherbst: and see if loading times improve by a lot?
23:16 azaki: karolherbst: ok, i'll give it a try
23:21 airlied: mooch3: the ability I think for the userspace to allocate virtual memory addresses for objects
23:22 mooch3: ah
23:22 mooch3: yeah :/